A Sight for Sore Eyes: Computer Vision Startups
Computer vision startups hold great promise, mainly in aerial and geospatial image collection and photo tagging, but the real winners are the high-tech startups facilitating the growth of CV technologies
Farmers in the Brazilian state of Mato Grosso (twice the size of California), whose land includes parts of the Amazon rainforest, face an unfortunate challenge to their livelihoods. In response to pressure from environmental conservationists, the Brazilian government has asked these farmers to keep a significant share of their land (over 20%) as intact forest. Deforestation has decreased considerably as a result, but as one farmer, Ilson Redivo, puts it, “Who is going to leave (most) of their land covered in trees? No one’s going to do that.” Farmers who comply lose $20-40 per acre per year in forfeited crops. To address this problem, players in the Brazilian agriculture ecosystem have toyed with a couple of mitigating strategies…
One solution is for farmers to keep clearing their own land and, to stay in compliance, buy forest hundreds of miles away that they leave covered in trees. One farmer using this strategy has set up cameras to watch over that distant land, looking out for jaguars and trespassers. Another strategy, being investigated by Sao Paulo-based private company Carbonex, is to combine satellite imagery with data on tree dimensions to map the carbon stored in the farmers’ forests, creating a carbon market in which farmers receive financial credits in return for maintaining forest. Both solutions are meant to ease the farmers’ burden.
It turns out computer vision could facilitate both approaches. To the extent that more farmers buy and monitor distant forested acreage, cameras paired with computer vision can save them hours of reviewing footage and cut monitoring costs. Furthermore, computer vision technology from Mobius Labs could speed up Carbonex’s data collection; we’ll discuss later how Mobius speeds up visual data processing.
In this week’s post, we’re going to distill some of the key emerging opportunities for entrepreneurs in the computer vision space. Here’s our agenda:
I. What exactly is computer vision?
II. Applications of computer vision
III. Case Study: Activeloop and Mobius Labs
I. What exactly is computer vision?
To understand computer vision, we first need to understand artificial intelligence. Artificial intelligence is a science, like biology or physics, focused on building machines and computer programs that can creatively solve problems usually reserved for humans. AI researchers study how computers can respond dynamically to changing environments, how they can process human-like language and behavior (for example, Google’s Babel project to translate human languages), and how they can simulate human creativity (for example, whether a robot can make pottery). The scope of AI is broad: it includes knowledge graphs and other forms of symbolic logic, which fall under AI but don’t qualify as machine learning.
Machine learning is a subfield of AI focused on teaching computers to learn without being explicitly programmed for specific tasks. As they are exposed to more data (training data), ML algorithms adjust their behavior (unlike knowledge graphs, which are static). Machine learning is at the root of many technologies we are familiar with, from route guidance at Uber to content discovery at Pinterest. Neural networks and deep learning, a focus for many computer scientists, constitute one class of learning models within machine learning.
Computer vision has been described both as a subfield of machine learning and as a subfield of AI with significant overlap with machine learning. We’ll opt for the latter to cover edge cases, but we won’t dig too deeply into that semantic argument. For our purposes, an entrepreneur selling computer vision capabilities assumes that her customer is interested in automating the following processes:
Image capture: acquiring images with an imaging device, ideally under optimal lighting
Image processing: object identification, segmentation, and sometimes billing (e.g. automated grocery checkout), often based on ML algorithms and training data
Decision-making: Categorization, recommendation, prediction
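To make the three stages concrete, here is a minimal Python sketch. It assumes the image has already been captured as a small grid of grayscale pixel values, and `toy_detector` is a hypothetical stand-in for a trained model; the names `Detection`, `process`, and `decide` are illustrative only, not any vendor’s API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set

Image = Sequence[Sequence[int]]  # stage 1 output: captured grayscale pixels (0-255)


@dataclass
class Detection:
    label: str         # what the model thinks it found
    confidence: float  # model score between 0 and 1


Detector = Callable[[Image], List[Detection]]


def toy_detector(image: Image) -> List[Detection]:
    """Hypothetical stand-in for a trained model: treats bright pixels as one object."""
    bright = sum(1 for row in image for px in row if px > 200)
    if not bright:
        return []
    return [Detection("bright_spot", min(1.0, bright / 4))]


def process(image: Image, detector: Detector, min_confidence: float = 0.5) -> List[Detection]:
    """Stage 2 (image processing): run the model and drop low-confidence detections."""
    return [d for d in detector(image) if d.confidence >= min_confidence]


def decide(detections: List[Detection], alert_labels: Set[str]) -> str:
    """Stage 3 (decision-making): map detections to an action with a simple rule."""
    hits = sorted({d.label for d in detections if d.label in alert_labels})
    return "alert: " + ", ".join(hits) if hits else "no action"


captured = [[10, 250], [240, 30]]  # stage 1 (image capture), faked for the example
print(decide(process(captured, toy_detector), {"bright_spot"}))  # alert: bright_spot
```

Passing the detector in as a function keeps the decision logic independent of whichever model a vendor supplies, which mirrors how CV products are often sold into different sectors.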
60% of the world’s data consists of images and videos - it’s time to analyze it. As one Forbes columnist and venture capitalist put it, computer vision is “the automation of human sight.” Startups in the computer vision space sell their wares into all sorts of sectors (agriculture, medicine, forestry, etc.), and we’ll look at these applications of computer vision more deeply in the next section.
II. Applications of computer vision
In the Forbes article mentioned above, VC Rob Toews describes computer vision startups across various segments of the economy (agriculture, retail, insurance, construction, and security). Here I’ll focus on why I’m bullish on drone-based applications in agriculture and mining, and why I’m less bullish on retail applications in the near term.
Agriculture and Mining
Drone-mounted cameras can help farmers determine which crops need more irrigation, fertilization, or pest control. A key to unlocking this value is high-resolution imagery, which depends directly on the quality of the image capture hardware. Take Ceres Imaging, a leader in this space. The benefit of its high-resolution images lies in pinpointing precise locations on large farms where irrigation needs improvement.
“There are other imagery providers available, but they aren’t as detailed or high resolution. With Ceres Imaging, we can direct employees to the correct row, even down to the tree, to address the issue we find in the imagery.”
— Craig and Sheridan Alm, Owners, Yatco Farms
How large an improvement are we talking about? In one customer testimonial from CMV Farms in Australia, Ceres Imaging was able to uncover 50-80% of the deficiencies in the pistachio and almond irrigation monitoring systems in use. In other cases, farmers can save 20 labor-hours and limit land damage by identifying the exact trees due for replacement. Prospera and Sentera are agtech startups that also help farmers maximize yields via computer vision.
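The source doesn’t say how Ceres Imaging or these other startups process their imagery. As an illustration only, one widely used way to spot crop stress in multispectral aerial imagery is the Normalized Difference Vegetation Index (NDVI), computed per pixel from near-infrared (NIR) and red reflectance. The sketch below assumes the two bands are already aligned grids of reflectance values; the 0.4 stress threshold is an illustrative assumption, not an agronomic standard.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one reflectance value per pixel, rows x columns


def ndvi(nir: Grid, red: Grid) -> Grid:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), which ranges from -1 to 1.

    Healthy vegetation reflects strongly in near-infrared and absorbs red light,
    so it scores high; stressed or sparse vegetation scores lower.
    """
    out: Grid = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total else 0.0)  # avoid dividing by zero
        out.append(row)
    return out


def flag_stressed(index: Grid, threshold: float = 0.4) -> List[Tuple[int, int]]:
    """Return (row, col) positions whose NDVI falls below the assumed threshold."""
    return [
        (i, j)
        for i, row in enumerate(index)
        for j, value in enumerate(row)
        if value < threshold
    ]


nir = [[0.8, 0.6], [0.5, 0.7]]
red = [[0.1, 0.3], [0.4, 0.1]]
print(flag_stressed(ndvi(nir, red)))  # [(0, 1), (1, 0)]
```

Mapping flagged pixels back to GPS coordinates is what would let a farm manager send someone to a specific row or tree, as in the Yatco Farms quote; that georeferencing step is omitted here.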
What makes agriculture a prime target for computer vision is that it is difficult to cover hundreds of hectares of farmland, even with regular inspections; computer vision eases that burden. Another environment where data is hard to capture is the dust-filled air of mineral extraction and mining. Hard rock mining involves breaking up rock with explosives, collecting it in haul trucks, and then crushing it into smaller pieces before chemical extraction. Sometimes the rock jams, and if the jam isn’t noticed soon enough, a layer of rock can build up over the bottleneck (this larger blockage is called a ‘bridge-over’). Across 3 mining sites, bridge-overs can cost $50m per year. By using a smart camera powerful enough to process images in real time, World Wide Technology (a larger technology and consulting firm) developed a mining solution that outperformed human operators at spotting jams.
For computer vision clients or entrepreneurs interested in a camera that can operate through dust and heat while supporting real-time processing: WWT went with the Adlink Neon203B-JT2-X smart camera. Its industrial Basler camera sensor, coupled with an NVIDIA Jetson TX2 computer, made it possible to run computer vision algorithms written in Python on the device. Most smart cameras on the market don’t allow for this kind of processing, nor do they have an internal cooling fan to cope with harsh, dry heat. For agriculture and mining clients in arid climates (as in some African and Asian countries), this camera model might be a good fit.
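The source doesn’t describe how WWT’s solution decides that a jam has occurred. As a hypothetical sketch, suppose a model on the camera outputs a per-frame “jam score” between 0 and 1. One simple way to act on it in real time is to alert only when the score stays high for several consecutive frames, which filters out one-off false positives (say, from a passing dust cloud). The function name, threshold, and frame count below are all assumptions.

```python
from typing import Iterable, Optional


def first_jam_alert(
    scores: Iterable[float],
    threshold: float = 0.8,
    consecutive: int = 3,
) -> Optional[int]:
    """Return the index of the frame that triggers an alert, or None if none does.

    `scores` can be a list or a live generator of per-frame jam scores, so the
    check runs frame by frame as images arrive rather than on stored footage.
    """
    streak = 0
    for frame_index, score in enumerate(scores):
        # Extend the run of high-scoring frames, or reset it on a low score.
        streak = streak + 1 if score >= threshold else 0
        if streak >= consecutive:
            return frame_index
    return None


# A brief spike at frame 1 is ignored; the sustained run ending at frame 5 fires.
print(first_jam_alert([0.1, 0.9, 0.2, 0.85, 0.9, 0.95, 0.3]))  # 5
```

Raising `consecutive` trades detection speed for fewer false alarms; at an assumed 30 frames per second, three frames is a tenth of a second.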
If image capture capabilities are limited, then the processing needs to be strong. We see this strength with Datarock, a startup that serves the mining industry with a computer vision-enabled SaaS platform. Datarock applies ML to drill core imagery, which often contains difficult-to-interpret shadows. In laypeople’s terms, mining and exploration companies extract samples from deep below the earth’s surface via drilling and place them on trays for geologists and engineers to painstakingly inspect. Imagine handing those trays to a CV-enabled camera instead. In Australia, tech consultancy Solve Geosolutions is helping tech organizations build cloud-based platforms to automate the analysis of drill core imagery.
What the mining industry shows us is that tech consulting firms like Solve and WWT add tremendous value, given their experience implementing computer vision solutions in the industry. Startups facing issues in this space can consult with players like Solve, and by marketing these capabilities more widely, a geo-focused firm like Solve could expand its customer base to industries beyond mining.
Retail and Grocery
Rob Toews is also bullish on retail and grocery applications of computer vision. The holy grail for retail and grocery is cashierless grab-and-go technology that enables a seamless checkout experience. Amazon led the way with Amazon Go, and CV-based startups have since followed: Standard Cognition raised $150m from SoftBank earlier this year and is valued at half a billion dollars. The startup has launched small stands and kiosks as a foot-in-the-door approach to attracting new customers. Other startups include Grabango and Trigo Vision. Like other CV startups, Trigo emphasizes its privacy-by-design architecture.
This points to the key obstacle for CV in consumer settings: how do we ensure privacy when the technology requires constant image capture? Is it enough for a company like Trigo to say that it is “privacy compliant”? The company says it uses encrypted software to enable grab-and-go purchasing. However, given that Standard AI (Standard Cognition), one of the biggest players in the space, is valued at half a billion and Trigo has raised $104m in total, I think it’s fair to say that crossing the privacy barrier will be difficult, especially in the U.S. That is, we don’t have unicorns in the space yet, and I contend it will take at least another year before we do.
Standard’s CEO Jordan Fisher recognizes the difficulties. Although he hopes CV tech can be implemented in grocery stores, gyms, and other consumer settings, he notes that his technology avoids facial and biometric detection. He hopes the technology does not evolve to the point where individual routines are identified (e.g. the same person is detected at the gym ten minutes after leaving the grocery store). Though I personally don’t mind if CV is implemented in grocery stores, I think many people would care that their bodies are being photographed. This would be an interesting survey to conduct, and I’d be curious to see the results. My thesis is that people would protest widespread adoption of computer vision in grocery stores as overstepping boundaries if they knew video footage of their torsos was being digitally logged (just imagine if a photo collection of many women’s torsos were hacked). If computer vision does take off in grocery and retail outlets, it will be important to store images on-prem (similar to standard surveillance footage) rather than leave them floating in the cloud, prone to cyberattack.
III. Case Study: Activeloop and Mobius Labs
In my search, two computer vision companies caught my eye as a useful case study: Activeloop and Mobius Labs.
Activeloop was born of Davit Buniatyan’s frustrations working as a computer scientist at Princeton. It can take a long time to download training data, such as images, to your computer before applying computer vision, audio processing, or NLP algorithms. Davit decided it would make sense to give computer scientists and engineers an online storage layer that stores unstructured data like videos and images and streams it on an as-needed basis for processing. The company is still in the early innings, with 15 employees, fresh off a $5m raise.
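The difference between downloading everything up front and streaming on an as-needed basis can be sketched with a Python generator. This is a generic illustration of lazy loading, not Activeloop’s API; `fetch` is a hypothetical function that reads one sample from remote storage, faked here with an in-memory dictionary.

```python
from typing import Callable, Dict, Iterator, List


def stream_batches(
    keys: List[str],
    fetch: Callable[[str], bytes],
    batch_size: int = 2,
) -> Iterator[List[bytes]]:
    """Yield batches of samples, fetching each sample only when a batch is requested."""
    batch: List[bytes] = []
    for key in keys:
        batch.append(fetch(key))  # the (simulated) remote read happens here, lazily
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit any leftover samples as a final, smaller batch
        yield batch


# Hypothetical remote store; in practice this would be cloud object storage.
remote_store: Dict[str, bytes] = {f"img_{i}.jpg": bytes([i]) for i in range(5)}
fetched: List[str] = []


def fake_fetch(key: str) -> bytes:
    fetched.append(key)  # record which samples have actually been transferred
    return remote_store[key]


batches = stream_batches(sorted(remote_store), fake_fetch)
first = next(batches)  # the training loop asks for its first batch
print(len(fetched))    # 2: only two of the five images have been fetched so far
```

A production system would typically also prefetch the next batch in the background so the model never waits on the network; that is omitted here for clarity.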
Activeloop is an interesting case because it serves companies that are already buying and using computer vision capabilities. As computer vision heats up in the B2B space, customers will lean on Activeloop much as B2B companies leaned on Stripe for payments processing or Mailchimp for email campaigns. Activeloop is a remora fish latching onto the shark that is computer vision tech more broadly.
For example, one Activeloop customer success story is Intelinair, a crop intelligence company that had 1500 terabytes of aerial imagery to analyze, imagery that is also multispectral and multi-sensor (thermal, topography, soil composition, etc.). In fact, 1500 terabytes equates to over 8% of all photos ever uploaded to Facebook, a huge amount of data. By building and scaling a data pipeline for Intelinair, Activeloop brought down its data storage costs by 30%.
Activeloop has a calculator that helps prospective customers run something akin to an ROI analysis. You input your overall data size, the number of data scientists on the team, the number of ML models in production, and your primary data type, and the calculator returns the total annual cost savings from switching to Activeloop. To get a sense of these savings, I used Intelinair as an example. I estimated its data size at 1500 TB. Based on an overall employee count of 45, roughly half of them in engineering and a significant share of those engineers being data scientists, I landed on 15 data scientists. I wasn’t sure about the number of ML models in production, so I ran the calculator for 5, 10, 15, and 20 models, selecting video and aerial as the primary data types for this crop intelligence company. Here are the results:
Right off the bat, you can see that compute costs scale linearly with the number of ML models (the only input we varied): each additional model adds $7,500 in avoidable costs that Activeloop can eliminate. It also makes sense that headcount is the largest portion of the costs, since Activeloop can greatly reduce the workload on a 15-person data science team. And by storing data in the cloud, Activeloop brings down storage costs.
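The linearity observation is easy to check with a few lines of Python. The sketch below assumes the calculator’s output for a given model count is a fixed amount plus $7,500 per model; the fixed amount in the usage example is a made-up placeholder, not a figure from the calculator.

```python
from typing import Dict

PER_MODEL_COST = 7_500  # USD per additional ML model, per the calculator runs above


def annual_cost(num_models: int, fixed_cost: float) -> float:
    """Linear cost model: a fixed component plus a constant cost per ML model."""
    return fixed_cost + PER_MODEL_COST * num_models


def per_model_slope(results: Dict[int, float]) -> float:
    """Infer the cost of each extra model from {num_models: cost} pairs.

    Raises ValueError if there are too few runs, or if consecutive pairs imply
    different slopes (i.e. costs are not actually linear in model count).
    """
    points = sorted(results.items())
    if len(points) < 2:
        raise ValueError("need at least two calculator runs")
    slopes = {
        round((c2 - c1) / (m2 - m1), 2)  # round to cents to tolerate float noise
        for (m1, c1), (m2, c2) in zip(points, points[1:])
    }
    if len(slopes) != 1:
        raise ValueError(f"costs are not linear in model count: {sorted(slopes)}")
    return slopes.pop()


HYPOTHETICAL_FIXED = 100_000  # placeholder only; not taken from the calculator
runs = {n: annual_cost(n, HYPOTHETICAL_FIXED) for n in (5, 10, 15, 20)}
print(per_model_slope(runs))  # 7500.0
print(runs[20] - runs[5])     # 112500
```

Whatever the fixed component turns out to be, moving from 5 to 20 models adds 15 × $7,500 = $112,500 to the annual cost under this model.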
Activeloop would pair well with a company that addresses another common computer vision problem: how do we package image processing capabilities into a small, tractable image capture device? Berlin-based Mobius Labs has done a solid job of solving this problem, essentially packaging these capabilities into a lightweight software development kit (SDK) that can be deployed on satellites and sensors. A hallmark of Mobius Labs is that the SDK runs on-prem, so the privacy concern we discussed earlier, customer data being sent back to Mobius, doesn’t arise. The SDK also lets users train it on their own data rather than handing them a coarse, off-the-shelf tool for the job.
Mobius now has over 30 customers and is looking to double its footprint in the next 12 months, fresh off $6m in funding led by Ventech. Mobius is focused on geospatial and aerial imagery at the moment, which, based on my search, seems to be the darling of computer vision these days. However, the company has also had success with more down-to-earth pursuits, such as serving EyeEm, a Berlin-based photography marketplace.
To be specific about Mobius’ value prop: Mobius was able to tag EyeEm photos with difficult concepts (e.g. “love”) and to tag moments in videos that were good for commercial placement. EyeEm saved tangible time on photo tagging, which translated into a better experience for its customers, photographers, and brand managers looking for images.
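Mobius’ actual training method isn’t described here. As an illustrative assumption, one simple way a system can learn a custom concept from a user’s own examples, like the EyeEm tags above, is nearest-centroid classification: average the feature vectors (embeddings) of each concept’s example images, then tag a new image with the closest average. The two-dimensional embeddings below are made up; a real system would get much longer vectors from a pretrained image model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a concept's example embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_concepts(examples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """'Training' here is just computing one centroid per user-defined concept."""
    return {label: centroid(vecs) for label, vecs in examples.items()}


def tag(embedding: Vector, concepts: Dict[str, List[float]]) -> str:
    """Return the concept whose centroid is most similar to the embedding."""
    return max(concepts, key=lambda label: cosine(embedding, concepts[label]))


# Hypothetical embeddings for a user's own labeled photos.
concepts = train_concepts({
    "love": [[0.9, 0.1], [0.8, 0.3]],
    "landscape": [[0.1, 0.9], [0.2, 0.8]],
})
print(tag([0.7, 0.2], concepts))  # love
```

Adding a new concept only requires a handful of examples and no retraining of the underlying image model, which is one reason this style of approach suits customer-specific tags.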
How exactly is Mobius looking to grow beyond analyzing more geospatial data? Their hiring page (as of 11/8/21) shows that they’re hiring an enterprise AE in the UK to handle media/video, along with a few BD reps in Berlin. I think it will be a while before Mobius expands into the U.S., since catering to marketing and creative agencies in Europe is still high on their priority list. Still, I’m curious whether they can enter the U.S. via an Activeloop acquisition, and whether they can start experimenting with new industries such as medicine, where laparoscopic surgery requires heavy visual interpretation in real time. The ability to cut tagging time by fractions of a second is exactly what’s needed during surgery. In the meantime, startups like Activ Surgical are at the forefront of leveraging computer vision for laparoscopic (camera-assisted) surgery.
Conclusions
Let’s recap. Computer vision is a subfield of AI that applies machine learning algorithms to troves of video and image data for object identification, segmentation, and interpretation. We’re seeing image data captured aerially and geospatially, mainly for agricultural purposes. This is where computer vision startups like Ceres Imaging and Sentera are having a field day (see the agriculture section above for exactly how). The broader point is that computer vision has a role to play in solving larger agricultural issues as well, such as the forest-conservation challenge facing farmers in Mato Grosso, Brazil.
It’ll be tougher for computer vision to carve out its niche in the retail space, where pictures of people’s torsos cannot be floating around on the internet. It’s doubly important there that CV startups offer on-prem solutions, potentially by partnering with Mobius Labs, a computer vision SDK provider that largely sidesteps the typical privacy concerns. Mobius Labs and Activeloop (which streams images for computer vision analysis and obviates local data downloads) are promising young startups, fresh off capital infusions, that go one step beyond traditional computer vision startups by making the computer vision process itself more efficient. Companies like these embody the next wave of CV startups.
I’m excited for what’s to come. With over one quintillion bytes of image/video data generated daily, it is high time we synthesize insights while making our lives easier. A picture’s worth a thousand bytes.