It’s easy to forget that even with the fanciest of machine learning models, we still need humans in the trenches cleaning input data. Descartes Labs, a startup that combines satellite imagery with data about our planet to produce insights and forecasts, knows this all too well. The company ended up building its own cloud-based parallel computing infrastructure to clean and process its massive corpus of satellite imagery. Today it’s giving a handful of developers and early customers access to this system.
Companies like Descartes Labs cannot just throw raw satellite imagery into machine learning models to extract insights. Captured images contain clouds, cloud shadows and other atmospheric aberrations that make it impossible to compare images taken at different times. For example, a small cloud over a field that wasn’t present in previous images could completely throw off a model attempting to predict crop yields.
To overcome this challenge, engineers can use composite images to optimize for the best pixels across a collection of images. Google Maps employs composite imagery to remove clouds and create representations of the globe that are evenly lit by the sun.
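As a rough illustration of what "best pixels" can mean, the sketch below builds a composite by taking, for each pixel, the median of the cloud-free observations across several captures. The median rule, the function name `composite` and the toy data are assumptions made for illustration; the article does not say which selection criterion Descartes Labs or Google actually uses.

```python
from statistics import median

def composite(images, cloud_masks):
    """Per-pixel median over the cloud-free observations.

    images: list of 2D lists of reflectance values, all the same shape.
    cloud_masks: matching 2D lists of booleans, True where a pixel is cloudy.
    Returns a 2D list; a pixel is None if every capture was cloudy there.
    """
    rows, cols = len(images[0]), len(images[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Keep only the observations of this pixel that are not cloudy.
            clear = [img[r][c] for img, mask in zip(images, cloud_masks)
                     if not mask[r][c]]
            row.append(median(clear) if clear else None)
        result.append(row)
    return result

# Three tiny 1x3 "captures" of the same area; 0.90 stands in for a bright cloud.
images = [[[0.20, 0.30, 0.90]],
          [[0.22, 0.90, 0.41]],
          [[0.90, 0.32, 0.39]]]
masks = [[[False, False, True]],
         [[False, True, False]],
         [[True, False, False]]]

result = composite(images, masks)
print(result)
```

Each output pixel draws only on captures where that spot was clear, so no single cloudy image contaminates the result. The cost is that every pixel needs several observations, which is why a global composite requires so many captures.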
The problem with combining dozens of satellite captures of the entire earth is that it’s incredibly computationally intensive. This is where Descartes Labs’s processing engine comes into play: it converts the company’s petabytes of geospatial data into composites.
Raw pixels collected, which include clouds, shadows and haze
Same image atmospherically corrected with clouds and shadows removed
Composite of best pixels from 24 processed images
In the images above, you can contrast the low quality of an initial capture with the high quality of a composite, which is far better suited for feeding into deep learning models. To get from start to finish, Descartes used mathematical techniques to normalize the light in the first image to a common base level before masking out clouds. After finishing that basic cleaning, the team still had to build the composite itself. To do that at scale, Descartes turned the problem into a massive parallel compute job.
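The cleaning steps described here can be sketched in a few lines, under heavy simplifying assumptions. In the example below, "base level" correction is approximated by subtracting each tile's darkest value, cloud masking is a plain brightness threshold, and a thread pool stands in for a compute cluster. The names `clean_tile`, `clean_scene` and `CLOUD_THRESHOLD` are hypothetical and do not come from Descartes Labs.

```python
from concurrent.futures import ThreadPoolExecutor

CLOUD_THRESHOLD = 0.6  # hypothetical brightness cutoff for "cloud"

def clean_tile(tile):
    """Shift a tile to a common base level, then mask bright pixels.

    tile: 2D list of brightness values.
    Returns a 2D list with None where a pixel was flagged as cloud.
    """
    # Crude base-level correction: subtract the darkest value in the tile.
    darkest = min(min(row) for row in tile)
    corrected = [[v - darkest for v in row] for row in tile]
    # Crude cloud mask: anything brighter than the cutoff is dropped.
    return [[None if v > CLOUD_THRESHOLD else v for v in row]
            for row in corrected]

def clean_scene(tiles, workers=4):
    """Clean every tile independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(clean_tile, tiles))

# Two toy 2x2 tiles; 0.95 plays the role of a cloud.
tiles = [
    [[0.10, 0.30], [0.95, 0.20]],
    [[0.05, 0.15], [0.25, 0.35]],
]
cleaned = clean_scene(tiles)
print(cleaned)
```

The point of the sketch is that tiles do not depend on one another, so the work splits cleanly across workers. Threads keep the example short; a real job of this size would more plausibly use separate processes or many machines.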
The result? The first-ever composites created using images captured by the Sentinel-1 and Sentinel-2A satellites. Descartes grouped these composites with a third made from images captured by Landsat 8 to create an online tool for exploring traditional imagery alongside unique Synthetic Aperture Radar (SAR) and red-edge imagery. Descartes uses a variety of satellite bands to monitor vegetation and other changes happening on the earth’s surface.
One major shortcoming of composite images is that they’re hard to compare with each other over short time intervals. Because multiple captures have to be combined, composites can really only be used for comparison across months or quarters. This still has value, particularly given the financial world’s reliance on the quarter system, but it makes composites hard to use for week-to-week agricultural predictions.
Descartes is limiting the group with access to the underlying infrastructure for the time being. But anyone can spend time this afternoon examining the Sentinel and Landsat composites the team created.