Is more bands always better?
Unpacking the black box with Eoin Hickey
Around the time I joined the Fieldy ML team, there were some big questions about pipeline optimisation that needed answering. One such question pertained to the data generated by the Sentinel-2 satellites during their Earth observations. When it came to classifying specific crops from space, which of Sentinel-2's 13 wavelength bands were doing the heavy lifting in our classifier models, and which were just slowing them down?
Brute-force testing any significant fraction of the 8190 possible combinations of 13 bands was not an option, so some creative thinking was required. The geospatial dataset I used (sourced via Radiant Earth) split a region in Benin into classes such as cashew plantations, cropland, residential areas, etc. For each class, I plotted frequency distributions of the possible pixel values per band, giving a kind of “spectrum” with peaks that could be attributed to that particular band and class.
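As a rough sketch of what that per-band analysis looked like, the snippet below plots per-class frequency distributions for a single band. The class names, band list, and the `pixels` structure are illustrative stand-ins, not the exact pipeline code, and the demo data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical input: `pixels` maps each class label to an array of shape
# (n_pixels, n_bands) of reflectance values sampled from that class's polygons.
# Band names follow the usual Sentinel-2 convention.
BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B8A", "B09", "B11", "B12"]

def plot_band_spectra(pixels: dict[str, np.ndarray], band_idx: int) -> None:
    """Plot a per-class frequency distribution ("spectrum") for one band."""
    fig, ax = plt.subplots()
    for class_name, values in pixels.items():
        ax.hist(values[:, band_idx], bins=100, density=True,
                histtype="step", label=class_name)
    ax.set_xlabel(f"{BANDS[band_idx]} reflectance")
    ax.set_ylabel("Normalised frequency")
    ax.legend()
    fig.savefig(f"spectrum_{BANDS[band_idx]}.png")

# Example usage with synthetic data standing in for the Benin chips:
rng = np.random.default_rng(0)
demo = {
    "cashew": rng.normal(0.25, 0.03, size=(5000, len(BANDS))),
    "cropland": rng.normal(0.30, 0.05, size=(5000, len(BANDS))),
    "residential": rng.normal(0.40, 0.08, size=(5000, len(BANDS))),
}
plot_band_spectra(demo, band_idx=BANDS.index("B08"))
```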
Our analysis of these spectra revealed a few key bands that showed more inter-class variability than the rest. So, I trained a few dozen models on the cashew data using different combinations of these key bands, with some random band combinations used as controls. The randomised controls performed significantly worse than the baseline (a model trained on 12 bands), while the models trained on the key bands performed almost as well as the baseline, using only a quarter of the training data! The lighter models also struggled far less with overfitting, showing much cleaner learning curves.
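To give a flavour of the band-ranking and subset comparison, here is a simplified stand-in: it scores bands by the spread of per-class mean reflectance and compares a key-band subset against a random control using a small scikit-learn classifier. The actual models and variability analysis were more involved, and `X` / `y` are assumed here to be flattened pixel features and class labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def interclass_variability(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Score each band by the variance of the per-class mean reflectance,
    a crude proxy for how well that band separates the classes."""
    class_means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    return class_means.var(axis=0)

def score_band_subset(X: np.ndarray, y: np.ndarray, bands: list[int]) -> float:
    """Cross-validated accuracy of a small classifier on a band subset."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X[:, bands], y, cv=5).mean()

# e.g. rank bands, then compare a key-band subset against a random control:
# scores = interclass_variability(X, y)
# key_bands = list(np.argsort(scores)[-4:])
# rng = np.random.default_rng(0)
# control = list(rng.choice(X.shape[1], size=4, replace=False))
# print(score_band_subset(X, y, key_bands), score_band_subset(X, y, control))
```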
Ultimately, we concluded that keeping all 12 bands was best, as it slightly outperformed even the best alternatives within my small experiment. The architecture of the model I used meant that training time was independent of the number of bands, but an argument could be made for using fewer bands to save time on data handling and processing in earlier steps of the pipeline. It appears more data is always better, except when time is of the essence!