
MMEarth-Bench:
Global Model Adaptation via
Multimodal Test-Time Training

1 Harvard University 2 University of Copenhagen

Overview

[Overview figure]
Recent research in geospatial machine learning has demonstrated that models pretrained with self-supervised learning on Earth observation data can perform well on downstream tasks with limited training data. However, most of the existing geospatial benchmark datasets have few data modalities and poor global representation, limiting the ability to evaluate multimodal pretrained models at global scales. To fill this gap, we introduce MMEarth-Bench, a collection of five new multimodal downstream tasks with 12 input modalities, globally distributed data, and both in- and out-of-distribution test splits. We benchmark a diverse set of pretrained models on MMEarth-Bench and find that multimodal models generally perform best. While pretraining tends to improve model robustness in limited data settings, geographic generalization abilities remain poor and using multimodal inputs at test time can sometimes lead to geographic overfitting. In order to facilitate model adaptation to new downstream tasks and geographic domains, we propose a model-agnostic method for test-time training with multimodal reconstruction (TTT-MMR) that uses all the modalities available at test time, regardless of whether the pretrained model accepts them as input. We show that TTT-MMR improves model performance on both random and geographic test splits, and that geographic batching leads to a good trade-off between regularization and specialization during TTT.

Motivation

Self-supervised multimodal pretraining promises to overcome grand challenges in Earth observation. Crucial applications such as monitoring the UN Sustainable Development Goals (SDGs) must rely on training data that is limited and sparse (e.g., field measurements) and geographically biased (i.e., labels are missing for large parts of the world). Furthermore, models conditioned on multiple modalities may resolve ambiguities inherent to modeling biophysical quantities from remotely sensed data.

[Motivation figure: limited data, domain shifts, and ambiguities; multimodal input (e.g., precipitation, geolocation, satellite imagery) feeding a vector encoder]

Explorer

Explore the MMEarth-Bench dataset interactively. Zoom in to view pixel-level data, hover over tiles to see tile-level data, and filter by task, split, and species.

Species

The species task in MMEarth-Bench contains 100 terrestrial mammals, shown below in order of their prevalence in the dataset.

[Chart: tiles per species]

Benchmarking

We benchmark 7 pretrained models on MMEarth-Bench:
- 3 RGB models: Scale-MAE, DINOv3 Web, DINOv3 Sat
- 2 Sentinel-2 models: SatlasNet, MPMAE
- 2 multimodal models: TerraMind, Copernicus-FM
We rank these models after finetuning on all training data.

Split      | Rank | All tasks     | Biomass       | Soil N        | Soil OC       | Soil pH       | Species
-----------|------|---------------|---------------|---------------|---------------|---------------|--------------
Random     | 1    | Copernicus-FM | MPMAE         | TerraMind     | Copernicus-FM | TerraMind     | Copernicus-FM
           | 2    | TerraMind     | Copernicus-FM | Copernicus-FM | MPMAE         | Copernicus-FM | TerraMind
           | 3    | MPMAE         | TerraMind     | MPMAE         | TerraMind     | MPMAE         | MPMAE
           | 4    | DINOv3 Sat    | DINOv3 Sat    | DINOv3 Sat    | SatlasNet     | DINOv3 Web    | SatlasNet
           | 5    | SatlasNet     | SatlasNet     | DINOv3 Web    | DINOv3 Web    | DINOv3 Sat    | DINOv3 Sat
           | 6    | DINOv3 Web    | DINOv3 Web    | SatlasNet     | Scale-MAE     | SatlasNet     | DINOv3 Web
           | 7    | Scale-MAE     | Scale-MAE     | Scale-MAE     | DINOv3 Sat    | Scale-MAE     | Scale-MAE
Geographic | 1    | Copernicus-FM | MPMAE         | DINOv3 Sat    | Copernicus-FM | TerraMind     | Copernicus-FM
           | 2    | MPMAE         | TerraMind     | Copernicus-FM | MPMAE         | SatlasNet     | TerraMind
           | 3    | TerraMind     | Copernicus-FM | MPMAE         | SatlasNet     | MPMAE         | MPMAE
           | 4    | SatlasNet     | DINOv3 Sat    | TerraMind     | TerraMind     | DINOv3 Web    | DINOv3 Sat
           | 5    | DINOv3 Sat    | SatlasNet     | DINOv3 Web    | DINOv3 Web    | DINOv3 Sat    | DINOv3 Web
           | 6    | DINOv3 Web    | DINOv3 Web    | SatlasNet     | Scale-MAE     | Copernicus-FM | SatlasNet
           | 7    | Scale-MAE     | Scale-MAE     | Scale-MAE     | DINOv3 Sat    | Scale-MAE     | Scale-MAE

Method

The geospatial machine learning community has embraced multimodal data for self-supervised pretraining of geospatial foundation models. Leveraging multimodal data at inference time as well provides the model with more context when making a prediction. A pretrained model can use multimodal data at inference time by taking it as input, but this typically requires a large multimodal encoder trained, or trainable, on large amounts of data. Inspired by both the multi-pretext pretraining paradigm employed by MMEarth and the test-time adaptation framework, we propose using multiple modalities as auxiliary tasks at test time. In particular, we use multiple modalities as reconstruction targets that provide a test-time adaptation signal for the encoder. The reconstruction tasks can be solved with lightweight linear decoders and do not require the encoder to accept the modalities as input. The animation below illustrates our method for test-time training with multimodal reconstruction (TTT-MMR).
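To make the loop concrete, the following is a minimal PyTorch sketch of TTT-MMR. It is a sketch under assumptions, not our exact configuration: the function signature, the MSE reconstruction loss, the AdamW optimizer, the step count, and the per-batch decoder initialization are all illustrative, and each auxiliary modality is treated as a tile-level target vector batch[m].

import copy
import torch
import torch.nn.functional as F

def ttt_mmr(encoder, task_head, batch, target_modalities, feat_dim, steps=10, lr=1e-4):
    # Adapt a copy of the encoder so every test batch starts from the
    # same pretrained weights (the pretrained model itself is untouched).
    enc = copy.deepcopy(encoder).train()

    # One lightweight linear decoder per auxiliary modality; the encoder
    # never has to accept these modalities as input.
    decoders = torch.nn.ModuleDict({
        m: torch.nn.Linear(feat_dim, batch[m].shape[-1])
        for m in target_modalities
    })
    opt = torch.optim.AdamW(
        list(enc.parameters()) + list(decoders.parameters()), lr=lr
    )

    # Test-time training: reconstruct the auxiliary modalities from
    # encoder features of the input the model does accept (e.g. Sentinel-2).
    for _ in range(steps):
        feats = enc(batch["inputs"])  # (B, feat_dim)
        loss = sum(F.mse_loss(decoders[m](feats), batch[m])
                   for m in target_modalities)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Predict with the adapted encoder and the frozen task head.
    enc.eval()
    with torch.no_grad():
        return task_head(enc(batch["inputs"]))

Because the adaptation signal comes only from reconstruction losses on encoder features, the same loop applies to any pretrained encoder, which is what makes TTT-MMR model-agnostic.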

Applying TTT to batches of test tiles acts as a regularizer since it results in less noisy gradients. However, this also allows for less specialization to any particular tile, which could limit its benefits. To balance regularization and specialization, rather than forming batches randomly as in TTT-MMR, we also propose geographic batching, in which test tiles are grouped into non-overlapping batches that are contiguous geographic regions. Our TTT-MMR-Geo method batches the test tiles based on geographic proximity as a proxy for tile similarity using recursive spatial partitioning.

[Animation: TTT-MMR-Geo recursive spatial partitioning]
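The sketch below shows one concrete instance of recursive spatial partitioning, assuming a k-d-tree-style rule that sorts tiles along one axis, splits at the median, and alternates between longitude and latitude; the split rule and stopping criterion here are illustrative assumptions.

import numpy as np

def geo_batches(coords, batch_size, axis=0):
    # coords: (N, 2) array of (lon, lat) per test tile.
    # Returns a list of index arrays, each covering a contiguous
    # geographic region with at most batch_size tiles.
    def split(indices, axis):
        if len(indices) <= batch_size:
            return [indices]
        # Sort along the current axis and cut at the median,
        # alternating between longitude (0) and latitude (1).
        order = indices[np.argsort(coords[indices, axis])]
        mid = len(order) // 2
        return split(order[:mid], 1 - axis) + split(order[mid:], 1 - axis)

    return split(np.arange(len(coords)), axis)

# Example: 1000 random tile locations grouped into batches of <= 64 tiles.
rng = np.random.default_rng(0)
coords = rng.uniform([-180.0, -60.0], [180.0, 75.0], size=(1000, 2))
batches = geo_batches(coords, batch_size=64)
print(len(batches), [len(b) for b in batches[:4]])

Each leaf of the recursion is a geographically coherent batch, so the batches keep the regularization benefit of batched TTT while specializing to one region at a time.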

Download

Run the following commands in a bash shell to download the MMEarth-Bench data.

mkdir -p mmearth-bench-data/{biomass,soil_nitrogen,soil_organic_carbon,soil_pH,species}

for task in biomass soil_nitrogen soil_organic_carbon soil_pH species; do
  wget -c -P "mmearth-bench-data/$task" \
    "https://sid.erda.dk/share_redirect/cbMhbwV1yP/mmearth-bench-data/$task/$task.h5" \
    "https://sid.erda.dk/share_redirect/cbMhbwV1yP/mmearth-bench-data/$task/${task}_split_data.json"
done

wget -c -P mmearth-bench-data/species \
  "https://sid.erda.dk/share_redirect/cbMhbwV1yP/mmearth-bench-data/species/species_labels.json"

wget -c -P mmearth-bench-data \
  "https://sid.erda.dk/share_redirect/cbMhbwV1yP/mmearth-bench-data/no_data_values.json"

This will take about 2 hours and yield the following folder structure, occupying 59 GB:

mmearth-bench-data/
β”œβ”€β”€ biomass/
β”‚   β”œβ”€β”€ biomass_split_data.json
β”‚   └── biomass.h5
β”œβ”€β”€ soil_nitrogen/
β”‚   β”œβ”€β”€ soil_nitrogen_split_data.json
β”‚   └── soil_nitrogen.h5
β”œβ”€β”€ soil_organic_carbon/
β”‚   β”œβ”€β”€ soil_organic_carbon_split_data.json
β”‚   └── soil_organic_carbon.h5
β”œβ”€β”€ soil_pH/
β”‚   β”œβ”€β”€ soil_pH_split_data.json
β”‚   └── soil_pH.h5
β”œβ”€β”€ species/
β”‚   β”œβ”€β”€ species_labels.json
β”‚   β”œβ”€β”€ species_split_data.json
β”‚   └── species.h5
└── no_data_values.json
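Once downloaded, the files can be inspected with standard tools. Below is a minimal Python sketch using h5py; the dataset names inside each .h5 file are not documented above, so the sketch simply enumerates them rather than assuming a layout.

import json
import h5py

# Print every group and dataset path inside one task file.
with h5py.File("mmearth-bench-data/biomass/biomass.h5", "r") as f:
    f.visit(print)

# The split file defines which tiles belong to which split.
with open("mmearth-bench-data/biomass/biomass_split_data.json") as f:
    splits = json.load(f)
print(type(splits))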

BibTeX

@article{gordon2026mmearth-bench,
  title={MMEarth-Bench: Global Model Adaptation via Multimodal Test-Time Training},
  author={First Author and Second Author and Third Author},
  journal={Conference/Journal Name},
  year={2026},
  url={https://your-domain.com/your-project-page}
}