Remote Sensing 2023

Sentinel Imagery Selection Tool

A selection tool that automatically filters out 83% of Sentinel-2 tiles as cloud-contaminated before human review, cutting ML training dataset curation from 3 weeks to 4 days per project.

Client

Digital Agriculture Services


Key Results

3wk→4d

Curation Time

83%

Auto-Filtered Tiles

8

Concurrent Reviewers

17K+

Tiles Processed/Project

The Challenge

Each ML training project required reviewing 17,000+ Sentinel-2 tiles to find cloud-free captures. A single analyst took 3 weeks per project, and inconsistent quality judgments between reviewers were degrading downstream model accuracy.

Key challenges included:

17,000+ tiles per project requiring individual visual inspection
Reviewers disagreeing on cloud contamination 22% of the time
3-week curation cycle delaying ML model training schedules
No assignment tracking, causing duplicate reviews and missed tiles
Selected images had no metadata trail for reproducibility audits
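The 22% disagreement figure depends on how agreement is counted. A common approach, assumed here rather than taken from the project, compares every pair of reviewers on the tiles they both labelled. The function name and the `{reviewer: {tile_id: is_cloudy}}` input shape are hypothetical.

```python
from itertools import combinations


def disagreement_rate(labels_by_reviewer: dict[str, dict[str, bool]]) -> float:
    """Fraction of pairwise comparisons in which two reviewers disagree.

    labels_by_reviewer maps reviewer -> {tile_id: is_cloudy}. For each pair of
    reviewers, only tiles that both of them labelled are compared.
    """
    compared = 0
    disagreed = 0
    for a, b in combinations(sorted(labels_by_reviewer), 2):
        labels_a, labels_b = labels_by_reviewer[a], labels_by_reviewer[b]
        for tile_id in labels_a.keys() & labels_b.keys():
            compared += 1
            if labels_a[tile_id] != labels_b[tile_id]:
                disagreed += 1
    # No overlapping tiles means there is nothing to disagree about.
    return disagreed / compared if compared else 0.0


labels = {
    "reviewer_a": {"T1": True, "T2": False, "T3": False},
    "reviewer_b": {"T1": True, "T2": True, "T3": False},
}
rate = disagreement_rate(labels)  # 1 disagreement (T2) out of 3 shared tiles
```

Measuring disagreement this way makes it trackable over time, which is what a figure like "reduced by 34%" needs in order to be reproducible.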

ML Data Quality

ML models for crop classification are only as good as their training imagery. A single cloud-contaminated tile in the training set can bias predictions across an entire region, making rigorous image selection a prerequisite for accurate agricultural insights.

Training Data Quality

Critical for model accuracy

Cloud Contamination

Major source of bad training data

Our Solution

Automated cloud detection removes 83% of tiles as unusable before a human ever sees them. Teams of up to 8 reviewers work in parallel with assignment tracking, reducing curation time from 3 weeks to 4 days and cutting inter-reviewer disagreement by 34%.

Automated Cloud Detection

Cloud probability masks pre-filter 83% of tiles as unusable, cutting the human review queue from 17K to ~2,900 images.
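Sentinel-2 Level-2A products include a per-pixel cloud probability mask with values from 0 to 100. A pre-filter along these lines might reject any tile whose share of cloudy pixels exceeds a limit. This is a minimal sketch: the masks are plain nested lists rather than rasters read from disk, and both thresholds (40% probability, 10% cloudy pixels) are hypothetical, not the project's settings.

```python
def cloudy_fraction(cloud_prob: list[list[int]], prob_threshold: int = 40) -> float:
    """Share of pixels whose cloud probability (0-100) meets the threshold."""
    total = sum(len(row) for row in cloud_prob)
    if total == 0:
        return 1.0  # an empty mask carries no evidence, so treat the tile as unusable
    cloudy = sum(1 for row in cloud_prob for p in row if p >= prob_threshold)
    return cloudy / total


def prefilter(
    masks: dict[str, list[list[int]]],
    prob_threshold: int = 40,
    max_cloudy_fraction: float = 0.1,
) -> tuple[list[str], list[str]]:
    """Split tile IDs into (human_review_queue, auto_rejected)."""
    queue: list[str] = []
    rejected: list[str] = []
    for tile_id, mask in masks.items():
        if cloudy_fraction(mask, prob_threshold) > max_cloudy_fraction:
            rejected.append(tile_id)
        else:
            queue.append(tile_id)
    return queue, rejected


masks = {"clear": [[0, 5], [10, 0]], "cloudy": [[90, 80], [70, 5]]}
queue, rejected = prefilter(masks)  # (["clear"], ["cloudy"])
```

Only tiles in `queue` reach reviewers, which is how a 17K-tile project shrinks to a few thousand images for human review.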

Visual Inspection Interface

Keyboard-driven review UI averaging 4.2 seconds per tile decision, 3x faster than the previous GIS-based workflow.
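A per-decision timing such as 4.2 seconds can be collected by timestamping when a tile is shown and when a bound key is pressed. The sketch below assumes hypothetical bindings (`a` accept, `r` reject, `s` skip) and leaves skips out of the average because they defer a decision rather than make one. None of these names come from the actual tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical key bindings for illustration only.
KEY_BINDINGS = {"a": "accept", "r": "reject", "s": "skip"}


@dataclass
class Decision:
    tile_id: str
    verdict: str
    seconds: float  # time from tile display to keypress


def record_keypress(
    tile_id: str, key: str, shown_at: float, pressed_at: float
) -> Optional[Decision]:
    """Map a keypress to a decision; unbound keys are ignored (None)."""
    verdict = KEY_BINDINGS.get(key.lower())
    if verdict is None:
        return None
    return Decision(tile_id, verdict, pressed_at - shown_at)


def mean_seconds_per_decision(decisions: list[Decision]) -> float:
    """Average decision time, excluding skips (they defer rather than decide)."""
    timed = [d.seconds for d in decisions if d.verdict != "skip"]
    return sum(timed) / len(timed) if timed else 0.0
```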

Sentinel API Integration

Direct Copernicus API access, pulling tiles for any date range and region with automatic retries when rate limits are hit.
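Retrying on rate limits is usually done with exponential backoff plus random jitter. This sketch assumes the download call raises a hypothetical `RateLimited` exception when the API answers with HTTP 429. It does not use any Copernicus client library, and the delay parameters are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a fetch function raises when the API answers HTTP 429."""


def with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), backing off exponentially (with jitter) on rate limits."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel downloads from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.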

Workflow Management

Assignment tracking for up to 8 concurrent reviewers with progress dashboards and conflict detection.
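Duplicate reviews and missed tiles go away once each tile has exactly one owner and every verdict is checked against that owner. The in-memory class below is a hypothetical sketch of that idea. A real multi-user deployment would keep this state in a shared database with transactions, and would also need to reclaim tiles from reviewers who abandon a batch.

```python
class AssignmentTracker:
    """In-memory sketch of tile assignment with duplicate/conflict detection."""

    def __init__(self, tile_ids: list[str], max_reviewers: int = 8) -> None:
        self.unassigned = list(tile_ids)
        self.owner: dict[str, str] = {}  # tile_id -> reviewer it was handed to
        self.verdicts: dict[str, str] = {}  # tile_id -> recorded verdict
        self.conflicts: list[tuple[str, str, str]] = []  # (tile_id, reviewer, verdict)
        self.max_reviewers = max_reviewers
        self.reviewers: set[str] = set()

    def claim(self, reviewer: str, batch_size: int = 50) -> list[str]:
        """Hand the next unassigned tiles to a reviewer; no tile is handed out twice."""
        if reviewer not in self.reviewers and len(self.reviewers) >= self.max_reviewers:
            raise RuntimeError(f"at most {self.max_reviewers} concurrent reviewers")
        self.reviewers.add(reviewer)
        batch = self.unassigned[:batch_size]
        del self.unassigned[:batch_size]
        for tile_id in batch:
            self.owner[tile_id] = reviewer
        return batch

    def submit(self, reviewer: str, tile_id: str, verdict: str) -> bool:
        """Record a verdict; log a conflict if the tile isn't theirs or is already decided."""
        if self.owner.get(tile_id) != reviewer or tile_id in self.verdicts:
            self.conflicts.append((tile_id, reviewer, verdict))
            return False
        self.verdicts[tile_id] = verdict
        return True

    def progress(self) -> dict[str, tuple[int, int]]:
        """Per-reviewer (decided, assigned) counts for a progress dashboard."""
        counts: dict[str, list[int]] = {}
        for tile_id, reviewer in self.owner.items():
            done_assigned = counts.setdefault(reviewer, [0, 0])
            done_assigned[1] += 1
            if tile_id in self.verdicts:
                done_assigned[0] += 1
        return {r: (c[0], c[1]) for r, c in counts.items()}
```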

D3.js Quality Dashboard

Interactive temporal heatmaps showing cloud coverage by region, helping teams target the cleanest acquisition windows.
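A temporal heatmap needs one cell per region and time bucket. One plausible back-end step, assumed here, averages tile cloud cover per region and calendar month and emits flat records that could be serialised with `json.dumps` for the D3 front end. The field names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def heatmap_cells(records: list[tuple[str, date, float]]) -> list[dict]:
    """Average cloud cover per (region, month) from (region, acquired, cloud_pct) rows."""
    totals: dict[tuple[str, int, int], float] = defaultdict(float)
    counts: dict[tuple[str, int, int], int] = defaultdict(int)
    for region, acquired, cloud_pct in records:
        key = (region, acquired.year, acquired.month)
        totals[key] += cloud_pct
        counts[key] += 1
    # Sorted keys give a stable region/month order for the chart axes.
    return [
        {
            "region": region,
            "month": f"{year:04d}-{month:02d}",
            "mean_cloud_pct": round(totals[(region, year, month)] / counts[(region, year, month)], 1),
            "tiles": counts[(region, year, month)],
        }
        for (region, year, month) in sorted(totals)
    ]
```

Low `mean_cloud_pct` cells mark the acquisition windows worth targeting first.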

ML Pipeline Export

One-click export to training data formats with full provenance metadata for reproducibility audits.
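Provenance metadata can be as simple as a manifest with one line per exported tile that records a content hash, the reviewer, the verdict, and the filter settings in force. The sketch below assumes a JSON Lines manifest. The fields and the tool-version string are illustrative, not the tool's actual export schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(
    tile_id: str,
    image_bytes: bytes,
    reviewer: str,
    verdict: str,
    filter_params: dict,
    tool_version: str,
) -> dict:
    """One manifest entry tying an exported tile to how it was selected."""
    return {
        "tile_id": tile_id,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),  # detects later file changes
        "reviewer": reviewer,
        "verdict": verdict,
        "filter_params": filter_params,  # e.g. thresholds used by the pre-filter
        "tool_version": tool_version,
        "exported_at": datetime.now(timezone.utc).isoformat(),
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialise records as JSON Lines, one object per line, keys sorted for stable diffs."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

During an audit, re-hashing a training image and comparing the result with the manifest shows whether it is the same file that was originally selected.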

Project Impact

Workflow Efficiency

Curation cycle cut from 3 weeks to 4 days per project
Automated pre-filtering removes 83% of tiles as cloud-contaminated
Standardised review criteria reduced inter-reviewer disagreement by 34%
Assignment dashboard eliminated duplicate reviews and missed tiles entirely

Data Quality

Consistent cloud-free thresholds enforced across every project
Downstream ML model accuracy improved 7% from cleaner training data
Rework from bad image selection dropped from 18% to under 3% of tiles
Full provenance trail on every selected image for reproducibility audits
