Remote Sensing 2023

Sentinel Imagery Selection Tool

A selection tool that automatically screens out 83% of Sentinel-2 tiles as cloud-contaminated before human review, cutting ML training dataset curation from 3 weeks to 4 days per project.

Client

Digital Agriculture Services

Key Results

  • Curation time: 3 weeks → 4 days
  • Auto-filtered tiles: 83%
  • Concurrent reviewers: 8
  • Tiles processed per project: 17,000+

The Challenge

Each ML training project required reviewing 17,000+ Sentinel-2 tiles to find cloud-free captures. A single analyst took 3 weeks per project, and inconsistent quality judgments between reviewers were degrading downstream model accuracy.

Key challenges included:

  • 17,000+ tiles per project requiring individual visual inspection
  • Reviewers disagreeing on cloud contamination 22% of the time
  • 3-week curation cycle delaying ML model training schedules
  • No assignment tracking, causing duplicate reviews and missed tiles
  • Selected images had no metadata trail for reproducibility audits
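
A disagreement figure like the 22% above can be measured in more than one way, and the source does not say which was used. The sketch below shows one plausible definition: the share of reviewer pairs that gave different verdicts on the same tile. The function name, the verdict labels and the data layout are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict


def disagreement_rate(labels: Dict[str, Dict[str, str]]) -> float:
    """Fraction of same-tile reviewer pairs whose verdicts differ.

    labels maps tile_id -> {reviewer: verdict}. This pairwise definition is
    an assumption; the project may have measured disagreement differently.
    """
    pairs = 0
    disagreements = 0
    for verdicts in labels.values():
        # Compare every pair of reviewers who looked at this tile.
        for a, b in combinations(verdicts.values(), 2):
            pairs += 1
            disagreements += a != b
    return disagreements / pairs if pairs else 0.0


# Hypothetical sample: two reviewers, four double-reviewed tiles.
sample = {
    "t1": {"ana": "keep", "ben": "keep"},
    "t2": {"ana": "keep", "ben": "reject"},
    "t3": {"ana": "reject", "ben": "reject"},
    "t4": {"ana": "keep", "ben": "keep"},
}
rate = disagreement_rate(sample)
```

Tracking a number like this on a double-reviewed sample before and after a process change is what makes a claim about consistency checkable.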

ML Data Quality

ML models for crop classification are only as good as their training imagery. A single cloud-contaminated tile in the training set can bias predictions across an entire region, making rigorous image selection a prerequisite for accurate agricultural insights.

Training Data Quality

Critical for model accuracy

Cloud Contamination

Major source of bad training data

Our Solution

Automated cloud detection removes 83% of tiles as unusable before a human ever sees them. Teams of up to 8 reviewers work in parallel with assignment tracking, reducing curation time from 3 weeks to 4 days and cutting inter-reviewer disagreement by 34%.

Automated Cloud Detection

Cloud probability masks pre-filter 83% of tiles as unusable, cutting the human review queue from 17,000+ to about 2,900 images.
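
As a rough illustration of mask-based pre-filtering, the sketch below rejects a tile when too many of its pixels have a high cloud probability. Both thresholds, the function names and the assumption that probabilities are scaled to 0..1 are hypothetical; the source does not state the actual criteria. Plain nested lists stand in for raster arrays to keep the example dependency-free.

```python
from typing import Dict, List, Sequence, Tuple

# Rows of per-pixel cloud probabilities, assumed to be scaled to [0, 1].
Mask = Sequence[Sequence[float]]


def cloudy_fraction(mask: Mask, pixel_threshold: float = 0.4) -> float:
    """Share of pixels whose cloud probability exceeds pixel_threshold."""
    pixels = [p for row in mask for p in row]
    if not pixels:
        raise ValueError("empty cloud probability mask")
    return sum(p > pixel_threshold for p in pixels) / len(pixels)


def prefilter(
    tiles: Dict[str, Mask],
    max_cloudy_fraction: float = 0.10,  # hypothetical tile-level cutoff
    pixel_threshold: float = 0.4,       # hypothetical pixel-level cutoff
) -> Tuple[List[str], List[str]]:
    """Split tile ids into (keep_for_review, auto_rejected)."""
    keep: List[str] = []
    drop: List[str] = []
    for tile_id, mask in tiles.items():
        if cloudy_fraction(mask, pixel_threshold) <= max_cloudy_fraction:
            keep.append(tile_id)
        else:
            drop.append(tile_id)
    return keep, drop


# Hypothetical 4x4 masks: fully clear, fully cloudy, one cloudy pixel.
tiles = {
    "clear": [[0.0] * 4 for _ in range(4)],
    "cloudy": [[1.0] * 4 for _ in range(4)],
    "edge": [[0.9, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)],
}
keep, drop = prefilter(tiles)
```

The queue size quoted above follows from the filter rate: 17,000 × (1 − 0.83) = 2,890 tiles, which rounds to the stated ~2,900.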

Visual Inspection Interface

Keyboard-driven review UI averaging 4.2 seconds per tile decision, 3x faster than the previous GIS-based workflow.

Sentinel API Integration

Direct Copernicus access pulling tiles for any date range and region, with automatic retry on API rate limits.
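
Automatic retry on rate limits is commonly done with exponential backoff. The sketch below is one generic way to do it, not the tool's actual client code: `RateLimitError` and the `fetch` callable are hypothetical stand-ins for whatever the real HTTP layer raises and calls when the API answers with a rate-limit response.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's fetch function when throttled."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), backing off and retrying when it is rate limited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, with jitter so that
            # parallel workers do not all retry at the same instant.
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = err.retry_after if err.retry_after is not None \
                else backoff * random.uniform(0.5, 1.0)
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter keeps the function testable without real waiting.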

Workflow Management

Assignment tracking for up to 8 concurrent reviewers with progress dashboards and conflict detection.
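
The core of assignment tracking and conflict detection can be sketched in a few functions. This is a minimal illustration under assumed data shapes (tile ids as strings, decisions as `(tile_id, reviewer, verdict)` tuples); the names are hypothetical and the real tool's storage and rules are not described in the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Decision = Tuple[str, str, str]  # (tile_id, reviewer, verdict)


def assign_round_robin(tile_ids: Sequence[str], reviewers: Sequence[str]) -> Dict[str, str]:
    """Give every tile exactly one owner, so no tile is skipped or assigned twice."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    return {tile: reviewers[i % len(reviewers)] for i, tile in enumerate(tile_ids)}


def find_conflicts(decisions: Iterable[Decision]) -> List[str]:
    """Tile ids that received more than one distinct verdict."""
    verdicts = defaultdict(set)
    for tile_id, _reviewer, verdict in decisions:
        verdicts[tile_id].add(verdict)
    return sorted(t for t, seen in verdicts.items() if len(seen) > 1)


def unreviewed(assignments: Dict[str, str], decisions: Iterable[Decision]) -> List[str]:
    """Assigned tiles that have no decision yet."""
    done = {tile_id for tile_id, _reviewer, _verdict in decisions}
    return sorted(set(assignments) - done)


# Hypothetical example data.
assignments = assign_round_robin(["t1", "t2", "t3"], ["ana", "ben"])
decisions = [("t1", "ana", "keep"), ("t1", "ben", "reject"), ("t2", "ben", "keep")]
conflicts = find_conflicts(decisions)
missing = unreviewed(assignments, decisions)
```

Here `t1` is flagged because two reviewers disagreed, and `t3` shows up as still outstanding.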

D3.js Quality Dashboard

Interactive temporal heatmaps showing cloud coverage by region, helping teams target the cleanest acquisition windows.
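
A heatmap like this needs one value per region and time bucket. The sketch below shows one way the input data could be prepared: it averages per-tile cloud cover into `(region, month)` cells and emits flat records of the kind a D3.js heatmap typically binds to. The record fields and the monthly bucketing are assumptions, and the source does not say where this aggregation actually happens.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, date, float]  # (region, acquisition date, cloud cover in %)


def heatmap_cells(records: Iterable[Record]) -> List[Dict[str, object]]:
    """Average cloud cover per (region, month), sorted by region then month."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [total cloud %, tile count]
    for region, acquired, cloud_pct in records:
        key = (region, acquired.strftime("%Y-%m"))
        sums[key][0] += cloud_pct
        sums[key][1] += 1
    return [
        {"region": region, "month": month,
         "mean_cloud_pct": round(total / count, 1), "tiles": count}
        for (region, month), (total, count) in sorted(sums.items())
    ]


# Hypothetical acquisitions.
cells = heatmap_cells([
    ("north", date(2023, 6, 1), 10.0),
    ("north", date(2023, 6, 15), 30.0),
    ("south", date(2023, 7, 2), 80.0),
])
```

Cells with a low `mean_cloud_pct` point to the acquisition windows most likely to yield usable tiles.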

ML Pipeline Export

One-click export to training data formats with full provenance metadata for reproducibility audits.
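
A provenance trail usually means that each exported image carries a record of what it is, who approved it, and under which criteria. The following sketch writes such records as JSON Lines; every field name here is an illustrative assumption, since the source does not list the metadata the tool exports.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, Iterable


def provenance_record(
    tile_id: str,
    image_bytes: bytes,
    reviewer: str,
    cloudy_fraction: float,
    reviewed_at: datetime,  # pass a timezone-aware datetime
) -> Dict[str, object]:
    """Describe one selected image so the selection can be audited later."""
    return {
        "tile_id": tile_id,
        # Content hash ties the record to the exact bytes that were exported.
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "reviewer": reviewer,
        "cloudy_fraction": cloudy_fraction,
        "reviewed_at": reviewed_at.astimezone(timezone.utc).isoformat(),
    }


def to_jsonl(records: Iterable[Dict[str, object]]) -> str:
    """Serialise records one per line, with stable key order for diffing."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Hypothetical tile id, image content and reviewer.
rec = provenance_record(
    "T31UFT_20230612", b"abc", "ana", 0.03,
    datetime(2023, 6, 20, 12, 0, tzinfo=timezone.utc),
)
line = to_jsonl([rec])
```

Storing the hash alongside the review decision lets an auditor confirm that a training image is byte-for-byte the one that was approved.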

Project Impact

Workflow Efficiency

  • Curation cycle cut from 3 weeks to 4 days per project
  • Automated pre-filtering removes 83% of tiles as cloud-contaminated before review
  • Standardised review criteria reduced inter-reviewer disagreement by 34%
  • Assignment dashboard eliminated duplicate reviews and missed tiles entirely

Data Quality

  • Consistent cloud-free thresholds enforced across every project
  • Downstream ML model accuracy improved 7% from cleaner training data
  • Rework from bad image selection dropped from 18% to under 3% of tiles
  • Full provenance trail on every selected image for reproducibility audits