Sentinel Imagery Selection Tool
Selection tool that automatically screens out 83% of Sentinel-2 tiles as cloud-contaminated before human review, cutting ML training dataset curation from 3 weeks to 4 days per project.
Client
Digital Agriculture Services
Key Results
Curation Time: 3 weeks to 4 days
Auto-Filtered Tiles: 83%
Concurrent Reviewers: up to 8
Tiles Processed/Project: 17,000+
The Challenge
Each ML training project required reviewing 17,000+ Sentinel-2 tiles to find cloud-free captures. A single analyst took 3 weeks per project, and inconsistent quality judgments between reviewers were degrading downstream model accuracy.
Key challenges included:
- 17,000+ tiles per project requiring individual visual inspection
- Reviewers disagreeing on cloud contamination 22% of the time
- 3-week curation cycle delaying ML model training schedules
- No assignment tracking, causing duplicate reviews and missed tiles
- Selected images had no metadata trail for reproducibility audits
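One way to read the 22% disagreement figure is as a pairwise rate. Take every pair of reviewers who labelled the same tile, and count the share of pairs that made opposite cloudy/clear calls. The sketch below illustrates that reading. The function name, the binary labels and the data layout are all hypothetical, and the project may have used a different agreement measure, such as Cohen's kappa.

```python
from itertools import combinations


def disagreement_rate(labels_by_tile: dict[str, dict[str, bool]]) -> float:
    """Share of reviewer pairs that disagree on the same tile.

    labels_by_tile maps tile_id -> {reviewer_id: is_cloudy}. Tiles seen by a
    single reviewer contribute no pairs and are effectively ignored.
    """
    pairs = 0
    disagreements = 0
    for votes in labels_by_tile.values():
        # Compare every pair of reviewers who labelled this tile.
        for a, b in combinations(votes.values(), 2):
            pairs += 1
            disagreements += a != b
    return disagreements / pairs if pairs else 0.0
```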
ML Data Quality
ML models for crop classification are only as good as their training imagery. A single cloud-contaminated tile in the training set can bias predictions across an entire region, making rigorous image selection a prerequisite for accurate agricultural insights.
Training Data Quality: critical for model accuracy
Cloud Contamination: major source of bad training data
Our Solution
Automated cloud detection screens out 83% of tiles as unusable before a human ever sees them. Teams of up to 8 reviewers work in parallel with assignment tracking. Together these cut curation time from 3 weeks to 4 days and reduce inter-reviewer disagreement by 34%.
Automated Cloud Detection
Cloud probability masks flag roughly 83% of tiles as unusable and remove them automatically, cutting the human review queue from 17K to ~2,900 images.
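Here is a minimal sketch of mask-based pre-filtering. It assumes each tile comes with a per-pixel cloud probability in the 0-100 range, for example the cloud-probability layer distributed with Sentinel-2 Level-2A products. The two thresholds, the `Tile` type and the function names are hypothetical. The case study does not state the values the tool actually uses.

```python
from dataclasses import dataclass

# Hypothetical thresholds, not the tool's documented settings.
PIXEL_CLOUD_PROB = 40       # a pixel counts as cloudy at >= 40% probability
MAX_CLOUDY_FRACTION = 0.05  # reject tiles with more than 5% cloudy pixels


@dataclass
class Tile:
    tile_id: str
    cloud_prob: list[list[int]]  # per-pixel cloud probability, 0-100


def cloudy_fraction(mask: list[list[int]], threshold: int = PIXEL_CLOUD_PROB) -> float:
    """Share of pixels whose cloud probability meets the threshold."""
    total = sum(len(row) for row in mask)
    cloudy = sum(p >= threshold for row in mask for p in row)
    return cloudy / total if total else 1.0  # treat an empty mask as unusable


def prefilter(tiles: list[Tile],
              max_fraction: float = MAX_CLOUDY_FRACTION) -> tuple[list[Tile], list[Tile]]:
    """Split tiles into (human review queue, auto-rejected)."""
    queue: list[Tile] = []
    rejected: list[Tile] = []
    for tile in tiles:
        if cloudy_fraction(tile.cloud_prob) > max_fraction:
            rejected.append(tile)
        else:
            queue.append(tile)
    return queue, rejected
```

Only the tiles left in `queue` reach a reviewer. Keeping the thresholds in one place also makes it straightforward to apply the same cloud-free criteria across projects.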
Visual Inspection Interface
Keyboard-driven review UI averaging 4.2 seconds per tile decision, 3x faster than the previous GIS-based workflow.
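Keyboard-driven review comes down to mapping single keys to decisions and timing each one. The sketch below is hypothetical. The key bindings, the `ReviewSession` class and the decision labels are illustrative, not the tool's actual interface. The clock is injectable so that per-tile timing can be tested without waiting.

```python
import time

# Hypothetical bindings; the tool's real shortcuts are not documented in the case study.
KEYMAP = {"a": "accept", "r": "reject", "s": "skip"}


class ReviewSession:
    """Records one decision per tile and how long each decision took."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock      # injectable so timing can be tested
        self.current = None     # tile currently on screen
        self.shown_at = None    # clock() value when that tile appeared
        self.decisions = {}     # tile_id -> decision
        self.durations = []     # seconds taken per decision

    def show(self, tile_id):
        self.current = tile_id
        self.shown_at = self.clock()

    def key(self, pressed):
        decision = KEYMAP.get(pressed)
        if decision is None or self.shown_at is None:
            return None  # unbound key, or no tile awaiting a decision
        self.decisions[self.current] = decision
        self.durations.append(self.clock() - self.shown_at)
        self.shown_at = None  # one decision per tile until the next show()
        return decision

    def mean_seconds_per_decision(self):
        return sum(self.durations) / len(self.durations) if self.durations else 0.0
```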
Sentinel API Integration
Direct Copernicus access that pulls tiles for any date range and region, retrying automatically when API rate limits are hit.
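Retrying on rate limits is commonly done with exponential backoff plus random jitter. This sketch assumes a hypothetical `fetch` callable that wraps the actual Copernicus request. It also assumes `fetch` raises a hypothetical `RateLimited` exception on an HTTP 429 response. The retry counts and delays are illustrative, not the tool's configuration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by `fetch` when the API responds with HTTP 429."""


def with_retry(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(random.uniform(0, delay))
```

Full jitter means sleeping a random time up to the capped delay. This spreads out retries when several download workers hit the limit at the same moment. If the API returns a `Retry-After` header, honouring it would be a reasonable refinement.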
Workflow Management
Assignment tracking for up to 8 concurrent reviewers with progress dashboards and conflict detection.
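One way to get no duplicates and no missed tiles is to give every tile exactly one owner up front and flag any submission that does not match. The sketch below is a minimal in-memory version, assuming a hypothetical `AssignmentTracker` class with round-robin assignment. A multi-user deployment would presumably keep this state in a shared database so that concurrent reviewers all see the same view.

```python
from collections import defaultdict

MAX_REVIEWERS = 8  # the case study's stated limit on concurrent reviewers


class AssignmentTracker:
    """Gives every tile exactly one owner and records conflicting submissions."""

    def __init__(self, tile_ids, reviewers):
        if not 1 <= len(reviewers) <= MAX_REVIEWERS:
            raise ValueError(f"expected 1-{MAX_REVIEWERS} reviewers")
        # Round-robin split: each tile gets one owner, so no duplicates and no gaps.
        self.owner = {t: reviewers[i % len(reviewers)] for i, t in enumerate(tile_ids)}
        self.decisions = {}  # tile_id -> decision
        self.conflicts = []  # (tile_id, reviewer, reason)

    def submit(self, reviewer, tile_id, decision):
        if self.owner.get(tile_id) != reviewer:
            self.conflicts.append((tile_id, reviewer, "not assigned"))
            return False
        if tile_id in self.decisions:
            self.conflicts.append((tile_id, reviewer, "already reviewed"))
            return False
        self.decisions[tile_id] = decision
        return True

    def progress(self):
        """Per-reviewer (done, assigned) counts for a progress dashboard."""
        done, assigned = defaultdict(int), defaultdict(int)
        for tile_id, reviewer in self.owner.items():
            assigned[reviewer] += 1
            done[reviewer] += tile_id in self.decisions
        return {r: (done[r], assigned[r]) for r in assigned}

    def missing(self):
        """Tiles that have not been reviewed yet."""
        return [t for t in self.owner if t not in self.decisions]
```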
D3.js Quality Dashboard
Interactive temporal heatmaps showing cloud coverage by region, helping teams target the cleanest acquisition windows.
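The heatmap itself is drawn with D3.js, but the data behind it can be a simple aggregation: mean cloud cover per region and calendar month. The sketch below assumes hypothetical `(region, acquisition_date, cloud_fraction)` records. It flattens them into one row per heatmap cell, the long-format shape that D3 heatmap examples commonly consume.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def cloud_heatmap(records):
    """Aggregate (region, acquisition_date, cloud_fraction) records into one
    row per (region, month) cell holding the mean cloud fraction."""
    cells = defaultdict(list)
    for region, acquired, fraction in records:
        cells[(region, acquired.month)].append(fraction)
    # One flat row per heatmap cell, sorted so the JSON output is stable.
    return [
        {"region": region, "month": month, "cloud": mean(values)}
        for (region, month), values in sorted(cells.items())
    ]


rows = cloud_heatmap([
    ("north", date(2023, 6, 1), 0.2),
    ("north", date(2023, 6, 15), 0.4),
    ("north", date(2023, 7, 1), 0.1),
    ("south", date(2023, 6, 3), 0.5),
])
```

Grouping by calendar month pools all years together. Adding the year to the key would give a per-season view instead.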
ML Pipeline Export
One-click export to training data formats with full provenance metadata for reproducibility audits.
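A provenance manifest can be as simple as one JSON record per selected tile. Each record holds a content hash, the filter parameters and the reviewer's decision. The field names and functions below are hypothetical. The case study does not specify the export format or which training-data formats the tool supports.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(tile_id, image_bytes, acquired, cloud_fraction,
                      reviewer, decision, filter_params):
    """One audit entry per tile; the SHA-256 pins the exact image bytes reviewed."""
    return {
        "tile_id": tile_id,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "acquired": acquired,            # ISO date string
        "cloud_fraction": cloud_fraction,
        "reviewer": reviewer,
        "decision": decision,
        "filter_params": filter_params,  # thresholds in force when the tile was screened
    }


def export_manifest(records, tool_version):
    """Serialise accepted tiles plus export metadata as a JSON manifest."""
    manifest = {
        "tool_version": tool_version,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "selected": [r for r in records if r["decision"] == "accept"],
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Hashing the image bytes lets an auditor confirm that a training set was built from exactly the tiles that were reviewed, even if the source archive is later reprocessed.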
Project Impact
Workflow Efficiency
- Curation cycle cut from 3 weeks to 4 days per project
- Automated pre-filtering removes 83% of tiles as cloud-contaminated before human review
- Standardised review criteria reduced inter-reviewer disagreement by 34%
- Assignment dashboard eliminated duplicate reviews and missed tiles entirely
Data Quality
- Consistent cloud-free thresholds enforced across every project
- Downstream ML model accuracy improved 7% from cleaner training data
- Rework from bad image selection dropped from 18% to under 3% of tiles
- Full provenance trail on every selected image for reproducibility audits