Machine Learning · 22 November 2024 · 10 min read · Sheece Gardezi

ML Inference on 250mW: TinyML for Geospatial Sensors

TinyML runs object detection and anomaly classification on microcontrollers drawing under 250mW. Real-time inference at the sensor, no cloud round-trip required.

Edge AI · TinyML · IoT · Embedded Systems · Sensors
Photo: close-up of an electronic circuit board and microprocessor (Alexandre Debieve on Unsplash)

A wildlife camera in remote Australia generates terabytes of video annually. Streaming that to the cloud requires bandwidth that doesn't exist and costs that don't make sense. With its detection model compressed to 286KB, the same camera classifies species on-device in 3ms per frame, transmits only confirmed sightings, and runs for years on a coin cell battery. Edge AI doesn't improve the cloud pipeline — it eliminates it.

256KB of RAM, Milliwatt Power, Real ML Inference

TinyML refers to machine learning inference on microcontrollers with milliwatt power budgets — devices with 256KB of RAM and 1MB of flash storage. These aren't Raspberry Pis; they're the chips inside sensors, wearables, and IoT devices that operate on coin cell batteries for years.

Through aggressive model compression — quantization, pruning, knowledge distillation — researchers have squeezed vision models into 286-536KB deployable footprints. Inference takes 3-15ms per frame with energy consumption measured in microjoules. Quantized MobileNet variants maintain accuracy at 0.85 or above while fitting in embedded flash budgets.
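To make the first of those techniques concrete, here is what symmetric 8-bit quantization does to a weight tensor. This is a minimal numpy illustration of the arithmetic, not the TFLite implementation:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# int8 storage is 4x smaller than float32; per-weight error is at most scale / 2
```

A single float scale factor plus one byte per weight is all that needs to live in flash, which is where the 3-4× storage reduction comes from.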

"TinyML systems do not transfer data to any server for inference; machine learning functions are executed on the device itself. This makes TinyML well suited to real-time applications requiring immediate feedback."
Scientific Reports, 2025

From 100MB to 286KB: Five Compression Techniques

Key Optimization Approaches

  • 8-bit post-training quantization — 3-4× storage reduction with minimal accuracy loss
  • 4-bit and 2-bit k-means quantization — up to 90% memory reduction
  • 2:4 sparsity patterns — 50% weight pruning with hardware acceleration
  • Knowledge distillation — train small student models from large teachers
  • Neural Architecture Search — automatically find efficient architectures for constraints

For many classification and detection tasks, the compressed models are indistinguishable from full-precision versions. 4-bit quantization alone delivers up to 90% memory reduction with minimal accuracy loss.
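The k-means variant replaces individual weights with indices into a small shared codebook. A toy version of the idea can be sketched in a few lines of numpy; this is an illustration, not a production quantizer:

```python
import numpy as np

def kmeans_quantize(weights, bits=4, iters=20):
    """Cluster weights into 2**bits shared values; store small indices plus a codebook."""
    k = 2 ** bits
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)  # spread initial centroids over the range
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[idx == j]
            if members.size:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.astype(np.uint8).reshape(weights.shape), centroids

w = np.random.randn(64, 64).astype(np.float32)
idx, codebook = kmeans_quantize(w, bits=4)
w_hat = codebook[idx]
# 4-bit indices (packable two per byte) plus a 16-entry codebook replace 32-bit floats
```

Four bits per weight instead of thirty-two is where the up-to-90% memory reduction figure comes from.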

Five Geospatial Use Cases That Only Work at the Edge

For sensor networks in remote locations without reliable connectivity, TinyML enables capabilities that cloud-dependent architectures cannot deliver:

Edge AI for Geospatial

  • Wildlife monitoring — On-device species classification from camera trap images
  • Seismic detection — Real-time earthquake classification at the sensor
  • Agricultural sensing — Crop disease detection from multispectral cameras
  • Water quality — Anomaly detection from turbidity and chemical sensors
  • Infrastructure monitoring — Crack detection on bridge strain sensors

The bandwidth savings are orders of magnitude. Instead of streaming continuous sensor data to the cloud, edge devices transmit only detections or anomalies — reducing data transfer from terabytes to kilobytes annually.
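The arithmetic behind that reduction is easy to sketch. The loop below is hypothetical (record layout, sizes, and class names are illustrative), but it shows the pattern: run inference locally, uplink only confident detections:

```python
FRAME_BYTES = 320 * 240 * 2   # one raw 16-bit QVGA frame
REPORT_BYTES = 32             # compact detection record: frame id, class, score

def to_report(frame_id, scores, threshold=0.9):
    """Return a compact report only when the top class clears the threshold."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return {"frame": frame_id, "class": best, "score": scores[best]}
    return None  # nothing worth transmitting

frames = [{"koala": 0.97}, {"background": 0.55}, {"dingo": 0.93}]
reports = [r for r in (to_report(i, s) for i, s in enumerate(frames)) if r]

sent, raw = len(reports) * REPORT_BYTES, len(frames) * FRAME_BYTES
print(f"uplink {sent} B instead of {raw} B raw ({raw // sent}x reduction)")
```

Even in this three-frame toy, the uplink shrinks by three to four orders of magnitude; over a year of continuous capture, that is the terabytes-to-kilobytes gap.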

Hardware: From $5 ESP32 to 100 TFLOPS Jetson Orin

Purpose-built silicon spans four orders of magnitude in compute and price:

Edge AI Hardware Options

  • NVIDIA Jetson Orin — AI-optimized SoC with 100+ TFLOPS for edge servers
  • Google Edge TPU v3 — INT8 inference under 10ms for small model blocks
  • ARM Ethos-U85 NPU — designed for Cortex-M microcontrollers
  • Syntiant NDP — ultra-low-power audio/sensor inference
  • ESP32-S3 — WiFi microcontroller with vector instructions for ML

For our seismic sensor networks, we've evaluated ESP32-based designs running TensorFlow Lite Micro. The combination provides sufficient compute for waveform classification while maintaining the power efficiency needed for solar-powered remote deployments.

Sub-9B Language Models Running On-Device

Small language models under 9B parameters with aggressive quantization now run on edge devices, enabling natural language interaction with sensor data. A remote monitoring station can answer "Has there been unusual seismic activity in the last 24 hours?" entirely on-device.

Technologies like uTensor with CMSIS-NN kernels enable LLM blocks on ARM Cortex-M devices with under 256MB DRAM. This isn't GPT-4, but for constrained inference tasks — summarizing sensor logs, classifying alert severity — edge LLMs are viable today.

Frameworks and Tooling

tflite_conversion.py
import tensorflow as tf
import numpy as np

def representative_dataset_gen():
    # Yield a handful of real input samples so the converter can calibrate
    # activation ranges (the 96x96x1 input shape here is illustrative)
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

# Convert model for TensorFlow Lite Micro with full-integer quantization
converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()

# Write out the flatbuffer for embedding in firmware
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

TensorFlow Lite Micro remains the dominant framework, but alternatives are emerging. TensorFlores generates platform-agnostic C++ code for embedded systems. Edge Impulse provides end-to-end workflows from data collection to deployment.

Start With Signal Processing, Not Neural Networks

For organizations operating remote sensor networks, edge AI is transformative. Running inference at the sensor changes the economics of environmental monitoring — from continuous cloud streaming costs to near-zero transmission budgets.

But for many applications, simple threshold-based logic still outperforms ML. The right question isn't "can we run a neural network on this sensor?" but "does ML provide meaningfully better decisions than classical signal processing?" Start simple. Add ML complexity only when the value proposition is clear and measurable.
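For the seismic case, the classical baseline is an STA/LTA trigger: the ratio of short-term to long-term average signal energy, which spikes when a transient arrives. A minimal sketch on synthetic data, using a centered window for simplicity rather than the usual trailing one:

```python
import numpy as np

def sta_lta(signal, sta_len=50, lta_len=500):
    """Short-term / long-term average energy ratio; spikes on transient events."""
    power = signal ** 2
    sta = np.convolve(power, np.ones(sta_len) / sta_len, mode="same")
    lta = np.convolve(power, np.ones(lta_len) / lta_len, mode="same")
    return sta / np.maximum(lta, 1e-12)

rng = np.random.default_rng(0)
trace = rng.normal(0.0, 0.1, 2000)
trace[1000:1100] += rng.normal(0.0, 1.0, 100)  # synthetic event burst
ratio = sta_lta(trace)
triggered = ratio > 2.5
print("event detected:", triggered[1000:1100].any())
```

A detector like this runs in a few multiply-adds per sample on any microcontroller. If it already catches the events you care about, a neural network has to justify its flash, RAM, and energy cost against this baseline.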
