Infrastructure | 15 November 2024 | 12 min read | Sheece Gardezi

MQTT to Kafka: Streaming 1M Sensor Events per Second

MQTT handles device ingestion; Kafka handles backpressure, replay, and fan-out. The architecture for bridging OT sensor networks to IT analytics at industrial scale.

Kafka · MQTT · IoT · Streaming · Sensors
[Image: industrial sensors and monitoring equipment. Photo: ThisisEngineering RAEng on Unsplash]

Audi ingests connected-car telemetry in real time. Deutsche Bahn runs train information systems across Germany. Quarterhill adjusts toll pricing with sub-second latency from sensor to decision. All three rely on the same core architecture: MQTT handles the last mile to constrained devices on unreliable networks, and Kafka handles everything downstream. The pattern works because neither protocol tries to do the other's job.

8-Bit Microcontroller Meets Enterprise Data Platform

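MQTT was designed for constrained devices and unreliable networks. A client can run on an 8-bit microcontroller, offers three QoS levels (0: at most once, 1: at least once, 2: exactly once) so each message can trade overhead for delivery guarantees, and supports persistent sessions so sensors that go offline can reconnect without losing their subscriptions or queued QoS 1 and 2 messages. Kafka was built for enterprise data streaming: high throughput, durability, exactly-once semantics, and integration with the broader data platform.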
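To put the 8-bit claim in perspective, the sketch below builds an MQTT 3.1.1 PUBLISH packet by hand, following the specification's fixed header and variable-length "remaining length" encoding. The function names are hypothetical, and a real client also needs CONNECT, keep-alive pings, and QoS acknowledgement handling; this only shows how little framing each reading carries.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit means 'more bytes follow'."""
    if not 0 <= n <= 268_435_455:  # largest value that fits in four bytes
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def encode_publish(topic: str, payload: bytes, qos: int = 0,
                   packet_id: int = 0, retain: bool = False) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet (DUP flag not handled)."""
    if qos not in (0, 1, 2):
        raise ValueError("qos must be 0, 1 or 2")
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length, the topic, then a packet id for QoS > 0.
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        if not 1 <= packet_id <= 65_535:
            raise ValueError("QoS 1/2 needs a packet id in 1..65535")
        variable_header += struct.pack("!H", packet_id)
    body = variable_header + payload
    # Fixed header byte 1: packet type 3 (PUBLISH) in the high nibble, QoS and RETAIN in the low bits.
    first_byte = 0x30 | (qos << 1) | int(retain)
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


packet = encode_publish("s/t", b"21.5")
print(len(packet), packet.hex())
```

A four-byte reading on a three-character topic at QoS 0 comes to 11 bytes on the wire: two bytes of fixed header, five for the length-prefixed topic, and four of payload. QoS 1 and 2 add a two-byte packet identifier.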

"Since Kafka was not built for IoT communication at the edge, the combination of Apache Kafka and MQTT together are a match made in heaven for building scalable, reliable, and secure IoT infrastructures."
(Confluent IoT Architecture Guide)

Running Kafka clients directly on constrained devices is impractical: they are resource-intensive and assume reliable, long-lived connections to the brokers, which IoT environments can't guarantee. The protocols are complementary, not competing.

Five-Layer Architecture: Edge to Analytics

IoT Streaming Stack

  • Edge Layer: sensors publish to an MQTT broker (EMQX, Mosquitto, HiveMQ)
  • Bridge Layer: an MQTT-Kafka connector subscribes to the broker and produces to Kafka topics
  • Processing Layer: Kafka Streams or Apache Flink for real-time analytics
  • Storage Layer: a time-series database (TimescaleDB, InfluxDB) for hot data
  • Analytics Layer: a data warehouse (BigQuery, Snowflake) for historical analysis

The MQTT broker handles the complexity of device connections: authentication, session state, and last-will messages, which the broker publishes on a sensor's behalf when it disconnects without warning. The Kafka connector then gives downstream systems a clean interface, separating edge chaos from enterprise order.
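The bridge layer's subscription is an MQTT topic filter. A minimal sketch of the MQTT 3.1.1 matching rules follows: `+` matches exactly one level, `#` matches all remaining levels (including the parent, so `sensors/#` matches `sensors`), and topics beginning with `$` are excluded from leading wildcards. The function name and the `sensors/<device>/<measurement>` layout are hypothetical, and the filter is assumed to be already validated (for example, `#` appears only as the last level).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches an (already validated) topic filter."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Topics starting with '$' (e.g. $SYS/...) are not matched by a leading wildcard.
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard: matches everything from here on
        if i >= len(t_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level mismatch
    return len(f_levels) == len(t_levels)


bridge_filter = "sensors/+/temperature"  # hypothetical bridge subscription
for t in ["sensors/press-07/temperature",
          "sensors/press-07/vibration",
          "sensors/press-07/bearing/temperature"]:
    print(t, topic_matches(bridge_filter, t))
```

With the `sensors/+/temperature` filter, only the first topic matches; the nested `bearing/temperature` topic needs `sensors/#` or a second filter.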

Production Deployments: Automotive, Rail, Energy, Steel

The MQTT-Kafka pattern runs in production across industries at massive scale:

Notable Deployments

  • Audi — Connected car infrastructure for real-time ingestion and analysis
  • Deutsche Bahn — Real-time train information systems across Germany
  • E.ON — IoT cloud platform for smart homes and energy grids
  • Bosch Power Tools — Real-time alerting dashboards for industrial equipment
  • Severstal — Edge analytics for predictive maintenance in steel production

Quarterhill's intelligent traffic system illustrates sub-second decision-making: adjusting toll rates to real-time congestion requires a data stream that keeps sensor-to-pricing-engine latency under a second.

OT/IT Convergence: Event-Driven Replaces Polling

Traditional OT (Operational Technology) middleware — vendor-locked, polling-based systems — is giving way to event-driven architectures built on Kafka, MQTT, and OPC-UA. This is an integration pattern, not just a technology upgrade.

Kafka serves as the central event backbone, MQTT provides lightweight device communication, and OPC-UA supplies secure, standardized industrial data exchange. Together, they let organizations scale dynamically without vendor lock-in.

Zero Data Loss Through Network Partitions

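Kafka's advantage in IoT isn't just throughput; it's resilience. If connectivity between edge and cloud is interrupted, records already written to Kafka stay in its durable, replicated log (given settings such as acks=all and a replication factor above one), and consumers on the far side of the outage resume from their committed offsets once the connection is reestablished.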

For seismic sensor networks, where remote stations may lose satellite connectivity during storms, this durability is essential. Sensors buffer locally, the edge gateway buffers to disk, and when connectivity returns, everything flows through to the central platform without data loss.
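The edge-gateway step in that chain is a store-and-forward buffer. A minimal sketch, assuming a JSON-lines spool file and a hypothetical `send` callable that returns True only once the upstream (an MQTT broker or Kafka producer) has acknowledged the record. Records leave the spool only after acknowledgement, so an outage or crash leads to redelivery (at-least-once) rather than loss, as long as the disk itself survives. The class name and station ID are hypothetical, and reading the whole spool into memory is a simplification.

```python
import json
import os
import tempfile
from typing import Callable


class DiskSpool:
    """Append-only JSON-lines buffer for readings the uplink could not take yet."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk so it survives a crash or power loss

    def drain(self, send: Callable[[dict], bool]) -> int:
        """Forward spooled records oldest-first; stop at the first failure.

        `send` must return True only after the upstream acknowledged the record.
        Returns the number of records forwarded.
        """
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f if line.strip()]
        sent = 0
        for record in pending:
            if not send(record):
                break  # link still down: keep this record and everything after it
            sent += 1
        # Atomically replace the spool with the unacknowledged tail. A crash before this
        # point means the sent records are forwarded again (at-least-once, no loss).
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for record in pending[sent:]:
                f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        return sent


# Demo: the uplink is down while three readings arrive, then flaps, then recovers.
spool = DiskSpool(os.path.join(tempfile.mkdtemp(), "spool.jsonl"))
for seq in range(3):
    spool.append({"station": "ST-01", "seq": seq})  # hypothetical station ID
link = iter([True, False])  # first send succeeds, second fails
print(spool.drain(lambda record: next(link)))  # forwards seq 0, keeps seq 1 and 2
print(spool.drain(lambda record: True))        # link restored: forwards the rest
```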

EMQX: Millions of Concurrent Connections, Native Kafka Bridge

emqx-kafka-bridge.yaml
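# EMQX Kafka integration configuration (illustrative sketch).
# Exact key names differ between EMQX versions: EMQX 5.x configures
# Kafka data integration in HOCON (emqx.conf) or the dashboard, and a
# rule (e.g. SELECT * FROM "sensors/#") selects which MQTT messages
# are forwarded to the bridge.
bridges:
  kafka:
    servers: "kafka:9092"           # Kafka bootstrap server(s)
    topic: sensor_data              # destination Kafka topic
    message_key: "${clientid}"      # key = MQTT client ID, so each device maps to one partition
    value_encoder: json             # encode message values as JSON
    ssl:
      enable: true                  # TLS between EMQX and the Kafka brokers
      cacertfile: /etc/emqx/certs/ca.crt   # CA used to verify the brokers' certificates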
EMQX is a widely adopted enterprise MQTT broker that offers native Kafka integration without custom connectors. Its clustering is designed to handle millions of concurrent connections, which makes it suitable for large-scale IoT deployments.
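Keying messages by `${clientid}`, as the configuration above does, is what preserves per-device ordering: Kafka orders records only within a partition, and a keyed record always lands on the same partition as long as the partition count stays fixed. A minimal sketch of that idea, with one substitution flagged plainly: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses `zlib.crc32` as a stand-in, so its partition numbers will not match a real cluster's. The partition count and device names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition. Stand-in hash (CRC32), NOT Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 12  # hypothetical partition count for the sensor_data topic

# (client_id, sequence number) pairs in publish order.
events = [("press-07", 1), ("press-09", 1), ("press-07", 2), ("press-09", 2), ("press-07", 3)]

partitions = defaultdict(list)
for client_id, seq in events:
    partitions[partition_for(client_id, NUM_PARTITIONS)].append((client_id, seq))

# Within each partition, every device's readings keep their publish order.
for p, records in sorted(partitions.items()):
    print(p, records)
```

The design choice is a trade-off: keying by client ID preserves each device's order, but a single chatty device can make its partition hot.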

Topic Sprawl and Operational Cost

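Kafka has limitations for IoT-specific patterns. Every topic adds partitions, and every partition adds broker overhead (metadata, open log files, leader elections), so a large number of topics, common when each device gets its own topic per data stream, becomes expensive to operate. Design topic hierarchies carefully: use one topic per sensor type rather than per device, and carry the device ID in the message key.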
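One way to apply the one-topic-per-sensor-type rule in the bridge is to derive the Kafka topic from the sensor type and pass the device ID through as the message key. A minimal sketch, assuming a hypothetical MQTT layout `sensors/<device_id>/<sensor_type>` and a hypothetical `sensors.<type>` Kafka naming scheme; it also counts how many topics each strategy would create.

```python
def route(mqtt_topic: str) -> tuple[str, str]:
    """Map 'sensors/<device_id>/<sensor_type>' to (kafka_topic, message_key)."""
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "sensors":
        raise ValueError(f"unexpected topic layout: {mqtt_topic!r}")
    _, device_id, sensor_type = parts
    return f"sensors.{sensor_type}", device_id  # one topic per type, device ID in the key


def topic_counts(num_devices: int, sensor_types: list[str]) -> dict[str, int]:
    """Number of Kafka topics needed under each naming strategy."""
    return {
        "per_device_stream": num_devices * len(sensor_types),
        "per_sensor_type": len(sensor_types),
    }


print(route("sensors/press-07/vibration"))  # ('sensors.vibration', 'press-07')
print(topic_counts(10_000, ["temperature", "vibration", "pressure"]))
```

With 10,000 devices and three sensor types, per-device topics would mean 30,000 topics, each with at least one partition, against three topics when the device ID rides in the key.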

Cost is the other constraint. Kafka clusters aren't cheap, especially managed cloud offerings. For smaller deployments, simpler alternatives (Redis Streams, NATS) deserve evaluation before committing to the full Kafka ecosystem.
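A back-of-envelope sizing shows where the cost comes from. The inputs below are assumptions, not measurements: 1M events per second, 200-byte events, 7-day retention, and replication factor 3, with no compression and ignoring index and protocol overhead. The function name is hypothetical.

```python
def kafka_sizing(events_per_sec: float, event_bytes: int,
                 retention_days: float, replication_factor: int) -> dict[str, float]:
    """Rough ingress, replication, and retained-storage estimate (no compression, no overhead)."""
    ingress_bytes_per_sec = events_per_sec * event_bytes
    retained_bytes = ingress_bytes_per_sec * 86_400 * retention_days * replication_factor
    return {
        "ingress_MB_per_s": ingress_bytes_per_sec / 1e6,
        # Each byte written by producers is copied to (replication_factor - 1) followers.
        "replication_MB_per_s": ingress_bytes_per_sec * (replication_factor - 1) / 1e6,
        "retained_TB": retained_bytes / 1e12,
    }


print(kafka_sizing(1_000_000, 200, 7, 3))
```

Under these assumptions that is roughly 200 MB/s of producer ingress, another 400 MB/s of replication traffic between brokers, and about 363 TB of retained log before compression, which is why cluster size, and with it cost, scales with event rate, event size, and retention.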

Under 100 Sensors? Skip Kafka.

The MQTT-Kafka pattern is the default architecture for IoT projects at scale. The separation of concerns is clean: MQTT handles the messiness of device communication; Kafka provides the enterprise integration layer.

For projects with fewer than a hundred sensors and straightforward analytics requirements, simpler stacks (MQTT direct to TimescaleDB, for example) are more appropriate. Kafka shines at scale; at smaller scales, it's overhead without proportional benefit.
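At that scale the whole pipeline can be a single MQTT message handler writing rows straight into the database. A minimal sketch with a stated substitution: it uses Python's built-in `sqlite3` in memory as a stand-in for TimescaleDB so it runs anywhere; against TimescaleDB you would use a PostgreSQL driver and a hypertable instead. The `on_message(topic, payload)` signature, the `sensors/<device_id>/<sensor_type>` topic layout, and the JSON payload shape are all hypothetical.

```python
import json
import sqlite3

# Stand-in for TimescaleDB: an in-memory SQLite table with the same columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE readings (ts TEXT, device_id TEXT, sensor_type TEXT, value REAL)"
)


def on_message(topic: str, payload: bytes) -> None:
    """Hypothetical MQTT callback: one message becomes one row."""
    _, device_id, sensor_type = topic.split("/")
    body = json.loads(payload)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?)",
            (body["ts"], device_id, sensor_type, float(body["value"])),
        )


on_message("sensors/press-07/temperature", b'{"ts": "2024-11-15T09:00:00Z", "value": 21.5}')
on_message("sensors/press-07/temperature", b'{"ts": "2024-11-15T09:00:01Z", "value": 21.7}')
print(db.execute(
    "SELECT device_id, COUNT(*), AVG(value) FROM readings GROUP BY device_id"
).fetchall())
```

Swapping in TimescaleDB changes the connection and the table DDL, not the shape of the handler.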
