The role of message brokers in IoT and microservice scenarios

12 Oct 2020 - tsp
Last update 12 Oct 2020
Reading time 13 mins

As one can guess from the title, this blog post will touch two somewhat distinct areas: deployments of distributed internet of things (IoT) devices and the microservice infrastructure most commonly found inside data center applications. Even though they seem to have little in common, they are more similar than they look at first glance. The main differences are network reliability and, depending on how one views the trust relationships inside one’s data center, trustworthiness.

What is a message broker?
Gain from using a message broker
Problems that one might want to keep an eye on
Common pitfalls

What is a message broker?

Ok, so let’s start off with a short description of what message brokers are. Basically they perform a pretty simple task: they accept messages, queue them and pass them on to subscribers. They handle either reliable or unreliable message transfer, offer different commit strategies and delivery guarantees and come in a huge variety of implementations. Normally they are also capable of caching messages while services are offline or restarting, and they provide a nice way to scale services as long as the brokers themselves scale.
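The accept, queue and pass-on cycle described above can be illustrated with a small in-memory sketch. It is not how any real broker is implemented: the class `TinyBroker` and all of its methods are made up for illustration, and networking, persistence and acknowledgements are left out entirely.

```python
from collections import defaultdict, deque


class TinyBroker:
    """Toy in-process broker with one queue per subscriber (hypothetical API)."""

    def __init__(self):
        self._queues = defaultdict(deque)   # subscriber name -> pending messages
        self._bindings = defaultdict(set)   # topic -> subscriber names
        self._handlers = {}                 # subscriber name -> callback while online

    def subscribe(self, name, topic):
        self._bindings[topic].add(name)

    def connect(self, name, handler):
        """Mark a subscriber as online and deliver everything cached for it."""
        self._handlers[name] = handler
        self._flush(name)

    def disconnect(self, name):
        self._handlers.pop(name, None)

    def publish(self, topic, message):
        # Accept the message, queue one copy per subscriber, then try to deliver.
        for name in sorted(self._bindings[topic]):
            self._queues[name].append((topic, message))
            self._flush(name)

    def _flush(self, name):
        handler = self._handlers.get(name)
        while handler is not None and self._queues[name]:
            handler(*self._queues[name].popleft())


received = []
broker = TinyBroker()
broker.subscribe("logger", "sensors/temperature")
broker.publish("sensors/temperature", 21.5)   # logger is offline: message is cached
broker.connect("logger", lambda topic, msg: received.append((topic, msg)))
broker.publish("sensors/temperature", 22.0)   # logger is online: delivered directly
```

The first message is only handed over once the subscriber connects, which is the caching behavior mentioned above in its simplest form.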

They can be used to:

Provide loose coupling between various components (as is the case in microservice architectures, for example) or provide the backend for IPC
Provide load balancing
Allow work queuing and scheduling for batch processing
Queue messages for devices that are not active all the time

There are implementations that work without a central server (brokerless) like ZeroMQ, implementations that are built around single or sharded servers like the famous RabbitMQ, and distributed cloud services like Amazon’s Simple Queue Service (SQS) that perform a similar task. Mosquitto, an MQTT broker, is particularly prominent in IoT applications, and Apache Kafka is a little bit different in its usage. Other examples would be Apache ActiveMQ, the Microsoft Azure Service Bus or services like Redis that also include database and processing capabilities inside the message broker framework.

The protocols in use also differ depending on the implementation. Two of the most popular ones are the Advanced Message Queuing Protocol (AMQP), which was initially developed in the finance sector and is one of the most advanced message passing protocols currently out there, and the somewhat more lightweight Message Queuing Telemetry Transport (MQTT). Fully implementing AMQP is a major task, even for the client alone, because of its high flexibility and feature set. MQTT is easy to implement on the client side as long as one doesn’t try to implement TLS oneself. Its basic protocol only knows a small set of control packets with a compact binary header. AMQP is a binary protocol as well, but it comes with a much larger feature set (for example channels, exchanges, queues and bindings in AMQP 0-9-1) that a client has to support. Because of this difference MQTT is particularly prominent on embedded devices like IoT sensors that run on microcontrollers such as the ESP8266/ESP32, since their limited memory makes a full AMQP client comparatively heavy. AMQP on the other hand is particularly prominent in microservice deployments due to its high flexibility and maturity. There are also some non-standardized solutions like ZeroMQ, which uses its own network protocol. Finally there are protocols like SCRIBE built on top of the Pastry DHT. In this case the distributed nature of the distributed hash table allows for a fault tolerant and scalable implementation of message passing without any centralized server.

As one can see, message brokers are a major building block of modern cluster and cloud infrastructure, and most of the time the self-scaling behavior of these systems is also built around message buses. One of the places where this is most visible is the configuration of AWS Lambda functions. For these small microservice fragments (which might be implemented in a variety of languages like JavaScript or Java) the message bus delivers messages to the code fragments and receives their actions, again in the form of messages. It also controls the launching of services that are not running and the scaling of Lambdas under high load.

Usually these message brokers implement some kind of publish/subscribe semantics in which clients publish messages into queues or into exchanges (which are then bound to queues), while other clients create queues that subscribe to given topics. Depending on the configuration of the exchange, different fanout patterns can be built:

Work queues in which work is enqueued as a request into a single queue and the exchange delivers it (reliably or unreliably, depending on configuration) to a single worker process. When used with RabbitMQ, messages will be delivered to the next available client in a round robin fashion. This can for example be used when triggering long running tasks from web applications that get executed on a bunch of application servers.
Publish/Subscribe is a basic pattern in which every subscriber to the same queue receives the messages published into this queue. In this case a copy will be delivered to every subscriber.
Topic based publish/subscribe is an extension of publish/subscribe. In this case messages of different topics get published into the same exchange. Every topic can be hierarchical like a domain name or filesystem path, and clients are capable of subscribing to given patterns. For example one might have a hierarchy like zone/sensortype and deliver messages from zone 1 as well as zone 2 for temperature and humidity measurements. Temperature measurements from zone 1 will be published to zone1/temperature, humidity measurements from zone 2 to zone2/humidity. Clients can now either subscribe to one of these topics directly or use a filter like zone1/# for all measurements inside zone 1 or +/temperature for all temperature measurements. The wildcard syntax depends on the broker: the filters shown here follow MQTT, where + matches exactly one level and # matches all remaining levels and is only allowed at the end, while AMQP topic exchanges use dot separated words with * and # instead.
Remote procedure calls work similarly to work queues, except that requests are expected to lead to a result or return value. The clients simply tell the service into which queue they want the response to be delivered and attach a correlation key that links request and response.
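How a broker decides whether a hierarchical topic matches a subscription filter can be sketched in a few lines. The function below assumes MQTT-style filters, in which + stands for exactly one level and # for all remaining levels and is only valid as the last level (so "all temperature measurements" is written +/temperature). The function name is made up, and special cases of real brokers are ignored.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if an MQTT-style filter matches a concrete topic name."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(pattern_levels):
        if level == "#":
            # Multi-level wildcard: must be the last level, matches the rest.
            return index == len(pattern_levels) - 1
        if index >= len(topic_levels):
            return False                 # filter is longer than the topic
        if level != "+" and level != topic_levels[index]:
            return False                 # literal level differs
    # Without a trailing '#', both must have the same number of levels.
    return len(pattern_levels) == len(topic_levels)


subscriptions = ["zone1/#", "+/temperature", "zone2/humidity"]
topic = "zone1/temperature"
matching = [s for s in subscriptions if topic_matches(s, topic)]
```

In this example a measurement published to zone1/temperature is matched by the first two filters but not by zone2/humidity.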

As usual with any technology message brokers are pretty cool - but they don’t solve every problem equally well.

Gain from using a message broker

Loose coupling (Microservices)

Since microservices are currently somewhat hyped, just a word of warning: they are a nice method to design one’s system, but they are not a substitute for clean planning and architecture. The basic idea is that there is a bunch of small services that all subscribe to the topics (or load balancing queues) they’re responsible for, process a small set of requests and respond with their results without knowing any other component of the system. Basically this totally decouples the services from each other, except for the message format and the topics used during requests. This also leads to the common pitfall: if one has to change the message types or the logical behavior of a service, one might break other parts of the system that one hasn’t even thought of as being coupled to it. This is different from the situation in which applications link to the same shared library, since there one would detect changed signatures already at compile time.
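Since the message format is the only contract between services, one option is to make that contract explicit and check it at runtime when a message arrives. The following is only a sketch under assumptions: the envelope fields `type`, `version` and `payload`, the message type `temperature.reading` and the `ContractError` class are all invented for this example.

```python
import json

# Hypothetical contract: message type -> (supported version, required payload fields)
SUPPORTED = {
    "temperature.reading": (1, {"zone", "celsius"}),
}


class ContractError(ValueError):
    """Raised when a message does not match what this service understands."""


def decode(raw: bytes) -> dict:
    """Parse a JSON envelope and return its payload, or raise ContractError."""
    msg = json.loads(raw)
    try:
        version, required = SUPPORTED[msg["type"]]
    except KeyError:
        raise ContractError(f"unknown or missing type: {msg.get('type')!r}") from None
    if msg.get("version") != version:
        raise ContractError(f"unsupported version: {msg.get('version')!r}")
    missing = required - set(msg.get("payload", {}))
    if missing:
        raise ContractError(f"missing payload fields: {sorted(missing)}")
    return msg["payload"]


raw = json.dumps({
    "type": "temperature.reading",
    "version": 1,
    "payload": {"zone": "zone1", "celsius": 21.5},
}).encode()
reading = decode(raw)
```

A check like this does not prevent breaking changes, but it turns a silent misbehavior into an error that can be logged and monitored.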

Another really important thing to think about when using microservices is effective monitoring. These setups are particularly prone to developing undetected bottlenecks if they don’t auto scale (or to becoming a resource hog if they do). One should monitor the message rate and processing time of all services and keep an overview of the message types exchanged.
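Counting messages and measuring processing time per message type does not need much machinery. The wrapper below is a minimal sketch with invented names (`HandlerStats`, `timed`); a real deployment would export such numbers to whatever monitoring system is already in use.

```python
import time
from collections import defaultdict


class HandlerStats:
    """Collects message count and total processing time per message type."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def timed(self, message_type, handler):
        """Wrap a handler so every call is counted and timed."""
        def wrapper(message):
            start = time.perf_counter()
            try:
                return handler(message)
            finally:
                # Record the call even if the handler raised an exception.
                self.count[message_type] += 1
                self.total_seconds[message_type] += time.perf_counter() - start
        return wrapper

    def mean_seconds(self, message_type):
        calls = self.count[message_type]
        return self.total_seconds[message_type] / calls if calls else 0.0


stats = HandlerStats()
handle = stats.timed("temperature.reading", lambda msg: msg["celsius"] * 9 / 5 + 32)
results = [handle({"celsius": c}) for c in (0.0, 20.0, 100.0)]
```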

On the other hand, deploying microservices and exchanging components in a loosely coupled system is normally pretty easy, because one just has to terminate a single service and relaunch it without much service interruption - normally other components don’t even notice the restart. One might even replace service versions incrementally if they’re backwards compatible and avoid service interruptions altogether.

Loose coupling also makes message brokers and message buses a brilliant backbone for any automation project (home/lab/garden/etc.) - one can view hardware components as distinct microservices the same way as one can with any distributed project.

Load balancing and fail-over (Microservices)

As already mentioned, it is totally possible to subscribe to the same queue or topic with many different client services. Depending on the message broker one can then do a round robin fanout to all available clients. This allows one to load balance onto different application servers. If one of them goes offline, the remaining application servers still get messages delivered reliably, so the failure of some nodes can be compensated without any further consideration during implementation. Of course, as usual, one has to be careful about shared state between application servers (like databases or backend storage).
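The combination of round robin distribution and fail-over can be sketched as follows. This is a simplified in-memory model, not the algorithm of any particular broker: workers are plain callables, an offline worker is simulated by the made-up `WorkerOffline` exception, and a message only leaves the queue once a worker has processed it.

```python
from collections import deque


class WorkerOffline(Exception):
    """Simulates a worker that cannot take a message right now."""


class RoundRobinQueue:
    def __init__(self, workers):
        self._workers = deque(workers)   # rotation order of the workers
        self._pending = deque()          # messages not yet processed

    def enqueue(self, message):
        self._pending.append(message)

    def dispatch(self):
        """Deliver pending messages; return False if no worker is available."""
        while self._pending:
            message = self._pending[0]
            for _ in range(len(self._workers)):
                worker = self._workers[0]
                self._workers.rotate(-1)      # next message goes to the next worker
                try:
                    worker(message)
                except WorkerOffline:
                    continue                  # not processed: try the next worker
                self._pending.popleft()       # processed: remove from the queue
                break
            else:
                return False                  # every worker failed, keep messages
        return True


handled = {"a": [], "c": []}


def offline_worker(message):
    raise WorkerOffline()


queue = RoundRobinQueue([handled["a"].append, offline_worker, handled["c"].append])
for job in (1, 2, 3, 4):
    queue.enqueue(job)
all_delivered = queue.dispatch()
```

With the middle worker offline, its share of the jobs simply ends up at the next worker in the rotation, and nothing is lost.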

This is also useful during upgrade and deployment scenarios, since taking single nodes offline isn’t a problem in any way. Just shut down the service cleanly or simply kill it and redeploy the new instance; during the downtime other nodes will keep processing requests.

Queuing (Microservices, IoT)

Queuing is particularly interesting if requests are asynchronous and are enqueued in huge batches. In this case it would be a total waste of resources to keep enough processing power available to process all requests at once. With message queuing, messages containing requests or events are simply queued until processing time is available. One might even go further and run the processing jobs only periodically, like once a day or week, and collect jobs inside the message queue for the remaining time - for example when one is billed by the runtime of the processing machines.

On the other hand, IoT devices are often connected via unreliable network connections or power down their networking hardware for power or cost reasons. In this case messages sent towards the IoT devices might be queued on a message queue until the devices become available again. If one wants to send a downstream message from an IoT backend to a LoRaWAN node, for example, one has to wait for the node to come online and then transmit the message inside a tight timing window towards the detected gateway. To realize such a scenario one usually creates a queue for each and every IoT device. Messages that should be sent in the downstream direction are simply enqueued, and as soon as the device comes online the messages are fetched by the network server or gateway directly from the queue. In the other direction, of course, message queuing allows absorbing message spikes as well as load balancing and fault tolerant processing, as long as the message broker is fault tolerant too.
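The one-queue-per-device idea can be sketched like this. The class and method names are invented for illustration, and the limit of one message per wake-up is merely an assumption standing in for the tight timing window; a real network server would apply its own rules.

```python
from collections import defaultdict, deque


class DownlinkQueues:
    """Keeps one downstream queue per device until the device shows up."""

    def __init__(self):
        self._queues = defaultdict(deque)   # device id -> queued payloads

    def enqueue(self, device_id, payload):
        self._queues[device_id].append(payload)

    def on_device_online(self, device_id, send, max_messages=1):
        """Called by a (hypothetical) gateway when the device can receive."""
        sent = 0
        queue = self._queues[device_id]
        while queue and sent < max_messages:
            send(queue[0])      # if sending raises, the message stays queued
            queue.popleft()
            sent += 1
        return sent

    def pending(self, device_id):
        return len(self._queues[device_id])


transmitted = []
downlinks = DownlinkQueues()
downlinks.enqueue("node-17", b"set-interval:600")
downlinks.enqueue("node-17", b"reboot")
# The node wakes up once: only the first message fits into this window.
sent_now = downlinks.on_device_online("node-17", transmitted.append)
```

The second message stays in the queue and is delivered the next time the device comes online.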

Problems that one might want to keep an eye on

Reliability of the message broker

Having a distributed and fault tolerant application structure thanks to microservices is nice - but it doesn’t matter if the message broker is not reliable or not redundant. If one wants to follow that route, one really has to consider the reliability and distribution of the message broker itself, even before planning the microservice architecture. This is particularly relevant in industrial or home automation projects, where one sometimes finds a great infrastructure (similar to AWS Lambda) but only a single message broker. If this message broker goes offline, all services die.

Load on the message broker

As more messages are processed, the requirements on the message broker rise - for example memory usage, disk space usage, processor usage and network utilization. One has to monitor each of these parameters, because if the message distribution system is congested, all other scaling is simply worthless.

Because of this, distributing the message broker might be a good idea - but one should still keep an eye on these parameters in that case, especially when using sharded designs.

Number of queued messages

Many message brokers work pretty well until one queues a few tens or hundreds of thousands of messages in the same queue. One should tightly monitor the message load on the brokers.

Security

Normally it should be totally self-evident, but sometimes one encounters message brokers with open access permissions. Of course the configuration of the message broker infrastructure is critical, since it’s one of the main components gluing the whole service infrastructure together, and critical data is exchanged over these systems. One should really care about system configuration, service configuration and minimal access permissions for all services (i.e. there is no need for a service that just publishes some events to be capable of subscribing to any queue; there is no need for a service to subscribe to anything other than its required topics; services should never share credentials; one should use strong authentication and normally also encryption, even in one’s own trusted network; etc.).

Common pitfalls

Just a word of caution - there are some common pitfalls when developing using message brokers:

One should keep in mind that message brokers are usually accessed over networks that are unreliable, so one has to handle things like reconnecting.
Delivery guarantees only apply once a message has been successfully handed over to the message broker, and only if it is handled in a reliable way at the client as well. A message broker can be one component of a reliable system, but it is no guarantee for one.
One should never do synchronous stuff using message brokers. There is a lot of code out there that tries to mimic synchronous RPC on top of message brokers, which is seen even in the IoT world. One should never use synchronous methods in distributed systems. Of course this does not exclude asynchronous mechanisms that merely look synchronous, like blocking a background processing thread or fiber while waiting for a response - but keep error handling in mind.
Keep an eye on the performance metrics of your message broker
It’s a single point of failure
Take care that the deployment of critical services works fully automatically from scratch, as with all infrastructure services (and in the best case with all services and software components - no human interaction should be necessary). If it doesn’t and an old message broker deployment dies, you might have broken millions of microservices with a single failure.
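For the first pitfall, handling reconnects, a common approach is to retry with an exponentially growing delay plus some random jitter, so that many clients don’t hammer a broker that is just coming back. The helper below is a generic sketch: `connect` stands for whatever connect call the client library in use provides, the parameter values are arbitrary, and only `ConnectionError` is treated as retryable.

```python
import random
import time


def connect_with_backoff(connect, attempts=5, base_delay=0.5, max_delay=30.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise                                  # give up, let the caller decide
            sleep(delay + random.uniform(0, delay))    # jitter spreads out retries
            delay = min(delay * 2, max_delay)          # exponential growth, capped


# Simulated broker that is unreachable for the first two attempts.
state = {"calls": 0}


def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("broker unreachable")
    return "connected"


waits = []                                             # record delays instead of sleeping
result = connect_with_backoff(flaky_connect, sleep=waits.append)
```

Passing the sleep function as a parameter keeps the example fast and makes the retry logic easy to test.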