The MQTT Bug Hunt!

February 15th, 2016 5 min read

Now that I’ve got my home fully MQTT connected, it’s time to use the data I gather from my sensors. Since I’m a big Node.js lover, I wrote some scripts that handle the data and do some magic. Unfortunately I ran into an annoying bug …

Of course there are many ways to connect to my MQTT server en subscribe to my ‘HomeSensor’ topics. But with the ease of Node.js, the power of MQTT.js and the comfort of the PM2 process manager. It was an easy choice to use this powerful combination.

For the time being I’ll be running three node scripts:

  1. timestamp.js: A script that adds timestamps to specific MQTT topics.
  2. dbstore.js: A script that stores all my HomeSensor state changes into a database.
  3. apn.js: A script that allows me to push Apple Push Notifications whenever a sensor fires a state change.

None of these scripts are very complicated and are mostly based on existing open source node.js modules. In future posts, I’ll probably explain in depth what they do and how they work …

With the above scripts up and running, I noticed I receive double notification whenever a sensor recorded a state change. My first guess was this was based on the fact that my ESP8266 module was incorrectly sending MQTT messages, but after inspecting the serial debugger, I could rule this issue out. The problem wasn’t on the publishing side. It was on the subscribe side. In other words, there was a problem with one of the above scripts, in this case with the apn.js script.

Adding debugging information.

Luckaly, MQTT.js allows you to quickly add some feedback whenever the MQTT connections has any issues:

client.on('reconnect', function() {
    console.warn('Reconnecting to MQTT server.');
});

client.on('close', function() {
    console.error('Connection to MQTT server closed.');
});

client.on('offline', function() {
    console.error('MQTT connection offline.');
});

client.on('error', function(error) {
    console.error('Error connecting to MQTT server.');
    console.error(error);
});

And with this in place, the problem soon became very clear: every 10 ~ 20 seconds, the apn.js MQTT connection disconnects and reconnects. Since the sensor states are retained on the MQTT sever, apn.js script will resend the notifications whenever it reconnects and receives the latest state. Of course, there are way to prevent sending a notification after a reconnect, but better yet I wanted to solve the connection issue.

Unfortunately, my Mosquitto MQTT server wasn’t very helpful in it’s logs:

1455282675: Client apn has exceeded timeout, disconnecting.

But altough this didn’t say much, it did tell me this wasn’t due to a client id that was already connected. One possible cause that could be ruled out.

Local testing vs. online testing.

Since doing some real debugging on my web server wasn’t very practical, I copied the scripts to my local machine (a Mac) and started doing some local testing. Lo and behold: the connection was stable when running locally! Keep in mind that I was still connecting to my online Mosquitto server, running on my web server. In other words, I now had the odd two scenarios:

  1. Mosquitto running on my webserver, apn.js running on my local machine: no problem.
  2. Mosquitto and my apn.js script both running on my webserver: connectivity issues.

To make sure this wasn’t a firewall issue, I disabled the firewall on my CentOS webserver. Unfortunatly, without any result.

In great confusion, I spent some time running thru the Mosquitto config, searching for settings that could affect this situation. But other than updating Mosquitto to the latest version, there wasn’t much to do. Even with this update there still was no solution for my weird problem.

Resorting to external help.

After countless hours of using my magic google-fu, and banging my head into my desk, I broke down and decided to try and get some help from people who probably know more about this issue then I do. I opened an issue (solution spoiler alert) on the MQTT.js GitHub Repository.

Since other suggested it might have something to do with the connection keep-alive ping requests, they suggested to check if the ping request are actually fired.

Debugging MQTT.js

The best way to test this, is to make a local copy of a MQTT.js module and add some logging into the ping functions. To do se, I cloned the MQTT.js repository and added some logging to the suggested functions. It turned out the ping requests were send as required and on time.

More importantly: the connection was now stable and did not disconnect.

So then it occurred to me: with the freshly pulled library, everything works fine. Am I just running an old version of MQTT.js on my web server? It couldn’t be that simple, could it?!

Trying the magic solution …

With sweaty palms, I typed the one simple possible solution:

npn update mqtt

And indeed … the problem was solved. DOH!

TL;DR

So kids, before spending days (yes, it actually took me more than 2 days to solve this) of debugging your scripts and servers, make sure everything is up-to-date!

You can now proceed to the comments and start mocking me! I deserve it.

Loading comments …
©2021 - MichaelTeeuw.nl