A few days ago I finally closed the OMEMO encryption ticket on Github.
Opened in 2015 it had many twists and turns along the years but I finally found a proper way of integrating it in Movim.
In this article I'll explain why adding #E2EE (End to End Encryption) was not as easy as with the other #XMPP clients (and more generaly all the chat clients that are using a similar encryption protocol) and how I addressed the issue.
But before going in the details I'd like to thanks the NLNet Fundation for its financial support in this project. With their help I was able to free-up some time to work on the problem and propose a proper architecture (detailled bellow) for it.
The result of this work will be released with the upcoming 0.20 version of #Movim. There is still some quirks and whims about it but the base is there and works pretty well.
End to End encryption in XMPP, a quick overview
The introduction of Signal in 2015 brought a small revolution into the encryption protocols in the IM ecosystem. The Double Ratchet Algorythm (see the dedicated technical documentation on the Signal website) allowed users to exchange messages between different clients in an “end to end encrypted” way (only user devices themselves know how to encrypt and decrypt messages) with some technical improvements (not detailed here) that made the new protocol a “must have” for all the others IM solutions.
Today the Double Ratchet Algorithm is used in applications such as WhatsApp or Matrix.
In the XMPP ecosystem it was primarily pushed by Daniel Gultsch in the Conversations.im client and standardized along the way in the OMEMO XMPP Extension XEP-0384: OMEMO Encryption. Throughout the years many XMPP clients implemented OMEMO, their status can be tracked on the following website Are we OMEMO yet?.
The OMEMO architecture
Without going too deep into the technical details the general idea about OMEMO is to generate some keys on each of the user's devices and publish the public ones on their account server.
Using the keys published on the XMPP user's servers, anyone can then start an encrypted session at any time (the servers are always available) and start to send messages to the desired contact without having to wait.
If one of the user's contacts wants to start an encrypted discussion they will first start to get those keys, then build sessions with their secret one and encrypt a message using the freshly built sessions.
If a user receives a new encrypted message and doesn't have an encryption session to that device, their device will then retrieve the contact keys, build the encryption sessions and start decrypting messages.
This can be done automatically if the contact trusts blindly the key used or in a more “trusted” way by accepting manually each keys to build the encryption sessions on.
All the existing XMPP clients are using this simple architecture. XMPP servers are storing their users' #OMEMO public keys and the users are connecting directly using their different devices to build their encrypted sessions.
The Movim particularity
But Movim is kind of special. The XMPP connection is actually not maintained on user devices but by the Movim server (built in PHP and running behind a web server such as Apache or nginx, see Movim General architecture on the Wiki). Movim is then processing everything server side, saving the information (articles, contacts and messages) in a SQL Database (PostgreSQL or MySQL) and then showing the result to the Movim users through a lightweight website.
If a user is connecting on the same Movim instance through several browsers using the same XMPP account all the browsers are then “merged” into one unique XMPP session (called "resource") and all the browsers are synchronized in real time by the Movim server. This is pretty useful to save memory and to prevent Movim to maintain several XMPP connections at the same time for a unique user. This also allows quick disconnection/reconnections, the users can close and reopen their tabs without having to reload the whole XMPP state when they come back after a while (Movim is closing the XMPP session after a day of inactivity).
End to end encryption actually requires to encrypt and decrypt messages on the user device, this brings several issues:
- For Movim, the user device is actually a “dumb” browser that only display the messages pre-processed by the Movim server, there is no logic whatsoever browser side
- A user can use simultaneously several browsers with the same XMPP connection on the same Movim instance
- All the message processing logic is done server side
This unique architecture requires a very unique way of adressing the E2EE situation. Hopefully OMEMO offers all the tools needed to handle those cases.
Split the logic
The OMEMO extension is actually talking about devices, for a large majority of the XMPP clients a device is connected through a unique XMPP session (one device equal one current XMPP resource in those cases).
The fact that Movim is sharing a unique session (resource) with several devices (browsers) is actually not an issue in the end. Each browser will then be considered as a unique device on its own, with its own key and its own OMEMO encrypted sessions.
This brings some interesing results. When a user is connected using the same XMPP account using two different browsers on the same Movim server (also called instance, or pod), an encrypted message sent by the browser Firefox will then directly be decrypted by the browser Chrome without even having to travel through the XMPP network.
The term “browser” is also defining more than actual browsers (like Firefox, Chrome or Opera). Since we can have private navigation or containers (in Firefox) each time it is seen as a different “browser” on the Movim side (because each context is separated, with a different cookie and different local data).
So the global idea is to continue to handle the messages server side, push the encrypted message object to the browser, and then implement only the key handling and message encryption-decryption flow browser side. When doing this implementation I actually looked at the Converse.js and JSXC OMEMO implementations, the Movim implementation is really close to the one done on those two clients (I am also re-using the libsignal JavaScript implementation).
This architecture actually works for the current version of OMEMO (0.3.0) where only the body is encrypted. The upcoming versions are looking to encrypt a larger part of the XML stanza. This will be way more difficult to handle for Movim, as it will require to decrypt messages browser side and then implement a second parser, this time in JavaScript (everything is parsed in PHP using libxml at the moment).
if (textarea.dataset.encryptedstate == 'yes') {
// Try to encrypt the message
let omemo = ChatOmemo.encrypt(jid, text, Boolean(textarea.dataset.muc));
if (omemo) {
omemo.then(omemoheader => {
...
xhr = Chat_ajaxHttpDaemonSendMessage(jid, tempId, muc, null, replyMid, mucReceipts, omemoheader);
...
});
}
} else {
xhr = Chat_ajaxHttpDaemonSendMessage(jid, text, muc, null, replyMid, mucReceipts);
...
}
This little JavaScript Movim code extract presents the differences in handling encrypted and unencrypted messages. The text
variable is containing the clear text version of the message. When the body is encrypted it is then calling the same method as for a clear text message.
This method is actually a wrapper generated by the RPC (Remote Procedure Call) Movim server core. Once this function is called an Ajax called is made and the rest of the flow is handled server side. The encrypted body, and generated OMEMO headers passed will be injected in a freshly generated XMPP XML <message>
.
Keep the messages in the local database
With the separation of the logic it was then required to keep a copy of the decrypted messages browser side.
To do that an IndexedDB database is used. This database is quite simple and only contains a key-value store, where the key is the message id (the same as the one in the Movim server SQL server databse) and the value the plaintext message.
- When a message is decrypted the plaintext body is then stored in this database.
- When the user sends an encrypted message, the original text is also saved in this database.
- If a message cannot be decrypted, the message key is still saved in the browser database with a
false
value. This prevents Movim to try to decrypt several times a message, knowing that the decryption will fail each time in the end.
Using this database, when a chat is loaded, all the messages are then sent chronologically from the server, passed trough a little bit of code that will lookup the state of all messages and then decrypt the ones that are not decrypted yet, the already decrypted messages are then shown, or an error is displayed for those that cannot be decrypted.
To sum up
In this article I tried to present you what limitations I faced when trying to implement end to end encryption within Movim and what architectural and technical solutions were used to address them.
The current solution seems to fit and bring all the desired features to Movim without too much downsides. The feature can now be considered as done and will be released soon. And as always, lots of small fixes and adjustments will be integrated to polish it afterward.
That's all folks!
edhelas