What is a chat and instant messaging system
So before digging into the matter of Jabber/XMPP and what exactly it is, let's first
recall what problem one usually wants to solve: chat and instant messaging.
Usually these two are mixed up - and to be honest chat is a subset of instant messaging.
Chat is basically a service that allows instantaneous exchange of text messages
between participants as long as they're online. Chat tools are usually rather lightweight,
like the old Unix talk utility, a direct successor of the messaging facilities of
systems like CTSS or PLATO, that was developed in the early 1980s. It allowed one
to send a text message directly into the terminal of a user on another networked
machine. If that user was currently logged in they could read the message right away
and use talk to send a text message back. The Netiquette guidelines
specific to talk are still valid today - also for instant messaging applications.
Chat applications usually come in two flavors: chatting between two online users
or chatting in a group. For the latter a dedicated protocol called IRC
provides the best solution (way better than any other solution such as web interfaces).
As soon as internet connections became more widely available, companies like AOL
and Mirabilis took the concept and built applications like the AOL Instant Messenger
and the well known ICQ in the mid to late 1990s. These applications provided instant
messaging services: as long as the presence status of a user was one of the online
states they received messages instantaneously; in case their machine was offline a
centralized server cached the messages and delivered them later on. In contrast
to traditional chat applications instant messaging thus also provides presence
status information as well as server side storage of messages in case the recipient
is offline.
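The difference to plain chat can be sketched in a few lines. This is only a toy
model under stated assumptions: the class ToyMessagingServer and all names in it
are made up for illustration and do not correspond to any real messenger.

```python
from collections import defaultdict, deque


class ToyMessagingServer:
    """Hypothetical central server: delivers to online users, caches for offline ones."""

    def __init__(self):
        self.online = set()                 # users whose presence is "online"
        self.inbox = defaultdict(list)      # messages that have been delivered
        self.pending = defaultdict(deque)   # messages cached for offline users

    def send(self, sender, recipient, text):
        message = (sender, text)
        if recipient in self.online:
            self.inbox[recipient].append(message)     # instantaneous delivery
        else:
            self.pending[recipient].append(message)   # cache until next login

    def set_presence(self, user, is_online):
        if is_online:
            self.online.add(user)
            # Flush everything that was cached while the user was away.
            while self.pending[user]:
                self.inbox[user].append(self.pending[user].popleft())
        else:
            self.online.discard(user)


server = ToyMessagingServer()
server.send("alice", "bob", "hello")   # bob is offline: the message is cached
cached = len(server.pending["bob"])
server.set_presence("bob", True)       # bob logs in: the cache is flushed
```

A pure chat tool like talk would correspond to dropping the pending queue
entirely: without the recipient being logged in there is nowhere to put the message.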
Centralized, federated and distributed
The main difference between today's instant messaging services is their service
topology. There are three main messaging topologies:
- Centralized
- Federated
- Distributed
The services mostly in use by common people are - unfortunately - centralized
services such as WhatsApp, Telegram, Facebook's Messenger, ICQ, Signal, etc.
These services have a single provider that runs the server infrastructure and in
most cases there is even only a single provider for the client applications
and web interfaces. A notable exception was the ICQ network, for which there have
been numerous alternative clients that usually supported many different messaging
protocols, so one only had to use a single client for multiple networks.
The main drawbacks of a centralized service are:
- Dependence on a single company or organization. As soon as the company goes insolvent
or simply doesn't see any gain in running the service any more they might shut down
the messaging service (which is what happened to many previously well known
services such as AOL Instant Messenger (AIM) or Yahoo Messenger).
- A central censorship and metadata collection point. Such services are prone
to governmental control (for the good or mostly for the bad) since they're
hosted on a well defined computer network by a well defined provider that one
can raise legal claims against - or that one can easily block.
- Dependence on the good will of the developers of that company to develop in
a direction that is wanted by the users. For example they might decide to
build surveillance functions or advertising into their clients and messengers
or start to sell specific data about traffic patterns to advertising companies - something
that's done by many popular messaging applications.
Advantages of a centralized approach:
- One doesn't have to administer anything and can simply buy a solution.
- Some people claim it's better to have a centralized service since one is more agile
when rolling out updates or fixing systematic security problems. This is of course
partially true, but on the other hand one is usually bound to a single implementation.
That makes attacks way more attractive since one can compromise the whole network
at once, and it allows a malicious developer to circumvent the protection of the
whole network at once.
In a federated approach - which works exactly as E-Mail does - anyone can operate
their own server. Servers exchange information whenever necessary and users
only communicate with their own server. If someone wants to talk to a
user on a different server they transmit their message to their own server,
which then forwards the message to the destination server. This is the approach
that's also taken by Jabber/XMPP.
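The routing idea can be illustrated with a small simulation. It is a sketch
only: ToyFederatedServer, the shared network dictionary (standing in for DNS and
server-to-server connections) and the domains a.example and b.example are all
assumptions made for this example.

```python
class ToyFederatedServer:
    """Hypothetical server that only accepts submissions from its own users."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network     # assumed registry: domain -> server object
        self.mailboxes = {}        # local user name -> received messages
        network[domain] = self

    def add_user(self, name):
        self.mailboxes[name] = []

    def submit(self, sender, recipient, text):
        """Called by a local client; routes locally or to the remote server."""
        user, _, domain = recipient.partition("@")
        if domain == self.domain:
            self.deliver(sender, user, text)
        else:
            # Server-to-server hop: the client never talks to the remote side.
            self.network[domain].deliver(sender, user, text)

    def deliver(self, sender, user, text):
        self.mailboxes[user].append((sender, text))


network = {}
a = ToyFederatedServer("a.example", network)
b = ToyFederatedServer("b.example", network)
a.add_user("alice")
b.add_user("bob")
a.submit("alice@a.example", "bob@b.example", "hi bob")
```

Alice's client only ever calls submit on her own server; the hop to b.example
happens between the servers, just like with E-Mail.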
The main advantages are:
- No central control
- Anyone can run a service
- There is a well established consensus on the minimum feature set that all services
should support, and that set is usually supported in a stable way (such protocols
are often deemed slow or old, but in fact this provides stability as shown in
the case of IRC or XMPP as well as other stable internet protocols - for example
E-Mail's SMTP)
- Less prone to large scale outages. If a single node fails all others continue
to operate. This also makes censorship and governmental control way harder since
there is no single entity that one can attack by law or force.
The main disadvantages:
- Of course someone has to operate the different nodes - usually this is done
without any commercial gain and the administrators are often not as experienced
as those working for a large company. With this of course one gets a higher single
instance failure rate (with an overall lower network failure rate), so one should
choose one's node carefully and pick an operator who runs the node as a long term
project and not only for a few weeks or years.
- Development of new features happens more slowly, which goes hand in hand with
the stability of the network.
- It's of course harder to fix systematic bugs in the protocol - E-Mail for example
is still fully compatible with systems from the early 1990s. All systems are
backwards compatible.
The approach that is best in theory is a totally decentralized (distributed) one.
In this case there wouldn't even be any servers that are run by anyone. Every client
would join a peer to peer overlay network (such as for example Pastry or
Kademlia - networks that are also used by peer-to-peer file sharing solutions)
and forward as well as store messages for any other node in a statistical fashion.
As of today there is no established messaging network working on these principles
since it's really hard to develop a distributed, fault tolerant and manipulation
resilient system. The main gain would be that there would be no need for anyone
to operate a server (instead everyone would share their own Internet connection to
keep the system up - as long as a proportion of users stays online and reachable
from the outside world all the time the network would continue to operate), so no one
could be forced to pull the service down or perform some kind of manipulation.
Combined with metadata anonymization services such as Tor
such a network would provide a nearly uncontrollable, stable and resilient messaging
network. The main disadvantage is of course that such a network requires enough
nodes that are externally reachable - in a world that is built more and more (instead
of less and less) around network address translation that poses somewhat of a
problem.
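To give an idea of how an overlay like Kademlia decides which peers store a
message, here is a minimal sketch of its XOR distance metric. Deriving the
identifiers with SHA-1 from made-up names, the peer names and the replica count
of three are assumptions for illustration; a real network additionally needs
routing tables, replication and churn handling.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's notion of distance between two identifiers."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], count: int) -> list[int]:
    """Return the `count` node identifiers closest to `key`."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:count]


# Hypothetical peers and a hypothetical message key.
peers = [node_id(f"peer-{i}") for i in range(20)]
message_key = node_id("message-for-bob")
replicas = closest_nodes(message_key, peers, 3)
```

Every participant can compute the same set of responsible peers from the key
alone, which is what removes the need for a central directory.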
Jabber/XMPP
Jabber/XMPP is one of the older instant messaging protocols built around the
federated approach. It dates back to the late 1990s - but is nonetheless a modern
messaging protocol. It's built around the concept of XML data streams, so all messages
are human readable. It has also been deployed by a myriad of different messaging
services (even WhatsApp seems to be built around a variant using some proprietary
stream compression) and many services have been - until lately - capable of
federating using XMPP, such as Google's Talk/Hangouts network and even Facebook's
Messenger - unfortunately they've converted their networks to closed networks lately.
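To show what "human readable XML" means in practice, the following sketch builds
a single chat message stanza with Python's standard library. The addresses and
the resource name desktop are made up for this example, and a real client would
send this inside an already negotiated XML stream rather than on its own.

```python
import xml.etree.ElementTree as ET

# Build a <message/> stanza; all addresses are hypothetical.
message = ET.Element("message", {
    "from": "userA@example.com/desktop",
    "to": "userB@example.org",
    "type": "chat",
})
ET.SubElement(message, "body").text = "Hello, how are you?"

# Serialize to the text that would travel over the wire.
stanza = ET.tostring(message, encoding="unicode")
print(stanza)
```

The printed result is a single message element with a body child, which one can
read and debug without any special tooling.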
Since XMPP is federated it's built around servers. Users are identified by addresses
that look somewhat like E-Mail addresses (i.e. user@domain). The domain part
identifies the server that the user's account is located on - for example userA@example.com
would be located on a server found via a DNS lookup for example.com. XMPP
supports a variety of ways of locating the real hostname and IP address of
the specific server, DNS SRV records being the most common one, which is
totally transparent for the user.
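The addressing scheme can be sketched as follows. The helper names split_jid
and srv_query_name are made up for this example; the service labels
_xmpp-client and _xmpp-server are the ones defined for XMPP's SRV lookups. The
sketch only builds the name one would query and performs no DNS request itself.

```python
def split_jid(jid: str) -> tuple[str, str, str]:
    """Split 'user@domain/resource' into its parts (resource may be empty)."""
    bare, _, resource = jid.partition("/")
    local, _, domain = bare.rpartition("@")
    return local, domain.lower(), resource


def srv_query_name(domain: str, server_to_server: bool = False) -> str:
    """Return the DNS name whose SRV records point to the XMPP server."""
    service = "_xmpp-server" if server_to_server else "_xmpp-client"
    return f"{service}._tcp.{domain}."


local, domain, resource = split_jid("userA@example.com/phone")
client_lookup = srv_query_name(domain)
```

The SRV answer then contains the actual hostname and port, which is why the
server does not have to run on the machine that example.com itself points to.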
Besides simple text message exchange XMPP offers:
- Presence notification. Users can publish their online status (online, away,
extended away, offline, etc.) to signal their ability to chat. This is something
that's usually expected from a messaging service but strictly speaking it's
optional. Not showing one's online status might be interesting in case one wants to
use the service like SMS, since showing the online status also allows people to gather
information about one's behavior, etc.
- Offline messages are queued by servers. Strictly speaking this is optional for
the servers to support - but to my knowledge all servers in operation
are capable of doing so.
- Group messages (chatrooms), called multi-user chat
- Server side message histories for non end to end encrypted channels (optionally
supported by clients and servers)
- Multi-client support that works really well in case one doesn't use end to end
encryption. This allows automatic failover or receiving messages on multiple
devices. One can use priorities, for example to get all messages on the mobile
phone except when one is logged in on the desktop computer.
- Storing contact lists on the server side. Technically speaking optional, but
to my knowledge all servers support this.
- Of course transport encryption for all data using TLS.
- On top of XMPP usually end to end encryption using OTR, OpenPGP or OMEMO (more on that later)
- Voice and video calls via Jingle. This is not supported by many clients and
usually uses (Z)RTP for voice and video streams. The main clients supporting voice
and video are Pidgin (on any desktop platform other than MS Windows) and Jitsi.
- File transfers (also using Jingle)
- Many extensions (not supported by all clients) such as:
  - Publish and subscribe interfaces for IoT or notifications
  - Querying data forms by extension services
  - Location services
  - Avatar pictures
  - Publishing one's mood
  - Publishing one's activity
  - Tunneling XMPP over HTTP (BOSH)
  - Delayed delivery
  - …
These extensions are of course optional - all useful clients support at least presence
notification, offline messages, server side rosters (i.e. storing contact lists
on the server), multi user chats, transport encryption and usually file transfers
if not running on mobile devices.
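The priority mechanism mentioned in the multi-client item above can be sketched
like this. It is a simplified model: the function pick_resource and the
resource names are made up, and real servers have some freedom in how they
route messages addressed to an account with several connected clients.

```python
from typing import Optional


def pick_resource(presences: dict[str, int]) -> Optional[str]:
    """Return the connected resource with the highest non-negative priority.

    `presences` maps a resource name to the priority it announced.
    A negative priority is treated as "never deliver here".
    """
    candidates = {res: prio for res, prio in presences.items() if prio >= 0}
    if not candidates:
        return None   # nobody eligible: the server would queue the message
    return max(candidates, key=candidates.get)


# Mobile is always connected with a low priority, desktop only sometimes.
target_both = pick_resource({"mobile": 5, "desktop": 10})
target_mobile_only = pick_resource({"mobile": 5})
```

With these assumed priorities the desktop receives the messages while it is
logged in and the phone takes over automatically as soon as it is not.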
The message payload is usually only text/plain without formatting - some
clients do support formatting by simply transmitting HTML snippets inside
the messages. This usually works pretty well, but as soon as one uses cryptography
layers such as OTR or OMEMO one should refrain from using formatted text, since
the receiving side then has to rely on heuristics to detect whether a message is
formatted or not.
Cryptography layers
XMPP itself only offers the use of transport encryption. Transport encryption means
that messages are encrypted on their route between the client and the server - but
the server has full access to the messages - in contrast to end to end encryption,
in which one doesn't have to trust the server either. Luckily there is a bunch
of encryption mechanisms available on top of XMPP - but usually they also have
some minor drawbacks like lack of multi-client support (i.e. not being able
to run multiple clients at the same time on the same account - for example
on the desktop and on a mobile device).
Off the record messaging (OTR)
This is the most commonly used cryptography layer on top of XMPP. It is - of course -
totally independent of the instant messaging system in use and could also be used
over any other network.
It basically offers:
- Confidentiality by encrypting all messages with a randomized session key that's
rotated continuously during the conversation. This rotation is also
used for another property (deniability, see below). The session keys are generated
using a Diffie-Hellman key exchange.
- Authenticity by exchanging signatures of random challenges after the encrypted
tunnel has been set up. Of course this requires one to know the other side's
public key, so one has to perform some kind of authentication
on the first key exchange. During a conversation one can then be sure that the other
side is in possession of the key pair that's associated with the identity
they claim to have. OTR usually supplies some methods out of the box:
  - Doing manual fingerprint verification. This is of course the most reliable
  authentication method for keys but requires one to exchange the whole
  cryptographic checksum of the public key of the communication partner
  either in person or over an already secure side channel.
  - Using a shared secret - or a one time password both sides know. This
  doesn't really solve the key exchange problem but it might be used
  in case people meet and decide on a password that's easier for them to remember.
  - A question-answer challenge response system. In this case both parties ask
  the other side a question that has an expected answer that only the
  other party can know. This is not as easy as it sounds since the answer
  really has to be known only to the other party, it has to be simple and
  both sides have to type it in exactly the same way.
- Deniability (or plausible deniability) is achieved by publishing the no longer
needed authentication keys that belong to rotated session keys, at the latest with
the session teardown message. This is also one of the reasons why session management
has to be done manually. Once these keys are public any party can simply argue that
any third party would have been capable of writing another message into the stream
at any point in time, without anyone being able to prove who had written that message.
By default the clients also don't store messages locally in a logfile, so in theory
no one can really prove who has written which message.
- Forward secrecy, which means that in case the keys that are used for authentication
are disclosed after all sessions have ended no one can read the previous chat
sessions. This is achieved by using different key pairs for encryption and authentication
and deriving the encryption keys from random values via a Diffie-Hellman exchange.
They are not stored anywhere.
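The Diffie-Hellman idea behind the session keys and forward secrecy can be
sketched with toy numbers. The prime P, the generator G and the use of SHA-256
to derive a key are assumptions chosen to keep the example short; they are far
too weak for real use and are not OTR's actual parameters or message format.

```python
import hashlib
import secrets

# Toy parameters for illustration only: far too small for real-world use.
P = 2**127 - 1   # a Mersenne prime
G = 3


def new_keypair():
    """Create a fresh (ephemeral) private/public pair for one session."""
    private = secrets.randbelow(P - 3) + 2    # 2 <= private <= P - 2
    return private, pow(G, private, P)


def session_key(own_private, peer_public):
    """Derive the shared session key from our secret and the peer's public half."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Both sides create throwaway key pairs and only exchange the public halves.
a_priv, a_pub = new_keypair()
b_priv, b_pub = new_keypair()
key_alice = session_key(a_priv, b_pub)
key_bob = session_key(b_priv, a_pub)
```

Both sides end up with the same key although it never crossed the wire. Once
a_priv and b_priv are discarded the key cannot be recomputed, not even by
someone who later obtains the long term authentication keys.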
As already mentioned session management for OTR has to be done manually. Since
OTR requires both sides of a private messaging session to participate in challenge
response mechanisms this only works while both sides are actively online
or are at least storing state. This is also the largest problem with OTR when
used in day to day settings. People usually forget to end the session, exit
their messenger clients or shut down their machines, and any further message sent
after that is sent into the void since no one knows the encryption keys any more.
Usually clients also silently drop messages without correct authentication, since
notifying the user would open up a path for some denial of service attacks. So one
really has to follow a strict procedure:
- If one wants to start a private conversation (that's not multi client capable)
the other side has to be in an online state. Then one can initiate session
buildup.
- One really needs to verify the keys of the other side via one of the supported
mechanisms (comparing fingerprints, using a challenge response system, etc.)
to make sure there is no man in the middle attack happening. This cannot really
be automated in a trustworthy fashion, and in case this feature is missing it's usually
a sign that one cannot trust a given solution.
- Whenever one wants to go offline, sends one's computer into hibernation or
leaves one's workplace one really has to close the session manually. Unfortunately
there is no possible automatic solution for that which wouldn't open up many more
problems.
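The fingerprint comparison from the verification step above boils down to
hashing the received public key and comparing the result with a value obtained
over a trusted channel. The sketch below assumes SHA-256 and groups of eight
hex digits purely for illustration; the hash function and display format of an
actual OTR client may differ, and the key bytes used here are placeholders.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Return a human comparable fingerprint of a serialized public key."""
    digest = hashlib.sha256(public_key).hexdigest().upper()
    return " ".join(digest[i:i + 8] for i in range(0, len(digest), 8))


def fingerprints_match(received_key: bytes, trusted_fingerprint: str) -> bool:
    """Compare the key we were sent against a fingerprint we already trust."""
    def normalize(value: str) -> str:
        return value.replace(" ", "").upper()

    return hmac.compare_digest(normalize(fingerprint(received_key)),
                               normalize(trusted_fingerprint))


# Placeholder key material; the trusted value would be exchanged in person.
trusted = fingerprint(b"public key bytes of my contact")
ok = fingerprints_match(b"public key bytes of my contact", trusted)
tampered = fingerprints_match(b"public key bytes of an attacker", trusted)
```

The important part is not the code but the channel: the trusted fingerprint
must not arrive over the same connection that a man in the middle could control.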
There is one major drawback: OTR does not support group chats.
OpenPGP
The OpenPGP encryption system is well known
from E-Mail - and is, in my view, currently the only useful and secure cryptography
system for E-Mail that's in place and in use, since serious attacks against S/MIME
have been published. On the
other hand OpenPGP has not been designed for chat systems. There is an XMPP
extension protocol that allows one to use OpenPGP over XMPP - but to my knowledge
there is no client out there that really implements using OpenPGP.
OMEMO
OMEMO has been designed as a successor of OTR. It's based on the same double
ratchet system that's also used in Signal and some other messengers. It's pretty
well designed, though a number of possible cryptographic attacks on the
protocol have been described. It's not as actively developed as OTR and not as
widely used with XMPP, though it would offer some more advanced features:
- It would allow multi client support. On the other hand the idea of OTR is that
one encrypts the session with a person on the other end - and having the same
person on multiple endpoints is usually not possible. Multi client support opens up an
attack vector that has already been used by state level actors who simply
added their own clients to a given identity and received all messages
encrypted to their own keys.
- It would allow offline messages by pre-publishing a number of half Diffie-Hellman
key exchanges. This can work particularly well despite being prone to denial
of service attacks. As long as one doesn't rely on this mechanism always working,
it allows one to send encrypted messages to an offline client without having
a cached previous session, which is a really nice feature, especially
for mobile clients.
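The idea of pre-published half key exchanges can be sketched with the same kind
of toy Diffie-Hellman numbers. This is a strong simplification and not the
actual OMEMO protocol: the class PrekeyServer, the parameters P and G and the
user names are assumptions, and identity keys, signatures and the ratchet
itself are left out.

```python
import secrets

# Toy parameters for illustration only: far too small for real-world use.
P = 2**127 - 1
G = 3


class PrekeyServer:
    """Hypothetical server that stores only the public halves of prekeys."""

    def __init__(self):
        self.bundles = {}

    def publish(self, user, public_prekeys):
        self.bundles[user] = dict(public_prekeys)

    def fetch_one(self, user):
        # Each prekey is handed out once and then removed from the bundle.
        return self.bundles[user].popitem()


def make_prekeys(count):
    private = {i: secrets.randbelow(P - 3) + 2 for i in range(count)}
    public = {i: pow(G, x, P) for i, x in private.items()}
    return private, public


# Bob publishes his public halves while online, then goes offline.
server = PrekeyServer()
bob_private, bob_public = make_prekeys(3)
server.publish("bob", bob_public)

# Alice completes the exchange on her own while Bob is away.
prekey_id, bob_half = server.fetch_one("bob")
alice_private = secrets.randbelow(P - 3) + 2
alice_half = pow(G, alice_private, P)
alice_secret = pow(bob_half, alice_private, P)

# Later Bob receives (prekey_id, alice_half) along with the queued message.
bob_secret = pow(alice_half, bob_private.pop(prekey_id), P)
```

Since every fetch consumes one prekey, anyone can drain the bundle by fetching
repeatedly, which is where the denial of service concern mentioned above comes from.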
Unfortunately the support in some clients is rather buggy or even more cumbersome
than OTR, so currently it's usually not a simple choice to make.
Clients
Desktop
Pidgin
Pidgin is the messaging client that I'm personally using most.
It offers multi protocol support - though I'm only using XMPP as of today. It's
robust, offers voice and video on all supported platforms except Microsoft Windows,
and it runs on a huge number of platforms including Windows, Linux, BSDs, macOS, etc.
OTR is implemented via an external plugin that has
to be installed separately.
On FreeBSD Pidgin is available in the net-im/pidgin package, the OTR
plugin can be found in security/pidgin-otr.
Jitsi
This client has been developed to be an alternative
to existing voice and video solutions using XMPP and the Jingle extension. It was
one of the first ones supporting encrypted video chatting using ZRTP as the carrier
protocol for video and voice streams. It's supported on many desktop and mobile
platforms. Unfortunately the development focused more on the WebRTC based conferencing
solution - Jitsi Meet - which is a nice alternative for
group video conferences, and the client is somewhat unstable.
Profanity
The Profanity client will not be of interest
to most people. It's a command line client that is useful on systems that do not
use a graphical user interface. It works rock solid but doesn't have support
for off the record (OTR) messaging.
Android
Xabber
Again Xabber is the client that I'm using. It has built-in support for OTR,
multi account support and just works in a stable fashion.
Conversations
Another Android client is Conversations. This
client supports voice and video calls using Jingle but doesn't support OTR
any more.