Privacy - TOR, VPN, DOH, Encryption. What's effective and what does it provide?
22 Dec 2019 - tsp
Last update 31 Mar 2020
21 mins
Note: Since I wrote this blog post somewhere deep in the night,
please don't freak out about grammar and spelling mistakes …
First, what's this post about? When I talk with people about onion routing
services like TOR, tunneling services like VPN, DNS encryption via DoH,
proxy servers, alternative DNS services, encryption and signatures, etc.,
I often notice much confusion about the level or type of privacy these
technologies provide. So I've decided to write a short blog post about them
and what they provide - and what they don't (from my own personal point of
view and experience).
Understanding the different threats
First one has to understand the different threats that one's exposed to when
using the network.
- Attackers sniffing the network traffic and gaining information from inside the
traffic (like billing information, passwords, etc.). The normal defense against
this type of attack is using strong encryption and strong authentication (i.e.
using safe data transfer to a trusted entity - the major pitfall being
how one builds up the trust relationship)
- Attackers looking at network traffic and gathering information via metadata.
This threat is much less understood - but it is the one that techniques like
public VPNs and public proxy servers mostly try to avoid. When gathering
metadata (i.e. information about who sends traffic to which destination), the
third party tries to learn the structure of the social network as well as the
behaviour of single people and of groups. It's really interesting to see
which parts of a network communicate with whom, which parts are social or
communicational hubs (i.e. have a high rank inside the social network), which
subgroups exist inside the networks and of course also who behaves atypically.
To decide who behaves atypically it's of course required to determine
what's typical as a ground truth - by gathering as much data as one is capable
of. The same method is also applied by governmental investigators when they
try to understand criminal organizations or to determine whether a given
person or group poses a threat to the state.
- The last type of threat is a currently lawful entity (like the government)
collecting large amounts of data about its citizens and exploiting that
information either now or at a later time - for example after the government
has been replaced by some extremist party that wants to silence and attack its
political opponents. The same type of attack is of course possible in the
economic space: locating potential competitors, gathering as much information
about them as possible and trying to outperform them (especially as a large
company, to keep competition as far away as possible - which is normally
called business intelligence these days)
As one can see, part of the fear is about an attacker being capable of reading
or spoofing the content of communications. The other part is about being
exposed via metadata (i.e. social network structure) and the attacker's
ability to infer information about entities after having collected huge
amounts of information about similar entities. The defense against all of
that is of course done using different technologies (or by not using some
services).
Countermeasures and what they can provide
Note that - in my opinion - laws do not provide any defense against data
collection and manipulation. They might help to mitigate the resulting
effects whenever something goes wrong, but one can be sure that somewhere on
the worldwide network, under some legislation, data gathering is either legal
or people are simply doing it illegally. The only area where laws are somewhat
effective is in preventing the buildup of such datasets by the public sector -
in any other case technical solutions are - in my opinion - way more effective.
Basics
Encryption: People not being capable of reading one's stuff
Basically, encryption simply prevents a third party from sniffing the network
and reading traffic content. Even when one has found a safe way to exchange
keys for the encrypted stream, the attacker can still see that traffic is
flowing from a given source to a given destination (i.e. can gather traffic
metadata), can with most systems also detect when traffic is produced (i.e.
gather chronological metadata) and can see how much traffic is produced
(which has also led to some interesting types of data leaks).
Encryption of course also requires authenticity (i.e. digital signatures)
to prove that the entity who receives data is who they claim to be.
There are basically two classes of encryption:
- Symmetric methods where both ends have to know the same secret to encrypt
and decrypt data. This is mostly used to encrypt larger datastreams with
random session keys that have been exchanged via asymmetric protocols
during session buildup.
- Asymmetric methods (sometimes called public key encryption) use a keypair
per entity. To transmit data to another entity, one fetches the publicly
available public key and encrypts data with that key. This encrypted
data can only be decrypted with the secret private key, which is never
published. Normally, for larger payloads, a symmetric session key is
encrypted via the asymmetric method. Of course the problem of initial key
exchange and trust still exists (i.e. is the publicly readable key really
the key published by the given entity, or was it injected by a third party?).
This trust problem is normally solved by either a centralized certificate
authority service or a web of trust approach.
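To make the hybrid pattern concrete, here is a minimal Python sketch. The XOR keystream cipher is a deliberately insecure stand-in for a real symmetric cipher (use an AEAD mode such as AES-GCM in practice), and the asymmetric step is only indicated in a comment:

```python
import hashlib
import secrets

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher: XOR the data against a SHA-256 counter-mode
    # keystream. This only illustrates the *shape* of the scheme -- never
    # use a hand-rolled construction like this in production.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# random per-session symmetric key
session_key = secrets.token_bytes(32)
ciphertext = xor_stream(session_key, b"the actual bulk payload")
# in a hybrid scheme, session_key itself would now be encrypted with the
# recipient's public key and shipped alongside the ciphertext
plaintext = xor_stream(session_key, ciphertext)  # symmetric: same key decrypts
```

Note that both directions use the same key and the same function - that is exactly what makes the method symmetric.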
Signature: Being sure the other entity is who they claim to be
Encryption is worthless without the ability to know that traffic really
originates from the source it's claimed to come from. Signatures solve that
problem. They are similar to asymmetric encryption methods - the message is
hashed and the hash is encrypted using the private key of the sender. Anyone
can then decrypt the signature using the public key from a public directory
and check whether the hashes match (at least that's the idea - please look
at specific signature and encryption schemes before implementing such stuff).
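As a worked illustration of the hash-then-sign idea, here is textbook RSA with tiny hardcoded demo primes (p = 61, q = 53). It omits the padding that every real signature scheme (e.g. RSA-PSS) requires, so treat it purely as a sketch of the math:

```python
import hashlib

# Textbook RSA with tiny demo parameters: n = 61 * 53 = 3233,
# e = 17, d = 2753 (e * d = 1 mod phi(n)). A toy, not a real scheme.
N, E, D = 3233, 17, 2753

def message_hash(message: bytes) -> int:
    # reduce the SHA-256 digest into the tiny modulus for the demo
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # "encrypt" the hash with the private exponent
    return pow(message_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # "decrypt" the signature with the public exponent, compare the hashes
    return pow(signature, E, N) == message_hash(message)

signature = sign(b"hello world")
```

Anyone holding only the public pair (N, E) can run `verify`; only the holder of D can produce a signature that passes it.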
Signatures are effective most of the time (but cryptosystems have to be
crafted with care) - to be effective, though, someone has to establish a
trust relationship to the given signature keys. There are currently three
major approaches:
- Directly building trust towards a key by comparing fingerprints of the
used signature keys manually (or via a challenge-response mechanism). This
is the most reliable way of building trust, but also the most inconvenient
one - and sometimes it's impossible (for example when contacting someone
one has never met in person).
- Using centralized certification authorities. This is what's used with X.509,
for example to establish trust with websites when using https or when
signing and encrypting mail with S/MIME. One has to fully trust all
certificate authorities that one has configured (look into your system
settings - many will be surprised how many companies they in fact trust
to certify identities). A single compromised certificate authority can
establish keys for nearly all identities (there are mechanisms, for example
in DNS, that reduce the attack surface, but they depend on DNSSEC being
established and secure too - DNSSEC itself being a special kind of highly
hierarchical certificate authority that starts with the root zone and
delegates trust down to the local domain zone operators).
- Using a web of trust approach. In this case one trusts the signatures of
one's friends with a given weight. As soon as enough people whom one directly
trusts trust a given third party, one also establishes a trust relationship
towards them - and the same for the next hops. This is indeed effective,
since in general everyone knows everyone over less than 7 hops - but on the
other hand one has to choose carefully whom one trusts to sign other
people's keys. On the upside, any damage done by an attacker who has
certified a malicious entity is confined to the attacker's local social
sphere of influence.
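The fingerprint comparison from the first approach can be sketched in a few lines of Python. The formatting mimics the OpenSSH style (SHA-256 of the raw key, base64 without padding); the key blob below is just a placeholder:

```python
import base64
import hashlib

def fingerprint(public_key_blob: bytes) -> str:
    # OpenSSH-style fingerprint: SHA-256 over the raw key material,
    # base64-encoded with the trailing '=' padding stripped
    digest = hashlib.sha256(public_key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).rstrip(b"=").decode("ascii")

# placeholder for real key bytes, e.g. the decoded blob of an SSH public key
fp = fingerprint(b"...raw public key bytes...")
```

Both parties compute the fingerprint locally and compare the short string over a trusted side channel (in person, over the phone) instead of comparing the whole key byte by byte.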
Onion Routing (TOR): Hiding who communicates with whom
The onion router (TOR) is currently the most effective tool against metadata
collection that is available. The basic idea is that there is a collection
of community-operated nodes (in fact it would be ideal if every user who is
not hosting an own hidden service operated a node in that way)
that pass data from one node to another - and a bunch of nodes that pass
received traffic into the public network. When routing into the clearnet,
the basic idea is to encrypt the IP packet that should be routed into
the internet with the public key of an exit node and attach its
address to the packet. Then this packet is encrypted again with the key
of an intermediate node and sent to this intermediate node. The intermediate
node is never capable of accessing the real IP packet inside; it only sees
which node the traffic originates from and to which exit node it should be
sent. The exit node only sees the intermediate node as originator and which
target it should send the traffic to. Since the exit node sees public
traffic content, the content should be encrypted and authenticated as usual
(there are malicious exit nodes, but since the internet itself is inherently
untrustable and insecure this doesn't matter in any way - one has to protect
against that anyways).
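The layered wrapping described above can be simulated in Python. The toy XOR cipher and the hardcoded per-hop keys stand in for the per-circuit keys TOR actually negotiates; only the peel-one-layer-per-hop structure is the point:

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # toy XOR cipher (SHA-256 counter-mode keystream); stands in for the
    # real per-hop encryption of a TOR circuit -- illustration only
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# keys the client shares with each hop of its circuit (hypothetical values)
relay_key, exit_key = b"relay-secret", b"exit-secret"

# innermost layer: only the exit node can read the real destination
inner = toy_cipher(exit_key, json.dumps(
    {"to": "example.org:80", "data": "GET / HTTP/1.1"}).encode())
# outer layer: tells the relay which exit node the blob goes to
outer = toy_cipher(relay_key, json.dumps(
    {"next": "exit-node", "blob": inner.hex()}).encode())

# the relay peels exactly one layer: it learns the next hop, nothing more
relay_view = json.loads(toy_cipher(relay_key, outer))
# the exit peels the last layer: it learns the destination, not the client
exit_view = json.loads(toy_cipher(exit_key, bytes.fromhex(relay_view["blob"])))
```

Note how `relay_view` contains no destination and `exit_view` contains no information about the originating client - that separation is the whole trick.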
To protect against traffic analysis, TOR relays normally forward traffic of
third parties and additionally random traffic. This prevents timing and
traffic correlation attacks (which is also the reason why normal clients
should always operate in relay mode). Of course normal clients should not run
in exit node mode - these nodes are (because TOR is also used by people
doing illegal stuff) exposed to inquiries by law enforcement. One should
never run an exit node from home (in most countries house searches are the
logical step when traffic for major crimes is traced back to someone) and
only operate one with enough legal backing (i.e. consult a lawyer before
doing so).
There is another kind of service provided by the onion router - hidden
services. These work similarly, but instead of the public internet reached
via an exit node, an internal node is the target. Hidden services introduce
themselves to a relay node and publish a service descriptor. All nodes
connecting to the service connect to that relay node. This protects the
metadata of both the client and the service operator. In addition, hidden
services provide authenticity by establishing the trust relationship via a
fingerprint directly in the URI, as well as by authenticating the clients in
the case of stealth hidden services.
A more detailed explanation of TOR for endusers and a guide on how to run
hidden services (and why one would want to do that) can be found in the
previously linked blog articles.
Of course TOR only gets more effective as more and more people use it - also
for legitimate traffic and not only for illegal stuff. Since TOR provides so
much more than just anonymization, there are many reasons to do so (see the
above linked articles for some of them).
What can TOR provide?
- It (at least statistically) effectively protects against metadata collection
- Protects against traffic correlation attacks and timing attacks
- Proves the identity of hidden services
- Proves the identity of users when using stealth hidden services
What it cannot do?
- Protect against an attacker using massive resources (i.e. running
a large number of relays and exit nodes). This type of attack cannot be
sustained for long and doesn't work in a targeted way - it would be
discovered really fast and would only affect a random subset of users.
- Protect against all side channel attacks. There are creative attacks
that have been proven to work - for example using the heat production
of servers hosting hidden services under heavy load, which influenced the
clock skew of nearby clearnet servers.
- Protect the user against stuff like browser fingerprinting, stored
tracking information or voluntarily leaked private data. It also doesn't
protect people from correlations between their user accounts when the same
machine is used for all services (the TOR browser implements some measures
against correlating the user across different webpages by building different
circuits based on different SOCKS5 usernames, but it's far safer to use
a different session for each action that requires anonymity)
- In many countries running TOR makes one somewhat suspicious and a target
for directed scans against one's machines. In other countries TOR
is already tolerated by law enforcement, and the most problematic contact
one might have when using TOR is the question whether one is using TOR or
whether illegal traffic originated from one's machine.
Public VPNs and public proxies: Tunneling traffic into a different central network
First off - a VPN is nothing more than a virtual private network that allows
one to bridge different private networks by means of a public network. It's
normally realized by an encrypted and authenticated tunnel between two
routers (in the source and destination network). The best comparison for a
VPN is a virtual network cable between two locations. Techniques that can be
used for VPNs range from simple unencrypted generic routing encapsulation
(GRE) tunnels, which simply add an additional IP header to the packets and
pass them over IP to the other router, up to encrypted solutions like IPsec
(AH and ESP) and custom VPN protocols like OpenVPN or my personal favorite
[tinc](https://www.tinc-vpn.org/).
The most important part to remember is: a VPN just transports your traffic -
most of the time encrypted and authenticated, but potentially also not.
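As a sketch of how little a plain GRE tunnel adds, here is the encapsulation step in Python (following RFC 2784; the outer delivery IP header, which the tunnel endpoint prepends and addresses to the peer router, is left out):

```python
import struct

def gre_encapsulate(inner_ip_packet: bytes) -> bytes:
    # Minimal GRE header per RFC 2784: 16 bits of flags/version (all zero
    # here: no checksum, key or sequence number) followed by the EtherType
    # of the payload (0x0800 = IPv4). Note there is no encryption at all.
    gre_header = struct.pack("!HH", 0x0000, 0x0800)
    return gre_header + inner_ip_packet

# stand-in bytes for a captured IPv4 packet
encapsulated = gre_encapsulate(b"\x45\x00payload")
```

Four bytes of header, no confidentiality, no authenticity - which is exactly why GRE alone is a tunnel, not a security mechanism.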
With the big VPN providers often seen in advertisements, the idea is slightly
different. They provide the service of being the destination network and
allow you to set your default route through them - i.e. you hand them
all traffic that you would previously have handed directly to your internet
connection, and they route it into the internet instead of your provider or
your own border router. There is exactly one thing that you gain from that:
your entry point into the inherently untrusted and unreliable internet is a
different one - instead of your ISP or the network you're currently in, you
choose to pass everything over the VPN provider. Nodes on the internet then
see the VPN provider's network as the originator of traffic that reaches
them instead of your ISP.
Why one should do that:
- The target VPN provider lies in a different jurisdiction, so different
laws on eavesdropping on traffic apply. This is the most common reason why
people who illegally share pirated content choose these services. And it
allows one, in theory, to route traffic out of a repressive regime (well,
only in theory, because such regimes normally block the well known big
providers; one can of course run an own node in a different country or use
one of the smaller providers)
- Circumventing geoblocking. This of course works as long as the service
employing geoblocking doesn't block the circumvention service. But let's be
clear - not using services that employ geoblocking would be the better
solution (because geoblocking really doesn't work, doesn't prevent anyone
who really wants to from accessing content, and just bugs the ordinary user,
because IP ranges are not geographically bound in any way)
Why one shouldn't do that:
- It doesn't provide additional protection for your traffic. The internet
itself is untrusted and potentially malicious; it doesn't matter from which
point you route your traffic into it. You have to expect the network
to be hostile and protect your traffic anyway. The same goes for the
argument about using a VPN provider on an untrusted wireless network -
since the internet is inherently insecure, there is no gain in tunneling the
traffic to some provider that you might trust and into the internet from
their end. You only gain additional protection when directly tunneling into
your target network (i.e. your home or your company's network).
- It doesn't really help with privacy. You are simply choosing a different
entity that is capable of sniffing your traffic and collecting metadata.
Some of these providers promise that they don't collect user data, but they
normally don't exist in lawless spaces - and most countries that host larger
VPN providers simply require network providers to allow governmental access
as well as collection of basic metadata for law enforcement. On the other
hand it might even hurt privacy, because there are far fewer public VPN
providers than network operators on the internet - so they get a huge share
of traffic from different networks, over a single set of points of presence
administered by a single entity.
DNS over HTTPS (DoH): Encrypting DNS queries on their way to the resolver
This is something one currently hears about on a large scale, since many
larger browser vendors are working on DNS over HTTPS support. The claim is
that DoH uses HTTPS and additionally encrypts DNS queries. The signature
part would be supplied by techniques like DNSSEC (but in a different way).
What DoH can provide is an encrypted connection to your DoH server, and it
is capable of authenticating the DoH server (i.e. you know the response
really comes from your selected DoH server). Note that it doesn't
authenticate that the response is valid - that is something DNSSEC does.
Trust is established either by using standard X.509 certificates (DoH) or
via a hierarchical approach (DNSSEC); they are not substitutes for each
other. The major addition DoH provides is transport encryption between the
client and the DoH server, which prevents eavesdropping on DNS queries (i.e.
leaking metadata) by other users on the same network or on any network the
traffic passes on its way to the DoH server.
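A DoH GET request is just an ordinary DNS wire-format query shipped over HTTPS. The sketch below builds such a query in Python and shows how it would be encoded into a URL per RFC 8484; the resolver hostname is a placeholder and the actual HTTPS request is omitted to keep the example offline:

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    # 12-byte DNS header: ID 0 (RFC 8484 recommends a zero ID so responses
    # are cacheable), flags 0x0100 (recursion desired), one question
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS=IN
    return header + question

query = build_dns_query("example.org")
# RFC 8484 GET: base64url-encode the wire query, strip padding, pass as ?dns=
dns_param = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
url = "https://doh.example/dns-query?dns=" + dns_param  # hypothetical resolver
# an HTTPS GET against a real DoH server with the request header
# "Accept: application/dns-message" would return the wire-format answer
```

The point to notice: the DNS message itself is unchanged - DoH only wraps it in an authenticated, encrypted HTTPS transport.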
What can DoH provide?
- It does transport encryption of DNS queries. This prevents other people on the
same network (or an untrusted wireless network) to intercept DNS queries.
- The DoH server is authenticated. This provides some protection against
against spoofing DNS responses on the local network.
What doesn't DoH provide?
- When using your ISP's DoH server, your ISP of course can still see which
names you are resolving.
- When using one of the major DoH servers, it's the same as when using one
of the major public VPN providers - you are providing exactly that
information to a large centralized service. And many other people do too.
This allows a single entity to collect much more metadata than any small
ISP would be capable of.
- It doesn't prove that the DNS responses are valid. That is still done
using DNSSEC when doing recursive authenticated queries. If one doesn't
operate a recursive resolver, one has to trust the DNS server (as with
classical DNS) to have done this validation from the root zone down to the
given DNS database entry.
Browser's private mode tabs: Somewhat trying to keep persistent state away from one's browser
Short summary: they don't really help much. They just clear your browser
history when closed, and they normally keep a separate cookie store and
history that is used when webpages query state or requests are performed.
This provides some basic protection against webpages tracking users (but not
against advanced methods like browser fingerprinting - the best and most
effective things one can do against that are disabling scripting and only
storing session cookies - oh, and of course disabling stuff like Java and
Flash plugins).
What they can do:
- Help with data hygiene
- Clean some tracking cookies
What they can't do:
- Anything more than that. They are not a magical privacy tool.
Using Linux distributions like Tails: Keeping persistent state away from the machine
Tails is The Amnesic Incognito Live System. It provides a live Linux
distribution that doesn't store any data outside of a memory disk by default
and can be downloaded for free.
This is similar to the browser's private mode tab. Since no information is
stored by default (when not using an external storage device), the device
always looks clean and doesn't transmit any tracking information from
previous sessions. Of course the hardware doesn't change - but the default
settings of the software are somewhat more secure than with other operating
systems. And of course the routing of all network traffic via the onion
router (TOR) provides the previously mentioned advantages - and also
disadvantages. Another advantage of using Tails is that - since everything
except the memory disk is read only by default - potential malware does not
survive beyond a single session. One should of course limit the time one
uses such a distribution per session.
What it can do for privacy:
- It provides a preconfigured environment that avoids the most common
mistakes
- It normally routes all traffic via the onion router (TOR). This is - if
used well - an effective measure against metadata collection
- It doesn't persistently store any data. This protects against tracking
information and malware surviving beyond the current session. Note that
within the current session the system is of course still susceptible to
picking up tracking data like cookies or being infected by malware, just
like every other system.
What it cannot do:
- Protect the user from usage errors - like doing things that shouldn't be
correlated in the same session
- Protect against recognition of the hardware (e.g. Ethernet addresses when
attaching to a network, etc.)
Encrypting and signing your mail
This is definitely worth the effort. It doesn't protect any metadata - and
with the most used schemes also not the subject line of your mails - but it
effectively protects the message payload and provides (via signatures)
authenticity. There are currently two well established methods for
protecting mail:
- Using S/MIME and X.509 certificates. This is broken by design
beyond any way of repair, but commonly used in business environments.
- Using OpenPGP and the web of trust.
Encrypting and signing mail provides way more protection than any allegedly
safe or privacy preserving mail provider can. And it removes the requirement
that any mail provider be trustworthy (except for metadata collection).
What it can do:
- Protect message content in a safe way over any unreliable network
- Authenticate the identity of the sender of a message, if the sender
signs the message
What it cannot do:
- Remove the manual work of building trust (i.e. signing keys, checking
fingerprints, etc. - any system that claims it can do so is simply less
trustworthy).
- Remove the requirement that one has to trust the device that the software
runs on or that the keys are stored on (storing keys on an HSM like the
YubiKey 5 is a really good idea; note: Amazon affiliate link, this page's
author profits from qualified purchases).