Privacy - TOR, VPN, DOH, Encryption. What's effective and what does it provide?
22 Dec 2019 - tsp
Last update 31 Mar 2020
21 mins
Note: Since I wrote this blog post somewhere deep in the night,
please don't freak out about grammar and spelling mistakes …
First, what's this post about? When I talk with people about onion routing
services like TOR, tunneling services like VPN, DNS encryption via DoH,
proxy servers, alternative DNS services, encryption and signatures, etc.,
I often notice much confusion about the level or type of privacy these
technologies provide. So I've decided to write a short blog post about them
and what they provide - and what they don't (from my own personal point of
view and experience).
Understanding the different threats
First one has to understand the different threats that one's exposed to when
using the network.
- Attackers sniffing the network traffic and gaining information from inside the
traffic (like billing information, passwords, etc.). The normal defense against
this type of attack is using strong encryption and strong authentication (i.e.
using safe data transfer to a trusted entity - the major pitfall being
how one builds up the trust relationship)
- Attackers looking at network traffic and gathering information via metadata.
This threat is much less understood - but it is the one that techniques like
public VPNs and public proxy servers mostly try to avoid. When gathering
metadata (i.e. information about who sends traffic to which destination), the
third party tries to learn the structure of the social network as well as the
behaviour of single people and of groups. It's really interesting to see
which parts of a network communicate with whom, which parts are social or
communicational hubs (i.e. have a high rank inside the social network), which
subgroups exist inside the networks and of course also who behaves atypically.
To decide who behaves atypically it's of course required to determine
what's typical as a ground truth - by gathering as much data as one is capable
of. The same method is also applied by governmental investigators when they
try to understand criminal organizations or to determine whether a given
person or group poses a threat to the state.
- The last type of threat is a currently lawful entity (like the government)
collecting large amounts of data about its citizens and exploiting that
information either now or at a later time - for example after the government
has been replaced by some extremist party that wants to silence and attack its
political opponents. The same type of attack is of course possible in the
economic space: locating potential competitors, gathering as much information
about them as possible and trying to outperform them (especially as a large
company, to keep competition as far away as possible - which is normally
called business intelligence these days)
As one can see, part of the fear is about an attacker being capable of reading
or spoofing the content of communications. The other part is about being
exposed via metadata (i.e. social network structure) and the attacker's
ability to infer information about entities after having collected huge
amounts of information about similar entities. The defense against all of
that is of course done using different technologies (or by not using some
services).
Countermeasures and what they can provide
Note that - in my opinion - laws do not provide any defense against data
collection and manipulation. They might help to mitigate the resulting
effects whenever something goes wrong, but one can be sure that somewhere on
the worldwide network, under some legislation, data gathering is either legal
or people are simply doing it illegally. The only area where laws are somewhat
effective is in preventing the buildup of such datasets by the public sector -
in any other case technical solutions are - in my opinion - way more effective.
Basics
Encryption: People not being capable of reading one's stuff
Basically, encryption simply prevents a third party from sniffing the network
and reading traffic content. Even when one has found a safe way to exchange
keys for the encrypted stream, the attacker can still see that traffic is
flowing from a given source to a given destination (i.e. can gather traffic
metadata), can with most systems also detect when traffic is produced (i.e.
gather chronological metadata) and can see how much traffic is produced
(which has also led to some interesting types of data leaks).
Encryption of course also requires authenticity (i.e. digital signatures)
to prove that the entity who receives data is who they claim to be.
There are basically two classes of encryption:
- Symmetric methods where both ends have to know the same secret to encrypt
and decrypt data. This is mostly used to encrypt larger datastreams with
random session keys that have been exchanged via asymmetric protocols
during session buildup.
- Asymmetric methods (sometimes called public key encryption) use a keypair
per entity. To transmit data to another entity, one fetches the publicly
available public key and encrypts data with that key. This encrypted
data can only be decrypted with the secret private key, which is never
published. Normally, for larger payloads, a symmetric session key is
encrypted via the asymmetric method. Of course the problem of initial key
exchange and trust still exists (i.e. is the publicly readable key really
the key published by the given entity, or was it injected by a third party?).
This trust problem is normally solved by either a centralized certificate
authority service or a web of trust approach.
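To make the hybrid pattern concrete, here is a minimal Python sketch. The XOR keystream cipher is a deliberately insecure stand-in for a real symmetric cipher (use an AEAD mode such as AES-GCM in practice), and the asymmetric step is only indicated in a comment:

```python
import hashlib
import secrets

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher: XOR the data against a SHA-256 counter-mode
    # keystream. This only illustrates the *shape* of the scheme -- never
    # use a hand-rolled construction like this in production.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# random per-session symmetric key
session_key = secrets.token_bytes(32)
ciphertext = xor_stream(session_key, b"the actual bulk payload")
# in a hybrid scheme, session_key itself would now be encrypted with the
# recipient's public key and shipped alongside the ciphertext
plaintext = xor_stream(session_key, ciphertext)  # symmetric: same key decrypts
```

Note that both directions use the same key and the same function - that is exactly what makes the method symmetric.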
Signature: Being sure the other entity is who they claim to be
Encryption is worthless without the ability to know that traffic really
originates from the source it's claimed to come from. Signatures solve that
problem. They are similar to asymmetric encryption methods - the message is
hashed and the hash is encrypted using the private key of the sender. Anyone
can then decrypt the signature using the public key from a public directory
and check whether the hashes match (at least that's the idea - please look
at specific signature and encryption schemes before implementing such stuff).
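As a worked illustration of the hash-then-sign idea, here is textbook RSA with tiny hardcoded demo primes (p = 61, q = 53). It omits the padding that every real signature scheme (e.g. RSA-PSS) requires, so treat it purely as a sketch of the math:

```python
import hashlib

# Textbook RSA with tiny demo parameters: n = 61 * 53 = 3233,
# e = 17, d = 2753 (e * d = 1 mod phi(n)). A toy, not a real scheme.
N, E, D = 3233, 17, 2753

def message_hash(message: bytes) -> int:
    # reduce the SHA-256 digest into the tiny modulus for the demo
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # "encrypt" the hash with the private exponent
    return pow(message_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # "decrypt" the signature with the public exponent, compare the hashes
    return pow(signature, E, N) == message_hash(message)

signature = sign(b"hello world")
```

Anyone holding only the public pair (N, E) can run `verify`; only the holder of D can produce a signature that passes it.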
Signatures are effective most of the time (but cryptosystems have to be
crafted with care) - to be effective, though, someone has to establish a
trust relationship to the given signature keys. There are currently three
major approaches:
- Directly building trust towards a key by comparing fingerprints of the
used signature keys manually (or via a challenge-response mechanism). This
is the most reliable way of building trust, but also the most inconvenient
one - and sometimes it's impossible (for example when contacting someone
one has never met in person).
- Using centralized certification authorities. This is what's used with X.509,
for example to establish trust with websites when using https or when
signing and encrypting mail with S/MIME. One has to fully trust all
certificate authorities that one has configured (look into your system
settings - many will be surprised how many companies they in fact trust
to certify identities). A single compromised certificate authority can
establish keys for nearly all identities (there are mechanisms, for example
in DNS, that reduce the attack surface, but they depend on DNSSEC being
established and secure too - DNSSEC itself being a special kind of highly
hierarchical certificate authority that starts with the root zone and
delegates trust down to the local domain zone operators).
- Using a web of trust approach. In this case one trusts the signatures of
one's friends with a given weight. As soon as enough people whom one directly
trusts trust a given third party, one also establishes a trust relationship
towards them - and the same for the next hops. This is indeed effective,
since in general everyone knows everyone over less than 7 hops - but on the
other hand one has to choose carefully whom one trusts to sign other
people's keys. On the upside, any damage done by an attacker who has
certified a malicious entity is confined to the attacker's local social
sphere of influence.
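The fingerprint comparison from the first approach can be sketched in a few lines of Python. The formatting mimics the OpenSSH style (SHA-256 of the raw key, base64 without padding); the key blob below is just a placeholder:

```python
import base64
import hashlib

def fingerprint(public_key_blob: bytes) -> str:
    # OpenSSH-style fingerprint: SHA-256 over the raw key material,
    # base64-encoded with the trailing '=' padding stripped
    digest = hashlib.sha256(public_key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).rstrip(b"=").decode("ascii")

# placeholder for real key bytes, e.g. the decoded blob of an SSH public key
fp = fingerprint(b"...raw public key bytes...")
```

Both parties compute the fingerprint locally and compare the short string over a trusted side channel (in person, over the phone) instead of comparing the whole key byte by byte.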
Onion Routing (TOR): Hiding who communicates with whom
The onion router (TOR) is currently the most effective tool against metadata
collection that is available. The basic idea is that there is a collection
of community-operated nodes (in fact it would be ideal if every user who is
not hosting an own hidden service operated a node in that way)
that pass data from one node to another - and a bunch of nodes that pass
received traffic into the public network. When routing into the clearnet,
the basic idea is to encrypt the IP packet that should be routed into
the internet with the public key of an exit node and attach its
address to the packet. Then this packet is encrypted again with the key
of an intermediate node and sent to this intermediate node. The intermediate
node is never capable of accessing the real IP packet inside; it only sees
which node the traffic originates from and to which exit node it should be
sent. The exit node only sees the intermediate node as originator and which
target it should send the traffic to. Since the exit node sees public
traffic content, the content should be encrypted and authenticated as usual
(there are malicious exit nodes, but since the internet itself is inherently
untrustable and insecure this doesn't matter in any way - one has to protect
against that anyways).
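The layered wrapping described above can be simulated in Python. The toy XOR cipher and the hardcoded per-hop keys stand in for the per-circuit keys TOR actually negotiates; only the peel-one-layer-per-hop structure is the point:

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # toy XOR cipher (SHA-256 counter-mode keystream); stands in for the
    # real per-hop encryption of a TOR circuit -- illustration only
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# keys the client shares with each hop of its circuit (hypothetical values)
relay_key, exit_key = b"relay-secret", b"exit-secret"

# innermost layer: only the exit node can read the real destination
inner = toy_cipher(exit_key, json.dumps(
    {"to": "example.org:80", "data": "GET / HTTP/1.1"}).encode())
# outer layer: tells the relay which exit node the blob goes to
outer = toy_cipher(relay_key, json.dumps(
    {"next": "exit-node", "blob": inner.hex()}).encode())

# the relay peels exactly one layer: it learns the next hop, nothing more
relay_view = json.loads(toy_cipher(relay_key, outer))
# the exit peels the last layer: it learns the destination, not the client
exit_view = json.loads(toy_cipher(exit_key, bytes.fromhex(relay_view["blob"])))
```

Note how `relay_view` contains no destination and `exit_view` contains no information about the originating client - that separation is the whole trick.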
To protect against traffic analysis, TOR relays normally forward traffic of
third parties and additionally random traffic. This prevents timing and
traffic correlation attacks (which is also the reason why normal clients
should always operate in relay mode). Of course normal clients should not run
in exit node mode - these nodes are (because TOR is also used by people
doing illegal stuff) exposed to inquiries by law enforcement. One should
never run an exit node from home (in most countries house searches are the
logical step when traffic for major crimes is traced back to someone) and
only operate one with enough legal backing (i.e. consult a lawyer before
doing so).
There is another kind of service provided by the onion router - hidden
services. These work similarly, but instead of the public internet reached
via an exit node, an internal node is the target. Hidden services introduce
themselves to a relay node and publish a service descriptor. All nodes
connecting to the service connect to that relay node. This protects the
metadata of both the client and the service operator. In addition, hidden
services provide authenticity by establishing the trust relationship via a
fingerprint directly in the URI, as well as by authenticating the clients in
the case of stealth hidden services.
A more detailed explanation of TOR for endusers and a guide on how to run
hidden services (and why one would want to do that) can be found in the
previously linked blog articles.
Of course TOR only gets more effective as more and more people use it - also
for legitimate traffic and not only for illegal stuff. Since TOR provides so
much more than just anonymization, there are many reasons to do so (see the
above linked articles for some of them).
What can TOR provide?
- It (at least statistically) effectively protects against metadata collection
- Protects against traffic correlation attacks and timing attacks
- Proves the identity of hidden services
- Proves the identity of users when using stealth hidden services
What it cannot do?
- Protect against an attacker using massive resources (i.e. running
a large number of relays and exit nodes). This type of attack cannot be
sustained for long and doesn't work in a targeted way - it would be
discovered really fast and would only affect a random subset of users.
- Protect against all side channel attacks. There are creative attacks
that have been proven to work - for example using the heat production
of servers hosting hidden services under heavy load, which influenced the
clock skew of nearby clearnet servers.
- Protect the user against stuff like browser fingerprinting, stored
tracking information or voluntarily leaked private data. It also doesn't
protect people from correlations between their user accounts when the same
machine is used for all services (the TOR browser implements some measures
against correlating the user across different webpages by building different
circuits based on different SOCKS5 usernames, but it's far safer to use
a different session for each action that requires anonymity)
- In many countries running TOR makes one somewhat suspicious and a target
for directed scans against one's machines. In other countries TOR
is already tolerated by law enforcement, and the most problematic contact
one might have when using TOR is the question whether one is using TOR or
whether illegal traffic originated from one's machine.
Public VPNs and public proxies: Tunneling traffic into a different central network
First off - a VPN is nothing more than a virtual private network that allows
one to bridge different private networks by means of a public network. It's
normally realized by an encrypted and authenticated tunnel between two
routers (in the source and destination network). The best comparison for a
VPN is a virtual network cable between two locations. Techniques that can be
used for VPNs range from simple unencrypted generic routing encapsulation
(GRE) tunnels, which simply add an additional IP header to the packets and
pass them over IP to the other router, up to encrypted solutions like IPsec
(AH and ESP) and custom VPN protocols like OpenVPN or my personal favorite
[tinc](https://www.tinc-vpn.org/).
The most important part to remember is: a VPN just transports your traffic -
most of the time encrypted and authenticated, but potentially also not.
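As a sketch of how little a plain GRE tunnel adds, here is the encapsulation step in Python (following RFC 2784; the outer delivery IP header, which the tunnel endpoint prepends and addresses to the peer router, is left out):

```python
import struct

def gre_encapsulate(inner_ip_packet: bytes) -> bytes:
    # Minimal GRE header per RFC 2784: 16 bits of flags/version (all zero
    # here: no checksum, key or sequence number) followed by the EtherType
    # of the payload (0x0800 = IPv4). Note there is no encryption at all.
    gre_header = struct.pack("!HH", 0x0000, 0x0800)
    return gre_header + inner_ip_packet

# stand-in bytes for a captured IPv4 packet
encapsulated = gre_encapsulate(b"\x45\x00payload")
```

Four bytes of header, no confidentiality, no authenticity - which is exactly why GRE alone is a tunnel, not a security mechanism.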
With the big VPN providers often seen in advertisements, the idea is slightly
different. They provide the service of being the destination network and
allow you to set your default route through them - i.e. you hand them
all traffic that you would previously have handed directly to your internet
connection, and they route it into the internet instead of your provider or
your own border router. There is exactly one thing that you gain from that:
your entry point into the inherently untrusted and unreliable internet is a
different one - instead of your ISP or the network you're currently in, you
choose to pass everything over the VPN provider. Nodes on the internet then
see the VPN provider's network as the originator of traffic that reaches
them instead of your ISP.
Why one should do that:
- The target VPN provider lies in a different jurisdiction, so different
laws on eavesdropping on traffic apply. This is the most common reason why
people who illegally share pirated content choose these services. And it
allows one, in theory, to route traffic out of a repressive regime (well,
only in theory, because such regimes normally block the well known big
providers; one can of course run an own node in a different country or use
one of the smaller providers)
- Circumventing geoblocking. This of course works as long as the service
employing geoblocking doesn't block the circumvention service. But let's be
clear - not using services that employ geoblocking would be the better
solution (because geoblocking really doesn't work, doesn't prevent anyone
who really wants to from accessing content, and just bugs the ordinary user,
because IP ranges are not geographically bound in any way)
Why one shouldn't do that:
- It doesn't provide additional protection for your traffic. The internet
itself is untrusted and potentially malicious; it doesn't matter from which
point you route your traffic into it. You have to expect the network
to be hostile and protect your traffic anyway. The same goes for the
argument about using a VPN provider on an untrusted wireless network -
since the internet is inherently insecure, there is no gain in tunneling the
traffic to some provider that you might trust and into the internet from
their end. You only gain additional protection when directly tunneling into
your target network (i.e. your home or your company's network).
- It doesn't really help with privacy. You are simply choosing a different
entity that is capable of sniffing your traffic and collecting metadata.
Some of these providers promise that they don't collect user data, but they
normally don't exist in lawless spaces - and most countries that host larger
VPN providers simply require network providers to allow governmental access
as well as collection of basic metadata for law enforcement. On the other
hand it might even hurt privacy, because there are far fewer public VPN
providers than network operators on the internet - so they get a huge share
of traffic from different networks, over a single set of points of presence
administered by a single entity.
DNS over HTTPS (DoH): Encrypting DNS queries on their way to the resolver
This is something one currently hears about on a large scale, since many
larger browser vendors are working on DNS over HTTPS support. The claim is
that DoH uses HTTPS and additionally encrypts DNS queries. The signature
part would be supplied by techniques like DNSSEC (but in a different way).
What DoH can provide is an encrypted connection to your DoH server, and it
is capable of authenticating the DoH server (i.e. you know the response
really comes from your selected DoH server). Note that it doesn't
authenticate that the response is valid - that is something DNSSEC does.
Trust is established either by using standard X.509 certificates (DoH) or
via a hierarchical approach (DNSSEC); they are not substitutes for each
other. The major addition DoH provides is transport encryption between the
client and the DoH server, which prevents eavesdropping on DNS queries (i.e.
leaking metadata) by other users on the same network or on any network the
traffic passes on its way to the DoH server.
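A DoH GET request is just an ordinary DNS wire-format query shipped over HTTPS. The sketch below builds such a query in Python and shows how it would be encoded into a URL per RFC 8484; the resolver hostname is a placeholder and the actual HTTPS request is omitted to keep the example offline:

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    # 12-byte DNS header: ID 0 (RFC 8484 recommends a zero ID so responses
    # are cacheable), flags 0x0100 (recursion desired), one question
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS=IN
    return header + question

query = build_dns_query("example.org")
# RFC 8484 GET: base64url-encode the wire query, strip padding, pass as ?dns=
dns_param = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
url = "https://doh.example/dns-query?dns=" + dns_param  # hypothetical resolver
# an HTTPS GET against a real DoH server with the request header
# "Accept: application/dns-message" would return the wire-format answer
```

The point to notice: the DNS message itself is unchanged - DoH only wraps it in an authenticated, encrypted HTTPS transport.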
What can DoH provide?
- It does transport encryption of DNS queries. This prevents other people on the
same network (or an untrusted wireless network) to intercept DNS queries.
- The DoH server is authenticated. This provides some protection against
against spoofing DNS responses on the local network.
What doesn't DoH provide?
- When using your ISP's DoH server, your ISP of course can still see which
names you are resolving.
- When using one of the major DoH servers, it's the same as when using one
of the major public VPN providers - you are providing exactly that
information to a large centralized service. And many other people do too.
This allows a single entity to collect much more metadata than any small
ISP would be capable of.
- It doesn't prove that the DNS responses are valid. That is still done
using DNSSEC when doing recursive authenticated queries. If one doesn't
operate a recursive resolver, one has to trust the DNS server (as with
classical DNS) to have done this validation from the root zone down to the
given DNS database entry.
Browser's private mode tabs: Somewhat trying to keep persistent state away from one's browser
Short summary: they don't really help much. They just clear your browser
history when closed, and they normally keep a separate cookie store and
history that is used when webpages query state or requests are performed.
This provides some basic protection against webpages tracking users (but not
against advanced methods like browser fingerprinting - the best and most
effective things one can do against that are disabling scripting and only
storing session cookies - oh, and of course disabling stuff like Java and
Flash plugins).
What they can do:
- Help with data hygiene
- Clean some tracking cookies
What they can't do:
- Anything more than that. They are not a magical privacy tool.
Using Linux distributions like Tails: Keeping persistent state away from the machine
Tails is The Amnesic Incognito Live System. It provides a live Linux
distribution that doesn't store any data outside of a memory disk by default
and can be downloaded for free.
This is similar to the browser's private mode tab. Since no information is
stored by default (when not using an external storage device), the device
always looks clean and doesn't transmit any tracking information from
previous sessions. Of course the hardware doesn't change - but the default
settings of the software are somewhat more secure than with other operating
systems. And of course the routing of all network traffic via the onion
router (TOR) provides the previously mentioned advantages - and also
disadvantages. Another advantage of using Tails is that - since everything
except the memory disk is read only by default - potential malware does not
survive beyond a single session. One should of course limit the time one
uses such a distribution per session.
What it can do for privacy:
- It provides a preconfigured environment that avoids the most common
mistakes
- It normally routes all traffic via the onion router (TOR). This is - if
used well - an effective measure against metadata collection
- It doesn't persistently store any data. This protects against tracking
information and malware surviving beyond the current session. Note that
within the current session the system is of course still susceptible to
picking up tracking data like cookies or being infected by malware, just
like every other system.
What it cannot do:
- Protect the user from usage errors - like doing things that shouldn't be
correlated in the same session
- Protect against recognition of the hardware (e.g. Ethernet addresses when
attaching to a network, etc.)
Encrypting and signing your mail
This is definitely worth the effort. It doesn't protect any metadata - and
with the most used schemes also not the subject line of your mails - but it
effectively protects the message payload and provides (via signatures)
authenticity. There are currently two well established methods for
protecting mail:
- Using S/MIME and X.509 certificates. This is broken by design
beyond any way of repair, but commonly used in business environments.
- Using OpenPGP and the web of trust.
Encrypting and signing mail provides way more protection than any allegedly
safe or privacy preserving mail provider can. And it removes the requirement
that any mail provider be trustworthy (except for metadata collection).
What it can do:
- Protect message content in a safe way over any unreliable network
- Authenticate the identity of the sender of a message, if the sender
signs the message
What it cannot do:
- Remove the manual work of building trust (i.e. signing keys, checking
fingerprints, etc. - any system that claims it can do so is simply less
trustworthy).
- Remove the requirement that one has to trust the device that the software
runs on or that the keys are stored on (storing keys on an HSM like the
YubiKey 5 is a really good idea; note: Amazon affiliate link, this page's
author profits from qualified purchases).