27 Jan 2023 - tsp
Last update 27 Jan 2023
TL;DR In theory one can put any JupyterLab installation behind any reverse proxy that performs HTTP authentication. But since Jupyter passes its internal token along with all XML HTTP requests (XHR), it leaks that token - and it replaces any HTTP basic or digest authentication information during XHR requests. This means one has to match the shared persistent token on the proxy, and this token is leaked to all users, who could use it to access all content on the Jupyter notebook. Depending on the environment in which those notebooks are used this might be no solution at all.
Since I had to set up a JupyterLab instance at work and we
required a proxied setup - that means we wanted to run the Jupyter instance behind
a reverse proxy that would handle SSL termination, validation of requests, load balancing
and authentication - and since it turned out to be not as straightforward as one would
like, I decided to write this short summary about the usage of
haproxy in front of
JupyterLab. The work relation is also the reason why in this blog article
all actions will be described for Manjaro Linux as well as my personal all time favorite
Unixoid operating system - FreeBSD.
Note that there currently is a problem with the way JupyterLab handles authentication for its XML HTTP requests: this solution leaks the authentication token, which is constant over all time and shared by all users, to every authenticated user. This token in turn can be used to perform arbitrary requests on the JupyterLab instance - one can see that by design JupyterLab is not a multiuser solution. It looks like users get their own custom username and password combination, but they are still able to fetch a universal authentication token and perform actions using this token even after their user accounts have been removed. This has to be fixed in JupyterLab though, there is no way to do this on the proxy (and as far as the bugtracker history goes I don’t think there is a huge intent to fix this).
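To make the consequence concrete, the following minimal Python sketch shows how little is needed to address the instance once the token is known. It only builds the request and does not send it. The public URL https://jupyter.example.com/examplebook and the helper name build_token_request are assumptions made for illustration; api/contents is an endpoint of the Jupyter REST API, and the Authorization header uses the same "token ..." form that appears in the haproxy configuration in this article.

```python
import urllib.request


def build_token_request(base_url, token, api_path="api/contents"):
    """Build (but do not send) a request that authenticates with the shared token."""
    url = f"{base_url.rstrip('/')}/{api_path.lstrip('/')}"
    # The shared token is the only credential in the request:
    # no username and no password are involved at all.
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


# Hypothetical public URL and placeholder token, for illustration only
req = build_token_request("https://jupyter.example.com/examplebook", "XXX")
print(req.full_url)
print(req.get_header("Authorization"))
```

Anyone who holds the token can send such a request, regardless of whether their proxy account still exists.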
The first step is the installation of the proxy server. In this case
haproxy has been used. This is an open source reverse proxy used for load balancing HTTP and TCP traffic.
On FreeBSD the simplest way to install
haproxy is via the package manager:
$ sudo pkg install haproxy
One could also install using ports:
# cd /usr/ports/net/haproxy
# make install clean
This installs haproxy system wide as well as the
rc.d startup script
/usr/local/etc/rc.d/haproxy. To launch
haproxy on boot one can add the corresponding enable entry to
/etc/rc.conf. Starting and stopping works as usual for a FreeBSD application.
The simplest way to install
haproxy on Manjaro is via the package manager:
sudo pamac install haproxy
This installs haproxy system wide as well as the
systemd startup files, but it does not launch
haproxy on boot. The configuration file is created as well. To launch
haproxy on boot one has to enable it using
sudo systemctl enable haproxy
Starting, restarting and stopping can be done using
systemctl start haproxy
systemctl stop haproxy
systemctl restart haproxy
Since one usually wants to expose JupyterLab to the public using SSL one has to either generate a self signed certificate as shown below or use some kind of certificate deployment mechanism (I personally use acme.sh with DNS-01 method and some custom distribution mechanism from the certificate bot machine).
To generate a self signed certificate for internal use or testing one can use OpenSSL:
# 2048 bit key: 1024 bit RSA keys are rejected by many current OpenSSL configurations
openssl genrsa -out selfsigned.key 2048
openssl req -new -key selfsigned.key -out selfsigned.csr
openssl x509 -req -days 365 -in selfsigned.csr -signkey selfsigned.key -out selfsigned.crt
cat selfsigned.crt selfsigned.key > selfsigned.pem
The PEM file including key and certificate is best stored in the same location as
the configuration file (/usr/local/etc/haproxy/ssl.pem for FreeBSD,
/etc/haproxy/ssl.pem for Manjaro Linux).
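Since haproxy needs certificate and key in one file, a quick sanity check before deploying the bundle can save a failed restart. The following is a minimal sketch that only inspects the PEM block labels; the function names and the sample text are assumptions for illustration, and the check does not validate the certificate itself.

```python
import re


def pem_block_types(pem_text):
    """Return the labels of all PEM blocks, e.g. CERTIFICATE or RSA PRIVATE KEY."""
    return re.findall(r"-----BEGIN ([A-Z0-9 ]+)-----", pem_text)


def looks_like_haproxy_bundle(pem_text):
    """True if the text contains at least one certificate and one private key block."""
    types = pem_block_types(pem_text)
    has_cert = "CERTIFICATE" in types
    has_key = any(t.endswith("PRIVATE KEY") for t in types)
    return has_cert and has_key


# Shortened stand-in for the content of a combined PEM file
sample = (
    "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
    "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
)
print(looks_like_haproxy_bundle(sample))
```

In practice one would read the real file content, for example the combined PEM file generated with OpenSSL, instead of the shortened sample.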
The next step is frontend, backend and user configuration. This is done in the
haproxy configuration file. The
global section looks different on Linux and FreeBSD since FreeBSD sets users,
chroot and logging as well as pidfile and daemon operation in its rc.d startup script
while Manjaro does not. For FreeBSD the following global section is sufficient:
global
    daemon
    maxconn 20000
For Manjaro the following could be used:
global
    maxconn 20000
    log 127.0.0.1 local0
    user haproxy
    chroot /usr/share/haproxy
    pidfile /run/haproxy.pid
    daemon
The next section is the user configuration. When one stores the user database directly
inside the configuration file one will use hashed passwords. Those are generated by a
password hashing command in the shell (this can be installed on Manjaro using the
whois package for example).
userlist examplerealm
    user exampleuser password $y$j....
The user entry always starts with
user followed by a username and the string
password, which signals that a hashed password is going to follow. One can add one user per line.
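When there are more than a handful of users it can be convenient to generate the section instead of typing it. The following is a minimal sketch of that idea; render_userlist is a hypothetical helper, the hashes are expected to be created beforehand, and the check for a leading $ is only an assumed sanity test for crypt style hashes, not something haproxy prescribes.

```python
def render_userlist(realm, users):
    """Render a haproxy userlist section from a mapping of user name to hashed password."""
    lines = [f"userlist {realm}"]
    for name, hashed in users.items():
        # Assumption: crypt style hashes start with "$"; this guards against
        # accidentally pasting a clear text password after the password keyword.
        if not hashed.startswith("$"):
            raise ValueError(f"password for {name!r} does not look hashed")
        lines.append(f"    user {name} password {hashed}")
    return "\n".join(lines)


# The hash is the truncated placeholder from the example in this article
print(render_userlist("examplerealm", {"exampleuser": "$y$j...."}))
```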
Now the frontend can be configured. A frontend is the component of
haproxy that accepts incoming connections. Since
JupyterLab uses token authentication for its XML HTTP requests one has to prevent
haproxy from stripping or failing requests with a given token. This is simple since
JupyterLab only uses a single static token all the time - one can directly match the correct
token and skip authentication in this case. Else one performs authentication on all unauthenticated
requests, adds a
forwardfor header as any good proxy should do and might want to write
an HTTP log:
frontend wwwsport
    bind :80
    bind :443 ssl crt /usr/local/etc/haproxy/ssl.pem
    mode http
    option httplog
    option dontlognull
    option forwardfor except 127.0.0.0/8
    acl correcttoken req.hdr(Authorization) -i -m str "token XXXXXXX"
    acl jupyauthok http_auth(examplerealm)
    http-request auth realm SampleRealm if !correcttoken !jupyauthok
    maxconn 3000
    timeout client 30s
    acl url_jupynotebook path_beg -i /examplebook
    use_backend examplejupyterbook if url_jupynotebook
    default_backend defaultwww
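The decision that the two ACLs and the http-request auth rule encode can be modelled in a few lines of Python. This is only an illustrative sketch of the logic, not of haproxy internals: the function name is made up, the header lookup is simplified to an exact dictionary key, and the user table holds clear text passwords purely to keep the example short (haproxy itself compares against the hashed entries of the userlist).

```python
import base64


def needs_auth_challenge(headers, expected_token, users):
    """Return True if the proxy would answer with an authentication challenge."""
    auth = headers.get("Authorization", "")

    # acl correcttoken: "-i -m str" is a case-insensitive exact string match
    if auth.lower() == f"token {expected_token}".lower():
        return False

    # acl jupyauthok: valid HTTP basic credentials of a user in the userlist
    scheme, _, payload = auth.partition(" ")
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(payload).decode("utf-8")
        except ValueError:
            return True  # malformed credentials are treated as unauthenticated
        user, _, password = decoded.partition(":")
        if users.get(user) == password:
            return False

    # http-request auth ... if !correcttoken !jupyauthok
    return True


users = {"exampleuser": "secret"}  # clear text only for this illustration
basic = "Basic " + base64.b64encode(b"exampleuser:secret").decode("ascii")
print(needs_auth_challenge({"Authorization": basic}, "XXXXXXX", users))
print(needs_auth_challenge({"Authorization": "token XXXXXXX"}, "XXXXXXX", users))
print(needs_auth_challenge({}, "XXXXXXX", users))
```

The second call is the problematic path: a request that carries the shared token passes without any per-user credentials.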
Now the only missing part for
haproxy is the backend configuration:
backend examplejupyterbook
    mode http
    balance roundrobin
    option forwardfor
    option http-server-close
    option redispatch
    timeout connect 10s
    timeout server 300s
    http-response del-header Authorization
    http-request set-header Authorization "token XXXXX"
    server examplejupyterbook jupyter.example.com:8888 check
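Note that the token supplied in the
Authorization header in both the frontend and the backend section also has to be specified in the
JupyterLab configuration later on. The X placeholders stand for one and the same token everywhere.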
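Because the same token has to appear in three places, generating it once and rendering all three lines from it avoids mismatches. The following is a minimal sketch; render_token_lines is a hypothetical helper and the use of secrets.token_hex is an assumption about how one might pick the token, since any sufficiently long random string would do.

```python
import secrets


def render_token_lines(token):
    """Render the three configuration lines that have to carry the same token."""
    return {
        "haproxy frontend": f'acl correcttoken req.hdr(Authorization) -i -m str "token {token}"',
        "haproxy backend": f'http-request set-header Authorization "token {token}"',
        "jupyter config": f'c.NotebookApp.token = "{token}"',
    }


# 32 random bytes rendered as 64 hexadecimal characters
token = secrets.token_hex(32)
for where, line in render_token_lines(token).items():
    print(f"{where}: {line}")
```

A hexadecimal token contains no quotes or spaces, so it can be pasted into both configuration files without escaping.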
In my case I also configured a default backend that serves a static index site:
backend defaultwww
    mode http
    balance roundrobin
    timeout connect 5s
    timeout server 5s
    server staticwwwserver www.example.com:80 check
After finishing up the configuration one can simply reload the configuration or start haproxy.
In the best case one creates a new Unix user that will later on run JupyterLab. Then install JupyterLab using
pip. In the following example it’s assumed this user is called myjupyteruser:
$ su myjupyteruser
$ cd ~
$ pip install jupyterlab
Then one can generate a new configuration file when one performs the installation manually:
$ jupyter-lab --generate-config
The configuration file will be stored in the location that the command prints when it finishes.
Some minor modifications will be required:
c.NotebookApp.token="XXX" will be used to provide the same shared authentication token as has been specified above in the haproxy configuration.
c.NotebookApp.password="..." might be set to a password in addition when also accessing the notebook server via its port directly instead of via the proxy. Note that Jupyter expects the hashed form of the password here.
c.ServerApp.base_url="/examplebook" allows one to set the base path relative to the URL so one can share the same hostname and domain with other applications or run multiple instances.
c.NotebookApp.allow_origin="*" can be used to set the CORS policy. * is pretty unsafe, usually one should list all allowed hosts.
c.ServerApp.port = 8888 might be set to ensure that JupyterLab is always launched at the same port. This is especially important when one runs multiple instances.
This basically was all of the required configuration. One can now start Jupyter and try out the new configuration. This could be done by simply launching Jupyter from the command line as a quick test:
/usr/bin/jupyter-lab --ip="192.0.2.1" --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative
On FreeBSD the best way to launch services is an
rc.d init script. This could be put at
/usr/local/etc/rc.d/jupyterlab, for example.
On Manjaro the best way to launch services is a
systemd unit file. This could be put at
/etc/systemd/system/jupyter.service for example:
[Unit]
Description=Jupyter Lab

[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/bin/bash -c "/usr/bin/jupyter-lab --ip=192.0.2.1 --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative"
User=myjupyteruser
Group=myjupyteruser
WorkingDirectory=/home/myjupyteruser/notebooks
Restart=always
RestartSec=10

# Required so that systemctl enable can hook the service into the boot process
[Install]
WantedBy=multi-user.target
Now one can start and enable the service:
$ sudo systemctl enable jupyter
$ sudo systemctl start jupyter
One cannot emphasize this enough - the workaround presented in this blog article leaks the authentication token used between JupyterLab and the proxy. Using this token anyone can perform any action - even through the proxy. This is due to a design problem in JupyterLab, which simply does not assume multiuser operation, and there is no simple stateless fix on the proxy side for this problem. So even when you remove a user or change a password, anyone who knows the token can still access JupyterLab and perform arbitrary actions - and thus also perform arbitrary actions with the Unix user account that JupyterLab is running under.
Dipl.-Ing. Thomas Spielauer, Wien (email@example.com)
This webpage is also available via TOR at http://coihcmhmb6cg6bvtelykwlte45yqhxkl6ffdoco5kc3a4qn3uno53oqd.onion/