Dockerize Your Data Science Workflow: A Step-by-Step Guide to Setting Up Jupyter Lab on Your Private Linux Machine
Jupyter Notebook is a powerful tool for data scientists and machine learning engineers to develop and share their code and research. However, running a Jupyter server on your local machine can be challenging, especially if you're working with large datasets or complex models that require significant computing power. One solution is to run the Jupyter server on a private Linux server for better performance, increased security, and the ability to collaborate with other team members.
In this tutorial, we'll show you how to create a Dockerized Jupyter server on a private Linux server. From installing Docker to running the Jupyter server and accessing it from your local machine, we'll cover everything. By the end of this tutorial, you'll have a Jupyter Lab instance running in a Docker container on your private Linux server, and you'll be able to access it from any web browser on your local machine. So, if you're a data scientist or machine learning engineer looking to scale your Jupyter server, let's get started!
Prerequisites
Before we dive into the tutorial, let's make sure you have the right tools and knowledge to follow along. Here are the prerequisites for this tutorial:
- A Linux server with a public IP address
- Basic knowledge of the Linux command line
- A web browser (preferably Google Chrome) installed on your local machine
If you don't have a Linux server set up, you can create one using a cloud provider like AWS or DigitalOcean. For this tutorial, we'll assume you have a server up and running.
Step 1: Connect to Your Linux Server
To get started, connect to your private Linux server via SSH. You will need the server's IP address and your login credentials (username and password). If you don't have this information, contact your server administrator.
Once you have the necessary information, open a terminal or command prompt on your local machine and type the following command, replacing username and server_ip with your actual username and server IP address, respectively:
ssh username@server_ip
You will be prompted to enter your password. After entering your password, you will be logged into your Linux server.
Step 2: Install Docker
To install Docker on your Linux server, you can follow the official Docker installation guide. However, we'll show you how to install Docker on Ubuntu 20.04 in this tutorial. If you're using a different Linux distribution, you can find the installation instructions for your distribution on the official Docker website.
To install Docker on Ubuntu 20.04, first, update the package index and install the required packages:
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
Next, add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Then, add the Docker repository to the APT sources list:
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Finally, update the package index and install Docker:
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
To verify that Docker is installed correctly, run the following command:
docker --version
You should see the docker version number in the output. If you see an error message, make sure you followed the steps correctly.
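If you would rather script this check, for example as part of a provisioning script, a minimal Python sketch could look like the following. The helper name docker_version is hypothetical; it only wraps the docker --version command shown above and returns None when the docker binary is not on the PATH or the command fails.

```python
import shutil
import subprocess


def docker_version():
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    # shutil.which looks the binary up on PATH, like the shell would.
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"],
        capture_output=True,
        text=True,
        check=False,  # report failure through the return value instead of raising
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version if version else "Docker does not appear to be installed")
```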
Step 3: Create a Folder for Your Jupyter Server
Now that you have Docker installed, the next step is to create a folder for your Jupyter server. All the files and folders related to your Jupyter server will be stored in this folder. To create a folder, run the following command:
mkdir ~/jupyter-server
Navigate to the newly created folder using the cd command:
cd ~/jupyter-server
Step 4: Create a docker-compose.yml File
The next step is to create a docker-compose.yml file in the jupyter-server folder. This file will contain the configuration for your Jupyter server. To create the file, run the following command:
touch docker-compose.yml
Open the file using your favorite text editor. For this tutorial, we'll use the nano editor:
nano docker-compose.yml
Add the following configuration to the file (we will explain each line in the next section):
version: "3"
services:
  jupyter:
    image: jupyter/minimal-notebook
    hostname: jupyter
    container_name: jupyter
    restart: unless-stopped
    environment:
      - CHOWN_EXTRA=/home/${NOTEBOOK_USER}/work
      - CHOWN_EXTRA_OPTS=-R
      - NB_UID=1000
      - NB_GID=100
      - NB_USER=${NOTEBOOK_USER}
      - NB_GROUP=users
      - JUPYTER_ENABLE_LAB=yes
      - CHOWN_HOME=yes
      - CHOWN_HOME_OPTS=-R
      - JUPYTER_TOKEN=${JUPYTER_TOKEN}
      - PASSWORD_HASH=${PASSWORD_HASH}
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia
    ports:
      - "8888:8888"
    volumes:
      - ${PWD}/work:/home/${NOTEBOOK_USER}/work
      - ${PWD}/work/environments:/home/${NOTEBOOK_USER}/work/environments
    working_dir: /home/${NOTEBOOK_USER}/work
    command:
      /bin/bash -c "if ls /home/${NOTEBOOK_USER}/work/environments/*.yml >/dev/null 2>&1; then \
      for f in /home/${NOTEBOOK_USER}/work/environments/*.yml; do \
      env_name=$$(basename $${f%.yml}); \
      if conda env list | grep -q $${env_name}; then \
      echo \"Environment $${env_name} already exists. Skipping...\"; \
      else \
      echo \"Creating environment $${env_name}...\"; \
      conda env create --quiet --name $${env_name} --file $${f}; \
      fi; \
      done; \
      else \
      echo \"No environment files found. Creating default environment...\"; \
      conda install --quiet --yes numpy; \
      fi && \
      conda clean --all -f -y && \
      fix-permissions \"/opt/conda\" && \
      fix-permissions \"/home/${NOTEBOOK_USER}\" && \
      start-notebook.sh --NotebookApp.token='${JUPYTER_TOKEN}' --NotebookApp.allow_password_change=True --NotebookApp.password='${PASSWORD_HASH}'"
    user: root
    env_file:
      - .env
Let's go over each line of the configuration file and explain what it does.
- version: "3"
  The version of the Compose file format. We are using version 3 in this tutorial.
- services:
  The services to run. Each service runs in its own container; in this case there is only one service, the Jupyter server.
- jupyter:
  The name of the service. We are calling it jupyter.
- image: jupyter/minimal-notebook
  The Docker image that will be used to run the Jupyter server. You can find more information about this image on the official Docker Hub page.
- hostname: jupyter
  The hostname of the container. We are calling it jupyter, but you can change it to whatever you want.
- container_name: jupyter
  The name of the Docker container. We are calling it jupyter.
- restart: unless-stopped
  The restart policy for the container. Docker restarts the container whenever it exits or the Docker daemon restarts, unless you stopped it explicitly. There are other restart policies that you can use; you can find more information about them in the official Docker documentation.
- environment:
  The environment variables that will be used by the Jupyter server:
  - CHOWN_EXTRA=/home/${NOTEBOOK_USER}/work
    The folder whose ownership will be given to the notebook user. We are setting it to /home/${NOTEBOOK_USER}/work.
  - CHOWN_EXTRA_OPTS=-R
    The options used when changing the ownership of the folder specified in CHOWN_EXTRA. We are setting it to -R to recursively change the ownership of all the files and folders inside it.
  - NB_UID=1000
    The user ID of the notebook user. 1000 is the image's default.
  - NB_GID=100
    The group ID of the notebook user. 100 is the image's default.
  - NB_USER=${NOTEBOOK_USER}
    The username of the notebook user. The value comes from the NOTEBOOK_USER variable, which is set in Step 5.
  - NB_GROUP=users
    The group of the notebook user. We are setting it to users, the default group for the notebook user.
  - JUPYTER_ENABLE_LAB=yes
    Whether JupyterLab will be enabled. We are setting it to yes to enable JupyterLab.
  - CHOWN_HOME=yes
    Whether the ownership of the notebook user's home folder will be changed to the notebook user. We are setting it to yes.
  - CHOWN_HOME_OPTS=-R
    The options used when changing the ownership of the home folder. We are setting it to -R to recursively change the ownership of all the files and folders in the home folder.
  - JUPYTER_TOKEN=${JUPYTER_TOKEN}
    The token that will be used to access the Jupyter server. The value is set in Step 5.
  - PASSWORD_HASH=${PASSWORD_HASH}
    The password hash that will be used to access the Jupyter server. The value is set in Step 5.
  - NVIDIA_VISIBLE_DEVICES=all
    The GPUs that will be visible to the Jupyter server. We are setting it to all to make all the GPUs visible.
- runtime: nvidia
  The runtime that will be used to run the container. The nvidia runtime requires the NVIDIA Container Toolkit on the host; if your server has no GPU, remove this line and the NVIDIA_VISIBLE_DEVICES variable. You can find more information about the nvidia runtime in the official Docker documentation.
- ports:
  The port mappings between the host and the container. "8888:8888" maps port 8888 on the host to port 8888 in the container, which is the default port used by the Jupyter server.
- volumes:
  The volumes that will be mounted in the container:
  - ${PWD}/work:/home/${NOTEBOOK_USER}/work
    Mounts the work folder in the current directory of the host machine at /home/${NOTEBOOK_USER}/work in the container. Everything that is saved in /home/${NOTEBOOK_USER}/work in the container will be saved in the work folder on the host machine.
  - ${PWD}/work/environments:/home/${NOTEBOOK_USER}/work/environments
    Similarly, mounts the work/environments folder of the host machine at /home/${NOTEBOOK_USER}/work/environments in the container. We will use this folder to store the YAML files that describe the Conda environments.
- working_dir: /home/${NOTEBOOK_USER}/work
  The working directory of the container.
- command:
  The command that will be executed when the container starts. It first checks whether any .yml environment files exist in the environments directory. If so, it creates a Conda environment for each file whose environment does not exist yet; if not, it installs numpy into the base environment as a default. It then cleans the Conda caches, fixes the file permissions, and starts the Jupyter server with the configured token and password hash.
- user: root
  The user that will be used to run the container. We are setting it to root because the image's startup script needs root privileges to apply the NB_* and CHOWN_* settings above.
- env_file:
  The environment file that will be used by the container. We are using the .env file in this tutorial to set the environment variables. We will go over the environment variables in the next section.
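The startup logic in command can be easier to follow outside of a one-line shell string. The sketch below restates the same decision in Python, purely as an illustration: it does not call conda, it only reports which environments would be created and which would be skipped. The function name plan_environments and the existing argument (standing in for the output of conda env list) are assumptions made for this example. Note also that the shell version uses grep -q, which matches substrings, while this sketch compares names exactly.

```python
import tempfile
from pathlib import Path


def plan_environments(env_dir, existing):
    """Mirror the container's startup decision without calling conda.

    env_dir:  directory that holds the exported *.yml files
    existing: names of the Conda environments that already exist
    Returns (to_create, to_skip). Two empty lists correspond to the
    "no environment files found" branch, where numpy is installed
    into the base environment instead.
    """
    to_create, to_skip = [], []
    for yml in sorted(Path(env_dir).glob("*.yml")):
        env_name = yml.stem  # same role as `basename ${f%.yml}` in the shell version
        if env_name in existing:
            to_skip.append(env_name)
        else:
            to_create.append(env_name)
    return to_create, to_skip


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "myenv.yml").write_text("name: myenv\n")
        (Path(tmp) / "vision.yml").write_text("name: vision\n")
        print(plan_environments(tmp, existing={"base", "myenv"}))
```

In this usage example, myenv already exists and is skipped, while vision would be created from its YAML file.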
Step 5: Set the Environment Variables
The environment variables that we used in the docker-compose.yml file are set in the .env file. It is good practice to set the environment variables in a separate file instead of setting them directly in the docker-compose.yml file. This way, you can easily change the environment variables without having to change the docker-compose.yml file. Also, your passwords and tokens will not be exposed in the docker-compose.yml file.
Create a new file called .env in the same directory as the docker-compose.yml file. Use the following command to create the .env file:
touch .env
Open the .env file in your favorite text editor and add the following environment variables:
JUPYTER_TOKEN=your_jupyter_token
PASSWORD_HASH=your_password_hash
NOTEBOOK_USER=your_notebook_user
Replace your_jupyter_token with a random string of your choice to use as your Jupyter token. This will be used to authenticate yourself to the notebook server.
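One way to produce such a random string is Python's standard secrets module. The snippet below is a minimal sketch; the helper name make_token and the 32-byte length are arbitrary choices for this example, not Jupyter requirements. A hex string is used because it contains no characters that need special treatment in a .env file.

```python
import secrets


def make_token(num_bytes=32):
    """Return a random hex string that can be used as JUPYTER_TOKEN."""
    # token_hex returns two hexadecimal characters per random byte.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Prints a line that can be pasted into the .env file.
    print(f"JUPYTER_TOKEN={make_token()}")
```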
To create a hashed password, you can use the passwd() function from the notebook.auth module in a Python shell or Jupyter notebook. (In newer releases where notebook.auth is no longer available, the jupyter_server.auth module provides the same function.) This will output a hashed version of your password that you can copy and paste into your .env file as the value of the PASSWORD_HASH environment variable.
For example, if you wanted to use the password "mysecretpassword", you would run the following command in a terminal:
python -c 'from notebook.auth import passwd; print(passwd(passphrase="mysecretpassword", algorithm="sha1"))'
This will output a hashed version of your password like the following:
sha1:143bfff689ac:37498bc69b8314a00dd31f6041e4f88b64dae038
Copy and paste the hashed version of your password into the .env file as the value of the PASSWORD_HASH environment variable.
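The hash string follows the pattern algorithm:salt:digest. To make that less opaque, the sketch below re-implements the sha1 variant with only the standard library, assuming the digest is computed over the passphrase followed by the salt. Both function names are made up for this example, and the salt handling is our reading of the format rather than the library's code, so treat it as a way to sanity-check a hash and keep using passwd() to generate the real one.

```python
import hashlib
import secrets


def make_sha1_hash(passphrase, salt=None):
    """Build an 'sha1:salt:digest' string (illustrative re-implementation)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hexadecimal characters
    digest = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_sha1_hash(hashed, passphrase):
    """Return True if passphrase matches an 'sha1:salt:digest' string."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in the expected three-part format
    if algorithm != "sha1":
        return False  # this sketch only understands the sha1 variant
    expected = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return secrets.compare_digest(expected, digest)


if __name__ == "__main__":
    hashed = make_sha1_hash("mysecretpassword")
    print(hashed)
    print(check_sha1_hash(hashed, "mysecretpassword"))
```

Running check_sha1_hash on the value you pasted into .env, together with the password you intend to use, is a quick way to catch copy-and-paste mistakes before starting the container.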
Replace your_notebook_user with the name of the user that you want to use to access the Jupyter server. This user will be created inside the Docker container and will be used to run the Jupyter server.
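Docker Compose substitutes an empty string for any variable that is missing, so it can help to check the .env file before moving on. The following is a minimal sketch, assuming the simple KEY=value format shown above (no quoting and no export prefixes); the helper names load_env and find_problems are hypothetical.

```python
from pathlib import Path

REQUIRED = ("JUPYTER_TOKEN", "PASSWORD_HASH", "NOTEBOOK_USER")
PLACEHOLDERS = {"your_jupyter_token", "your_password_hash", "your_notebook_user"}


def load_env(path):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(values):
    """Return the required keys that are missing, empty, or still placeholders."""
    return [
        key for key in REQUIRED
        if not values.get(key) or values[key] in PLACEHOLDERS
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        problems = find_problems(load_env(env_path))
        print("OK" if not problems else f"Please set: {', '.join(problems)}")
    else:
        print("No .env file found in the current directory")
```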
Step 6: Build and Run the Docker Container
Now that we have created the docker-compose.yml file and the .env file, we can start the Docker container. To do so, use the following command:
docker-compose up -d
This command pulls the image if it is not already present, creates the container, and runs it in the background. You can check the status of the Docker container using the following command:
docker-compose ps
This command will output the status of the Docker container. You should see output similar to the following:
Name      Command                        State   Ports
----------------------------------------------------------------
jupyter   /bin/bash /home/jupyter/ ...   Up
Now that the Docker container is running, you can access the Jupyter server by going to http://<your_server_ip>:8888 in your browser. You should see the Jupyter login page. Enter the Jupyter token that you set in the .env file as the password to access the Jupyter server.
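Jupyter also accepts the token as a token query parameter in the URL, which saves typing it into the login form. The sketch below builds such a link; build_lab_url is a hypothetical helper, the IP address is a documentation placeholder, and the /lab path assumes JupyterLab is the active front end, as configured in this tutorial.

```python
from urllib.parse import urlencode, urlunsplit


def build_lab_url(host, token, port=8888):
    """Return an http URL that opens JupyterLab and passes the token."""
    # urlencode escapes any characters in the token that are unsafe in a URL.
    query = urlencode({"token": token})
    return urlunsplit(("http", f"{host}:{port}", "/lab", query, ""))


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder; use your server's IP address instead.
    print(build_lab_url("203.0.113.10", "abc123"))
```

Because the link contains the token, treat it like a password and avoid sharing it or pasting it into public places.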
Step 7: Activate the Conda Base Environment and Create a New Environment
Now that you have access to the Jupyter server, you can create a new Conda environment and install the packages that you need. Open a new terminal in the Jupyter server and activate the Conda base environment using the following command:
source activate base
Now that you are in the Conda base environment, you can create a new Conda environment using the following command:
conda create -n myenv python=3.10
This command will create a new Conda environment called myenv with Python version 3.10. You can activate the new Conda environment using the following command:
conda activate myenv
Now that you are in the new Conda environment, you can install the packages that you need using the conda install command. For example, if you wanted to install the numpy package, you would use the following command:
conda install numpy
Step 8: Use the New Environment in a Jupyter Notebook
Now that you have created a new Conda environment and installed the packages that you need, you can use it in a Jupyter notebook. To do so, you first need to install the ipykernel package in the new Conda environment. With myenv still activated, run the following command:
conda install ipykernel
Next, create a new kernel for the new Conda environment using the following command:
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
This command will create a new kernel called Python (myenv). To use the new Conda environment in a Jupyter notebook, select the Python (myenv) kernel from the kernel menu in the Jupyter notebook.
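If the kernel does not show up in the menu, you can check whether it was registered by looking at the kernel specs on disk. With the --user flag on Linux, ipykernel by default writes a kernel.json file under ~/.local/share/jupyter/kernels/<name>/. The sketch below lists what it finds there; the function name list_user_kernels is an assumption for this example, and the default path applies to Linux with an unmodified Jupyter data directory.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir=None):
    """Map kernel name -> display name for the kernel specs in kernels_dir."""
    if kernels_dir is None:
        # Default per-user location on Linux (an assumption of this sketch).
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return {}
    found = {}
    for spec in sorted(kernels_dir.glob("*/kernel.json")):
        data = json.loads(spec.read_text())
        # Fall back to the folder name if the spec has no display name.
        found[spec.parent.name] = data.get("display_name", spec.parent.name)
    return found


if __name__ == "__main__":
    for name, display_name in list_user_kernels().items():
        print(f"{name}: {display_name}")
```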
Step 9: Save the New Environment to a YAML File
Now that you have created a new Conda environment and installed the packages that you need, you can export it to a YAML file. With myenv still activated, run the following command from the work directory:
conda env export > environments/myenv.yml
The above command exports the newly created Conda environment to a YAML file named myenv.yml within the environments directory. This step is crucial because environments created inside the container are lost when the container is removed and recreated; the exported YAML file is what allows all the packages to be restored the next time the container starts. As the docker-compose.yml file restores the environments automatically, you only need to make sure that you export the Conda environment to the YAML file again after any modifications.
Conclusion
That's it! We've covered a lot of ground in this blog post! We started with the basics of Jupyter Notebook, and then explored how to set up a secure Jupyter Notebook server on a remote machine. By the end of this post, you should have a Jupyter Notebook server that is accessible over the internet, and secured with a password and token-based authentication.
Remember, it's important to keep your Jupyter Notebook server secure, especially if you're working with sensitive data. By following the steps outlined in this post, you can set up a secure Jupyter Notebook server and use it to share your work with others.
I hope you found this post helpful! If you have any questions or comments, please feel free to leave them below. Happy coding!
About Mir Sazzat Hossain
Mir Sazzat Hossain is a Research Assistant at the Independent University of Bangladesh's Center for Computation and Data Science (CCDS).