Python Virtual Environments On Hopper
There are different approaches in the Python ecosystem for creating virtual environments (VEs): pyenv, venv, virtualenv, virtualenvwrapper, and pipenv. Among these, venv and virtualenv are quite similar, and either one can be chosen. The key distinction between the two lies in how they handle the Python executable within the virtual environment folder:
- venv creates a virtual environment without copying the Python executable; on Linux it links to the existing interpreter instead.
- virtualenv can create a virtual environment by copying the Python executable into it.
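On Linux you can check how venv provides the interpreter yourself. The sketch below creates a throwaway environment (the directory name demo-venv is an arbitrary assumption) and reports whether its python entry is a symbolic link or a copied file. It passes --without-pip only to keep the check fast.

```bash
#!/usr/bin/env bash
# Sketch: inspect how venv provides the interpreter inside an environment.
# "demo-venv" is a hypothetical, throwaway directory name.
set -e

# --without-pip skips bootstrapping pip, which this check does not need
python3 -m venv --without-pip demo-venv

# On Linux, venv links to the existing interpreter by default instead of copying it
ls -l demo-venv/bin/python
if [ -L demo-venv/bin/python ]; then
    echo "demo-venv/bin/python is a symlink to: $(readlink -f demo-venv/bin/python)"
else
    echo "demo-venv/bin/python is a regular file (copied interpreter)"
fi
```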
Before utilizing Python Virtual Environments on Hopper, please take note of the following:
- Python environments need to be created separately on Hopper. A Python Virtual Environment built on Hopper should not be expected to work on any other system.
- On Hopper, we recommend using venv to create virtual environments, following the process detailed below.
Starting an Interactive Session on Hopper GPUs
If you intend to run your Python Virtual Environment on the Hopper GPU nodes, first get onto a GPU node directly by starting an interactive session with the salloc command:
salloc -p gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:1g.10gb:1 --mem=50GB
The GPU types you can request, and the matching --gres setting (replace nGPUs with the number of GPUs you need), are:
Type of GPU | SLURM setting | No. of GPUs per node | No. of CPUs | RAM |
---|---|---|---|---|
1g 10GB | --gres=gpu:1g.10gb:nGPUs | 4 | 64 | 500GB |
2g 20GB | --gres=gpu:2g.20gb:nGPUs | 4 | 64 | 500GB |
3g 40GB | --gres=gpu:3g.40gb:nGPUs | 4 | 64 | 500GB |
A100 80GB | --gres=gpu:A100.80gb:nGPUs | 4 | 64 | 500GB |
DGX A100 40GB | --gres=gpu:A100.40gb:nGPUs | 8 | 128 | 1TB |
A similar command can be used to start an interactive session on the contrib-gpuq partition, as below:
salloc -p contrib-gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:2g.20gb:1 --mem=5GB
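The two salloc commands above differ only in the partition, the --gres value (a GPU type from the table plus a GPU count), and the memory. The helper below is hypothetical, not a Hopper or SLURM tool: it only prints the command it would run, so you can check the flags before copying them. Called with the values used above, it reproduces both commands.

```bash
#!/usr/bin/env bash
# Sketch: compose an salloc command from a GPU type in the table above.
# gpu_salloc_cmd is a hypothetical helper; it prints the command, it does not run it.

gpu_salloc_cmd() {
    local partition="$1"  # gpuq or contrib-gpuq
    local gpu_type="$2"   # a value from the table, e.g. 1g.10gb or A100.80gb
    local n_gpus="$3"     # number of GPUs of that type
    local mem="$4"        # memory for the session, e.g. 50GB
    echo "salloc -p ${partition} -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:${gpu_type}:${n_gpus} --mem=${mem}"
}

# Reproduces the two interactive-session commands shown above
gpu_salloc_cmd gpuq 1g.10gb 1 50GB
gpu_salloc_cmd contrib-gpuq 2g.20gb 1 5GB
```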
Use the sinfo command to view the list of available nodes, the time limit for each, and the available partitions.
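For example, the sinfo format options below print each of the GPU partitions used above with its time limit, node count, and generic resources (GPUs). The command -v guard is an addition so the sketch degrades gracefully when run off the cluster.

```bash
#!/usr/bin/env bash
# Sketch: summarize the GPU partitions used above with sinfo.
# %P = partition, %l = time limit, %D = node count, %G = generic resources (GPUs)
if command -v sinfo >/dev/null 2>&1; then
    partition_info="$(sinfo -p gpuq,contrib-gpuq -o '%P %l %D %G')"
else
    # Not on a SLURM system (e.g. a laptop): say so instead of failing
    partition_info="sinfo not found; run this on a Hopper login node"
fi
echo "$partition_info"
```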
Once the interactive session has started, load the necessary modules. We recommend loading the Python modules built under the GCC 10 compiler (gnu10), which run on both the CPU nodes and the GPU nodes.
Creating a Python Virtual Environment
venv is a Python module in the standard library (available since Python 3.3). It simplifies the creation of isolated environments for Python projects. By default, packages installed in one venv are not shared with other virtual environments, and the environment does not see globally installed packages; if you do want access to the global packages, create the environment with the --system-site-packages option.
When you use venv to create a virtual environment, it establishes an isolated space where you can install your project's dependencies independently. Rather than copying the Python interpreter binary into the environment directory, venv on Linux creates symbolic links to the existing Python interpreter on your system, which saves disk space.
To summarize, venv is a built-in module in Python's standard library that offers a simple way to create isolated environments for Python projects. It avoids sharing libraries between virtual environments and makes project-specific dependencies easy to manage.
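A quick way to see this isolation: the sketch below creates one default environment and one with --system-site-packages (both directory names are hypothetical), reads the setting venv records in each environment's pyvenv.cfg file, and asks Python whether it is running inside a virtual environment.

```bash
#!/usr/bin/env bash
# Sketch: compare a default (isolated) venv with one that can see global packages.
# "iso-env" and "shared-env" are hypothetical, throwaway directory names.
set -e

python3 -m venv --without-pip iso-env
python3 -m venv --without-pip --system-site-packages shared-env

# venv records the choice in pyvenv.cfg at the top of each environment
grep include-system-site-packages iso-env/pyvenv.cfg      # include-system-site-packages = false
grep include-system-site-packages shared-env/pyvenv.cfg   # include-system-site-packages = true

# Inside any venv, sys.prefix points at the environment, not the base install
iso_in_venv="$(iso-env/bin/python -c 'import sys; print(sys.prefix != sys.base_prefix)')"
echo "iso-env is a virtual environment: $iso_in_venv"
```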
Python Virtual Environment Setup for Hopper Cluster
To ensure that your Python Virtual Environment runs consistently across all nodes, it's important to use modules built for this purpose. Before creating the Python Virtual Environment, follow these steps:
1 - First, switch to the GNU 10 compiler modules:
module load gnu10/10.3.0-ya
2 - Check the available Python modules and load one:
module avail python
module load python
3 - Create and activate the virtual environment, then upgrade pip:
python -m venv py-env
source py-env/bin/activate
python -m pip install --upgrade pip
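4 - Unload the system Python module, install your packages into the environment (scikit-learn is the package's PyPI name; sklearn is a deprecated placeholder), and deactivate the environment when you are done:
module unload python
pip install scikit-learn
deactivate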
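Before unloading the module and installing packages, it can help to confirm that the activated environment's interpreter is the one your shell will run. The sketch below checks this with a throwaway environment (check-env is a hypothetical name) built from whatever python3 is on PATH, which on Hopper would come from the loaded python module.

```bash
#!/usr/bin/env bash
# Sketch: confirm that activating a venv puts its interpreter first on PATH.
# "check-env" is a hypothetical demo directory; python3 is assumed to be on PATH
# (on Hopper, after `module load gnu10` and `module load python`).
set -e

python3 -m venv --without-pip check-env
source check-env/bin/activate

# `command -v` reports which python the shell will run
active_python="$(command -v python)"
# sys.prefix differs from sys.base_prefix only inside a virtual environment
in_venv="$(python -c 'import sys; print(sys.prefix != sys.base_prefix)')"
echo "python resolves to: $active_python"
echo "running inside a venv: $in_venv"

deactivate
```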
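Example (this transcript was recorded with the deprecated sklearn placeholder package; install scikit-learn instead):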
==================================================================
[user@hopper2 ~]$ python -m venv py-env
[user@hopper2 ~]$ source py-env/bin/activate
(py-env) [user@hopper2 ~]$ python -m pip install --upgrade pip
Collecting pip
Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)
|████████████████████████████████| 2.1 MB 13.6 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.2.1
Uninstalling pip-20.2.1:
Successfully uninstalled pip-20.2.1
Successfully installed pip-23.2.1
(py-env) [user@hopper2 ~]$ module unload python
(py-env) [user@hopper2 ~]$ pip install sklearn
Collecting sklearn
Downloading sklearn-0.0.post7.tar.gz (3.6 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: sklearn
Building wheel for sklearn (pyproject.toml) ... done
Created wheel for sklearn: filename=sklearn-0.0.post7-py3-none-any.whl size=2951 sha256=c62d78ad5864da7dddeea4f58c96b35c6bab2584dec8748ce5687b788491f2eb
Stored in directory: /home/smudund/.cache/pip/wheels/bc/86/46/dd4e366dc5e1303b4d6927d2a603a1ae7f979d488a5d202330
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0.post7
(py-env) [user@hopper2 ~]$ deactivate
[user@hopper2 ~]$
Things to remember before creating a virtual environment
- To create a Python Virtual Environment from a login node, a Python module must first be loaded using the ‘module load’ command.
On Hopper, you need to make sure you have loaded one of the correct Python modules compiled under GNU 10 to run across all nodes. Do this by first loading the gnu10 module and then the Python modules under it:
module load gnu10 openmpi
module load python/<version>
- Using “pip install <package>” within an activated Virtual Environment installs the package into that Virtual Environment only. Using “pip install --user <package>” instead installs it into your per-user directory under /home/$USER/.local, not into the environment. Packages there are picked up by the base Python of the same version and by any Virtual Environment created with --system-site-packages, which undermines their isolation (recent pip versions refuse --user installs inside an isolated Virtual Environment). So, the use of the “--user” option with pip install is not recommended.
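To check where a --user install lands for a given Python, and whether a given environment can see that location, you can run the sketch below. The directory name user-site-demo is hypothetical, and the exact path printed depends on your home directory and Python version.

```bash
#!/usr/bin/env bash
# Sketch: locate the per-user site-packages directory used by `pip install --user`
# and check whether a venv created with default options can see it.
# "user-site-demo" is a hypothetical, throwaway directory name.
set -e

# Where `pip install --user` would put packages for this python3 (normally under ~/.local)
user_site="$(python3 -c 'import site; print(site.getusersitepackages())')"
echo "pip install --user target: $user_site"

python3 -m venv --without-pip user-site-demo
# ENABLE_USER_SITE tells whether that directory is on sys.path inside the venv
venv_user_site="$(user-site-demo/bin/python -c 'import site; print(site.ENABLE_USER_SITE)')"
echo "user site visible inside the venv: $venv_user_site"
```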
Using a Python Virtual Environment in a SLURM submission script
Below is a sample SLURM submission script. Save it as run.slurm, update the time limit and the <N_CPU_CORES>, <MEMORY>, and <N_GPUs> placeholders to reflect the number of CPU cores, the memory, and the GPUs you need (referring to the table above), and submit it by entering:
sbatch run.slurm
Sample script, run.slurm:
#!/bin/bash
#SBATCH --partition=gpuq
#SBATCH --qos=gpu
#SBATCH --job-name=gpu_basics
#SBATCH --output=gpu_basics.%j.out
#SBATCH --error=gpu_basics.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<N_CPU_CORES>
#SBATCH --gres=gpu:A100.80gb:<N_GPUs>
#SBATCH --mem=<MEMORY>
#SBATCH --export=ALL
#SBATCH --time=0-01:00:00
#
set -x   # echo each command as it runs
umask 0022
# to see ID and state of GPUs assigned
nvidia-smi
## Load the necessary modules
module load gnu10
source py-env/bin/activate
## Execute your script
python main.py
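Before running sbatch, you can make sure no placeholder is left in the script. The sketch below writes a trimmed copy of the resource lines as a template, substitutes example values (8 CPU cores, 1 GPU, and 50GB are assumptions; pick your own from the table), and stops if any <PLACEHOLDER> remains.

```bash
#!/usr/bin/env bash
# Sketch: fill in the placeholders of a run.slurm template before calling sbatch.
# The values below (8 cores, 1 GPU, 50GB) are illustrative assumptions only.
set -e

# A trimmed copy of the sample script's resource lines, for demonstration
cat > run.slurm.template <<'EOF'
#!/bin/bash
#SBATCH --partition=gpuq
#SBATCH --qos=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<N_CPU_CORES>
#SBATCH --gres=gpu:A100.80gb:<N_GPUs>
#SBATCH --mem=<MEMORY>
#SBATCH --time=0-01:00:00
EOF

sed -e 's/<N_CPU_CORES>/8/' \
    -e 's/<N_GPUs>/1/' \
    -e 's/<MEMORY>/50GB/' \
    run.slurm.template > run.slurm

# Refuse to continue if any <PLACEHOLDER> is left over
if grep -n '<[A-Za-z_]*>' run.slurm; then
    echo "run.slurm still contains placeholders" >&2
    exit 1
fi
echo "run.slurm is ready; submit it with: sbatch run.slurm"
```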
Adding your Python Virtual Environment as a Kernel in JupyterLab
Python Virtual Environments created on Hopper can be added as kernels to the JupyterLab sessions started under Open OnDemand. To make your Python Virtual Environment available as a kernel, activate it from the command line, install ipykernel into it, and register it under a kernel name of your choice (env-name below):
source py-env/bin/activate
pip install ipykernel
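python -m ipykernel install --user --name=env-name
Here --user only writes the kernel specification to your home directory (under ~/.local/share/jupyter/kernels); it does not install Python packages, so the earlier advice against pip install --user does not apply.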