Installing and Managing Software with Anaconda Virtual Environments on Quest

Anaconda is a package and environment manager primarily used for open-source data science packages for the Python and R programming languages. It also supports other programming languages like C, C++, FORTRAN, Java, Scala, Ruby, and Lua.

Using Anaconda on Quest

To use Anaconda, first load the corresponding module:

module purge
module load python-miniconda3/4.12.0

This module is based on Miniconda, a minimal installer for Conda. Conda, which is included in all versions of Anaconda and Miniconda, is the package and environment manager that installs, runs, and updates packages and their dependencies.

If you prefer to use Mamba, which is a drop-in replacement for most conda commands that enables faster package solving, downloading, and installing, you can load the corresponding mamba module:

module purge
module load mamba

The next step is to initialize your shell to use conda:

conda init bash
source ~/.bashrc

This modifies your ~/.bashrc file so that conda is ready to use every time you log in (without needing to load the module).

If you want a newer version of Conda or Mamba than what is available in the module, you can install your own copy into your HOME directory. For example, the following commands download and install the latest Miniconda:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --output $HOME/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
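The -b (batch) flag runs the installer without prompts, and to our understanding it does not edit your shell startup files, so a fresh installation still needs to be initialized before conda is available at login. The following is a minimal sketch of that step, not an official procedure. The function name init_local_conda is hypothetical, and the default prefix simply mirrors the -p $HOME/miniconda path used in the install command.

```shell
# Sketch: initialize bash for a Miniconda installed under a given prefix.
# init_local_conda is a hypothetical helper name, not part of conda.
init_local_conda() {
    local prefix="${1:-$HOME/miniconda}"
    if [ ! -x "$prefix/bin/conda" ]; then
        echo "No conda executable found under $prefix" >&2
        return 1
    fi
    # Report the installed version, then add the shell hook to ~/.bashrc.
    "$prefix/bin/conda" --version
    "$prefix/bin/conda" init bash
}
```

After defining the function, run init_local_conda followed by source ~/.bashrc (or log in again) to pick up the change.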

Conda can also be configured with various options. See the Conda configuration documentation for details.

Creating Conda environments and installing packages

You can create new Conda environments in one of your available directories. Conda environments are isolated project environments designed to manage distinct package requirements and dependencies for different projects. We recommend using the mamba command for faster package solving, downloading, and installing. However, you can use the conda command, with various options, to install and inspect Conda environments.

The process for creating and using environments has a few basic steps:

  1. Create an environment with conda create
  2. Activate the environment with conda activate
  3. Install packages into the environment with conda install

To create a new Conda environment in your home directory, enter:

conda create --name <env_name>

where <env_name> is the name you want for your environment. Then activate the environment:

conda activate <env_name>

Once activated, you can then install packages into that environment:

conda install <pkg>

Below we demonstrate examples of using Conda to install Python and R. The commands below create an isolated environment at ~/.conda/envs/my-virtenv-py39 in your HOME folder with only Python 3.9 installed.

$ conda create --name my-virtenv-py39 python=3.9 --yes
$ conda activate my-virtenv-py39

Once the environment is activated, you have full control over which Python packages are installed into it. In effect, you are the system administrator of this Python installation, since you have full read/write/execute privileges inside that folder in your HOME directory. At this point, you can install additional Python packages into the environment with either conda or pip, Python's native package manager, which installs from the Python Package Index (PyPI).
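When installing with pip, it helps to confirm that an environment is actually active, since pip run from a different Python can place packages outside the environment. The sketch below relies on the CONDA_PREFIX variable, which conda activate sets to the path of the active environment. The function name pip_in_env is hypothetical, and this is one possible safeguard rather than a required step.

```shell
# Sketch: run pip only when a Conda environment is active.
# pip_in_env is a hypothetical helper name; CONDA_PREFIX is set by `conda activate`.
pip_in_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "No Conda environment is active; run 'conda activate <env_name>' first." >&2
        return 1
    fi
    # Use the environment's own interpreter so the package lands inside it.
    "$CONDA_PREFIX/bin/python" -m pip install "$@"
}
```

For example, after conda activate my-virtenv-py39, running pip_in_env <pkg> installs <pkg> with the Python that lives inside that environment.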

The command below creates an isolated environment at ~/.conda/envs/my-virtenv-R-4d1 in your HOME folder with only R 4.1 installed.

$ conda create --name my-virtenv-R-4d1 -c conda-forge r-base=4.1 --yes

Explaining Anaconda "channels"

An overwhelming majority of research software (including R, Python, Julia, and more) is available through Anaconda in one of three locations. These locations are called "channels", which can be thought of as the remote/cloud repositories in which Conda looks for packages. The three channels are:

--channel=conda-forge

--channel=anaconda (a.k.a. the default location)

--channel=bioconda

If you opt to install packages directly from Anaconda (which can help ensure that all the packages are compatible with each other), it is good practice to install them all from the same remote repository/channel.
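One way to check whether an environment mixes channels is to list its packages with their source channel. The sketch below uses conda list --show-channel-urls and assumes the channel appears in the fourth column of the output, which is how recent Conda versions print it; the column layout is an assumption and may differ between versions. The function name env_channels is hypothetical.

```shell
# Sketch: print the distinct channels used by packages in an environment.
# env_channels is a hypothetical helper name. It assumes `conda list` prints
# four columns (name, version, build, channel) and '#'-prefixed header lines.
env_channels() {
    conda list --show-channel-urls "$@" \
        | awk '!/^#/ && NF >= 4 { print $4 }' \
        | sort -u
}
```

Running env_channels with no arguments inspects the active environment, and env_channels --name <env_name> inspects a named one. A single line of output suggests that every package came from the same channel.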

Finally, if you need C and/or C++ compilers, you can install them directly into your environment:

conda install -c conda-forge gxx_linux-64 gcc_linux-64

To share an environment with other members of your allocation, you can create the environment in a specific directory.

With conda, use the --prefix option to create the environment in a specific location, such as in your /projects/<allocationID> directory. To use the environment (either interactively from the command line or as part of your job submission script), you then need to point to this directory when activating it.

For example, to create a conda environment called env1 in a subdirectory of a project directory (here p10000 as an example), with the package sqlalchemy included, run the following:

$ module load python-miniconda3
$ cd /projects/p10000
$ mkdir pythonenvs
$ conda create --prefix /projects/p10000/pythonenvs/env1 sqlalchemy python=3.8 --yes

To use this environment, specify the full path to the environment when you activate it:

$ module load python-miniconda3
$ source activate /projects/p10000/pythonenvs/env1

Please note that a version of the main application you are using (e.g., Python or R) is installed in the Conda environment, so the module versions of these should not be loaded when the Conda environment is activated. Enter module purge to unload all loaded modules.

To deactivate an environment, enter:

conda deactivate

More generally, the --prefix option creates an environment at any path you can write to, such as your project directory. For example:

conda create --prefix /projects/<allocation_id>/<env_name>

where <allocation_id> is the name of your allocation. Then activate the environment:

conda activate /projects/<allocation_id>/<env_name>

To view a list of all your Conda environments, enter:

conda env list

To remove a Conda environment, enter:

conda env remove --name <env_name>
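Environments created with --prefix are addressed by path rather than by name, so removing one uses the --prefix option of conda env remove instead of --name. The sketch below picks the right option based on whether the argument contains a slash. The function name remove_env is hypothetical, and the slash test is a simple heuristic, not something Conda itself does.

```shell
# Sketch: remove an environment given either a name or a full path.
# remove_env is a hypothetical helper name; an argument containing "/" is
# treated as a path (--prefix), anything else as a name (--name).
remove_env() {
    case "$1" in
        */*) conda env remove --prefix "$1" ;;
        *)   conda env remove --name "$1" ;;
    esac
}
```

For example, remove_env my-virtenv-py39 removes a named environment, while remove_env /projects/p10000/pythonenvs/env1 removes one created in a project directory.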

Using Anaconda in a Slurm Job

To submit jobs to the Slurm job scheduler, you need to run the main application from your Conda environment (for example, Python or R) in batch mode. There are a few steps to follow:

  1. Create an application script
  2. Create a Slurm job script that runs the application script
  3. Submit the job script to the job scheduler with sbatch

Your application script should consist of the sequence of commands needed for your analysis.

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job using Anaconda, a Slurm job script should look something like the following:

#!/bin/bash
#SBATCH --account=<allocation_id>
#SBATCH --partition=<partition_name>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00

module purge

eval "$(conda shell.bash hook)"

conda activate /projects/pXXXXX/env

python script.py
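
The final step, submitting the job script with sbatch, can be sketched as follows. The function name submit_and_show is hypothetical, and the job script filename you pass to it is whatever you saved the script above as. The sketch uses sbatch --parsable, which prints just the job ID (optionally followed by a semicolon and cluster name), and then shows the job's status with squeue.

```shell
# Sketch: submit a Slurm job script and show its queue status.
# submit_and_show is a hypothetical helper name, not a Slurm command.
submit_and_show() {
    local job_script="$1"
    local job_id
    if [ ! -f "$job_script" ]; then
        echo "Job script not found: $job_script" >&2
        return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    job_id=$(sbatch --parsable "$job_script") || return 1
    job_id=${job_id%%;*}
    echo "Submitted job $job_id"
    squeue --jobs="$job_id"
}
```

For example, if the job script is saved as job.sh (a hypothetical filename), run submit_and_show job.sh from the directory that contains it.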
Details

Article ID: 2064
Created: Mon 12/12/22 10:07 AM
Modified: Tue 12/13/22 11:00 AM
