Anaconda is a package and environment manager primarily used for open-source data science packages for the Python and R programming languages. It also supports other programming languages like C, C++, FORTRAN, Java, Scala, Ruby, and Lua.
Using Anaconda on Quest
To use Anaconda, first load the corresponding module:
module purge
module load python-miniconda3/4.12.0
This module is based on the minimal Miniconda installer. Included in all versions of Anaconda, Conda is the package and environment manager that installs, runs, and updates packages and their dependencies.
If you prefer to use Mamba, which is a drop-in replacement for most conda commands that enables faster package solving, downloading, and installing, you can load the corresponding mamba module:
module purge
module load mamba
The next step is to initialize your shell to use conda:
conda init bash
source ~/.bashrc
This modifies your ~/.bashrc file so that conda is ready to use every time you log in (without needing to load the module).
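To confirm that the initialization step took effect, one option is to look for the marker comment that conda init writes into your shell startup file. The sketch below assumes the standard ">>> conda initialize >>>" marker and a Bash startup file at ~/.bashrc; the function name has_conda_init is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given startup file contains the
# block that `conda init` manages (it starts with ">>> conda initialize >>>").
has_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"
    grep -q '>>> conda initialize >>>' "$rc_file" 2>/dev/null
}

# Check the default ~/.bashrc and report what was found.
if has_conda_init; then
    echo "conda init block found in ~/.bashrc"
else
    echo "no conda init block found; run: conda init bash"
fi
```

If the block is missing, rerunning conda init bash and then source ~/.bashrc should add it.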
If you want a newer version of Conda or Mamba than what is available in the module, you can also install them into your HOME directory as follows:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --output $HOME/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
Conda can also be configured with various options; see the Conda configuration documentation for details.
Creating Conda environments and installing packages
You can create new Conda environments in one of your available directories. Conda environments are isolated project environments designed to manage distinct package requirements and dependencies for different projects. We recommend using the mamba command for faster package solving, downloading, and installing. However, you can use the conda command, with various options, to install and inspect Conda environments.
The process for creating and using environments has a few basic steps:
- Create an environment with conda create
- Activate the environment with conda activate
- Install packages into the environment with conda install
To create a new Conda environment in your home directory, enter:
conda create --name <env_name>
where <env_name> is the name you want for your environment. Then activate the environment:
conda activate <env_name>
Once activated, you can then install packages into that environment:
conda install <pkg>
Below we demonstrate how to use Anaconda to install Python and R. The commands below will create an isolated environment in your HOME folder at ~/.conda/envs/my-virtenv-py39 with only Python 3.9 installed.
$ conda create --name my-virtenv-py39 python=3.9 --yes
$ conda activate my-virtenv-py39
Once the environment is activated, you have full control over what Python packages are installed into the environment. In many ways, you are the system administrator of this Python installation, as you have full read/write/execute privileges inside of that folder in your HOME directory. At this point, you can install additional Python packages into the environment with either conda or pip, Python's native package installer, which downloads packages from PyPI.
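Before installing packages, it can help to confirm that python and pip actually resolve inside the activated environment rather than to a system or module copy. The sketch below relies on the CONDA_PREFIX variable, which conda activate sets to the environment's directory; the function name tool_in_active_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the named command resolves to a
# path inside the active environment ($CONDA_PREFIX is set by `conda activate`).
tool_in_active_env() {
    local tool="$1" resolved
    [ -n "${CONDA_PREFIX:-}" ] || return 1          # no environment is active
    resolved="$(command -v "$tool")" || return 1    # command not found at all
    case "$resolved" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report where python and pip come from in the current shell.
for tool in python pip; do
    if tool_in_active_env "$tool"; then
        echo "$tool comes from $CONDA_PREFIX"
    else
        echo "$tool is NOT from an active Conda environment"
    fi
done
```

When pip is not reported as coming from the environment, running python -m pip install <pkg> is one way to make sure the package is installed for the interpreter that is actually active.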
The commands below will create an isolated environment in your HOME folder at ~/.conda/envs/my-virtenv-R-4d1 with only R 4.1 installed.
$ conda create --name my-virtenv-R-4d1 -c conda-forge r-base=4.1 --yes
Explaining Anaconda "channels"
An overwhelming majority of research software (including R, Python, Julia and more) is available via Anaconda in one of three locations. These locations are called "channels", which can be thought of as the remote/cloud repositories in which Conda looks for packages. The three channels are:
--channel=conda-forge
--channel=anaconda (a.k.a the default location)
--channel=bioconda
If you opt to install packages directly from Anaconda (which can help ensure that all the packages are compatible with each other), it is good practice to make sure they are all installed from the same remote repository/channel.
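One way to keep every package in a single channel is to name that channel with -c and add --override-channels, which tells conda to ignore any other channels configured in your .condarc. The sketch below only prints the resulting command so you can review it before running it; the function name single_channel_install_cmd and the package names are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print a `conda install` command that draws every
# listed package from one channel. Nothing is installed by this function.
single_channel_install_cmd() {
    local channel="$1"
    shift
    printf 'conda install -c %s --override-channels' "$channel"
    printf ' %s' "$@"    # append each package name, space-separated
    printf '\n'
}

# Example: build a command that installs two packages from conda-forge only.
single_channel_install_cmd conda-forge numpy pandas
```

After installing, conda list --show-channel-urls shows which channel each package in the active environment came from.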
Finally, if you are in need of C and/or C++ compilers, you can install those directly into your environment via
conda install -c conda-forge gxx_linux-64 gcc_linux-64
To share an environment with other members of your allocation, you can create the environment in a specific directory.
With conda, use the --prefix option to create the environment in a specific location, such as in your /projects/<allocationID> directory. To use the environment (either interactively from the command line or as part of your job submission script), you then need to point to this directory when activating it.
For example, to create a conda environment called env1 in a subdirectory of a project directory (here p10000 as an example), with the package sqlalchemy included, run the following:
$ module load python-miniconda3
$ cd /projects/p10000
$ mkdir pythonenvs
$ conda create --prefix /projects/p10000/pythonenvs/env1 sqlalchemy python=3.8 --yes
To use this environment, specify the full path to the environment when you activate it:
$ module load python-miniconda3
$ source activate /projects/p10000/pythonenvs/env1
Please note that a version of the main application you are using (e.g., Python or R) is installed in the Conda environment, so the module versions of these should not be loaded when the Conda environment is activated. Enter module purge to unload all loaded modules.
To deactivate an environment, enter:
conda deactivate
You can also create a new environment in your project directory instead, using the --prefix option. For example:
conda create --prefix /projects/<allocation_id>/<env_name>
where <allocation_id> is the name of your allocation. Then activate the environment:
conda activate /projects/<allocation_id>/<env_name>
To view a list of all your Conda environments, enter:
conda env list
To remove a Conda environment, enter:
conda env remove --name <env_name>
Using Anaconda in a Slurm Job
To submit jobs to the Slurm job scheduler, you need to run the main application from your Conda environment (for example, Python or R) in batch mode. There are a few steps to follow:
- Create an application script
- Create a Slurm job script that runs the application script
- Submit the job script to the job scheduler with sbatch
Your application script should consist of the sequence of commands needed for your analysis.
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job using Anaconda, a Slurm job script should look something like the following:
#!/bin/bash
#SBATCH --account=<allocation_id>
#SBATCH --partition=<partition_name>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
module purge
eval "$(conda shell.bash hook)"
conda activate /projects/pXXXXX/env
python script.py
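A common cause of failed batch jobs is activating a path that is not actually a Conda environment. As a minimal sketch, a check like the one below could be placed before the conda activate line of a job script. It assumes that a Conda environment directory contains a conda-meta subdirectory; the function name is_conda_env is hypothetical, and /projects/pXXXXX/env is the same placeholder path used in the job script.

```shell
#!/bin/bash
# Hypothetical pre-flight check: Conda environments contain a conda-meta
# directory, so its absence suggests a wrong or incomplete path.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

ENV_PATH="/projects/pXXXXX/env"   # placeholder; replace with your environment path
if is_conda_env "$ENV_PATH"; then
    echo "found Conda environment at $ENV_PATH"
else
    echo "no Conda environment at $ENV_PATH" >&2
fi
```

In a real job script you would likely exit with a non-zero status in the else branch so that the job stops before running the application with the wrong Python.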