Setting up your Python environment with conda

Anaconda, conda and Miniconda

A package manager is a tool for adding (and removing) software packages to your system. These packages can be software libraries (like matplotlib or pandas) that add extra functions to Python, or they can be tools (like jupyter lab or samtools) that can be run as commands on the command line.

The conda package manager was created by authors from the Anaconda project. Anaconda is a distribution of Python, that is, a way of packaging the Python language interpreter together with a set of tools and software packages to make it more useful. In addition to giving users ways to install additional packages, conda allows users to set up environments, which are particular combinations of packages.

These environments allow users to install just the packages they need for a particular task. Besides environments, the conda package manager has channels, which are collections of packages from a group of authors. Two notable channels are conda-forge and bioconda, which provide general open source packages and packages for bioinformatics respectively.

We will set up our Python using Miniconda, a stripped-down version of the Anaconda Python distribution, set up the conda-forge and bioconda channels and then set up an environment to run jupyter lab. In doing so we will, in fact, follow the instructions from the bioconda Getting Started guide.

Install Miniconda

Start by opening a terminal (you should have a Bash command line environment set up already) and downloading the Miniconda installer. You can find the Linux installers on this page and macOS installers here. Please note that WSL2 in Windows is, essentially, Linux, so you should use the instructions for Linux.

For Linux, one of these commands should work:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

or

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Then run the downloaded installer with

bash Miniconda3-latest-Linux-x86_64.sh
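The commands above are for Linux on x86_64 hardware. If you are on macOS or on an ARM machine, the installer has a different file name. The following is a minimal sketch that builds the file name from the output of uname. It assumes the installers follow the Miniconda3-latest-PLATFORM-ARCH.sh naming pattern; not every architecture has an installer, so check the result against the download page before using it. The sketch only prints the download command; it does not run it.

```shell
# Map the kernel name reported by uname to the platform part of the file name.
case "$(uname -s)" in
    Linux)  platform="Linux" ;;
    Darwin) platform="MacOSX" ;;   # uname reports macOS as "Darwin"
    *)      platform="unknown" ;;
esac

# Hardware architecture, e.g. x86_64, aarch64 or arm64.
arch="$(uname -m)"

# Assumed naming pattern: Miniconda3-latest-PLATFORM-ARCH.sh
installer="Miniconda3-latest-${platform}-${arch}.sh"

if [ "$platform" = "unknown" ]; then
    echo "No installer name known for this system; check the download page" >&2
else
    echo "curl -O https://repo.anaconda.com/miniconda/${installer}"
fi
```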

The installer will ask you a number of questions. If you answer with the defaults it will install in the ~/miniconda3 directory. Please choose the option to run conda init. This adds some code to your ~/.bashrc, a file that is run each time you open a new Bash shell (e.g. when logging in or opening a new terminal). After the installation has finished you should exit your terminal and start a new one so that you have a session with conda active. You can see a recording of a walkthrough of the installation procedure here. Note that I skipped through the long text of the End User License Agreement by pressing the q key.
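Once you have opened a new terminal, it is worth confirming that the conda command can be found. The sketch below is one way to do that. The variable name conda_status is only used for this example.

```shell
# Check whether the conda command is on the PATH of this shell.
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    conda --version    # prints the installed conda version
else
    conda_status="missing"
    echo "conda not found: open a new terminal, or check that conda init was run"
fi
```

If conda is found you should also see (base) at the start of your command prompt.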

Set up channels

As bioinformaticians we are particularly interested in the software available via the bioconda channel. Bioconda is a community project packaging bioinformatics software, maintained by over 500 volunteers from around the world. As of this writing there are over 7000 packages available via bioconda. If you make a bioinformatics tool, making a package available through bioconda (and ideally a Galaxy tool wrapper) is an excellent way to share it with the world. Bioconda in turn relies on the conda-forge project. Like bioconda, conda-forge is maintained by an international community of volunteers, but instead of focusing on bioinformatics, the project focuses on software of general interest. For example, the commonly used bioinformatics package samtools is from bioconda, but it depends on the libgomp library (GNU's OpenMP runtime) that is provided by conda-forge.

To set up the channels needed for bioconda support, run:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
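Each conda config --add command puts the new channel at the top of the channel list, so the channel added last (conda-forge) ends up with the highest priority. As a quick check, the sketch below asks conda for its channel list and reports whether each of the three channels is present. The helper name channel_configured is made up for this example, and the pattern it searches for assumes the list is printed one channel per line in the form "- name".

```shell
# Hypothetical helper: succeeds if the named channel is in conda's channel list.
# Errors (for example, conda not being installed) count as "not configured".
channel_configured() {
    conda config --show channels 2>/dev/null | grep -q -- "- $1\$"
}

for channel in conda-forge bioconda defaults; do
    if channel_configured "$channel"; then
        echo "${channel}: configured"
    else
        echo "${channel}: not configured"
    fi
done
```

You can also run conda config --show channels on its own to see the full list in priority order.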

Install the mamba installer and create an environment for jupyter lab

Each software package depends on others. These are its dependencies. Sometimes different software on the same computer might want different and conflicting dependencies. To ensure that dependencies do not clash, conda supports the concept of environments. Using environments allows only the dependencies we need for a particular task to be installed.

Computing which dependencies are needed is done at install time by conda. Since this can be a time-consuming process, we will install the mamba installer, which does the same job faster. Because mamba is generally useful we will install it in the base environment. The base environment (identified by (base) in the command prompt) is the one that you are in when conda starts. In general you should not install software in the base environment, in order to keep it as simple as possible. Rather, use a different environment for each software package you want or each project you are working on.

For our Python exploration we want to use Jupyter Lab, which is called jupyterlab in the conda package system. To install mamba and then create an environment for our work, run:

conda install -y mamba
mamba create -n jupyterlab -y jupyterlab

(the -y flag means go ahead without asking for confirmation from the user)
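To confirm that the environment was created, you can look for it in the output of conda env list. The sketch below wraps that check in a small function. The name env_exists is made up for this example, and it assumes that the environment name is the first column of each line that conda env list prints.

```shell
# Hypothetical helper: succeeds if an environment with this exact name exists.
# Errors (for example, conda not being installed) count as "does not exist".
env_exists() {
    conda env list 2>/dev/null | awk '{print $1}' | grep -qx -- "$1"
}

if env_exists jupyterlab; then
    echo "The jupyterlab environment is ready"
else
    echo "No jupyterlab environment found; re-run the mamba create command"
fi
```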

Activating the environment and starting Jupyter Lab

Now that we have the jupyterlab environment we can activate it with conda activate. We want to start Jupyter Lab in the directory where we will do our work. These commands show how to activate the environment and start Jupyter Lab.

conda activate jupyterlab
mkdir -p ~/Documents/intro_to_python
cd ~/Documents/intro_to_python
jupyter lab

The recording for all the steps from setting up channels to starting jupyter lab is available here. Note that I start jupyter lab in the background (with jupyter lab &). I need to do this to be able to save the recording; you can start it with a simple jupyter lab.

Note: After following these instructions, jupyter lab is installed in the jupyterlab environment and you need to activate that environment each time before using the command. That is:

conda activate jupyterlab
jupyter lab
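If the jupyter command is not found, the most likely cause is that a different environment is active. When conda activate runs it records the name of the active environment in the CONDA_DEFAULT_ENV variable, so a sketch like the following can tell you where you are. The variable name active_env is only used for this example.

```shell
# conda activate records the active environment's name in CONDA_DEFAULT_ENV.
# Fall back to "none" if the variable is not set (conda is not active).
active_env="${CONDA_DEFAULT_ENV:-none}"

if [ "$active_env" = "jupyterlab" ]; then
    echo "jupyterlab is active; jupyter lab should be available"
else
    echo "Active environment is '${active_env}'; run: conda activate jupyterlab"
fi
```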

And finally note that jupyter lab will open a window in a browser on your computer, so it needs to be run on the computer that you are working on, not (in general) on a remote computer.

Having completed these instructions you are now ready to proceed with learning Python.