Anaconda, conda and Miniconda
A package manager is a tool for installing (and removing) software packages on your system. These packages can be software libraries (like matplotlib or pandas) that add extra functions to Python, or they can be tools (like jupyter lab or samtools) that can be run as commands on the command line.
The conda package manager was created by the developers of the Anaconda project. Anaconda is a distribution of Python, that is, a way of packaging the Python language interpreter together with a set of tools and software packages to make it more useful. In addition to installing packages, conda lets users set up environments, which are particular combinations of packages.
These environments allow users to install just the packages they need for a particular task. Besides environments, conda has channels, which are collections of packages maintained by a group of authors. Two notable channels are conda-forge and bioconda, which provide general open source packages and packages for bioinformatics, respectively.
We will install Python using Miniconda, a stripped-down version of the Anaconda Python distribution, configure the conda-forge and bioconda channels, and then create an environment to run jupyter lab. In doing so we will, in fact, follow the instructions from the bioconda Getting Started guide.
Start by opening a terminal (you should have a Bash command line environment set up already) and downloading the Miniconda installer. You can find the Linux installers on this page and macOS installers here. Note that WSL2 on Windows is essentially Linux, so you should follow the Linux instructions.
For Linux, one of these commands should work:
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
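The installer file name depends on your operating system and processor. As a rough sketch, here is a Bash function (miniconda_installer_url is a hypothetical name of our own) that builds the download URL from the output of uname. It assumes the file naming currently used on repo.anaconda.com and only covers the common 64-bit Intel and ARM machines.

```shell
# Hypothetical helper: print the Miniconda installer URL for this machine.
# Assumes the naming scheme used on repo.anaconda.com/miniconda/, e.g.
# Miniconda3-latest-Linux-x86_64.sh or Miniconda3-latest-MacOSX-arm64.sh.
# Optional arguments override the detected OS and CPU (useful for testing).
miniconda_installer_url() {
    local os="${1:-$(uname -s)}"    # e.g. Linux or Darwin (macOS)
    local arch="${2:-$(uname -m)}"  # e.g. x86_64, aarch64 or arm64
    local platform

    # Linux on 64-bit ARM reports aarch64; Apple Silicon Macs report arm64
    case "$os/$arch" in
        Linux/x86_64|Linux/aarch64) platform="Linux" ;;
        Darwin/x86_64|Darwin/arm64) platform="MacOSX" ;;
        *)
            echo "no known Miniconda installer for $os/$arch" >&2
            return 1
            ;;
    esac

    echo "https://repo.anaconda.com/miniconda/Miniconda3-latest-${platform}-${arch}.sh"
}
```

With this function defined, curl -O "$(miniconda_installer_url)" downloads the installer that matches your machine; on an Apple Silicon Mac, for example, it would fetch Miniconda3-latest-MacOSX-arm64.sh.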
Then run the downloaded installer with:
bash Miniconda3-latest-Linux-x86_64.sh
The installer will ask you a number of questions. If you accept the defaults, it will install into the ~/miniconda3 directory. Please choose the option to run conda init. This adds some code to your ~/.bashrc, a file that is run each time you open a new Bash shell (e.g. when logging in or opening a new terminal). After the installation has finished, exit your terminal and start a new one so that you have a session with conda activated. You can see a recording of a walkthrough of the installation procedure here. Note that I skipped through the long text of the End User License Agreement by pressing the q key.
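conda init marks the code it adds to ~/.bashrc with the comment lines "# >>> conda initialize >>>" and "# <<< conda initialize <<<". A minimal sketch, assuming those markers (has_conda_init is a hypothetical function name), for checking whether that step took effect:

```shell
# Hypothetical helper: succeed if the given file (default ~/.bashrc)
# contains the start marker of the block that `conda init` writes.
has_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"
    # -q: only the exit status matters; -F: match a fixed string, not a regex
    grep -qF '# >>> conda initialize >>>' "$rc_file" 2>/dev/null
}

if has_conda_init; then
    echo "conda init has been run for this user"
else
    echo "no conda initialize block found; try: ~/miniconda3/bin/conda init bash"
fi
```

The suggested fix in the else branch assumes the default ~/miniconda3 install location.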
Set up channels
As bioinformaticians we are particularly interested in the software available via the bioconda channel. Bioconda is a community project packaging bioinformatics software, maintained by over 500 volunteers from around the world. As of this writing there are over 7000 packages available via bioconda. If you make a bioinformatics tool, making a package available through bioconda (and ideally a Galaxy tool wrapper) is an excellent way to share it with the world. Bioconda in turn relies on the conda-forge project. Like bioconda, conda-forge is maintained by an international community of volunteers, but instead of focusing on bioinformatics, the project focuses on software of general interest. For example, the commonly used bioinformatics package samtools comes from bioconda, but it depends on the libgompmaths library that is provided by conda-forge.
To set up the channels needed for bioconda support, run:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
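conda config --add puts a channel at the top of the channel list, giving it the highest priority, so after the three commands above conda-forge is searched first, then bioconda, then defaults. A minimal sketch for checking this: check_channel_order is a hypothetical helper, and it assumes the "  - name" list layout that conda config --show channels prints.

```shell
# Hypothetical helper: read `conda config --show channels` output on stdin
# and succeed only if the channels appear in the order set up above.
check_channel_order() {
    local actual
    # keep only list items ("  - name"), strip the prefix, join with spaces
    actual=$(sed -n 's/^ *- //p' | paste -sd ' ' -)
    if [ "$actual" = "conda-forge bioconda defaults" ]; then
        echo "channel order OK: $actual"
    else
        echo "unexpected channel order: $actual" >&2
        return 1
    fi
}
```

Run it as conda config --show channels | check_channel_order.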
Install the mamba installer and create an environment for jupyter lab
Each software package depends on others; these are its dependencies. Sometimes different software on the same computer needs different, conflicting versions of the same dependency. To ensure that dependencies do not clash, conda supports the concept of environments. Using environments means that only the dependencies we need for a particular task are installed together.
Working out which dependencies are needed is done by conda at install time. Since this can be a time-consuming process, we will install the mamba installer, a faster drop-in replacement for conda's install and create commands. Because mamba is generally useful we will install it in the base environment. The base environment (identified by (base) in the command prompt) is the one that you are in when conda starts. In general you should not install software in the base environment, in order to keep it as simple as possible. Rather, use a different environment for each software package you want or each project you are working on.
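One way to follow this advice is a small guard before installing anything. This sketch relies on conda setting the CONDA_DEFAULT_ENV variable to the name of the active environment; not_in_base is a hypothetical name.

```shell
# Hypothetical guard: refuse to continue while the base environment is active.
# conda sets CONDA_DEFAULT_ENV to the active environment's name,
# e.g. "base" or "jupyterlab"; it is unset when no environment is active.
not_in_base() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
        echo "You are in the base environment; run 'conda activate <env>' first." >&2
        return 1
    fi
}
```

For example, not_in_base && mamba install -y pandas only installs pandas when you are outside the base environment.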
For our Python exploration we want an environment with Jupyter Lab, which is called jupyterlab in the conda package system. To install mamba and then create an environment for our work, run:
conda install -y mamba
mamba create -n jupyterlab -y jupyterlab
(The -y flag means go ahead without asking the user for confirmation.)
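If you re-run these setup steps later, you may want to skip creating an environment that already exists. A minimal sketch, assuming the usual conda env list layout (comment lines starting with "#", then one line per environment with its name in the first column); env_exists is a hypothetical helper.

```shell
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if an environment with the given name is listed.
env_exists() {
    # skip comment lines; compare the first column with the requested name;
    # exit status 0 if found, 1 otherwise
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}
```

Used as conda env list | env_exists jupyterlab || mamba create -n jupyterlab -y jupyterlab, the environment is only created when it is missing.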
Activating the environment and starting Jupyter Lab
Now that we have the jupyterlab environment we can activate it with conda activate. We want to start Jupyter Lab in the directory where we will do our work. These commands activate the environment, create a working directory, change into it and start Jupyter Lab:
conda activate jupyterlab
mkdir -p ~/Documents/intro_to_python
cd ~/Documents/intro_to_python
jupyter lab
The recording of all the steps from setting up channels to starting jupyter lab is available here. Note that I start jupyter lab in the background (with jupyter lab &). I need to do this to be able to save the recording; you can simply start it with jupyter lab.
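For completeness, here is a sketch of what running Jupyter Lab in the background involves. In Bash, $! holds the process ID of the most recently started background job, which you can later pass to kill. The function names start_lab_bg and stop_lab_bg and the default log file name jupyter-lab.log are hypothetical.

```shell
# Hypothetical helpers for running Jupyter Lab in the background.
# Output goes to a log file (LAB_LOG, default jupyter-lab.log) so that it
# does not clutter the terminal.
start_lab_bg() {
    jupyter lab "$@" > "${LAB_LOG:-jupyter-lab.log}" 2>&1 &
    LAB_PID=$!   # process ID of the background job just started
    echo "Jupyter Lab started with PID $LAB_PID"
}

stop_lab_bg() {
    if [ -n "${LAB_PID:-}" ] && kill "$LAB_PID" 2>/dev/null; then
        echo "Stopped Jupyter Lab (PID $LAB_PID)"
        unset LAB_PID
    else
        echo "No running Jupyter Lab started by start_lab_bg" >&2
        return 1
    fi
}
```

Call start_lab_bg directly rather than inside $( ... ), because the PID is stored in a variable of the current shell.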
Note: After following these instructions, jupyter lab is installed in the jupyterlab environment, and you need to activate that environment each time before using the command, i.e.:
conda activate jupyterlab
jupyter lab
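To save typing, you could wrap these steps in a shell function in your ~/.bashrc. This is a sketch that assumes conda init has already been run, so that conda activate works in your shell; the name lab is hypothetical.

```shell
# Hypothetical convenience function for ~/.bashrc (place it below the
# conda initialize block). Usage: lab [directory]
lab() {
    local dir="${1:-.}"                     # default: the current directory
    mkdir -p "$dir" && cd "$dir" || return 1
    conda activate jupyterlab || return 1   # environment created earlier
    jupyter lab
}
```

Running lab ~/Documents/intro_to_python then creates the directory if needed, changes into it, activates the environment and starts Jupyter Lab. It is a function rather than a script because a function runs in your current shell; a separate script could neither change your terminal's directory nor activate the environment in it.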
Finally, note that jupyter lab opens a window in a web browser on your computer, so it needs to be run on the computer that you are working on, not (in general) on a remote computer.
Having completed these instructions you are now ready to proceed with learning Python.