2. Environments
There are two main types of environments that you need to consider when developing code in Python. The bash environment which knows about the shell and your OS, and the python environment which knows about the python packages you are using.
2.1. Bash environment
The bash environment is defined by a set of environment variables.
2.1.1. Environment variables
It is a key-value pair, stored by the OS, used by programs and shell scripts to configure the system behavior.
The key is the variable name, and the value is the variable value.
To set an environment variable, you can use the export command. For example:
export MY_VARIABLE="my_value"
To print the value of an environment variable, you can use the echo command. For example:
echo $MY_VARIABLE
with the dollar sign $ to get the value of the variable (think of dollar/value).
You can list the environment variables currently set in your bash session by typing:
printenv
This is a bash command that is very important, you should not forget about it.
Exercise: Connect to CSD3 and list the environment variables.
The main environment variables you need to know are:
PATH: Defines the directories where the system searches for executable programs. It’s essential for adding research software tools like compilers, or compiled programs.
Exercise: On your local machine list the executables found in some of the directories in your PATH.
LD_LIBRARY_PATH(Linux) /DYLD_LIBRARY_PATH(macOS): Used to specify additional directories where the system looks for shared libraries (.so or .dylib files). For example, such librairies can be math libraries (e.g., LAPACK, BLAS, FFTW, GSL, etc.).
Note: In LD_LIBRARY_PATH, the LD stands for “Linker/Loader”. It refers to the dynamic linker or dynamic loader, which is the part of the operating system responsible for loading shared libraries (also known as dynamic libraries) when an executable program is run.
HOME: Your home directory.USER: Holds your username.PYTHONPATH: Used to specify additional directories where the system looks for Python packages. Typically, where the systems look when you runimport ...in Python.
Exercise: Open a Colab notebook and print the values of each of these environment variables.
Remember to use the ! to run the bash commands in Colab.
2.1.2. Adding to your PATH
You can add to your PATH by appending the directory to the PATH variable. For example:
export PATH=$PATH:/path/to/your/program
You can also prepend to your PATH by adding the directory at the beginning of the PATH variable. For example:
export PATH=/path/to/your/program:$PATH
Important: The order in which you add directories to your PATH is important. The first directory in the PATH is the first one that is searched.
Exercise: Add a directory to your PATH and check that it works. If you have a directory that contains an executable, add this one and try running the program from anywhere in the system.
2.1.3. Adding to your path variables
You can add to any path variables in exactly the same way as we have seen for PATH above.
For example, to add a directory to your PYTHONPATH, you can use:
export PYTHONPATH=/path/to/your/program:$PYTHONPATH
2.1.4. Making the changes persistent
You can make the changes persistent by adding them to your ~/.bashrc (or ~/.bash_profile on macOS) file. For example, to add a directory to your PATH, you can type in your terminal:
echo "export PATH=/path/to/your/program:$PATH" >> ~/.bashrc
You can then reload the ~/.bashrc file by typing:
source ~/.bashrc
so the changes are applied in your current shell as well.
Exercise: Do the following:
Add a fake directory to your
PATH. For instance/path/to/nowhere, using theexportcommand.Use echo to check the
PATHhas been updated.Close your current shell (i.e., Terminal) and re-open it (or open a new one).
Check the
PATHand note that the fake path is not there.Redo 1-4 but this time modify the
~/.bashrcfile to make the change persistent before step 3.Check that the change is persistent.
To undo the change, remove the new line added to the ~/.bashrc file. Use vim or VS code to do so.
2.2. Python environment
Python was invented in December 1989 by Guido van Rossum at the Centrum Wiskunde & Informatica (abbr. CWI; English: “National Research Institute for Mathematics and Computer Science”) in the Netherlands.
Python is one of the most popular programming languages, for its widespread use in machine learning and data science.
The name Python comes from the British comedy group Monty Python. You will occasionnaly find some further references, such as the use of the terms “spam” and “eggs”. Python is fun.
As researcher data scientists, your Python environment consists mainly of three components:
Python interpreter/version,
Virtual environment(s),
Jupyter notebook and kernel.
2.2.1. Python interpreter
The Python interpreter is the program that runs your Python code. It is the interface between your code and the Python language.
Python (i.e., a Python interpreter) may not be natively installed on your system.
If you type python in your terminal, and get an error such as:
bash: python: command not found
then you need to install Python.
On MacOS you can install Python (i.e., a Python interpreter) using Homebrew, e.g.,
brew install python
On Linux you can install Python using your package manager, e.g.,
sudo apt-get install python
On CSD3 several Python interpreters are available. You can load the one you need using the module command. For example:
module load miniconda/3
You can then check the default python version you get from the loaded module by typing:
python --version
If you then start a Python session by typing python in your terminal, you should see something like this:
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>> is the Python prompt. You can type Python code between the prompt and press Enter to execute the code.
To exit the Python interpreter, you can type exit() and press Enter.
Note for CSD3: if you don’t use this command, other Pythons are available, but you need to explicitly type the python version you want to use. They are stored in /usr/bin. You can list them using ls /usr/bin/python*.
Now, in a Python session, type import platform; platform.python_implementation() and press Enter. You should see something like this:
'CPython'
Your Python interpreter is CPython. This is the reference implementation of the Python programming language. Written in C and Python, CPython is the default and most widely used implementation of the Python language (Wikipedia).
To check the version of Python you are using from within Python, type:
import sys
print(sys.version)
Exercise: Check the version of Python and interpreter you are using, in Colab, your laptop and CSD3.
There are other implementations of Python, such as PyPy, Jython and IronPython. You can read about them online.
2.2.2. Virtual environments
If you type pip list in a Terminal you will see the packages currently installed in your Python environment.
Exercise: Try pip list in Colab (it runs the shell command from within the notebook), your laptop and CSD3.
Packages can have conflicts, and some software may require a specific version of a package.
A Python environment is a directory that contains a Python interpreter and a curated set of installed packages readily available.
Good practice is to create a new virtual environment for each project.
There are multiple tools to create and manage virtual environments. Here we use venv. See here for the documentation.
To create a new virtual environment, you can use the following command:
First, create a directory to store the virtual environments, e.g., venvs.
mkdir venvs
Then, create the virtual environment in this directory, do:
python -m venv <path/to/venvs/nameofenv>
Here, -m venv specifies that we use the venv module to create a virtual environment. The -m option tells Python to run the venv module.
<path/to/venvs/nameofenv> is the path to the directory where the virtual environment will be created, and nameofenv is the name of the virtual environment. Just above, we have created the directory venvs. We can use any name for the virtual environment, e.g., venv_lecture2 or anything else.
To activate the virtual environment, you can use the following command:
source <path/to/venvs/nameofenv/bin/activate>
In the Terminal, you should see something like this:
(nameofenv) <your-prompt>
Now if you run the command pip list, you will see the packages currently installed in your virtual environment. This should be different from the packages installed in your global Python environment.
To deactivate the virtual environment, you can use the following command in the Terminal:
deactivate
This will return you to your global Python environment.
Exercise: Working with Virtual Environments
Create a new virtual environment named ‘venv_lecture2’ using Python 3.11
Hint: Use
python3.11 -m venv venvs/venv_lecture2Activate the virtual environment
Hint: Use
source venvs/venv_lecture2/bin/activateUse the
treecommand to see the structure of your virtual environmentHint: Use
tree venvs/venv_lecture2Check the Python version in your virtual environment
Hint: Use
python --versionList all packages installed in your virtual environment
Hint: Use
pip listInstall a new package (e.g.,
numpy) in your virtual environmentHint: Use
pip install numpyUse
treeagain to see how the structure has changed after installing numpyHint: Use
tree venvs/venv_lecture2List the packages again to confirm the installation
Hint: Use
pip listDeactivate the virtual environment
Hint: Use
deactivate
Important: To create an environment with a specific Python version, just use the Python version number, e.g., python3.11 -m venv <envdir>/<nameofenv> when creating the environment.
2.2.3. Fresh and non-fresh environments
You can create a virtual environment while allowing it to access packages installed in your global environment. To do so use the --system-site-packages option:
python -m venv --system-site-packages <path/to/venvs/nameofenv>
Exercise: Create an environment with access to your global packages. and use pip list to check that you can see the packages in your global environment.
This can get confusing though because some packages installed in your global environment may be incompatible with the new packages you will want to install in your virtual environment.
2.2.4. Managing Python versions
The Python language evolves fast. Currently, a new version is released every year.
You can consult the status of Python versions here and read about Python development on the same website.
You can install new Python versions on your machine using tools like Anaconda, Miniconda (i.e., conda) or Homebrew (on macOS, brew).
Exercise: Checking Python Versions on CSD3 with Miniconda
Log in to CSD3 if you haven’t already
Load the Miniconda module Hint: Use
module load miniconda/3List the available Python versions Hint: Type
which pythonand then list the files in the directory returned by this command.Now list the versions available by default:
ls /usr/bin/python*and try to start a Python session with one of them. Hint: Just typepython3.11(or another version number) and press Enter.
What is the difference between the versions available by default and the one loaded by the Miniconda module?
2.2.5. Jupyter and IPython kernels
Jupyter provides an interactive environment to run Python code. It typically opens in a web browser. The most common feature is the notebook, which corresponds to what you get in Colab.
Jupyter is an evolution of IPython, “interactive Python”. IPython is an interactive shell that supports more features than the standard Python shell. Type ipython in your terminal and you will understand why.
We don’t generally use IPython, but instead use Jupyter which is based on it.
To start a Jupyter notebook, having access to your Python environment, you need to create the relevant kernels. One kernel is created for each environment.
Assume your virtual environment is called venv_lecture2 and you have activated it.
To create the kernel for this environment, you can use the following command:
python -m ipykernel install --user --name venv_lecture2 --display-name "Python (venv_lecture2)"
This creates a file (kernel) that contains info about the environment and can be propagated to Jupyter. Typically this file is in:
<path/to/Jupyter>/kernels/kernels/<venvs>/kernel.json
Exercise: Create a new kernel from a new environment. Find the kernel.json file and look at it using vim.
You can then start a Jupyter notebook by typing jupyter-lab in your terminal.
2.3. Symbolic links and aliases
When you load or source an environment or install new codes, aliases and symbolic links may be created.
2.3.1. Symbolic links
A symbolic link is a file that is pointer to another file or directory. To create a symbolic link, you can use the ln -s command. For example:
ln -s /path/to/target /path/to/link
To see which file a symbolic link points to, you can use the ls -l command. For example:
ls -l /path/to/link
For instance, on CSD3, there are symbolic links to useful Job submission files. (A job means a heavy computation that is submitted to the batch system.)
2.3.2. Aliases
An alias is a shortcut for a command. To list the aliases currently set in your bash session, you can use the alias command.
alias
For instance on CSD3, by default, we have
alias vi='vim'
So that typing vi is equivalent to typing vim and opens the vim editor.
2.4. Workflow with environments
You create a Python environment for each project. You need to store the information about your projects and relevant environment in a well organised manner.
A good way to stay organised and keep track of what you are doing is certainly to create a github repository for each project, with a README explaining what to setup and how to do it, and storing your codes in it.
2.4.1. Bash scripts to load environments
To make your life easier, you can create bash scripts to load your environments. One bash script per environment.
The script is used to update environment variables and also activate relevant Python virtual environment. It would look something like this:
#!/bin/bash
# Step 1: Define environment variables
export MY_VAR1="value1"
export MY_VAR2="value2"
export PATH="/my/custom/path:$PATH"
export LD_LIBRARY_PATH="/my/custom/path:$LD_LIBRARY_PATH"
# Step 2: Source the Python virtual environment
# Replace the path below with the actual path to your virtual environment's 'activate' script
source /path/to/your/venv/bin/activate
# Optional: Print a message to confirm the environment is set
echo "Environment variables are set, and the virtual environment is activated."
and saved as activate_my_env.sh.
You can then source the script to load the environment, including bash variables and the Python virtual environment:
source activate_my_env.sh
Exercise: Create a bash script to load your Python environment and check that it works.
2.4.2. Note on File Permissions
You may have noticed that when you list with the ls -alh command, the first thing to appear is a string of letters and numbers, such as -rwxr-xr-x.
This represents the permissions of the file.
When you create a bash script, it is generally not executable. To make it executable, you can use the chmod command. For example:
chmod +x activate_my_env.sh
Exercise: Show the permission of the bash script from above before and after making it executable with chmod.