1. Getting started
In this lecture, we introduce the material, tools and workflow we will use throughout the course.
1.1. Operating Systems
An operating system is commonly abbreviated as OS.

You may remember the film HER (2013) where Joaquin Phoenix falls in love with his computer’s operating system (Scarlett Johansson’s voice).
The three main OSes are:
Windows (Microsoft)
MacOS (Apple) – in fact a Unix-based OS
Linux (Open Source)
Amazon AWS and Microsoft Azure are not operating systems in the traditional sense: they are cloud platforms that provide virtualized OS instances “on the cloud”.
An OS is the interface between you and the hardware.
Note: macOS and Linux platforms are sometimes referred to as POSIX systems, because they largely follow the POSIX (“Portable Operating System Interface”) family of standards for Unix-like systems.
1.2. Graphical User Interface (GUI)
The GUI is a visual interface to the OS.
It allows you to interact with the OS using graphical icons and a mouse.
Common examples of GUIs are:
Windows Explorer
MacOS Finder
1.3. Command Line Interface (CLI)
The CLI is a text-based interface to the OS.
It allows you to pass commands to the OS.
It displays a plain text-based screen with a prompt (e.g., > or $) where you can type commands.
Common examples of CLIs are:
Unix/Linux Terminal
macOS Terminal
Windows Terminal
Note concerning Windows: In this course we will not cover Windows or macOS in detail, but rather Linux. macOS is Unix-like, so most of the commands we use carry over directly. Windows is not Unix-like, so Windows users should refer to the Windows Subsystem for Linux (WSL), which gives them access to a Linux environment. See this LinkedIn Learning Tutorial recommended by Hadi (MPhil DiS 1). See also the Microsoft docs: Install WSL on Windows.
1.4. Shell
The shell interprets and executes the commands typed in the CLI.
Common shells are:
Unix/Linux: Bash, Zsh
Windows: PowerShell
Open a terminal on your OS and type:
echo $SHELL
This will print the path to your default (login) shell.
On CSD3 it prints:
/bin/bash
On a Mac it prints:
/bin/zsh
BASH stands for Bourne Again SHell, and ZSH stands for Z Shell (a separate shell that is largely compatible with Bash and adds extra interactive features). We will not distinguish between the two in this course and refer to them simply as “the shell”.
A shell script is a program that can be directly executed by the shell. It consists of a sequence of commands that you could also type one by one in your terminal.
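As a minimal sketch of what such a script can look like, the example below could be saved in a file (the name hello.sh is hypothetical) and run with bash hello.sh. The variable name current_dir is also just an illustration, not something defined in the course material.

```shell
#!/bin/bash
# hello.sh (hypothetical file name): every line below is a command
# that you could also type directly at the prompt.

# Print the path of the default shell.
echo "Default shell: $SHELL"

# Store the output of pwd in a variable, then print it.
current_dir="$(pwd)"
echo "Current directory: $current_dir"
```

The first line (the "shebang") tells the OS which shell should interpret the file when the script is executed directly.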
1.5. Linux
Linux is the most widely used OS in scientific computing. In fact, it refers to a family of OSes which are Unix-like. (MacOS is also Unix-like, but not Windows!)
Popular Linux distributions (or distros) include Debian, Fedora, Arch, and Ubuntu. Commercial distributions include Red Hat Enterprise Linux and SUSE Linux Enterprise. Desktop Linux distributions include a windowing system such as X11 or Wayland and a desktop environment such as GNOME, KDE Plasma or Xfce (Wikipedia).
The Ice Lake nodes on CSD3 (that you are probably going to use) are running Rocky Linux 8, which is a rebuild of Red Hat Enterprise Linux 8 (RHEL8).
Google Colab also runs on a Linux environment. The virtual machines (VMs) behind Google Colab are based on Ubuntu. At the time of writing, Colab’s default runtime is Ubuntu 20.04 LTS, but this changes over time.
You can type:
uname -a
to find out what your kernel/runtime is. On Unix/Linux and macOS systems, this command displays various details about the system, such as:
Kernel name
Node name (hostname)
Kernel version and release
Machine hardware name
Processor architecture
OS
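The fields printed by uname -a can also be requested one at a time. The sketch below uses the standard single-letter uname options; the variable name kernel_name is only there for illustration, and the values in the comments are typical examples, not guaranteed output.

```shell
# Query individual fields instead of the full "uname -a" line.
kernel_name="$(uname -s)"   # kernel name, e.g. Linux or Darwin (macOS)
echo "Kernel name: $kernel_name"

uname -n   # node name (hostname)
uname -r   # kernel release
uname -v   # kernel version
uname -m   # machine hardware name, e.g. x86_64 or arm64
```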
Note that when you install Windows Subsystem for Linux (WSL), you can choose a Linux distribution. By default, it is Ubuntu.
1.6. Useful terminal commands
We refer to the following commands throughout the course, and call them “bash commands”.
echo
echo <text>
prints the text.
We use echo to display the value of a variable.
NAME="Boris"
echo "Hello, $NAME"
will print:
Hello, Boris
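How the text is quoted matters when a variable is involved. The short sketch below reuses the NAME variable from above; the other variable names and the file name are made up for the illustration.

```shell
NAME="Boris"

# Double quotes: the shell replaces $NAME by its value.
double_quoted="Hello, $NAME"
# Single quotes: the text is kept literally, no replacement happens.
single_quoted='Hello, $NAME'
# Braces mark where the variable name ends; without them the shell
# would look for a variable called NAME_notes.
with_braces="${NAME}_notes.txt"

echo "$double_quoted"   # prints: Hello, Boris
echo "$single_quoted"   # prints: Hello, $NAME
echo "$with_braces"     # prints: Boris_notes.txt
```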
pwd
pwd
tells you where you are in the file system.
ls
ls <directory>
lists the files and directories in the specified directory.
We often use the following options with ls:
-a: all files (including hidden files, see section on hidden files below)
-l: long format (with permissions, number of links, owner, group, size, and timestamp)
-h: human readable (e.g., 1493934 bytes -> 1.4 MB)
-S: sort by size from largest to smallest
So you would type:
ls -alhS
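To try these options without touching your own files, the following sketch builds a small throwaway directory first. The directory comes from mktemp -d and the file names are invented for the example; mktemp, printf and head are not covered in these notes and are used here only to create test files of known size.

```shell
# Create a temporary directory with two files of different sizes.
demo_dir="$(mktemp -d)"
printf 'x' > "$demo_dir/small.txt"               # 1 byte
head -c 2048 /dev/zero > "$demo_dir/large.bin"   # 2048 bytes

# Long, human-readable listing of everything, largest first.
ls -alhS "$demo_dir"

# With -S alone, the first name listed is the largest file.
first_by_size="$(ls -S "$demo_dir" | head -n 1)"
echo "Largest file: $first_by_size"
```

In the long listing, the entries . and .. (the directory itself and its parent) also appear because of -a.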
cd
cd <path/to/directory>
changes the current directory to the specified directory.
mkdir
mkdir <directory>
creates a new directory.
cp
cp <source/file.ext> <destination>
copies the source file to the destination directory.
mv
mv <source/file.ext> <destination>
moves the source file to the destination directory.
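The commands pwd, cd, mkdir, cp and mv are typically used together. Here is a small sketch of such a session in a throwaway directory; the directory and file names are hypothetical, and mktemp -d is only used to obtain a safe place to experiment.

```shell
# Work inside a fresh temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
pwd                                   # shows where we are now

# Create two directories and a small file.
mkdir data backup
echo "a,b,c" > data/table.csv         # ">" writes the text into a file

# Copy the file into backup/, keeping the original.
cp data/table.csv backup/

# Giving mv a new file name as destination renames the file.
mv data/table.csv data/results.csv

ls data backup
```

After these commands, data/ contains results.csv and backup/ contains table.csv.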
rm
rm <file.ext>
removes the file.
recursive option
rm -r <directory>
removes the directory and all its contents.
rm -rf <directory>
removes the directory and all its contents without asking for confirmation.
rm -rf <directory>/*
removes all the non-hidden files and subdirectories in the directory without removing the directory itself (by default, the * pattern does not match names starting with a dot).
Important: rm -rf and rm -r are very dangerous commands that will delete files and directories without asking for confirmation. Use them with great caution.
cp -r <source/directory/folder> <destination/directory>
copies the source directory folder and all its contents to the destination directory.
Note: mv moves files and directories by default, including all the contents within directories. It does not need the -r option.
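Because deletions cannot be undone, it is worth practising the recursive options in a throwaway location first. The sketch below does this in a temporary directory; all directory and file names are invented for the example, and mkdir -p (create parent directories as needed) and touch (create empty files) are helpers not covered in these notes.

```shell
# Build a small directory tree in a temporary location.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/project/src"
touch "$sandbox/project/src/main.py" \
      "$sandbox/project/README.md" \
      "$sandbox/project/.env"

# Copy the whole directory, including its contents.
cp -r "$sandbox/project" "$sandbox/project_copy"

# Empty the directory but keep the directory itself.
# The hidden file .env is not matched by * and survives.
rm -rf "$sandbox/project"/*
ls -a "$sandbox/project"

# Remove the copy and everything inside it.
rm -r "$sandbox/project_copy"
```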
git
Git is a distributed version control system (VCS) that tracks versions of files (Wikipedia). We use it to maintain collaborative research projects. It will be covered in detail in the course.
Some useful git bash commands that you will often use are:
git clone <repository_url>: to clone a remote repository
git status: to check the status of your repository
git add <file.ext>: to add a file to the staging area
git commit -m "<commit message>": to commit the changes you added to the staging area
git push: to push your changes to the remote repository
Note that git is not a native bash command. It is a separate utility that can be installed on most Unix-based systems (like Linux and macOS) through package managers such as apt (on Ubuntu/Debian), yum (on CentOS/RHEL) or brew (on macOS).
It is often installed by default on HPC systems, and is installed on Colab.
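The usual clone, status, add, commit, push cycle can be rehearsed entirely on your own machine. The sketch below assumes git is installed and, instead of a real server, uses a local "bare" repository created with git init --bare as a stand-in for the remote. The paths, user name, e-mail and file name are all placeholders.

```shell
# A local bare repository plays the role of the remote server.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"

# Clone it (git warns that the repository is empty, which is expected).
git clone --quiet "$workdir/remote.git" "$workdir/project"
cd "$workdir/project"

# Identity used for commits in this repository only (placeholder values).
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file, inspect the status, stage it and commit it.
echo "first note" > notes.txt
git status --short
git add notes.txt
git commit --quiet -m "Add notes"

# Push the current branch to the remote.
git push --quiet origin HEAD
```

Here git push origin HEAD is used instead of a plain git push because the freshly cloned, empty repository may not have an upstream branch configured yet.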
Exercise: Clone our course repository with git.
In this Colab notebook you can also experiment with most of the bash commands we have seen above.
tree
tree -L <level> <directory>
lists the files and directories in the specified directory in a tree-like structure. <level> is a number indicating the depth of the tree to be displayed.
Note that tree is not a native bash command. It is a separate utility that can be installed on most Unix-based systems (like Linux and macOS) through package managers such as apt (on Ubuntu/Debian), yum (on CentOS/RHEL) or brew (on macOS).
To install tree, you can use the following commands depending on your system:
On Ubuntu/Debian:
sudo apt-get install tree
On CentOS/RHEL:
sudo yum install tree
On macOS (with Homebrew):
brew install tree
After installing, you can use tree to display directory structures in a tree-like format.
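Since tree may not be available everywhere, a script can check for it before using it. The following sketch does so with command -v and falls back to find, which is not part of these notes and is used here only as a rough substitute. The directory layout is made up for the example.

```shell
# Build a small nested directory structure to display.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/src/utils" "$demo_dir/docs"
touch "$demo_dir/src/main.py" "$demo_dir/src/utils/io.py"

# Use tree if it is installed, otherwise fall back to find.
if command -v tree >/dev/null 2>&1; then
    listing="$(tree -L 2 "$demo_dir")"
else
    listing="$(find "$demo_dir" -maxdepth 2)"
fi
echo "$listing"
```

With a depth of 2, src/main.py and src/utils are listed, but src/utils/io.py, which sits one level deeper, is not.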
1.7. File system hierarchy
Most Linux distributions follow the Filesystem Hierarchy Standard (FHS), which includes the following directories:
/bin: Essential binaries.
/etc: System configuration files.
/usr: User binaries, libraries, and documentation.
/var: Variable data like logs and spools.
/dev: Device files.
/home: User home directories.
/tmp: Temporary files.
/proc: Virtual files for system processes.
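To see which of these directories exist on the machine you are using, a short loop is enough. This is only a sketch: the variable names are arbitrary, and on non-Linux systems such as macOS some entries (for example /proc) are expected to be missing.

```shell
# Check each directory from the list above.
found=0
for dir in /bin /etc /usr /var /dev /home /tmp /proc; do
    if [ -d "$dir" ]; then
        echo "$dir: present"
        found=$((found + 1))
    else
        echo "$dir: missing"
    fi
done
echo "$found of 8 directories found"
```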
The root directory is /. If you type: tree -L 1 / in CSD3 you will see something like this:
/
├── bin -> usr/bin
├── boot
├── cgroup-sl
├── datasets
├── dev
├── etc
├── home
├── IMAGE
├── lib -> usr/lib
├── lib64 -> usr/lib64
├── local
├── lost+found
├── media
├── mfa-data
├── misc
├── mnt
├── net
├── opt
├── private
├── proc
├── ramdisks
├── rcs
├── rcs1
├── rcs2
├── rcs3
├── rds
├── rds-d2
├── rds-d3
├── rds-d4
├── rds-d5
├── rds-d6
├── rds-d7
├── rds-d8
├── rfs
├── root
├── run
├── sbin -> usr/sbin
├── scratch
├── slurm
├── srv
├── sys
├── tmp
├── usr
└── var
If you do the same in Colab, you will see that the structure is similar:
/
├── bin -> usr/bin
├── boot
├── content
├── cuda-keyring_1.0-1_all.deb
├── datalab
├── dev
├── etc
├── home
├── lib -> usr/lib
├── lib32 -> usr/lib32
├── lib64 -> usr/lib64
├── libx32 -> usr/libx32
├── media
├── mnt
├── NGC-DL-CONTAINER-LICENSE
├── opt
├── proc
├── python-apt
├── python-apt.tar.xz
├── root
├── run
├── sbin -> usr/sbin
├── srv
├── sys
├── tmp
├── tools
├── usr
└── var
1.8. Hidden files
Hidden files are files or directories whose names start with a dot (.). They are not displayed by default when you list the contents of a directory.
They are used to store configuration settings or metadata that users do not usually need to see or modify directly.
You can list hidden files in the current directory by typing ls -a.
Some of the most important hidden files for you, as researchers, are:
.bashrc: It contains the configuration for your bash shell, i.e., your bash environment (see section on Bash environment).
.bash_profile: It is usually linked to .bashrc. Generally used on MacOS instead of .bashrc.
.gitignore: It specifies intentionally untracked files that Git should ignore.
.ssh/: It contains your SSH keys for connecting to remote servers (for instance CSD3).
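The effect of the leading dot is easy to check in a throwaway directory. In the sketch below, the file names are invented for the example and mktemp -d is only used to get an empty directory to experiment in.

```shell
# One ordinary file and one hidden file in an empty directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/visible.txt" "$demo_dir/.hidden_config"

# Plain ls skips names starting with a dot.
plain_listing="$(ls "$demo_dir")"
echo "ls:"
echo "$plain_listing"

# ls -a shows them, together with . and ..
full_listing="$(ls -a "$demo_dir")"
echo "ls -a:"
echo "$full_listing"
```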
1.9. Home directory
The path to your home directory is stored in the environment variable $HOME.
You can print it by typing:
echo $HOME
On CSD3 it prints:
/home/<username>
where <username> is your username (e.g., bb667).
A shortcut to your home directory is ~. You can change directory to your home directory by typing:
cd ~
which is equivalent to:
cd $HOME
On Colab if you type echo $HOME it prints:
/root
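The equivalence between ~ and $HOME can be verified directly. The sketch below assumes that /tmp exists (it is only used as "somewhere else" to move away from home); the variable name home_via_tilde is arbitrary.

```shell
# The shell expands ~ to the value of $HOME.
home_via_tilde=~
echo "$home_via_tilde"
echo "$HOME"

# Three ways of going back to the home directory.
cd /tmp
cd ~          # using the shortcut
pwd
cd /tmp
cd "$HOME"    # using the environment variable
pwd
cd /tmp
cd            # cd without an argument also goes home
pwd
```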