1. Getting started

1.1. Operating Systems

Operating systems are referred to as OS.


The three main OSes are:

  • Windows (Microsoft)

  • MacOS (Apple) – in fact a Unix-based OS

  • Linux (Open Source)

An OS is the interface between you and the hardware.

Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure are not traditional OSes: they are cloud platforms that provide virtualized OS instances “in the cloud”.

Note: macOS and Linux are sometimes referred to as POSIX platforms, because they (largely) conform to POSIX (“Portable Operating System Interface”), a family of standards for Unix-like operating systems.

1.2. Graphical User Interface (GUI)

The GUI is a visual interface to the OS.

It allows you to interact with the OS using graphical icons and a mouse.

Common examples of GUIs are:

  • Windows File Explorer

  • MacOS Finder

1.3. Command Line Interface (CLI)

The CLI is a text-based interface to the OS.

It allows you to pass commands to the OS.

It displays a plain text-based screen with a prompt (e.g., > or $) where you can type commands.

Common examples of CLIs are:

  • Unix/Linux Terminal

  • macOS Terminal

  • Windows Terminal

Note concerning Windows: In this course we will not cover Windows or macOS in detail, but rather Linux. macOS is Unix-based, so most of what we cover also works in its Terminal. Windows, on the other hand, is not Unix-like, so Windows users should use the Windows Subsystem for Linux (WSL), which gives them access to a Linux environment. See this LinkedIn Learning Tutorial recommended by Hadi (MPhil DiS 1). See also the Microsoft docs: Install WSL on Windows. You can also ask ChatGPT.

1.4. Shell

The shell interprets and executes the commands typed in the CLI.

Common shells are:

  • Unix/Linux: Bash, Zsh

  • Windows: PowerShell

Open a terminal on your OS and type:

echo $SHELL

This prints your default (login) shell, which is usually the shell you are using.

On CSD3 it prints:

/bin/bash

On a Mac it prints:

/bin/zsh

Bash stands for Bourne Again SHell, and Zsh stands for Z shell. Both descend from the original Bourne shell (sh); Zsh is largely compatible with Bash and adds extra features. We will not distinguish between the two in this course and refer to them simply as “the shell”.

A shell script is a program that can be executed directly by the shell. It consists of a sequence of commands that you could otherwise type one by one in your terminal.
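As an illustration, here is a minimal sketch of a shell script. The file name hello.sh and its contents are made up for this example, and the file is created in a throwaway directory (made with mktemp -d) so nothing else is touched. The chmod +x step is explained in the section on permissions.

```shell
# Create a hypothetical script file hello.sh in a temporary directory
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" << 'EOF'
#!/bin/bash
# A shell script is just a sequence of commands, run from top to bottom
echo "Current directory: $(pwd)"
echo "Today is: $(date +%Y-%m-%d)"
EOF

# Run it by passing it to bash explicitly...
bash "$workdir/hello.sh"

# ...or make it executable and run it directly (the first line tells
# the system which shell should interpret it)
chmod +x "$workdir/hello.sh"
"$workdir/hello.sh"
```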

1.5. Linux

Linux is the most widely used OS in scientific computing. Strictly speaking, Linux is a kernel, and the name refers to a family of Unix-like OSes built on it. (macOS is also Unix-like, but Windows is not!)

Popular Linux distributions (or distros) include Debian, Fedora, Arch, and Ubuntu. Commercial distributions include Red Hat Enterprise Linux and SUSE Linux Enterprise. Desktop Linux distributions include a windowing system such as X11 or Wayland and a desktop environment such as GNOME, KDE Plasma or Xfce (Wikipedia).

The Ice Lake nodes on CSD3 (that you are probably going to use) are running Rocky Linux 8, which is a rebuild of Red Hat Enterprise Linux 8 (RHEL8).

Google Colab also runs on a Linux environment: the virtual machines (VMs) behind Google Colab are based on Ubuntu. At the time of writing, Colab’s default runtime was Ubuntu 20.04 LTS (this changes over time).

You can type:

uname -a

to find out which kernel your system is running. On Unix/Linux and macOS systems, this command displays various details about the system, such as:

  • Kernel name

  • Node name (hostname)

  • Kernel version and release

  • Machine hardware name

  • Processor architecture

  • OS
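If you only need one of these fields, uname accepts individual options. Note that uname describes the kernel, not the distribution; on most modern Linux distributions the distribution name is stored in the file /etc/os-release. A minimal sketch (the file check guards against systems, such as macOS, where /etc/os-release does not exist):

```shell
# Individual uname options print one field each
uname -s   # kernel name, e.g. Linux or Darwin (macOS)
uname -n   # node name (hostname)
uname -r   # kernel release
uname -m   # machine hardware name, e.g. x86_64 or arm64

# The distribution name is not part of uname's output. On most modern
# Linux distributions it can be read from /etc/os-release
if [ -f /etc/os-release ]; then
    grep PRETTY_NAME /etc/os-release
fi
```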

Note that when you install the Windows Subsystem for Linux (WSL), you can choose which Linux distribution to install. By default, it is Ubuntu.

1.6. Useful terminal commands

We refer to the following commands throughout the course, and call them “bash commands”.

echo

echo <text>

prints the text.

We use echo to display the value of a variable.

NAME="Boris"
echo "Hello, $NAME"

will print:

Hello, Boris
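How the variable is written inside the echo argument matters. The sketch below reuses the NAME variable from the example above and shows that double quotes expand variables, single quotes print them literally, and braces mark where a variable name ends (the file name Boris_notes.txt is just an illustration):

```shell
NAME="Boris"                      # no spaces around "=" when assigning a variable
echo "Hello, $NAME"               # double quotes: variable expanded -> Hello, Boris
echo 'Hello, $NAME'               # single quotes: printed literally -> Hello, $NAME
echo "File: ${NAME}_notes.txt"    # braces delimit the name -> File: Boris_notes.txt
# Without braces, "$NAME_notes.txt" would look up a (non-existent)
# variable called NAME_notes, and print only ".txt"
```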

pwd

pwd

prints the working directory, i.e., tells you where you are in the file system.

ls

ls <directory>

lists the files and directories in the specified directory.

We often use the following options with ls:

  • -a: all files (including hidden files, see section on hidden files below)

  • -l: long format (with permissions, number of links, owner, group, size, and timestamp)

  • -h: human readable (e.g., 1493934 bytes -> 1.4 MB)

  • -S: sort by size from largest to smallest

So, combining these options, you would type:

ls -alhS
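To see these options in action without touching real files, here is a small sketch that works in a temporary directory created with mktemp -d. The file names small.txt and big.bin are made up for the example; head -c copies the given number of bytes from /dev/zero to produce a larger file.

```shell
# Work in a throwaway directory so nothing important is touched
demo=$(mktemp -d)
cd "$demo"

# Create a small and a larger (about 2 MB) file
echo "hello" > small.txt
head -c 2000000 /dev/zero > big.bin

# Long listing, all files, human-readable sizes, largest first:
# big.bin should appear above small.txt
ls -alhS
```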

du

du -sh .

shows the disk usage of the current directory (“.”) as one total.

The -s option is for summary, and -h for human readable.
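A common variant is to get one total per subdirectory and sort them, to find what takes up space. The sketch below builds a hypothetical directory with two subdirectories a/ and b/ in a temporary location; sort -h (human-numeric sort) is assumed to be available, as it is in GNU coreutils and recent macOS versions.

```shell
# Build a small example tree in a temporary directory
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
head -c 3000000 /dev/zero > "$demo/a/data.bin"   # about 3 MB
head -c 1000 /dev/zero > "$demo/b/notes.txt"     # about 1 KB
cd "$demo"

# Total size of the current directory, as one number
du -sh .

# One line per subdirectory, largest first
# (sort -h understands sizes such as 4.0K, 12M, 1.1G)
du -sh -- */ | sort -rh
```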

cd

cd <path/to/directory>

changes the current directory to the specified directory.

mkdir

mkdir <directory>

creates a new directory.
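The sketch below combines mkdir, cd and pwd in a temporary directory. The directory names (project, data, raw) are hypothetical. It also uses mkdir -p, which creates any missing parent directories, and the special paths .. (parent directory) and cd - (go back to the previous directory).

```shell
# Start from a throwaway directory
base=$(mktemp -d)
cd "$base"

mkdir project                 # create one directory
mkdir -p project/data/raw     # -p also creates missing parent directories
cd project/data/raw
pwd                           # prints .../project/data/raw

cd ..                         # ".." is the parent directory
pwd                           # prints .../project/data
cd -                          # go back to the previous directory (raw)
```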

cp

cp <source/file.ext> <destination>

copies the source file to the destination, which can be a directory or a new file name.

mv

mv <source/file.ext> <destination>

moves the source file to the destination directory. If the destination is a new file name instead, mv renames the file.
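Here is a minimal sketch of cp and mv in a temporary sandbox; the file and directory names are made up for the example. Note how mv is used both to rename a file and to move it into a directory.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir backup
echo "some notes" > notes.txt

cp notes.txt backup/              # copy into a directory: backup/notes.txt
cp notes.txt notes_copy.txt       # copy to a new file name in the same directory
mv notes_copy.txt old_notes.txt   # "moving" within a directory renames the file
mv old_notes.txt backup/          # move the file into backup/
ls backup                         # lists notes.txt and old_notes.txt
```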

rm

rm <file.ext>

removes the file.

recursive option

rm -r <directory>

removes the directory and all its contents.

rm -rf <directory>

removes the directory and all its contents without asking for confirmation.

rm -rf <directory>/*

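removes all the files and subdirectories in the directory, without removing the directory itself. Note that the * pattern does not match hidden files (names starting with a dot), so those are left in place.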

Important: rm -r, and even more so rm -rf, are very dangerous commands: they delete files and directories without asking for confirmation, and deleted files do not go to a trash bin, so they cannot easily be recovered. Use them with great caution.

cp -r <source/directory/folder> <destination/directory>

copies the source directory and all its contents into the destination directory.

Note: mv moves files and directories by default, including all the contents within directories. It does not need the -r option.
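The recursive options are safest to try out in a sandbox. The sketch below (all names hypothetical) copies a directory tree with cp -r, renames it with mv (no -r needed), and empties it with rm -r and the * pattern, which leaves the hidden file in place. If you want rm to ask before each deletion, you can add the -i option (e.g. rm -ri).

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p results/run1
echo "42" > results/run1/output.txt
touch results/.hidden_log

cp -r results results_backup   # copy the whole directory tree
mv results_backup archive      # no -r needed for mv

rm -r archive/*                # empties archive/ but keeps the directory itself...
ls -a archive                  # ...and .hidden_log survives: * skips hidden files
```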

git

Git is a distributed version control system (VCS) that tracks versions of files (Wikipedia). We use it to maintain collaborative research projects. It will be covered in detail in the course.

Some useful git bash commands that you will often use are:

  • git clone <repository_url>: to clone a remote repository

  • git status: to check the status of your repository

  • git add <file.ext>: to add a file to the staging area

  • git commit -m "<commit message>": to commit the changes you added to the staging area

  • git push: to push your changes to the remote repository
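Since git clone and git push need a remote repository, the sketch below instead starts a local repository with git init in a temporary directory, then walks through the status/add/commit steps listed above. The name and e-mail passed with -c are placeholders; normally you would set them once with git config --global user.name and user.email.

```shell
repo=$(mktemp -d)
cd "$repo"
git init                       # start a new, empty repository here

echo "# My project" > README.md
git status                     # README.md is listed as untracked
git add README.md              # stage the file
git -c user.name="Your Name" -c user.email="you@example.com" \
    commit -m "Add README"     # record the staged changes
git log --oneline              # one line per commit
```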

Note that git is not a native bash command. It is a separate utility that can be installed on most Unix-based systems (like Linux and macOS) through package managers such as apt (on Ubuntu/Debian), yum (on CentOS/RHEL) or brew (on macOS).

It is often installed by default on HPC systems, and is installed on Colab.

Exercise: Clone our GitLab course repository with git.

Click on the badge to open the example notebook in Colab: Open In Colab

In this Colab notebook you can also experiment with most of the bash commands we have seen above.

tree

tree -L <level> <directory>

lists the files and directories in the specified directory in a tree-like structure. <level> is a number indicating the depth of the tree to be displayed.

Like git, tree is not a native bash command but a separate utility. To install it, use the package manager of your system:

  • On Ubuntu/Debian:

    sudo apt-get install tree
    
  • On CentOS/RHEL:

    sudo yum install tree
    
  • On macOS (with Homebrew):

    brew install tree
    

After installing, you can use tree to display directory structures in a tree-like format.

1.7. File system hierarchy

Most Linux distributions follow the Filesystem Hierarchy Standard (FHS), whose main top-level directories are:

  • /bin: Essential binaries.

  • /etc: System configuration files.

  • /usr: User binaries, libraries, and documentation.

  • /var: Variable data like logs and spools.

  • /dev: Device files.

  • /home: User home directories.

  • /tmp: Temporary files.

  • /proc: Virtual files for system processes.
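You can inspect a few of these directories directly. In the sketch below, ls -ld lists the directories themselves rather than their contents, and readlink shows where a symbolic link points: on many modern distributions /bin is a link to usr/bin. Reading /proc/version is Linux-specific, hence the check before it.

```shell
# -d lists the directories themselves, not their contents
ls -ld /bin /etc /home /tmp /usr

# On many modern distributions /bin is a symbolic link to usr/bin
readlink /bin || echo "/bin is not a symbolic link here"

# /proc is virtual: its "files" are generated by the kernel on the fly
if [ -r /proc/version ]; then
    cat /proc/version   # Linux only: kernel version string
fi
```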

The root directory is /. If you type tree -L 1 / on CSD3, you will see something like this:

/
├── bin -> usr/bin
├── boot
├── cgroup-sl
├── datasets
├── dev
├── etc
├── home
├── IMAGE
├── lib -> usr/lib
├── lib64 -> usr/lib64
├── local
├── lost+found
├── media
├── mfa-data
├── misc
├── mnt
├── net
├── opt
├── private
├── proc
├── ramdisks
├── rcs
├── rcs1
├── rcs2
├── rcs3
├── rds
├── rds-d2
├── rds-d3
├── rds-d4
├── rds-d5
├── rds-d6
├── rds-d7
├── rds-d8
├── rfs
├── root
├── run
├── sbin -> usr/sbin
├── scratch
├── slurm
├── srv
├── sys
├── tmp
├── usr
└── var

If you do the same in Colab, you will see that the structure is similar:

/
├── bin -> usr/bin
├── boot
├── content
├── cuda-keyring_1.0-1_all.deb
├── datalab
├── dev
├── etc
├── home
├── lib -> usr/lib
├── lib32 -> usr/lib32
├── lib64 -> usr/lib64
├── libx32 -> usr/libx32
├── media
├── mnt
├── NGC-DL-CONTAINER-LICENSE
├── opt
├── proc
├── python-apt
├── python-apt.tar.xz
├── root
├── run
├── sbin -> usr/sbin
├── srv
├── sys
├── tmp
├── tools
├── usr
└── var

1.8. Hidden files

Hidden files are files or directories that are not displayed by default when you list the contents of a directory. Their names start with a dot (.).

They are used to store configuration settings or metadata that users do not usually need to see or modify directly.

You can list the hidden files in the current directory by typing ls -a.

Some of the most important hidden files for you, as researchers, are:

  • .bashrc: It contains the configuration for your bash shell, i.e., your bash environment (see next lectures).

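  • .bash_profile: It is read by Bash login shells and usually sources (loads) .bashrc. The macOS Terminal traditionally starts login shells, so .bash_profile was the file used on macOS; since macOS Catalina the default shell is Zsh, which reads .zshrc and .zprofile instead.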

  • .gitignore: It specifies intentionally untracked files that Git should ignore.

  • .ssh/: It contains your SSH keys for connecting to remote servers (for instance CSD3).
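A minimal sketch showing that a leading dot is what makes a file hidden; the file names are made up and live in a temporary directory. The last line lists the hidden files in your own home directory (~), where files such as .bashrc and .ssh/ usually live.

```shell
demo=$(mktemp -d)
cd "$demo"
touch visible.txt .hidden_config   # a leading dot makes a file hidden
ls                                 # shows only visible.txt
ls -a                              # also shows ., .. and .hidden_config

# Hidden files in your home directory (e.g. .bashrc, .ssh)
ls -a ~ | grep '^\.'
```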

1.9. Permissions

Every file and directory has permissions that determine who can read, write, or execute it.

You can view permissions by typing:

ls -l

Example output:

-rw-r--r--  1 bb667  users  1234 Oct  9 10:30 notes.txt

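Meaning:

The first character is the file type (- for a regular file, d for a directory). The next nine characters are the permissions, read in three groups of three:

Symbol | Meaning
r | Read (view file contents, or list a directory)
w | Write (modify file contents, or create and delete files in a directory)
x | Execute (run file or enter directory)
- | No permission

Positions | Meaning | Who it applies to
1–3 | Owner permissions | The user who owns the file
4–6 | Group permissions | Members of the same group
7–9 | Other users | Everyone else on the system

(3 categories × 3 permissions = 9 positions)

In the example above, rw-r--r-- means that the owner (bb667) can read and write notes.txt, while members of the group users and everyone else can only read it.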

You can change permissions with chmod, change owner with chown, etc.
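With chmod you can also use symbolic notation: u (owner, the “user”), g (group) and o (others), combined with + or - and the letters r, w, x. The sketch below works on a throwaway file created by mktemp, which is private (rw-------) by default. chown is not shown because changing ownership usually requires administrator rights.

```shell
f=$(mktemp)          # a throwaway file, initially -rw-------
ls -l "$f"

chmod u+x "$f"       # owner: add execute          -> -rwx------
chmod g+r,o+r "$f"   # group and others: add read  -> -rwxr--r--
chmod go-w "$f"      # group and others: remove write (they had none)
ls -l "$f"           # now -rwxr--r--
```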

Sometimes, you will see the notation:

chmod XYZ file

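where XYZ is a three-digit number such as 700 or 755. Each of the letters r, w, x has a value (4, 2, 1, respectively), and each digit is the sum of the values of the permissions granted to one category, in the order owner, group, others. For example, 755 gives rwx (4+2+1 = 7) to the owner and r-x (4+1 = 5) to the group and to others, i.e. rwxr-xr-x.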
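The arithmetic behind the octal notation can be checked directly in the shell, as in this small sketch on a throwaway file. 600 is a common choice for private files (only the owner can read and write).

```shell
# Octal digit for each category = r(4) + w(2) + x(1)
owner=$((4 + 2 + 1))    # rwx -> 7
group=$((4 + 0 + 1))    # r-x -> 5
others=$((4 + 0 + 1))   # r-x -> 5
echo "$owner$group$others"   # prints 755

f=$(mktemp)
chmod 755 "$f"
ls -l "$f"              # -rwxr-xr-x
chmod 600 "$f"          # rw- for the owner, nothing for group and others
ls -l "$f"              # -rw-------
```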

1.10. Administrator rights

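To get administrator (root) rights on the file system, prefix a command with sudo (“superuser do”), as in sudo apt-get install tree. You will only be able to do that if your user is listed in the sudoers file; on shared HPC clusters such as CSD3, regular users typically do not have sudo rights.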

1.11. Home directory

The path to your home directory is stored in the environment variable $HOME.

You can print it by typing:

echo $HOME

On CSD3 it prints:

/home/<username>

where <username> is your username (e.g., bb667).

A shortcut to your home directory is ~. You can change directory to your home directory by typing:

cd ~

which is equivalent to:

cd $HOME

On Colab if you type echo $HOME it prints:

/root
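A short sketch of ~ in practice. The path ~/projects is hypothetical and only used to show the expansion; note that ~ is not expanded inside quotes.

```shell
cd ~                       # same as: cd $HOME (or just: cd)
pwd                        # prints your home directory
[ "$PWD" = "$HOME" ] && echo "We are home"

# ~ also works at the start of longer paths
echo ~/projects            # expanded, e.g. /home/<username>/projects
echo "~/projects"          # inside quotes ~ is NOT expanded: prints ~/projects
```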