# Jupyter on Euler
This project aims to help beginners run simple Jupyter notebooks on our HPC cluster Euler. It is not intended for advanced users who need a wide range of features beyond simple Jupyter notebooks.

When you run this shell script on your local computer, it starts a Jupyter notebook in a batch job on Euler and connects your local browser to it.

## Requirements

The script assumes that you have set up SSH keys for passwordless access to the cluster. Instructions on how to create SSH keys are available on the scicomp wiki:

https://scicomp.ethz.ch/wiki/Accessing_the_clusters#SSH_keys

Currently, the script runs on Linux, Mac OS X and Windows (using WSL/WSL2 or Git Bash). On Linux, please make sure that xdg-open is installed. It is used to automatically open your default browser and is part of the xdg-utils package, which you can install with the command

CentOS:

```
yum install xdg-utils
```

Ubuntu:

```
apt-get install xdg-utils
```

## Security token vs. password setup
Please note that part of the script (parsing of the ports) requires that you use Jupyter notebooks with security tokens. If you configure a password instead, so that you can use the Jupyter notebook without the security token, the script will no longer work without modifications, because it cannot parse the port on the remote compute node.
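The sketch below shows why the token matters. It extracts the host, port and token from a Jupyter URL of the form `http://HOST:PORT/?token=TOKEN`. This is not the script's actual code: the function name `parse_jupyter_url` and the exact URL format are assumptions. With a password-only setup, the startup output typically contains no URL with `token=`, so a parser of this kind has nothing to match.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not taken from start_jupyter_nb.sh): extract host,
# port and token from a Jupyter URL such as
#   http://10.205.4.122:8888/?token=abc123
parse_jupyter_url() {
    local url="$1"
    # Capture groups: 1 = host, 2 = port, 3 = token
    local re='^https?://([^:/]+):([0-9]+)/.*[?&]token=([[:alnum:]]+)'
    if [[ "$url" =~ $re ]]; then
        printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        # No token in the URL (e.g. password-only setup): nothing to parse
        return 1
    fi
}

parse_jupyter_url 'http://10.205.4.122:8888/?token=abc123'
```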

## Using SSH keys with non-default names
Since Euler reopened after the cyber attack in May 2020, we recommend that cluster users use SSH keys with non-default names, for example:
```
$HOME/.ssh/id_ed25519_euler
```

You can either use the `-k` option of the script to specify the location of the SSH key or, even better, use an SSH config file with the `IdentityFile` option, as described on our wiki:

https://scicomp.ethz.ch/wiki/Accessing_the_clusters#How_to_use_keys_with_non-default_names

I recommend using the SSH config file, because this works more reliably.
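A minimal `~/.ssh/config` entry like the one below should let `ssh`, and therefore the script, pick up the key automatically. It assumes the key path shown above and uses `sfux` as a placeholder username. The wiki page above remains the authoritative reference.

```
# ~/.ssh/config
# Use the non-default key whenever connecting to euler.ethz.ch.
# Replace "sfux" with your own ETH username (placeholder).
Host euler.ethz.ch
    User sfux
    IdentityFile ~/.ssh/id_ed25519_euler
```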

## Usage

### Install

Download the repository with the command

```
git clone https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open
```

Mac OS X:

```
git clone https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open.git
```

After downloading the script from gitlab.ethz.ch, you need to change its permissions to make it executable:

```
cd Jupyter-on-Euler-or-Leonhard-Open/
chmod 755 start_jupyter_nb.sh
```

### Software stack
Please note that the old software stack is currently still set as the default (this will change). The script uses the new software stack unless you explicitly request the old one with the option `-s old` (or `--softwarestack old`). Therefore, please make sure that you set the new software stack as your permanent default with the command

```
set_software_stack.sh new
```

You can find more information about this script on our wiki:

```
https://scicomp.ethz.ch/wiki/Setting_permanent_default_for_software_stack_upon_login
```

### Run Jupyter in a batch job

The start_jupyter_nb.sh script needs to be executed on your local computer. The options that can be used with the script are listed below:

```
$ ./start_jupyter_nb.sh -h
./start_jupyter_nb.sh: Script to start jupyter notebook/lab on Euler from a local computer
Usage: start_jupyter_nb.sh [options]

Required options:

        -u | --username       USERNAME         ETH username for SSH connection to Euler

Optional arguments:

        -c | --config         CONFIG_FILE      Configuration file for specifying options
        -e | --environment    ENV              Python virtual environment
        -g | --numgpu         NUM_GPU          Number of GPUs to be used on the cluster
        -h | --help                            Display help for this script and quit
        -i | --interval       INTERVAL         Time interval for checking if the job on the cluster already started
        -k | --key            SSH_KEY_PATH     Path to SSH key with non-standard name
        -l | --lab                             Start jupyter lab instead of a jupyter notebook
        -m | --memory         MEM_PER_CORE     Memory limit in MB per core
        -n | --numcores       NUM_CPU          Number of CPU cores to be used on the cluster
        -s | --softwarestack  SOFTWARE_STACK   Software stack to be used (old, new)
        -v | --version                         Display version of the script and exit
        -w | --workdir        WORKING_DIR      Working directory for the jupyter notebook/lab
        -W | --runtime        RUN_TIME         Run time limit for jupyter notebook/lab in hours and minutes HH:MM

Examples:

        ./start_jupyter_nb.sh -u sfux -n 4 -W 04:00 -m 2048 -w /cluster/scratch/sfux

        ./start_jupyter_nb.sh --username sfux --numcores 2 --runtime 01:30 --memory 2048 --softwarestack new

        ./start_jupyter_nb.sh -c $HOME/.jnb_config

Format of configuration file:

JNB_USERNAME=""             # ETH username for SSH connection to Euler
JNB_NUM_CPU=1               # Number of CPU cores to be used on the cluster
JNB_NUM_GPU=0               # Number of GPUs to be used on the cluster
JNB_RUN_TIME="01:00"        # Run time limit for jupyter notebook/lab in hours and minutes HH:MM
JNB_MEM_PER_CPU_CORE=1024   # Memory limit in MB per core
JNB_WAITING_INTERVAL=60     # Time interval to check if the job on the cluster already started
JNB_SSH_KEY_PATH=""         # Path to SSH key with non-standard name
JNB_SOFTWARE_STACK="new"    # Software stack to be used (old, new)
JNB_WORKING_DIR=""          # Working directory for jupyter notebook/lab
JNB_ENV=""                  # Path to virtual environment
JNB_JLAB=""                 # "lab" -> start jupyter lab; "" -> start jupyter notebook
```
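You can also collect the options in a configuration file and pass it with `-c` instead of listing them on the command line. The sketch below writes such a file using the variable names from the format above. The values are placeholders that mirror the first example, and `$HOME/.jnb_config` is simply the path used in the help text. Note that the sketch overwrites any existing file at that path.

```bash
#!/usr/bin/env bash
# Sketch: write a configuration file for start_jupyter_nb.sh.
# Variable names follow "Format of configuration file" above; the values are
# placeholders mirroring the first example. Overwrites an existing file.
set -e

CONFIG_FILE="${CONFIG_FILE:-$HOME/.jnb_config}"

cat > "$CONFIG_FILE" <<'EOF'
JNB_USERNAME="sfux"
JNB_NUM_CPU=4
JNB_NUM_GPU=0
JNB_RUN_TIME="04:00"
JNB_MEM_PER_CPU_CORE=2048
JNB_WAITING_INTERVAL=60
JNB_SSH_KEY_PATH=""
JNB_SOFTWARE_STACK="new"
JNB_WORKING_DIR="/cluster/scratch/sfux"
JNB_ENV=""
JNB_JLAB=""
EOF

echo "Wrote $CONFIG_FILE"
```

Afterwards, start Jupyter with `./start_jupyter_nb.sh -c $HOME/.jnb_config`.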

### Reconnect to a Jupyter notebook
When you run the script, it creates a local file called reconnect_info in the installation directory. This file contains the ports used, the remote IP address, the command for the SSH tunnel and the URL for the browser. This information should be sufficient to reconnect to a Jupyter notebook if the connection was lost.
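The sketch below shows what the stored tunnel command consists of. It prints a command of the same shape as the tunnel shown in the "Terminate the Jupyter session" section: `ssh USER@euler.ethz.ch -L LOCAL_PORT:REMOTE_IP:REMOTE_PORT -N`. This forwards a local port to the Jupyter port on the compute node without running a remote command. The function name `tunnel_cmd` and the example values are hypothetical. The sketch makes no assumption about the layout of reconnect_info, so take the real values from that file.

```bash
#!/usr/bin/env bash
# Sketch: print the SSH tunnel command needed to reconnect, in the same form
# as the tunnel started by the script. tunnel_cmd and the example values are
# hypothetical; take the real values from reconnect_info.
tunnel_cmd() {
    local user="$1" local_port="$2" remote_ip="$3" remote_port="$4"
    # -L forwards LOCAL_PORT to REMOTE_IP:REMOTE_PORT; -N runs no remote command
    printf 'ssh %s@euler.ethz.ch -L %s:%s:%s -N\n' \
        "$user" "$local_port" "$remote_ip" "$remote_port"
}

tunnel_cmd sfux 51339 10.205.4.122 8888
```

To keep the tunnel in the background, run the resulting command with `&` appended. Then open the URL from reconnect_info in your browser.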

### Running multiple notebooks in a single Jupyter instance
If you run Jupyter with GPUs on the Leonhard cluster, you need to make sure that a notebook is properly terminated before you start another one.

If you start a second notebook without properly closing the first one, the previous notebook still occupies GPU memory and keeps processes running. This causes errors when the second notebook is executed.

Therefore, please stop any running kernels in the "Running" tab of the browser before starting a new notebook.

### Terminate the Jupyter session

When you finish working with the Jupyter notebook, you need to click on the "Quit" or "Logout" button in your browser. "Quit" stops the batch job running on Euler. "Logout" only logs you out of the session without stopping the batch job; in this case, log in to the cluster, identify the job with `bjobs` and kill it with `bkill`, passing the job ID as parameter. Afterwards, you also need to clean up the SSH tunnel that is still running in the background. Example:

```
samfux@bullvalene:~/Jupyter-on-Euler-or-Leonhard-Open$ ps -u | grep -m1 -- "-L" | grep -- "-N"
samfux    8729  0.0  0.0  59404  6636 pts/5    S    13:46   0:00 ssh sfux@euler.ethz.ch -L 51339:10.205.4.122:8888 -N
samfux@bullvalene:~/Jupyter-on-Euler-or-Leonhard-Open$ kill 8729
```
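Instead of reading the `ps` output by hand, you can match the tunnel's command line with `pgrep` and then `kill` it. This is a sketch that assumes the tunnel has the form shown above. The regular expression and the function names `list_tunnels` and `kill_tunnels` are hypothetical. If you run other SSH tunnels to Euler, check the listed PIDs before killing anything.

```bash
#!/usr/bin/env bash
# Sketch: find and stop leftover Jupyter SSH tunnels of the current user.
# Assumes tunnels look like: ssh USER@euler.ethz.ch -L PORT:IP:PORT -N
# The pattern and the function names are hypothetical.

# Extended regular expression matched against the full command line
TUNNEL_PATTERN='ssh .*euler\.ethz\.ch -L [0-9]+:[0-9.]+:[0-9]+ -N'

# Print the PIDs of matching processes (prints nothing if there are none)
list_tunnels() {
    pgrep -u "$(id -u)" -f "$TUNNEL_PATTERN" || true
}

# Send SIGTERM (the default signal of kill) to every matching tunnel
kill_tunnels() {
    local pid
    for pid in $(list_tunnels); do
        kill "$pid"
    done
}
```

Run `list_tunnels` first to inspect the matches, then `kill_tunnels` to stop them.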

### Additional kernels

When using this script, you can use the Python 3.6 kernel and, in addition, a Bash kernel or an R kernel (3.6.0 on Euler, 3.5.1 on Leonhard Open).

### Installing additional Python and R packages locally

When you start a Jupyter notebook with this script, it uses a central Python and R installation:

```
Old software stack: module load new gcc/4.8.2 python/3.6.1
New software stack: module load gcc/6.3.0 python/3.8.5
```

Out of the box, you can therefore only use centrally installed packages. However, you can install additional packages locally in your home directory and use them afterwards.

To install a Python package from inside a Jupyter notebook, run the following command:

```
!pip install --user package_name
```

This installs `package_name` into `$HOME/.local`, as described on our wiki page about Python:

```
https://scicomp.ethz.ch/wiki/Python#Installing_a_Python_package.2C_using_PIP
```

The command to locally install an R package:

```
install.packages("package_name")
```

Then follow the instructions provided on our wiki:

```
https://scicomp.ethz.ch/wiki/R#Extensions
```

### Running with a Python Virtual Environment

You can create your own [virtual environment](https://scicomp.ethz.ch/wiki/Python_virtual_environment) on the cluster and run your Jupyter notebook with that environment. Please make sure that the Python version used to create your virtual environment is compatible with the one used by the Jupyter script.

```
./start_jupyter_nb.sh -u sfux -n 4 -W 04:00 -m 2048 -w /cluster/scratch/sfux -e sample_env
```
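Here is a minimal sketch of creating such an environment on Euler, assuming you use the same modules that the script loads for the new software stack (`gcc/6.3.0 python/3.8.5`) so that the Python versions match. The location `$HOME/sample_env` is an assumption. The use of `--system-site-packages` is also an assumption; it keeps the centrally installed packages visible inside the environment. How the script resolves the name passed to `-e` is not documented here, so check that it finds your environment.

```bash
#!/usr/bin/env bash
# Sketch: create a Python virtual environment with the same Python version
# that the script loads for the new software stack.
# ENV_DIR and --system-site-packages are assumptions; adjust as needed.
set -e

ENV_DIR="${ENV_DIR:-$HOME/sample_env}"

# "module" only exists on the cluster; skip it when running elsewhere.
if command -v module >/dev/null 2>&1; then
    module load gcc/6.3.0 python/3.8.5
fi

# Keep the centrally installed packages visible inside the environment.
python3 -m venv --system-site-packages "$ENV_DIR"

# Example: add a package to the environment (package name is a placeholder).
# "$ENV_DIR/bin/pip" install package_name
```

Afterwards, pass the environment to the script with `-e`, as in the command above.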


## Main author
* Samuel Fux

## Contributions
* Andrei Plamada
* Urban Borstnik
* Steven Armstrong
* Swen Vermeul
* Jarunan Panyasantisuk
* Gül Sena Altıntaş
* Mikolaj Rybinski