
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (31)
......@@ -4,6 +4,38 @@ This script can be used to start a batch job on the cluster and then connect Mic
https://medium.com/@isaiah.taylor/use-vs-code-on-a-supercomputer-15e4cbbb1bc2
## Requirements
The script assumes that you have set up SSH keys for passwordless access to the cluster. Please find some instructions on how to create SSH keys on the scicomp wiki:
https://scicomp.ethz.ch/wiki/Accessing_the_clusters#SSH_keys
Currently the script runs on Linux, Mac OS X and Windows (using WSL/WSL2 or git bash). When using a Linux computer, please make sure that xdg-open is installed. It is used to automatically start your default browser and is part of the xdg-utils package, which you can install with one of the following commands:
CentOS:
```
yum install xdg-utils
```
Ubuntu:
```
apt-get install xdg-utils
```
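To check beforehand whether `xdg-open` is available, a small test with the `command -v` builtin is enough. This is a minimal sketch; the helper name `require_cmd` is hypothetical and not part of this repository:

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): check that a command is
# available on PATH and print a hint if it is missing.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Error: $1 not found, please install the package that provides it" >&2
    return 1
}

# On Linux, xdg-open is provided by the xdg-utils package:
# require_cmd xdg-open
```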
## Using SSH keys with non-default names
Since the reopening of Euler after the cyber attack in May 2020, we recommend that cluster users use SSH keys with non-default names, for instance:
```
$HOME/.ssh/id_ed25519_euler
```
You can either use the -k option of the script to specify the location of the SSH key or, even better, use an SSH config file with the IdentityFile option:
https://scicomp.ethz.ch/wiki/Accessing_the_clusters#How_to_use_keys_with_non-default_names
I recommend using the SSH config file, as this works more reliably.
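For the SSH config approach, a stanza along the following lines should work. This is a minimal sketch: the function name `euler_ssh_config` is hypothetical, it only uses the standard `Host`, `User` and `IdentityFile` keywords of ssh_config, and the wiki page above remains the reference for the recommended settings.

```shell
#!/bin/bash
# Hypothetical helper: print an ssh_config stanza for Euler that points ssh
# at a key with a non-default name. Review the output before appending it.
euler_ssh_config() {
    local user=$1
    local key=${2:-$HOME/.ssh/id_ed25519_euler}   # example key path from above
    cat <<EOF
Host euler.ethz.ch
    User $user
    IdentityFile $key
EOF
}
```

For example, `euler_ssh_config sfux >> ~/.ssh/config` appends the stanza. Because the `Host` pattern matches `euler.ethz.ch`, the hostname the script connects to, ssh should pick up the key without the -k option.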
## Preparation
The preparation steps only need to be executed once. You need to carry out these steps to set up the basic configuration for your ETH account with regard to the code-server.
......@@ -12,7 +44,7 @@ The preparation steps only need to be executed once. You need to carry out those
* Start an interactive job with
```
bsub -Is -W 0:10 -n 1 -R "rusage[mem=2048]" bash
srun --ntasks=1 --time=00:10:00 --mem-per-cpu=2048 --pty bash
```
When using Euler, switch to the new software stack (in case you haven't set it as default yet), either using
......@@ -53,7 +85,19 @@ This will setup the local configuration (including a password for you) and store
After the server has started, terminate it with Ctrl+C.
## Running the script
## Usage
### Install
Download the repository with the command:
```
git clone https://gitlab.ethz.ch/sfux/VSCode_remote_HPC.git
```
### Run VSCode in a batch job
The start_vscode.sh script needs to be executed on your local computer. Please find below the list of options that can be used with the script:
```
$ ./start_vscode.sh --help
......@@ -67,6 +111,7 @@ Options:
-n | --numcores NUM_CPU Number of CPU cores to be used on the cluster
-W | --runtime RUN_TIME Run time limit for the code-server in hours and minutes HH:MM
-m | --memory MEM_PER_CORE Memory limit in MB per core
-b | --batchsys BATCH_SYS Batch system to use (LSF or SLURM)
Optional arguments:
......@@ -76,12 +121,14 @@ Optional arguments:
-i | --interval INTERVAL Time interval for checking if the job on the cluster already started
-k | --key SSH_KEY_PATH Path to SSH key with non-standard name
-v | --version Display version of the script and exit
-j | --jobargs JOB_ARGS Additional job arguments
Examlples:
./start_vscode.sh -u sfux -n 4 -W 04:00 -m 2048
Examples:
./start_vscode.sh --username sfux --numcores 2 --runtime 01:30 --memory 2048
./start_vscode.sh -u sfux -b SLURM -n 4 -W 04:00 -m 2048
./start_vscode.sh --username sfux --batchsys SLURM --numcores 2 --runtime 01:30 --memory 2048
./start_vscode.sh -c /c/Users/sfux/.vsc_config
......@@ -94,3 +141,28 @@ VSC_RUN_TIME="01:00" # Run time limit for the code-server in hours and mi
VSC_MEM_PER_CPU_CORE=1024 # Memory limit in MB per core
VSC_WAITING_INTERVAL=60 # Time interval to check if the job on the cluster already started
VSC_SSH_KEY_PATH="" # Path to SSH key with non-standard name
VSC_BATCH_SYSTEM="SLURM" # Batch system to use (SLURM or LSF)
VSC_JOB_ARGS="" # Additional job arguments
```
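A config file like the one above can also be generated from the shell. The sketch below is hypothetical (`write_vsc_config` is not part of the repository): it writes only a subset of the variables, using the names from the script and the example above with example values, and assumes that variables not set in the file keep the script's defaults.

```shell
#!/bin/bash
# Hypothetical helper: write a config file for start_vscode.sh.
# Variable names follow the script and the example above; values are examples.
write_vsc_config() {
    local config_file=$1
    local user=$2
    cat > "$config_file" <<EOF
VSC_USERNAME="$user"          # ETH username for SSH connection to Euler
VSC_NUM_CPU=2                 # Number of CPU cores to be used on the cluster
VSC_RUN_TIME="01:30"          # Run time limit for the code-server in hours and minutes
VSC_MEM_PER_CPU_CORE=2048     # Memory limit in MB per core
VSC_BATCH_SYSTEM="SLURM"      # Batch system to use (SLURM or LSF)
EOF
}
```

Usage: `write_vsc_config "$HOME/.vsc_config" sfux`, followed by `./start_vscode.sh -c $HOME/.vsc_config`.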
### Reconnect to a code-server session
When you run the script, it creates a local file called reconnect_info in the installation directory. This file contains all information about the ports used, the remote IP address, the command for the SSH tunnel and the URL for the browser, which should be sufficient to reconnect to a code-server session if the connection was lost.
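As an illustration of such a reconnect, the tunnel can be re-created from the LOCAL_PORT:REMOTE_IP:REMOTE_PORT triple stored in reconnect_info. This is a sketch with hypothetical function names; it assumes the file contains that triple, using the same regular expression as the stop script further down, and it relies only on the standard ssh options `-f`, `-N`, `-L` and `ExitOnForwardFailure`.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the repository) to re-create the SSH tunnel
# from reconnect_info. They assume the file contains a
# LOCAL_PORT:REMOTE_IP:REMOTE_PORT triple, the pattern the stop script greps for.
get_tunnel_spec() {
    local info_file=${1:-reconnect_info}
    grep -m1 -o -E '[0-9]+:([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+):[0-9]+' "$info_file"
}

reconnect_tunnel() {
    local user=$1
    local info_file=${2:-reconnect_info}
    local spec
    if ! spec=$(get_tunnel_spec "$info_file"); then
        echo "Error: no tunnel specification found in $info_file" >&2
        return 1
    fi
    # -N: no remote command, -f: go to the background after authentication,
    # ExitOnForwardFailure: fail instead of running without the forwarding
    ssh -f -N -o ExitOnForwardFailure=yes -L "$spec" "$user@euler.ethz.ch" || return 1
    echo "Tunnel restored, open http://localhost:${spec%%:*} in your browser"
}
```

Usage: `reconnect_tunnel sfux` from the installation directory. If the old tunnel is still running, ssh fails because the local port is already in use.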
## Cleanup after the job
Please note that when you finish working with the code-server, you need to log in to the cluster, identify the job with bjobs (squeue with Slurm) and then kill it with bkill (scancel with Slurm), using the job ID as parameter. Afterwards, you also need to clean up the SSH tunnel that is running in the background. Example:
```
$ ps -u | grep -m1 -- "-L" | grep -- "-N"
samfux 8729 0.0 0.0 59404 6636 pts/5 S 13:46 0:00 ssh sfux@euler.ethz.ch -L 51339:10.205.4.122:8888 -N
$ kill 8729
```
This example is from a Linux computer. If you are using git bash on Windows, you can find the SSH process with the ps command and use kill to stop it.
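Instead of searching the ps output by hand, `pgrep -f` can match the full command line of the tunnel. A sketch with a hypothetical function name; it assumes the tunnel was started as `ssh ... -L LOCAL_PORT:REMOTE_IP:REMOTE_PORT -N`, as in the example above, and that pgrep is available (Linux and macOS; it may be missing in git bash).

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of ssh tunnels started as
# "ssh ... -L LOCAL_PORT:REMOTE_IP:REMOTE_PORT -N", matched on the full
# command line. The dots of the IP act as regex wildcards, which is harmless here.
find_tunnel_pids() {
    local spec=$1
    # "--" stops option parsing, since the pattern itself starts with "-L"
    pgrep -f -- "-L ${spec} -N"
}

# Example:
# kill $(find_tunnel_pids 51339:10.205.4.122:8888)
```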
## Main author
* Samuel Fux
## Contributions
* Andreas Lugmayr
* Mike Boss
* Nadia Marounina
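#!/bin/bash
# Stop a code-server session: kill the local SSH tunnel and the batch job on Euler
if [[ $# -lt 1 ]]
then
echo -e "Error: No ETH username is specified, terminating script\n"
exit 1
fi
VSC_USERNAME=$1
if [[ ! -f reconnect_info ]]; then
echo -e "Error: reconnect_info not found in $(pwd), terminating script\n"
exit 1
fi
# tunnel specification LOCAL_PORT:REMOTE_IP:REMOTE_PORT written by start_vscode.sh
VSC_TUNNEL=$(grep -m1 -o -E '[0-9]+:([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+):[0-9]+' reconnect_info)
# only search for tunnel processes if a tunnel was found (an empty pattern would match every ssh process)
if [[ -n "$VSC_TUNNEL" ]]; then
TUNNEL_JOBS=$(ps -u | grep -- "$VSC_TUNNEL" | grep ssh | awk '{ print $2 }')
for TUNNEL_JOB in $TUNNEL_JOBS; do echo "$TUNNEL_JOB"; kill "$TUNNEL_JOB"; done
fi
# kill the batch job (LSF only), using the job ID stored in reconnect_info
ssh -T "$VSC_USERNAME@euler.ethz.ch" bkill "$(grep BJOB reconnect_info | awk '{print $NF}')"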
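The stop script above only calls bkill, so it covers LSF jobs. A sketch of how the cancel command could depend on the batch system; `cancel_command` is a hypothetical helper, not part of the repository, while bkill and scancel are the standard LSF and Slurm commands for killing a job:

```shell
#!/bin/bash
# Hypothetical helper: print the command that cancels a job on the given
# batch system (bkill for LSF, scancel for Slurm).
cancel_command() {
    local batch_system=$1
    local job_id=$2
    if ! [[ "$job_id" =~ ^[0-9]+$ ]]; then
        echo "Error: invalid job ID '$job_id'" >&2
        return 1
    fi
    case $batch_system in
        LSF)   echo "bkill $job_id" ;;
        SLURM) echo "scancel $job_id" ;;
        *)     echo "Error: unknown batch system $batch_system" >&2; return 1 ;;
    esac
}
```

Usage: `cmd=$(cancel_command SLURM 12345678) && ssh -T "$VSC_USERNAME@euler.ethz.ch" "$cmd"`. Chaining with `&&` makes sure ssh is not called with an empty command when the input is invalid.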
......@@ -2,16 +2,19 @@
###############################################################################
# #
# Script to run on a local computer to start a code-server on Euler and #
# connect it with a local browser to it #
# Script for local computer to start a code-server on Euler and use an SSH #
# tunnel to connect it with a local browser #
# #
# Main author : Samuel Fux #
# Contributions : Andreas Lugmayr #
# Date : October 2021 #
# Contributions : Andreas Lugmayr, Mike Boss, Nadia Marounina, #
# Haoyang Zhou, Loïc Hausammann #
# Date : Oct 2021-2023 #
# Location : ETH Zurich #
# Version : 0.1 #
# Change history : #
# #
# 05.05.2023 Added variable for standard port #
# 24.10.2022 Added Slurm support #
# 19.05.2022 JOBID is now saved to reconnect_info file #
# 28.10.2021 Initial version of the script based on Jupyter script #
# #
###############################################################################
......@@ -21,7 +24,7 @@
###############################################################################
# Version
VSC_VERSION="0.1"
VSC_VERSION="0.5"
# Script directory
VSC_SCRIPTDIR=$(pwd)
......@@ -34,30 +37,39 @@ VSC_HOSTNAME="euler.ethz.ch"
# 2. Command line options overwrite defaults
# 3. Config file options overwrite command line options
# Configuration file default : $HOME/.vsc_config
# Configuration file default : $HOME/.vsc_config
VSC_CONFIG_FILE="$HOME/.vsc_config"
# Username default : no default
# Username default : no default
VSC_USERNAME=""
# Number of CPU cores default : 1 CPU core
# Number of CPU cores default : 1 CPU core
VSC_NUM_CPU=1
# Runtime limit default : 1:00 hour
# Runtime limit default : 1:00 hour
VSC_RUN_TIME="01:00"
# Memory default : 1024 MB per core
# Memory default : 1024 MB per core
VSC_MEM_PER_CPU_CORE=1024
# Number of GPUs default : 0 GPUs
# Number of GPUs default : 0 GPUs
VSC_NUM_GPU=0
# Waiting interval default : 60 seconds
# Waiting interval default : 60 seconds
VSC_WAITING_INTERVAL=60
# SSH key location default : no default
# SSH key location default : no default
VSC_SSH_KEY_PATH=""
# Batch system : Slurm
VSC_BATCH_SYSTEM="SLURM"
# Additional job arguments default : no default
VSC_JOB_ARGS=""
# Standard port for code-server : 8899
VSC_REMOTE_PORT=8899
###############################################################################
# Usage instructions #
###############################################################################
......@@ -68,12 +80,13 @@ $0: Script to start a VSCode on Euler from a local computer
Usage: start_vscode.sh [options]
Options:
Required options:
-u | --username USERNAME ETH username for SSH connection to Euler
-b | --batchsys BATCH_SYS Batch system to use (LSF or SLURM)
-m | --memory MEM_PER_CORE Memory limit in MB per core
-n | --numcores NUM_CPU Number of CPU cores to be used on the cluster
-u | --username USERNAME ETH username for SSH connection to Euler
-W | --runtime RUN_TIME Run time limit for the code-server in hours and minutes HH:MM
-m | --memory MEM_PER_CORE Memory limit in MB per core
Optional arguments:
......@@ -81,14 +94,17 @@ Optional arguments:
-g | --numgpu NUM_GPU Number of GPUs to be used on the cluster
-h | --help Display help for this script and quit
-i | --interval INTERVAL Time interval for checking if the job on the cluster already started
-j | --jobargs JOB_ARGS Additional job arguments
-k | --key SSH_KEY_PATH Path to SSH key with non-standard name
-p | --port CS_PORT Port number to be used by code-server
-v | --version Display version of the script and exit
Examlples:
./start_vscode.sh -u sfux -n 4 -W 04:00 -m 2048
Examples:
./start_vscode.sh --username sfux --numcores 2 --runtime 01:30 --memory 2048
./start_vscode.sh -u sfux -b SLURM -n 4 -W 04:00 -m 2048
./start_vscode.sh --username sfux --batchsys SLURM --numcores 2 --runtime 01:30 --memory 2048
./start_vscode.sh -c $HOME/.vsc_config
......@@ -101,6 +117,9 @@ VSC_RUN_TIME="01:00" # Run time limit for the code-server in hours and mi
VSC_MEM_PER_CPU_CORE=1024 # Memory limit in MB per core
VSC_WAITING_INTERVAL=60 # Time interval to check if the job on the cluster already started
VSC_SSH_KEY_PATH="" # Path to SSH key with non-standard name
VSC_BATCH_SYSTEM="SLURM" # Batch system to use (SLURM or LSF)
VSC_JOB_ARGS="" # Additional job arguments
VSC_REMOTE_PORT=8899 # Port to be used with the code-server
EOF
exit 1
......@@ -160,6 +179,19 @@ do
shift
shift
;;
-b|--batchsys)
VSC_BATCH_SYSTEM=$2
shift
shift
;;
-j|--jobargs)
VSC_JOB_ARGS=$2
shift
shift
;;
-p|--port)
VSC_REMOTE_PORT=$2
shift
shift
;;
*)
echo -e "Warning: ignoring unknown option $1 \n"
shift
......@@ -267,6 +299,28 @@ else
echo -e "Using SSH key $VSC_SSH_KEY_PATH"
fi
# check if VSC_BATCH_SYSTEM is set to SLURM or LSF
case $VSC_BATCH_SYSTEM in
LSF)
echo -e "Using LSF batch system"
;;
SLURM)
echo -e "Using Slurm batch system"
;;
*)
echo -e "Error: Unknown batch system $VSC_BATCH_SYSTEM. Please specify either LSF or SLURM as batch system"
display_help
;;
esac
# check if VSC_REMOTE_PORT is an integer
if ! [[ "$VSC_REMOTE_PORT" =~ ^[0-9]+$ ]]; then
echo -e "Error: $VSC_REMOTE_PORT -> Incorrect format. Please specify the port number as an integer and try again\n"
display_help
fi
echo -e "Using port number $VSC_REMOTE_PORT for the code-server"
# put together string for SSH options
VSC_SSH_OPT="$VSC_SKPATH $VSC_USERNAME@$VSC_HOSTNAME"
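The integer check above accepts any string of digits, including 0 or 99999. A stricter variant could also check the range. This is a sketch with a hypothetical function name; rejecting ports below 1024 is an assumption based on those ports usually being reserved for privileged processes on Linux.

```shell
#!/bin/bash
# Hypothetical stricter check: accept only integers between 1024 and 65535.
is_valid_port() {
    local port=$1
    [[ "$port" =~ ^[0-9]+$ ]] || return 1
    # at most 5 digits, so the arithmetic below cannot overflow
    [[ ${#port} -le 5 ]] || return 1
    # 10# forces base 10, so a leading zero is not read as octal
    (( 10#$port >= 1024 && 10#$port <= 65535 ))
}
```

In the script, `is_valid_port "$VSC_REMOTE_PORT" || display_help` could then replace the regex test.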
......@@ -296,15 +350,42 @@ ENDSSH
###############################################################################
# run the code-server job on Euler and save the ip of the compute node in the file vscip in the home directory of the user on Euler
echo -e "Connecting to $VSC_HOSTNAME to start the code-server in a batch job"
# FIXME: save jobid in a variable, that the script can kill the batch job at the end
ssh $VSC_SSH_OPT bsub -n $VSC_NUM_CPU -W $VSC_RUN_TIME -R "rusage[mem=$VSC_MEM_PER_CPU_CORE]" $VSC_SNUM_GPU <<ENDBSUB
echo -e "Connecting to $VSC_HOSTNAME to start the code-server in a $VSC_BATCH_SYSTEM batch job"
case $VSC_BATCH_SYSTEM in
"LSF")
VSC_BJOB_OUT=$(ssh $VSC_SSH_OPT bsub -n $VSC_NUM_CPU -W $VSC_RUN_TIME -R "rusage[mem=$VSC_MEM_PER_CPU_CORE]" $VSC_SNUM_GPU $VSC_JOB_ARGS<<ENDBSUB
module load $VSC_MODULE_COMMAND
export XDG_RUNTIME_DIR="\$HOME/vsc_runtime"
VSC_IP_REMOTE="\$(hostname -i)"
echo "Remote IP:\$VSC_IP_REMOTE" >> /cluster/home/$VSC_USERNAME/vscip
code-server --bind-addr=\${VSC_IP_REMOTE}:${VSC_REMOTE_PORT}
ENDBSUB
) ;;
"SLURM")
# convert HH:MM to the HH:MM:SS format expected by Slurm
VSC_RUN_TIME="${VSC_RUN_TIME}:00"
if [ "$VSC_NUM_GPU" -gt "0" ]; then
VSC_SNUM_GPU="-G $VSC_NUM_GPU"
fi
VSC_BJOB_OUT=$(ssh $VSC_SSH_OPT sbatch --ntasks=1 --cpus-per-task=$VSC_NUM_CPU "--time=$VSC_RUN_TIME" "--mem-per-cpu=$VSC_MEM_PER_CPU_CORE" -e "error.dat" $VSC_SNUM_GPU $VSC_JOB_ARGS<<ENDBSUB
#!/bin/bash
module load $VSC_MODULE_COMMAND
export XDG_RUNTIME_DIR="\$HOME/vsc_runtime"
VSC_IP_REMOTE="\$(hostname -i)"
echo "Remote IP:\$VSC_IP_REMOTE" >> /cluster/home/$VSC_USERNAME/vscip
code-server --bind-addr=\${VSC_IP_REMOTE}:8899
code-server --bind-addr=\${VSC_IP_REMOTE}:${VSC_REMOTE_PORT}
ENDBSUB
)
;;
esac
# TODO: get jobid for both cases (LSF/Slurm)
# store jobid in a variable
# VSC_BJOB_ID=$(echo $VSC_BJOB_OUT | awk '/is submitted/{print substr($2, 2, length($2)-2);}')
# wait until batch job has started, poll every $VSC_WAITING_INTERVAL seconds to check if /cluster/home/$VSC_USERNAME/vscip exists
# once the file exists and is not empty the batch job has started
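The TODO above asks for the job ID in both the LSF and the Slurm case. A minimal sketch, assuming the usual submission messages (bsub prints `Job <ID> is submitted to queue <QUEUE>.`, sbatch prints `Submitted batch job ID`); the function `parse_job_id` is hypothetical, and its LSF branch reuses the commented-out awk expression:

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the text printed on submission.
#   LSF:   Job <12345> is submitted to queue <normal.4h>.
#   Slurm: Submitted batch job 12345
parse_job_id() {
    local batch_system=$1
    local submit_output=$2
    case $batch_system in
        LSF)
            # strip the angle brackets around the job ID
            echo "$submit_output" | awk '/is submitted/{print substr($2, 2, length($2)-2); exit}'
            ;;
        SLURM)
            echo "$submit_output" | awk '/Submitted batch job/{print $4; exit}'
            ;;
        *)
            return 1
            ;;
    esac
}
```

Usage: `VSC_BJOB_ID=$(parse_job_id "$VSC_BATCH_SYSTEM" "$VSC_BJOB_OUT")`. Alternatively, `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups), which avoids parsing on the Slurm side.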
......@@ -318,10 +399,9 @@ ENDSSH
# give the code-server a few seconds to start
sleep 7
# get remote ip, port and token from files stored on Euler
# get remote ip and token from files stored on Euler
echo -e "Receiving ip, port and token from the code-server"
VSC_REMOTE_IP=$(ssh $VSC_SSH_OPT "cat /cluster/home/$VSC_USERNAME/vscip | grep -m1 'Remote IP' | cut -d ':' -f 2")
VSC_REMOTE_PORT=8899
# check if the IP, the port and the token are defined
if [[ "$VSC_REMOTE_IP" == "" ]]; then
......@@ -347,6 +427,10 @@ VSC_LOCAL_PORT=$((3 * 2**14 + RANDOM % 2**14))
echo -e "Using local port: $VSC_LOCAL_PORT"
# write reconnect_info file
#
# FIXME: add jobid
# BJOB ID : $VSC_BJOB_ID
cat <<EOF > $VSC_SCRIPTDIR/reconnect_info
Restart file
Remote IP address : $VSC_REMOTE_IP
......@@ -368,7 +452,7 @@ sleep 5
# save url in variable
VSC_URL=http://localhost:$VSC_LOCAL_PORT
echo -e "Starting browser and connecting it to the code-server"
echo -e "Connecting to url $VSc_URL"
echo -e "Connecting to url $VSC_URL"
# start local browser if possible
if [[ "$OSTYPE" == "linux-gnu" ]]; then
......@@ -380,4 +464,4 @@ elif [[ "$OSTYPE" == "msys" ]]; then # Git Bash on Windows 10
else
echo -e "The browser cannot be started automatically on your operating system."
echo -e "Please open $VSC_URL in your browser."
fi
fi
\ No newline at end of file
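The local port is chosen with `VSC_LOCAL_PORT=$((3 * 2**14 + RANDOM % 2**14))`, as the hunk header above shows. Worked through: 3 * 2^14 = 49152, and `RANDOM % 2**14` lies between 0 and 16383, so the port always falls between 49152 and 65535, the dynamic/private port range of RFC 6335. The sketch wraps the same expression in a hypothetical function so that the bounds can be checked:

```shell
#!/bin/bash
# Same expression as in start_vscode.sh, wrapped in a hypothetical function:
#   lower bound 3 * 2**14 = 49152, RANDOM % 2**14 is in 0..16383,
#   so the result is always in 49152..65535.
pick_local_port() {
    echo $((3 * 2**14 + RANDOM % 2**14))
}
```

Note that the formula does not check whether the chosen port is already in use on the local computer.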
......@@ -5,3 +5,4 @@ VSC_RUN_TIME="01:00" # Run time limit for the code-server in hours and mi
VSC_MEM_PER_CPU_CORE=1024 # Memory limit in MB per core
VSC_WAITING_INTERVAL=60 # Time interval to check if the job on the cluster already started
VSC_SSH_KEY_PATH="" # Path to SSH key with non-standard name
VSC_JOB_ARGS="" # Additional arguments when submitting the job