Commit 4507cd8d authored by Florian Landis

Florian's draft of euler_instruction.md

parent d5e9e659
### **Euler instructions**
#### Useful bash commands ####
The default command-line interface (shell) on Euler is "bash".
The scicomp wiki gives a good [overview of common bash commands](https://scicomp.ethz.ch/wiki/Linux_command_line).
Here is a short list of particularly useful ones:
- `Ctrl-p` and `Ctrl-n` give previous and next commands in the command history.
- `Ctrl-r <expression>` searches for `<expression>` in the command history.
- Use `cd` for **c**hanging **d**irectories.
E.g., `cd nexus-e/Run_Nexuse` takes you to the directory where the main MATLAB script `run_Nexuse.m` lives.
Note: avoid white space in directory or file names **like the plague**.
- Use `ls` for **l**i**s**ting the contents of the current directory.
- Use `rm <filename>` to delete the file `<filename>`.
- Use `rm -r <foldername>` to delete the folder `<foldername>` together with all of its contents (**r**ecursively).
- Use `pwd` to **p**rint the current **w**orking **d**irectory.
- `grep <expression> <filename>` prints all lines of the file(s) `<filename>` in which `<expression>` appears.
Use option `-i` for case-**i**nsensitive search.
- `<command1> | <command2>` uses a "**pipe**" (`|`) to pass the output of command1 as input to command2.
E.g., `ls | grep <filename>` sends the listing of the current directory to grep, which prints the lines in which `<filename>` appears.
If this command prints nothing, no file or folder whose name contains `<filename>` exists in the current directory.
- Wildcards: bash expands wildcards when matching file names ("globbing").
E.g., `ls lsf.o14*` lists all files whose names start with `lsf.o14`.
The most common wildcards are
  + `*`, which matches any sequence of characters (including the empty one), and
  + `?`, which matches exactly one character.
- Use `scp` for making **s**ecure (remote) **c**o**p**ies of files and folders (add option `-r` to copy folders).
E.g., run
`scp ./run_Nexuse.m <username>@euler.ethz.ch:/cluster/home/<username>/nexus-e/Run_Nexuse/run_Nexuse.m`
on your local machine (not on Euler!) to transfer a local copy of `run_Nexuse.m` to `~/nexus-e/Run_Nexuse/run_Nexuse.m` on Euler.
`scp` comes in handy for scripting file transfers but is unnecessary if you are happy to make all file transfers using FileZilla.
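
To try the commands above without risking real files, the following sketch creates a throw-away directory (with `mktemp -d`), fills it with made-up files whose names merely imitate Euler's `lsf.o<JobID>` output files, exercises `ls`, wildcards, `grep` and a pipe, and deletes everything again. The file names and contents are invented for illustration only.

```shell
#!/usr/bin/env bash
# Practice area: a temporary directory that is deleted at the end.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Made-up files; the names only imitate Euler's lsf.o<JobID> output files.
printf 'Iteration 1\nmaximum difference: 0.05\n' > lsf.o1401
printf 'Iteration 1\nMaximum difference: 0.01\n' > lsf.o1402
printf 'nothing to see here\n' > notes.txt

pwd                                     # print the working directory (the temp dir)
ls                                      # list its contents
ls lsf.o14*                             # *: all files whose names start with lsf.o14
ls lsf.o140?                            # ?: lsf.o140 followed by exactly one character
grep -i 'maximum difference' lsf.o14*   # case-insensitive search in both lsf files
ls | grep notes                         # pipe: is there a file whose name contains "notes"?

# Pipe again: count the files that mention the phrase (2 here).
n_files=$(grep -il 'maximum difference' lsf.o14* | wc -l)
echo "$n_files"

cd - > /dev/null || exit 1              # return to the previous directory
rm -r "$demo_dir"                       # recursively delete the practice directory
```

Note that `grep -i` finds both "maximum" and "Maximum", which is why both files are counted.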
#### Euler specific commands ####
##### Account information #####
To access information about your account on Euler, use:
- `lquota` checks the amount of data and files you have on the cluster
- `busers -w` shows resource usage
- `my_share_info` returns your user group
##### Modules #####
Several commands exist for managing and inspecting the modules loaded in your environment.
Generally, running Nexus-e should work fine if the modules listed under [Setup](setup.html#prepare-dependencies) are loaded.
Here is a list anyway (see the [scicomp wiki](https://scicomp.ethz.ch/wiki/Setting_up_your_environment) for more details on these commands):
- `module load new` makes the newer software modules available
- `module list` shows the currently loaded modules
- `module avail` shows all available modules
- `module help <module_name>` gives a brief description of a module
- `module show <module_name>` shows what loading the module would do (e.g., which environment variables it sets)
- `module load <module_name>` loads a module
- `which icc` shows which Intel C compiler (`icc`) is found first in your `PATH`
- `module unload <module_name>` unloads a module
<!-- #### Load modules automatically #### -->
<!-- put the modules in the shell's configuration -->
<!-- script -->
<!-- $HOME/.bash_profile -->
##### Batch system: How to run Nexus-e #####
On Euler, users are asked to run large jobs through Euler's batch system rather than directly on the login nodes.
Scicomp gives an extensive [description](https://scicomp.ethz.ch/wiki/Using_the_batch_system) of this system.
Here we summarize what is useful for running Nexus-e on Euler.
Typically, a command like
```bsub -n 36 -R "model==XeonGold_6150" -R "rusage[mem=5180]" -W "10:00" matlab -r run_Nexuse```
submitted from the folder `nexus-e/Run_Nexuse` is used to run Nexus-e on Euler. Its options mean:
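- `-W "10:00"` sets the maximum run (wall-clock) time to 10 hours (format `HH:MM`); jobs that exceed it are killed
- `-R "rusage[mem=5180]"` tells Euler to allocate 5180 MB of RAM per core (default: 1 GB per core).
With `-n 36`, this amounts to 36 × 5180 MB = 186480 MB in total, which still fits into one XeonGold\_6150 node (192 GB)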
- `-R "model==XeonGold_6150"` restricts the job to XeonGold\_6150 nodes.
Note that requesting specific node types may lead to longer queuing times.
Available node types:
  + **XeonGold\_6150** high-performance nodes (36 cores, max memory 192 GB per node)
  + **XeonE5\_2680v3** high-memory nodes (24 cores, max memory 512 GB per node)
  + **EPYC\_7742** AMD nodes (128 cores, max memory 512 GB per node)
- `-n 36` requests 36 processor cores
- `matlab -r run_Nexuse` starts MATLAB and runs the script `run_Nexuse.m`
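
If you submit runs with different resources often, a small shell function can assemble the `bsub` call from the options explained above. The sketch below is hypothetical (the name `nexuse_bsub` and the `SUBMIT` switch are not part of Nexus-e or Euler). By default it only prints the command, and it also reports the total memory (cores × memory per core) so you can check that the request fits on one node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the bsub command for a Nexus-e run.
# Usage: nexuse_bsub <cores> <mem_MB_per_core> <walltime_HH:MM> [node_model]
# Prints the command (dry run) unless SUBMIT=1 is set.
nexuse_bsub() {
  local cores=$1 mem=$2 walltime=$3 model=${4:-}
  local cmd=(bsub -n "$cores")
  if [[ -n $model ]]; then
    cmd+=(-R "model==$model")          # restrict to one node type
  fi
  cmd+=(-R "rusage[mem=$mem]" -W "$walltime" matlab -r run_Nexuse)

  # Total memory = cores x memory per core; compare with the node's maximum.
  echo "Total memory: $(( cores * mem )) MB" >&2

  if [[ ${SUBMIT:-0} == 1 ]]; then
    "${cmd[@]}"                          # really submit (on Euler only)
  else
    echo "${cmd[*]}"                     # dry run: show the command
  fi
}
```

E.g., `nexuse_bsub 36 5180 10:00 XeonGold_6150` prints the command shown above (without the quotes) and reports 186480 MB, while `SUBMIT=1 nexuse_bsub 36 5180 10:00 XeonGold_6150`, run from `nexus-e/Run_Nexuse` on Euler, actually submits it. Building the command as a bash array keeps arguments such as `rusage[mem=5180]` intact when the job is submitted.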
Other useful options:
- Add `-nojvm` to the `matlab` command to start MATLAB without the Java virtual machine:
`bsub [...] matlab -nojvm -r run_Nexuse`
- Add `-nodisplay` to the `matlab` command to tell MATLAB explicitly that no graphical display is available:
`bsub [...] matlab -nodisplay -r run_Nexuse`
- Set the output file name with the `bsub` option `-o <output_filename>` (place it before `matlab`). The default is `lsf.o<JobID>`.
- For [parallel computation using OpenMP](https://scicomp.ethz.ch/wiki/Using_the_batch_system#OpenMP), the number of available processor cores must be specified in the environment variable `OMP_NUM_THREADS`.
Set it with the command
`export OMP_NUM_THREADS=<number_of_cores>`
before issuing your `bsub ...` command.
Consider adding this line to `~/.bash_profile` if you don't want to repeat it in every session.
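- To use [job arrays](https://scicomp.ethz.ch/wiki/Job_arrays) for parallel calculations, submit with the option `-J`; e.g.,
`bsub -J "arrayname[1-10]" ./program [arguments]`
creates an array of 10 jobs with indices 1 to 10.
If some jobs of the array fail, you can rerun only those (the ones in state EXIT) with
`brequeue -e <JOBID>`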
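
As a sketch of how the OpenMP and job-array options can be combined inside a job script: LSF gives each member of a job array the environment variable `LSB_JOBINDEX`, and it reports the number of cores granted to a job in `LSB_DJOB_NUMPROC`. The script below uses these to set `OMP_NUM_THREADS` inside the job (an alternative to exporting it before `bsub`) and to pick one input per array member. The script name `array_job.sh`, the function name, and the input naming scheme `input_<index>.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical job script array_job.sh, submitted e.g. with
#   bsub -J "arrayname[1-10]" -n 4 ./array_job.sh
# LSF runs one copy per array index and sets LSB_JOBINDEX in each copy.

setup_array_member() {
  # One OpenMP thread per granted core; default to 1 outside the batch system.
  export OMP_NUM_THREADS=${LSB_DJOB_NUMPROC:-1}
  local index=${LSB_JOBINDEX:-1}
  ARRAY_INPUT="input_${index}.txt"     # hypothetical naming scheme
  echo "member $index: OMP_NUM_THREADS=$OMP_NUM_THREADS, input $ARRAY_INPUT"
}

setup_array_member
# ./program "$ARRAY_INPUT"             # replace with the real program and arguments
```

Make the script executable with `chmod +x array_job.sh` before submitting it.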
##### Batch system: How to check the status of your jobs #####
- `bjobs` lists your unfinished (pending, running, or suspended) jobs
- `bjobs -p` lists only pending jobs, together with the reason why they are pending
- `bjobs -l` lists jobs with more details (long format)
- `bkill <JOBID>` kills the job with ID `<JOBID>`
- `bpeek <JOBID>` shows the output produced so far by a running job
- `bbjobs <JOBID>` shows the resources used by a job
- `bjobs -l -aff` is another way to check the resources used (including CPU affinity)
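
The commands above can also be combined in scripts, e.g. to wait for a job and then inspect its output. The sketch below assumes LSF's usual message formats: `bsub` confirms a submission with a line like `Job <123456> is submitted to queue <...>.`, and the default `bjobs` table has a header line followed by one line per job, with the status (`PEND`, `RUN`, `DONE`, `EXIT`, ...) in the third column. The function names are hypothetical; if Euler's output looks different, the parsing must be adapted.

```shell
#!/usr/bin/env bash
# Extract the numeric JobID from bsub's confirmation message, e.g.
#   Job <123456> is submitted to queue <normal.4h>.
jobid_from_bsub() {
  sed -n 's/^Job <\([0-9]*\)>.*/\1/p'
}

# Print the STAT column of a job from the default bjobs table
# (line 1 is the header, STAT is the third column).
job_status() {
  bjobs "$1" 2>/dev/null | awk 'NR == 2 { print $3 }'
}

# Wait until the job is no longer pending, running or suspended,
# checking every $2 seconds (default: 600). An empty status
# (job no longer known to bjobs) also counts as finished.
wait_for_job() {
  local jobid=$1 interval=${2:-600} status
  while :; do
    status=$(job_status "$jobid")
    case $status in
      PEND|RUN|PSUSP|USUSP|SSUSP) sleep "$interval" ;;
      *) echo "Job $jobid finished with status: ${status:-unknown}"; return 0 ;;
    esac
  done
}

# Usage on Euler (sketch):
#   jobid=$(bsub -n 36 -W "10:00" matlab -r run_Nexuse | jobid_from_bsub)
#   wait_for_job "$jobid" && grep -i 'maximum difference' "lsf.o$jobid"
```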
#### Recommendations for running Nexus-e ####
##### Experience with resource usage #####
**8-day time resolution with convergence criterion 2 percent**
For this, change the line
`tpRes = 2;`
in run\_Nexuse.m to
`tpRes = 8;`
and make sure that `limDifference` is set to 0.02 in the line
`limDifference = 0.02; % Convergence limit (0.02 coresponds to 2% difference in demand)`.
When running Nexus-e with an 8-day time resolution and energy-economic convergence criterion 0.02, the command
`bsub -n 36 -R "rusage[mem=2180]" -W "30:00" matlab -r run_Nexuse`
works fine.
**8-day time resolution with convergence criterion 0.1 percent**
As above, but set `limDifference = 0.001` in run\_Nexuse.m and give Euler more time for solving:
`bsub -n 36 -R "rusage[mem=2180]" -W "60:00" matlab -r run_Nexuse`. (This is still being tested as of October 22, 2020.)
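
Instead of editing run\_Nexuse.m by hand, the two settings can also be changed with `sed`. This is a sketch that assumes the lines start with `tpRes =` and `limDifference =` as quoted above (commented-out lines starting with `%` are left alone). The function name is hypothetical, a backup is kept as `run_Nexuse.m.bak`, and the trailing comment of the `limDifference` line is not updated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set tpRes and limDifference in run_Nexuse.m with sed.
# Usage: set_nexuse_params <tpRes> <limDifference> [file]
# Only lines that start (after optional blanks) with "tpRes =" or
# "limDifference =" are changed; the old file is kept as <file>.bak.
set_nexuse_params() {
  local tpres=$1 limdiff=$2 file=${3:-run_Nexuse.m}
  sed -i.bak \
    -e "s/^\([[:space:]]*tpRes[[:space:]]*=[[:space:]]*\)[^;]*;/\1${tpres};/" \
    -e "s/^\([[:space:]]*limDifference[[:space:]]*=[[:space:]]*\)[^;]*;/\1${limdiff};/" \
    "$file"
  # Show the resulting lines with their line numbers.
  grep -nE '^[[:space:]]*(tpRes|limDifference)[[:space:]]*=' "$file"
}

# 8-day resolution, 2 % convergence criterion:
#   set_nexuse_params 8 0.02
# 8-day resolution, 0.1 % convergence criterion:
#   set_nexuse_params 8 0.001
```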
##### Finding your way around output on Euler #####
After you submit a batch job with `bsub`, Euler replies with the job's ID (JobID).
After completion, the standard output of the Nexus-e run is written to `lsf.o<JobID>` and can be inspected there using the text [editor of your choice](#viewing-output).
While the job is running, however, its standard output so far can be accessed with the command
`bpeek <JOBID>`
which prints it to your console (without a JobID, `bpeek` shows the output of your most recently submitted job).
Several options exist for browsing this output more conveniently:
- write the output of bpeek to a file:
`bpeek > yourfilename.txt`
and then look at it in the text [editor of your choice](#viewing-output).
- use a pipe (`|`) to find lines containing certain "patterns" with grep:
`bpeek | grep <pattern>`
for example,
`bpeek | grep 'maximum difference'`
prints all lines that contain information about Gemel's convergence criterion in the individual iterations.
- use a pipe (`|`) to display the output of bpeek with the `less` command:
`bpeek | less`
This brings up an interface that lets you browse through the output of bpeek without using the mouse.
Some basic functions of the interface:
  + press `Space` to go down one page,
  + press `q` to **q**uit the `less` interface,
  + press `/` to enter a pattern to search for;
    - after searching, pressing `n` (n for **n**ext) jumps to the next instance of the pattern,
    - pressing `N` jumps to the previous one.
  + `?` works like `/` (forward search) but searches backward.
  + `G` **g**oes to the end of bpeek's output, `g` to the beginning.
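
For repeated checks of Gemel's convergence, the `grep` pipe above can be wrapped in a small function that shows only the last few matching lines. It is a hypothetical helper (not part of Nexus-e); it reads the given output file(s) or, when no file is given, standard input, so it works with `bpeek` as well as with finished `lsf.o<JobID>` files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines mentioning "maximum difference".
# Usage: last_convergence_lines N [file ...]   (reads standard input if no file)
#   bpeek | last_convergence_lines 3
#   last_convergence_lines 5 lsf.o<JobID>
last_convergence_lines() {
  local n=$1
  shift
  # "$@" is empty when no file is given, so grep then reads standard input.
  grep -i 'maximum difference' "$@" | tail -n "$n"
}
```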
###### Finding stuff in lsf.o\<JobID\> files ######
- `grep -i <expression> lsf*`
searches (case-insensitively) for `<expression>` in all files whose names start with `lsf`.
E.g.,
`grep -inH nexus_disagg_nuc50_oct20 lsf*`
searches for copies of `database nexuse_disagg_nuc50` mentioned in lsf* files that were made on October 20 (`-n` adds the line number and `-H` the file name to each match).
This may be helpful for identifying the copies of the database that can safely be removed ("dropped") from the PSL server.
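
To get an overview of all database copies mentioned in the output files before dropping any, `grep -o` (print only the matching part) and `-h` (omit file names) can be combined with `sort | uniq -c`. The sketch below is a hypothetical helper; it treats the prefix as a regular expression and assumes database names consist of letters, digits, and underscores only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct database names starting with <prefix>
# that appear in the given files, with the number of times each is mentioned.
# Usage: list_db_copies <prefix> <file ...>
#   list_db_copies nexus_disagg_nuc50 lsf*
list_db_copies() {
  local prefix=$1
  shift
  # -o: print only the matched name, -h: omit file names
  grep -oh -- "${prefix}[[:alnum:]_]*" "$@" | sort | uniq -c
}
```

E.g., `list_db_copies nexus_disagg_nuc50 lsf*` prints every distinct name starting with `nexus_disagg_nuc50`, each preceded by how often it is mentioned.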
#### Viewing output ####
The output of MATLAB is written to files `lsf.o<JobID>`. To inspect this output, open it in your editor of choice.
- Several editors are available on Euler, e.g., emacs, vim, and nano.
nano may be a good choice if you don't know emacs or vim (if you do know emacs or vim, you will have [strong opinions](https://en.wikipedia.org/wiki/Editor_war) on which one to use).
nano shows on-screen instructions for basic key combinations
(e.g., `^G` means _press g while holding down Ctrl_).
- For a better user experience, copy the files to your local computer (e.g., using [FileZilla](https://filezilla-project.org/) or `scp`) and view them with a GUI editor.