Some thoughts about current design problems
While this is a nice example of how to get a Jupyter notebook working on Euler, it also exposes some problems:
- the notebook server itself should not be part of a bsub command, simply because it is not the notebook server that is doing the hard work
- it is the kernel which actually executes the code and does the hard work
- I am not an expert in bsub, but any bsub job should terminate at some point rather than run forever doing nothing
- on the other hand, a notebook server should stay up almost forever (hence the name, «server») and not be dropped suddenly just because the Euler job ends
- but if we background the server inside the bsub command with `jupyter notebook &`, the job runs forever and possibly blocks other jobs from being executed (am I wrong?). Already today, just starting a notebook server can take a very long time before the cluster finally picks up my request
- users will gladly wait for an actual job to finish, but they should not have to wait just for a notebook to open!
All this being said: we need to design the whole setup differently. We should write a Jupyter kernel that wraps each cell into a bsub command and executes it on the cluster. To keep the user sessions alive, they should be hosted with JupyterHub on a dedicated machine outside of Euler.
To put it in different words: a Jupyter notebook relies on the so-called REPL (Read-Eval-Print-Loop) principle. This is fundamentally different from the batch job system we have on Euler. A REPL is idle most of the time, waiting for input (ideal for interactive programming!), whereas a batch job system like bsub tries to maximize throughput for all users by scheduling their jobs over time and putting them in a queue.
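The REPL principle described above can be sketched in a few lines; the function names here are illustrative only. The point is that the loop spends most of its life blocked in the Read step, which is exactly the behaviour a batch scheduler is not designed for:

```python
def repl(read_fn, write_fn):
    """A minimal Read-Eval-Print-Loop. read_fn blocks until input
    arrives (the loop is idle meanwhile); None ends the session."""
    env = {}
    while True:
        line = read_fn()                 # Read: block, mostly idle
        if line is None:
            break
        try:
            result = eval(line, env)     # Eval the expression
        except Exception as exc:
            result = exc
        write_fn(repr(result))           # Print the result, then Loop
```

A batch job, by contrast, is handed all of its input up front, runs to completion, and exits, which is why the two models fit together so poorly.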
Rok's fork of an existing project in this direction might be a good starting point: