#### ⚠ **Important**: In order to not overburden ETH servers during the current situation, I highly recommend only downloading videos outside of peak hours, i.e. early in the morning or late at night ⚠

***

# vo-scraper

A Python script for ETH students to download lecture videos from [video.ethz.ch](https://video.ethz.ch/).

## Requirements:
 * `requests`

Install with:

    pip3 install requests

## Setup
Download the file [here](https://gitlab.ethz.ch/tgeorg/vo-scraper/raw/master/vo-scraper.py?inline=false) and run with

    python3 vo-scraper.py

# FAQ

### Q: How do I use it?

#### A:

    python3 vo-scraper.py <arguments> <lecture link(s)>

To see a list of possible arguments, run

    python3 vo-scraper.py --help

**For protected lectures** the vo-scraper will ask for your login credentials before downloading the video(s).

### Q: How can I choose which episodes of a lecture to download?

#### A: You will be prompted with the list of episodes available for downloading for each lecture.

You can either specify single episodes by typing their indices separated by spaces, or add ranges in Haskell syntax, like `1..5` for `1 2 3 4 5`.
Ranges are upper-bound-inclusive. Custom step sizes are supported too, e.g. `1..3..10` for `1 3 5 7 9`.

You may find this example of ranges useful:

| Range     | Equivalent      | In Words                                                                                  |
|-----------|-----------------|-------------------------------------------------------------------------------------------|
| `1..4`    | `1 2 3 4`       | Episode one to four                                                                       |
| `..4`     | `0 1 2 3 4`     | All episodes up to four (the fifth)                                                       |
| `3..`     | `3 4 5 6 [...]` | All episodes starting from three (the fourth)                                             |
| `..`      | `0 1 2 3 [...]` | All episodes                                                                              |
| `2..4..6` | `2 4 6`         | Every other episode from two to six                                                       |
| `..2..6`  | `0 2 4 6`       | Every other episode until six (when I started paying attention)                           |
| `1..3..`  | `1 3 5 [...]`   | Every other episode starting from the second (i.e. all the second episodes of the week)   |
| `..3..`   | `0 3 6 [...]`   | Every third episode, starting from the beginning                                          |
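
The range rules in the table can be sketched in a few lines of Python. This is only an illustration and not necessarily how the vo-scraper implements them: the function names `expand_range` and `parse_selection` are hypothetical, and clamping open-ended ranges to the last available episode index is an assumption.

```python
def expand_range(token, last_index):
    """Expand one range such as '1..4', '..4', '3..' or '2..4..6'.

    last_index is the highest available episode index (assumption:
    open-ended ranges stop there, and closed ranges are clamped to it).
    """
    parts = token.split("..")
    if len(parts) == 2:
        first, second, end = parts[0], "", parts[1]
    elif len(parts) == 3:
        first, second, end = parts
    else:
        raise ValueError("not a valid range: " + token)

    start = int(first) if first else 0          # missing start means 0
    stop = int(end) if end else last_index      # missing end means "until the last episode"
    step = int(second) - start if second else 1 # Haskell style: the second value sets the step
    if step <= 0:
        raise ValueError("step must be positive: " + token)

    # Ranges are upper-bound-inclusive, hence the + 1
    return list(range(start, min(stop, last_index) + 1, step))


def parse_selection(text, episode_count):
    """Turn user input like '0 2 5..7' into a list of episode indices."""
    last_index = episode_count - 1
    chosen = []
    for token in text.split():
        if ".." in token:
            chosen.extend(expand_range(token, last_index))
        else:
            chosen.append(int(token))
    return chosen
```

For a lecture with ten episodes, `parse_selection("0 2 5..7", 10)` yields the indices `0 2 5 6 7`.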

### Q: How do I pass a file with links to multiple lectures?

#### A: Use `--file <filename>`

The file should contain one link per line. Lines starting with `#` are ignored and can be used for comments. It should look something like this:

    https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html
    # This is a comment
    https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html
    ...

You can also add a username and password after the link, each separated by a single space:

    https://video.ethz.ch/lectures/<department>/<year>/<spring/autumn>/XXX-XXXX-XXL.html username passw0rd1
    ...

**Note:** This is **NOT** recommended for your NETHZ account password for security reasons!
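
To make the file format concrete, here is a minimal sketch of how such a file could be read. It is an illustration under assumptions, not the scraper's actual code: `LectureEntry` and `parse_link_lines` are hypothetical names, and rejecting lines with any other number of fields is a choice made for this example.

```python
from typing import NamedTuple, Optional


class LectureEntry(NamedTuple):
    link: str
    username: Optional[str] = None
    password: Optional[str] = None


def parse_link_lines(lines):
    """Parse the lines of a link file into LectureEntry tuples."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#'
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 1:
            entries.append(LectureEntry(parts[0]))
        elif len(parts) == 3:
            # link, username and password separated by spaces
            entries.append(LectureEntry(parts[0], parts[1], parts[2]))
        else:
            raise ValueError("expected '<link>' or '<link> <username> <password>': " + line)
    return entries
```

A file could then be loaded with `parse_link_lines(open(filename, encoding="utf-8"))`.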

### <a name="how_it_works"></a> Q: How does it acquire the videos?

#### A: Like so:

Each lecture on [video.ethz.ch](https://video.ethz.ch/) has a JSON file with metadata associated with it.

So for example

    https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.html

has its JSON file under:

    https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.series-metadata.json

This JSON file contains a list of all "episodes", which holds the IDs of all the lecture's videos.

Using those ids we can access another JSON file with the video's metadata under

    https://video.ethz.ch/.episode-video.json?recordId=<ID>

Example:

    https://video.ethz.ch/.episode-video.json?recordId=3f6dee77-396c-4e2e-a312-a41a457b319f
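
The two URL patterns above can be derived mechanically from a lecture link and a video ID. The helpers below are a small sketch of that; their names (`series_metadata_url`, `episode_video_url`) are hypothetical and chosen only for this example.

```python
def series_metadata_url(lecture_url):
    """Return the series metadata URL for a lecture page ending in '.html'."""
    suffix = ".html"
    if not lecture_url.endswith(suffix):
        raise ValueError("expected a lecture link ending in .html: " + lecture_url)
    # Swap the '.html' ending for '.series-metadata.json'
    return lecture_url[: -len(suffix)] + ".series-metadata.json"


def episode_video_url(record_id):
    """Return the metadata URL of a single video, given its ID."""
    return "https://video.ethz.ch/.episode-video.json?recordId=" + record_id
```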

This file contains links to all available video streams (usually 1080p, 720p, and 360p). Note that if a lecture requires a login, this file is only accessible if you send a cookie with a valid login token!

The link to the video stream looks something like this:

    https://oc-vp-dist-downloads.ethz.ch/mh_default_org/oaipmh-mmp/<video id>/<video src id?>/presentation_XXXXXXXX_XXXX_XXXX_XXXX_XXXXXXXXXXXX.mp4

Example:

    https://oc-vp-dist-downloads.ethz.ch/mh_default_org/oaipmh-mmp/3f6dee77-396c-4e2e-a312-a41a457b319f/2bd93636-e95d-4552-8722-332a95e1a0a6/presentation_c6539ed0_1af9_490d_aec0_a67688dad755.mp4

In short, the vo-scraper gets the list of episodes from the lecture's metadata, reads the JSON file of each video the user selected to find its stream links, and then downloads the videos behind those links.

### Q: How does it access lecture videos that are password protected?

#### A: Like so:

There exist three (known) types of protection for lecture videos:

* `NONE` requires no login at all

* `ETH` requires logging in with a NETHZ account

* `PWD` requires logging in with a custom user name and password

The kind of protection a series of lecture videos has can be found in its metadata file under `<lecture link>.series-metadata.json`, e.g.

    https://video.ethz.ch/lectures/d-infk/2019/spring/252-0028-00L.series-metadata.json

This JSON file has a field called `protection` with a value corresponding to one of the three protection types.
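
As a small illustration, the protection type could be read from the parsed metadata like this. The sketch assumes that `protection` is a top-level key of the JSON object, which the description above does not confirm, and the name `required_login` is hypothetical.

```python
# The three known protection types and what they require
PROTECTION_TYPES = {
    "NONE": "no login",
    "ETH": "NETHZ account",
    "PWD": "custom username and password",
}


def required_login(series_metadata):
    """Describe the login a lecture series needs, given its parsed metadata."""
    # Assumption: 'protection' sits at the top level of the JSON object
    protection = series_metadata["protection"]
    if protection not in PROTECTION_TYPES:
        raise ValueError("unknown protection type: " + str(protection))
    return PROTECTION_TYPES[protection]
```

Raising on an unknown value, rather than silently treating it as unprotected, makes a newly introduced protection type visible right away.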

If the series is protected then a cookie containing an authentication token needs to be sent when requesting the individual videos' metadata file at `https://video.ethz.ch/.episode-video.json?recordId=<ID>`

Getting a cookie with a valid token differs between videos that require a NETHZ login and videos that use custom credentials.

For NETHZ logins we need to send a POST request to `https://video.ethz.ch/j_security_check` with the following headers:

    Content-Type: application/x-www-form-urlencoded
    CSRF-Token: undefined
    User-Agent: Mozilla/5.0

as well as the following parameters:

    __charset__: utf-8
    j_validate: True
    j_username: <NETHZ username>
    j_password: <NETHZ password>
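
Put together, the pieces of the NETHZ login request could be assembled as follows. This is a sketch rather than the scraper's actual code: the function name `nethz_login_request` is hypothetical, and sending `j_validate` as the literal string `True` is an assumption based on the listing above.

```python
def nethz_login_request(username, password):
    """Build URL, headers and form data for the NETHZ login POST request."""
    url = "https://video.ethz.ch/j_security_check"
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "CSRF-Token": "undefined",
        "User-Agent": "Mozilla/5.0",
    }
    data = {
        "__charset__": "utf-8",
        "j_validate": "True",  # assumption: sent as the string "True"
        "j_username": username,
        "j_password": password,
    }
    return url, headers, data
```

With `requests`, the result can be sent as `requests.post(url, headers=headers, data=data)`; the cookies of the response are then available as `response.cookies`.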

For logins with custom credentials we have to perform a POST request to `<lecture link>.series-login.json`, e.g.:

    https://video.ethz.ch/lectures/d-infk/2020/spring/252-0220-00L.series-login.json

with the following headers:

    Referer: <lecture link>.html
    User-Agent: Mozilla/5.0

as well as the following parameters:

    __charset__: utf-8
    username: <custom username>
    password: <custom password>
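
The request for custom credentials can be sketched in the same way. The name `custom_login_request` is hypothetical, and accepting the lecture link with or without its `.html` ending is a convenience added for this example.

```python
def custom_login_request(lecture_url, username, password):
    """Build URL, headers and form data for a custom-credentials login."""
    suffix = ".html"
    # Accept the lecture link with or without the '.html' ending
    base = lecture_url[: -len(suffix)] if lecture_url.endswith(suffix) else lecture_url
    url = base + ".series-login.json"
    headers = {
        "Referer": base + ".html",
        "User-Agent": "Mozilla/5.0",
    }
    data = {
        "__charset__": "utf-8",
        "username": username,
        "password": password,
    }
    return url, headers, data
```

As before, the tuple can be passed on with `requests.post(url, headers=headers, data=data)`.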

In both cases we get back a cookie, which we can then include when requesting the individual video metadata files.

### Q: It doesn't work for my lecture. What can I do to fix it?

#### A: Follow these steps:
1. Make sure you have a connection to [video.ethz.ch](https://video.ethz.ch/). The scraper should let you know when there's no connection.
2. Try running it again. Sometimes random issues can throw it off.
3. If the lecture is password protected, make sure you use the correct credentials. Most protected lectures require your NETHZ credentials while some use a custom username and password.
4. Make sure you're running the newest version of the scraper by re-downloading the script from the repository. There might have been an update.
5. Check whether other lectures still work. Maybe the site was updated which broke the scraper.
6. Enable the debug flag with `-v` and see whether any of the additional information now provided is helpful.
7. Check "[How does it acquire the videos?](#how_it_works)" and see whether you can manually reach the video in your browser following the steps described there.
8. After having tried all that without success, feel free to open a new issue. Make sure to explain what you have tried and what the results were. There is no guarantee I will respond within a reasonable time as I'm a busy student myself. If you can fix the issue yourself, feel free to open a merge request with the fix.


### Q: Can you fix *X*? Can you implement feature *Y*?

#### A: Feel free to open an issue [here](https://gitlab.ethz.ch/tgeorg/vo-scraper/issues). Merge requests are always welcome but subject to my own moderation.
***

Loosely based on https://gitlab.ethz.ch/dominik/infk-vorlesungsscraper