... | ... | @@ -16,7 +16,9 @@ terminal.--> |
In Python you can use [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). You can install it with `pip install beautifulsoup4`.
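
Once installed, the library can be tried without any network access. The following is a minimal sketch, assuming a made-up HTML snippet (the markup, the `winner` class, and the placeholder names are hypothetical and only serve as illustration). It parses the string and pulls out a heading and selected list items:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page
html = """
<html><body>
  <h1>Award winners</h1>
  <ul>
    <li class="winner">Example Actor A</li>
    <li class="winner">Example Actor B</li>
    <li>Not a winner</li>
  </ul>
</body></html>
"""

# Build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, find_all() returns every match
heading = soup.find("h1").get_text()
winners = [li.get_text(strip=True) for li in soup.find_all("li", class_="winner")]

print(heading)
print(winners)
```
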
Here is a short example of how to parse a website:

```python
import urllib.request as urllib
from bs4 import BeautifulSoup

# variables
... | ... | @@ -28,7 +30,7 @@ page_source = response.read() |
# parse source code and print it
soup = BeautifulSoup(page_source, 'html.parser')
print(soup.prettify())
```

## Exercise 1 - Crawl Academy Awards for Best Actor/Actress