@@ -16,6 +16,20 @@ terminal.-->
In Python you can use [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). You can install it with `pip install beautifulsoup4`.

`import urllib.request as urllib
from bs4 import BeautifulSoup

# variables
WEBSITE = 'https://www.google.ch/'

# read source code
response = urllib.urlopen(WEBSITE)
page_source = response.read()

# parse source code and print it
soup = BeautifulSoup(page_source, 'html.parser')
print(soup.prettify())`

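Once the page source is available, a crawler usually needs the links on that page. The sketch below shows one possible way to collect them. It deliberately uses the standard library's `html.parser` module (the parser that the `'html.parser'` argument above selects) instead of Beautiful Soup, so it runs without extra packages. The names `LinkCollector`, `extract_links` and `SAMPLE_HTML`, as well as the `example.com` URLs, are assumptions made for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical sample page; example.com is a placeholder, not the lab site.
SAMPLE_HTML = (
    '<html><body>'
    '<a href="/a.html">A</a>'
    '<a href="https://example.org/b">B</a>'
    '<a name="x">no href</a>'
    '</body></html>'
)

links = extract_links(SAMPLE_HTML, 'https://example.com/start/')
print(links)
```

To use it on a real page, you could pass the decoded `page_source` from the snippet above together with `WEBSITE` as the base URL.
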
## Exercise 1 - Crawl Academy Awards for Best Actor/Actress
We’ve prepared a [website](http://disco-crawler-lab.tik.ee.ethz.ch/academyawardnominees/) that