@@ -39,19 +39,19 @@ Hints:
 ## Exercise 2 - Crawl Rotten Tomatoes
 
 To keep web traffic low and reduce the risk of being blacklisted, we
-have cloned some Rotten Tomatoes pages and are hosting them locally. You
+have cloned some Rotten Tomatoes pages. You
 can access the detail page through a unique URL. Combine the year and
 movie title like this: <http://disco-crawler-lab.tik.ee.ethz.ch/m/year/title> to access the
-local clone of the movie detail page. (Transform the movie title to
+clone of the movie detail page. (Transform the movie title to
 lower case. Remove any apostrophe characters (’) and replace spaces and
 slashes (/) with underline characters (\_)).
 
-1. Visit any of the local movie sites. Which element contains the
+1. Visit any of the movie sites. Which element contains the
 [tomatometer](https://en.wikipedia.org/wiki/Rotten_Tomatoes#Critic_aggregate_score)
 score of the movie? Which element contains the audience score?
 
 2. Access each of the cloned websites and extract the tomatometer and
-the audience score. Some movies are missing on our local server.
+the audience score. Some movies are missing on our server.
 Also, occasionally, you’ll see movies that don’t have a tomatometer
 score. Think about how you want to handle such a missing movie page
 or tomatometer score.
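
The title-to-URL rule and the "missing movie page" case from the exercise text can be sketched in Python as follows. This is a minimal illustration, not the reference solution: `movie_url` and `fetch_or_none` are hypothetical helper names, stripping the straight apostrophe (') in addition to the typographic one (’) is an assumption, and treating an HTTP 404 as "movie missing" is only one possible policy. Which element holds each score is left open, since that is what step 1 asks you to find out.

```python
import urllib.error
import urllib.request

# Host and path layout as given in the exercise text.
BASE_URL = "http://disco-crawler-lab.tik.ee.ethz.ch/m"


def movie_url(year, title):
    """Build the detail-page URL from a year and a movie title (hypothetical helper)."""
    slug = title.lower()
    # Remove apostrophes; handling both variants is an assumption.
    for apostrophe in ("'", "\u2019"):
        slug = slug.replace(apostrophe, "")
    # Replace spaces and slashes with underscores.
    for separator in (" ", "/"):
        slug = slug.replace(separator, "_")
    return f"{BASE_URL}/{year}/{slug}"


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the page HTML, or None if the server answers 404 (assumed policy)."""
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # movie page is missing on the server
        raise  # other errors are not silently swallowed
```

Returning `None` for a missing page keeps the crawl loop simple: the caller can record the movie with empty scores and continue. The same convention could be used when a page exists but has no tomatometer score.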