@@ -5,12 +5,7 @@ programming language you want to use for this lab. However, we suggest
 you use Java (or Python).
 
 Before you start, you may want to read up on the [basics of
-HTML](http://www.w3schools.com/html/html_basic.asp). Additionally, a
-useful resource that deals with crawling structured content from a
-website can be found
-[here](http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html).
-This specific guide was written for Python, but similar tools exist for
-other programming languages as well. In Java you can use jsoup to fetch
+HTML](http://www.w3schools.com/html/html_basic.asp). In Java you can use jsoup to fetch
 and analyze the web pages. The [jsoup
 documentation](https://jsoup.org/cookbook/extracting-data/dom-navigation)
 explains how you can navigate a document.