LC: 1236. Web Crawler
https://leetcode.com/problems/web-crawler/
1236. Web Crawler
Given a url startUrl and an interface HtmlParser, implement a web crawler to crawl all links that are under the same hostname as startUrl.
Return all urls obtained by your web crawler in any order.
Your crawler should:
Start from the page: startUrl
Call HtmlParser.getUrls(url) to get all urls from a webpage of given url.
Do not crawl the same link twice.
Explore only the links that are under the same hostname as startUrl.

As shown in the example url above, the hostname is example.org. For simplicity's sake, you may assume all urls use the http protocol without any port specified. For example, the urls http://leetcode.com/problems and http://leetcode.com/contest are under the same hostname, while the urls http://example.org/test and http://example.com/abc are not.
The HtmlParser interface is defined as such:
Below are two examples explaining the functionality of the problem. For custom testing purposes, you'll have three variables: urls, edges and startUrl. Notice that you will only have access to startUrl in your code, while urls and edges are not directly accessible to you in code.
Example 1:

Example 2:

Constraints:
1 <= urls.length <= 1000
1 <= urls[i].length <= 300
startUrl is one of the urls.
Hostname label must be from 1 to 63 characters long, including the dots, may contain only the ASCII letters from 'a' to 'z', digits from '0' to '9' and the hyphen-minus character ('-').
The hostname may not start or end with the hyphen-minus character ('-').
You may assume there're no duplicates in url library.
The Essence:
The web links here are like the nodes of a graph. We need to find the neighbors of a given node and continue traversing the graph through them, provided they have not already been visited.
Details:
For the traversal, either breadth-first search or depth-first search can be used. The string functions of the programming language can be used to compare hostnames.
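The traversal described above can be sketched in Python as a breadth-first search. This is a minimal sketch, not the judge's reference solution: the names crawl, hostname and FakeHtmlParser are assumptions made for illustration, and FakeHtmlParser is only a hypothetical stand-in for the HtmlParser interface so the code can run locally. The hostname helper relies on the problem's guarantee that every url starts with http:// and has no port.

```python
from collections import deque
from typing import Dict, List


class FakeHtmlParser:
    """Hypothetical stand-in for the judge's HtmlParser, backed by a dict."""

    def __init__(self, links: Dict[str, List[str]]) -> None:
        self._links = links

    def getUrls(self, url: str) -> List[str]:
        # Return the outgoing links of `url`, or no links if it is unknown.
        return self._links.get(url, [])


def hostname(url: str) -> str:
    # "http://example.org/a/b" -> "example.org".
    # Assumes the "http://" prefix and no port, as the problem states.
    return url[len("http://"):].split("/", 1)[0]


def crawl(start_url: str, html_parser) -> List[str]:
    host = hostname(start_url)
    visited = {start_url}          # urls already queued, so none is crawled twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for nxt in html_parser.getUrls(url):
            # Follow only unvisited links under the same hostname.
            if nxt not in visited and hostname(nxt) == host:
                visited.add(nxt)
                queue.append(nxt)
    return list(visited)
```

Replacing queue.popleft() with queue.pop() turns the same loop into an iterative depth-first search; the set of returned urls is the same either way, only the visiting order differs.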
Solution(s):
Default Code: