Webmirror 2.0, Retrieval Domain
Web mirroring programs usually mirror a single site, or a certain subdirectory
of a single site. WEBMIRROR PRO v2.0, on the other hand, mirrors a whole domain of
web pages.
In the RDF file, the user defines several include and exclude
commands that decide which URLs are downloaded and which are not.
A page is downloaded if its URL matches any of the include commands and
does not match any of the exclude commands. All these pages form the retrieval domain.
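The include/exclude rule can be illustrated with a short Python sketch. This is not the program's own code, and the pattern syntax of the RDF file is not described here: the sketch assumes shell-style wildcard patterns (via Python's fnmatch module), and the function name, the example URLs, and the example patterns are all hypothetical.

```python
from fnmatch import fnmatchcase


def url_passes_filters(url, include_patterns, exclude_patterns):
    """Return True if url matches at least one include pattern
    and none of the exclude patterns.

    Assumption: patterns are shell-style wildcards, where '*' matches
    any run of characters (including '/').
    """
    included = any(fnmatchcase(url, p) for p in include_patterns)
    excluded = any(fnmatchcase(url, p) for p in exclude_patterns)
    return included and not excluded


# Hypothetical patterns, standing in for include/exclude commands.
include = ["http://www.example.com/*"]
exclude = ["*.zip", "http://www.example.com/private/*"]

docs_ok = url_passes_filters("http://www.example.com/docs/index.html", include, exclude)
private_ok = url_passes_filters("http://www.example.com/private/a.html", include, exclude)
other_ok = url_passes_filters("http://www.other.org/index.html", include, exclude)
```

Here the first URL passes, the second is rejected by an exclude pattern, and the third is rejected because no include pattern matches it.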
To be precise, a page is in the retrieval domain if all of the following hold:
- Its URL matches any of the patterns defined in the include commands.
- Its URL does not match any of the patterns defined in the exclude commands.
- The page can be reached from at least one of the start pages by following standard
HTML links in fewer than n steps, going through pages that are all in the
retrieval domain. n is defined in the command level.
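The three conditions above can be sketched as a breadth-first traversal in Python. This is only an illustration under stated assumptions, not the program's actual algorithm: the link structure is given as a plain dictionary instead of being parsed from HTML, the include/exclude test is passed in as a function, a start page counts as reached in 0 steps, and all names (retrieval_domain, links, passes_filters) are hypothetical.

```python
from collections import deque


def retrieval_domain(start_pages, links, passes_filters, n):
    """Return the set of URLs in the retrieval domain.

    start_pages    -- iterable of start URLs
    links          -- dict mapping a URL to the list of URLs it links to
                      (stand-in for following HTML links)
    passes_filters -- function(url) -> bool, the include/exclude test
    n              -- step limit; a page must be reachable in fewer than n steps
    """
    if n <= 0:
        return set()

    domain = set()
    queue = deque()
    for url in start_pages:
        # Assumption: a start page is reached in 0 steps.
        if url not in domain and passes_filters(url):
            domain.add(url)
            queue.append((url, 0))

    while queue:
        url, steps = queue.popleft()
        next_steps = steps + 1
        if next_steps >= n:
            # Pages linked from here would need n or more steps.
            continue
        for target in links.get(url, []):
            # Only pages that pass the filters are added, so every path
            # followed runs entirely through the retrieval domain.
            if target in domain or not passes_filters(target):
                continue
            domain.add(target)
            queue.append((target, next_steps))
    return domain


# Hypothetical link graph: "x" is excluded, so "d" (reachable only via "x")
# is left out, and "e" is left out because it needs 3 steps.
example_links = {"a": ["b", "x"], "b": ["c"], "x": ["d"], "c": ["e"]}
result = retrieval_domain(["a"], example_links, lambda u: u != "x", 3)
```

Breadth-first order is used so that each page is first seen at its smallest step count, which makes the "fewer than n steps" check reliable.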