
Chapter 7. Skip unwanted web sites

7.1. Skipping Unwanted Web Sites

Using its advanced settings, Web Image Collector (WIC) can be configured to skip web pages according to several different rules. These rules check the URL of each web page and determine whether matching pages are processed or skipped.

7.2. Configure The URL Filter

This is where you configure the filter that includes or excludes web pages based on keywords or regular expressions (regex). A keyword can be any text, such as the word 'puppies'. Every time a URL is found on a web page, it is compared to the keywords in the filter list. If any keyword is found in the URL, the URL is skipped or included, depending on the filter setting. The filter works on the URL of the web page only; it does not examine words or phrases found in the web page itself. For details on the settings of this page, please see the page on editing the page filter.
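The comparison described above can be sketched in Python. This is an illustration only, not Web Image Collector's actual code: the function names `url_matches_filter` and `should_process`, the `mode` and `use_regex` parameters, the case-insensitive comparison, and the use of `re.search` for regex entries are all assumptions made for the example.

```python
import re

def url_matches_filter(url, patterns, use_regex=False):
    """Return True if any filter entry is found in the URL (hypothetical helper)."""
    for pattern in patterns:
        if use_regex:
            # Assumption: a regex entry may match anywhere in the URL.
            if re.search(pattern, url, re.IGNORECASE):
                return True
        elif pattern.lower() in url.lower():
            # Assumption: keywords are compared without regard to case.
            return True
    return False

def should_process(url, patterns, mode="exclude", use_regex=False):
    """Decide whether a URL is processed.

    mode="include": only URLs that match the filter are processed.
    mode="exclude": URLs that match the filter are skipped.
    """
    matched = url_matches_filter(url, patterns, use_regex)
    return matched if mode == "include" else not matched
```

Under these assumptions, `should_process("http://www.test.com/puppies/index.html", ["puppies"])` returns `False`, because the default mode excludes any URL that contains a listed keyword. The text of the page itself is never consulted.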

7.2.1. How to find images on specific web pages

To have WIC look at only specific web pages, set the URL include filter to match web links that contain specific words.

For example, assume that you have

	A URL Include List of: ferrari;corvette

The following URLs would be included:

	http://www.test.com/ferrari/allphotos.html
	http://www.test.com/corvette/allphotos.html

The following URLs would be excluded:

	http://www.test.com/toyota/allphotos.html

Filtering web page URLs is the first step in locating images. If you exclude specific web pages, you also exclude the images found on those pages. This can greatly reduce the number of junk images saved to your computer.
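The ferrari;corvette example can be worked through in a short Python sketch. It is not WIC's implementation: the helpers `parse_include_list` and `is_included` are hypothetical, the case-insensitive substring match is an assumption, and the image file names are invented purely to show how skipping a page also drops its images.

```python
def parse_include_list(text):
    """Split a semicolon-separated include list into keywords."""
    return [word.strip() for word in text.split(";") if word.strip()]

def is_included(url, keywords):
    """True if any keyword appears anywhere in the URL (case-insensitive by assumption)."""
    url_lower = url.lower()
    return any(word.lower() in url_lower for word in keywords)

keywords = parse_include_list("ferrari;corvette")

# Hypothetical crawl results: page URL -> image file names found on that page.
pages = {
    "http://www.test.com/ferrari/allphotos.html": ["f1.jpg", "f2.jpg"],
    "http://www.test.com/corvette/allphotos.html": ["c1.jpg"],
    "http://www.test.com/toyota/allphotos.html": ["t1.jpg"],
}

# Only pages whose URL passes the include filter are visited,
# so images on the skipped toyota page are never collected.
kept_pages = [url for url in pages if is_included(url, keywords)]
kept_images = [img for url in kept_pages for img in pages[url]]

for url in pages:
    print("included" if url in kept_pages else "excluded", url)
```

In this sketch the ferrari and corvette pages are kept and the toyota page is skipped, so `kept_images` holds only the images from the first two pages.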