
Chapter 6. Skip unwanted images

6.1. Skipping Unwanted Images

Web Image Collector (WIC) can be configured through advanced settings to skip images with several different filters. As images are found on a web page, they are passed through these filter rules, which check each image's width, height, file size, URL, and file name. Configuring these filters helps WIC avoid downloading unwanted images to your computer. You can use one filter, several, or all of them together to get very specific search results.
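As a rough mental model only, the filter chain can be pictured as a list of checks that every image must pass. This is a minimal Python sketch, not WIC's code: the `ImageInfo` fields, the `keep_image` function, and the rule that an image is kept only when every enabled filter accepts it are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImageInfo:
    """Hypothetical description of an image found on a web page."""
    url: str
    width: int       # pixels
    height: int      # pixels
    size_bytes: int  # file size


# A filter is any function that returns True when the image should be kept.
Filter = Callable[[ImageInfo], bool]


def keep_image(image: ImageInfo, filters: List[Filter]) -> bool:
    """Assumed behaviour: keep the image only if every enabled filter accepts it."""
    return all(f(image) for f in filters)


if __name__ == "__main__":
    # Two illustrative filters; in WIC the real settings are made in the UI.
    not_tiny: Filter = lambda img: img.width >= 100 and img.height >= 100
    jpeg_only: Filter = lambda img: img.url.lower().endswith(".jpg")

    photo = ImageInfo("http://example.com/photo.jpg", 800, 600, 150_000)
    icon = ImageInfo("http://example.com/icon.jpg", 16, 16, 900)

    print(keep_image(photo, [not_tiny, jpeg_only]))  # True
    print(keep_image(icon, [not_tiny, jpeg_only]))   # False
```

With an empty filter list, `all()` returns True, so in this sketch every image is kept when no filters are configured.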

Some of these rules are set automatically when you create a new image collection. These defaults attempt to filter out common advertisement images and small thumbnail images. You can change them at any time by editing the image collection, either before you begin collecting or after the collection process has finished.

6.2. Configure The Image Dimension Filter

This is where you configure the filter that includes or excludes images based on their width and height in pixels (px). As images are collected, they are compared to the settings on this page to determine whether each one should be downloaded or skipped. For details on each setting on this page, see the Editing the Image Filter page.

6.2.1. How to skip small images

Many websites use thumbnails, which are smaller versions of the full-size image; clicking a thumbnail usually shows the full-size image. To prevent WIC from downloading these thumbnails, configure the minimum image size filter. A width of 100 and a height of 100 will usually skip most thumbnail images, although you may need to adjust these values depending on the websites you are searching.
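The minimum image size check can be sketched in a few lines of Python. The function name `passes_min_dimensions` is hypothetical, and two details are assumptions rather than documented WIC behaviour: an image must meet both the minimum width and the minimum height, and an image exactly at the limit is kept.

```python
def passes_min_dimensions(width: int, height: int,
                          min_width: int = 100, min_height: int = 100) -> bool:
    """Keep an image only if it is at least min_width x min_height pixels.

    The 100 x 100 defaults follow the suggestion in this section; the
    function itself is a hypothetical sketch, not WIC code.
    """
    return width >= min_width and height >= min_height


# A typical thumbnail versus a full-size image
print(passes_min_dimensions(80, 60))     # False: skipped as a thumbnail
print(passes_min_dimensions(1024, 768))  # True: kept
```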

6.2.2. How to skip advertisements

Many websites display image banner ads, which will be downloaded unless you configure this filter. To skip advertisements, set the filter mode to 'Exclude all listed items'. Next, click Add and enter the width and height of common advertisement sizes, such as 468 x 60.
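An exclude list of dimensions can be pictured as a set of (width, height) pairs that an image must not match. In this hypothetical sketch, 468 x 60 comes from the text above, while 728 x 90 and 300 x 250 are other widely used banner sizes added purely as examples; exact matching of both values is an assumption about how the list is applied.

```python
# Hypothetical exclude list of (width, height) pairs in pixels.
# 468 x 60 comes from this section; 728 x 90 and 300 x 250 are other
# common banner sizes added here as examples.
AD_SIZES = frozenset({(468, 60), (728, 90), (300, 250)})


def passes_exclude_list(width: int, height: int,
                        excluded: frozenset = AD_SIZES) -> bool:
    """'Exclude all listed items': skip images whose size exactly matches an entry."""
    return (width, height) not in excluded


print(passes_exclude_list(468, 60))   # False: banner ad skipped
print(passes_exclude_list(640, 480))  # True: kept
```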

6.2.3. How to download desktop backgrounds

There are a number of websites that provide free desktop background images. Most offer each image in a variety of common desktop sizes, but usually only one size matches your screen. For example, if your desktop resolution is 1920 x 1080, you can set this filter to download only images of that size. To start, set the filter mode to 'Include all listed items', then click Add and enter 1920 x 1080. Next, make sure the 'Minimum image size' and 'Maximum image size' fields are empty, because this example relies only on the image size list.

Now suppose you also have a laptop with a screen size of 1024 x 768. Simply click Add and enter 1024 x 768. You now have two image dimensions in the include list, and only images of these sizes will be downloaded.
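The include-list setup for desktop backgrounds can be sketched as follows. The function `passes_dimension_filter` is hypothetical; empty 'Minimum image size' and 'Maximum image size' fields are modelled as `None`, and the way the minimum/maximum checks combine with the include list is an assumption of this sketch.

```python
from typing import Optional, Set, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def passes_dimension_filter(width: int, height: int,
                            include: Set[Size],
                            min_size: Optional[Size] = None,
                            max_size: Optional[Size] = None) -> bool:
    """Hypothetical 'Include all listed items' check.

    Empty min/max fields are modelled as None, so only the include list applies.
    """
    if min_size is not None and (width < min_size[0] or height < min_size[1]):
        return False
    if max_size is not None and (width > max_size[0] or height > max_size[1]):
        return False
    return (width, height) in include


desktop_sizes = {(1920, 1080), (1024, 768)}  # desktop and laptop
print(passes_dimension_filter(1920, 1080, desktop_sizes))  # True
print(passes_dimension_filter(1024, 768, desktop_sizes))   # True
print(passes_dimension_filter(1280, 1024, desktop_sizes))  # False
```

In this sketch, setting `min_size=(1280, 800)` would reject the 1024 x 768 laptop size even though it is in the include list, which illustrates why the text recommends leaving those fields empty for this example.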

6.3. Configure The File Size Filter

This is where you configure the filter that excludes images smaller than the minimum size or larger than the maximum size. Both sizes are in kilobytes (KB).
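A minimal sketch of the file size filter, assuming 1 KB = 1024 bytes and treating an empty field as "no limit". The function name `passes_file_size` is hypothetical; WIC's own unit conventions may differ.

```python
from typing import Optional

BYTES_PER_KB = 1024  # assumption: 1 KB = 1024 bytes


def passes_file_size(size_bytes: int,
                     min_kb: Optional[float] = None,
                     max_kb: Optional[float] = None) -> bool:
    """Skip images smaller than min_kb or larger than max_kb.

    None means the field is empty and that bound is not checked.
    """
    size_kb = size_bytes / BYTES_PER_KB
    if min_kb is not None and size_kb < min_kb:
        return False
    if max_kb is not None and size_kb > max_kb:
        return False
    return True


print(passes_file_size(500, min_kb=1))                   # False: under 1 KB
print(passes_file_size(200_000, min_kb=1, max_kb=5000))  # True: about 195 KB
```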

6.3.1. How to skip small images

Many websites use thumbnails, which are smaller versions of the full-size image; clicking a thumbnail usually shows the full-size image. In addition to the image dimension filter, which skips images by width and height, you can use this filter to skip small images by file size. For example, setting a 'Minimum file size' of 1 KB avoids downloading icons and other tiny images.
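Combining the two filters means an image is skipped if it fails either check. This hypothetical sketch uses the 100 x 100 pixel minimum suggested for the image dimension filter and the 1 KB minimum from this section; the function name `is_small_image` and the 1024-bytes-per-KB convention are assumptions.

```python
BYTES_PER_KB = 1024  # assumption: 1 KB = 1024 bytes


def is_small_image(width: int, height: int, size_bytes: int,
                   min_width: int = 100, min_height: int = 100,
                   min_kb: float = 1) -> bool:
    """Return True if either filter would skip the image (hypothetical sketch).

    An image is skipped if it is below the minimum pixel size OR below the
    minimum file size, so each filter catches cases the other misses.
    """
    too_few_pixels = width < min_width or height < min_height
    too_few_bytes = size_bytes / BYTES_PER_KB < min_kb
    return too_few_pixels or too_few_bytes


print(is_small_image(16, 16, 700))        # True: tiny icon
print(is_small_image(200, 200, 600))      # True: large enough in pixels, but under 1 KB
print(is_small_image(800, 600, 150_000))  # False: normal photo is kept
```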

6.3.2. How to download only large images

Images with a large file size are usually high-quality images with a large width and height in pixels. One option is to configure the image dimension filter to skip images smaller than a specified width and height. In some cases this is not enough, so you can also set the 'Minimum file size' to a large value, for example 1000 KB. The file size filter can be used with or without the image dimension filter.
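The "with or without" choice can be sketched as an optional second check. Everything here is illustrative: `large_images` is a hypothetical function, the sample URLs and sizes are made up, and 1 KB is assumed to be 1024 bytes.

```python
from typing import List, Optional, Tuple

BYTES_PER_KB = 1024  # assumption: 1 KB = 1024 bytes

# (url, width, height, size_bytes): made-up sample data for illustration
Candidate = Tuple[str, int, int, int]


def large_images(candidates: List[Candidate], min_kb: float = 1000,
                 min_dims: Optional[Tuple[int, int]] = None) -> List[str]:
    """Keep URLs of images at least min_kb in size, optionally also checking pixels."""
    kept = []
    for url, width, height, size_bytes in candidates:
        if size_bytes / BYTES_PER_KB < min_kb:
            continue  # file too small
        if min_dims is not None and (width < min_dims[0] or height < min_dims[1]):
            continue  # dimension filter enabled and image too small
        kept.append(url)
    return kept


sample = [
    ("http://example.com/big.jpg", 4000, 3000, 2_500_000),
    ("http://example.com/medium.jpg", 1600, 1200, 400_000),
    ("http://example.com/wide.png", 6000, 400, 1_200_000),
]
print(large_images(sample))                         # big.jpg and wide.png
print(large_images(sample, min_dims=(1920, 1080)))  # big.jpg only
```

The 6000 x 400 image passes the file size check alone but fails once a dimension minimum is added, showing how the two filters complement each other.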

6.4. Configure The File Filter

This is where you configure the filter that includes or excludes images based on their file type, such as .JPG, .PNG, and others. The case of the entries is ignored, so entering .JPG or .jpg gives the same result.

6.4.1. How to download only PNG files

To download only PNG files and skip JPEGs, bitmaps, and other formats, set the filter mode to 'Include all listed items'. Next, click Add and type .png; only PNG files will be downloaded.
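A case-insensitive file type filter can be sketched with Python's standard library. The function `file_type_allowed` is hypothetical, and reading the type from the extension at the end of the URL path (ignoring any query string) is an assumption about how file types are determined.

```python
import posixpath
from typing import Iterable
from urllib.parse import urlparse


def file_type_allowed(url: str, extensions: Iterable[str],
                      mode: str = "include") -> bool:
    """Hypothetical file type filter.

    Extensions are compared case-insensitively, so '.PNG' and '.png' behave
    the same. mode is 'include' (only listed types) or 'exclude'.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    path = urlparse(url).path  # drop any ?query or #fragment
    ext = posixpath.splitext(path)[1].lower()
    listed = {e.lower() for e in extensions}
    return ext in listed if mode == "include" else ext not in listed


print(file_type_allowed("http://example.com/a/photo.PNG", [".png"]))  # True
print(file_type_allowed("http://example.com/a/photo.jpg", [".png"]))  # False
```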

6.5. Configure The Image URL Filter

This is where you configure the filter that includes or excludes images by matching keywords or regular expressions (regex) against image URLs. Every image has a URL, which is the location on the web server where the file is stored. In many cases the URL contains keywords you may want to skip, such as 'thumbnails'. In other cases the URL may contain a path such as 'ferrari', and you may want to focus only on images of Ferrari cars. This filter matches against the image URL only; it does not search words or phrases that appear in the text of the web page itself.
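Keyword and regex matching against an image URL can be sketched as below. The function `url_matches` is hypothetical, and its rules are assumptions: keywords are case-insensitive substrings, and regexes are searched anywhere in the URL (Python's `re.search`, case-insensitive) rather than anchored to the whole URL. WIC's own regex dialect may differ.

```python
import re
from typing import Iterable


def url_matches(url: str, keywords: Iterable[str] = (),
                patterns: Iterable[str] = ()) -> bool:
    """True if the URL contains any plain keyword or matches any regex.

    Assumptions for this sketch: keywords are case-insensitive substrings,
    and regexes are searched anywhere in the URL, also case-insensitively.
    """
    lowered = url.lower()
    if any(k.lower() in lowered for k in keywords):
        return True
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


thumb = "http://example.com/gallery/thumbnails/car1.jpg"
car = "http://example.com/gallery/ferrari/f40.jpg"

print(url_matches(thumb, keywords=["thumbnails"]))       # True: would be excluded
print(url_matches(car, patterns=[r"ferrari/.+\.jpg$"]))  # True: would be included
```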

6.5.1. How to download images in a specific URL directory

To download images that match a specific URL pattern, use the URL include list. For example, to download images that are in the 'bulldogs' folder on the web server, add bulldogs to the URL include list. In this example, only images with URLs like the following will be downloaded:

    http://www.dogimages.com/bulldogs/1.jpg
    http://www.catsrule.com/enemy-bulldogs/aaa01.jpg
    http://www.bulldogsofmars.org/meeting1/twelvedays.png
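The three example URLs show that the keyword can match anywhere in the URL, including inside a longer folder name or the domain itself. This sketch reproduces that with a plain substring test, then shows a stricter regex that requires a folder named exactly 'bulldogs'. The fourth URL is made up, and treating WIC's regex matching like Python's `re.search` is an assumption.

```python
import re

urls = [
    "http://www.dogimages.com/bulldogs/1.jpg",
    "http://www.catsrule.com/enemy-bulldogs/aaa01.jpg",
    "http://www.bulldogsofmars.org/meeting1/twelvedays.png",
    "http://www.dogimages.com/poodles/2.jpg",  # made-up non-matching URL
]

# Plain keyword: a substring match anywhere in the URL (assumed behaviour).
keyword_hits = [u for u in urls if "bulldogs" in u]

# Hypothetical stricter alternative: a regex that requires a path segment
# named exactly "bulldogs".
folder_hits = [u for u in urls if re.search(r"/bulldogs/", u)]

print(len(keyword_hits))  # 3
print(folder_hits)        # only the dogimages.com/bulldogs/ URL
```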