03.01.2015 Views

Combining Information from Multiple Internet Sources


Table search_engine:

This table contains the definitions of search engines. It holds all the data needed to construct specific queries, which in turn can be used to extract answers from the search engines.

Columns:

• id – Primary key
• name – Name of the search engine
• query_parameter – Base URL of the search engine, used when constructing queries
• result_count_parameter – Name of the HTTP request parameter that controls the number of results displayed per page
• cookie_based_parameter – Some search engines store user preferences in cookies rather than, for instance, letting the user supply HTTP request parameters to manipulate the results
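
As an illustration of how a search_engine row might be turned into a query, here is a minimal sketch. It rests on several assumptions that the listing does not confirm: that the base URL in query_parameter already ends with the query key (the hypothetical `https://search.example/find?q=`), that cookie_based_parameter can be treated as a boolean flag, and that the per-page parameter is simply appended to the URL. The `SearchEngine` class and `build_query_url` function are hypothetical names.

```python
from dataclasses import dataclass
from urllib.parse import quote_plus, urlencode


@dataclass
class SearchEngine:
    """One row of the search_engine table (field names follow the listing)."""
    id: int
    name: str
    query_parameter: str          # base URL used when constructing queries
    result_count_parameter: str   # name of the results-per-page request parameter
    cookie_based_parameter: bool  # ASSUMPTION: treated here as a simple flag


def build_query_url(engine: SearchEngine, terms: str, results_per_page: int) -> str:
    """Build a query URL for `terms` (hypothetical helper, not the original code)."""
    # ASSUMPTION: the base URL already ends with the query key, e.g. "...?q=".
    url = engine.query_parameter + quote_plus(terms)
    # Only append the per-page parameter when the engine accepts it in the URL;
    # cookie-based engines would need the preference set in a cookie instead.
    if engine.result_count_parameter and not engine.cookie_based_parameter:
        url += "&" + urlencode({engine.result_count_parameter: results_per_page})
    return url


example = SearchEngine(1, "Example", "https://search.example/find?q=", "num", False)
print(build_query_url(example, "data fusion", 50))
```

For an engine flagged as cookie-based, this sketch leaves the URL untouched and leaves the cookie handling to the HTTP client.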

Table search_engine_ignore:

This table contains a list of (Link Name, HREF) pairs that should be ignored when parsing a search engine's results page for a given query. When parsing the HTML document, the system should disregard all buttons, URLs, etc. that are not connected to the query (for instance, a Google page contains hyperlinks to Google Maps, Google News, etc., which should not be considered results).

Columns:

• id – Primary key
• href – URL to be ignored (if empty, the system ignores all pairs with the given link name)
• link_name – Name of a link to be ignored (if empty, the system ignores all pairs with the given HREF)
• search_engine_id – (Foreign key) Specifies which search engine should use the given pair

Listing 2.2.2 Descriptions of the search_engine and search_engine_ignore tables
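
The matching rule described for search_engine_ignore (an empty href or link_name acts as a wildcard) could be sketched as follows. This is only one plausible reading: exact string comparison, the decision to skip rules where both fields are empty, and the names `IgnoreRule`, `is_ignored` and `filter_results` are assumptions, and the example URLs are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IgnoreRule:
    """One row of the search_engine_ignore table (field names follow the listing)."""
    id: int
    href: str
    link_name: str
    search_engine_id: int


def is_ignored(link_name: str, href: str,
               rules: List[IgnoreRule], search_engine_id: int) -> bool:
    """Return True if the (link_name, href) pair matches an ignore rule."""
    for rule in rules:
        if rule.search_engine_id != search_engine_id:
            continue  # rule belongs to a different search engine
        if not rule.href and not rule.link_name:
            continue  # ASSUMPTION: a fully empty rule is skipped, not a catch-all
        # An empty field acts as a wildcard; otherwise require an exact match.
        name_matches = rule.link_name == "" or rule.link_name == link_name
        href_matches = rule.href == "" or rule.href == href
        if name_matches and href_matches:
            return True
    return False


def filter_results(links: List[Tuple[str, str]],
                   rules: List[IgnoreRule],
                   search_engine_id: int) -> List[Tuple[str, str]]:
    """Keep only the (link_name, href) pairs that no rule tells us to ignore."""
    return [(name, href) for name, href in links
            if not is_ignored(name, href, rules, search_engine_id)]


rules = [
    IgnoreRule(1, "", "Maps", 1),                   # ignore any link named "Maps"
    IgnoreRule(2, "https://news.example/", "", 1),  # ignore this URL under any name
]
links = [
    ("Maps", "https://maps.example/"),
    ("News", "https://news.example/"),
    ("Result A", "https://a.example/"),
]
print(filter_results(links, rules, 1))
```

A real parser would probably need looser matching (for example URL prefixes), but the listing does not say how the comparison is done.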
