Automatic Gathering of Newspaper Articles on Internet Abuse from ...
General Approach
● crawler: obtain information from the web
● convert: convert documents to text format
● strip / cut: extract the “kernel”
● compare: throw away multiple copies
● throw away “index” files
● analyse: look for search words
● categorise: define document relevance with respect to categories
● assign keywords: to facilitate document search
● store
● query
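
The middle steps of the approach above (strip, compare, analyse, categorise, assign keywords, store, query) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the search words, the category names, the whitespace-based kernel extraction, the hash-based duplicate check, and all function names are assumptions made for the example. Crawling, format conversion, and the removal of “index” files are left out.

```python
import hashlib
import re

# Hypothetical categories and search words; the slides do not list them.
CATEGORIES = {
    "fraud": ["phishing", "scam"],
    "malware": ["virus", "worm"],
}


def strip_kernel(text):
    """Collapse whitespace; a stand-in for real kernel extraction."""
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text):
    """Hash the lowercased kernel so exact copies can be detected."""
    return hashlib.sha256(text.lower().encode("utf-8")).hexdigest()


def analyse(text):
    """Return, per category, the search words that occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = {}
    for category, terms in CATEGORIES.items():
        found = sorted(term for term in terms if term in words)
        if found:
            matches[category] = found
    return matches


def process(documents):
    """Run strip, compare, analyse, categorise, and store over raw texts."""
    seen = set()
    store = []
    for raw in documents:
        kernel = strip_kernel(raw)
        key = fingerprint(kernel)
        if key in seen:
            continue  # compare: throw away multiple copies
        seen.add(key)
        matches = analyse(kernel)
        if not matches:
            continue  # no search words found, so not relevant
        keywords = sorted(w for found in matches.values() for w in found)
        store.append({
            "text": kernel,
            # categorise: relevance as the number of distinct hits
            "categories": {c: len(found) for c, found in matches.items()},
            "keywords": keywords,
        })
    return store


def query(store, category):
    """Return the stored texts assigned to the given category."""
    return [doc["text"] for doc in store if category in doc["categories"]]
```

A real system would need a more tolerant duplicate check than an exact hash, since copies of the same newspaper article usually differ in surrounding page furniture; the hash is used here only to keep the sketch short.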