04.11.2014 Views

electronic version of the publication - FIIT STU - Slovenská technická ...


User Modeling for Personalized Web-Based Systems

IE techniques could be applied to user-supplied documents, such as a CV, from which a system could deduce the user's age, education or previous employment [41], or a scientific paper the user submitted to a particular conference, from which a system could deduce the user's domain of interest and narrower focus within that domain. They could also be applied to documents found on the web, where classical IE techniques can serve as inspiration.

User Information Extraction

Web Appearance Disambiguation. Paper [7] proposes Web appearance disambiguation methods (the Link Structure Model, Agglomerative/Conglomerative Double Clustering, and their combination) and uses the social network of a user as background knowledge. In general, Web appearance disambiguation means inferring a model that ultimately provides a function f answering whether or not a Web page d refers to a particular person h, given a model M and background knowledge K.

The authors attempt to use as little background knowledge as possible and chose the user's social network as that knowledge. Therefore, instead of solving one problem, they solve N interrelated problems: for each person $h_i$ in a group of mutually related people $H = \{h_1, \ldots, h_N\}$, they find the Web pages that refer to $h_i$. The group of people was defined manually based on e-mail correspondence.

The basic idea of the Link Structure Model is that the Web pages of a group of acquaintances are likely to be interconnected, while the pages of their namesakes are not. However, a direct hyperlink from one relevant page to another may be rare, so looser notions of linkage are possible. Two pages can be considered linked if both contain a hyperlink to the same page, if both are hyperlinked from one page, or if one page can be reached from the other within three hyperlink hops. Other criteria can also be considered; for example, two pages are linked if both mention the same organization. The authors decided that, for their purposes, two Web pages are linked to each other if their hyperlinks share something in common.
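The linkage criterion can be illustrated with a minimal sketch. The text only says that the hyperlinks of two pages "share something in common"; the choice of a shared host name as that common element is an assumption made here for illustration, and the function names `link_hosts` and `are_linked` are hypothetical.

```python
from urllib.parse import urlparse


def link_hosts(urls):
    """Return the set of host names found in a page's outgoing hyperlinks."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # None when the URL has no network location
        if host:
            hosts.add(host)
    return hosts


def are_linked(links_a, links_b):
    """Assumed criterion: two pages are linked if their hyperlinks share a host."""
    return bool(link_hosts(links_a) & link_hosts(links_b))
```

A practical variant would probably ignore very popular hosts, since sharing a link to a large portal says little about whether two pages belong to the same social circle.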

Their set of Web pages D is constructed by providing a search engine with queries $t_{h_1}, \ldots, t_{h_N}$ (where $t_{h_i}$ is the name of person $h_i$ in the social network of user h) and retrieving the top K hits for each query, so that N × K Web pages are retrieved overall. Every page d is already associated with a personal name $t_{h_i}$; however, it is not yet known whether the page d refers to the actual person h, to his or her namesake, or to neither.
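The construction of D can be sketched as follows. The `search` argument is a hypothetical callable that returns a ranked list of page URLs for a name; no particular search engine API is implied. The sketch also records which queries retrieved each page, because the same page may be returned for more than one name.

```python
def build_page_set(names, search, k):
    """Collect the top-k hits per name.

    Returns a dict mapping each page URL to the set of names whose
    query retrieved it.
    """
    pages = {}
    for name in names:
        for url in search(name)[:k]:
            pages.setdefault(url, set()).add(name)
    return pages
```

With N names and K hits per query, at most N × K distinct pages are collected; the count is lower whenever two queries return the same page.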

Based on the set D, the model M is constructed. The authors define a Link Structure Graph over a set of Web pages D as $G_{LS} = (V, E)$, where the nodes of the graph are the Web pages (V ≡ D) and an edge exists between a pair of nodes $d_i$ and $d_j$ iff $d_i$ and $d_j$ are linked to each other. Then the Link Structure Model $M_{LS}$ is defined as a pair (C, δ), where C is the set of all connected components of the graph $G_{LS}$ (note that $C_0 \in C$, where $C_0$ is the central cluster, the largest connected component in $G_{LS}$ that consists of pages retrieved by more than one query) and δ is a distance threshold.
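A minimal sketch of the graph construction and of the selection of the central cluster is given below. The `linked` predicate and the `queries_of` mapping (page to the set of queries that retrieved it) are hypothetical inputs. The sketch reads "consists of pages retrieved by more than one query" as "the component's pages jointly come from more than one query", which is an assumption about the intended meaning.

```python
from itertools import combinations


def connected_components(pages, linked):
    """Build the link graph over pages and return its connected components."""
    adjacency = {page: set() for page in pages}
    for a, b in combinations(pages, 2):
        if linked(a, b):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in pages:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def central_cluster(components, queries_of):
    """Largest component whose pages were retrieved by more than one query."""
    candidates = [
        c for c in components
        if len(set().union(*(queries_of[page] for page in c))) > 1
    ]
    return max(candidates, key=len) if candidates else None
```

The pairwise loop is quadratic in the number of pages, which is acceptable for a sketch over N × K pages but would need an index over shared link features for larger collections.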

Finally, the discrimination function is defined:

\[
f(d, h \mid M, K) =
\begin{cases}
1 & \text{if } d \in C_i : \lVert C_i - C_0 \rVert < \delta,\ i = 0, \ldots, M \\
0 & \text{otherwise}
\end{cases}
\tag{8.3}
\]

The intuition behind this definition is that the pages of the central cluster and of a few clusters that are close to the central cluster are considered to be relevant, while others are irrelevant.
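The thresholding step can be sketched in a few lines. The text does not say how the distance between a cluster and the central cluster is measured, so `distance` is a hypothetical callable supplied by the caller; the function name `discriminate` is likewise an illustrative choice.

```python
def discriminate(page, components, central, distance, delta):
    """Return 1 if page lies in a component closer than delta to the central cluster."""
    for component in components:
        if page in component:
            return 1 if distance(component, central) < delta else 0
    return 0  # the page is not part of the model at all
```

Under any distance for which a cluster is at distance zero from itself, the pages of the central cluster are always accepted when δ is positive.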
