
Automated Marketing Research Using Online Customer Reviews


Conceptually, we model this process as a constrained optimization problem. Abusing our previous notation slightly, assume a set of phrases $I$ composed from a set of words $J$, and a set of attribute dimensions $D$. We have $|J| \times |D|$ binary decision variables $X_{jd}$, where $X_{jd} = 1$ if word $j$ is assigned to dimension $d$. A further $|I| \times |J|$ binary indicators $Y_{ij}$ form a constraint matrix, where $Y_{ij} = 1$ if word $j$ appears in phrase $i$ and 0 otherwise. Our objective is then to:

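$$
\begin{aligned}
\max \quad & \sum_{j \in J} \sum_{d \in D} X_{jd} \\
\text{s.t.} \quad & \sum_{j \in J} Y_{ij} X_{jd} \le 1 \qquad \forall\, i \in I,\ d \in D \\
& X_{jd} \in \{0, 1\} \qquad \forall\, j \in J,\ d \in D
\end{aligned}
$$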
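The only input data to this program is the incidence matrix $Y$. As a minimal sketch, assuming phrases arrive as already-tokenized lists of words, $J$ and $Y$ can be built as follows. The function name `build_incidence` and the three example phrases are hypothetical illustrations, not code or data from the study.

```python
from typing import List, Tuple

def build_incidence(phrases: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return the word list J (sorted for a stable column order) and the
    |I| x |J| 0/1 matrix Y, where Y[i][j] == 1 iff word J[j] appears in phrase i."""
    words = sorted({w for phrase in phrases for w in phrase})
    index = {w: j for j, w in enumerate(words)}
    Y = [[0] * len(words) for _ in phrases]
    for i, phrase in enumerate(phrases):
        for w in phrase:
            Y[i][index[w]] = 1
    return words, Y

# Hypothetical phrases from one cluster of review text.
phrases = [["long", "battery", "life"], ["short", "battery", "life"], ["long", "zoom"]]
J, Y = build_incidence(phrases)
print(J)  # ['battery', 'life', 'long', 'short', 'zoom']
for row in Y:
    print(row)
# [1, 1, 1, 0, 0]
# [1, 1, 0, 1, 0]
# [0, 0, 1, 0, 1]
```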

The graph partitioning algorithm used to set the parameters $I$, $J$, and $D$ and the constrained logic program (CLP) by which we solve the optimization are implemented in Python and detailed next.
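The study's CLP code is not reproduced here. As a stand-in, the sketch below swaps in plain exhaustive search over the same program, which is only practical for toy instances because it enumerates all $2^{|J||D|}$ assignments. The function `solve_assignment`, its `one_dim_per_word` flag, and the example matrix are hypothetical. The flag adds a constraint $\sum_{d} X_{jd} \le 1$ that is an assumption, not part of the formulation above.

```python
from itertools import product
from typing import List, Optional, Tuple

Matrix = List[List[int]]

def solve_assignment(Y: Matrix, n_dims: int,
                     one_dim_per_word: bool = False) -> Tuple[int, Optional[Matrix]]:
    """Exhaustive search for
        max  sum_{j,d} X[j][d]
        s.t. sum_j Y[i][j] * X[j][d] <= 1   for every phrase i and dimension d
             X[j][d] in {0, 1}
    If one_dim_per_word is True, also require sum_d X[j][d] <= 1
    (a HYPOTHETICAL extra constraint, not part of the formulation in the text)."""
    n_phrases = len(Y)
    n_words = len(Y[0]) if Y else 0
    best_value, best_X = -1, None
    # Enumerate every 0/1 assignment; cost grows as 2 ** (n_words * n_dims).
    for bits in product((0, 1), repeat=n_words * n_dims):
        X = [list(bits[j * n_dims:(j + 1) * n_dims]) for j in range(n_words)]
        if one_dim_per_word and any(sum(row) > 1 for row in X):
            continue
        # Each phrase may contribute at most one word to each dimension.
        feasible = all(
            sum(Y[i][j] * X[j][d] for j in range(n_words)) <= 1
            for i in range(n_phrases)
            for d in range(n_dims)
        )
        value = sum(map(sum, X))
        if feasible and value > best_value:
            best_value, best_X = value, X
    return best_value, best_X

# Hypothetical toy data: columns are the words battery, life, long, short, zoom;
# rows are the phrases "long battery life", "short battery life", "long zoom".
Y = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1]]

print(solve_assignment(Y, n_dims=3)[0])                         # 6
print(solve_assignment(Y, n_dims=3, one_dim_per_word=True)[0])  # 5
print(solve_assignment(Y, n_dims=2, one_dim_per_word=True)[0])  # 4
```

Without the flag, the constraints decouple across dimensions, so every dimension independently receives a best word set (here two words each, for a total of 6). With the flag, each word lands in at most one dimension, which is one reading of "assignment"; with only two dimensions, the three-word phrase "long battery life" cannot be fully assigned, so the optimum drops to 4.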

[5] Graph representations<br />

To discover attributes, we assume that each customer review phrase corresponds to a distinct product attribute. To discover attribute dimensions, we assume that each word in the phrase corresponds to a distinct dimension. Discovering attribute dimensions then reduces to assigning particular words to attribute dimensions. But how do we know how many dimensions the assignment problem has? Is it possible that the assignment optimization has no feasible solution because of conflicting constraints arising from noise and the vagaries of human language? To address these questions, we generate a graph of all words in the cluster. Each word is a node, and arcs are defined by the co-occurrence of two words in the same phrase. We partition the graph into (possibly overlapping) sub-graphs by searching for maximal cliques. Intuitively, each sub-graph represents a maximal subset of words and phrases for which an optimal solution exists. The size of the maximal clique sets the number of attribute dimensions $|D|$. The sub-graph (words $J$ and phrases $I$) defines the optimization.
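A minimal sketch of this graph step, assuming tokenized phrases: it builds the word co-occurrence graph and enumerates its maximal cliques with the Bron–Kerbosch algorithm with pivoting. The choice of Bron–Kerbosch, the helper names `cooccurrence_graph` and `maximal_cliques`, and the example phrases are assumptions; the text does not say which clique-search method the study used.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]

def cooccurrence_graph(phrases: List[List[str]]) -> Graph:
    """One node per word; an arc joins two words that appear in the same phrase."""
    graph: Graph = {}
    for phrase in phrases:
        words = set(phrase)
        for w in words:
            graph.setdefault(w, set())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate all maximal cliques (Bron–Kerbosch with pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

# Hypothetical phrases from one cluster.
phrases = [["long", "battery", "life"], ["short", "battery", "life"], ["long", "zoom"]]
cliques = maximal_cliques(cooccurrence_graph(phrases))
for clique in sorted(sorted(c) for c in cliques):
    print(clique)
# ['battery', 'life', 'long']
# ['battery', 'life', 'short']
# ['long', 'zoom']

# One reading of "the size of the maximal clique": the largest clique found.
n_dims = max(len(c) for c in cliques)
print(n_dims)  # 3
```

If an external dependency is acceptable, `networkx.find_cliques` performs the same maximal-clique enumeration on a NetworkX graph.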

