Automated Marketing Research Using Online Customer Reviews


More formally, we assume that phrases and words are preprocessed and normalized into words as before. A graph G = (V, E) is a pair consisting of a set of vertices V and a set of edges E. An edge in E is a connection between two vertices and may be represented as a pair (vi, vj) with vi, vj ∈ V. Each phrase (word) represents a vertex v in the graph; edges are defined by phrase pairs within a review (word pairs within a phrase). An N-partite graph is a connected graph whose vertices are partitioned into N sets V1 … VN such that no edge joins two vertices in the same set Vi. A clique of size N plays the role of a relational schema and can be extended to an N-partite graph by substituting each vertex vi of the clique with a set of vertices Vi. A database table with disjoint columns thus represents an N-partite graph where the size of the clique defines the number of columns and each word in the clique "names" a column. A maximal-complete-N-partite graph is a complete-N-partite graph not contained in any other such graph; in other words, the initial clique is maximal. The corresponding database table of phrases represents the existing product attribute space, and the maximal-complete-N-partite graph includes possibly novel combinations of previously unpaired attributes and/or attribute properties.

To relate the graph back to customer reviews, we say that a product attribute is constructed from k dimensions. Each dimension names a domain (D). Each domain D is defined by a finite set of words that includes the value NULL for review phrases where customers fail to mention one or more attribute dimension(s). The Cartesian product of domains D1 … Dk is the set of all k-tuples {(t1, …, tk) | ti ∈ Di}. Each phrase is one such k-tuple, and the set of all phrases in the cluster defines a finite subset of the Cartesian product. A relational schema is a mapping of attribute properties A1 … Ak to domains D1 … Dk.
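As an illustration of the relational view, the following minimal sketch enumerates the Cartesian product of three small domains and separates the observed phrases from the combinations that no review has mentioned. The domain names, the words, and the helper cartesian_space are hypothetical and chosen only for the example; they do not come from the paper's data.

```python
from itertools import product

NULL = "NULL"  # placeholder for a dimension the customer did not mention

# Hypothetical domains D1..D3 for one product attribute (illustrative only).
domains = {
    "quality": ["long", "short", NULL],
    "component": ["battery", "screen", NULL],
    "property": ["life", "size", NULL],
}

def cartesian_space(domains):
    """Return the set of all k-tuples (t1, ..., tk) with ti drawn from Di."""
    columns = [domains[name] for name in domains]
    return set(product(*columns))

# Observed phrases, each already aligned to the k = 3 dimensions.
observed = {
    ("long", "battery", "life"),
    ("short", "screen", NULL),
}

space = cartesian_space(domains)

# Tuples in the product that no review phrase has used so far.
novel = space - observed

print(len(space), len(observed), len(novel))
```

The set novel corresponds to the "possibly novel combinations" mentioned above; in practice most of these tuples would be meaningless, so this only shows where such candidates come from, not how to rank them.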
Note the strong, implicit assumption that a maximal clique, taken over a word graph, is a proxy for the proper number of attribute dimensions. Under this assumption, it is easy to see how searching for cliques within the graph results in a table.
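To make the clique search concrete, here is a minimal sketch that builds a word graph from phrases (an edge for every word pair within a phrase) and picks the largest maximal clique as the candidate schema. It uses the basic Bron-Kerbosch enumeration, which is an assumption made for illustration; the text does not say which clique algorithm is used. The function names and the sample phrases are hypothetical.

```python
from itertools import combinations

def build_word_graph(phrases):
    """Adjacency sets: two words are linked if they co-occur in a phrase."""
    adj = {}
    for phrase in phrases:
        words = set(phrase)
        for w in words:
            adj.setdefault(w, set())
        for a, b in combinations(words, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def largest_clique(adj):
    """Return one maximal clique of greatest size (the candidate schema)."""
    return max(maximal_cliques(adj), key=len, default=set())

# Illustrative phrases from one attribute cluster.
phrases = [
    ["long", "battery", "life"],
    ["battery", "life"],
    ["short", "battery"],
    ["great", "screen"],
]

adj = build_word_graph(phrases)
schema = largest_clique(adj)
print(sorted(schema), len(schema))
```

In this toy cluster the clique has three words, so the resulting table would have three columns. Enumerating all maximal cliques is exponential in the worst case, which is acceptable for small phrase clusters but would need care on large graphs.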
[6] Constrained Logic Programming

To align words into their corresponding attribute dimensions, we frame the task as a mathematical assignment problem and resolve the problem using a bounds consistency approach. We define the assignment using the maximal clique that corresponds to the schema for each product attribute table (see Figure WA1.1). In the bounds consistency approach, we invert the constraints (tok_exclusion) to express the complementary set of candidate assignments (tok_candidates) for each attribute dimension. If the phrase constraints, taken together, are internally consistent, then the candidate assignments (tok_assign) for a given token are simply the intersection of all candidate assignments as defined by all phrases in the cluster containing that token.

We transform the mutual exclusivity constraint represented by each phrase into a set of candidate assignments using the algorithm in Figure WA1.2. Note that we need only propagate the mutual exclusivity of words that are previously unassigned. Accordingly, for each unassigned token in a given phrase, the set of candidate assignments is the intersection of the possible assignments based upon the current phrase and all candidate assignments from earlier phrases containing the same token. We maintain a list of active tokens, boundary_list, to avoid rescanning the set of all tokens every time the possible assignments for a given token are updated.

Finally, the K-means clustering used to separate review phrases into distinct product attributes is a noisy process. The clustering can easily result in the inclusion of spurious phrases.
Both the initial

process_phrases(p_list)
    schema = find_maximal_clique(p_list)
    order phrases by length
    for each phrase p:
        # initialize data structures
        tok_exclusion: for each tok, mutually exclusive tokens
        tok_candidates: for each tok, valid candidate assignments
        tok_assign: for each tok, the dimension assignment
        # propagate the constraints for each successive phrase
        tok_candidates, tok_exclusion, tok_assign =
            propagate_bounds(phrase, tok_candidates,
                             tok_exclusion, tok_assign, schema)

Figure WA1.1 Logical Assignment
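The assignment loop of Figure WA1.1 can be illustrated with a simplified, runnable sketch. It is not the algorithm of Figure WA1.2, which is not reproduced here: it keeps only the core idea that words in one phrase must occupy different dimensions and that a token's candidates are the intersection of what each phrase allows. The function assign_dimensions, the longest-first ordering, and the sample phrases are assumptions made for the example, and the boundary_list bookkeeping is omitted.

```python
def assign_dimensions(phrases, schema):
    """Assign tokens to dimensions by intersecting per-phrase candidates.

    schema: list of clique words; word i is taken to name dimension i.
    Returns (tok_assign, tok_candidates).
    """
    dims = set(range(len(schema)))
    tok_assign = {word: i for i, word in enumerate(schema)}
    tok_candidates = {}

    # Assumption: process longer phrases first, since they constrain more.
    for phrase in sorted(phrases, key=len, reverse=True):
        # Dimensions already occupied by assigned tokens of this phrase.
        taken = {tok_assign[t] for t in phrase if t in tok_assign}
        for tok in phrase:
            if tok in tok_assign:
                continue  # only unassigned tokens need propagation
            allowed = dims - taken  # mutual exclusivity within the phrase
            previous = tok_candidates.get(tok, dims)
            tok_candidates[tok] = previous & allowed
            if len(tok_candidates[tok]) == 1:
                (dim,) = tok_candidates[tok]
                tok_assign[tok] = dim
                taken.add(dim)
    return tok_assign, tok_candidates

# Illustrative schema (a maximal clique) and phrase cluster.
schema = ["long", "battery", "life"]
phrases = [
    ["long", "battery", "life"],
    ["short", "battery", "life"],
    ["great", "battery"],
    ["great", "life"],
]

tok_assign, tok_candidates = assign_dimensions(phrases, schema)
print(tok_assign)
```

Here "short" is fixed by a single phrase, while "great" is resolved only after two phrases narrow its candidates to one dimension. An empty candidate set for a token would signal inconsistent constraints, for example from a spurious phrase admitted by the clustering step.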