\[ ap_k = \frac{\sum_{s_i \in R_k} \theta_{i,k} \cdot s_i}{|R_k|} \qquad (6.30) \]

where $|R_k|$ is the number of chosen user sessions in $R_k$.

Step 3: Output a set of task-specific user access patterns TAP corresponding to the $t$ tasks, $TAP = \{ap_k, k = 1, \cdots, t\}$. In this expression, each user access pattern is represented by a weighted page vector, where the weights indicate the relative visit preferences for pages exhibited by all user sessions associated with this task-specific access pattern.

6.4 Co-Clustering Analysis of Weblogs using Bipartite Spectral Projection Approach

In previous sections, we broadly discussed Web clustering in Web usage mining. In the context of Web usage mining, Web clustering can be performed on either Web pages or user sessions. Web page clustering is one of the popular topics in Web clustering; it aims to discover groups of Web pages that share similar functionality or semantics. For example, [114] proposed an LSH (Locality-Sensitive Hashing) technique for clustering the entire Web, concentrating on the scalability of clustering. Snippet-based clustering is well studied in [92]. [147] reported using hierarchical monothetic document clustering to summarize search results. [121] proposed a Web page clustering algorithm that measures page similarity in terms of correlation. In contrast to Web page clustering, Web usage clustering aims to discover Web user behavior patterns and associations between Web pages and users from the perspective of the Web user. In practice, Mobasher et al. [184] combined user transaction and pageview clustering techniques, employing the traditional k-means clustering algorithm to characterize user access patterns for Web personalization based on mining Web usage data. In [258], Xu et al. attempted to discover user access patterns and Web page segments from Web log files by using a Probabilistic Latent Semantic Analysis (PLSA) model.

The clustering algorithms described above mainly operate on only one dimension (or attribute) of the Web usage data, i.e. users or pages alone, rather than taking into account the correlation between Web users and pages. However, in most cases, Web object clusters exist as co-occurrences of pages and users: the users in the same group are particularly interested in one subset of Web pages. For example, in customer behavior analysis in e-commerce, this observation corresponds to the phenomenon that one specific group of customers shows strong interest in one particular category of goods. In this scenario, Web co-clustering is probably an effective means of addressing this challenge. Co-clustering was first proposed for the co-clustering of documents and words in digital libraries [73]. It has since been widely used in studies involving multi-attribute analysis, such as social tagging systems [98] and genetic representation [108]. In this section, we propose a co-clustering algorithm for Web usage mining based on bipartite spectral clustering.
6.4.1 Problem Formulation

Bipartite Graph Model

As Web usage data reflects a set of Web users visiting a number of Web pages, it is intuitive to introduce a graph model to represent the visiting relationship between them. In particular, we use the bipartite graph model here.

Definition 6.1. Given a graph $G = (V, E)$, where $V = \{v_1, \cdots, v_n\}$ is a set of vertices and $E$ is a set of edges $\{i, j\}$ with edge weights $E_{ij}$, the adjacency matrix $M$ of the graph $G$ is defined by

\[ M_{ij} = \begin{cases} E_{ij} & \text{if there is an edge } (i, j) \\ 0 & \text{otherwise} \end{cases} \]

Definition 6.2. (Cut of Graph): Given a partition of the vertex set $V$ into multiple subsets $V_1, \cdots, V_k$, the cut of the graph is the sum of the weights of the edges whose two vertices are assigned to different subsets:

\[ cut(V_1, V_2, \cdots, V_k) = \sum_{a < b} \; \sum_{i \in V_a,\, j \in V_b} M_{ij} \]

where $a$ and $b$ index the subsets, so that each crossing edge is counted once.

As discussed above, usage data consists of the visits of Web users to various Web pages. In this case, there are no edges between user sessions or between Web pages; there are only edges between user sessions and Web pages. Thus the bipartite graph model is an appropriate graph representation for characterizing their mutual relationships.

Definition 6.3. (Bipartite Graph Representation): Consider a graph $G = (S, P; E)$ consisting of a set of vertices $V = \{s_i, p_j : s_i \in S, p_j \in P;\ i = 1, \cdots, m,\ j = 1, \cdots, n\}$, where $S$ and $P$ are the user session collection and the Web page collection, respectively, and a set of edges $\{s_i, p_j\}$, each with weight $a_{ij}$, where $s_i \in S$ and $p_j \in P$. The links between user sessions and Web pages represent the visits of users to specific Web pages, and their weights indicate the visit preference for, or significance of, the respective pages.

Furthermore, given the $m \times n$ session-by-pageview matrix $A$ such that $a_{ij}$ equals the edge weight $E_{ij}$, it is easy to formulate the adjacency matrix $M$ of the bipartite graph $G$ as

\[ M = \begin{bmatrix} 0 & A \\ A^{t} & 0 \end{bmatrix} \]

In this manner, the first $m$ rows of the reconstructed matrix $M$ correspond to the user sessions, while the last $n$ rows correspond to the Web pages. The element values of $M$ are determined by click counts or visit durations. Because the ultimate goal is to extract subsets of user sessions and Web pageviews that form co-clusters with strong cohesion within each cluster and strong separation from other clusters, it is necessary to model the user session and Web page vectors in a single unified space. In the coming section, we will discuss how to perform co-clustering on them.
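
As a minimal sketch of Definitions 6.2 and 6.3, the following Python snippet builds the adjacency matrix $M$ from a small session-by-pageview matrix $A$ and evaluates the cut of a candidate partition. The toy matrix, the function names `bipartite_adjacency` and `cut_weight`, the partition labels, and the use of NumPy are all assumptions made for illustration; they are not part of the algorithm described in this chapter.

```python
import numpy as np

# Hypothetical toy data: 3 user sessions (rows) x 4 Web pages (columns).
# Entry a_ij is an assumed visit weight (e.g. click count) of session i on page j.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
])


def bipartite_adjacency(A):
    """Build M = [[0, A], [A^t, 0]] for the session-page bipartite graph."""
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)), A],
        [A.T, np.zeros((n, n))],
    ])


def cut_weight(M, labels):
    """Sum of the weights of edges whose endpoints lie in different subsets.

    Each undirected edge appears twice in the symmetric matrix M,
    so only the strict upper triangle is summed.
    """
    labels = np.asarray(labels)
    crossing = labels[:, None] != labels[None, :]
    return float(np.triu(M * crossing, k=1).sum())


M = bipartite_adjacency(A)

# Vertices 0..2 are the sessions s1..s3; vertices 3..6 are the pages p1..p4.
# Assumed partition: co-cluster 0 = {s1, s2, p1, p2}, co-cluster 1 = {s3, p3, p4}.
labels = [0, 0, 1, 0, 0, 1, 1]

print(M.shape)                # (7, 7)
print(cut_weight(M, labels))  # 0.0, no edge crosses the two co-clusters

# Moving page p2 into co-cluster 1 cuts the edge s1-p2 of weight 1.
print(cut_weight(M, [0, 0, 1, 0, 1, 1, 1]))  # 1.0
```

In this toy case the first partition has zero cut because sessions and pages fall into two disconnected blocks; on real usage data the blocks overlap, and a co-clustering method would look for a partition whose cut is small relative to the weight inside each co-cluster.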