Markovian Combination of Decomposable Model Structures: MCMoSt
London Math Society Durham Symposium on
Mathematical Aspects of Graphical Models
June 30 - July 10, 2008

Markovian Combination of Decomposable Model Structures: MCMoSt

Sung-Ho Kim & Sangjin Lee
Department of Mathematical Sciences
Korea Advanced Institute of Science and Technology

Dept. of Math. Sciences, KAIST
Contents

1 Introduction
2 Preliminaries
3 Distribution, interaction graph, and Markovian subgraph
4 Combined model structures
5 Graph of prime separators (GOPS)
6 Algorithm
7 Time complexity
8 Illustration-1
9 Illustration-2
10 Concluding remarks
11 Where to go?
1 Introduction

• Problem and motivation
1. Consider the problem of developing a graphical model of 30 item-score variables (X's) and 20 or more cognitive/knowledge-state variables (U's).
2. We use the model for diagnosing knowledge states, where knowledge states are predicted via conditional probabilities.
3. The X's are observable and the U's are not; assume all of them are binary.
4. Cognitive diagnosis and model structure
• Two common-sense issues in large-scale modeling:
1. Sparseness of data (Koehler, 1986; Maydeu-Olivares and Joe, 2005; Kim, in revision)
2. Model/time complexity (Chickering, 1996)

- Koehler, K.J. (1986). Goodness-of-fit tests for log-linear models in sparse contingency tables. JASA, 81(394), 483-493.
- Maydeu-Olivares, A. and Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: a unified framework. JASA, 100(471), 1009-1020.
- Chickering, D. (1996). Learning Bayesian networks is NP-complete. In: Learning from Data, D. Fisher and H. Lenz (Eds.), 121-130, Springer-Verlag.
- Kim (in revision). Estimate-based goodness-of-fit test for large sparse multinomial distributions.
• Fienberg and Kim (1999) and Kim (2006) considered the problem of combining conditional graphical log-linear structures and derived a combining rule for them based on the relation between a log-linear model and its conditional version.
• A main feature of this relation is that conditional log-linear structures appear as parts of their joint model structure [Theorems 3 and 4, Fienberg and Kim (1999)].

- Fienberg and Kim (1999). Combining conditional log-linear structures. JASA, 94(445), 229-239.
- Kim (2006). Conditional log-linear structures for log-linear modelling. CSDA, 50(8), 2044-2064.
But not between marginal and joint models!

• Consider two marginal models, [12][23] and [12][24], which are possible from each of [12][24][23], [12][24][34], [12][23][34], [12][234].
• How can we find a joint model structure from a given set of marginal model structures?
2 Preliminaries

• We will consider only undirected graphs.
• For a subset A ⊆ V, we denote by G_A = (A, E_A) the subgraph of G = (V, E) confined to A, where

E_A = (E ∩ (A × A)) ∪ {(u, v) ∈ A × A : u and v are not separated by A \ {u, v} in G}.

We will call G_A the Markovian subgraph of G confined to A.

[Figure: (a) a graph G on nodes 1-5; (b) its Markovian subgraph on nodes 1-4.]
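As a concrete sketch (our own illustrative code, not part of the slides), the Markovian subgraph G_A can be computed directly from the definition: keep the edges of G inside A, and join any pair u, v ∈ A that A \ {u, v} fails to separate in G. A plain depth-first reachability check suffices:

```python
from itertools import combinations

def markovian_subgraph(V, E, A):
    """Markovian subgraph G_A of G = (V, E): keep edges inside A, and join
    u, v in A whenever A \\ {u, v} does not separate them in G."""
    A = set(A)
    adj = {v: set() for v in V}
    for u, v in E:
        adj[u].add(v)
        adj[v].add(u)

    def connected_avoiding(u, v, blocked):
        # DFS from u towards v through nodes not in `blocked`
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            if x == v:
                return True
            for y in adj[x]:
                if y not in seen and y not in blocked:
                    seen.add(y)
                    stack.append(y)
        return False

    EA = set()
    for u, v in combinations(sorted(A), 2):
        if v in adj[u] or connected_avoiding(u, v, A - {u, v}):
            EA.add((u, v))
    return A, EA
```

For example, on the path graph 1-2-3-4 with A = {1, 3, 4}, marginalizing out node 2 induces the edge {1, 3}, so E_A = {{1, 3}, {3, 4}}.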
• If G = (V, E), G′ = (V, E′), and E′ ⊆ E, then we say that G′ is an edge-subgraph of G and write G′ ⊆_e G.
• By the definition of a decomposable graph, we can find a sequence of cliques C_1, ..., C_k of a decomposable graph G which satisfies the following condition [see Proposition 2.17 of Lauritzen (1996)]: with C_(j) = ∪_{i=1}^{j} C_i and S_j = C_j ∩ C_(j−1),

for all i > 1, there is a j < i such that S_i ⊆ C_j.

• We denote the collection of these S_j's by χ(G).

- Lauritzen, S.L. (1996). Graphical Models. Oxford: Oxford University Press.
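A minimal sketch (our own, under the assumption that the cliques are already given in a perfect, running-intersection ordering) of extracting χ(G):

```python
def prime_separators(cliques):
    """chi(G): the distinct separators S_j = C_j ∩ (C_1 ∪ ... ∪ C_{j-1}),
    assuming `cliques` is a perfect (running-intersection) ordering."""
    chi, seen = set(), set()
    for j, C in enumerate(cliques):
        if j > 0:
            S = frozenset(C) & seen   # S_j = C_j ∩ C_(j-1)
            if S:                     # empty intersections are not separators
                chi.add(S)
        seen |= set(C)
    return chi
```

For the chain model [12][23][34] this returns {{2}, {3}}: each interior node is the intersection of its two neighboring cliques.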
• The cliques are elementary graphical components, and each S_j is obtained as the intersection of neighboring cliques. We will call the S_j's prime separators (PSs) of the decomposable graph G.
• Prime graphs are defined as the maximal subgraphs without a complete separator in Cox and Wermuth (1999).
• The notion of prime separators in a decomposable graph may be extended to separators of prime graphs in an arbitrary undirected graph.

- Cox, D.R. and Wermuth, N. (1999). Likelihood factorizations for mixed discrete and continuous variables. SJS, 26, 209-220.
• We denote by M(G) the collection of distributions that are globally Markov with respect to G; i.e., for three disjoint subsets A, B, C of V, if A and B are separated by C in G, then X_A and X_B are conditionally independent given X_C.
3 Distribution and interaction graphs

• For a distribution P, let G(P) be the interaction graph of P.

Theorem 1. (Corollary 3.4 in Kim (2004)) For a distribution P of X_V and A ⊆ V,
P_A ∈ M(G(P)_A).

• For a collection V of subsets of V, let
L̃(G_A, A ∈ V) = {P : P_A ∈ M(G_A), A ∈ V}.

Theorem 2. (Theorem 3.6 in Kim (2004)) For a collection V of subsets of V with an undirected graph G,
M(G) ⊆ L̃(G_A, A ∈ V).

- Kim (2004). Combining decomposable model structures. RR 04-15, Division of Applied Mathematics, KAIST.
4 Combined model structures

• Let G = (V, E) be decomposable and let V_1, V_2, ..., V_m be subsets of V. For simplicity, we write G_i = G_{V_i}.

Definition 3. Suppose there are m Markovian subgraphs, G_1, ..., G_m. Then we say that a graph H of a set of variables V is a combined model structure (CMS) corresponding to G_1, ..., G_m if the following conditions hold:
(i) ∪_{i=1}^{m} V_i = V.
(ii) H_{V_i} = G_i for i = 1, ..., m. That is, the G_i are Markovian subgraphs of H.

We will call H a maximal CMS corresponding to G_1, ..., G_m if adding any edge to H invalidates condition (ii) for at least one i = 1, ..., m.
• Let C_G(A) denote the collection of the cliques of G which include nodes of A.

Lemma 4. Let G′ = (V′, E′) be a Markovian subgraph of G and suppose that, for three disjoint subsets A, B, C of V′, ⟨A|B|C⟩_{G′}. Then
(i) ⟨A|B|C⟩_G;
(ii) for W ∈ C_G(A) and W′ ∈ C_G(C), ⟨W|B|W′⟩_G.
Theorem 5. (Kim, 2006) Let there be Markovian subgraphs G_i, i = 1, 2, ..., m, of a decomposable graph G. Then
(i) ∪_{i=1}^{m} χ(G_i) ⊆ χ(G);
(ii) for any maximal CMS H,
∪_{i=1}^{m} χ(G_i) = χ(H).

Theorem 6 (Unique existence). (Kim, 2006) Suppose there are m Markovian subgraphs G_1, ..., G_m of a decomposable graph G. Then there exists a unique maximal CMS H* of the m Markovian subgraphs such that G ⊆_e H*.

- Kim (2006). Properties of Markovian subgraphs of a decomposable graph. LNAI 4293, 15-26.
Theorem 7 (Invariance of PSs). Let G be a decomposable graph and G_1 and G_2 be Markovian subgraphs of G. Suppose that a set C ∈ χ(G_1) and that C ⊆ V_2. Then C is not intersected in G_2 by any other subset of V_2.

• We will call a node a PS-node if it is a component of a PS, and a non-PS-node otherwise.
5 Graph of prime separators (GOPS)

Definition 8. Let A = ∪_{a∈χ(G)} a. Then the graph of prime separators (GOPS, for short) of G is obtained from G_A by replacing every PS, and all the edges between every pair of neighboring PSs in G_A, with a node and an edge, respectively.

[Figure: a graph on nodes 1-12 (left) and its GOPS (right), with PS-nodes {3,4}, {3,5}, {4,8}, {11,12}.]
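Definition 8 can be sketched in code (an illustrative reading, assuming the edge set E_A of the Markovian subgraph G_A and the list of PSs are already available): each PS collapses to one node, and two PS-nodes are joined when the PSs are neighbors in G_A, i.e., they share a variable or an edge of E_A runs between them.

```python
def gops(EA, pss):
    """Sketch of Definition 8: nodes are the prime separators; an edge joins
    two PSs that are neighbors in G_A (shared variable, or an edge of E_A
    running between them)."""
    EA = {frozenset(e) for e in EA}
    nodes = [frozenset(p) for p in pss]
    edges = set()
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            touching = bool(a & b) or any(
                frozenset((u, v)) in EA for u in a for v in b)
            if touching:
                edges.add(frozenset((a, b)))
    return nodes, edges
```

On a toy chain with E_A = {(2,3), (3,4)} and PSs {2}, {3}, {4}, the GOPS is itself a chain on the three PS-nodes.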
6 Algorithm

[Separateness condition] Let M be a set of Markovian subgraphs of G and H a maximal CMS of M. If two nodes appear in a graph in M and are not adjacent in that graph, then they are not adjacent in H either. Otherwise, adjacency of the nodes in H is determined by checking separateness of the nodes across M.

Suppose that M consists of m Markovian subgraphs, G_1, ..., G_m, of G, and denote by a^i a PS of G_i. We can then combine the models of M as follows.

Step 1. Arrange the subgraphs into G_{i_1}, ..., G_{i_m} such that |V_{i_j} ∩ V_{i_{j+1}}| ≥ |V_{i_{j+1}} ∩ V_{i_{j+2}}| for j = 1, 2, ..., m − 2. For convenience, let i_j = j, j = 1, 2, ..., m. Define η_1 = {G_1}.
Step 2a. First put an edge between every pair of PSs, a^1 and a^2, such that
a^1 ∩ a^2 ≠ ∅, (1)
in such a way that the separateness condition is satisfied with regard to M. Denote the resulting GOPS by H.

Step 2b. Once the node-sharing PSs are all considered in Step 2a, consider all the PSs a^1 and a^2 such that
a^1 ∩ (∪_{a∈χ(G_2)} a) = ∅ and a^2 ∩ (∪_{a∈χ(G_1)} a) = ∅, (2)
and put edges between a^i, i = 1, 2, and every PS in G_{3−i} that is acceptable under the separateness condition, in addition to the GOPS obtained in Step 2a. For example, for each a^1 satisfying (2), we add edges to H between that a^1 and every possible PS in G_2 under the separateness condition, and similarly for each a^2 satisfying (2). Denote the result of the combination by η_2.

Step 3. Let η_i be the GOPS obtained from the preceding step. Note that η_i can be a set of GOPSs. For each GOPS H in η_i, combine H with G_{i+1} as in Step 2, replacing G_1 and G_2 with H and G_{i+1}, respectively. Repeat this combination with G_{i+1} for all the graphs H in η_i, which yields the set η_{i+1} of newly combined graphs.

Step 4. If i + 1 = m, stop. Otherwise, repeat Step 3.
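The ordering required in Step 1 can be approximated greedily. The following is a sketch under our own design choices (the slides state only the overlap condition, not a construction, and a greedy pass does not guarantee non-increasing overlaps in every case):

```python
def order_subgraphs(vertex_sets):
    """Greedy sketch of Step 1: start from the pair of marginal vertex sets
    with the largest overlap, then repeatedly append the remaining set that
    overlaps the current tail most. Assumes at least two sets."""
    idx = range(len(vertex_sets))
    # seed with the most-overlapping pair
    first = max(((i, j) for i in idx for j in idx if i < j),
                key=lambda p: len(vertex_sets[p[0]] & vertex_sets[p[1]]))
    order = list(first)
    remaining = [k for k in idx if k not in order]
    while remaining:
        tail = vertex_sets[order[-1]]
        nxt = max(remaining, key=lambda k: len(tail & vertex_sets[k]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For instance, with vertex sets {1,2}, {2,3,4}, {3,4,5}, the pair ({2,3,4}, {3,4,5}) overlaps most and is placed first, giving the ordering [1, 2, 0] of indices.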
[Figure 1: A model of 13 variables.]
[Figure 2: Marginals G1, ..., G4 of the model in Figure 1.]
[Figure 3: Combining G2 and G4 of the model in Figure 1; panels: Unite, Check separateness, Select & delete.]
[Figure 4: Combining G3 and G4 of the model of 13 variables; panels: Unite, Check separateness, Select & delete.]
[Figure: two graph structures on nodes 28-40 of the model of 40 variables, which are combined in Figure 5.]
[Figure 5: Combining xGOPS5 and xGOPS6 of the model of 40 variables; panels: 1. Unite, 2. Connect, 3. Check separateness, 4. Select & delete.]
7 Time complexity of the procedure

• For two graphs, G_1 and G_2, let |V_i| = n_i for i = 1, 2, |V_1 ∩ V_2| = n_12, and ñ_i = n_i − n_12.
• It is well known that the time complexity of the depth-first search method (Tarjan, 1972) for a graph G = (V, E) is of order O(|V| + |E|).
• So the time complexity of the combination is of order ñ_1^2 O(ñ_2 + ẽ_2) + ñ_2^2 O(ñ_1 + ẽ_1), where ẽ_i is the number of edges in the subgraph of G_i induced on V_i \ V_{3−i}.

- Tarjan, R.E. (1972). Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2), 146-160.
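The separateness checks behind this bound reduce to reachability: C separates A from B exactly when no node of B is reachable from A once C is removed, which one depth-first search answers in O(|V| + |E|). A sketch (our own helper, assuming disjoint A, B, C and an adjacency dict):

```python
def separated(adj, A, B, C):
    """True iff C separates A from B in the undirected graph given by the
    adjacency dict `adj`: a single DFS from A that never enters C."""
    blocked = set(C)
    stack = [a for a in A if a not in blocked]
    seen = set(stack)
    while stack:
        x = stack.pop()
        if x in B:           # reached B while avoiding C: not separated
            return False
        for y in adj.get(x, ()):
            if y not in seen and y not in blocked:
                seen.add(y)
                stack.append(y)
    return True
```

On the path 1-2-3-4, node {3} separates {1} from {4}, while the empty set does not.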
8 Illustration-1

[Figure 6: A model of 40 binary variables.]
[Figure 7: A matrix of regressor-response relationships for the 40 variables, as obtained from a tree regression analysis. The 1's in column j indicate the regressor variables for the response variable X_j. The six blocks correspond to the six subsets of variables listed in Table 1.]
Table 1: The indexes of the variables in the 6 subsets V_1, ..., V_6.

V1 = {1, 2, 3, 4, 5, 6, 7, 8, 11, 12}
V2 = {8, 9, 10, 11, 12, 14, 15, 16, 17, 18}
V3 = {10, 13, 14, 15, 19, 20, 21, 22, 23, 24}
V4 = {13, 20, 21, 22, 25, 26, 27, 28, 29, 34}
V5 = {28, 29, 30, 31, 32, 34, 35, 36, 37, 38}
V6 = {30, 31, 32, 33, 35, 36, 37, 38, 39, 40}
Table 2: Goodness-of-fit levels of the six marginal models

Marginal model | d.f. | Pearson χ² | p-value
1 | 567 | 547.50 | 0.714
2 | 645 | 667.41 | 0.263
3 | 601 | 589.07 | 0.628
4 | 649 | 679.25 | 0.199
5 | 617 | 591.89 | 0.760
6 | 604 | 621.53 | 0.302
[Figure 8: Marginal models of the model in Figure 6 for the 6 subsets of variables listed in Table 1. G_i is the decomposable log-linear model for subset V_i. PSs are represented by thick lines. See Figure 9 for the PSs of the 6 marginal models.]
[Figure 9: The GOPSs of the six marginal models in Figure 8; e.g., GOPS1 has PS-nodes {3,4}, {3,5}, {4,8}, {11,12}.]
[Figure 10: An independence graph of PSs and non-PS variables. The PSs are in ovals, the dots are the non-PS variables, and the small numbers at the bottom-right of the ovals are the labels of the submodels of which the ovals are PSs.]
[Figure 11: The combined model structure obtained from the independence graph in Figure 10. The thick edges are additional to the true model in Figure 6.]
[Figure: the marginal models G1 and G2 of the model in Figure 6.]
9 Illustration-2

[Figure 12: A model of 100 variables.]
[Figure 13: The first nine marginal models, G1, ..., G9, of the model in Figure 12.]
[Figure 14: The second nine marginal models, G10, ..., G18, of the model in Figure 12.]
[Figure 15: The combined result of the 18 marginal models in Figures 13 and 14. The thick edges are additional to the true model in Figure 12.]
[Figure 16: Two marginal models, shown with their GOPSs, which include nodes 10 and 19.]
10 Concluding Remarks

1. The main idea of MCMoSt is similar to constraint-based learning (CBL) (Meek, 1995; Spirtes, Glymour and Scheines, 2000; Neapolitan, 2004), in which a Bayesian network is constructed from a list of constraints given in terms of conditional independence among a given set of random variables.

- Meek, C. (1995). Causal inference and causal explanation with background knowledge. UAI 11, 403-410.
- Neapolitan, R.E. (2004). Learning Bayesian Networks. Pearson Prentice Hall, Upper Saddle River, NJ.
- Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press.
But a noteworthy difference between the two is that, while the statements of conditional independence for CBL are extracted from the probability model of the whole set of variables involved, the statements for MCMoSt come from the marginal probability models of subsets of the variables.

2. In selecting subsets of random variables, it is important that the random variables be more highly associated within subsets than between subsets. This way of subset selection yields subsets in which random variables that are neighbors in the graph of the model structure of the whole data set are more likely to appear in the same marginal model.
3. Although the model combination is carried out under the decomposability assumption, we can deal with non-decomposable marginal models of a graphical model by transforming their model structures into decomposable (i.e., triangulated) graphs.
11 Where to go?

• Combining model structures other than UGs.
• Models with no observed data.
• Consistency of models (Dawid and Lauritzen, 1993).
• Robustness of prediction/classification (Kim, 2005; Fienberg and Kim, 2007).

- Dawid and Lauritzen (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics, 21(3), 1272-1317.
- Kim (2005). Stochastic ordering and robustness in classification from a Bayesian network. Decision Support Systems, 39(3), 253-266.
- Fienberg and Kim (2007). Positive association among three binary variables and cross-product ratios. Biometrika, 94(4), 999-1005.
Thank You!