30.12.2012 Views

Glushkov Automata - sbes - 2007


22nd Brazilian Symposium on Databases

SBBD 2007

Assisting XML Schema Evolution that Preserves Validity

Béatrice Bouchou
bouchou@univ-tours.fr
Laboratoire d'Informatique (LI)

Denio Duarte
denio@unochapeco.edu.br
Centro Tecnológico (Cetec)


Agenda

● Motivation
● Theoretical Background
● Approach
● Final Considerations


Motivation

● Documents are valid

[Figure: XML documents, valid with respect to a schema]


Motivation

[Figure: schema and XML document]

● Constraint: publications are journal papers organized by subject and year of publication.


Motivation

Publication : Subject (Year Journal+)*


Motivation

Publication : Subject (Year Journal+)*

● Labs decide to consider conference papers as publications.
● The schema must be updated.


Motivation

Publication : Subject (Year Journal+ Conference)*

● Conference is mandatory.


Motivation

Publication : Subject (Year Journal+ Conference?)*

● Only one conference paper per year?


Motivation

Publication : Subject (Year Journal+ Conference+)*

● Conference is mandatory.


Motivation

Publication : Subject (Year Journal+ Conference*)*

● It's OK!


Motivation

Publication : Subject (Year Journal+ Conference*)*

● One of the laboratories sends several entries for conferences and journals.
● The administrator must organize them for insertion, since journals should appear before conferences.


Motivation

Publication : Subject (Year (Journal|Conference)+)*
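The evolution of the Publication content model can be checked on concrete documents. The following is a minimal sketch, assuming each child element name is written with a trailing comma so that Python's `re` module can stand in for a content-model validator. The helper names `content_model_to_regex` and `is_valid` and the sample documents are hypothetical and not part of the original slides.

```python
import re

def content_model_to_regex(model):
    # Each element name becomes a group that also consumes a trailing comma,
    # so the regex matches whole element names rather than single characters.
    pattern = re.sub(r"[A-Za-z]+", lambda m: f"(?:{m.group(0)},)", model)
    # Spaces in the content model are only separators.
    return re.compile(pattern.replace(" ", ""))

def is_valid(children, model):
    # Encode the sequence of child element names and match it as a whole.
    word = "".join(name + "," for name in children)
    return content_model_to_regex(model).fullmatch(word) is not None

OLD = "Subject (Year Journal+)*"
MANDATORY = "Subject (Year Journal+ Conference)*"
NEW = "Subject (Year (Journal|Conference)+)*"

# A document written for the old schema (illustrative data).
old_doc = ["Subject", "Year", "Journal", "Journal", "Year", "Journal"]
# A document mixing conference and journal papers in any order.
mixed_doc = ["Subject", "Year", "Conference", "Journal"]

print(is_valid(old_doc, OLD), is_valid(old_doc, MANDATORY), is_valid(old_doc, NEW))
print(is_valid(mixed_doc, OLD), is_valid(mixed_doc, NEW))
```

Under these assumptions the old document stays valid for the final schema but not for the variant where Conference is mandatory, while the mixed document is only accepted by the final schema.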


Motivation

✔ It is difficult to evolve XML schemas, especially if the administrator is not a computer science expert.


Motivation

● The demand for tools designed for administrators outside the computer science community
● The cost (time and money) of the revalidation process
● Distributed XML databases following the same schema


Theoretical Background

● Theoretical notions used in this work
– Regular expressions (RE) to define the allowed sub-elements of an element.
– Finite state automata (FSA) to verify whether or not the sub-elements respect the constraints imposed by the element.
– Transformation of RE to FSA
● Glushkov's Algorithm ⇒ Glushkov automaton


Glushkov Automata

● Given a RE, a Glushkov automaton is built as follows:
– All symbols in the RE are subscripted by their positions.
● Subject (Year Journal+)* ⇒ Subject₁ (Year₂ Journal₃+)*
– We add an end mark (#) to the RE:
● Subject₁ (Year₂ Journal₃+)* #₄
– Symbols and positions become transitions in the FSA
– Each state represents a symbol (except the initial state)


Glushkov Automata

● Subject₁ (Year₂ Journal₃+)* #₄

[Figure: the Glushkov automaton with states 0 to 4 and transitions labelled Subject, Year, Journal and #, followed by the corresponding Glushkov graph (homogeneous) over nodes 0 to 4]
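The construction can be sketched with the classical first/last/follow position sets. This is an illustrative sketch only: the nested-tuple encoding of the regular expression and the names `glushkov` and `accepts` are assumptions made for this example, not the authors' implementation.

```python
def glushkov(expr):
    """Return (symbols, edges) of the Glushkov automaton of `expr`.

    `expr` is a nested tuple: ("sym", name), ("cat", a, b), ("alt", a, b),
    ("star", a), ("plus", a) or ("opt", a). Position 0 is the initial state.
    """
    symbols, follow = {}, {}

    def visit(node):
        # Returns (nullable, first positions, last positions) of a sub-expression.
        kind = node[0]
        if kind == "sym":
            pos = len(symbols) + 1
            symbols[pos], follow[pos] = node[1], set()
            return False, {pos}, {pos}
        if kind in ("cat", "alt"):
            n1, f1, l1 = visit(node[1])
            n2, f2, l2 = visit(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:                      # last of left is followed by first of right
                follow[p] |= f2
            return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
        n, f, l = visit(node[1])              # unary operators: star, plus, opt
        if kind in ("star", "plus"):
            for p in l:                       # repetition: last loops back to first
                follow[p] |= f
        return n or kind != "plus", f, l

    _, first, _ = visit(("cat", expr, ("sym", "#")))   # append the end mark
    edges = {(0, q) for q in first}
    edges |= {(p, q) for p, qs in follow.items() for q in qs}
    return symbols, edges

def accepts(symbols, edges, word):
    # Run the automaton on the child names, then on the end mark.
    states = {0}
    for name in list(word) + ["#"]:
        states = {q for p, q in edges if p in states and symbols[q] == name}
    return bool(states)

# Subject (Year Journal+)*
PUBLICATION = ("cat", ("sym", "Subject"),
               ("star", ("cat", ("sym", "Year"), ("plus", ("sym", "Journal")))))
symbols, edges = glushkov(PUBLICATION)
print(symbols)
print(sorted(edges))
```

Each arc (p, q) is read as a transition into state q labelled with the symbol at position q, which is what makes the graph homogeneous.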


Glushkov Automata

● Subject₁ (Year₂ Journal₃+)* #₄

[Figure: Glushkov graph over nodes 0 to 4, with its cycles highlighted]

● Cycles are called orbits.
● The orbits represent the starred sub-expressions of the regular expression.


Glushkov Automata

● Subject₁ (Year₂ Journal₃+)* #₄
● The hierarchy of orbits H is formed by the orbits in the graph and the set of all nodes (respecting the set inclusion property):
● H={{3},{2,3},{0,1,2,3,4}}
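One way to obtain such a hierarchy is to look for maximal strongly connected sets of nodes, delete the arcs that close each of them, and repeat inside each set. The sketch below does this under stated assumptions: the arc list is the one derived from Subject₁ (Year₂ Journal₃+)* #₄, the arcs deleted for an orbit are taken to be those going from its exit nodes back to its entry nodes, and the names `reachable` and `orbit_hierarchy` are hypothetical.

```python
def reachable(start, succ, allowed):
    """Nodes of `allowed` reachable from `start` through at least one arc."""
    seen, stack = set(), [start]
    while stack:
        for m in succ.get(stack.pop(), ()):
            if m in allowed and m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def orbit_hierarchy(nodes, edges):
    edges = set(edges)                 # working copy: cycle arcs are removed from it
    hierarchy = [frozenset(nodes)]     # H always contains the set of all nodes

    def explore(allowed):
        succ = {}
        for a, b in edges:
            succ.setdefault(a, set()).add(b)
        found, seen = [], set()
        for n in sorted(allowed):
            fwd = reachable(n, succ, allowed)
            if n in seen or n not in fwd:      # already classified, or on no cycle
                continue
            orbit = frozenset(m for m in fwd if n in reachable(m, succ, allowed))
            seen |= orbit
            found.append(orbit)
        for orbit in found:
            ins = {b for a, b in edges if b in orbit and a not in orbit}
            outs = {a for a, b in edges if a in orbit and b not in orbit}
            back = {(a, b) for a in outs for b in ins} & edges
            if not back:
                raise ValueError("no arc closes this orbit; not a Glushkov graph?")
            hierarchy.append(orbit)
            edges.difference_update(back)      # remove the arcs producing the cycle
            explore(orbit)                     # look for nested orbits

    explore(frozenset(nodes))
    return sorted(hierarchy, key=len), edges

# Arcs of the example graph (assumed, derived from the regular expression).
G = {(0, 1), (1, 2), (1, 4), (2, 3), (3, 2), (3, 3), (3, 4)}
H, G_w = orbit_hierarchy(range(5), G)
print([sorted(o) for o in H])
print(sorted(G_w))
```

The second return value is the graph without orbits, which is the input of the reduction process.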


Glushkov Automata

● The notion of orbit gives us another notion:
– Context: the set of symbols that appear in an orbit.
– A general context contains the symbols not belonging to any context.
– Contexts are disjoint sets.
● In our example:
– Context for Journal: {Journal} corresponds to the orbit {3}
– Context for Year: {Year} corresponds to the orbit {2,3}
– The general one: {Subject}
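Since contexts are disjoint, a symbol that sits in nested orbits has to be assigned to one of them. The sketch below assumes it goes to the innermost orbit, which reproduces the example above. The function name `contexts` and its arguments are hypothetical.

```python
def contexts(symbols, orbits):
    """Map each orbit to its context and return the general context as well.

    `symbols` maps positions to symbols (without the initial state and the
    end mark); `orbits` lists the orbits of H without the set of all nodes.
    """
    result, claimed = {}, set()
    for orbit in sorted(orbits, key=len):       # innermost orbits first
        own = set(orbit) - claimed              # positions not taken by a nested orbit
        result[frozenset(orbit)] = {symbols[p] for p in own}
        claimed |= own
    general = {s for p, s in symbols.items() if p not in claimed}
    return result, general

by_orbit, general = contexts({1: "Subject", 2: "Year", 3: "Journal"}, [{3}, {2, 3}])
print(by_orbit, general)
```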


Glushkov Automata

● We can transform a Glushkov automaton into a RE following the approach of Caron and Ziadi (Caron & Ziadi, 2000, Theoretical Computer Science):
– First, the orbits are removed: for each orbit, all arcs producing a cycle are deleted.
– The orbits are stored in the hierarchy of orbits H.

[Figure: graph over nodes 0 to 4, with the arcs of orbit {2,3} being removed]


Glushkov Automata

● We can transform a Glushkov automaton into a RE:
– First, the orbits are removed (cont.)

[Figure: graph over nodes 0 to 4, with the arcs of orbit {3} being removed]

● We have a graph without orbits
● And the hierarchy of orbits H={{3},{2,3},{0,1,2,3,4}}


Glushkov Automata

● We can transform a Glushkov automaton into a RE:
– Second, we start a reduction process over the graph without orbits by using H.
– This process is applied according to H, respecting the set inclusion property.


Glushkov Automata

● We can transform a Glushkov automaton into a RE:
– Applying three rules:
● Rule 1: two nodes x and y in sequence are merged into xy
● Rule 2: two nodes x and y in parallel are merged into x|y
● Rule 3: a node x that can be bypassed becomes x?


Glushkov Automata

● We can transform a Glushkov automaton into a RE:
– Moreover, if a node represents a whole orbit, it is decorated with a + (positive closure)


Glushkov Automata

● Reduction process (rules applied, in slide order: R1, R3, R1, R1, R1):
– Step 0: 0 1 2 3 4, H={{3},{2,3},{0,1,2,3,4}}
– Step 1: 0 1 2 3+ 4, H={{2,3},{0,1,2,3,4}}
– Step 2: 0 1 (2 3+)+ 4, H={{2,3},{0,1,2,3,4}}
– Step 3: 0 1 (2 3+)+ 4, H={{0,1,2,3,4}}
– Step 4: 0 1 (2 3+)* 4, H={{0,1,2,3,4}}
– Step 5: 0 1 (2 3+)* 4
● Result: Subject (Year Journal+)*
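The reduction can be sketched on the running example as follows. This is a simplified illustration, not the algorithm of Caron and Ziadi as published: it restricts the three rules to nodes inside the smallest orbit still pending in H, which is enough for this example but is an assumption of the sketch. The function `reduce_to_regex` and its arguments are hypothetical, and positions are used as labels so that the output can be compared with the steps above.

```python
def reduce_to_regex(labels, edges, orbits):
    """Reduce a graph without orbits to one expression, guided by the orbits of H."""
    node = {p: frozenset([p]) for p in labels}
    expr = {node[p]: ("sym", text) for p, text in labels.items()}  # node -> (operator, text)
    succ = {n: set() for n in expr}
    pred = {n: set() for n in expr}
    for a, b in edges:
        succ[node[a]].add(node[b])
        pred[node[b]].add(node[a])
    pending = sorted((frozenset(o) for o in orbits), key=len)      # innermost orbit first

    def wrap(e, ops):
        # Parenthesise the text when its top-level operator requires it.
        return f"({e[1]})" if e[0] in ops else e[1]

    def merge(x, y, e):
        # The merged node keeps the predecessors of x and the successors of y.
        ps, ss = set(pred[x]), set(succ[y])
        for n in (x, y):
            for p in pred.pop(n):
                if p in succ:
                    succ[p].discard(n)
            for s in succ.pop(n):
                if s in pred:
                    pred[s].discard(n)
            del expr[n]
        z = x | y
        pred[z], succ[z], expr[z] = ps, ss, e
        for p in ps:
            succ[p].add(z)
        for s in ss:
            pred[s].add(z)

    def apply_rule(inside):
        for x in inside:
            for y in inside:
                if x == y:
                    continue
                if succ[x] == {y} and pred[y] == {x}:              # Rule 1: sequence
                    text = wrap(expr[x], ("alt",)) + " " + wrap(expr[y], ("alt",))
                    merge(x, y, ("seq", text))
                    return True
                if pred[x] == pred[y] and succ[x] == succ[y]:      # Rule 2: choice
                    merge(x, y, ("alt", expr[x][1] + "|" + expr[y][1]))
                    return True
        for x in inside:                                           # Rule 3: option
            if pred[x] and succ[x] and all(s in succ[p] for p in pred[x] for s in succ[x]):
                for p in pred[x]:
                    for s in succ[x]:
                        succ[p].discard(s)
                        pred[s].discard(p)
                if expr[x][0] == "plus":                           # x+ made optional is x*
                    expr[x] = ("star", expr[x][1][:-1] + "*")
                else:
                    expr[x] = ("opt", wrap(expr[x], ("seq", "alt")) + "?")
                return True
        return False

    while len(expr) > 1:
        scope = pending[0] if pending else None
        if scope in expr:                      # one node stands for a whole orbit
            expr[scope] = ("plus", wrap(expr[scope], ("seq", "alt")) + "+")
            pending.pop(0)
        elif not apply_rule([n for n in expr if scope is None or n <= scope]):
            raise ValueError("no rule applies")
    return next(iter(expr.values()))[1]

labels = {p: str(p) for p in range(5)}
G_w = [(0, 1), (1, 2), (1, 4), (2, 3), (3, 4)]
print(reduce_to_regex(labels, G_w, [{3}, {2, 3}]))
```

Replacing positions 1 to 3 by their symbols and dropping the initial state 0 and the end mark 4 gives Subject (Year Journal+)*.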


Schema Update Primitives

● Primitives:
– Insertion, replacing, deletion, creation
● Attributes, elements, content models
– Cardinality changes
– Constraints
● Functional dependencies, types


Schema Update Primitives

● Guerrini et al. (WIDM 2005, Workshop on Web Information and Data Management) have shown the impact of schema update primitives over XML documents:
– We can propose primitives that may be consistency-preserving


Schema Update Primitives

● They are:
– Insertion as optional:
● Attributes, elements
– Creation
● Content model
– Element's cardinality changes
● 1 to 0
● 1 or 0 to 0:n


Approach<br />



Proposed Conservative Primitives

● Inserting a sub-element into an element content model
– ins
● Making an element optional
– makeOpt
● Extending the cardinality
– ExtendCard
● Creating a new content model
– createCM


Insertion

● User tool (prototype):

You have chosen to insert Conference into the content model of Publication.
Select an element whose semantics are close to those of Conference:
– Subject (Year Journal+)*
Select whether you want to insert Conference:
– relative to Journal
– relative to (Journal+)
Select how you want to insert Conference:
– as a choice: Journal | Conference
– before: Conference Journal
– after: Journal Conference
Do you want Conference to be repeated?
– Yes
– No


Insertion

● To insert a new sub-element e' into the content model E of an element e:
– A node n (representing the e' to be inserted into E) is inserted into the corresponding Glushkov graph without orbits G_w (at a given position τ and context ζ):
ins(e, e', τ, context, mode, times)
● context = true (e' is inserted relative to ζ) or false (relative to τ)
● mode can be choice, sequence-after or sequence-before
● times = true: e' is decorated with +
– Example: ins(Publication, Conference, 3, false, choice, false)


Insertion

Publication : Subject (Year Journal+)*
ins(Publication, Conference, 3, false, choice, false)

[Figure: the Glushkov graph G over nodes 0 to 4, the graph without orbits G_w, and the graph G'_w obtained by adding node 5]

H={{3},{2,3},{0,1,2,3,4}}
H'={{5,3},{2,3,5},{0,1,2,3,4,5}}


Insertion

Publication : Subject (Year Journal+)*
ins(Publication, Conference, 3, false, choice, false)

● Reduction of G'_w, starting from H'={{5,3},{2,3,5},{0,1,2,3,4,5}}:
– 0 1 2 3 4, with node 5 as an alternative to node 3
– 0 1 2 3|5 4
– 0 1 (2 (3|5)+)* 4, with H'={}
● Result: Subject (Year (Journal|Conference)+)*
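The graph and hierarchy update for this particular call can be sketched as follows. The sketch covers only the case shown on the slides (mode = choice, context = false, times = false) and assumes that the new node receives the same predecessors and successors as position τ in G_w and joins every orbit that contains τ. The function name `ins_choice` is hypothetical.

```python
def ins_choice(edges, hierarchy, tau, new):
    """Insert node `new` as an alternative to position `tau`.

    `edges` are the arcs of the graph without orbits G_w; `hierarchy` is H.
    Returns the arcs of G'_w and the updated hierarchy H'.
    """
    new_edges = set(edges)
    for a, b in edges:
        if b == tau:
            new_edges.add((a, new))    # same predecessors as tau
        if a == tau:
            new_edges.add((new, b))    # same successors as tau
    # The new node joins every orbit (and the set of all nodes) containing tau.
    new_hierarchy = [set(o) | {new} if tau in o else set(o) for o in hierarchy]
    return new_edges, new_hierarchy

G_w = {(0, 1), (1, 2), (1, 4), (2, 3), (3, 4)}
H = [{3}, {2, 3}, {0, 1, 2, 3, 4}]
G_w2, H2 = ins_choice(G_w, H, tau=3, new=5)
print(sorted(G_w2))
print(H2)
```

Because nodes 3 and 5 end up with identical neighbours, Rule 2 can later merge them into 3|5 during the reduction.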


Other Primitives (Syntax)

● Making an element optional:
– makeOpt(e,τ)
● Extending the cardinality
– ExtendCard(e,τ)
● Creating a new content model
– createCM(e,E)


High Level Primitives

● Express complex updates in a more compact way:
– insSubExp(e,β,τ,context,mode,times)
● Making a sub-expression optional
– makeSubExpOpt(e,β)
● Extending the cardinality
– extendSubExp(e,β)


Final Considerations<br />



Conclusions

● Glushkov automata (and graphs) allow us to identify starred sub-expressions of a regular expression.
● The updates are performed in an intuitive way.
● The proposed framework is consistency-preserving:
– The documents valid wrt the old schema are valid wrt the new one.
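Consistency preservation amounts to the old content model's language being included in the new one. A bounded brute-force check gives evidence of this (not a proof), as in the sketch below. It assumes single-letter element names (S, Y, J, C for Subject, Year, Journal, Conference) so that Python's `re` module can be used directly; the function name `included` is hypothetical.

```python
import re
from itertools import product

def included(old, new, alphabet, max_len=6):
    """True if every word of length <= max_len accepted by `old` is accepted by `new`."""
    old_re, new_re = re.compile(old), re.compile(new)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if old_re.fullmatch(word) and not new_re.fullmatch(word):
                return False           # a formerly valid document would become invalid
    return True

OLD = "S(YJ+)*"            # Subject (Year Journal+)*
NEW = "S(Y(J|C)+)*"        # Subject (Year (Journal|Conference)+)*
MANDATORY = "S(YJ+C)*"     # Subject (Year Journal+ Conference)*

print(included(OLD, NEW, "SYJC"))
print(included(OLD, MANDATORY, "SYJC"))
```

Under these assumptions the update to the final schema passes the check, while making Conference mandatory fails it because a document such as Subject Year Journal is no longer accepted.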


Conclusions

● Evolving XML schema is still a challenge:
– Revalidation costs
– Access to the documents to be revalidated
– Data loss (to transform an invalid document into a valid one)
– If the documents to be revalidated are stored in different sites: transfer costs.


Directions

● The proposed primitives are not complete
● Using our approach together with non-conservative primitives in a general framework
● Applying this method to document integration
● Considering other types of schema representation
● Extending this approach to treat facet updates


Thank You!
