Title


Probabilistic Uncertainty Management in the DataRing

Description


Sources of uncertainty in data abound: noisy measurements, data produced by imperfect automatic systems such as information extraction or natural language processing, inherently imprecise data such as a human-made diagnosis, etc. In the context of an autonomous, heterogeneous, decentralized system such as the one investigated in the DataRing project [1], uncertainty also originates from inherently imperfect schema matchings, from doubts about the actual presence of a fact or of a whole document on a given peer, and from redundancy and contradiction in the information held by different peers. One of the most natural ways to represent this uncertainty is through probabilistic databases.

The objective of this PhD position is to find formal models for the representation and efficient querying of probabilistic databases in a peer-to-peer environment, and to build corresponding prototype systems.

Because of the heterogeneous nature of the information shared in the DataRing, semi-structured (i.e., XML) models should be favored, though the simplicity of the flat-tuple representation of the relational model can also be an inspiration. Previously studied probabilistic semi-structured models [2,3,4] can be a basis for the proposed work. Particular aspects of interest include:

- management of the various forms of uncertainty;
- routing and distributed computation of probabilistic queries over the peer-to-peer network;
- corroboration of information across sources;
- ranking of query results and top-k query processing.

Supervision


The 3-year PhD thesis will be supervised by Pierre Senellart and Talel Abdessalem in the Computer Science and Networking Department at TELECOM ParisTech, in interaction with the other partners of the ANR DataRing project, notably Serge Abiteboul's Gemo team at INRIA Saclay.

TELECOM ParisTech, formerly known as ENST, is the leading French engineering school specializing in information technology, and is located in Paris.

Conditions


Starting date: beginning of 2009 (flexible).

Prerequisites for applying: Master's degree in computer science (or equivalent diploma), background in applied and theoretical database management.

Salary: approximately 1500 € net per month, over 3 years.

Please contact Pierre Senellart <pierre.senellart@telecom-paristech.fr> for further information and to apply.

References


[1] S. Abiteboul and N. Polyzotis, The Data Ring: Community Content Sharing. In Proc. CIDR, January 2007, Asilomar, USA.

[2] P. Senellart and S. Abiteboul, On the complexity of managing probabilistic XML data. In Proc. PODS, June 2007, Beijing, China.

[3] B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv, Query efficiency in probabilistic XML models. In Proc. SIGMOD, June 2008, Vancouver, Canada.

[4] S. Cohen, B. Kimelfeld, and Y. Sagiv, Incorporating constraints in probabilistic XML. In Proc. PODS, June 2008, Vancouver, Canada.
