Research

Areas of interest
Peer-to-Peer (P2P) systems
Semantic Web
Continuous queries / Publish-subscribe systems

Research summary (see a poster of this research, presented at the Microsoft Research Summer School 2007)
We propose to study the efficient evaluation of continuous RQL queries on top of distributed hash tables. RQL is a declarative query language for RDF/RDFS databases that can express both data and schema queries in a uniform manner. It is used in two of the most prominent RDF stores, Sesame and RSSDB. Distributed hash tables are overlay networks that allow nodes holding data items to self-organize and that offer data lookup functionality in a provably efficient, scalable, fault-tolerant and adaptive way. The efficient evaluation of continuous RQL queries on top of distributed hash tables is essentially an open problem at the moment.

We plan to extend previous work done in our group in order to deal with conjunctive queries in RQL. Then, we will study how to integrate RDFS reasoning into our algorithms so that schema queries in RQL are answered efficiently. Our next step will be to consider special cases of conjunctive queries (e.g., path queries) that are likely to be amenable to more efficient query evaluation strategies. Finally, we will consider nested RQL queries and multiple query optimization issues. Our experimental work will be done by simulation or by using large-scale distributed infrastructures such as PlanetLab. We will seek to demonstrate which trade-offs are possible between the performance metrics that need to be optimized in this setting (e.g., number of hops, network latency, network bandwidth and load distribution) under various data/query workloads.
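To make the setting more concrete, here is a minimal sketch of how a continuous query over RDF triples could be indexed on a distributed hash table. It is only an illustration under stated assumptions, not the algorithms proposed in this research: the class `ToyRing`, its methods, the node names, and the choice to index by predicate alone are all hypothetical, and the whole ring lives in a single process rather than on a real overlay network. A continuous query is reduced here to a single predicate; real RQL queries are far richer.

```python
import hashlib
from bisect import bisect_left


def ring_id(key: str) -> int:
    """Map a string to a position on the identifier ring (SHA-1, 160 bits)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class ToyRing:
    """Hypothetical, centralised stand-in for a DHT.

    Each key is owned by its successor: the first node whose identifier is
    greater than or equal to the key's identifier, wrapping around the ring.
    """

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]
        # Continuous queries stored at each node: (query_id, predicate) pairs.
        self.subscriptions = {name: [] for name in node_names}

    def successor(self, key: str) -> str:
        """Return the name of the node responsible for the given key."""
        idx = bisect_left(self.ids, ring_id(key)) % len(self.nodes)
        return self.nodes[idx][1]

    def subscribe(self, query_id: str, predicate: str) -> str:
        """Store a continuous query at the node responsible for its predicate."""
        node = self.successor(predicate)
        self.subscriptions[node].append((query_id, predicate))
        return node

    def publish(self, triple):
        """Route a new triple by its predicate and return the queries it triggers."""
        _subject, predicate, _obj = triple
        node = self.successor(predicate)
        return [qid for qid, p in self.subscriptions[node] if p == predicate]
```

Because the query and every matching triple hash to the same node, a newly published triple meets the stored query without any flooding. For instance, after `ring = ToyRing(["n1", "n2", "n3", "n4"])` and `ring.subscribe("q1", "rdf:type")`, calling `ring.publish(("ex:picasso", "rdf:type", "ex:Painter"))` returns `["q1"]`. Indexing by predicate alone would concentrate load on the nodes responsible for popular predicates, which is one of the load-distribution trade-offs mentioned above.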