First Workshop Abstracts

The Next Generation of Search

Ricardo Baeza-Yates, Yahoo! Research, Barcelona

ABSTRACT:
We are transitioning from a Web of pages to a Web of objects. In this Web of objects it is even more important to understand the need behind the user query, as people are interested in solving a task and not really in posing a sequence of queries. This trend will change current search engines, personalizing ranking and the user interface to the target task. In this talk we will hint at how this can be done and at the challenges ahead.

Supporting Ranking In (Uncertain) Database Systems

Ihab Ilyas, University of Waterloo

ABSTRACT:
Ranking queries (also referred to as top-k queries) produce results that are ordered on some computed score and provide the user with the k most important results. Top-k queries are dominant in applications such as multimedia databases, information retrieval, web databases and middleware. Although ranking queries have become a crucial part of many emerging applications, current relational query processors handle this type of query neither natively nor efficiently. In this talk, I will give an overview of the RankDB and URank projects at Waterloo. RankDB focuses on supporting ranking in relational database systems by treating ranking predicates as first-class constructs. I will also describe URank, our ongoing effort to define and support top-k queries in the context of uncertain and probabilistic databases.
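
As an illustration of what a top-k query computes, here is a minimal Python sketch. It is not taken from RankDB or URank; the `top_k` helper, the hotel records, and the scoring function are all hypothetical, chosen only to show results ordered by a computed score and cut off at k.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def top_k(items: Iterable[T], k: int, score: Callable[[T], float]) -> List[T]:
    """Return the k items with the highest score, best first.

    heapq.nlargest keeps only k candidates at a time, so the full input
    never has to be sorted.
    """
    return heapq.nlargest(k, items, key=score)


# Hypothetical data: hotels with a price and a user rating.
hotels = [
    {"name": "A", "price": 120, "rating": 4.5},
    {"name": "B", "price": 80, "rating": 4.0},
    {"name": "C", "price": 40, "rating": 3.0},
    {"name": "D", "price": 300, "rating": 5.0},
]


def hotel_score(hotel: dict) -> float:
    # Assumed ranking function: reward rating, penalize price.
    return 20 * hotel["rating"] - hotel["price"] / 10


best_two = [h["name"] for h in top_k(hotels, 2, hotel_score)]
print(best_two)
```

A relational engine without native ranking support would typically compute the score for every row and sort the full result before applying the limit, which is the inefficiency the abstract refers to.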

Web Data Extraction: The Lixto Project and Future Plans

Georg Gottlob, Oxford University

ABSTRACT:
In this talk I will give a survey of the Lixto project both as a research project and a commercial enterprise. In particular, I will focus on scalability issues and the use of cloud computing for data extraction. I will also speak about current commercial applications of Lixto. Finally, I will outline new research plans towards achieving fully automated data extraction in specific domains, and I will explain how this could relate to the SeCo project.

Dataspaces and Search Computing: New Paradigms or Step Changes

Norman Paton, University of Manchester

ABSTRACT:
Data integration, in various guises, has been the focus of ongoing research in the database community for over 20 years. The objective of this activity has generally been to provide the illusion that a single database is being accessed, when in fact data may be stored in a range of different locations and managed using a diverse collection of technologies. Providing this illusion typically involves the development of a single central schema to which the schemas of individual resources are related using some form of mapping. Given a query over the central schema, the mappings, and information about the capabilities of the resources, a distributed query processor optimizes and evaluates the query. Data integration software is impressive when it works; declarative access is provided over heterogeneous resources, in a setting where the infrastructure takes responsibility for efficient evaluation of potentially complex requests. However, in a world in which there are ever more networked data resources, data integration technologies from the database community are far from ubiquitous. This stems in significant measure from the fact that the development and maintenance of mappings between schemas has proved to be labour intensive. Thus classical data integration represents high-cost, high-quality integration, a combination that is certainly useful, but which does not work for all potential users. As a result, research is now underway, in areas such as dataspaces and search computing, that seeks to reduce integration costs, provide access to different types of data source with different quality characteristics, and thus to make progress in the space between classical information retrieval and database integration. This talk seeks to characterise key features of recent offerings in this space, with a view to identifying key decision points for the designers of information integration platforms who aspire to avoid point solutions in large decision spaces.

Universal Mashup Languages

Fabio Casati, University of Trento

ABSTRACT:
Developing applications by integrating existing components has been the dream and goal of many developers and the business proposition of software vendors. This is also true for e-government or e-commerce applications, where services are provided by integrating data, applications, or, more recently, user interfaces (as in the case of mashups).
Despite the promises, the current composition/integration landscape is characterized by complex development languages (think about BPEL for service composition, or about ETL languages, or even just about the complexity of creating a mashup using Ajax), complex deployment/execution (requiring installation and operation of sophisticated middleware platforms) and complex analysis (developing and deploying a business intelligence solution that analyzes composite applications such as those supporting business operations today is a very expensive activity in terms of time and license costs). Indeed, integration technologies have largely failed to deliver on their promises.
In this talk we revisit the problem of integration and composition and, building on the lessons learned from years of successes and failures and from the teaching of the social Web, we make the case for universal integration, which aims at presenting an integration environment that is:

Universal: It can cover data, application, and UI integration (including integration on the Web) by providing a unified component and composition model that can be used in all circumstances.

Simple: Very often, integration technologies have failed or have been adopted to a much lesser degree than expected because of excessive complexity in the model. The Web is teaching us that it is possible to build models that are simple and useful. Along this line, a key aspect of the Universal Integration model is simplicity, or rather, the separation of what can be simple and what instead needs to be complex. Hence, a characteristic of universal integration is to support various models at different levels of complexity, targeted at different kinds of users and different kinds of integration problems.

Hosted: Integration can be provided as a service, with zero client-side code.

Manageable: Manageability in universal integration means covering the entire integration lifecycle, much like software development tools cover the entire software development lifecycle. This includes providing support ranging from such aspects as rapid prototyping for requirement collection and validation to built-in business intelligence techniques for execution analysis, all with the same underlying philosophy of sense and simplicity.

I will also discuss current research directions that investigate how universal integration can be combined with the wisdom of the crowd to enable domain expert programming.

Of distributed queries: models and systems

Ioana Manolescu, INRIA Paris

ABSTRACT:
In the first part of this talk, I will discuss a set of lessons learned while contributing to a set of distributed data management projects in the last ten years (which make up my whole, admittedly short, "research lifespan"!). These systems focused on handling relational, XML, or Semantic Web data, table functions or Web services, in mediator systems and in unstructured or structured peer-to-peer networks. All have been implemented, and some have grown into solid industrial software. Rather than delving into lists of detailed features, I will attempt to extract from the experience of each project the most relevant points, or those that taught me most for my subsequent work. These lessons can be classified in two directions: models, and solid code. The last part of the talk will switch focus towards issues of research prototype development and testing. In this part, I will elaborate on the coding lessons learned, as well as on the experience we gathered with the SIGMOD 2008-2009 repeatability initiative.