- WIR Book
Software, Documentation and Datasets
Search Computing (SeCo) is about merging two of the most successful techniques in the history of information processing: the simplicity of search, which opens querying to non-technical users as well, and the power of join, which correlates data by means of their values. SeCo aims at filling the gap between generalized search systems, which are unable to find information spanning multiple topics, and domain-specific search systems, which cannot go beyond their domain limits.
The Search Computing framework provides a general-purpose Web integration platform and search engine over distinct sources. The main purpose of the framework is to lower the complexity of building search applications. The potential impact of this work is high: once the complexity threshold of building structured entity search applications is lowered, a variety of new market sectors will become more profitable and accessible. In several scenarios, search-enabled Web access will grow in interest and value once SMEs and local businesses see the opportunity of building search applications tailored to their market niche or sales region.
The Search Computing Framework in a Nutshell
The Search Computing Framework covers all the phases required for formulating and processing multi-domain search queries. SeCo queries can be addressed to a constellation of Web data sources, including search engine APIs; product, event, and people databases (e.g., Amazon, Eventful, LinkedIn); scientific data sources (e.g., DBLP, PubMed); and community-curated data sources (e.g., YQL Open Data Tables, DBpedia). These data sources are registered in the framework through the Service Mart Repository, which contains a multi-level description of the callable search services. At the most conceptual level, sources are registered as service marts and characterized by the service name and a collection of attributes (single- or multi-valued) exposed by the service.
Such an abstract description is refined into one or more access patterns, i.e., logical signatures that specify whether each attribute is an input or an output of the service call; output attributes are tagged as ranked if the service produces results ordered by their value. Access patterns can be joined when parameters of one service mart match parameters of another mart in terms of both type and meaning; the matched parameters are either both tagged as output (yielding a parallel join), or one is tagged as output and the other as input (yielding a pipe join). Access patterns are next refined into service interfaces, each including a name and an endpoint of a concrete search service. The actual service invocation is managed by the Execution Engine, which supervises the interaction with several wrappers used to access Web APIs and databases.
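The two join styles can be sketched as follows. This is an illustrative toy example, not the SeCo API: the two "services" and their attributes are hypothetical, and real calls would go through the Execution Engine's wrappers.

```python
def hotel_service(city):
    # Hypothetical service: output attributes (city, hotel, price)
    return [(city, "Hotel A", 120), (city, "Hotel B", 90)]

def event_service(city):
    # Hypothetical service: output attributes (city, event)
    return [(city, "Concert")] if city == "Milan" else []

def parallel_join(left_rows, right_rows, key_l, key_r):
    # Parallel join: both services are invoked independently, and their
    # results are matched on output attributes of equal type and meaning.
    return [l + r for l in left_rows for r in right_rows if l[key_l] == r[key_r]]

def pipe_join(left_rows, key, right_service):
    # Pipe join: an output attribute of the first service feeds
    # the input parameter of the second service call.
    return [l + r[1:] for l in left_rows for r in right_service(l[key])]
```

For a given city, the parallel join calls both services with the same input and matches their outputs, while the pipe join drives the second service with each row produced by the first.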
Queries can be submitted to the system in two ways: 1. via the Liquid Query user interface, a client-side component that accepts pre-defined queries by filling in runtime parameters through a form; 2. by directly typing natural language questions (currently available for a selection of services only), in which case a server-side component performs a probabilistic textual interpretation to identify relevant services and their connections.
In both cases, queries are submitted to the Query Orchestrator, a server-side component that manages queries, result caching, and user sessions. A new query consists of a conjunction of predicates over data sources, join predicates expressing connections between the sources, and a global rank criterion for sorting the result set.
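A minimal sketch of how such a query might be represented; the field names and example predicates below are assumptions for illustration, not the actual SeCo query model.

```python
from dataclasses import dataclass

@dataclass
class Query:
    selections: list  # per-source predicates, e.g. ("hotel", "city = 'Milan'")
    joins: list       # join predicates connecting the sources
    rank: str         # global rank criterion over the combined results

# A hypothetical two-source query: hotels joined with events in the same
# city, ranked by a combination of attributes from both sources.
q = Query(
    selections=[("hotel", "city = 'Milan'"), ("event", "city = 'Milan'")],
    joins=[("hotel.city", "event.city")],
    rank="hotel.price + event.distance",
)
```

Each selection predicate maps onto one registered service, each join predicate onto a parallel or pipe join between two access patterns, and the rank criterion onto the global ordering of the combined result set.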
Each query undergoes an analysis and translation process, managed by the Query Analyzer, which produces a Query Execution Plan (QEP), i.e., a graph of low-level components that specifies the activities to be executed (e.g., the service calls), their order of precedence, and the strategy for executing joins. The plan is output by the Query Optimizer, which chooses the join implementation (e.g., parallel vs. pipe join) and sets the parameters of the join execution strategy (e.g., the number of times a service is called to retrieve the top-k results of the query).
A QEP is executed by the Execution Engine, which analyses it and breaks it recursively into subcomponents, which in turn are either QEPs or atomic service invocations. The results of service calls are accumulated by the engine, which progressively builds the combinations constituting the query response; these are submitted back to the Liquid Query interface for visualization and interaction.
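The recursive break-down described above can be sketched as follows. This is a hypothetical simplification: the node types and the way combinations are accumulated are assumptions for illustration, not the SeCo engine's internals.

```python
def execute(node):
    """Execute a plan node: a callable stands for an atomic service
    invocation, while a list stands for a sub-plan whose children's
    results are accumulated into combinations."""
    if callable(node):
        return node()  # atomic service invocation
    combos = [()]      # progressively build the combinations
    for sub in node:
        combos = [c + (r,) for c in combos for r in execute(sub)]
    return combos

# A toy plan: two atomic invocations whose results are combined.
plan = [lambda: ["Hotel A", "Hotel B"], lambda: ["Concert"]]
```

Running `execute(plan)` pairs every result of the first call with every result of the second, mirroring how the engine assembles the combinations that form the query response.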
The SeCo concepts are being implemented in a comprehensive architectural infrastructure. Several demonstrators are available online so as to allow first-hand experimentation with the join of services, NLP queries and the exploratory search paradigm, multiple visualizations of results (tabular, map, and parallel coordinates), and the functioning of the SeCo Execution Engine. [SeCo Demonstrators]
This section contains links to downloadable versions of the SeCo framework. We provide two versions:
- Binaries (preview release V5)
Deploying Web applications that integrate data sources by means of the SeCo framework requires a number of software components and rather sophisticated interactions between them. In this section we list a set of technical documents that describe the SeCo architecture at different levels of granularity, from the overall architecture down to the APIs of the individual components. We also provide a set of tutorials aimed at supporting users and developers of the SeCo framework in using and extending the platform.
- Introduction to The SeCo Architecture
- Documentation (JavaDoc) from the SeCo source code:
- Architecture Installation Guide (DRAFT)
- How to add a new service wrapper
- Adding a New Service To The Platform (DRAFT)
As search is the preferred method of accessing information in today's computing systems, the Search Computing research environment has spawned several related research projects, conducted by members of the SeCo team.