Automatic Incorporation of Search Engines into a Large-Scale Metasearch Engine
22-04-2010, 12:29 AM
A metasearch engine supports unified access to multiple component search engines. To build a very large-scale metasearch engine that can access up to hundreds of thousands of component search engines, one major challenge is to incorporate large numbers of autonomous search engines in a highly effective manner. To solve this problem, we propose automatic search engine discovery, automatic search engine connection, and automatic search engine result extraction techniques. Experiments indicate that these techniques are highly effective and efficient.
In the context of a metasearch engine, incorporating search engines consists of discovering search engine interfaces, connecting to them, and extracting result documents from the web pages that the search engines return.
A significant problem in building a very large-scale metasearch engine that supports unified access to hundreds of thousands of search engines is the impracticality of manually incorporating these search engines. Even if this were possible, maintenance would be a nightmare. Changes to search engines take place from time to time, often leaving a search engine unusable for metasearch unless corresponding changes are made in the metasearch engine. Manual maintenance therefore is hardly practical. We believe that the entire process of search engine incorporation should be automated, to enable construction and maintenance of very large-scale metasearch engines.
The three major components that are essential to achieve automation are:
1. Automatic search engine discovery.
Discover (identify) search engines from millions of websites on the Web.
2. Automatic search engine connection.
Automatically connect to each discovered search engine so that user queries submitted to the metasearch engine are forwarded to search engines and search results from search engines are returned to the metasearch engine.
3. Automatic search result extraction.
Automatically analyze each result page returned from a search engine for a query and extract useful information from the page, such as the number of documents retrieved for the query, the URLs of the result documents, and so on.
State-of-the-art large-scale metasearch engines, such as ProFusion, can manage metasearch over around 1,000 search engines but not more. In this paper, through experiments on the initial implementation of the proposed three-component search engine incorporation framework, we demonstrate the potential for a metasearch engine to handle many more search engines, possibly hundreds of thousands.
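The three components above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every heuristic in it is an assumption made for illustration: a search interface is taken to be an HTML form with exactly one named text box and no password field, connection is limited to GET forms, and result URLs are taken to be the links on one result page that do not also appear on the result page for a different query. All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collect every <form> on a page together with its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per completed form
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {"action": a.get("action") or "",
                             "method": (a.get("method") or "get").lower(),
                             "inputs": []}
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(
                {"type": (a.get("type") or "text").lower(),
                 "name": a.get("name"),
                 "value": a.get("value") or ""})

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_search_form(html):
    """Step 1 (assumed heuristic): one named text box and no password field."""
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        types = [i["type"] for i in form["inputs"]]
        text_boxes = [i for i in form["inputs"]
                      if i["type"] in ("text", "search") and i["name"]]
        if len(text_boxes) == 1 and "password" not in types:
            return form
    return None


def build_query_url(page_url, form, query):
    """Step 2: turn a discovered GET form and a user query into a request URL."""
    if form["method"] != "get":
        raise ValueError("this sketch only handles GET forms")
    params = {}
    for field in form["inputs"]:
        if not field["name"]:
            continue
        if field["type"] in ("text", "search"):
            params[field["name"]] = query           # the user's query
        elif field["type"] == "hidden":
            params[field["name"]] = field["value"]  # fixed form parameters
    return urljoin(page_url, form["action"]) + "?" + urlencode(params)


def extract_result_urls(result_html, other_result_html, page_url):
    """Step 3 (assumed heuristic): links shared with another query's result
    page are treated as navigation and dropped; the rest are results."""
    def absolute_links(html):
        collector = LinkCollector()
        collector.feed(html)
        return [urljoin(page_url, href) for href in collector.links]

    template_links = set(absolute_links(other_result_html))
    results = []
    for url in absolute_links(result_html):
        if url not in template_links and url not in results:
            results.append(url)
    return results
```

In use, a crawler would call discover_search_form on each fetched page, build_query_url for each user query, and extract_result_urls on the returned page. A real system would also need to handle POST forms, form actions that already carry a query string, JavaScript-driven interfaces, and changes to a search engine's layout over time, none of which this sketch covers.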