24-12-2010, 10:47 AM
Under the guidance of
Saroj Kumar Rout
What is it?
A metasearch engine is a system that supports unified access to multiple local search engines.
It does not maintain its own index of web pages.
A metasearch engine is an effective tool for quickly reaching a large portion of the deep Web.
Increases search coverage of the Web
Increases the scalability of searching the Web
Improves retrieval effectiveness
When to use?
Metasearch engines need to be used cautiously
Good for simple searches, particularly if search terms are distinctive or unique
Good for testing a few keywords to find out which individual search engine returns good results
Good for ‘quick and dirty searching’ if you are in a hurry and want to find a few relevant sites quickly
For complex searches, involving many search terms, Boolean logic, etc., it is better to use individual search engines
METASEARCH ENGINE TECHNOLOGY
Techniques that identify which search engines are likely to contain useful results for a given query.
Methods that determine which pages should be retrieved from the selected search engines and how the results from different search engines should be merged.
Metasearch Software Components
Database selector: It is responsible for sending each user query only to potentially useful search engines for processing. Sending a query to search engines that are unlikely to return useful results causes wasteful network traffic.
The database selection process can be classified into three categories:
Rough representative approaches
Statistical representative approaches
Learning-based approaches
Rough representative approaches: The representative of a database contains only a few selected keywords or paragraphs. It can provide only a very general description of the contents of the database.
Statistical representative approaches: Database representatives hold detailed statistical information about the document databases. These detailed statistics allow a more accurate estimate of how useful a database is for any user query.
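As a rough illustration of the statistical representative approach, the sketch below keeps a table of document frequencies per database and estimates usefulness as the sum, over the query terms, of the fraction of documents that contain each term. The slides do not give a formula, so the scoring rule and every name used here (DatabaseRepresentative, estimate_usefulness, select_databases) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseRepresentative:
    """Hypothetical statistical representative of one local search engine."""
    name: str
    num_docs: int
    # term -> number of documents in this database that contain the term
    doc_freq: dict = field(default_factory=dict)


def estimate_usefulness(rep, query_terms):
    """Sum, over the query terms, of the fraction of documents containing the term."""
    if rep.num_docs == 0:
        return 0.0
    return sum(rep.doc_freq.get(t, 0) / rep.num_docs for t in query_terms)


def select_databases(reps, query, top_k=2):
    """Return the names of up to top_k databases with a non-zero estimate."""
    terms = query.lower().split()
    scored = [(estimate_usefulness(r, terms), r.name) for r in reps]
    scored = [(s, n) for s, n in scored if s > 0]
    # Highest estimate first; ties are broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [n for _, n in scored[:top_k]]


reps = [
    DatabaseRepresentative("medical", 1000, {"cancer": 400, "therapy": 300}),
    DatabaseRepresentative("sports", 2000, {"football": 900, "therapy": 20}),
    DatabaseRepresentative("cooking", 500, {"recipe": 450}),
]
selected = select_databases(reps, "cancer therapy")
```

With these made-up statistics the query is routed to "medical" first and "sports" second, while "cooking" is skipped because none of the query terms occur in it.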
After the search engines have been selected, the next step determines which Web pages should be retrieved from each selected search engine and how the pages retrieved from multiple search engines should be merged into a single result list.
This step includes two modules:
document selection module (document selector)
result merge module (result merger)
Learning-based approaches (database selection): The metasearch engine learns from past retrieval experience which databases are likely to return useful pages for which types of queries.
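A learning-based selector could be sketched as follows, assuming that "past retrieval experience" is recorded as the number of useful pages a database returned for a query. The class name LearnedSelector, its methods, and the per-term averaging rule are hypothetical choices for this example, not something the slides prescribe.

```python
from collections import defaultdict


class LearnedSelector:
    """Hypothetical learner: tracks how useful each database was for each query term."""

    def __init__(self):
        # (term, database) -> [total useful pages, number of observations]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, query, database, useful_pages):
        """Store the outcome of one past retrieval."""
        for term in query.lower().split():
            entry = self._stats[(term, database)]
            entry[0] += useful_pages
            entry[1] += 1

    def rank(self, query, databases):
        """Order databases by the average useful pages seen for the query terms."""
        terms = query.lower().split()

        def score(db):
            total = 0.0
            for term in terms:
                useful, seen = self._stats.get((term, db), (0, 0))
                if seen:
                    total += useful / seen
            return total

        return sorted(databases, key=lambda db: (-score(db), db))


selector = LearnedSelector()
selector.record("jazz concerts", "music_db", 8)
selector.record("jazz history", "music_db", 6)
selector.record("jazz concerts", "news_db", 2)
ranking = selector.rank("jazz festival", ["news_db", "music_db", "shop_db"])
```

Here "music_db" ranks first because past queries containing "jazz" returned more useful pages from it on average; a database with no history, such as "shop_db", falls to the end.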
Document selector: It determines which pages to retrieve from the document database of each search engine. It aims to retrieve as many potentially useful pages as possible and as few useless pages as possible.
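One simple way to approximate this goal is to split a fixed retrieval budget across the selected engines in proportion to their estimated usefulness, so that engines expected to hold more useful pages are asked for more results. This is only one possible policy; the function allocate_pages and its inputs are assumptions for illustration.

```python
def allocate_pages(usefulness, total_pages):
    """Split total_pages across engines in proportion to their usefulness estimates."""
    total = sum(usefulness.values())
    if total <= 0 or total_pages <= 0:
        return {name: 0 for name in usefulness}
    # Start from the rounded-down proportional share of every engine.
    shares = {n: total_pages * u / total for n, u in usefulness.items()}
    quota = {n: int(s) for n, s in shares.items()}
    # Hand the remaining pages to the engines with the largest fractional parts.
    leftover = total_pages - sum(quota.values())
    by_fraction = sorted(shares, key=lambda n: (quota[n] - shares[n], n))
    for name in by_fraction[:leftover]:
        quota[name] += 1
    return quota


quota = allocate_pages({"engine_a": 6, "engine_b": 3, "engine_c": 1}, 10)
```

With usefulness estimates of 6, 3 and 1 and a budget of 10 pages, the engines are asked for 6, 3 and 1 pages respectively.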
Result merger: It combines the results into a single ranked list. It ranks all returned pages in descending order of their desirability.
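The slides do not say how desirability is computed. A minimal sketch, assuming each engine returns a local score per page, is to min-max normalise the scores of each engine and add up the normalised scores of pages returned by more than one engine. The function name merge_results and the example URLs are hypothetical.

```python
def merge_results(result_lists):
    """Merge per-engine lists of (url, local_score) into one ranked list of URLs."""
    combined = {}
    for results in result_lists.values():
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for url, score in results:
            # Min-max normalisation makes scores from different engines comparable.
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            # A page returned by several engines accumulates their normalised scores.
            combined[url] = combined.get(url, 0.0) + norm
    # Descending combined score; ties are broken by URL for a stable order.
    return sorted(combined, key=lambda u: (-combined[u], u))


merged = merge_results({
    "engine_a": [("http://x.example/1", 9.0),
                 ("http://x.example/2", 5.0),
                 ("http://x.example/3", 1.0)],
    "engine_b": [("http://x.example/2", 0.8),
                 ("http://x.example/4", 0.4)],
})
```

In this example page 2 comes first because both engines returned it, which also shows how duplicates collapse into a single entry in the merged list.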
Result extractor: It extracts the URLs of the retrieved pages from the HTML of each result page. Since different search engines organize their results in different ways, a separate result extractor needs to be created for each local search engine.
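The need for one extractor per engine can be sketched with Python's standard html.parser module. The example assumes, purely for illustration, that each engine marks its result links with a distinctive CSS class; the engine names, class names, and sample markup are invented.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags that carry a given class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.link_class in classes and attr_map.get("href"):
            self.urls.append(attr_map["href"])


# One extractor configuration per local search engine (class names are invented).
EXTRACTOR_CONFIG = {"engine_a": "result-title", "engine_b": "hit"}


def extract_urls(engine, html):
    """Run the extractor configured for this engine over one result page."""
    parser = LinkExtractor(EXTRACTOR_CONFIG[engine])
    parser.feed(html)
    parser.close()
    return parser.urls


page_a = ('<div><a class="result-title" href="http://a.example/1">One</a>'
          '<a class="ad" href="http://ads.example/">Ad</a></div>')
page_b = '<li><a href="http://b.example/x" class="hit main">X</a></li>'
urls_a = extract_urls("engine_a", page_a)
urls_b = extract_urls("engine_b", page_b)
```

Only the links tagged as results are kept; the advertisement link on the first page is ignored because its class does not match the configuration for that engine.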
Query dispatcher: It establishes a connection with the server of each selected search engine and passes the query to it. HTTP is used for the connection and the data transfer.
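A dispatcher along these lines might encode the query into each engine's HTTP GET URL and send the requests in parallel. The endpoints and parameter names below are placeholders, not real search engine APIs, and the transport is passed in as a function so the sketch can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoints and query-parameter names; real engines differ.
ENGINES = {
    "engine_a": ("http://engine-a.example/search", "q"),
    "engine_b": ("http://engine-b.example/find", "query"),
}


def build_url(engine, query):
    """Encode the user query into the HTTP GET URL expected by one engine."""
    base, param = ENGINES[engine]
    return base + "?" + urlencode({param: query})


def http_fetch(url, timeout=5):
    """Default transport: a plain HTTP GET returning the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def dispatch(query, engines, fetch=http_fetch):
    """Send the query to every selected engine in parallel; return engine -> HTML."""
    urls = {name: build_url(name, query) for name in engines}
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in urls.items()}
        return {name: future.result() for name, future in futures.items()}


# Offline usage with a stand-in transport instead of real HTTP.
pages = dispatch("meta search", ["engine_a", "engine_b"],
                 fetch=lambda url: "<html>" + url + "</html>")
```

Each engine receives the same query in its own URL format, and the returned HTML pages are what a result extractor would then process.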
Advantages & Disadvantages
Advantages
A query can be run across multiple search engines
The user needs to learn only the search interface of the metasearch tool
Better results: retrieves top-ranking pages from individual search engines
Disadvantages
Unique features of individual search engines are lost
Not exhaustive: uses only the top results returned by the search engines