Context Oriented Search Engine with Web Crawler
project topics
22-04-2010, 12:32 AM


Search Engine with Web Crawler

A web crawler (also known as a Web spider or Web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner.

This process is called Web crawling or spidering. Search engines use spidering as a means of providing up-to-date data: the crawler downloads web pages, which are then indexed so that searches can be answered quickly.

A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
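The seed-and-frontier loop described above can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration: the Web is modeled as an in-memory map from each URL to the links found on that page, so fetchLinks only stands in for really downloading and parsing a page, and the maxPages limit is an assumed stopping rule.

```java
import java.util.*;

// Seed-and-frontier crawl loop over a hypothetical in-memory link graph
// (no real HTTP requests are made).
class FrontierCrawler {

    // Stand-in for "download the page and extract its hyperlinks".
    static List<String> fetchLinks(Map<String, List<String>> web, String url) {
        return web.getOrDefault(url, List.of());
    }

    // Visits pages breadth-first starting from the seeds, up to maxPages pages.
    static List<String> crawl(Map<String, List<String>> web, List<String> seeds, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(seeds); // URLs still to visit
        Set<String> seen = new HashSet<>(seeds);          // URLs already queued or visited
        List<String> visited = new ArrayList<>();

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : fetchLinks(web, url)) {
                if (seen.add(link)) {   // add() returns false if the URL was already seen
                    frontier.add(link); // newly discovered URL joins the crawl frontier
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = Map.of(
            "a.example/", List.of("a.example/1", "b.example/"),
            "a.example/1", List.of("a.example/"),
            "b.example/", List.of("b.example/2"));
        System.out.println(crawl(web, List.of("a.example/"), 10));
    }
}
```

The seen set keeps the crawler from queueing the same URL twice, which matters because real pages link back to each other constantly.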

Two characteristics of the Web make crawling very difficult: its large volume and its rate of change, since a huge number of pages are added, changed and removed every day. In addition, network speeds have improved more slowly than processing speeds and storage capacities.

The large volume implies that the crawler can only download a fraction of the Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler is downloading the last pages from a site, it is very likely that new pages have been added to the site, or that pages have already been updated or even deleted.
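One way to prioritize downloads is to keep the frontier in a priority queue and spend a fixed download budget on the highest-priority URLs. In the sketch below, the priority heuristic (fewer path segments first, ties broken alphabetically), the class name and the budget are assumptions for illustration only; real crawlers use richer signals.

```java
import java.util.*;

// Prioritized frontier: fetch the most important URLs first, within a budget.
class PrioritizedFrontier {

    // Hypothetical priority heuristic: number of '/' characters (path depth).
    // A lower value means the URL is fetched sooner.
    static int depth(String url) {
        int d = 0;
        for (char c : url.toCharArray()) {
            if (c == '/') d++;
        }
        return d;
    }

    // Returns, in download order, the URLs that fit within the given budget.
    static List<String> schedule(Collection<String> candidates, int budget) {
        Comparator<String> byDepth = Comparator.comparingInt(PrioritizedFrontier::depth);
        // Break ties alphabetically so the order is deterministic.
        PriorityQueue<String> frontier =
            new PriorityQueue<>(byDepth.thenComparing(Comparator.naturalOrder()));
        frontier.addAll(candidates);

        List<String> toDownload = new ArrayList<>();
        while (!frontier.isEmpty() && toDownload.size() < budget) {
            toDownload.add(frontier.poll()); // remove the highest-priority URL
        }
        return toDownload; // the remaining URLs wait for a later crawl cycle
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
            "s.example/a/b", "s.example/", "s.example/a", "t.example/");
        System.out.println(schedule(candidates, 3));
    }
}
```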

The behavior of a Web crawler is the outcome of a combination of policies:

a selection policy that states which pages to download;
a re-visit policy that states when to check for changes to the pages;
a politeness policy that states how to avoid overloading websites;
a parallelization policy that states how to coordinate distributed web crawlers.
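As an example of one of these policies, a politeness policy is often implemented as a minimum delay between two requests to the same host. The sketch assumes a simulated clock in milliseconds (so it runs instantly) and a caller-chosen delay; the class name PolitenessScheduler is hypothetical. The host is extracted with the standard java.net.URI class.

```java
import java.net.URI;
import java.util.*;

// Politeness sketch: at most one request per host every minDelayMillis,
// measured on a simulated clock (no real sleeping or fetching).
class PolitenessScheduler {
    private final long minDelayMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>(); // host -> earliest next fetch time

    PolitenessScheduler(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns the time at which url may be fetched, given the current time `now`,
    // and reserves that slot so the next request to the same host waits longer.
    long reserve(String url, long now) {
        String host = URI.create(url).getHost();
        long start = Math.max(now, nextAllowed.getOrDefault(host, 0L));
        nextAllowed.put(host, start + minDelayMillis);
        return start;
    }

    public static void main(String[] args) {
        PolitenessScheduler scheduler = new PolitenessScheduler(1000);
        for (String u : List.of("http://a.example/1", "http://a.example/2", "http://b.example/1")) {
            System.out.println(u + " -> t=" + scheduler.reserve(u, 0));
        }
    }
}
```

Requests to different hosts are not delayed by each other, so a crawler with a mixed frontier can stay busy while still treating every individual site gently.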
Implementation

In this search engine project, the web crawler starts from a set of seeds and selects pages using filters and policies. For example, to build a blog search engine, the crawler would be programmed to download only blog-related pages.
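A selection filter for the blog example might look like the sketch below. The keyword list, the "blog in the URL" hint and the hit threshold are all hypothetical choices made for illustration; a production crawler could use a trained classifier instead.

```java
import java.util.*;

// Selection-policy sketch for a topic-focused (blog) crawler.
class BlogPageFilter {
    // Hypothetical topic keywords.
    private static final Set<String> KEYWORDS =
        Set.of("blog", "post", "comments", "archive", "author");
    private final int minHits;

    BlogPageFilter(int minHits) {
        this.minHits = minHits;
    }

    // Counts how many words of the page text are topic keywords.
    int keywordHits(String text) {
        int hits = 0;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (KEYWORDS.contains(word)) hits++;
        }
        return hits;
    }

    // Keeps a page if its URL looks like a blog or its text mentions enough keywords.
    boolean accept(String url, String text) {
        return url.toLowerCase().contains("blog") || keywordHits(text) >= minHits;
    }

    public static void main(String[] args) {
        BlogPageFilter filter = new BlogPageFilter(2);
        System.out.println(filter.accept("http://example.com/shop", "Buy shoes online")); // false
    }
}
```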

The crawler can be developed as a simple Java program that downloads pages and indexes them in a database for faster searching.
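The indexing step is commonly built on an inverted index, which maps each word to the pages that contain it. In this sketch an in-memory HashMap stands in for the database, and the tokenizer (lowercase, split on non-word characters) is a simplifying assumption; the same word, URL and count triples could be stored as rows of a database table.

```java
import java.util.*;

// Inverted index: word -> (url -> number of occurrences on that page).
class InvertedIndex {
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Tokenizes the page text and records each word occurrence for this URL.
    void addPage(String url, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(url, 1, Integer::sum);
        }
    }

    // Pages containing the term, with occurrence counts (empty if none).
    Map<String, Integer> lookup(String term) {
        return Collections.unmodifiableMap(
            postings.getOrDefault(term.toLowerCase(), Map.of()));
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addPage("http://a.example/", "Java crawler in Java");
        index.addPage("http://b.example/", "A crawler for blogs");
        System.out.println(index.lookup("java")); // {http://a.example/=2}
    }
}
```

Looking up a word is then a single map access instead of a scan over every stored page, which is what makes searching fast.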

The search interface can be developed using JSP/Servlets and Ajax. The search engine accepts search keywords, searches the database for them using a search algorithm, and shows the most relevant results as a paged list ordered by merit.
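A simple version of the keyword search with ranking and paging is sketched below. The relevance score (summed occurrence counts of the query words) is a hypothetical stand-in for "merit", and the index format word -> (url -> count) is assumed; in the project it would be read from the database, and a JSP/Servlet or Ajax endpoint would call search() with the requested page number.

```java
import java.util.*;

// Keyword search over an inverted index, with ranking and paging.
class PagedSearch {

    // index: word -> (url -> occurrence count); pageNumber is 0-based.
    static List<String> search(Map<String, Map<String, Integer>> index,
                               String query, int pageNumber, int pageSize) {
        // Hypothetical relevance score: total occurrences of all query words.
        Map<String, Integer> scores = new HashMap<>();
        for (String term : query.toLowerCase().trim().split("\\s+")) {
            Map<String, Integer> postings = index.getOrDefault(term, Map.of());
            for (Map.Entry<String, Integer> e : postings.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }

        // Highest score first; ties broken by URL so pages are stable.
        Comparator<String> byScoreDesc =
            (a, b) -> Integer.compare(scores.get(b), scores.get(a));
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(byScoreDesc.thenComparing(Comparator.naturalOrder()));

        // Cut out the requested page; an out-of-range page yields an empty list.
        int from = Math.min(pageNumber * pageSize, ranked.size());
        int to = Math.min(from + pageSize, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = Map.of(
            "java", Map.of("http://a.example/", 2, "http://b.example/", 1),
            "crawler", Map.of("http://b.example/", 3, "http://c.example/", 1));
        System.out.println(search(index, "Java crawler", 0, 2)); // [http://b.example/, http://a.example/]
    }
}
```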