Crawling, Indexing, and Ranking
Search engines have two major functions:
• Crawling and indexing the billions of documents (pages and files) accessible on the Web.
• Responding to user queries by providing lists of relevant pages.
Crawling and Indexing
• Imagine the World Wide Web as a network of stops in a big city subway system.
• Each stop is its own unique document (usually a web page, but sometimes a PDF, JPEG, or other file).
• The search engines need a way to “crawl” the entire city and find all the stops along the way, so they use the best path available: the links between web pages.
• The link structure of the Web binds all of the public pages together. Through links, search engines’ automated robots, called crawlers or spiders (as displayed in the figure above), can reach the many billions of interconnected documents.
• Once the engines find these pages, their next job is to parse the code and store selected pieces of each page in massive arrays of hard drives, to be recalled when needed for a query. To accomplish the monumental task of holding billions of pages that can be accessed in a fraction of a second, the search engines have constructed massive data centres to deal with all this data.
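The link-following idea above can be sketched as a breadth-first traversal of a link graph. This is a minimal illustration, not a production crawler: the page names and the in-memory `TOY_WEB` graph are invented stand-ins for real URLs fetched over HTTP.

```python
from collections import deque

# A toy, in-memory "web": each page maps to the pages it links to.
# (A real crawler would fetch URLs over HTTP; this dict is a stand-in.)
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog", "about"],
}

def crawl(start_page, web):
    """Breadth-first traversal of the link graph: visit a stop,
    note every line leaving it, then ride each line in turn."""
    seen = {start_page}
    queue = deque([start_page])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)              # "fetch" the page
        for link in web.get(page, []):  # follow its outgoing links
            if link not in seen:        # never revisit a stop
                seen.add(link)
                queue.append(link)
    return order

print(crawl("home", TOY_WEB))
# → ['home', 'about', 'blog', 'post-1', 'post-2']
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other, which is exactly why interlinked documents remain reachable without being crawled twice.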
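The "store selected pieces to be recalled in a query" step is commonly implemented as an inverted index mapping each term to the documents that contain it. The sketch below assumes pages have already been reduced to plain text; the document names and contents are illustrative, not taken from any real engine.

```python
# Toy documents standing in for parsed page content (names are invented).
DOCS = {
    "post-1": "search engines crawl the web",
    "post-2": "engines index pages for fast retrieval",
}

def build_index(docs):
    """Map each term to the set of documents containing it --
    the core structure behind answering a query in a fraction of a second."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index["engines"]))   # → ['post-1', 'post-2']
print(sorted(index["crawl"]))     # → ['post-1']
```

Looking up a query term is then a single dictionary access rather than a scan over billions of pages, which is the point of building the index at crawl time.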