Friday, November 30, 2012

How Do Search Engines Work?

What I know
 There are way too many search engines to count, such as Yahoo!, Google, Bing, and Ask. There are also themed search engines that focus on a single area, such as medicine or business.

What I want to learn

  • How do they work?
  • How do they gather information?
Links
  • http://www.searchengineguide.com/gallianno-cosme/how-do-search-engines-work.php
  • http://www.scientificamerican.com/article.cfm?id=how-do-internet-search-en
  • http://www.infotoday.com/searcher/may01/liddy.htm
What I learned
  • Major elements of search engines: Crawlers/Spiders, Index & Software.
  • Steps:
    • Steps 1-3: Preprocessing 
    • Step 4: Identify elements to index
    • Step 5: Deleting stop words
    • Step 6: Term Stemming
    • Step 7: Extract index entries
    • Step 8: Term weight assignment
    • Step 9: Create index
  • Crawlers read pages and follow the links on them.
  • Crawlers: Googlebot (Google), Slurp (Yahoo!), MSNBot (MSN Search)
  • Crawlers regularly go back through the webpages already stored in the search engine's big index (database) to pick up changes.
  • Website owners can keep crawlers away from some or all of their pages by using a robots.txt file.
  • A software program called a crawler is used to collect web pages. It retrieves the pages and extracts words from them, which are later stored with their corresponding website link in an index file.
  • The users' queries are matched against the index file.
  • To reduce search time, a data structure called a tree is used, and copies of the index are kept on many computers in the search engine.
  • Search engines use ranking methods like term frequency-inverse document frequency (TF-IDF) to prioritize the results of a user's search.
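  • Another method is link analysis, which is what Google uses (its version is called PageRank). Link analysis looks at how pages link to each other, for example by identifying whether a webpage is an authority (many pages link to it) or a hub (it links to many authorities).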
  • To be able to hold billions of webpages and have access to them in a matter of seconds, search engines construct data centers all over the world.
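
To help myself picture the indexing steps above (deleting stop words, stemming, assigning term weights, creating the index), here is a tiny Python sketch. It is only my own toy version, not how any real search engine does it: the stop-word list, the crude `stem` function, the example pages, and the function names are all made up for illustration, and the weight is a plain TF-IDF.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up, tiny stop-word list; real engines use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on"}

def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words, delete stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(pages):
    """Create the index: each term maps to {url: TF-IDF weight}."""
    term_counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(pages)
    index = defaultdict(dict)
    for url, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            index[term][url] = tf * idf
    return index

def search(index, query):
    """Match the query terms against the index and rank pages by total weight."""
    scores = Counter()
    for term in tokenize(query):
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return [url for url, _ in scores.most_common()]

# Hypothetical pages standing in for what a crawler collected.
pages = {
    "http://example.com/cats": "Cats are sleeping on the sofa. Cats like naps.",
    "http://example.com/dogs": "Dogs like walks and dogs like playing.",
    "http://example.com/pets": "Cats and dogs are popular pets.",
}
index = build_index(pages)
results = search(index, "cats")
print(results)
```

The page that mentions cats twice comes out ahead of the page that mentions them once, and the dogs page is not returned at all because it never appears under that term in the index.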
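
The robots.txt point above can also be tried out in Python, which has a `urllib.robotparser` module in its standard library. In this small sketch the robots.txt contents, the crawler name "ExampleBot", and the example.com addresses are all invented by me; a well-behaved crawler would download the real file from the website first.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: every crawler is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "ExampleBot" is a made-up crawler name.
allowed_home = parser.can_fetch("ExampleBot", "http://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://example.com/private/notes.html")
print(allowed_home, allowed_private)
```

Note that robots.txt is only a request: it tells polite crawlers where not to go, but it does not lock the pages.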
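
The hub and authority idea from the link analysis point above can be sketched too. This is a simplified version of the hubs-and-authorities calculation as I understand it, not Google's actual ranking code; the `hits` function name, the number of iterations, and the little link graph are assumptions of mine.

```python
def hits(links, iterations=20):
    """Score pages as hubs and authorities from a {page: [linked pages]} map."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sum(auth.values()) or 1.0
        auth = {p: s / norm for p, s in auth.items()}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        norm = sum(hub.values()) or 1.0
        hub = {p: s / norm for p, s in hub.items()}
    return hub, auth

# Made-up link graph: "directory" links out a lot, "wiki" gets linked to a lot.
links = {
    "directory": ["news", "wiki", "shop"],
    "blog": ["news", "wiki"],
    "news": ["wiki"],
}
hub, auth = hits(links)
top_hub = max(hub, key=hub.get)
top_auth = max(auth, key=auth.get)
print(top_hub, top_auth)
```

In this toy graph the page that links to the most good pages ends up as the best hub, and the page that the most hubs point to ends up as the best authority.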
Summary
 I have to confess, I am a daily Google user. I have always found it amazing how fast search engines give us precise results, but I never really researched or even wondered how complex their process is. It's nice to find out more about something you may use or see every day without thinking it has a big story behind it.

Questions
  1. Which is the best search engine? (I suppose it's Google?)

