There are way too many search engines, such as Yahoo!, Google, Bing, and Ask, and there are also themed search engines, for example ones focused on medicine or business.
What I want to learn
- How do they work?
- How do they gather information?
Links
- http://www.searchengineguide.com/gallianno-cosme/how-do-search-engines-work.php
- http://www.scientificamerican.com/article.cfm?id=how-do-internet-search-en
- http://www.infotoday.com/searcher/may01/liddy.htm
What I learned
- Major elements of search engines: Crawlers/Spiders, Index & Software.
- Steps a search engine takes to process documents into an index:
- Steps 1-3: Preprocessing
- Step 4: Identify elements to index
- Step 5: Deleting stop words
- Step 6: Term Stemming
- Step 7: Extract index entries
- Step 8: Term weight assignment
- Step 9: Create index
- Crawlers read pages and follow the links on them.
- Crawlers: Googlebot (Google), Slurp (Yahoo!), MSNBot (MSN Search)
- Crawlers regularly revisit the webpages already in the index (the search engine's big database) to check for changes.
- Website owners can keep crawlers away from some or all of their pages by using a robots.txt file.
- A software program called a crawler is used to collect web pages. It extracts words from each page, and these are later stored in an index file together with the link of the page they came from.
- Users' queries are matched against the index file.
- To reduce search time, the index is stored in a data structure called a tree, and the index is duplicated across the search engine's computers.
- Search engines use ranking methods like term frequency-inverse document frequency (TF-IDF) to prioritize the results of a user's search.
- Another method is link analysis, which is what Google uses. One form of it consists of identifying whether a webpage is an authority (many pages link to it) or a hub (it links to many pages).
- To be able to hold billions of webpages and access them in a matter of seconds, search engines build data centers all over the world.
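To make the indexing steps and the TF-IDF ranking from the list above more concrete, here is a small Python sketch. It is only a toy under my own assumptions, not how any real search engine is implemented: the stop-word list, the crude `stem` function, the names `tokenize`, `build_index` and `search`, and the example.com pages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny stop-word list, assumed for illustration only (Step 5).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def stem(word):
    """Very crude suffix stripping (Step 6); real engines use proper stemmers."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(pages):
    """Build an inverted index: term -> {url: how often the term occurs there}.

    `pages` maps a URL to the text a crawler collected for it (Steps 7 and 9).
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def search(query, index, num_pages):
    """Rank pages by summed TF-IDF weight of the query terms (Step 8)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears on no page
        # Terms that appear on fewer pages get a higher weight.
        idf = math.log(num_pages / len(postings))
        for url, tf in postings.items():
            scores[url] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical pages standing in for what a crawler would have collected.
pages = {
    "http://example.com/cats": "Cats and kittens: caring for cats",
    "http://example.com/dogs": "Training dogs and walking dogs",
    "http://example.com/pets": "Pets: cats, dogs and birds",
}
index = build_index(pages)
print(search("cats", index, len(pages)))
```

In this toy example, searching for "cats" lists the page that mentions cats twice before the page that mentions them once, and the dogs page is not returned at all because the query is only looked up in the index, not in the pages themselves.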
Summary
I have to confess, I am a daily Google user. I have always found it amusing how fast search engines give us precise results, but I never really researched or even wondered how complex their process is. It's nice to find out more about something you may use or see every day without thinking it has a big story behind it.
Questions
- Which is the best search engine? (Or is it Google, I suppose?)





