Search engines are programs that search documents for
specified keywords and return a list
of the documents where the keywords were found. A search engine is really
a general class of programs; however, the term is often used to specifically
describe systems like Google, Bing and Yahoo! Search that enable users to
search for documents on the World Wide
Web.
Web Search Engines
Typically, Web search engines
work by sending out a spider to
fetch as many documents as possible. Another program, called an indexer, then
reads these documents and creates an index
based on the words contained in each document. Each search engine uses a proprietary algorithm to create its
indices such that, ideally, only meaningful results are returned for each query.
HOW DO WEB SEARCH ENGINES WORK?
Search engines are the key to finding specific information on the
vast expanse of the World Wide Web. Without
sophisticated search engines, it would be virtually impossible to locate
anything on the Web without knowing a specific URL. But do you know how search engines work? And do you know what
makes some search engines more effective than others?
When people use the term search
engine in relation to the Web, they are usually referring to the actual search
forms that search through databases of HTML
documents, initially gathered by a robot.
There are basically three types
of search engines: those that are powered by robots (called crawlers, ants or spiders), those that are powered by
human submissions, and those that are a hybrid of the two.
Crawler-based search engines are
those that use automated software
agents (called crawlers) that visit a Web site, read the information on the
actual site, read the site's meta tags
and also follow the links that the site connects to, indexing all
linked Web sites as well. The crawler returns all that information to a
central depository, where the data is indexed. The crawler will periodically
return to the sites to check for any information that has changed. The
frequency with which this happens is determined by the administrators of the
search engine.
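The crawl-and-index loop described above can be sketched in a few lines of Python. This is a minimal, illustrative model: the page contents, links and URLs are made up, and an in-memory dictionary stands in for real HTTP fetches. The visited set is also what keeps a crawler from looping forever over pages that link back to each other.

```python
from collections import deque

# A toy, in-memory "Web": URL -> (page text, outbound links).
# A real crawler would fetch these pages over HTTP.
PAGES = {
    "a.example": ("travel deals and travel tips", ["b.example", "c.example"]),
    "b.example": ("cheap flights", ["a.example"]),
    "c.example": ("hotel reviews", []),
}

def crawl(seed):
    """Breadth-first crawl: visit each page once, follow its links."""
    seen = {seed}
    queue = deque([seed])
    collected = {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        collected[url] = text          # hand the document off to the indexer
        for link in links:
            if link not in seen:       # the visited set prevents endless loops
                seen.add(link)
                queue.append(link)
    return collected

docs = crawl("a.example")
```

Everything the crawler collects is then passed to the indexer, which is the second program described above.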
Human-powered search engines rely on humans to submit
information that is subsequently indexed and catalogued. Only information that
is submitted is put into the index.
In both cases, when you query a
search engine to locate information, you're actually searching through the
index that the search engine has created; you are not actually searching the
Web. These indices are giant databases
of information that is collected, stored and subsequently searched. This
explains why sometimes a search on a commercial search engine, such as Yahoo!
or Google, will return results that are, in fact, dead links. Since the search
results are based on the index, if the index hasn't been updated since a Web
page became invalid, the search engine treats the page as still an active link
even though it no longer is. It will remain that way until the index is
updated.
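A minimal sketch of such an index is the classic inverted index: a map from each word to the set of documents containing it. The documents and IDs below are hypothetical. Note that the lookup never touches the "Web" itself, only the stored index, which is exactly why a stale index can return dead links.

```python
# Hypothetical crawled documents standing in for real Web pages.
docs = {
    "page1": "cheap travel deals",
    "page2": "travel insurance guide",
    "page3": "cooking recipes",
}

# Build the inverted index: word -> set of documents containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(query):
    """Return documents containing every query word (AND semantics)."""
    results = None
    for word in query.split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()
```

For example, `search("travel deals")` intersects the documents for "travel" with those for "deals", so only pages containing both words match.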
So why will the same search on
different search engines produce different results? Part of the answer is
that not all indices are going to be exactly the same; it
depends on what the spiders find or what the humans submitted. But more
important, not every search engine uses the same algorithm to search through the indices. The algorithm is what the
search engines use to determine the relevance of the information in the index
to what the user is searching for.
One of the elements that a search
engine algorithm scans for is the frequency and location of keywords on a Web
page. Those with higher frequency are typically considered more relevant. But
search engine technology is becoming sophisticated in its attempt to discourage
what is known as keyword stuffing, or spamdexing.
Another common element that
algorithms analyze is the way that pages link to other pages in the Web. By
analyzing how pages link to each other, an engine can both determine what a
page is about (if the keywords of the linked pages are similar to the keywords
on the original page) and whether that page is considered "important"
and deserving of a boost in ranking. Just as the technology is becoming
increasingly sophisticated at ignoring keyword stuffing, it is also becoming more
savvy to webmasters who build artificial links into their sites in order to
inflate their ranking.
Did you know?
The first tool for searching the
Internet, created in 1990, was called "Archie". It downloaded
directory listings of all files located on public anonymous FTP servers,
creating a searchable database of filenames. A year later "Gopher"
was created; it indexed plain text documents. "Veronica" and
"Jughead" came along to search Gopher's index systems. The first
actual Web search engine was developed by Matthew Gray in 1993 and was called
"Wandex".
KEY TERMS TO
UNDERSTANDING WEB SEARCH ENGINES
Spider traps
A condition of dynamic Web sites
in which a search engine’s spider becomes trapped in an endless loop of code.
Search engine
A program that searches documents
for specified keywords and returns a list of the documents where the keywords
were found.
Meta tag
A special HTML tag that provides information about a Web page.
Deep link
A hyperlink either on a Web page
or in the results of a search engine query to a page on a Web site other than
the site’s home page.
Robot
A program that runs automatically without human intervention.
WEB SEARCH ENGINES & DIRECTORIES
According to a 2007 report by Netcraft,
108,810,358 distinct Web sites make up the World Wide Web. When you want to
find out more about a specific topic, service or product, you use an Internet
search engine. Today there are a number of search engines, and while they work
differently, they all use Webcrawlers (also called bots) that are designed to
index pages on the Web and also words found on these pages. This indexing of the
Web is what enables users to search for keywords or combinations of
words to find information online. Other types of search engines are called
search directories. These sites index content chosen by human editors, rather
than automated indexing done by bots. Today most search engines offer
complementary search-related products such as shopping search, news and other
services that go beyond the basic keyword search function. The following Quick
Reference provides an overview of some of the more popular public Web Search
Engines and Directories, including details on their history,
information on how they work and tips for using each.
BING
Bing is a new search engine from
Microsoft that was launched on May 28, 2009. Microsoft calls it a
"Decision Engine," because it's designed to return search results in
a format that organizes answers to address your needs. When you search on Bing,
in addition to providing relevant search results, the search engine also shows
a list of related searches on the left-hand side of the search engine results
page (SERP). You can also access a
quick link to see recent search history. Bing uses technology from a company
called Powerset, which Microsoft acquired.
Bing launched with several
features that are unique in the search market. For example, when you mouse-over
a Bing result a small pop-up provides additional information for that result,
including a contact e-mail address if available. The main search box
features suggestions as you type, and Bing's travel search is touted as being
the best on the net. Bing is expected to replace Microsoft Live Search.
BING SEARCH TIPS:
- You can search for feeds using feeds: before the query
- To search Bing without a background image use http://www.bing.com/?rb=0
- To turn the background image back on, use http://www.bing.com/?rb=1
- To change the number of search results returned per page, click "Extras" (on top-right of page) and select "Preferences". Under Web Settings / Results you can choose 10, 15, 30 or 50 results
Google
Today Google is the largest public
Internet search engine, in terms of indexed content and number of users.
Company founders Larry Page and Sergey Brin initially collaborated on a search
engine called BackRub that had the capability to analyze the back links
pointing to a given Web site. With financial backing, in 1998 the founders
opened Google. By 2000 Google was handling more than 100 million search queries
a day, and by 2004 Google claimed its index had reached 4.28 billion Web pages.
Google's search engine crawler,
called the GoogleBot, travels from Web page to Web page following hyperlinks.
When a new page is found, Googlebot will also crawl all the hyperlinks on that
page as well. A second bot also crawls indexed pages to keep the index updated.
As pages are indexed, they are also given scores based on criteria like how
many times words are displayed (density), link popularity, HTML code, themes,
content (the text) and more. These scores are what determine where the Web
page listing appears in the search results.
Google Search Tips:
- You can search for a phrase by using quotation marks ["like this"] or by joining words with a hyphen [like-this].
- You can search by a date range by using two dots between the years [2004..2007].
- When searching with a question mark [?] at the end of your phrase, you will see sponsored Google Answer links, as well as definitions if available.
- Google searches are not case sensitive.
- By default Google will return results which include all of your search terms.
- Google automatically searches for variations of your term, with variants of the term shown in yellow highlight.
- Google lets you enter up to 32 words per search query.
- Google Search Page
Yahoo!
Founded in 1994 by David Filo and Jerry Yang, Yahoo was
initially a directory of Web sites categorized by human editors, a directory
for which it is still best known today. Over time Yahoo began
acquiring search companies and combining the technologies. In 2004 Yahoo
acquired the Overture pay-per-click service (which had bought AltaVista and
AlltheWeb), as well as the Inktomi search database and others. These combined search
technologies and tools made Yahoo what it is today, with Yahoo's
famous directory search now secondary to its main search engine.
The Yahoo Search index is made up
of billions of Web pages, which are populated by a Web crawler. When Yahoo
crawls pages it takes several factors into consideration: the search terms
included in the page's Title and Description tags, page content (the text),
keyword density, inbound hyperlinks and so on. Yahoo has also started using page
rank technologies and takes Yahoo Directory listings and paid inclusions
into consideration when indexing and ranking pages. Users can submit their own pages
directly to the Yahoo Search index and the Yahoo Directory, and also submit
products for inclusion in Yahoo Shopping. Yahoo has also incorporated special
search and services such as Webmail, Local, video, images, shopping and news
search products.
Yahoo Search Tips:
- By default Yahoo returns results that include all of your search terms
- To exclude words use a minus sign; [cat -tabby] would show all results about cats with no mention of tabby.
- Yahoo search results also shows related searches, which are based on other searches by users with similar terms
- To search for a map, use map [location]
- To search for dictionary definitions use "define" [define hard drive]
- To search a single domain use site: [site: obasimvilla.com/forum DVD] would search obasimvilla.com/forum for the term DVD.
- Yahoo Search Page
Windows Live Search
Microsoft's search engine,
Windows Live Search offers a huge improvement over MSN Search and is also
integrated into Microsoft's Live.com. When it launched September 12, 2006, it
was a new search engine built from scratch using a new algorithmic engine that
was integrated throughout Windows Live and MSN. Some of the features of Windows
Live Search include a nice feature-rich interface — something new and unique in
the Web search space. By signing in to a personalized live search you can add
feeds and subscribe to search results. Windows Live Search also incorporates
specific searches for images, news, academic journals, RSS feeds, maps and
more.
Live Search technologies attempt
to overcome some elements of human error, such as spelling errors, punctuation
and synonyms, and also predict user intent in order to provide the best search
results possible. To improve ranking in Microsoft Live Search you can
mark your front page as accessible to those with specialist settings on their
browser, keep keyword density in mind, include an HTML site map and use a
distinct list of keyword meta tags for each page on your site.
Windows Live Search Tips:
- Common words such as a, and, and the are ignored unless they're enclosed by quotation marks.
- Category lists may appear on the top of search results - you can click a category to see only the results associated with that category.
- If using a date in your search query, type the name of the month instead of the calendar number.
- To define something, use define followed by the word [define DVD] will show definitions for DVD.
- You can enter up to 150 characters in the search box.
HOW SEARCH ENGINES RANK WEB PAGES
Search for anything using your
favorite crawler-based search engine. Nearly instantly, the search engine will
sort through the millions of pages it knows about and present you with ones
that match your topic. The matches will even be ranked, so that the most relevant
ones come first.
Of course, the search engines
don't always get it right. Non-relevant pages make it through, and sometimes it
may take a little more digging to find what you are looking for. But, by and
large, search engines do an amazing job.
As WebCrawler founder Brian
Pinkerton puts it, "Imagine walking up to a librarian and saying,
'travel.' They’re going to look at you with a blank face."
OK -- a librarian's not really
going to stare at you with a vacant expression. Instead, they're going to ask you
questions to better understand what you are looking for.
Unfortunately, search engines
don't have the ability to ask a few questions to focus your search, as a
librarian can. They also can't rely on judgment and past experience to rank web
pages, in the way humans can.
So, how do crawler-based search
engines go about determining relevancy, when confronted with hundreds of
millions of web pages to sort through? They follow a set of rules, known as an
algorithm. Exactly how a particular search engine's algorithm works is a
closely-kept trade secret. However, all major search engines follow the general
rules below.
LOCATION, LOCATION, LOCATION...AND
FREQUENCY
One of the main rules in a
ranking algorithm involves the location and frequency of keywords on a web
page. Call it the location/frequency method, for short.
Remember the librarian mentioned
above? They need to find books to match your request of "travel," so
it makes sense that they first look at books with travel in the title. Search
engines operate the same way. Pages with the search terms appearing in the HTML
title tag are often assumed to be more relevant than others to the topic.
Search engines will also check to
see if the search keywords appear near the top of a web page, such as in the
headline or in the first few paragraphs of text. They assume that any page
relevant to the topic will mention those words right from the beginning.
Frequency is the other major
factor in how search engines determine relevancy. A search engine will analyze
how often keywords appear in relation to other words in a web page. Those with
a higher frequency are often deemed more relevant than other web pages.
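A toy version of this location/frequency method might look like the following sketch. The weights, the field names and the "first ten words" cutoff are arbitrary assumptions made for illustration, not any real engine's algorithm.

```python
def score(page, query_terms):
    """Toy location/frequency score: keyword density in the body,
    plus bonuses for terms in the title and near the top of the page."""
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    s = 0.0
    for term in query_terms:
        term = term.lower()
        # Frequency: how often the term appears relative to page length.
        s += body_words.count(term) / max(len(body_words), 1)
        if term in title_words:
            s += 1.0        # location bonus: HTML title tag
        if term in body_words[:10]:
            s += 0.5        # location bonus: opening text
    return s

# Hypothetical pages to rank for the query "travel".
pages = [
    {"title": "Travel Guide", "body": "travel tips for budget travel in europe"},
    {"title": "Cooking", "body": "recipes with a travel theme"},
]
ranked = sorted(pages, key=lambda p: score(p, ["travel"]), reverse=True)
```

The first page wins on both counts: the term appears in its title and more often in its body.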
Spice in the Recipe
Now it's time to qualify the
location/frequency method described above. All the major search engines follow
it to some degree, in the same way cooks may follow a standard chili recipe.
But cooks like to add their own secret ingredients. In the same way, search
engines add spice to the location/frequency method. Nobody does it exactly the
same, which is one reason why the same search on different search engines
produces different results.
To begin with, some search
engines index more web pages than others. Some search engines also index web
pages more often than others. The result is that no search engine has the exact
same collection of web pages to search through. That naturally produces
differences, when comparing their results.
Search engines may also penalize
pages or exclude them from the index, if they detect search engine
"spamming." An example is when a word is repeated hundreds of times
on a page, to increase the frequency and propel the page higher in the
listings. Search engines watch for common spamming methods in a variety of
ways, including following up on complaints from their users.
OFF THE PAGE FACTORS
Crawler-based search engines have
plenty of experience now with webmasters who constantly rewrite their web pages
in an attempt to gain better rankings. Some sophisticated webmasters may even
go to great lengths to "reverse engineer" the location/frequency
systems used by a particular search engine. Because of this, all major search
engines now also make use of "off the page" ranking criteria.
Off the page factors are those
that webmasters cannot easily influence. Chief among these is link analysis.
By analyzing how pages link to each other, a search engine can both determine
what a page is about and whether that page is deemed to be
"important" and thus deserving of a ranking boost. In addition,
sophisticated techniques are used to screen out attempts by webmasters to build
"artificial" links designed to boost their rankings.
Another off the page factor is
click through measurement. In short, this means that a search engine may watch
what results someone selects for a particular search, and then eventually drop
high-ranking pages that aren't attracting clicks, while promoting lower-ranking
pages that do pull in visitors. As with link analysis, systems are used to
compensate for artificial links generated by eager webmasters.
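A toy re-ranking by click-through might look like this. The way the original rank order is blended with observed click-through rate, and the data below, are invented for illustration; real systems also guard against artificially generated clicks.

```python
def rerank(results, clicks, impressions):
    """Blend the algorithmic rank with observed click-through rate:
    earlier results start with a higher base score, then pages that
    actually attract clicks get promoted."""
    scored = []
    for pos, url in enumerate(results):
        base = 1.0 / (pos + 1)                      # original rank order
        ctr = clicks.get(url, 0) / max(impressions.get(url, 1), 1)
        scored.append((base + ctr, url))
    return [url for _, url in sorted(scored, reverse=True)]

# Hypothetical data: the third result is getting most of the clicks.
results = ["a.example", "b.example", "c.example"]
clicks = {"c.example": 80, "a.example": 5}
impressions = {"a.example": 100, "b.example": 100, "c.example": 100}
```

With these numbers, the heavily clicked "c.example" overtakes the original top result.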
Learning More
The Search Engine Features Chart
has a section that summarizes key areas of how crawler-based search engines
rank web pages. The Search
Engine Placement Tips page also summarizes key tips that will help you
improve the relevancy of your pages with crawler-based search engines.
Search Engine Watch members have access to the How Search Engines Work
section. This section provides detailed information about how each major search
engine gathers its listings and an additional tip on enhancing your position in
their results. Learn more about becoming a Search Engine Watch member and the
many benefits members receive by visiting the Membership page.