Between 75% and 98.8% of visitors to
Web sites come from searches made at search engines. If
you're going to get high levels of traffic - and hence the
levels of ROI you're looking for - it's very important
that the search engines can access all the information on
your Web site.
Do the search engines know about all of your pages?
You can find out which pages on your site the search
engines know about by using a special search. If you
search for 'site:' and your Web site address, the search
engine will tell you all of the pages on your Web site it
knows about.
For example, search for: site:webpositioningcentre.co.uk
in Google, Yahoo or MSN Search, and each will tell you how
many pages it knows about.
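If you want to check several sites, the 'site:' query can be built from an address automatically. The snippet below is only a minimal sketch: the function name site_query is made up for illustration, and it simply produces the text you would type into the search box.

```python
from urllib.parse import urlsplit


def site_query(address):
    """Build a 'site:' search query from a bare domain or a full URL."""
    # urlsplit only recognises the host when a scheme is present,
    # so add one if the caller passed a bare domain.
    if "://" not in address:
        address = "http://" + address
    host = urlsplit(address).hostname
    return "site:" + host


print(site_query("webpositioningcentre.co.uk"))
print(site_query("http://webpositioningcentre.co.uk/index.html"))
```

Both calls give the same query, because only the host name matters for this kind of search.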
If the search engines haven't found some of the pages on
your Web site, it is probably because they are having
trouble spidering them. ('Spidering' is when the search
engine uses an automated robot to read your Web pages.)
Spiders work by starting off on a page which has been
linked to by another Web site, or that has been submitted
to the search engine. They then read and follow any links
they find on the page, gradually working their way through
your whole Web site.
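To make the idea concrete, here is a rough sketch of how a spider might work its way through a site by following links. It is not how any real search engine is implemented: the site is a made-up dictionary of page names and HTML, and the names LinkCollector, crawl and SITE are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Follow links breadth-first through `pages` (page name -> HTML)."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        collector = LinkCollector()
        collector.feed(pages.get(current, ""))
        for link in collector.links:
            # Only follow links to pages that exist and are not yet visited.
            if link in pages and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


# Hypothetical site: three linked pages and one page nothing links to.
SITE = {
    "index.html": '<a href="about.html">About</a> <a href="products.html">Products</a>',
    "about.html": '<a href="index.html">Home Page</a>',
    "products.html": '<a href="index.html">Home Page</a>',
    "orphan.html": '<a href="index.html">Home Page</a>',
}

found = crawl(SITE, "index.html")
print(found)
```

In this sketch orphan.html is never reached, because no other page links to it. A page the spider cannot reach through links is a page the search engine does not know about.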
At least, that's the theory.
The problem is, it's easy to confuse the spiders -
especially as they are designed to be wary of following
certain kinds of link.
Links which confuse spiders
the spider may not be able to find them, and will not be
able to follow the links to your other pages.
This can happen if you have 'rollovers' as your navigation
- for instance, pictures that change color or appearance
when you hover your mouse pointer over them. The
enough for the spiders to ignore it rather than try to
find links inside.
If you think your rollovers are blocking your site from
being spidered, you will need to talk to your Web
designers about changing the code into a 'clean link' - a
standard HTML link, with no extra code around it - that is
much easier for the spiders to follow.
Links like these will look something like this:
<a href="index.html">Home Page</a>
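The difference between a clean link and script-driven navigation can be illustrated with a simple link finder. This is only a sketch under assumptions: the rollover markup is a made-up example of navigation that lives in an onclick handler, and CleanLinkFinder and find_clean_links are names invented here, not part of any search engine.

```python
from html.parser import HTMLParser


class CleanLinkFinder(HTMLParser):
    """Records hrefs from plain <a> tags, skipping script-driven ones."""

    def __init__(self):
        super().__init__()
        self.clean_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.lower().startswith("javascript:"):
            self.clean_links.append(href)


def find_clean_links(html):
    """Return the list of plain HTML link targets found in `html`."""
    finder = CleanLinkFinder()
    finder.feed(html)
    return finder.clean_links


CLEAN_NAV = '<a href="index.html">Home Page</a>'
# Hypothetical rollover: the destination is hidden in an onclick handler.
ROLLOVER_NAV = (
    '<img src="home.gif" onmouseover="this.src=\'home_on.gif\'" '
    'onclick="window.location=\'index.html\'">'
)

print(find_clean_links(CLEAN_NAV))     # ['index.html']
print(find_clean_links(ROLLOVER_NAV))  # []
```

A reader that only looks for standard HTML links finds the first destination and misses the second, even though both take a human visitor to the same page.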
Page addresses to avoid
Spiders will also ignore pages if they don't like the URL
(the address needed to find the page).
For example, a Web site that has URLs containing several
variables can cause spiders to ignore the page content.
You can spot pages like these because their URLs contain
a ? and one or more & characters, for instance:
This URL has three variables: the parts with the = in
them, between the ? and the &s. We find that if a URL has
one variable, or even two, the top search engines will
spider it without any problems. But if a URL has more
than that, the search engines will often not spider it.
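You can count the variables in your own URLs with a few lines of code. The sketch below uses a made-up example.com address, and the limit of two variables is simply the rule of thumb described above, not a figure published by any search engine.

```python
from urllib.parse import parse_qsl, urlsplit

# Rule of thumb from the text: one or two variables are usually fine.
MAX_SAFE_VARIABLES = 2


def count_url_variables(url):
    """Count the name=value pairs after the ? in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def may_confuse_spiders(url):
    """True if the URL has more variables than the rule of thumb allows."""
    return count_url_variables(url) > MAX_SAFE_VARIABLES


# Hypothetical URLs, for illustration only.
risky = "http://www.example.com/page.php?cat=3&item=27&colour=red"
safe = "http://www.example.com/page.php?cat=3"

print(count_url_variables(risky), may_confuse_spiders(risky))  # 3 True
print(count_url_variables(safe), may_confuse_spiders(safe))    # 1 False
```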
Spiders particularly avoid URLs that look like they have
'session IDs' in them. They look something like this:
The set of numbers and letters does not make much sense to
humans, but some Web sites use them to keep track of who
you are, as you click through their Web site.
Spiders will generally avoid URLs with session IDs in
them, so if your Web site has them, you need to talk to
the people who developed the site about rewriting it so
that it does not use these IDs, or at least so that you
can get around the Web site without them.
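A quick way to check your own URLs for session IDs is to look for tell-tale variable names or long runs of mixed letters and numbers. The sketch below is a rough heuristic, not the test any spider actually applies: the list of names, the 16-character threshold and the example.com URLs are all assumptions made for illustration.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumed list of common session variable names (not exhaustive).
SESSION_PARAM_NAMES = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}
# Assumed threshold: 16 or more letters and digits in a row.
LONG_TOKEN = re.compile(r"^[0-9A-Za-z]{16,}$")


def looks_like_session_url(url):
    """Guess whether a URL carries a session ID in its variables."""
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name.lower() in SESSION_PARAM_NAMES:
            return True
        has_digit = any(c.isdigit() for c in value)
        has_letter = any(c.isalpha() for c in value)
        if LONG_TOKEN.match(value) and has_digit and has_letter:
            return True
    return False


# Hypothetical URLs, for illustration only.
with_session = "http://www.example.com/shop.php?PHPSESSID=9f2c4e7a1b3d5f60a8c2e4b6d8f01357"
without_session = "http://www.example.com/shop.php?cat=3"

print(looks_like_session_url(with_session))     # True
print(looks_like_session_url(without_session))  # False
```

If a check like this flags many of your pages, that is a good prompt for the conversation with your developers.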
Clean links = happy spiders
If you use clean, easy-to-follow links without several
variables in them, your Web site should be spidered
without problem. There are, of course, many other facets
to successful Search Engine Optimization, but if the
search engines can't spider your content, your site will
fall at the first hurdle.