Search Engine Spiders

Search engine spiders are small, fairly simple programs that search engines run to collect data for indexing. They are sometimes called robots. A spider visits your page, captures certain data, then returns that data to the search engine it came from. Spiders do not in any way determine your page rankings. A page can also have been spidered but not yet indexed, and until it is indexed and processed it is not available to be searched.

Spiders see the internet as a world of URLs, or Uniform Resource Locators. They don’t see images (or at least not well), they don’t see audio or video files, and they can barely see into Flash files. Their sole job is to crawl content, capture certain information, record links, and take it all back for processing. They are not terribly smart, cannot think, and don’t understand context.
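
To make that concrete, here is a minimal sketch in Python of what a very simple spider might keep from a page: the readable text and the links, with images, scripts, and styles ignored. The class name LinkCollector, the function spider_view, and the sample page are made up for illustration; real search engine spiders are far more involved, and this is not how any particular one works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects readable text and outgoing links (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def spider_view(html, base_url):
    """Return (text, links) for a page, ignoring images and other media."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Made-up sample page: the <img> and <style> contribute nothing to the result.
SAMPLE_PAGE = """
<html><head><title>Widgets</title><style>body{color:red}</style></head>
<body><h1>Blue widgets</h1><img src="widget.png">
<p>Read the <a href="/guide.html">guide</a> or <a href="https://example.org/faq">FAQ</a>.</p>
</body></html>
"""

text, links = spider_view(SAMPLE_PAGE, "https://example.com/index.html")
print(text)
print(links)
```

Notice that the image leaves no trace in the output: as far as this sketch is concerned, the page is just words and URLs.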

Search engine spiders are rarely updated with new features, since their function doesn’t change all that much.

Ways to design for Search Engine Spiders

  • Focus on content freshness: update frequently
  • You do not need to include a robots revisit tag in your HTML
  • Focus on the content early in the page. Spiders may index only the first 1000 bytes of data or so. Search engines also consider content that shows up earlier on a page more important than content that comes later.
  • Put your content on top: make sure your CSS and JavaScript are not the first thing the spider runs into. It is better to keep them in their own external files.
  • Make sure your site has a good site map and that every page of your site links to it. You have no idea how a spider will enter your site, so do what you can to let it crawl the entire site.
  • Be sure to tell robots which directories NOT to crawl with a robots.txt file
  • Try to design your site with as few layers as possible. No splash pages!
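
For the robots.txt tip, a quick way to check that your rules do what you expect is Python's standard urllib.robotparser module. This is only a sketch: the directory names, the example.com URLs, and the helper allowed_paths are assumptions made up for illustration, so substitute your own rules and pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all robots out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""


def allowed_paths(robots_txt, urls, user_agent="*"):
    """Map each URL to True if the rules allow user_agent to fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = allowed_paths(
    ROBOTS_TXT,
    [
        "https://example.com/products.html",
        "https://example.com/admin/login",
    ],
)
for url, allowed in results.items():
    print(url, "allowed" if allowed else "blocked")
```

Keep in mind that robots.txt is a request, not a lock: well-behaved spiders honor it, but it does not actually protect the directories it lists.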
