If there isn’t a spiderable link to a page, it is invisible to the spiders.
Common Reasons Pages Aren’t Reachable
- Links in submission-required forms. Spiders will not submit forms, so any content accessible only via a form is invisible to the engines. This includes user logins, search boxes, and some types of pull-down lists.
- Links in non-parsable JavaScript. If you use JavaScript for links, search engines may either not crawl them or give them very little weight.
- Links in PowerPoint or PDF files. Engines sometimes report links in these files, but it is unclear how much weight they are given.
- Links in Flash, Java, or other plug-ins. Depending on how the text is embedded in a Flash file, search engines may be able to read it, but this is too risky to rely on.
- Links pointing to pages blocked by the meta robots tag, rel="nofollow", or the robots.txt file. Using the nofollow attribute on a link, or placing the meta robots tag on the page containing the link, tells the spider not to credit the link with value (see the examples after this list).
- Links on pages with many hundreds or thousands of links. Google stops spidering at 100 links or so on a page.
- Links in frames or iframes. These can be crawled, but they present structural issues for the engines in organizing and following the content.
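To make the blocking mechanisms above concrete, here are minimal sketches of each; the URLs and paths are hypothetical:

```html
<!-- A link the engines are told not to credit with value -->
<a href="http://www.example.com/page.html" rel="nofollow">Example link</a>

<!-- Page-level meta robots tag: do not index this page or follow its links -->
<meta name="robots" content="noindex, nofollow" />
```

```
# robots.txt in the site root; blocks all spiders from the /private/ directory
User-agent: *
Disallow: /private/
```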
XML Sitemaps
You can supply the search engines with a list of all the URLs you would like them to crawl and index. Adding a URL to a sitemap does not guarantee it will be crawled and indexed, but it can help the spider find pages that are otherwise inaccessible. XML sitemaps are a complement to, not a replacement for, normal link-based spidering.
Benefits of Sitemapping
- For pages the engines already know about, they use the sitemap metadata (the last date the content was modified, how frequently the page changes) to improve how they spider your site.
- For pages they don't yet know about, they use the additional URLs to increase crawl coverage.
- For URLs that have duplicates, the search engines can use the XML data to help choose the canonical version.
- Verification/registration of XML sitemaps may indicate positive trust/authority signals.
- Second-order positive effects: improved rankings, greater internal link popularity.
To create an XML sitemap, you can use a generator such as xml-sitemaps.com.
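A minimal sitemap showing the metadata fields described in the benefits above; the URL and values are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- The page you want crawled (hypothetical URL) -->
    <loc>http://www.example.com/products/widget.html</loc>
    <!-- Last date the content was modified -->
    <lastmod>2012-03-01</lastmod>
    <!-- How frequently the page is likely to change -->
    <changefreq>weekly</changefreq>
    <!-- Relative priority within your own site, 0.0 to 1.0 -->
    <priority>0.8</priority>
  </url>
</urlset>
```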
Upload your sitemap to the highest-level directory you want the search engines to crawl (usually the root directory). If you include URLs that sit above the directory containing your sitemap, the search engines will be unable to include those URLs as part of the sitemap submission.
You can monitor the results of adding your sitemap using Google Webmaster Tools. Update your sitemap with Bing and Google whenever you add pages to your site. Keep the file as up-to-date as possible.
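One widely supported way to point both engines at your sitemap is a Sitemap directive in robots.txt; the URL is hypothetical:

```
# robots.txt at the site root; any crawler that reads it learns the sitemap's location
Sitemap: http://www.example.com/sitemap.xml
```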
Optimal Information Architecture
Search engines try to reproduce the human process of sorting relevant web pages by quality. To do this they must rely on easy-to-discern metrics, such as link measurement. They have analyzed every facet of the link structure of the web and have extraordinary abilities to infer trust, quality, and authority.
Flat Site Architecture
Flat sites require a minimal number of clicks to get to any given page. For nearly every site with fewer than 10,000 pages, all content should be accessible through a maximum of 3 clicks from the homepage/sitemap.
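The arithmetic works with the roughly-100-links-per-page guideline above: a homepage linking to 100 category pages, each linking to 100 more pages, already reaches 10,000 pages within 2 clicks. A sketch of a flat structure (the paths and category names are hypothetical):

```
Homepage                                   0 clicks
├── /widgets/           (category)         1 click
│   ├── /widgets/blue/  (subcategory)      2 clicks
│   │   └── /widgets/blue/model-x.html     3 clicks
│   └── ...
└── ... up to ~100 categories per level
```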
Avoid Pagination Wherever Possible
- Pagination provides no topical relevance.
- Content that shifts from one paginated page to another can create duplicate content issues (see the canonical example after this list).
- Pagination can create spider traps and loads of extraneous, low-quality pages.
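Where pagination cannot be avoided, a common way to contain the duplicate content issue is a canonical link element pointing the paginated URLs at a single preferred version; the URLs here are hypothetical, and this assumes a view-all page exists:

```html
<!-- Placed in the <head> of http://www.example.com/widgets/page/2/ -->
<!-- Tells the engines which version of the content to treat as canonical -->
<link rel="canonical" href="http://www.example.com/widgets/view-all/" />
```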