This is the first in a two-part series on advanced webmaster setup. Producing compelling content is only half the battle; making it easy for Google to crawl and index that content is equally important. In today's post we will look at five technical best practices for improving the crawlability of your website.
1. Custom 404 Page
A 404 error page is served when a user or a search engine attempts to view a non-existent page on your website. This is often caused by a broken link or by a URL that has been moved or deleted.
Broken links hinder Google's ability to crawl your site. A custom 404 error page that links back into your site gives Google and your visitors a way to keep reaching your content after following a broken link.
404 Error Page Best Practices:
- Include your site's navigation and a link to your site map.
- Put a message on the page that informs the user they have landed on a page that no longer exists.
- Do you use Google Analytics? If so, you can track where the referring broken link is located on your own site and fix it.
- Here is an example of a 404 error page from SEOmoz.
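The practices above can be sketched in a few lines of Python. This is only an illustration, not a drop-in handler for any particular web server: the NAV_LINKS entries and the build_404_page helper are hypothetical names, and the site map is assumed to live at /sitemap.html. The key points are that the response keeps the 404 status code and that the page carries navigation, a site map link, and a clear message.

```python
from html import escape

# Hypothetical navigation; replace with your own site's links.
NAV_LINKS = [("Home", "/"), ("Blog", "/blog/"), ("Site map", "/sitemap.html")]


def build_404_page(requested_path):
    """Return (status_code, html) for a page that does not exist."""
    nav_items = "\n".join(
        f'    <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in NAV_LINKS
    )
    html = (
        "<html>\n<head><title>Page not found</title></head>\n<body>\n"
        "  <h1>Sorry, this page no longer exists</h1>\n"
        # Escape the path so a malformed URL cannot inject markup.
        f"  <p>We could not find {escape(requested_path)}.</p>\n"
        f"  <ul>\n{nav_items}\n  </ul>\n"
        "</body>\n</html>\n"
    )
    # Keep the 404 status so search engines know the URL is gone.
    return 404, html
```

Whatever server or framework you use would send the returned status code and HTML as the response for any unknown URL.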
2. XML Sitemaps
An XML sitemap gives Google a list of all of your web pages. XML sitemaps do not guarantee that your site will be crawled, indexed, and ranked; they are simply a first step towards making it easy for Google to see your most recent content. You can also assign a priority to your pages and directories on a scale from 0 to 1, for example 1 for your homepage and 0.8, 0.6, and so on for your directories.
Getting Started: Set up a Google Webmaster Tools account. If you already have an account, go create a sitemap. For more information on setting up XML sitemaps, visit XML-Sitemaps.com.
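If you would rather generate the file yourself, the following is a minimal sketch of how a sitemap with priorities might be built using Python's standard library. The build_sitemap helper and the example.com URLs are assumptions for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs, priority in 0.0-1.0."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: homepage at 1.0, a directory at 0.8.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/blog/", 0.8),
])
```

The resulting string would be saved as sitemap.xml at the root of the site and submitted through your Webmaster Tools account.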
3. 301 Redirects/REL canonicals
When you update a URL, remove a page, or change the location of a page, you need to let Google know the new location of your content. There are two mechanisms to know about: URL canonicalization and 301 redirects. Let's review the differences between the two.
URL Canonicalization - Tells Google multiple pages on your site should be considered as one page (just like a 301 redirect), without redirecting to a new URL. This is common if you have two or more pages of content that are similar or identical and you only want one version of the content to appear in search.
301 redirects - Instruct the search engine to stop indexing a specific URL and replace it with a new URL. A 301 redirect points all traffic (engines and human visitors) to the new page and passes the majority of link value from the old page to the new one. If you remove or change the name of a URL, be sure to implement 301 permanent redirects.
Google offers a great rel canonical resource where you can learn more, and detailed instructions on setting up 301 redirects are also available.
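To make the difference concrete, here is a small sketch of both mechanisms in Python. The REDIRECTS table and the resolve and canonical_tag helpers are hypothetical; in practice redirects are usually configured in your web server or CMS. A 301 sends the visitor to a new URL, while a canonical tag leaves the visitor where they are and only names the preferred URL.

```python
from html import escape

# Hypothetical mapping of retired URLs to their replacements.
REDIRECTS = {"/old-page.html": "/new-page.html"}


def resolve(path):
    """Return (status, headers): a 301 with Location for moved pages."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}


def canonical_tag(preferred_url):
    """Return the <link> element to place in the <head> of duplicate pages."""
    return f'<link rel="canonical" href="{escape(preferred_url)}" />'
```

A request for /old-page.html would be answered with a 301 pointing at /new-page.html, whereas each near-duplicate page would simply include the canonical tag for the version you want to appear in search.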
4. Robots.txt
Search engines look at the root domain of your website for a file called "robots.txt". This file tells the search engine which files on your site it should not crawl. Setting up a Robots.txt file is an important step if your site contains sensitive or confidential information you do not want indexed by Google.
The Robots.txt file lives at the top level directory on your site, for example: http://www.mysite.com/robots.txt
A Robots.txt file consists of two fields: a User-agent line and one or more Disallow lines. If a search engine robot wants to crawl the page http://www.example.com/welcome.html, it first checks for http://www.example.com/robots.txt and finds:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
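The "User-agent: *" means this section applies to all robots. Each "Disallow:" line tells the robot not to visit any URL that starts with the listed path, so here the /cgi-bin/, /tmp/ and /~joe/ directories are off limits while welcome.html can still be crawled. For more information, visit the Robots.txt organization web site.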
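One way to check how a well-behaved robot would read these rules is Python's standard urllib.robotparser module. The sketch below parses the example rules from a string rather than fetching them over the network; the is_allowed helper is a hypothetical name used for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, held in a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the rules permit user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://www.example.com/welcome.html"))
print(is_allowed("http://www.example.com/tmp/cache.html"))
```

The first URL is outside every disallowed directory, so it is allowed; the second falls under /tmp/ and is not.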
5. HTML/CSS Organization
Spiders crawl the code of your site from the top down, so placing your metadata and page copy at the top helps Google find this content faster. JavaScript, Cascading Style Sheets (CSS) and other non-HTML languages are commonly used to design websites. These non-HTML languages can block Google from accessing your content.
To ensure Google can find your content, move all JavaScript and all style information into external files and reference them from the page. Here are some free tools to help you validate the code on your website:
CSS validator
HTML validator
CSS Compressor
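Alongside the validators, a quick script can flag JavaScript and style information that has not yet been moved into external files. This is a rough sketch using Python's built-in html.parser, not a replacement for the tools listed above; InlineCodeFinder and find_inline_code are hypothetical names.

```python
from html.parser import HTMLParser


class InlineCodeFinder(HTMLParser):
    """Collect inline scripts and styles found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A <script> without a src attribute carries its code inline.
        if tag == "script" and "src" not in attrs:
            self.findings.append("inline <script> block")
        elif tag == "style":
            self.findings.append("inline <style> block")
        if "style" in attrs:
            self.findings.append(f"style attribute on <{tag}>")


def find_inline_code(html):
    """Return a list describing each piece of inline JavaScript or CSS."""
    finder = InlineCodeFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

A page that references only external files, such as a linked stylesheet and a script with a src attribute, returns an empty list.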
This article was written by Shaun Pinney, a member of our consultant team at HubSpot. Check out Shaun's Bio.