Inbound Internet Marketing Blog

SEO, Blogging, Social Media, Landing Pages, Lead Generation and Analytics

5 Advanced Web Master Technical Setup Tips - Part 1

This is the first in a two-part series on advanced web master setup. Producing compelling content is only half the battle; making it easy for Google to crawl and index that content is equally important. In today's post we will look at five technical best practices for improving the crawlability of your web site.

1. Custom 404 Page

A 404 error page is served when a user or a search engine attempts to view a non-existent page on your website. This is often caused by a broken link or by a URL that has been moved or deleted.

Broken links hinder Google's ability to crawl your site. A custom 404 error page that links back into your site gives both visitors and Google somewhere to go after they hit a broken link.

404 Error Page Best Practices:

  • Include your site's navigation and a link to your site map.
  • Put a message on the page that informs the user they have landed on a page that no longer exists.
  • Do you use Google Analytics? If so, you can track where the referring broken link is located on your own site, and fix it.
  • Here is an example of a 404 error page from SEOmoz.
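
As a rough sketch of the practices above, here is one way a custom 404 page could be assembled in Python. The function name build_404_page, the navigation links and the site map path are all made up for illustration; the point is that the response keeps its 404 status while still giving visitors and crawlers links to follow.

```python
from html import escape


def build_404_page(nav_links, sitemap_url, requested_path):
    """Return (status_code, html) for a custom 404 page.

    nav_links is a list of (label, url) pairs. All names are illustrative.
    """
    nav_items = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in nav_links
    )
    html = (
        "<html><head><title>Page not found</title></head><body>"
        f"<p>Sorry, the page {escape(requested_path)} no longer exists.</p>"
        f"<ul>{nav_items}</ul>"
        f'<p><a href="{escape(sitemap_url, quote=True)}">Site map</a></p>'
        "</body></html>"
    )
    # Keep the real 404 status so crawlers know the URL itself is gone.
    return 404, html


status, page = build_404_page(
    [("Home", "/"), ("Blog", "/blog")], "/sitemap", "/old-page"
)
```

In a real site this would be wired into whatever web server or framework you use; the status code and the links are the parts that matter.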

2. XML Sitemaps

An XML sitemap gives Google a list of all of your web pages. XML sitemaps do not guarantee that your site will be crawled, indexed and ranked; they are simply a first step towards making it easy for Google to see your most recent content. You can also assign each page or directory a priority on a scale from 0 to 1, for example 1 for your homepage and .8, .6, etc. for directories.

Getting Started: Set up a Google Webmaster Tools account. If you already have an account, go create a sitemap. For more information on setting up XML sitemaps, visit XML-Sitemaps.com.
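
If you would rather see what such a file contains, the following is a minimal sketch that builds a sitemap with Python's standard library. The URLs and priority values are placeholders, and build_sitemap is a name invented for this example; only the urlset, url, loc and priority elements come from the sitemap format itself.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is a value between 0.0 and 1.0, written with one decimal.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages: homepage first, then directories with lower priority.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/blog/", 0.8),
    ("http://www.example.com/blog/archive/", 0.6),
])
```

The resulting string can be saved as sitemap.xml and submitted through Google Webmaster Tools.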

3. 301 Redirects/REL canonicals

When you update a URL, remove a page or change the location of a page, you need to let Google know the new location of your content. There are two mechanisms to be aware of: rel canonical tags and 301 redirects. Let's review the differences between the two.

URL Canonicalization - Tells Google multiple pages on your site should be considered as one page (just like a 301 redirect), without redirecting to a new URL. This is common if you have two or more pages of content that are similar or identical and you only want one version of the content to appear in search.

301 redirects - Instruct the search engine to stop indexing a specific URL and replace it with a new URL. A 301 redirect points all traffic (search engines and human visitors) to the new page and passes the majority of link value from the old page to the new. If you remove or change the name of a URL, be sure to implement 301 permanent redirects.

Google offers a great rel canonical resource where you can learn more, and detailed instructions on setting up 301 redirects are also available.
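
To make the difference concrete, here is a small Python sketch of both mechanisms. The REDIRECTS table, the paths in it and the function names are hypothetical; in practice redirects are usually configured in your web server or CMS rather than written by hand like this.

```python
from html import escape

# Hypothetical map of retired paths to their new locations.
REDIRECTS = {
    "/old-pricing.html": "/pricing/",
    "/2009/seo-tips": "/blog/seo-tips/",
}


def resolve(path):
    """Return (status, headers) for a request path."""
    if path in REDIRECTS:
        # 301 means "moved permanently"; Location carries the new URL.
        return 301, {"Location": REDIRECTS[path]}
    return 200, {}


def canonical_tag(preferred_url):
    """Return the link element to place in the head of each duplicate page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'
```

A redirect sends the visitor to a different URL, while the canonical tag leaves the visitor where they are and only tells the search engine which version to show in results.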

4. Robots.txt

Search engines look in the root directory of your website for a file called "robots.txt". This file tells the search engine which files on your site it should not crawl. Setting up a robots.txt file is an important step if your site contains content you do not want indexed by Google. Keep in mind that the file is a request that well-behaved robots honor, not access control, so truly confidential pages still need password protection.

The robots.txt file lives in the top-level directory of your site, for example: http://www.mysite.com/robots.txt

A robots.txt record consists of two kinds of fields: a User-agent line and one or more Disallow lines. If a search engine robot wants to crawl the page http://www.example.com/welcome.html, it first checks for http://www.example.com/robots.txt and finds:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

The "User-agent: *" line means this section applies to all robots. Each "Disallow:" line tells the robot not to visit any URL whose path starts with the listed value. For more information, visit the Robots.txt organization web site.
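
You can check how a robot would read these rules with Python's built-in urllib.robotparser module. This is only a sketch: the robot name "ExampleBot" and the test URLs under /tmp/ are invented for the example, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the rules let the given robot fetch the URL."""
    return parser.can_fetch(agent, url)


# welcome.html matches no Disallow prefix; anything under /tmp/ does.
welcome_ok = allowed("http://www.example.com/welcome.html")
tmp_ok = allowed("http://www.example.com/tmp/report.html")
```

Here welcome_ok is true and tmp_ok is false, which matches the description above: the robot may crawl the welcome page but not the listed directories.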

5. HTML/CSS Organization

Spiders crawl the code of your site from the top down, so placing your meta data and page copy at the top helps Google find this content faster. JavaScript, Cascading Style Sheets (CSS) and other non-HTML languages are commonly used to design websites. When this non-HTML code is embedded directly in the page, it can get in the way of Google accessing your content.

To ensure Google can find your content, move all JavaScript and all CSS style information into external files and reference those files from the page. Here are some free tools to help you validate the code on your web site:

CSS validator
HTML validator
CSS Compressor
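
Alongside those tools, a quick way to spot JavaScript or CSS that has not been moved into external files is a short script. The sketch below uses Python's standard html.parser module; the class name InlineCodeFinder and the sample markup are invented for illustration, and it only looks for style blocks and script tags without a src attribute.

```python
from html.parser import HTMLParser


class InlineCodeFinder(HTMLParser):
    """Collect <style> blocks and <script> tags that lack a src attribute."""

    def __init__(self):
        super().__init__()
        self.inline = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.inline.append("style")
        elif tag == "script" and "src" not in dict(attrs):
            self.inline.append("script")


def find_inline_code(html):
    """Return the inline style/script tags found, in document order."""
    finder = InlineCodeFinder()
    finder.feed(html)
    finder.close()
    return finder.inline


# Sample page: one external stylesheet and script, plus one inline of each.
sample = (
    '<html><head><link rel="stylesheet" href="/site.css">'
    '<script src="/site.js"></script>'
    "<style>p { color: red; }</style></head>"
    "<body><script>var x = 1;</script><p>Hello</p></body></html>"
)
inline_found = find_inline_code(sample)
```

An empty result suggests the page already references its scripts and styles externally; anything in the list is a candidate for moving into its own file.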

This article is written by Shaun Pinney, a member of our consultant team at HubSpot. Check out Shaun's Bio.

Posted by Shaun Pinney on Mon, Apr 12, 2010 @ 12:00 PM

COMMENTS

Here's a few helpful links: 
 
Setting up Google Analytics to track the referring URL of the broken link 
 
Several robots.txt file examples

posted on Monday, April 12, 2010 at 12:08 PM by Ben Griffiths


The explanation on the Google webmaster blog was a little vague. If I were to create a REL canonical for two identical pages, would Google combine the relevance and authority of the two pages into a "superpage," rather than having them each lower in the rankings separately?

posted on Monday, April 12, 2010 at 12:23 PM by Jamie Contonio


If a page that I don't want a bot to visit is password protected, do I still need to list it in the robots.txt file?

posted on Monday, April 12, 2010 at 1:08 PM by EM


Great post. Certainly reminded me to tidy up a few things and some really useful links. Could you change the HTML Validator link please as this seems to incorrectly link to the css compressor page, Thanks, Simon.

posted on Monday, April 12, 2010 at 2:59 PM by Simon


Great post. I'm glad to see more people putting emphasis on 301 redirects and using Google Webmaster Tools. Check analytics and Google's webmaster tools often for crawl errors and fix them.

posted on Thursday, April 15, 2010 at 1:40 PM by Gregory Feathers

