It's easy to be fooled into thinking SEO is just about link building. There are so many posts covering the latest developments on which links are good or bad that we sometimes forget about the huge gains we can make simply by fixing problems on our own site.
One of the biggest culprits for lost traffic and rankings is duplicate content. Luckily, you have control over your own site, so you have the power to fix it.
What Is Duplicate Content?
Duplicate content exists when there is more than one version of a page indexed by the search engines. Where there are multiple versions of a page indexed, it’s difficult for search engines to decide what page to show for a relevant search query.
Search engines aim to provide users with the best experience possible, which means they will rarely show duplicate pieces of content. Instead, they will be forced to choose what version they feel is the best fit for that query.
Causes of Duplicate Content
Three of the biggest offenders for causing duplicate content are:
1) URL Parameters
URLs often contain extra parameters, either because of how visits are tracked (marketing campaign IDs, analytics IDs) or because the CMS a website uses appends its own custom parameters.
For example, the following URLs could all lead to the same page:
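The URLs below are hypothetical, but the pattern is typical of tracking and CMS parameters: every variant serves the same product page, yet search engines may index each one separately.

```
http://www.example.com/product
http://www.example.com/product?utm_source=newsletter&utm_campaign=spring-sale
http://www.example.com/product?ref=homepage-banner
```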
2) Printer friendly pages
Often a web page will have an option to produce a printer friendly version of that page. This can often lead to duplicate content issues. For example, the following URLs would lead to the same page.
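As an illustration (these URL structures are hypothetical), a printer-friendly version is often served at a separate address that duplicates the original article word for word:

```
http://www.example.com/article
http://www.example.com/article?print=1
http://www.example.com/print/article
```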
3) Session IDs
Sites may often want to track a user's session across their website. For example, sites can offer personalized features based upon who that user is and their past interactions with the site, or an ecommerce store may remember what that person added to their shopping cart on their last visit.
Session IDs get appended to the URL, and this causes duplicate versions of a page to exist. For example, the following URLs would lead to the same page.
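Both of these hypothetical URLs point to the same cart page; only the appended session ID differs, so every new session can create another "page" for search engines to crawl:

```
http://www.example.com/cart
http://www.example.com/cart?sessionid=8f2a9c41
```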
Duplicate Content Problems
The biggest issues caused by duplicate content are:
- Search engines don’t know which version of the page they should index
- Search engines don’t know what page the link authority should be assigned to, or if it should be divided across multiple versions.
- Search engines don’t know what version of the page to rank for a relevant search query.
This can result in web pages losing both rankings and organic traffic.
Finding Duplicate Content
There are two tools you can use to find duplicate content problems for your site: Google Webmaster Tools and Screaming Frog.
1) Google Webmaster Tools
Using Google Webmaster Tools (now Google Search Console), you can easily find pages with duplicate titles and meta descriptions. Simply click on “HTML Improvements” under “Search Appearance.”
Clicking on one of these links will show you what pages have duplicate meta descriptions and page titles.
2) Screaming Frog
You can download the Screaming Frog SEO Spider and use it to crawl up to 500 URLs for free. This application lets you do a lot of different things, including finding duplicate content problems.
Page Titles/Meta Descriptions
You can find duplicate page titles by simply clicking on the tab “Page Titles” or “Meta Description” and filtering for “Duplicate.”
You can also find pages that have multiple URL versions by clicking on the “URL” tab and filtering for “Duplicate.”
For a complete guide on all the different things you can do with Screaming Frog, check out this post from Seer Interactive.
Fixing Duplicate Content
Duplicate content is a problem that can impact both your organic traffic and web rankings, but it’s something that you can easily fix. The three quickest ways to address duplicate content problems are:
1) Canonical Tag
Using the canonical tag, you can tell search engines which version of a page you want them to treat as the authoritative one and return for relevant search queries. The canonical tag is placed in the <head> of a web page.
The canonical tag is the best approach when you want to have multiple versions of a page available to users. If you're using the HubSpot COS, this will be taken care of automatically, so no manual labor required.
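For example, to mark one version of a page as the preferred one (the URL here is illustrative), you would add the following to the <head> of every duplicate version:

```
<link rel="canonical" href="http://www.example.com/product">
```

Search engines will then consolidate indexing and link authority onto the canonical URL while the other versions remain available to users.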
2) 301 Redirect
A 301 redirect will redirect all legacy pages to a new URL. It tells Google to pass all the link authority from these pages to the new URL and to rank that URL for relevant search queries.
The 301 redirect is the best option when you don’t have any need for multiple versions of a page to be available.
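How you set up a 301 redirect depends on your server. On an Apache server, for instance, you could add a rule like this to the site's .htaccess file (the paths here are illustrative):

```
# Permanently redirect the old page to the new URL
Redirect 301 /old-page http://www.example.com/new-page
```

Visitors and crawlers requesting the old URL are then sent straight to the new one, and search engines pass the old page's link authority along with them.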
3) Meta Tags
You can use the robots meta tag to tell search engines not to index a particular page (and, with the nofollow value, not to follow its links either):
<meta name="robots" content="noindex, nofollow">
Meta tags work best when you want that page to be available to the user but not indexed, e.g. terms and conditions.
Duplicate content is a real problem for sites, but one that can be easily solved using the advice above. If you want to learn more about duplicate content, watch this video series from the SEO experts at Dejan SEO on how you can fix it for your site.