It may not be obvious to most Internet users, but it’s quite common for websites to make the same content accessible via different URLs. Search engines refer to this as a duplicate content issue, and they don’t like it: the same content appears in more than one place on the Internet, so their objection is understandable.
There are different types of duplicate content issues. For example, when someone steals your web copy or articles and publishes them on another website, they are creating duplicates. You can fight these content thieves with the website duplicate content checker PlagSpotter, which automatically scans your pages and keeps monitoring them in the future. Here, though, we’ll talk about a different type of duplicate content: the kind that lives on your own website. We exclude ecommerce websites, which raise quite specific issues of their own.
Duplicate Content and Canonical URLs
According to Google’s Webmaster guidelines:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
And here’s what they say on canonicalization:
To gain more control over how your URLs appear in search results, and to consolidate properties, such as link popularity, we recommend that you pick a canonical (preferred) URL as the preferred version of the page. You can indicate your preference to Google in a number of ways. We recommend them all, though none of them is required (if you don’t indicate a canonical URL, we’ll identify what we think is the best version).
So How Does This All Apply To Your Website?
Your website has potential duplicate content issues if your pages are accessible through different URLs, for example if you can reach your About page at http://yourwebsite.com/AboutPage, http://yourwebsite.com/aboutpage, http://yourwebsite.com/aboutpage.html, and so on. You can implement 301 redirects to send users to the version you prefer, or tell Google which version to index, so that the issue doesn’t affect your whole domain.
Google cannot know which form of your URLs you prefer unless you tell it: whether links should include the ‘www’ part, a slash at the end, the name of the file, uppercase or lowercase letters, etc.
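To make the idea of a preferred version concrete, here is a minimal Python sketch that maps several URL variants onto one form. The rules it applies (no ‘www’, lowercase path, no .html extension, no trailing slash) are only one possible set of preferences chosen for illustration, and the function name preferred_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url: str) -> str:
    """Map a URL variant to one preferred form (illustrative rules only)."""
    parts = urlsplit(url)

    # Assumed preference: the non-www host, in lowercase.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Assumed preference: lowercase path, no ".html", no trailing slash.
    path = parts.path.lower()
    if path.endswith(".html"):
        path = path[: -len(".html")]
    path = path.rstrip("/") or "/"

    # Keep the query string, drop the fragment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://yourwebsite.com/AboutPage",
    "http://yourwebsite.com/aboutpage",
    "http://yourwebsite.com/aboutpage.html",
    "http://www.yourwebsite.com/aboutpage/",
]
unique_targets = {preferred_url(u) for u in variants}
print(unique_targets)
```

All four variants collapse to a single URL here, which is the behaviour you want search engines to see. Your own preferences may differ (for example, keeping the ‘www’), in which case the rules change but the approach stays the same.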
Sometimes duplicate content problems also arise because the CMS creates dynamic URLs in addition to the original ones. Unless the search engines are told otherwise, they will index these as separate pages even though they are not.
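As a rough illustration of how dynamic URLs multiply, the Python sketch below removes query parameters that do not change what a page shows. The parameter names in IGNORED_PARAMS are assumptions made up for this example; check which parameters your own CMS actually adds.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "ref", "sort"}


def strip_dynamic_params(url: str) -> str:
    """Drop ignorable query parameters and keep the meaningful ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(strip_dynamic_params("http://yourwebsite.com/products?id=7&sessionid=abc123"))
print(strip_dynamic_params("http://yourwebsite.com/aboutpage?ref=home"))
```

Two URLs that come out identical after this step are serving the same page under different addresses.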
How To Find Duplicate Content On Your Website?
You can easily find duplicate content on your website. Here are a few ways to do so:
- To check for duplicate content issues at the domain level, you can use the free tool Redirect Check, or you can Google a specific page on your site. If the results show more than one link, look into them to find out why. To check how a certain type of file is indexed by the search engines, type in site:yourwebsite.com filetype:pdf (or any other file type you want to look up).
- Check your sitemap to see whether you are telling search engines to index multiple URLs for the same pages on your website. Your sitemap should include only the pages you want search engines to know about, but because of dynamic URL parameters and preferred-domain issues, search engines often index more pages than that. For a quick check, Google site:yourwebsite.com inurl: followed by the dynamic part of the URL and see whether Google has indexed it.
- Examine the URLs that have already sent organic traffic to your website. Log into your Analytics account, check where your visitors come from, and write down all the URLs you see but don’t want the engines to index so you can fix them later.
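The sitemap check can also be scripted. The following Python sketch assumes a standard XML sitemap using the sitemaps.org namespace and groups entries that differ only by ‘www’, letter case, or a trailing slash. The sample sitemap and the grouping rules are illustrative assumptions, not a complete duplicate detector.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap content for the example.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yourwebsite.com/aboutpage</loc></url>
  <url><loc>http://www.yourwebsite.com/AboutPage/</loc></url>
  <url><loc>http://yourwebsite.com/contact</loc></url>
</urlset>"""


def duplicate_groups(sitemap_xml: str) -> dict:
    """Return sitemap URLs that share the same normalized host and path."""
    root = ET.fromstring(sitemap_xml)
    groups = defaultdict(list)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        path = parts.path.lower().rstrip("/") or "/"
        groups[host + path].append(url)
    # Only groups with more than one URL are potential duplicates.
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


duplicates = duplicate_groups(SAMPLE_SITEMAP)
for key, urls in duplicates.items():
    print(key, "->", urls)
```

Any group this prints is a page that your sitemap lists under more than one address.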
Free Tools To Check For Duplicate Content
To start solving the problem you must first understand it. To find out exactly which duplicate content issues your website is facing, get a list of all the pages that Google has indexed on your website. You can either use the Screaming Frog SEO Spider tool, or follow these instructions:
- Download and enable the SEO Quake browser plugin.
- Turn off Instant results in Google Preferences and set the Results per page to 100.
- Search site:yourwebsite.com on Google. The results will only show pages from your website.
- Check the information and icons under the search box and click Show as CSV or Save to download all indexed web pages.
- Open the file in Excel and sort out the URLs you want and don’t want in the search results.
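If you would rather not sort the export by hand, the last step can be approximated with a short Python script. This sketch assumes a comma-separated file with a column named url, which is a guess made for illustration (check the header of your actual export), and it uses inline sample data instead of a real file. The rules in is_preferred are likewise only example preferences.

```python
import csv
import io
from urllib.parse import urlsplit

# Made-up export data; the "url" column name is an assumption.
SAMPLE_EXPORT = """url
http://yourwebsite.com/aboutpage
http://yourwebsite.com/AboutPage
http://yourwebsite.com/aboutpage.html
http://yourwebsite.com/products?sessionid=abc123
"""


def is_preferred(url: str) -> bool:
    """True if the URL already matches the example preferences."""
    parts = urlsplit(url)
    return (
        not parts.netloc.lower().startswith("www.")
        and parts.path == parts.path.lower()
        and not parts.path.endswith(".html")
        and not parts.query
    )


def split_export(csv_text: str, column: str = "url"):
    """Split exported URLs into ones to keep and ones to review."""
    keep, review = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row[column].strip()
        (keep if is_preferred(url) else review).append(url)
    return keep, review


keep, review = split_export(SAMPLE_EXPORT)
print("keep:", keep)
print("review:", review)
```

The review list is only a starting point: a URL with a query string may still be a legitimate page, so look through it before requesting any removals.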
Now, Let’s Fix Duplicate Content Issues.
You should first decide which domain URL you prefer and set it in Webmaster Tools. Then redirect all other versions to the preferred one with 301 redirects; if you aren’t sure how, ask your hosting provider to assist you.
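How you set up 301 redirects depends on your server, and many sites do it in the web server configuration. Purely as an illustration of the logic, here is a minimal Python WSGI sketch that answers any non-preferred variant with a 301 pointing to the preferred URL. The preference rules (no ‘www’, lowercase, no .html, no trailing slash) and the names canonical_redirect_app and simulate are assumptions for this example.

```python
def canonical_redirect_app(environ, start_response):
    """WSGI app: 301-redirect non-preferred URL variants (example rules)."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")

    target_host = host.lower()
    if target_host.startswith("www."):
        target_host = target_host[len("www."):]
    target_path = path.lower()
    if target_path.endswith(".html"):
        target_path = target_path[: -len(".html")]
    target_path = target_path.rstrip("/") or "/"

    if (target_host, target_path) != (host, path):
        # Permanent redirect to the preferred version.
        location = f"{scheme}://{target_host}{target_path}"
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"preferred URL\n"]


def simulate(host, path):
    """Call the app without a server; return (status, Location header)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    canonical_redirect_app({"HTTP_HOST": host, "PATH_INFO": path}, start_response)
    return captured["status"], captured["headers"].get("Location")


print(simulate("www.yourwebsite.com", "/AboutPage"))
print(simulate("yourwebsite.com", "/aboutpage"))
```

To try it in a browser, the app could be served locally with wsgiref.simple_server.make_server from the Python standard library. On a production site your hosting provider’s redirect settings are usually the better place for these rules.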
To remove the web pages you previously selected in your Excel sheet, log into your Google Webmaster Tools account and go to Optimization/Remove URLs. Enter the URLs you want to remove, select “Remove page from a search result and cache” and submit the request. Repeat this for all web pages you don’t want to see in the search results. On this page you can also track the status of each removal request to see when the submitted URLs have been deleted. This is usually done within 12 hours.
You can tell Google to keep ignoring certain dynamic pages using parameter handling. Check the URL parameters Google has already picked up for your site and edit the options where needed. In addition, implement canonicalization on your website to make sure Google understands which version is the correct one.
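One common way to implement canonicalization is a link element with rel="canonical" in the head of each page. The Python sketch below checks which canonical URL, if any, a page declares. The sample HTML and the names CanonicalFinder and find_canonical are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up page markup for the example.
SAMPLE_PAGE = (
    "<html><head>"
    '<link rel="canonical" href="http://yourwebsite.com/aboutpage">'
    "</head><body>About us</body></html>"
)
print(find_canonical(SAMPLE_PAGE))
```

Running a check like this over every variant of a page should return the same URL each time. If a variant returns nothing, or a different address, that page is not yet telling search engines which version you prefer.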