Even though Google does not penalize websites for duplicate content, duplicate content does affect indexing and SERP rankings. For many sites, an essential part of search engine optimization involves avoiding this issue.
If you’re operating a blog, it’s usually not a big deal. There’s at least one WordPress plugin that automatically adds the canonical to your blog post.
However, e-commerce websites are a different story. They need to be vigilant about duplicate content, especially those using faceted or filtered navigation. Because search engines crawl URLs and not pages, you can end up with situations like this:
They’re the same page with basically the same content but different URLs.
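To make this concrete, here is a minimal Python sketch of how filtered URLs collapse to one page. The URLs and the filter parameter names (color, size, sort) are hypothetical placeholders, not taken from a real store.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical filter/sort parameters that change the URL but not the content.
FACET_PARAMS = {"color", "size", "sort"}

def strip_facets(url: str) -> str:
    """Return the URL with facet parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Three crawlable URLs (assumed for illustration) that show the same product list.
urls = [
    "https://yourwebsite.com/shoes",
    "https://yourwebsite.com/shoes?color=red",
    "https://yourwebsite.com/shoes?sort=price&color=red",
]
unique_pages = {strip_facets(u) for u in urls}
print(unique_pages)
```

A crawler sees three URLs here, while the set contains a single page. That single URL is the one you would normally name as the canonical.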
To remove duplicate content issues, Google recommends that webmasters use canonical tags, aka “rel canonical.” However, implementing them is not that simple. If the tag contains errors when it’s added to your site’s backend, it can hamper the crawling of your website. Therefore, you need to understand how canonical tags work and how to implement them correctly.
In this post, I’ll cover what you need to know about canonical tags. I’ll also show you how to implement a canonical tag correctly in your site’s backend.
What is a Canonical Tag?
A Canonical Tag is an HTML link element that is used to prevent duplicate content issues. The purpose of this tag is to inform search engines that the current page is a duplicate of some other page (the original). It also signals that the link equity should be consolidated on the specified original page. Search engines treat the tag as a strong hint rather than a command.
A Canonical Tag can be regarded as a reference to the content’s source. This tag helps search engines differentiate the original page from its duplicates. Based on this differentiation, search engines rank the original page in the SERPs and direct the link equity received by the duplicate pages towards it.
How to implement a Canonical Tag?
There are five different ways to specify the canonical page:
- rel=canonical tag element (for HTML pages)
- rel=canonical HTTP header (for PDFs and other non-HTML pages)
- Sitemap (easy to implement but less powerful than rel=canonical)
- 301 redirect (a permanent redirect used when deprecating a duplicate page)
- AMP variant (if one of the variant pages is an AMP page)
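For the HTTP header option in the list above, the canonical is sent as a Link response header instead of a tag in the page. Here is a minimal Python sketch of building that header. The function name and the PDF URL are my own placeholders, and how you attach the header depends on your server or framework.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Return the (name, value) pair for a rel=canonical HTTP response header."""
    # Reject characters that would break the header syntax.
    if any(ch in canonical_url for ch in "<>\r\n"):
        raise ValueError("URL contains characters that are unsafe in a header")
    return ("Link", f'<{canonical_url}>; rel="canonical"')

# Hypothetical PDF that should be treated as the original version.
name, value = canonical_link_header("https://yourwebsite.com/downloads/white-paper.pdf")
print(f"{name}: {value}")
```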
The canonical tag is pretty straightforward and the most popular method of implementation, so we’ll focus on that technique.
Here’s a canonical code:
<link rel="canonical" href="link">
You need to add this code to a given duplicate page and specify the authentic/original page by adding its link like this:
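For example, if your templates are generated in code, a small helper could produce the tag for you. This is only a sketch in Python: the function name and the URL are hypothetical, and the escaping step is there so that a URL containing & still produces valid HTML.

```python
from html import escape

def canonical_tag(original_url: str) -> str:
    """Build the link element that points a duplicate page at its original."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'

# Hypothetical original page; every duplicate of it would carry this tag.
print(canonical_tag("https://yourwebsite.com/shoes"))
```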
That’s how easy it is to implement a canonical tag. However, there are specific tips that you need to keep in mind while optimizing the code.
Tips to CORRECTLY implement a Canonical Tag
The process of applying a canonical tag can be easy, but to get the best out of it, you need to remember these tips:
1. Add the code to the <head> section
The canonical tag must be included in the <head> section of your duplicate page’s HTML code. This helps crawlers discover the original variant before utilizing the crawl budget on the given page. Also, this allows the link equity to transfer from the duplicate page to the canonicalized one.
2. Specify the correct domain version
If you have performed the switch from HTTP to HTTPS, you must change the URLs in your canonical tags. Otherwise, the search engine bot will get confused about the domain to rank; this can affect the crawling and indexing of your website. Wherever you have used ‘http’ in rel=canonical tag, you must replace it with ‘https’.
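If your canonical URLs are stored in a database or a template variable, the replacement can be scripted. Here is a minimal Python sketch, assuming the site is fully served over HTTPS and using a made-up URL:

```python
from urllib.parse import urlsplit, urlunsplit

def to_https(canonical_url: str) -> str:
    """Rewrite an http:// canonical URL to https://, leaving the rest untouched."""
    parts = urlsplit(canonical_url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)

# Hypothetical canonical left over from before the migration.
print(to_https("http://yourwebsite.com/shoes?page=2"))
```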
3. Limit canonical tags to ONE per page
You must include only one canonical tag per page. If a crawler discovers more than one canonical tag on a given page, then it will simply ignore them all.
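Placement and count are easy to check automatically. The sketch below uses Python’s built-in HTML parser to report a page that has no canonical tag, more than one, or one outside the <head> section. The class and function names are my own, and a real audit would run this against the fetched HTML of each page.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect rel=canonical links and note whether each sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attr.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def audit(html: str) -> list[str]:
    """Return a list of problems with the canonical tags in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    problems = []
    if len(finder.found) != 1:
        problems.append(f"expected 1 canonical tag, found {len(finder.found)}")
    if any(not inside_head for _, inside_head in finder.found):
        problems.append("canonical tag outside <head>")
    return problems

# Hypothetical page with a single canonical tag in the right place.
good_page = (
    '<html><head><link rel="canonical" href="https://yourwebsite.com/shoes">'
    "</head><body></body></html>"
)
print(audit(good_page))
```

An empty list means the page passes both checks.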
Common mistakes when implementing a Canonical Tag
A canonical tag is an excellent way of telling search engines that a given webpage is a duplicate of another page, and that all the link equity received by the copy should be directed to the linked original page.
However, there are certain mistakes that webmasters unintentionally make when applying a canonical tag. It is crucial that you avoid these canonical tag misconfigurations:
1. Including non-canonical URLs in the Sitemaps
When you are preparing a sitemap for your website, remember to add only the canonicalized URLs and not their duplicates. It is logical to do so because all those duplicate pages will direct the crawler to a specified canonicalized URL.
If you include both the copy of a page and its canonicalized version, then this will waste the crawl budget assigned to your website. This will lead to your site being crawled inefficiently, and due to this, the important pages of your site might not get indexed.
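Here is a minimal Python sketch of that filtering step. It assumes you already have a crawl result that maps each page URL to the canonical URL it declares; the URLs below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical crawl result: each page URL mapped to the canonical URL it declares.
pages = {
    "https://yourwebsite.com/shoes": "https://yourwebsite.com/shoes",
    "https://yourwebsite.com/shoes?color=red": "https://yourwebsite.com/shoes",
    "https://yourwebsite.com/hats": "https://yourwebsite.com/hats",
}

def build_sitemap(pages: dict[str, str]) -> str:
    """Build sitemap XML that lists only URLs that are their own canonical."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, canonical in sorted(pages.items()):
        if url == canonical:  # skip duplicates that point to another URL
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The filtered URL never reaches the sitemap, so crawl budget is not spent on it twice.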
2. Blocking the crawling or indexing of canonicalized URLs
It’s incredible the difference one character can make! The following lines in the robots.txt file allow search engine bots to access all files:
User-agent: *
Disallow:
While these lines block all bots from those files:
User-agent: *
Disallow: /
If you block the crawling of a canonicalized URL through robots.txt, or block its indexing with a noindex tag, then no link equity will be transferred from the non-canonical page to the canonical one. Therefore, you must make sure that you are not blocking any canonicalized URLs.
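You can test a canonical URL against your robots.txt rules before publishing them. This sketch uses Python’s standard robots.txt parser. The two rule sets are the usual allow-all and block-all forms, and the canonical URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether the given robots.txt rules let the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# An empty Disallow value allows everything; a single slash blocks everything.
allow_all = "User-agent: *\nDisallow:"
block_all = "User-agent: *\nDisallow: /"

canonical = "https://yourwebsite.com/shoes"  # hypothetical canonical URL
print(is_crawlable(allow_all, canonical))
print(is_crawlable(block_all, canonical))
```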
3. Applying a 4XX HTTP Status Code
Returning a 4XX status code for a canonicalized URL has the same effect as blocking it with robots.txt or a noindex tag. The crawler will not be able to fetch the canonical version of the page. Therefore, it cannot transfer the link equity from the duplicate to the original.
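If you already log the status code returned by each canonical target, flagging the broken ones takes only a few lines. The sketch below is in Python; the URLs and status codes are invented sample data, not real crawl output.

```python
from http import HTTPStatus

def client_errors(status_by_url: dict[str, int]) -> dict[str, str]:
    """Return canonical targets that answer with a 4XX code, with the reason phrase."""
    flagged = {}
    for url, status in status_by_url.items():
        if 400 <= status <= 499:
            try:
                reason = HTTPStatus(status).phrase
            except ValueError:  # non-standard 4XX code
                reason = "Client Error"
            flagged[url] = f"{status} {reason}"
    return flagged

# Hypothetical results from fetching each canonical target.
statuses = {
    "https://yourwebsite.com/shoes": 200,
    "https://yourwebsite.com/old-shoes": 404,
    "https://yourwebsite.com/retired": 410,
}
flagged = client_errors(statuses)
print(flagged)
```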
4. Relative URLs used instead of Absolute URLs
The crawlers read the relative URLs mentioned in the canonical tag as relative to the current page. So if you include:
<link rel="canonical" href="images/donuts.jpg">
Then, the crawler will regard this as an image under the ‘images’ subdirectory of your domain (yourwebsite.com).
However, sometimes, websites use the relative URL:
Now, the crawler will read this as:
And this is not a valid URL; therefore, Google recommends that websites use absolute URLs:
This way your purpose of using the rel=canonical is fulfilled efficiently.
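To see how a relative href is resolved, you can reproduce the logic with Python’s urljoin. This is a sketch under assumptions: the page URL and the href values are placeholders, and is_absolute is a helper I made up for the check.

```python
from urllib.parse import urljoin, urlsplit

def resolve_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href against the URL of the page that contains it."""
    return urljoin(page_url, href)

def is_absolute(href: str) -> bool:
    """True when the href carries both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)

page = "https://yourwebsite.com/"  # hypothetical page carrying the tag

# A relative path resolves under the current page's directory.
print(resolve_canonical(page, "images/donuts.jpg"))

# A domain written without its scheme is treated as just another relative path.
print(resolve_canonical(page, "yourwebsite.com/donuts.html"))

# Only an href with scheme and host is safe from this kind of misreading.
print(is_absolute("yourwebsite.com/donuts.html"))
print(is_absolute("https://yourwebsite.com/donuts.html"))
```

Running is_absolute over every canonical href on your site is a quick way to catch this mistake.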
5. Adding the canonical tag in <body>
Google recommends that websites include the rel=canonical as early as possible on the page, which means inside the <head> section. However, webmasters sometimes place the canonical tag inside <body>, and in this case, Google disregards the canonical tag.
This mistake can be resolved by a simple double-check. Once you’ve added the canonical tag, you should always recheck its placement to ensure it is done correctly.
6. Not Using self-referencing canonical tag for paginated pages
Sometimes webmasters point the rel=canonical of every page in a paginated series to the first page; this is not considered a correct usage of the canonical tag. When you canonicalize the other pages in the series to the first one, only the first page gets indexed. Hence, it is advised that you use self-referencing canonicals on all the paginated pages. Apart from this, you can use the rel=prev and rel=next tags for pagination, although Google announced in 2019 that it no longer uses them as an indexing signal.
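Here is a minimal Python sketch of self-referencing canonicals for a paginated series. It assumes pages are addressed with a ?page=N query parameter and that page 1 lives at the base URL; both are assumptions, so adjust them to your own URL scheme.

```python
def paginated_canonicals(base_url: str, page_count: int) -> dict[int, str]:
    """Map each page number to a canonical tag that points at that same page."""
    tags = {}
    for page in range(1, page_count + 1):
        # Assumed URL pattern: page 1 is the base URL, later pages add ?page=N.
        url = base_url if page == 1 else f"{base_url}?page={page}"
        tags[page] = f'<link rel="canonical" href="{url}">'
    return tags

# Hypothetical three-page category listing.
tags = paginated_canonicals("https://yourwebsite.com/shoes", 3)
for page, tag in tags.items():
    print(page, tag)
```

Each page names itself as the canonical, so none of them is dropped in favor of page 1.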
How does Canonical Tag affect SEO?
The correct implementation of canonical tags can boost your site’s SEO. When you apply a canonical tag on a page’s duplicate, all of the duplicate’s link equity is directed to the canonicalized webpage. This improves the page’s authority. Also, through this tag, you can inform the crawlers about the authority page that must be ranked.
Applying a canonical tag on a page may seem easy, but there are certain tips that you need to remember for effective implementation. In the initial stages, the whole process might be a bit complicated to understand; however, after a while, you will know EXACTLY how a canonical should be used.
Just keep in mind these tips and steer clear of the common mistakes, and you’ll be just fine.
Sahil is the CEO and Founder of Rankwatch - a platform, which helps companies and brands stay ahead with their SEO efforts in the ever growing internet landscape. Sahil likes making creative products that can help in automation of mundane tasks and he can spend endless nights implementing new technologies and ideas. You can connect with him and the Rankwatch team on Facebook or Twitter.