Posted by meghanpahinui

What is duplicate content, and why is it a concern for your website? Better yet, how can you find it and fix it?

In this week's episode of Whiteboard Friday, Moz Learn Team specialist, Meghan, walks through some handy (and hunger-inducing) analogies to help you answer these questions! 

Anatomy of a Perfect Pitch Email

Click on the whiteboard image above to open a high resolution version in a new tab!

Video Transcription

Hey, Moz fans. Welcome to another edition of Whiteboard Friday. I'm Meghan, and I'm part of the Learn Team here at Moz. Today we're going to talk a bit about duplicate content. 

So why are we talking about duplicate content?

Well, this is a pretty common issue, and it can often be a bit confusing. What is it? How is it determined? Why are certain pages on my site being flagged as duplicates of one another? And most importantly, how do I resolve it if I find that this is something that I want to tackle on my site? 

What is duplicate content?

So first off, what is duplicate content?

Essentially, duplicate content is content that appears in more than one place on the Internet. But this may not be as cut and dry as it seems. Content that is too similar, even if it isn't identical, may be considered duplicates of one another. 

When thinking about duplicate content, it's important to remember that it's not just about what human visitors see when they go to your site and compare two pages. It's also about what search engines and crawlers see when they access those pages. Since they can't see the rendered page, they typically go off of the source code of the page, and if that code is too similar, the crawler may think that it's looking at two versions of the same page. 

Imagine that you go to a bakery and there are two cupcakes in front of you that look almost identical. They don't have any signs. How do you know which one you want? That's what happens when a search engine encounters two pages that are too similar. 

This confusion between pieces of content can lead to things like ranking issues, because search engines may not be able to figure out which page they should rank or they may rank the incorrect page. Within the Moz tools, we have a 90% threshold for duplicate content, which means that any pages with code that is at least 90% the same will be flagged as duplicates of one another.

Solutions

So now that we've briefly covered what duplicate content is, what do we do about it? There are a few different ways that you can resolve duplicate content. 

301 redirects

First is the option to implement 301 redirects. This option would be similar to having a VHS copy of a movie, which maybe isn't so relevant anymore.

So you want to be sure to provide folks with the digital version that's streaming online. On your site, you can redirect older versions of pages to new, updated versions. This is relevant for issues with subdomain or protocol changes as well as content updates where you no longer want people to be able to access that older content.

Rel=canonicals

Next is the option to implement rel=canonicals on your page. Say you're at a bake sale and you have two types of cookies with you, sugar and chocolate chip. You consider your sugar cookies to be top-notch. So when folks ask you which one they should try, you point them to the sugar cookies even though they still have the option to try the chocolate chip.

On your site, this would be similar to having two items for sale that are different colors. You want human visitors to be able to see and access both colors, but you would use a canonical tag to tell crawlers which one is the more relevant page to rank. 

Meta noindex

You also have the option to mark pages as meta noindex.

For example, you may have two editions of your favorite book. You're going to read and reference that second edition because it's the newest and most relevant. But you still want to be able to read and access edition one should you need to. Meta noindex tags tell the crawler that they can still crawl that duplicate page, but they shouldn't include it in their index. This can help with duplicate content issues due to things like pagination. 

Add content

But what if you have two pages that really aren't duplicates of one another? They are about different topics, and they should be treated as separate pieces of content. Well, in this case, you may opt to add more content to each of these pages so it's less confusing to the crawler.

This would allow them to stand out from one another, and it would be similar to say adding sprinkles and a cherry to one cupcake and maybe a different color frosting to the other. 

Use Moz Pro to help identify and resolve duplicate content

If you ever need help identifying which pages on your site may be considered duplicates of one another, Moz Pro Site Crawl and On-Demand Crawl can help.

Within both of these tools, we'll flag which pages are considered duplicates of one another, and you can even export that data to CSV so you can analyze it outside of the tool. Just a little pro tip here. In the CSV export of that data, the duplicate content group will tell you which pages are considered duplicates of one another.

So any pages with the same duplicate content group number are part of the same group of duplicate pages. This is by no means an exhaustive list of the ways you can resolve duplicate content, but I do hope that it helps to point you in the right direction when it comes to tackling this issue. If you're interested in learning more about SEO fundamentals and strategy, be sure to check out the SEO Essentials Certification that's offered through the Moz Academy.

Thanks for watching.

Video transcription by Speechpad.com


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!