Thursday, July 5, 2007

Are you creating your own duplicate content?

This was sent to me by Michael & Steven Grzywacz, creators of DupeFreePro, a duplicate content prevention tool. This is an excellent read.

Sounds crazy, but you might be creating duplicates of your own content without even knowing it.
If the Search Engines see duplicates of your pages they automatically choose which page to rank and shove the rest into their supplemental index (the black hole of Search Engine traffic).
It's important you understand how to avoid duplicating your own content so that you can stay in control of which of your pages rank in the Search Engines.
The two main possible causes for this are:
1) Duplicate Domain URLs
2) Internal Duplicates
--------------------------------------------------
Duplicate Domain URLs
--------------------------------------------------
The Search Engines view all of the following URLs as *separate* pages even though they all actually point to the same page...
http://yourdomain.com
http://yourdomain.com/
http://yourdomain.com/index.html
http://www.yourdomain.com
http://www.yourdomain.com/
http://www.yourdomain.com/index.html
If you (or others) are linking to your site using a variety of these different URLs, you'll not only be diluting PR (PageRank) on your site but also stand the chance of having your content labelled as duplicate.
At the time of writing Google is known to be aware of this issue and is working to solve it. However, I urge you not to leave it to fate. Take control of the situation as soon as you can.
Fortunately, the workaround is very simple and only involves placing a small piece of code in the .htaccess file on your web server (this works on Apache servers only).
Jason Katzenback has created a video tutorial on the PortalFeeder Blog showing the code you need and how to use it. Check it out here:
http://portalfeeder.com/blog/?p=57
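To give you an idea of what this looks like, here is a rough sketch (the video walks through the exact code to use; this version assumes you want the www version of your domain to be the canonical one and that mod_rewrite is available on your Apache server):
# Example only: 301-redirect the non-www host to the www version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
A permanent (301) redirect like this consolidates the www and non-www variations onto a single address, so both visitors and Search Engine bots always end up on the same version of each page.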
--------------------------------------------------
Internal Duplicates
--------------------------------------------------
Are you 100% certain you do not have duplicates of your own content within your own sites?
If you are using one of the popular free content management systems (e.g. WordPress), your site might already be suffering from this.
For example, WordPress, the popular blog management system, automatically creates archive and category pages on your blog. With its default settings, these archive and category pages contain duplicates of the exact same posts that appear elsewhere on your blog.
When Google finds all the multiple versions of your post, its bot tries to determine which page to rank and places all the rest into the supplemental index.
This might not sound like a major problem because one way or another your content is still getting ranked, but if the choice is left up to the Search Engine bots you may not get the page *you* want to rank.
Some content management systems create other kinds of internal duplicates, such as different formats of the same page (e.g. PDF, text, Word doc).
Perform the following Google search if you want to see how many pages your website has in the supplemental index:
site:www.yourdomain.com *** -view
(make sure you replace yourdomain.com with your actual domain name)
Any pages listed by this search are pages of your website that Google has chosen to move to its supplemental index. (You'll see the green text 'Supplemental Result' under each result.)
Pages in the supplemental index are known to get very little traffic, if any at all, until they move back out of the supplemental index, and many report that this is hard to achieve.
The workaround for this issue is to tell the Search Engine spiders to ignore specific locations on your website. This will enable you to control which pages are indexed and ranked.
You can do this by adding the following code to a 'robots.txt' file at the root of your website:
User-agent: *
Disallow: /example/directory/
Disallow: /another/example/directory/
Disallow: /one/more/example/directory/
The first line, 'User-agent: *', causes the statements that follow to apply to all search engine bots that read the robots.txt file.
The 'Disallow: /.../' lines are where you list each directory location on your webserver that you want the Search Engine bots to ignore (i.e. NOT index).
So in the example above we are telling all search engine bots to *not* index any webpage or indexable file located in the three stated directory locations on our website.
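Purely as an illustration (the exact paths depend on your own permalink and directory settings, so treat these as example locations only), a WordPress blog that wants to keep its duplicate-heavy archive pages and an extra set of PDF copies out of the index might use something like:
User-agent: *
# Keep WordPress category archive pages out of the index (path is an example)
Disallow: /category/
# Keep a directory of duplicate PDF versions out of the index (hypothetical path)
Disallow: /pdf/
Be careful to only Disallow locations you are certain you never want indexed, because the bots will skip everything underneath those paths.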
Doing this correctly can really help you control which pages are chosen by the Search Engines to rank in their results.
If you're not sure how all this works, please do take the time to understand robots.txt properly before you implement it. Search Google for info on robots.txt and also check out the Wikipedia page below:
http://en.wikipedia.org/wiki/Robots.txt
If you are putting in all the effort required to make sure your content is unique you really don't want to fall over at this last hurdle.
If you weren't aware of these potential pitfalls before, I hope you'll take the simple action necessary to ensure you don't fall victim to self-imposed duplicate content.
Talk soon,
Michael & Steven Grzywacz
DupeFree Pro http://www.dupefreepro.com/

At DupeFreePro they have a free duplicate content checker that I use for every article I write. The best thing about it is that it's free! Go there, get it, use it! As you will notice, I did not use an affiliate link for DupeFreePro; I like the product and respect the guys enough to give you the direct link.

For more info about article writing go to my website at www.topshelfarticles.com/Professional_Article_Writing.html
