
Just How Is Duplicate Content Defined And Does It Really Matter? - By: Don Saunders

The discussion over exactly what duplicate content is and whether duplicate content is a problem has been underway for some time now and there is no sign that it is going to go away. So exactly what is classed as duplicate content and does it matter?

It is widely felt that duplicate content does matter and, though one well known search engine optimization expert recently expressed the opposite view, even a cursory trawl through the mountain of material that has been written on this subject in recent weeks will clearly demonstrate that this is very much a minority opinion.

If we agree that duplicate content does matter, then how are we going to go about defining duplicate content? If I produce an original article for submission to an article directory and then alter that same article for submission to a second article directory, how are the search engines going to analyze my two articles and decide whether they contain duplicate content? The answer is quite simply that we do not know, but here is this writer's opinion.

When the search engines first began checking for duplicate content, it was very much a matter of comparing one web page as a whole against another, and no attempt was made to cut the two pages apart and compare individual page elements. Back then you could make use of identical content and just add an introductory and concluding paragraph to one of the two pages to avoid the problem of duplicate content. Unfortunately for many publishers, those days are now a distant memory.

The search engines now dissect the two pages to permit them to examine individual elements and it is this which is the core of today's argument. It is generally accepted that attention is now concentrated on the informational content of a page rather than the structure of the page. Many website designers use templates to create their pages which set the structure of each page including such things as navigation bars, headers and footers. This is widely considered to be quite acceptable and the search engines do not view this as duplicate content. What the search engines are examining is the central content that is contained within the body of the page. But exactly how do they go about their examination of this page content?

Some people believe that this comparison is done at 'block' level (checking individual sentences or paragraphs), although other people think that filters search for phrases or even individual words. Nobody really knows the answer but it might seem reasonable to conclude that the most likely basis of checking would be to use either phrase or sentence matching.

Sentence matching is reasonably straightforward and simply involves cutting both pages up into chunks on the basis of the punctuation of the page. Take, for instance, this sentence:

It is relatively easy to get a good deal on a health insurance plan, providing you know what to look for.

This could be classed as either one single sentence or two sentences, depending upon whether you use the strict definition of a full stop as indicating the end of a sentence or adopt a flexible approach and make use of other punctuation marks, like commas.
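To make the difference concrete, here is a minimal sketch in Python of the two ways of cutting up that sentence. The function name split_chunks and the two sets of punctuation marks are assumptions made purely for illustration; nobody outside the search engines knows which marks, if any, their filters actually split on.

```python
import re

SENTENCE = ("It is relatively easy to get a good deal on a health insurance "
            "plan, providing you know what to look for.")

def split_chunks(text, delimiters):
    """Split text into trimmed, non-empty chunks at any of the given marks."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]

# Strict definition: only sentence-ending punctuation counts as a break.
strict = split_chunks(SENTENCE, ".!?")

# Flexible definition (an assumption): commas, semicolons and colons also count.
flexible = split_chunks(SENTENCE, ".!?,;:")

print(len(strict), "chunk under the strict definition")
print(len(flexible), "chunks under the flexible definition")
```

Under the strict definition the example stays in one piece, while the flexible definition cuts it in two at the comma, so the same page can yield quite different sets of chunks to compare.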

Matching at the phrase level is somewhat more difficult. What is the definition of a phrase? Should it be 2 words or 3 words or 4 words or…?

For the moment let us say that we are going to define a phrase as 3 words. If this were the case the following phrases would all be seen as duplicate content if they appeared on two pages which were being checked:

In those days
Take a look
In the end
One way to
You can get
At that time
Day to day

All of these phrases are normal everyday phrases that could be used on pages about building a greenhouse, flying kites, autoresponders or anything else you care to mention. Now there are some people who would say that the search engines do check pages down to this level. For example, when I questioned the support staff of one popular duplicate checker (DupeCop) about the basis on which they checked for duplicate content they said:

"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"

It was not a surprise therefore that when I ran a number of articles through this system (comparing articles on the subject of Christmas décor against articles about gun dogs) I found they had an average of 25% duplicate content!
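The effect is easy to reproduce with a rough sketch in Python. This is not DupeCop's actual algorithm, which is not public; it simply follows the description quoted above by ignoring punctuation, scanning across sentence boundaries and comparing 3-word phrases. The function names and the two sample snippets are invented for illustration.

```python
import re

def words(text):
    """Lowercase the text and keep only the words, dropping all punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrases(text, n=3):
    """Return the set of every run of n consecutive words, across sentences."""
    w = words(text)
    return {" ".join(w[i:i + n]) for i in range(len(w) - n + 1)}

def shared_fraction(a, b, n=3):
    """Fraction of the n-word phrases in text a that also appear in text b."""
    pa, pb = phrases(a, n), phrases(b, n)
    return len(pa & pb) / len(pa) if pa else 0.0

# Two made-up snippets on unrelated subjects.
greenhouse = ("In the end, you can get a good greenhouse. "
              "Take a look at the frame first.")
kites = "Take a look at your kite. In the end you can get it to fly."

print(sorted(phrases(greenhouse) & phrases(kites)))
print(f"{shared_fraction(greenhouse, kites):.0%} of 3-word phrases shared")
```

Even though one snippet is about greenhouses and the other about kites, everyday phrases such as "in the end" and "take a look" turn up in both, which is exactly why a filter set at this level flags unrelated articles.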

Taking this into account, I think it would be ridiculous for the search engines to have their filters set to this level. But where might the level be set? Should it be at 4 words or 5 words or…? Quite honestly, your guess would be as good as mine.

Over the years I have written and published hundreds of articles and watched the results in terms of duplicate content penalties, as far as it is possible for anyone to do this. Based upon my own experience I would say that filtering is not carried out down to the level of 3 or 4 word phrases but almost certainly ends at sentence level. As a result, as long as you are re-writing content down to sentence level, you should have no problem in escaping the content filters. Indeed, even if a couple of sentences are duplicated you will still be okay.

About the Author

WebMarketingCentre.com provides information on article writing and article submission and is also an article directory where you can pick up free articles for your website or ezine and to which you can submit articles on a wide variety of topics including writing for the web and much more.

Article Directory Source: http://www.articlerich.com/profile/Don-Saunders/17211



