WordLog

A weblog authored by Carthik about the latest in the WordPress world.

Monday, May 14, 2007

Supplemental Results and WordPress

Filed under: — Carthik @ 7:46 am

I happened upon a curious trick to find all your pages listed as “supplemental results” in Google, along with some other associated supplemental result tricks. A lot of you might already know these tricks, but I think reading the rest of this article might get you thinking about these supplemental results in a new way. I spent a good 1.5 hours playing with this stuff and reading up on it, which I try to summarize here.

To start off, let us look at the tricks.

Finding all supplemental results for your blog

The trick is to search Google for the string “site:wordlog.com *** -spght”. That gives you all the pages on your WordPress blog listed as supplemental results. Each result that Google returns has “Supplemental Result” in the text that follows the URL and the short excerpt, and as you can see, every result for the string above carries that label. The “spght” can be changed to some other random characters – it doesn’t matter.

Finding all results that are not supplemental results

The following query will show all results that are not supplemental results:
“site:wordlog.com -allinurl:wordlog.com”.
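The two queries above can be built for any domain with a few lines of Python. This is only a sketch: the function names are made up for illustration, the search URL assumes the usual Google endpoint with a `q` parameter, and there is no guarantee that Google still treats these operators the way it did when this was written.

```python
from urllib.parse import urlencode


def supplemental_query(domain: str, junk: str = "spght") -> str:
    """Query from the post that lists supplemental results for a domain.

    `junk` is the random string being excluded; per the post, any
    random characters will do.
    """
    return f"site:{domain} *** -{junk}"


def main_index_query(domain: str) -> str:
    """Query from the post that lists the non-supplemental results."""
    return f"site:{domain} -allinurl:{domain}"


def search_url(query: str) -> str:
    # Assumption: the standard Google search endpoint with a `q` parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for query in (supplemental_query("wordlog.com"),
                  main_index_query("wordlog.com")):
        print(query)
        print("  " + search_url(query))
```

Swap in your own domain to get the same pair of searches for your blog.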

So, for wordlog.com, there are 227 non-supplemental results and 196 supplemental results. However, a search for “site:wordlog.com” returns only 325 results, whereas 196 + 227 = 423, so the numbers do not add up exactly. I think some of the results returned for “site:wordlog.com” are supplemental results: at the time this article was written, page 25 of the results has two supplemental results right at the top.
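To make the mismatch concrete, here is the arithmetic as a tiny Python check. The three counts are the ones quoted above; the variable names are just for illustration, and since Google's result counts are presumably estimates, the gap should be read loosely.

```python
# Result counts reported in the post for wordlog.com.
main_index = 227      # "site:wordlog.com -allinurl:wordlog.com"
supplemental = 196    # "site:wordlog.com *** -spght"
site_total = 325      # plain "site:wordlog.com"

combined = main_index + supplemental   # 423
excess = combined - site_total         # 98

print(f"main + supplemental = {combined}")
print(f"exceeds the plain site: count by {excess}")
```

So the two filtered searches together report 98 more results than the plain site: search, which means the three numbers cannot all be exact counts of separate sets of pages.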

What are Supplemental Results?

According to Google,

A supplemental result is just like a regular web result, except that it’s pulled from our supplemental index.

and, additionally, Google maintains that

…the index in which a site is included is completely automated; there’s no way to select or change the index in which a site appears. Please also be assured that the index in which a site is included doesn’t affect its PageRank.

So we know that there is no way to formally request supplemental index pages to be moved to the main results pages. However, one thing bothers me, sort of.

Most of the non-supplemental results for wordlog.com are the archive and category pages. I believe the individual posts should be there instead. I have noticed, many times, that when I search for a term, I am most often led to the category or date-based archives of a blog, and then I have to manually search for the term again in Firefox, and then, since many themes display only excerpts in these pages, I have to click through to the article to get the information I need. This is annoying, to say the least.

Fixing the supplemental results problems

There is a duplicate content cure plugin for WordPress that promises to reduce the duplicate content indexed by Google by way of your archive and category pages. It does so by adding meta tags to the page headers that tell Google not to index archive and category pages. One would think this would cure the supplemental results problem too, and make all your blog posts preferred over the archive pages.
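To show the general idea (not the plugin's actual code, which I have not reproduced here), a minimal Python sketch of "emit a robots meta tag only on archive and category pages" might look like this. The page-type labels and the function name are hypothetical, and `noindex,follow` is simply a common choice of directive, assumed for illustration.

```python
# Hypothetical page-type labels; a real WordPress plugin would use
# WordPress's own conditional checks instead.
NOINDEX_PAGE_TYPES = {"date_archive", "category"}


def robots_meta(page_type: str) -> str:
    """Return the meta tag to place in the page header, or "" for none."""
    if page_type in NOINDEX_PAGE_TYPES:
        # Assumed directive: keep the page out of the index but still
        # let the robot follow its links to the individual posts.
        return '<meta name="robots" content="noindex,follow" />'
    return ""


if __name__ == "__main__":
    for page_type in ("single_post", "category", "date_archive"):
        print(page_type, "->", robots_meta(page_type) or "(no tag)")
```

Individual posts get no tag at all, so they stay eligible for indexing as usual.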

As a small experiment to test the theory that the duplicate content cure plugin will help alleviate the supplemental index problem, I searched for supplemental and non-supplemental results for seologs.com, the site that published the plugin. Amazingly, seologs has 385 supplemental results and 243 non-supplemental results! So now it appears that the plugin is not the silver bullet for the problem. However, as promised by the plugin, the archive pages are missing from the pages indexed by Google. Is this a good thing, though? If the number of indexed, non-supplemental pages is the metric, then it is not. Without the plugin, all of wordlog’s archives are indexed and probably will be returned as search results for some terms. The duplicate content cure plugin prevents some pages from being indexed at all – it would be nice if it did not do that, really. It is better to have visitors find useful content via your archives if not via a direct link to the relevant article.

Ideally, I would love for the archive pages to be indexed too, with the blog posts being indexed in the main index. Heck, I would love to have all the pages in the supplemental index be in the main index instead. There are lots of suggested tricks to avoid the supplemental index. Archive pages in WordPress blogs probably get indexed more prominently because all WordPress blogs have link tags pointing to the archive pages, which look like the following if you look into the source of the page:

"<link rel='archives' title='May 2007' href='http://wordlog.com/archives/2007/05/' />"

In addition to this, you also have links to the archives from the sidebar, which is probably displayed on all pages of your site. The indexing robots should think these pages are really important, since you seem to link to them from every page on your site.
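If you want to see which archive pages your own header advertises, a short Python sketch using the standard library's HTML parser can pull them out of the page source. The class and function names are made up for illustration, and it only looks at link tags whose rel attribute is exactly "archives", as in the example above.

```python
from html.parser import HTMLParser


class ArchiveLinkFinder(HTMLParser):
    """Collects (title, href) pairs from <link rel='archives'> tags."""

    def __init__(self):
        super().__init__()
        self.archives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        if tag != "link":
            return
        attr = dict(attrs)
        if (attr.get("rel") or "").lower() == "archives":
            self.archives.append((attr.get("title"), attr.get("href")))


def find_archive_links(page_source: str):
    finder = ArchiveLinkFinder()
    finder.feed(page_source)
    finder.close()
    return finder.archives


if __name__ == "__main__":
    sample = ("<head><link rel='archives' title='May 2007' "
              "href='http://wordlog.com/archives/2007/05/' /></head>")
    for title, href in find_archive_links(sample):
        print(title, href)
```

Feed it the source of any page on your blog; one entry per month in the output means one header link to that month's archive on that page.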

So, a simple way to fix the problem, or at least try to get some pages in the main index, might be to have a sitemap listing each and every post on every page of your blog. That would make the pages huge! An alternative would be to have an HTML sitemap and link to it from the sidebar or footer. You could also link to posts you think are important from the sidebar. The important things to remember are that:
1) It’s better to have a page in the main index than the supplemental index.
2) It’s better to have a page in the supplemental index than to not have the page indexed at all!
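As a rough sketch of the HTML sitemap idea mentioned above, the following Python turns a list of posts into a plain list of links that could sit on a single sitemap page. The function name and the sample posts (on example.com) are hypothetical; in a real blog the list would come from your WordPress database or a plugin.

```python
from html import escape


def html_sitemap(posts):
    """Build an unordered HTML list with one link per (title, url) pair."""
    items = [
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in posts
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical posts, for illustration only.
    sample_posts = [
        ("First post", "http://example.com/archives/2007/05/01/first-post/"),
        ("Tips & tricks", "http://example.com/archives/2007/05/14/tips/"),
    ]
    print(html_sitemap(sample_posts))
```

Linking to that one page from the sidebar or footer gives every post a link that is two clicks from any page, without bloating each page with the full list.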

I have a couple of ideas floating around in my brain that I will implement to accomplish item #1 above without violating item #2. I will try them out and let you know if the results are worth mentioning. Do you have any ideas that have worked, that can be verified in a straightforward manner? Blame it on what I do for a living, but I have come to trust verifiable results over speculation and hypothesis.

8 Comments

  1. Whether the statements by Google are true or not, they are certainly partial:

    “A supplemental result is just like a regular web result, except that it’s pulled from our supplemental index.”
    And you have to look for a very long time to find any supplemental result in the SERPs.

    “Please also be assured that the index in which a site is included doesn’t affect its PageRank.”
    Which again is meaningless- it’s the results, not the (Toolbar) PR, that matter. Oh, and I was under the vague impression that PR applied to a page not a site….

    Comment by Lisa — 5/14/2007 @ 9:32 am

  2. It would be swell if WordPress had a field for a meta description (and possibly keywords) that would apply to an individual post or page and override the generic blog description tags when the individual post/page is indexed.

    I have a few “feature” pages on my site that I run through WP, but I use a custom template for only that page. Those invariably get better search results than general blog posts. And like you, I’ve definitely noticed plenty of search hits landing in category or archive pages instead of on the actual page the person wants.

    Comment by David — 5/14/2007 @ 10:54 am

  3. I changed my archives format to /foo/ rather than date-based archives. I 301 redirect all date requests to the site map, where I have a link to every page and post on my site, including the categories.

    That seems to work very well, as I’m now getting my top referals [sic] from various Google sites (Google Images is almost always in second).

    Comment by Jonathan — 5/14/2007 @ 5:25 pm

  4. If you really dig into this issue, you would find that archives, categories, site maps, and similar pages are considered “normal” supplemental pages and not penalized as duplications. A lot of people freaked out and took this too far when it first came out without reading the fine print. They spend a LOT of time on this issue, like you, for something they really didn’t have to worry about. It’s easy, if you are fully hosted, to remove all those date archives from the header and rely upon categories and site maps, which some did, but it won’t hurt you.

    Normal blog content isn’t penalized. The issue of duplication comes from splogs using duplicate information on one or more of their web pages and duplicate, already published content from others. Those who use more blockquotes than original content might be penalized if the issue of duplication is really taken seriously by Google.

    Comment by Lorelle — 5/15/2007 @ 10:36 am

  5. Lorelle, the thing is – my archives are not “supplemental” and my posts are in the supplemental index! So the point is that things seem to be strangely inverted.

    I agree that it would be natural to have the posts in the main index and the archives, both date- and category-based in the supplemental index.

    Comment by Carthik — 5/15/2007 @ 12:40 pm

  6. Hi, just wanted to say you have a very nice blog, am gonna add it in my website in the blogs section, but i think you should replace your background with a better one.. but this is just my opinion :)
    cheeeers.

    Comment by Mark87 — 6/6/2007 @ 6:56 pm

  7. Just a question… If a page is a supplemental result, would it be a supplemental result for anyone who want to exchange link with that page?

    Comment by roberto — 6/11/2007 @ 6:12 pm

  8. @14

    that’s a good question

    Comment by jan weidenbach — 8/30/2007 @ 4:50 pm
