WordLog

A weblog authored by Carthik about the latest in the WordPress world.

Thursday, August 26, 2004

WordPress Permalinks and Google – An Experiment

Filed under: — Carthik @ 7:37 pm

I confess. I have been laying the groundwork for a small experiment involving WordPress and Google, without telling you, my precious readers.

WordLog.com does not appear to have been crawled and indexed by Google yet. A search for wordlog.com returns nothing. I even saved the following screenshot for future reference:
[Screenshot: WordLog not on Google yet]
This is in spite of the fact that several pages with high PageRank link to WordLog and send approximately 600-900 unique visitors this way. I am guessing that Google is shy because the URIs here are not neat: they are in query-string form.

I know that a lot of folks have talked about how query strings make for bad search-engine PR. There is a Google Answers answer on How to get Google to Spider Dynamically Generated ASP Pages, and dynamic PHP pages, like the ones WordLog is currently serving, are similar to ASP pages in the eyes of the bot, I guess.

I have not submitted this site to Google, nor do I intend to for the next month, for the sake of this experiment; it has been a little over a month since I made the first post on WordLog.

Now what’s the experiment, you ask?
I am going to start using WordPress’ nicer URIs by turning on permalinks in the Permalink Options. Let us see if turning them on makes any difference to whether Google wants to index and archive my pages or not.
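To make the difference between the two URI styles concrete, here is a minimal sketch in Python. It is only an illustration: the host example.com, the post ID 38, the slug, and the date-based /year/month/day/post-name/ layout are all assumptions for the example, not the actual settings of this blog.

```python
from datetime import date
from urllib.parse import urlsplit


def is_query_string_uri(url: str) -> bool:
    """Return True when the URL carries its parameters in a query string."""
    return bool(urlsplit(url).query)


def permalink_path(posted: date, slug: str) -> str:
    """Build a date-based path of the form /year/month/day/post-name/."""
    return f"/{posted.year:04d}/{posted.month:02d}/{posted.day:02d}/{slug}/"


# Hypothetical "before" and "after" addresses for the same post.
old_url = "http://example.com/index.php?p=38"
new_url = "http://example.com" + permalink_path(
    date(2004, 8, 26), "permalinks-and-google"
)

print(is_query_string_uri(old_url))  # True
print(is_query_string_uri(new_url))  # False
print(new_url)
```

The point of the experiment is whether a crawler treats the second form more kindly than the first; the sketch only shows what changes in the address, not how any search engine actually decides.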

Update: An interesting side-effect of my switching this blog to use permalinks is that the RSS feed now has new URLs for the posts, and so feed (news) readers show duplicate posts. Mine does, for one. Pretty obvious, but some subscribers might get irritated. If you are one of them, please accept my apologies.
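Why some readers would show duplicates and others not probably comes down to how each one decides that two feed items are the same post. The following is a rough sketch of that idea, assuming a reader that remembers one key per item; the item fields, the URLs, and the function name are made up for illustration and do not describe any particular feed reader.

```python
def count_new_items(seen_keys, items, key):
    """Count the items whose key the reader has not stored yet."""
    return sum(1 for item in items if key(item) not in seen_keys)


# The same post as it appeared in the feed before and after the switch.
# Only the link differs; title and timestamp are unchanged.
before = {
    "title": "WordPress Permalinks and Google",
    "published": "2004-08-26T19:37",
    "link": "http://example.com/index.php?p=38",
}
after = dict(before, link="http://example.com/2004/08/26/permalinks-and-google/")


def by_link(item):
    """Identify a post by its URL alone."""
    return item["link"]


def by_title_and_time(item):
    """Identify a post by its title and timestamp."""
    return (item["title"], item["published"])


# A reader keyed on the link treats the re-addressed post as new.
new_by_link = count_new_items({by_link(before)}, [after], by_link)

# A reader keyed on title and timestamp recognises it as already seen.
new_by_title_time = count_new_items(
    {by_title_and_time(before)}, [after], by_title_and_time
)

print(new_by_link)        # 1
print(new_by_title_time)  # 0
```

A reader of the first kind would list every existing post a second time once the links change, which matches what I am seeing in mine.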

20 Comments

  1. [...] I find funny and odd, because it found my site. The author (Carthik) is going to try an experiment, to see if it will fix this problem. But I don’t do any [...]

    Pingback by WhatLess ? » Google experiment — 8/27/2004 @ 1:20 am

  2. I think that this is an extremely worthwhile experiment, Carthik!

    Comment by Geof — 8/26/2004 @ 8:10 pm

  3. I think so too, Geof. One way or the other, it will prove something. The only bias is that the blog with the permalinks is not a fresh one, so it probably has a head start, but, as the screenshot shows, it’s not in Google yet. I hope that is fair enough.

    Comment by Carthik — 8/26/2004 @ 8:20 pm

  4. My blog wasn’t indexed by Google either. So I used the ‘rewrite’ module from Apache to create “nice” URLs. Two weeks later my site was correctly indexed. So I guess the URLs must not contain “&” and “?” characters.

    Comment by stef — 8/26/2004 @ 8:37 pm

  5. I use .htaccess mod-rewrite rules for the “nice” permalinks, and Google has extensively indexed my site.

    Comment by Carsten — 8/26/2004 @ 8:48 pm

  6. I’m with Carsten. Since I began with nice URLs, ElTintero has been indexed nicely by Google, and when I had /index.php?p=1, it was indexed, but not as much as with nice URLs.

    Comment by Jesus Vargas — 8/26/2004 @ 9:02 pm

  7. I use messy urls for my wordpress site and google indexes them just fine and without prompting.

    Comment by Paul — 8/26/2004 @ 9:06 pm

    stef, Carsten, Jesus:
    Yes, that is what I am trying to prove: that using WordPress’ Permalinks does lead to better indexing by Google. That’s the idea of the experiment.
    I have used it successfully at my blog.

    Comment by Carthik — 8/26/2004 @ 9:17 pm

  9. Paul,
    Google only indexes the blog’s main URL:
    Google search for excitingness.com.

    It does not return results for searches for particular posts, like this one for your Finding Nemo post.

    So without permalinks, though your blog url is indexed, none of your posts themselves are.

    Comment by Carthik — 8/26/2004 @ 9:23 pm

  10. My feed reader, SharpReader, shows no duplicate posts. It’s probably all in how the specific program identifies and stores an individual post. Your reader probably relies on the URL, while mine probably uses the timestamp, title, or a combination.

    Good luck with the experiment. I’ll give a guess of 6 days to be indexed. Keep up the good WordPress news.

    Comment by Chad Brandos — 8/26/2004 @ 9:49 pm

  11. Carthik, congratulations on being the second person I don’t know to see my site.

    What I find surprising is that if I do a search for “wordlog” I can’t find your site in the first ten pages of results. However there are links that link to your site on the first page. So I think you should see what happens if you drop the dotCom. And give yourself a little bit more unique title like WordPress Wordlog or WordLog. I just think you should make it part of the experiment, see what happens.

    Comment by WhatLess — 8/26/2004 @ 10:14 pm

  12. Chad,
    I think the index is updated once a month or so.
    I’d be happy to see it happen within a week.
    Let’s see how it goes.

    Comment by Carthik — 8/26/2004 @ 10:19 pm

  13. WhatLess, Glad to have visited you!

    Well, the way I look at it, I want the content in wordlog indexed, so if someone searches for help on some of the things posted here, they can find it. It is not only about the title, really :)

    Comment by Carthik — 8/26/2004 @ 10:21 pm

  14. Good to try an experiment every now and then, especially in the world of SEO. I hope it makes some improvement. However, I am with Paul above: my site is using the default URIs and I have no problems getting indexed. And BTW Carthik, most of the majors index more than just the index page of your blog.

    I have tried a few things in the past to see what was going on with regard to SEO, pagerank and keyword padding.

    Keyword Padding Test
    http://skebrown.com/index.php?p=38

    Blog Software IS Healthy bot-food
    http://skebrown.com/index.php?p=197

    I hope we see some results soon?

    Thanks for the space.

    Comment by skebrown — 8/26/2004 @ 10:49 pm

  15. We use old styled URIs at: http://searchenginejournal.com/ and Google does fine.

    Comment by Sushubh — 8/27/2004 @ 1:19 am

  16. I don’t understand why Google wouldn’t crawl your site. My blog is crawled about daily by Googlebot, and a community site I run, which uses “non-pretty” URLs (i.e. /news.shtml?12145212364562 and stuff), is also crawled more than daily. I’ve never submitted either to Google; it just came by itself, following links, I guess.

    Comment by Ozh — 8/27/2004 @ 1:50 am

  17. Oh… Another point not to forget: Google does not read application/xml or application/xhtml+xml.

    Comment by stef — 8/27/2004 @ 8:23 am

  18. Carthik: You’re right, the individual pages are not indexed. However the monthly pages are, which also use messy urls. How confusing!

    Comment by Paul — 8/27/2004 @ 10:07 am

  19. Funny. I actually just found your site ranked on the first page of Google for “wordpress permalinks”. Perhaps it just needed more time. Generally Google will index a new website’s homepage more quickly than the rest of the site. A “deep crawl” of all the pages can often take 2 months or more, depending on how many links are pointing to your site.

    Comment by Jon — 10/22/2004 @ 5:20 pm

  20. Jon,

    That is a urlizer.com link. I exchanged a couple of emails with google.com. They said wordlog.com will not be indexed, since I “violated” some webmaster guideline. No amount of protesting on my part seems to clear that up. I guess the reason they think that way is because when this was launched, a lot of people linked to it, all at once, and so they might have thought it is a link farm or something…

    Comment by Carthik — 10/22/2004 @ 6:32 pm
