Over the past few months, I’ve noticed several aspects of the website that were damaging our ranking in Google. They’ve been fixed, with the fixes ranging from standard to hideously messy. Here they are.

  1. Watch out for the ‘www.’ prefix. Most of the time, a website named ‘website.ext’ can be accessed as both ‘website.ext’ and ‘www.website.ext’. Google may treat these as two different websites and index your site twice, which splits your PageRank between the two copies. I also suspect (but am not sure) that Google will penalize you for content duplication (suspected plagiarism, and worthless content in any case). This can be fixed with the following addition to the htaccess file, which 301-redirects every other hostname to the canonical one:

    RewriteEngine On
    # Redirect any host other than the canonical one (e.g. www.robot.mbhs.edu)
    RewriteCond %{HTTP_HOST} !^robot\.mbhs\.edu$ [NC]
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1 [L,R=301]
    
  2. Watch out for duplicates of individual pages. For simple sites without dedicated serving scripts, this is almost never a problem, since Apache is intelligent enough to do the necessary redirects for you. If, however, you have serving scripts, you may discover that Google has indexed both http://robot.mbhs.edu/contact and https://robot.mbhs.edu/contact/. As noted above, this will decrease your PageRank, and I suspect Google may penalize you for content duplication. Fixing this is more involved, and really depends on your serving script. I managed to fix it by adding the following to my htaccess:

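    # Turn off mod_dir's automatic trailing-slash redirect; the rules below handle it
    DirectorySlash Off

    # For real directories (other than 'alumni') requested without a trailing
    # slash, redirect to the same URL with the slash appended
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_URI} !alumni
    RewriteCond %{REQUEST_URI} !/$
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1/ [R=301,L]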
    

    and then adding to my serve.php:

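    // ereg() was removed in PHP 7; preg_match() is the replacement
    if (preg_match('#.+/$#', $_SERVER["REQUEST_URI"])) {
        // Assumed definition: the requested URL minus its trailing slash(es)
        $betterurl = rtrim($_SERVER["REQUEST_URI"], "/");
        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://robot.mbhs.edu$betterurl");
        exit;
    }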
    

    I have the special line for ‘alumni’ in the htaccess because that is a directory in addition to a page served by my serving script. The ‘$betterurl’ variable is just the URL; all that snippet does is strip the slash at the end of the URL of the served page and issue a 301 redirect.

  3. Page aliases cause trouble with links. When designing my serving script, I had thought, “oh, cool, I can make it so that the user can get to a page from multiple URLs! That way, they don’t have to remember the exact URL, and we can make allowances for mistakes!” Hahaha. Internal links went that way too, which meant that at one point, Google had listed several pages 3 and 4 times. The solution to this is actually pretty simple: just use a 301 redirect in the PHP script. This is accomplished with:

        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://robot.mbhs.edu$betterurl");
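
The trailing-slash and alias fixes above can be folded into one helper in the serving script. What follows is only a minimal sketch under some assumptions, not my actual serve.php: the function names `canonical_path` and `redirect_if_needed`, the `ALIASES` table, and its example entries are all made up for illustration, and it assumes that requests for real directories are handled by the htaccess rules and never reach the script.

```php
<?php
// Hypothetical alias table: alternate path => canonical path.
// These entries are illustrative only.
const ALIASES = [
    '/contact-us' => '/contact',
    '/contacts'   => '/contact',
];

// Return the canonical form of a request URI: no trailing slash
// (except for the site root) and aliases mapped to their target.
function canonical_path(string $requestUri, array $aliases = ALIASES): string
{
    // Separate the path from any query string so the query survives intact.
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];
    $query = isset($parts[1]) ? '?' . $parts[1] : '';

    // Strip trailing slashes, but never reduce the root to an empty path.
    $path = rtrim($path, '/');
    if ($path === '') {
        $path = '/';
    }

    // Replace a known alias with its canonical path.
    if (isset($aliases[$path])) {
        $path = $aliases[$path];
    }

    return $path . $query;
}

// Issue a 301 only when the requested URI differs from its canonical form.
function redirect_if_needed(): void
{
    $requested = $_SERVER['REQUEST_URI'];
    $canonical = canonical_path($requested);
    if ($canonical !== $requested) {
        // The third argument to header() sets the response status code.
        header('Location: http://robot.mbhs.edu' . $canonical, true, 301);
        exit;
    }
}
```

Calling `redirect_if_needed()` at the top of the serving script, before any output is sent, would cover both cases. Keeping the path logic in a pure function like `canonical_path` also makes it easy to check by hand: with the table above, `/contact/` and `/contact-us` both come out as `/contact`, while `/` is left alone.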
    
