Posts Tagged SEO

Maintaining a Separate Draft Copy of a Website

On websites with traffic from outside a small group of coworkers, you may find it desirable to modify a copy of the website and then periodically upload the new version. Both the main robotics website and TMS (“Team Management System”) are developed as draft copies, separate from the live site, and the drafts are periodically “pushed” onto the live site.

While this technique may not seem particularly impressive, some problems come up when you actually try to implement it. The biggest problem is that when you push, all the links break. A link to /draft/page.html needs to become /page.html when the page is pushed, and this is hard to automate. (A simple regexp is not enough: what about favicons and stylesheets?) A smaller problem is design-based and depends on how you plan to store your pages. If you use a simple filesystem-oriented storage method, there will be no problem.
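
One way to sidestep fragile regexps is to parse the page and rewrite every `href` and `src` attribute, which covers favicons, stylesheets, and images as well as ordinary links. The sketch below is only an illustration, not necessarily how the robotics site does it: it assumes PHP's DOM extension is available and that every draft URL starts with `/draft`, and `strip_draft_prefix()` is a hypothetical helper name.

```php
<?php
// Sketch only: strip_draft_prefix() and the "/draft" prefix are assumptions.

/**
 * Return $html with the draft prefix removed from every href and src
 * attribute that starts with it.
 */
function strip_draft_prefix(string $html, string $prefix = '/draft'): string
{
    $doc = new DOMDocument();

    // Collect parser warnings (e.g. for unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Any element carrying href or src: <a>, <link>, <img>, <script>, ...
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href or @src]') as $element) {
        foreach (['href', 'src'] as $name) {
            if (!$element->hasAttribute($name)) {
                continue;
            }
            $value = $element->getAttribute($name);
            // Match "/draft/..." only, so "/drafts/..." is left untouched.
            if (strpos($value, $prefix . '/') === 0) {
                $element->setAttribute($name, substr($value, strlen($prefix)));
            }
        }
    }

    return $doc->saveHTML();
}
```

This only touches absolute paths inside HTML attributes. Relative links need no change, and `url(...)` references inside stylesheets would have to be handled separately.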


Things to Watch Out For: SEO

Over the past few months, I’ve noticed several aspects of the website that were damaging our ranking in Google. They’ve all been fixed, with the fixes ranging from standard to hideously messy. Here they are.

  1. Watch out for the ‘www.’ prefix. Most of the time, a website named ‘website.ext’ can be accessed both as ‘website.ext’ and ‘www.website.ext’. Google may treat these as two different websites and index your site twice, which splits incoming links between the two copies and dilutes your PageRank. I also suspect (but am not sure) that Google will penalize you for content duplication (suspected plagiarism, and worthless content in any case). This can be fixed with the following addition to the .htaccess file:

    RewriteEngine On
    # Redirect any host other than the canonical one (e.g. www.robot.mbhs.edu).
    RewriteCond %{HTTP_HOST} !^robot\.mbhs\.edu$ [NC]
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1 [L,R=301]
    
  2. Watch out for duplicates of individual pages. For simple sites without dedicated serving scripts, this is almost never a problem, since Apache is intelligent enough to do the necessary redirects for you. If, however, you have serving scripts, you may discover that Google has indexed both http://robot.mbhs.edu/contact and http://robot.mbhs.edu/contact/. As noted above, this will dilute your PageRank, and I suspect Google may penalize you for content duplication. Fixing this is more involved, and really depends on your serving script. I managed to fix it by adding the following to my .htaccess:

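    # Stop mod_dir from adding trailing slashes on its own.
    DirectorySlash Off
    
    # Real directories (other than the 'alumni' special case) get a trailing slash.
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_URI} !alumni
    RewriteCond %{REQUEST_URI} !/$
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1/ [R=301,L]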
    

    and then adding to my serve.php:

    // ereg() was removed in PHP 7, so use preg_match() instead.
    if (preg_match('#.+/$#', $_SERVER["REQUEST_URI"])) {
        // Assumption: $betterurl is the requested URL minus its trailing slash.
        $betterurl = rtrim($_SERVER["REQUEST_URI"], "/");
        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://robot.mbhs.edu$betterurl");
        return;
    }
    

    I have the special line for ‘alumni’ in the .htaccess because that is a directory in addition to a page served by my serving script. The ‘$betterurl’ variable is just the URL without its trailing slash; all this snippet does is strip the slash from the end of the served page’s URL and issue a 301 redirect.

  3. Page aliases cause trouble with links. When designing my serving script, I had thought, “Oh, cool, I can make it so that the user can get to a page from multiple URLs! That way, they don’t have to remember the exact URL, and we can make allowances for mistakes!” Hahaha. Internal links went that way too, which meant that at one point, Google had listed several pages three or four times. The solution to this is actually pretty simple: just use a 301 redirect in the PHP script, pointing every alias at the page’s one canonical URL. This is accomplished with:

        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://robot.mbhs.edu$betterurl");
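
The three fixes above all boil down to the same idea: every page gets exactly one URL, and everything else 301-redirects to it. As a rough sketch of that idea in one place, the function below works out where a request should be redirected. It is not the site's actual serve.php: `canonical_url()`, the `$aliases` table, and the `$canonicalHost` parameter are hypothetical names, and it assumes every path is a page served by the script (real directories, which the .htaccess rules handle, are ignored).

```php
<?php
// Sketch only: canonical_url(), $aliases, and $canonicalHost are hypothetical
// names, not taken from the site's actual serve.php.

/**
 * Return the URL a request should be 301-redirected to,
 * or null when the request already uses the canonical URL.
 */
function canonical_url(string $host, string $path, array $aliases = [],
                       string $canonicalHost = 'robot.mbhs.edu'): ?string
{
    // Item 2: drop any trailing slash, but leave the root "/" alone.
    $clean = rtrim($path, '/');
    if ($clean === '') {
        $clean = '/';
    }

    // Item 3: map a known alias onto its one canonical path.
    if (isset($aliases[$clean])) {
        $clean = $aliases[$clean];
    }

    // Item 1: any other host name (such as the "www." form) is non-canonical.
    $hostOk = strcasecmp($host, $canonicalHost) === 0;

    if ($hostOk && $clean === $path) {
        return null; // already canonical, serve the page normally
    }
    return 'http://' . $canonicalHost . $clean;
}
```

In a serving script, a non-null result could be sent with `header('Location: ' . $target, true, 301);` followed by `exit;`. The path passed in should not include the query string; `parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)` is one way to get it.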
    
