Posts Tagged Apache

Calling a CGI Script from PHP

For any given weird, seemingly pointless action, you will be able to find at least three good, interesting reasons for taking it, provided you look hard enough. In my case, the action was calling a CGI script from PHP, and the reason was to implement a good logging system. I run a Bugzilla installation here, and I didn’t want to edit every page just to add logging. I also didn’t want to use Apache’s own logs, since a MySQL database is much easier to work with than a super-sized log file.

My solution was to create an index.php file and modify the .htaccess file so that every request to the Bugzilla installation went through index.php. index.php could then log the event and call the appropriate CGI script.
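The rest of this entry isn't reproduced here, so what follows is only a minimal sketch of the general technique, not the code from the post. It assumes PHP 7.4 or later (for proc_open with an argument array), GET requests only, and a reduced CGI environment; the function names cgi_env, split_cgi_output and run_cgi are hypothetical. A real Bugzilla install may need more of the CGI variables defined in RFC 3875, and POST bodies would have to be written to the script's standard input.

```php
<?php
// Sketch: run a CGI script from PHP and relay its output.
// Assumptions: GET only, executable script, reduced CGI environment.

// Build the environment a CGI script expects from PHP's $_SERVER values.
function cgi_env(array $server, string $script): array {
    return [
        'GATEWAY_INTERFACE' => 'CGI/1.1',
        'REQUEST_METHOD'    => $server['REQUEST_METHOD'] ?? 'GET',
        'QUERY_STRING'      => $server['QUERY_STRING'] ?? '',
        'SCRIPT_NAME'       => '/' . basename($script),
        'SCRIPT_FILENAME'   => $script,
        'SERVER_NAME'       => $server['SERVER_NAME'] ?? 'localhost',
        'SERVER_PROTOCOL'   => $server['SERVER_PROTOCOL'] ?? 'HTTP/1.1',
        'REMOTE_ADDR'       => $server['REMOTE_ADDR'] ?? '',
        'HTTP_COOKIE'       => $server['HTTP_COOKIE'] ?? '',
        'PATH'              => getenv('PATH') ?: '/usr/bin:/bin',
    ];
}

// CGI output is a header block, a blank line, then the body.
// Returns [array of header lines, body string].
function split_cgi_output(string $out): array {
    $parts = preg_split("/\r?\n\r?\n/", $out, 2);
    $headers = preg_split("/\r?\n/", $parts[0]);
    return [$headers, $parts[1] ?? ''];
}

// Run $script in its own directory with the CGI environment; return stdout.
function run_cgi(string $script, array $server): string {
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open([$script], $spec, $pipes,
                      dirname($script), cgi_env($server, $script));
    if (!is_resource($proc)) {
        return '';
    }
    fclose($pipes[0]);                 // no request body for GET
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
    return $out;
}
```

In index.php, the request would be logged first (for example with a PDO INSERT into MySQL), then run_cgi() called on the requested script; each returned header line can be sent with header() and the body echoed.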

Causes of Lighttpd 403 Forbidden

On a fresh installation of lighttpd (which I chose instead of Apache because the server was dreadfully old and slow), I discovered that although HTML and other static files worked fine, requesting PHP and CGI files resulted in a “403 Forbidden” message. Being an Apache veteran, I checked for an .htaccess file (there was none, and lighttpd doesn’t read them anyway), made sure the file permissions were properly set (world-readable), and looked through the config file to make sure I had removed every line telling the server to return 403 for .php requests (I had once had a problem with those in Apache). Nothing. I checked the error log: nothing noteworthy.

I next checked to make sure the modules were being loaded. Well, the instruction to load the modules was right there:

server.modules = (
            "mod_access",
            "mod_alias",
            "mod_accesslog",
            "mod_compress",
            "mod_fastcgi",
#           "mod_rewrite",
#           "mod_redirect",
#           "mod_evhost",
#           "mod_usertrack",
#           "mod_rrdtool",
#           "mod_webdav",
#           "mod_expire",
#           "mod_flv_streaming",
#           "mod_evasive"
)

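Later in the file, I found the line that stops lighttpd from serving PHP and CGI files as plain static files, since those extensions are supposed to be handed to FastCGI or CGI instead: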

static-file.exclude-extensions = ( ".php", ".pl", ".fcgi",".cgi")

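The ultimate solution was trivial. lighttpd apparently has pretty poor error reporting: the module configuration was not, in fact, being loaded. With no handler claiming .php and .cgi requests, and those extensions excluded from static-file serving, lighttpd answered with a 403. I had to move the appropriate files from /etc/lighttpd/conf-available to /etc/lighttpd/conf-enabled. (On Debian-based systems, lighty-enable-mod, e.g. lighty-enable-mod fastcgi, does this by creating symlinks.)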

Minifying CSS and JS

Many web developers want to use large, flashy JavaScript libraries that provide fancy effects (most such libraries also come with large CSS files). Team 449’s website uses Prototype (an AJAX library) and Scriptaculous (a JavaScript effects library built on top of Prototype). While most visitors to the website may enjoy the experience, others, whether on slower connections, farther away, or unlucky enough to visit at peak times, get frustrated with the long load time.

The solution to this is twofold. First, enable gzip compression on the server; I’ll discuss this in a later entry. Second, use a minification program, such as the YUI Compressor that Yahoo provides. These programs take JavaScript or CSS files and strip out unnecessary whitespace and comments to decrease the file size. The end result: the load time is often cut in half or better.
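To make "remove unnecessary whitespace and comments" concrete, here is a deliberately naive CSS minifier. It is only an illustration of the idea, not the YUI Compressor's algorithm; the function name naive_minify_css is hypothetical, and it does not handle comment-like text inside quoted strings or url(...) values, or selectors where a space before ":" matters.

```php
<?php
// Naive CSS minifier for illustration only (NOT what YUI Compressor does).
function naive_minify_css(string $css): string {
    // Remove /* ... */ comments (non-greedy, across lines).
    $css = preg_replace('!/\*.*?\*/!s', '', $css);
    // Collapse every run of whitespace to a single space.
    $css = preg_replace('/\s+/', ' ', $css);
    // Drop spaces around punctuation where CSS does not need them.
    $css = preg_replace('/\s*([{};:,>])\s*/', '$1', $css);
    // The last semicolon before a closing brace is optional.
    $css = str_replace(';}', '}', $css);
    return trim($css);
}

echo naive_minify_css("/* header styles */\nh1 {\n    color: red;\n    margin: 0 auto;\n}\n"), "\n";
// prints h1{color:red;margin:0 auto}
```

A real minifier has to tokenize the input rather than rely on regular expressions, which is why it makes sense to use an established tool instead.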

Yahoo’s compressor is a command-line tool, so it has to be run explicitly on each file. Naturally, most people will want to automate this, so that they don’t have to remember to run the compressor every time they install a new version of Scriptaculous or Prototype, or make a change of their own.
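A sketch of invoking the compressor from PHP, assuming Java is installed and the jar lives at a hypothetical path (YUI_JAR below). It uses the compressor's --type and -o options; the helper names yui_command and yui_minify are illustrative, and every argument is passed through escapeshellarg so unusual file names cannot break the shell command.

```php
<?php
// Hypothetical jar location; adjust to wherever the YUI Compressor is installed.
const YUI_JAR = '/usr/local/lib/yuicompressor.jar';

// Build the shell command that minifies $src into $dst.
// --type is passed explicitly rather than relying on the file extension;
// -o names the output file.
function yui_command(string $src, string $dst): string {
    $type = (pathinfo($src, PATHINFO_EXTENSION) === 'css') ? 'css' : 'js';
    return sprintf('java -jar %s --type %s -o %s %s',
        escapeshellarg(YUI_JAR), $type,
        escapeshellarg($dst), escapeshellarg($src));
}

// Run the command; true if the compressor exited with status 0.
function yui_minify(string $src, string $dst): bool {
    exec(yui_command($src, $dst), $output, $status);
    return $status === 0;
}
```

Keeping the command builder separate from the exec() call makes it easy to log or inspect exactly what will be run.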

One possibility is to set up a serving script that first calls the compressor and then serves the result. Requests can be routed to the serving scripts in the .htaccess file like so:

Options +FollowSymlinks
RewriteEngine on

RewriteRule ^js/(.*)$ jserve.php/$1 [L]
RewriteRule ^style/(.*)$ cserve.php/$1 [L]

This is grossly inefficient, because it launches the compressor on every single request for every JS or CSS file. For a medium- or high-traffic website, it will completely kill your server. The better solution (and the one we used at the Blair Robot Project) is to run the compressor from your crontab and have the serving script simply serve the minified file. The crontab entry then looks like:

*/30 * * * * /var/www/robot/scripts/manage-roboweb.sh

The above entry runs manage-roboweb.sh every 30 minutes. manage-roboweb.sh calls a C minification program that finds every file with a specified extension in a specified directory (in this case /var/www/robot/js) and processes it with the YUI Compressor (Yahoo’s compression application).
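The file-finding step, combined with the modification-time check discussed next, might look roughly like this. It is a PHP sketch, not the C program we actually used; the function name stale_sources is hypothetical, and the foo.js / foo.min.js naming convention is taken from the serving scripts.

```php
<?php
// Return every *.$ext file in $dir whose minified twin (*.min.$ext)
// is missing or older than the source. Minified files themselves are skipped.
function stale_sources(string $dir, string $ext): array {
    $stale = [];
    $minSuffix = ".min.$ext";
    foreach (glob(rtrim($dir, '/') . "/*.$ext") ?: [] as $src) {
        if (substr($src, -strlen($minSuffix)) === $minSuffix) {
            continue; // an output file, not a source
        }
        $min = substr($src, 0, -strlen(".$ext")) . $minSuffix;
        if (!is_file($min) || filemtime($min) < filemtime($src)) {
            $stale[] = $src;
        }
    }
    return $stale;
}
```

A cron job would then run the compressor on each returned path, writing foo.min.js next to foo.js, and leave every up-to-date file alone.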

This can be improved by having the serving script first check whether the minified file is up to date, by comparing the last-modified times of the two files. The managing script can do the same, so that files are only re-minified when they have changed. I’ve placed the code we used for both serving scripts below.

cserve.php

<?php
// Serve a CSS file from /var/www/robot/style, preferring the minified copy
// (foo.min.css) whenever it exists and is at least as new as foo.css.
function minify_version($fn) {
  $min_fn = preg_replace('/\.css$/', '.min.css', $fn);
  if (is_file($min_fn) && filemtime($min_fn) >= filemtime($fn)) {
    return $min_fn;
  }
  return $fn;
}

// Map /style/foo.css to /var/www/robot/style/foo.css, ignoring any query
// string and refusing paths that try to climb out of the directory.
$path = (string) parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
$rel = preg_replace('#^/style/#', '', $path);
$fn = "/var/www/robot/style/" . $rel;
if (strpos($rel, "..") !== false || !is_file($fn)) {
  header("HTTP/1.1 404 Not Found");
  exit;
}
$fn = minify_version($fn);

// ob_gzhandler only compresses when the browser's Accept-Encoding allows it.
ob_start("ob_gzhandler");

$offset = 3600 * 48; // 48-hour cache, in seconds
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT");
header("Cache-Control: max-age=$offset, must-revalidate");
// Report when the file itself last changed, not the time of this request.
header("Last-Modified: " . gmdate("D, d M Y H:i:s", filemtime($fn)) . " GMT");
header("Pragma: public");
header("Content-Type: text/css");
readfile($fn);

jserve.php

<?php
// Serve a JavaScript file from /var/www/robot/js, preferring the minified
// copy (foo.min.js) whenever it exists and is at least as new as foo.js.
function minify_version($fn) {
  $min_fn = preg_replace('/\.js$/', '.min.js', $fn);
  if (is_file($min_fn) && filemtime($min_fn) >= filemtime($fn)) {
    return $min_fn;
  }
  return $fn;
}

// Map /js/foo.js to /var/www/robot/js/foo.js, ignoring any query string
// and refusing paths that try to climb out of the directory.
$path = (string) parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
$rel = preg_replace('#^/js/#', '', $path);
$fn = "/var/www/robot/js/" . $rel;
if (strpos($rel, "..") !== false || !is_file($fn)) {
  header("HTTP/1.1 404 Not Found");
  exit;
}
$fn = minify_version($fn);

// ob_gzhandler only compresses when the browser's Accept-Encoding allows it.
ob_start("ob_gzhandler");

$offset = 3600 * 48; // 48-hour cache, in seconds
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT");
header("Cache-Control: max-age=$offset, must-revalidate");
// Report when the file itself last changed, not the time of this request.
header("Last-Modified: " . gmdate("D, d M Y H:i:s", filemtime($fn)) . " GMT");
header("Pragma: public");
header("Content-Type: text/javascript");
readfile($fn);

Things to Watch Out For: SEO

Over the past few months, I’ve noticed several aspects of the website that were damaging our ranking in Google. They’ve been fixed, with the fixes ranging from standard to hideously messy. Here they are.

  1. Watch out for the ‘www.’ prefix. Most of the time, a website named ‘website.ext’ can be reached both as ‘website.ext’ and as ‘www.website.ext’. Google will treat them as two different websites, index your site twice, and split your PageRank between the two copies. I also suspect (but am not sure) that Google penalizes duplicated content, since it often signals plagiarism or low-value pages. This can be fixed with the following addition to the .htaccess file:

    RewriteEngine on
    RewriteCond %{HTTP_HOST} !^robot\.mbhs\.edu$ [NC]
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1 [L,R=301]
    
  2. Watch out for duplicates of individual pages. For simple sites without dedicated serving scripts, this is almost never a problem, since Apache (through mod_dir) is intelligent enough to do the necessary trailing-slash redirects for you. If, however, you use serving scripts, you may discover that Google has indexed both http://robot.mbhs.edu/contact and http://robot.mbhs.edu/contact/. As noted above, this will decrease your PageRank, and I suspect Google may penalize you for content duplication. Fixing this is more involved and really depends on your serving script. I fixed it by adding the following to my .htaccess:

    DirectorySlash off
    
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_URI} !alumni
    RewriteCond %{REQUEST_URI} !(.*)/$
    RewriteRule ^(.*)$ http://robot.mbhs.edu/$1/ [R=301,L]
    

    and then adding to my serve.php:

    if (preg_match('#^(.+?)/+$#', $_SERVER["REQUEST_URI"], $m)) {
        $betterurl = $m[1]; // the same URL minus its trailing slash
        header("Location: http://robot.mbhs.edu$betterurl", true, 301);
        exit;
    }
    

    I have the special line for ‘alumni’ in the .htaccess because that is a real directory in addition to a page served by my serving script. The ‘$betterurl’ variable holds the requested URL without its trailing slash; those five lines of code simply send a 301 redirect from the slashed URL to the slashless one.

  3. Page aliases cause trouble with links. When designing my serving script, I thought, “Oh, cool, I can make it so that the user can get to a page from multiple URLs! That way, they don’t have to remember the exact URL, and we can make allowances for mistakes!” Hahaha. Internal links ended up using the alternate URLs too, which meant that at one point, Google had listed several pages three or four times. The solution is actually pretty simple: have the PHP script answer every alias with a 301 redirect to the one canonical URL. This is accomplished with:

        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://robot.mbhs.edu$betterurl");
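All three fixes come down to sending every request to one canonical URL. As a rough sketch (not the code we actually run), the same checks could be combined inside a serving script. The alias table PAGE_ALIASES, the helper name canonical_redirect, and keeping the query string on redirect are assumptions for illustration; real directories such as alumni would still need the .htaccess exceptions described in item 2.

```php
<?php
// Hypothetical alias table: alternate paths mapped to the one canonical path.
const PAGE_ALIASES = [
    '/contact-us' => '/contact',
    '/contacts'   => '/contact',
];

// Return the canonical absolute URL for a request, or null if the request
// already uses it and the page should be served normally.
function canonical_redirect(string $host, string $uri): ?string {
    $path  = parse_url($uri, PHP_URL_PATH) ?? '/';
    $query = parse_url($uri, PHP_URL_QUERY);
    $canon = $path;
    // Strip trailing slashes from page URLs, but keep the bare "/" root.
    if ($canon !== '/') {
        $canon = rtrim($canon, '/');
        if ($canon === '') {
            $canon = '/';
        }
    }
    // Collapse aliases onto their canonical path.
    $canon = PAGE_ALIASES[$canon] ?? $canon;
    // Already canonical: right host (case-insensitive) and unchanged path.
    if (strcasecmp($host, 'robot.mbhs.edu') === 0 && $canon === $path) {
        return null;
    }
    return 'http://robot.mbhs.edu' . $canon . ($query !== null ? "?$query" : '');
}

// Usage in the serving script, before any output:
// $target = canonical_redirect($_SERVER['HTTP_HOST'] ?? '', $_SERVER['REQUEST_URI']);
// if ($target !== null) { header("Location: $target", true, 301); exit; }
```

Doing the check in one place means internal links, typed aliases, and the www. host all end up at a single URL, which is what Google needs to see.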
    
