The Blair Robot Project Blog
Writing a Search Engine - Part 2 (2009-07-02 01:47:32) by Scott
Note: this article is a continuation of a previous article on search engines.
After a bit over a week of coding (and learning various Perl libraries), I have completed stages 1 and 2 of the search engine, although stage 2, the indexer, could do with a little improvement. Both are written in Perl, and as usual, the complete code listings are below. I decided to write the entire spider and indexer in Perl and optimize as necessary later on, so that I could get the thing done and not get bogged down in C code. If the Perl turns out not to be fast enough as the site grows, I plan to port it to C. Likewise, the actual search part (stage 3) will be written in PHP to save time. If the PHP is not fast enough, I'll rewrite it in C - but I expect there to be no problems.
Calling a CGI Script from PHP (2009-06-28 02:02:06) by Scott
For any given weird, seemingly pointless action, you will be able to find at least 3 good, interesting reasons for wanting to take that action (provided you look around hard enough). In my case, the action was calling a CGI script from PHP, and the reason was to implement a good logging system. I run a Bugzilla installation here, and I didn't want to go editing every page to get a logger. I also didn't want to use Apache's own loggers, since a MySQL database is much easier to handle than a super-sized file.
My solution was to create an index.php file and modify the .htaccess file so that every request to the Bugzilla installation went through index.php. Then, I could have index.php log the event and call the appropriate CGI script.
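The log-then-dispatch idea can be sketched roughly as follows. This is not the index.php from the article: it is a minimal Python illustration in which sqlite3 stands in for MySQL, and the table name `access_log` and the functions `log_request`, `run_cgi`, and `handle` are all made up for the example. Only a handful of CGI environment variables are set, so a real CGI script would likely need more.

```python
import os
import sqlite3
import subprocess
import sys
import time


def log_request(db, path, remote_addr):
    """Record one request in a table (sqlite3 is a stand-in for MySQL)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS access_log "
        "(ts REAL, path TEXT, remote_addr TEXT)"
    )
    db.execute(
        "INSERT INTO access_log VALUES (?, ?, ?)",
        (time.time(), path, remote_addr),
    )
    db.commit()


def run_cgi(argv, script_name, query_string=""):
    """Run a CGI program with a minimal CGI environment; return raw stdout."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "GET",
        "SCRIPT_NAME": script_name,
        "QUERY_STRING": query_string,
    })
    result = subprocess.run(argv, env=env, capture_output=True, check=True)
    return result.stdout


def handle(db, argv, script_name, query_string, remote_addr):
    """Front controller: log the request first, then hand off to the script."""
    log_request(db, script_name, remote_addr)
    return run_cgi(argv, script_name, query_string)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    # Hypothetical stand-in for a real CGI script.
    demo = [sys.executable, "-c",
            "print('Content-Type: text/plain'); print(); print('ok')"]
    sys.stdout.buffer.write(handle(db, demo, "/demo.cgi", "", "127.0.0.1"))
```

The point of the design is that the logging lives in one place: the CGI scripts themselves are never edited, and every request passes through the same entry point before it reaches them.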
Using Access Control Lists (ACL) (2009-06-25 00:56:34) by Scott
If you've been using Unix or Linux long enough, you've probably found yourself wishing for more powerful permissions management. For example, if your web development group has been cooperating with another group on a somewhat sensitive project, you've probably wished you could easily set things up so that those two groups - and only those two groups - could read and write to certain files. Or, as in my case, you've wished that you could allow the web server user to read and write to files without having to mess with any group permissions. ACLs are the tool that lets you do that.
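As a rough illustration of both cases, the sketch below builds the `setfacl -m` command line that would grant read/write access to named users or groups. The helper `setfacl_command`, the group names `webdev` and `design`, and the user name `www-data` are assumptions made for the example; the code only constructs the argument list and does not run anything.

```python
def setfacl_command(path, users=(), groups=(), perms="rw"):
    """Build the argv for `setfacl -m` granting perms to users and groups."""
    entries = [f"u:{name}:{perms}" for name in users]
    entries += [f"g:{name}:{perms}" for name in groups]
    if not entries:
        raise ValueError("at least one user or group is required")
    # setfacl accepts several ACL entries separated by commas.
    return ["setfacl", "-m", ",".join(entries), path]


# Two cooperating groups on one file (hypothetical group names).
print(setfacl_command("plans.txt", groups=["webdev", "design"]))

# The web server user alone, with no group changes (hypothetical user name).
print(setfacl_command("uploads", users=["www-data"]))
```

Granting access is only half of "only those two groups": the ordinary permissions for everyone else would still have to be removed (for example with chmod), and the filesystem has to support ACLs in the first place.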
Writing a Search Engine - Part 1 (2009-06-24 02:04:41) by Scott
I decided that the website had grown to the point where it needed a search engine. I didn't want to use a Google search or an embedded Yahoo search - they look disgusting. I also didn't want to use any of the third-party searching scripts, since most of them were costly and all of them had commercial licenses. I like free software. So I set out to write my own.
General Considerations
Let me start off by saying that what I have below is not a magic, easy solution to writing a search engine. If you are planning to write the world's "next Google", I have a recommendation: go to http://bing.com - Microsoft's "next Google". Notice how "copycatted" it looks. Then search around (on Google, please) to find out exactly how popular it is. Hint: not very. Microsoft tried and failed. Don't waste your time. My problem is to build an internal search engine, which only needs to deal with a small number of pages and gets little traffic, so it doesn't have to be super fast. When I told my co-working friends about the project, the responses I got varied from "maniac" to "shoot yourself now rather than afterwards - save some time". And that's with a highly simplified version of the problem.
Creating an Eclipse Plug-in Update Site - And Fixing Hidden Errors in Plug-ins (2009-06-22 02:40:20) by Scott
Today, I set out to create an Eclipse update site to host Team 449's plug-ins. (If you're in a rush for answers, just scroll to the bottom.)
Eclipse is actually one of the less buggy applications out there. It doesn't crash every three minutes like Bloodshed Dev-C++, it doesn't crash every two minutes like Internet Explorer, and it doesn't crash every minute like Windows. The Java development environment is beautiful, the task management system is well-designed if poorly integrated, and the C++ development environment is second only to Emacs, Vim, Ed, and the ever-elusive butterfly technique (it's not actually that great). Eclipse has a unique system of managing views and perspectives that allows a huge amount of flexibility to the user. But Eclipse's plug-in development environment is less than satisfactory to the developer.
To begin with, Eclipse plug-ins are based on a less-than-easy-to-use API. Sure, it's easy to do the things the designers anticipated you would be doing - like compiling their sample projects. But there's no level of abstraction between "here's a method to do exactly what you want" and rewriting half the API from scratch. Furthermore, the Eclipse developers insisted on creating their own system for everything, from user interface functionality to file system access. A good deal of the documentation is totally obscure, and the sample projects are no help.
Now suppose you've suffered through the experience of writing a plug-in, you're almost done, and you're ready to export. Or more likely, you've just begun, and you're ready to export just to see what happens. There's just one little problem - that tiny icon over in the "package explorer" with the red X or the yellow exclamation point - Eclipse doesn't like something. Such was my situation as I was trying to export my plug-in, so that I could export my feature, so that I could test my update site.

