Posts Tagged perl

MySQL vs PostgreSQL: Benchmarking Data

After looking into migrating to PostgreSQL, which seems to be a popular pastime among database people (migrating, not looking into), I decided to do my own benchmarks. Here are the results of the simple ones (I have yet to code the complex ones). I wrote all the code in Perl and ran it on quentin. I used InnoDB for MySQL, with default settings, and left everything at its defaults on PostgreSQL.

Read the rest of this entry »

2 Comments

Risky Plans

It’s been a strange week – it seems that I decided, once I was almost done with the search engine, that it was time to start some more crazy projects. The disease is catching – Eric has his own project, not yet approved, that even I consider to be extreme. We won’t go into it here.

My three projects are, in order of dubiousness of completion: an in-house listserv to be hosted on ogodei, an IRC server, and an analysis system for FIRST team data (not yet online). The first two will both be linked to Eric’s “Team Management System” – which is what makes them especially challenging. The third is designed to find patterns (defined by me as a lack of randomness) in team data, including sponsors, team size, geographic location, and performance at competition. Obviously there are some trivial patterns with which I can test the system: “hey! most teams in the D.C. region go to the D.C. regional!” My hope is to find something slightly more useful than that: “hey! every time ARL decides to sponsor a team, Microsoft joins them the next year!”

Read the rest of this entry »

No Comments

Writing a Search Engine – Part 3

Note: this article is a continuation of previous articles on search engines (part 1, part 2).

After testing out an alpha version of my search engine for a while, I realized that its greatest flaw (other than printing out the results in a downright ugly format) was that it couldn’t recognize “programs” as a variant of the word “program”. I briefly considered programming it to automatically check for a limited set of common variants, but I decided that this was probably too much effort for what would be a decidedly low-quality result. What I needed was a list of all the variants of every word.

What I was looking for (I discovered after about 45 minutes of IRC chat, Google, and man pages) was the ispell English dictionary. Ispell dictionaries have a list of ‘roots’, and then, for every root, they have a list of flags that describe how that root can be transformed to form valid words. I could enter this information, via Perl script, into a MySQL table, and then retrieve it quickly both while indexing and while searching.
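To make the roots-and-flags idea concrete, here is a minimal sketch of expanding entries into a lookup table. It is written in Python rather than the Perl mentioned above, it uses an in-memory SQLite table as a stand-in for the MySQL table, and the rule table, the `variants` table, and the function names are all hypothetical. Real ispell affix files define many more flags and conditions than these two simplified ones.

```python
import re
import sqlite3

# Hypothetical, heavily simplified suffix rules (an assumption, not the real
# ispell affix file): flag -> list of
# (condition regex on the root, number of characters to strip, suffix to add).
SUFFIX_RULES = {
    "S": [
        (r"[^aeiou]y$", 1, "ies"),     # party -> parties
        (r"(s|x|z|ch|sh)$", 0, "es"),  # box -> boxes
        (r".", 0, "s"),                # program -> programs
    ],
    "G": [
        (r"e$", 1, "ing"),             # code -> coding
        (r".", 0, "ing"),              # index -> indexing
    ],
}

def expand(entry):
    """Expand a 'root/FLAGS' entry into the root followed by its variants."""
    root, _, flags = entry.partition("/")
    words = [root]
    for flag in flags:
        for pattern, strip, suffix in SUFFIX_RULES.get(flag, []):
            if re.search(pattern, root):
                words.append(root[:len(root) - strip] + suffix)
                break  # the first matching rule wins for this flag
    return words

def build_table(entries):
    """Store every variant with its root (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (word TEXT PRIMARY KEY, root TEXT)")
    for entry in entries:
        words = expand(entry)
        conn.executemany(
            "INSERT OR IGNORE INTO variants VALUES (?, ?)",
            [(word, words[0]) for word in words],
        )
    return conn

def root_of(conn, word):
    """Map a word to its root; unknown words are returned unchanged."""
    row = conn.execute(
        "SELECT root FROM variants WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else word
```

With a table like this, both the indexer and the search code could call something like `root_of` on each word, so that “programs” and “program” end up under the same key.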

Read the rest of this entry »

No Comments

Writing a Search Engine – Part 2

Note: this article is a continuation of a previous article on search engines, and has been continued with part 3.

After a bit over a week of coding (and learning various Perl libraries), I have completed stages 1 and 2 of the search engine, although stage 2, the indexer, could do with a little improvement. Both are written in Perl, and as usual, the complete code listings are below. I decided to write the entire spider and indexer in Perl and optimize as necessary later on, so that I could get done with the thing and not get bogged down in C code. If the Perl turns out not to be fast enough as the site grows, then I plan to port to C. Likewise, the actual search part (stage 3) will be written in PHP to save time. If the PHP is not fast enough, I’ll rewrite it in C – but I expect there to be no problems.
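This excerpt does not describe how the indexer works internally, so the following is only a sketch of one common shape for the indexing and search stages: an inverted index that maps each word to the pages containing it. It is in Python rather than Perl or PHP, and every name in it (`tokenize`, `build_index`, `search`, the sample URLs) is made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index from a dict of url -> page text.

    The result maps each word to a dict of {url: occurrence count}.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Return URLs containing every query word, most occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only the pages that appear in the posting list of every word.
    common = set(postings[0]).intersection(*postings[1:])
    # Rank by total occurrences; break ties by URL for a stable order.
    return sorted(common, key=lambda url: (-sum(p[url] for p in postings), url))
```

A spider would supply the `pages` dict, for example `build_index({"/a": "Perl spider and Perl indexer"})`, and the search stage would call `search` on the result. A real indexer would keep this data in a database rather than in memory.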

Read the rest of this entry »

No Comments

Writing a Search Engine – Part 1

I decided that the website had grown to the point where it needed a search engine. I didn’t want to use a Google search or an embedded Yahoo search – they look disgusting. I also didn’t want to use any of the third-party searching scripts, since most of them were costly and all of them had commercial licenses. I like free software. So I set out to write my own.

General Considerations

Let me start off by saying that what I have below is not a magic, easy solution to writing a search engine. If you are planning to write the world’s “next Google”, I have a recommendation: go to http://bing.com – Microsoft’s “next Google”. Notice how “copycatted” it looks. Then search around (on Google, please) to find out exactly how popular it is. Hint: not very. Microsoft tried and failed. Don’t waste your time. My problem is to build an internal search engine, which only needs to deal with a small number of pages and gets so little traffic that it doesn’t have to be super fast. When I told my co-working friends about the project, the responses I got varied from “maniac” to “shoot yourself now rather than afterwards – save some time”. And that’s with a highly simplified version of the problem.

Read the rest of this entry »

1 Comment