Posts Tagged search engine

Writing a Search Engine – Part 3

Note: this article is a continuation of previous articles on search engines (part 1, part 2).

After testing an alpha version of my search engine for a while, I realized that its greatest flaw (other than printing the results in a downright ugly format) was that it couldn’t recognize “programs” as a variant of the word “program”. I briefly considered programming it to check automatically for a limited set of common variants, but decided that this would be too much effort for a decidedly low-quality result. What I needed was a list of all the variants of every word.

What I was looking for (as I discovered after about 45 minutes of IRC chat, Google, and man pages) was the ispell English dictionary. Ispell dictionaries contain a list of ‘roots’, and for every root a list of flags that describe how that root can be transformed to form valid words. I could load this information into a MySQL table with a Perl script, and then retrieve it quickly both while indexing and while searching.
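To make the root-plus-flags idea concrete, here is a minimal sketch in Python (not the Perl script the series uses). It assumes entries written as `root/FLAGS` and a small, simplified rule table. The flag letters, the rules, and the names `SUFFIX_RULES`, `expand`, and `build_lookup` are illustrative assumptions, not a copy of the real ispell affix file. A plain dictionary stands in for the MySQL table.

```python
import re

# Simplified, illustrative suffix rules keyed by flag letter. Each rule is
# (condition regex on the root's ending, text to strip, text to append).
# These are assumptions for the sketch, not the real ispell affix file.
SUFFIX_RULES = {
    "S": [  # plural / third person
        (r"[^aeiou]y$", "y", "ies"),
        (r"[aeiou]y$", "", "s"),
        (r"[sxzh]$", "", "es"),
        (r"[^sxzhy]$", "", "s"),
    ],
    "D": [  # past tense
        (r"e$", "", "d"),
        (r"[^aeiou]y$", "y", "ied"),
        (r"[^ey]$", "", "ed"),
        (r"[aeiou]y$", "", "ed"),
    ],
    "G": [  # gerund
        (r"e$", "e", "ing"),
        (r"[^e]$", "", "ing"),
    ],
}


def expand(entry):
    """Expand a 'root/FLAGS' entry into the root followed by its variants."""
    root, _, flags = entry.partition("/")
    words = [root]
    for flag in flags:
        for condition, strip, append in SUFFIX_RULES.get(flag, []):
            if re.search(condition, root):
                stem = root[: len(root) - len(strip)] if strip else root
                words.append(stem + append)
                break  # only the first matching rule for a flag is applied
    return words


def build_lookup(entries):
    """Map every variant to its root (an in-memory stand-in for the table)."""
    lookup = {}
    for entry in entries:
        words = expand(entry)
        for word in words:
            lookup[word] = words[0]
    return lookup


lookup = build_lookup(["program/S", "query/SD", "create/G"])
print(lookup["programs"])  # program
```

With a mapping like this, the indexer and the search front end can both reduce a word to its root before comparing, so a query for “programs” would match pages indexed under “program”.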


Writing a Search Engine – Part 2

Note: this article is a continuation of a previous article on search engines, and is itself continued in part 3.

After a bit over a week of coding (and learning various Perl libraries), I have completed stages 1 and 2 of the search engine, although stage 2, the indexer, could do with a little improvement. Both are written in Perl, and as usual, the complete code listings are below. I decided to write the entire spider and indexer in Perl and optimize later as necessary, so that I could get the thing done without getting bogged down in C code. If the Perl turns out not to be fast enough as the site grows, I plan to port it to C. Likewise, the actual search part (stage 3) will be written in PHP to save time. If the PHP is not fast enough, I’ll rewrite it in C, but I don’t expect any problems.


Writing a Search Engine – Part 1

I decided that the website had grown to the point where it needed a search engine. I didn’t want to use a Google search or an embedded Yahoo search: they look disgusting. I also didn’t want to use any of the third-party search scripts, since most of them were costly and all of them had commercial licenses. I like free software. So I set out to write my own.

General Considerations

Let me start off by saying that what I have below is not a magic, easy solution to writing a search engine. If you are planning to write the world’s “next Google”, I have a recommendation: go to http://bing.com, Microsoft’s “next Google”. Notice how “copycatted” it looks. Then search around (on Google, please) to find out exactly how popular it is. Hint: not very. Microsoft tried and failed. Don’t waste your time. My problem is much smaller: to build an internal search engine, which only needs to deal with a small number of pages and sees so little traffic that it doesn’t have to be super fast. When I told my coworker friends about the project, the responses I got varied from “maniac” to “shoot yourself now rather than afterwards – save some time”. And that’s with a highly simplified version of the problem.
