PHP full-text search engine

May 24, 2006 in PHP, Full-text Search Engine

After some intensive research into full-text search engines for PHP, I found the following options, assuming CakePHP stays the main underlying framework.

Using Java Lucene through the PHP/Java Bridge

This one is quite challenging. The performance looks promising, but the setup is complicated: the indexing would have to be developed in Java, while PHP handles the input and displays the results. Now… do I really want to do that?

Pros

  • You can use the original (unported) Apache Lucene
  • Tons of resources on using Lucene
  • You can upgrade the Lucene libraries as soon as new releases come out, rather than waiting for the ported version to be fixed

Cons

  • Complicated
  • Too many technologies to master

Zend Framework integration with CakePHP

The strong point of Zend Framework is its port of Apache Lucene, the zend_search_lucene library. That’s about the only thing I like about Zend Framework, which is why I might want to integrate this one library into CakePHP. This tutorial about using Zend Lucene looks familiar enough.

Pros

  • Those who have used Apache Lucene before will feel at home
  • Lucene IS the best Java full-text search engine out there

Cons

  • Works only on PHP5, while CakePHP runs on both PHP4 and PHP5, so the integration would kill one of CakePHP’s strong points
  • Yet another framework to learn. Haven’t we had enough already?

MySQL full-text searching capabilities

MySQL has a built-in full-text search feature; this article has some information about it. However, I would need to write raw MySQL statements rather than using CakePHP’s ActiveRecord pattern, and the application would become dependent on MySQL.
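To show what "writing MySQL statements" means here, below is a rough sketch (in Python, just for illustration) of the kind of query I would have to hand-write instead of going through ActiveRecord. It uses MySQL’s `MATCH(...) AGAINST(...)` syntax. The `articles` table, its `title`/`body` columns and the `build_fulltext_query` helper are all hypothetical. It also assumes a FULLTEXT index covering exactly those columns, and in MySQL versions of this era only MyISAM tables support FULLTEXT indexes.

```python
# Hypothetical schema (illustration only, not from a real project):
#   CREATE TABLE articles (id INT PRIMARY KEY, title VARCHAR(200), body TEXT,
#                          FULLTEXT (title, body)) ENGINE=MyISAM;


def build_fulltext_query(table, columns, term, boolean_mode=False):
    """Return (sql, params) for a MySQL MATCH ... AGAINST search.

    `table` and `columns` are pasted into the SQL as identifiers, so they must
    come from trusted code, never from user input. The search term itself is
    passed as a %s placeholder so the DB driver escapes it.
    """
    cols = ", ".join(columns)
    mode = " IN BOOLEAN MODE" if boolean_mode else ""
    match = f"MATCH({cols}) AGAINST (%s{mode})"
    sql = (f"SELECT id, {match} AS score FROM {table} "
           f"WHERE {match} ORDER BY score DESC")
    # The term is bound twice: once for the score column, once for the filter.
    return sql, (term, term)


if __name__ == "__main__":
    sql, params = build_fulltext_query("articles", ["title", "body"], "cakephp")
    print(sql)
    print(params)
    # With a DB-API driver such as MySQLdb this would be: cursor.execute(sql, params)
```

Every query like this is tied to MySQL’s dialect, which is exactly the lock-in problem.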

Pros

  • Should be quite fast, since it is a native MySQL operation
  • Can be converted to a plugin for various frameworks

Cons

  • Locked in to MySQL
  • Building a search engine is not a fun task

Xapian using the PHP bindings

Xapian looks promising as well. Since it is written in C++, I expect its performance to be quite good, but I can’t comment much beyond that. You might want to check it out and help me out.

Solr

I just found this a minute ago. Solr is built on Lucene by CNET Networks, which donated the source code to Apache. The unique feature of Solr is its XML/HTTP API, and it is actually a server! So a Java application, a PHP application, or any other application that can handle XML can access Solr. This looks like a good solution, but I am a bit worried about the security of transferring XML between servers. I will keep an extra eye on this one.
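To get a feel for the XML/HTTP idea, here’s a minimal client-side sketch in Python (any language that speaks HTTP and XML would do). It builds an XML `<add>` message, builds a query URL for the `/select` handler, and parses the XML that comes back. It assumes Solr’s XML update format and XML response layout. The base URL, field names, helper functions and sample response are hypothetical, and the code does no networking at all. Actually indexing would mean POSTing the message to the `/update` handler and then sending a `<commit/>`. The sketch also does nothing about the transport-security worry above.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build an XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url, query, rows=10):
    """Build a query URL for the /select handler, asking for an XML response (wt=xml)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{base_url}/select?{params}"


def parse_select_response(xml_text):
    """Turn the <result><doc>...</doc></result> part of a response into dicts.

    Only single-valued fields (e.g. <str name="...">) are handled; multi-valued
    <arr> fields would need extra work.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        return []
    return [{field.get("name"): field.text for field in doc}
            for doc in result.findall("doc")]


# Hand-written sample response (hypothetical data) in the XML response layout.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">42</str><str name="title">CakePHP search</str></doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(build_add_xml([{"id": 42, "title": "CakePHP search"}]))
    print(build_select_url("http://localhost:8983/solr", "title:cakephp"))
    print(parse_select_response(SAMPLE_RESPONSE))
```

The nice part is that the PHP side of a CakePHP app would only need to do the same three things: build XML, call a URL, and parse XML.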