PHP full-text search engine
May 24, 2006 in PHP, Full-text Search Engine
After intensive research on full-text search engines for PHP, I found the following options that could work with CakePHP as the main underlying framework.
Use Java Lucene through a PHP/Java Bridge
This is quite challenging. The performance seems promising, but it does look complicated. Basically we would have to write the indexing in Java, then handle the input and display the results in PHP. Now… do I really wanna do that?
Pros
- You can use the original, unported Apache Lucene
- Tons of resources on using Lucene
- Can upgrade the Lucene libraries and use them right away, rather than waiting for the ported version to be fixed
Cons
- Complicated
- Too many technologies to master
Zend Framework integration with CakePHP
The strong point of Zend Framework is its port of Apache Lucene, the Zend_Search_Lucene library. That’s about the only thing I like about Zend Framework, which is why I might wanna integrate this Zend library into CakePHP. This tutorial about using Zend Lucene looks familiar enough.
Pros
- Those who have used Apache Lucene before will feel at home
- Lucene IS the best Java full-text search engine out there
Cons
- Works only on PHP5, while CakePHP works on both PHP4 and PHP5, so the integration would kill one of CakePHP’s strong points
- Have to learn another framework. We’ve had enough, right?
MySQL full-text searching capabilities
MySQL has a built-in full-text search feature. This article here has some information about it. But I would need to write raw MySQL statements rather than using CakePHP’s ActiveRecord pattern, and the application would become dependent on MySQL.
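To give a rough idea of what those raw statements look like, here is a minimal sketch of a MySQL MATCH ... AGAINST query. It is written in Python only so that it runs on its own. The same SQL string could be sent from PHP. The helper name build_fulltext_query, the articles table and its id, title and body columns are all made up for illustration, and the sketch assumes a FULLTEXT index already exists on the searched columns.

```python
import re

# Hypothetical helper: the table and column names used below are
# illustrative assumptions, not part of any real schema.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_fulltext_query(table, columns, search_terms, limit=10):
    """Return (sql, params) for a MySQL natural-language full-text search.

    Assumes a FULLTEXT index already exists on exactly these columns
    and that the table has an `id` column.
    """
    for name in [table, *columns]:
        # Identifiers cannot be bound as parameters, so validate them strictly.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    match_expr = f"MATCH({', '.join(columns)}) AGAINST (%s)"
    sql = (
        f"SELECT id, {match_expr} AS score "
        f"FROM {table} "
        f"WHERE {match_expr} "
        "ORDER BY score DESC "
        "LIMIT %s"
    )
    # The search string is bound twice: once for the score, once for the filter.
    return sql, (search_terms, search_terms, limit)


sql, params = build_fulltext_query("articles", ["title", "body"], "cakephp search")
print(sql)
print(params)
```

The %s placeholders follow the parameter style of the common MySQL drivers for Python, so the pair can be handed straight to a cursor's execute() call. Since the SQL itself is MySQL-specific, this is exactly the database lock-in mentioned above.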
Pros
- Should be quite fast, since it is a native MySQL operation
- Can be converted to a plugin for various frameworks
Cons
- Will be locked in to MySQL
- Building a search engine is not a fun task
Xapian using PHP-Binding
Xapian looks promising as well. I think the performance of Xapian should be quite good, as it is written in C++. I can’t comment much on this. You guys might wanna check it out and probably help me out.
Solr
I just found this a minute ago. Solr is built on Lucene by CNET Networks, who donated the source code to Apache. The unique feature of Solr is its XML/HTTP APIs, and it is actually a server! So you can have a Java application… a PHP application, or any type of application which is capable of handling XML, and all of those applications can access Solr. This looks like a good solution, but I am a bit worried about the security of transferring XML between servers. I will keep an extra eye on this one.
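Since the whole point is that any language able to produce XML can talk to Solr, here is a small sketch of what that could look like, using only the Python standard library. It builds an <add> update message and has a helper to POST it over HTTP. The function names, the id and title fields and the update URL in the usage note are assumptions for illustration. The real field names depend on how the Solr schema is configured.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(documents):
    """Build a Solr-style <add> XML message from a list of dicts.

    Each dict maps a field name to its value; the field names are assumed
    to match whatever the Solr schema defines.
    """
    root = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(root, "doc")
        for name, value in document.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(root, encoding="unicode")


def post_update(update_url, xml_message):
    """POST an XML message to a Solr update URL and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=xml_message.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical document; nothing is sent over the network here.
message = build_add_message([{"id": "1", "title": "Search & CakePHP"}])
print(message)
```

With a Solr server running, the message would be sent with something like post_update("http://localhost:8983/solr/update", message), where the host, port and path are assumptions about the installation. A second message containing <commit/> would then be posted to make the new documents searchable.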




