Constellio review

Constellio is an open source search engine based on Apache Lucene/Solr. I really like Constellio. The search results are relevant, the admin system works great, and it supports several filters for narrowing down the search results. Unfortunately, I am experiencing a bug that fills up the disk every 3-4 days, making the application crash. This makes me question how mature the system really is.

You can try a demo of Constellio online here.

Search result page

Constellio has a classic-looking search result page. It also supports several filters that you can use to restrict your search based on document language, tags, document type and last modified date.

Constellio search result page

Each hit has a title or URL at the top, a three-line highlighted text extract, and then the URL at the bottom.

Constellio snippet

Constellio automatically categorizes documents to help you navigate unstructured content.

Constellio tag function

Administration

Constellio has a good web-based administration interface.

Overview of collections

Constellio admin gui

Setting up a new crawl

Constellio admin gui

Status for crawling

Constellio has a cool status function that shows you the latest crawled and indexed pages in real time.

Constellio admin gui

Installation notes

I installed it on a VM client with 8 GB of disk space. It ran out of space after 12 182 files had been crawled. The web GUI reports “INDEX SIZE ON DISK : 208.77 MB”, but Constellio actually uses 4.9 GB (1.7 GB in constellio/tomcat/temp and 3.2 GB in constellio/tomcat/webapps). I tried to delete all the indexed files by clicking the delete all button for that collection, but that didn’t do anything except log me out of the admin GUI.
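To compare the index size reported by the web GUI with what is really on disk, the two directories can be measured directly. The following is a minimal Python sketch, not anything shipped with Constellio. The helper names `dir_size_bytes` and `human` are made up for illustration, and the relative paths assume the script is started from the directory that contains the constellio folder.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def human(n):
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024 or unit == "GB":
            return f"{n:.1f} {unit}"
        n /= 1024


if __name__ == "__main__":
    # Paths taken from the review; adjust to your own install location.
    for sub in ("constellio/tomcat/temp", "constellio/tomcat/webapps"):
        print(sub, human(dir_size_bytes(sub)))
```

Running this from cron and logging the output would show how fast each directory grows between crawls.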

I also experienced several timeouts, and the crawler would stop crawling at about 20 000 documents.

After contacting Constellio support, I found out that Constellio can have issues with large data sets when the default configuration is used. I was advised to make the following changes to resolve the problems with timeouts and the crawler stopping:

1) Change the database engine. By default Constellio uses the Derby database, but it can be configured to use MySQL instead.
2) Increase the maximum heap size the Java virtual machine can use from 1024 MB to 2048 MB by setting -Xmx2048m in the start_constellio.sh file.

After these changes I was able to crawl the test sets successfully.

Unfortunately, constellio/tomcat/temp is still being filled with temporary files. This will eventually fill up the whole disk and make Constellio stop responding, so the temporary files have to be removed manually and Constellio restarted (sometimes the whole server, since a lot of Linux services don’t handle a full disk either).
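Until the underlying bug is fixed, the manual cleanup could be scripted. The sketch below is one possible workaround in Python, not something recommended by Constellio support. The function name `remove_old_files` and the one-day age threshold are my own assumptions, and I have not verified whether it is safe to delete these files while Constellio is running, so the function defaults to a dry run that only lists candidates.

```python
import os
import time


def remove_old_files(root, max_age_seconds, dry_run=True):
    """Delete regular files below root whose mtime is older than the cutoff.

    Returns the list of matching paths. With dry_run=True nothing is deleted.
    """
    cutoff = time.time() - max_age_seconds
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # leave symlinks alone
                if os.path.getmtime(path) < cutoff:
                    if not dry_run:
                        os.remove(path)
                    matched.append(path)
            except OSError:
                # The file disappeared or cannot be removed; move on.
                continue
    return matched


if __name__ == "__main__":
    # Assumed threshold: files untouched for more than one day.
    one_day = 24 * 60 * 60
    for path in remove_old_files("constellio/tomcat/temp", one_day):
        print("would remove", path)
```

Passing dry_run=False performs the actual deletion. It is probably wise to stop Constellio first, as in the manual procedure described above.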
