Apache Lucene: The Heart Of Search

Apache Lucene is a free and open source search library providing powerful searching capabilities. How powerful, you ask? It comes with built-in spellchecking, highlighting, tokenization and analysis capabilities. It is released under the Apache License, which means that developers can freely use, modify and distribute it, and sell software built on it. Along with these capabilities, it is fast and production ready.

Apache Lucene is built and maintained in the Java language, but it also has a Python wrapper called PyLucene. It manages an index over a dynamic collection of documents and is built to provide very rapid updates to that index, so adding or changing documents is fast. An index may store a varied set of documents, each with any number of fields, and the fields may differ from one document to the next. Lucene indexes terms, and therefore it performs searches over terms.

A term combines a field name with a token. The terms created from non-text fields are pairs consisting of the field name and the field value. The terms created from text fields are pairs of the field name and a token. The Lucene index provides a mapping from terms to documents. This is the reverse, or inversion, of the usual mapping from documents to terms, and it is therefore known as an “inverted index”. The inverted index also makes it possible to score documents in the search results. A document that more of the search terms point to is considered more relevant, and the more relevant the document, the higher its ranking.
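To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It does not use Lucene or PyLucene. The sample documents, the whitespace tokenizer and the function names (`terms_of`, `build_index`, `search`) are all assumptions made for illustration, and the "count the matching terms" score is a deliberate simplification of the ranking idea described above, not Lucene's actual scoring.

```python
from collections import defaultdict

# Hypothetical toy documents: the fields vary from one document to the next.
DOCS = {
    1: {"title": "Lucene in action", "year": 2010},
    2: {"title": "Search engines in practice", "body": "Lucene powers search"},
    3: {"body": "Cooking with herbs"},
}


def terms_of(doc):
    """Yield (field name, token) terms for one document."""
    for field, value in doc.items():
        if isinstance(value, str):
            # Text field: one term per lowercased, whitespace-separated token.
            for token in value.lower().split():
                yield (field, token)
        else:
            # Non-text field: the term is the field name plus the whole value.
            yield (field, value)


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in terms_of(doc):
            index[term].add(doc_id)
    return index


def search(index, query_terms):
    """Rank documents by how many of the query terms point to them."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(DOCS)
results = search(index, [("title", "lucene"), ("title", "in")])
print(results)
```

Document 1 matches both query terms and document 2 matches only one, so document 1 ranks first. Note that the term `("body", "lucene")` is distinct from `("title", "lucene")`, because a term always carries its field name.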

You can read more on how to integrate it with your projects here. If you want to protect your project from bots and scrapers, you can read Protect Your Website From Bots and Scrapers.
