Lucene logo

Lucene

A high-performance, full-featured text search engine library written entirely in Java

What is Lucene?

Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
Lucene is a tool in the Search Engines category of a tech stack.

Who uses Lucene?

Companies
43 companies reportedly use Lucene in their tech stacks, including Twitter, Slack, and Evernote.

Developers
121 developers on StackShare have stated that they use Lucene.

Lucene Integrations

Pros of Lucene
1
Fast
1
Small

Blog Posts

Lucene's Features

  • over 150GB/hour on modern hardware
  • small RAM requirements -- only 1MB heap
  • incremental indexing as fast as batch indexing
  • index size roughly 20-30% the size of text indexed
  • ranked searching -- best results returned first
  • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries
  • fielded searching (e.g. title, author, contents)
  • sorting by any field
  • multiple-index searching with merged results
  • allows simultaneous update and searching
  • flexible faceting, highlighting, joins and result grouping
  • fast, memory-efficient and typo-tolerant suggesters
  • pluggable ranking models, including the Vector Space Model and Okapi BM25
  • configurable storage engine (codecs)

Lucene Alternatives & Comparisons

What are some alternatives to Lucene?
Solr
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
Sphinx
It lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with it pretty much as with a database server.
Apache Solr
It uses the tools you use to make application building a snap. It is built on the battle-tested Apache Zookeeper, it makes it easy to scale up and down.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
See all alternatives

Lucene's Followers
229 developers follow Lucene to keep up with related blogs and decisions.