Elasticsearch

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Together with Kibana, Beats, and Logstash, it makes up the Elastic Stack (sometimes called the ELK Stack).

Elasticsearch is a tool in the Search category of a tech stack.

Key Features

Distributed and Highly Available Search Engine
Multi Tenant with Multi Types
Various set of APIs including RESTful
Clients available in many languages including Java, Python, .NET, C#, Groovy, and more
Document oriented
Reliable, Asynchronous Write Behind for long term persistency
(Near) Real Time Search
Built on top of Apache Lucene
Per operation consistency
Inverted indices with finite state transducers for full-text querying
BKD trees for storing numeric and geo data
Column store for analytics
Compatible with Hadoop using the ES-Hadoop connector
Open Source under Apache 2 and Elastic License

Elasticsearch Discussions

Discover why developers choose Elasticsearch. Read real-world technical decisions and stack choices from the StackShare community.

Sami Jan

Nov 30, 2018

Needs advice on

Elasticsearch

Algolia

I chose Elasticsearch for my organization's #search stack because of financial and legal regulations, but chose Algolia for a hobby e-commerce comparison engine.

Patrick Sun

Software Engineer at Stitch Fix

Sep 13, 2018

Needs advice on

Amazon S3

Elasticsearch

Amazon EC2 Container Service

To load data from our Amazon S3 data warehouse into the Elasticsearch cluster, I developed a Spark application that uses PySpark to extract data from S3, partition it, and then batch-send each partition to Elasticsearch to increase parallelism. The Spark job enables fielddata: true for text columns with low cardinality to allow sub-aggregations by text columns, and prevents data duplication by adding a unique _id field to each row in the dataframe.

The job can then be run by data scientists in Flotilla, an internal data platform tool for running jobs on Amazon EC2 Container Service, with environment variables specifying which schema and table to load.
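The two details of the load job described above (a unique _id per row and fielddata: true on low-cardinality text columns) can be sketched in plain Python. This is only an illustration, not the Spark application from the post: the helper names, the sample rows, the `clients` index name, and the choice of a SHA-1 hash over key columns are all assumptions, and the mapping uses the typeless format of Elasticsearch 7 and later. The sketch only builds the request bodies; it does not send them.

```python
import hashlib
import json


def row_id(row, key_fields):
    """Derive a deterministic _id from the columns that identify a row (assumed scheme)."""
    key = "|".join(str(row[f]) for f in key_fields)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def bulk_payload(rows, index, key_fields):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    Each document takes two lines: an action line, then the document source.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_id": row_id(row, key_fields)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def text_mapping_with_fielddata(text_fields):
    """Build an index mapping that enables fielddata on the given text fields."""
    return {
        "mappings": {
            "properties": {
                name: {"type": "text", "fielddata": True} for name in text_fields
            }
        }
    }


# Hypothetical sample data standing in for one dataframe partition.
rows = [
    {"client_id": 1, "style": "casual"},
    {"client_id": 2, "style": "formal"},
]
payload = bulk_payload(rows, index="clients", key_fields=["client_id"])
mapping = text_mapping_with_fielddata(["style"])
```

Because an index action with an existing _id overwrites that document rather than adding a new one, re-running the load with the same deterministic ids does not create duplicates.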

Patrick Sun

Software Engineer at Stitch Fix

Sep 13, 2018

Needs advice on

Kibana

Elasticsearch

Elasticsearch's built-in visualization tool, Kibana, is robust and the appropriate tool in many cases. However, it is geared specifically towards log exploration and time-series data, and we felt that its steep learning curve would impede adoption among data scientists accustomed to writing SQL. The solution was to create something that would replicate some of Kibana's essential functionality while hiding Elasticsearch's complexity behind SQL-esque labels and terminology ("table" instead of "index", "group by" instead of "sub-aggregation") in the UI.

Elasticsearch's API is really well-suited for aggregating time-series data, indexing arbitrary data without defining a schema, and creating dashboards. For the purpose of a data exploration backend, Elasticsearch fits the bill really well. Users can send an HTTP request with aggregations and sub-aggregations to an index with millions of documents and get a response within seconds, thus allowing them to rapidly iterate through their data.
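As a rough illustration of the "group by" to sub-aggregation mapping mentioned in this post, the sketch below turns a list of group-by fields into nested terms aggregations with an average metric at the innermost level. The function name, the `by_` and `avg_` bucket names, and the example fields are hypothetical and are not taken from Dora; the resulting dictionary is the kind of JSON body that would be sent to an index's `_search` endpoint.

```python
import json


def group_by_to_aggs(group_fields, metric_field, size=10):
    """Nest one terms aggregation per group-by field around an avg metric."""
    # Start with the innermost metric aggregation.
    aggs = {"avg_" + metric_field: {"avg": {"field": metric_field}}}
    # Wrap it from the last group-by field outwards, so the first field is outermost.
    for field in reversed(group_fields):
        aggs = {
            "by_" + field: {
                "terms": {"field": field, "size": size},
                "aggs": aggs,
            }
        }
    # "size": 0 asks for aggregation results only, without the matching documents.
    return {"size": 0, "aggs": aggs}


# Hypothetical fields: group by region, then category, and average the price.
body = group_by_to_aggs(["region", "category"], "price")
print(json.dumps(body, indent=2))
```

Each additional group-by field adds one level of nesting, which is why a SQL-style label in the UI can hide the nested request structure from the user.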

Patrick Sun

Software Engineer at Stitch Fix

Sep 13, 2018

Needs advice on

Victory

Apache Spark

React

As a frontend engineer on the Algorithms & Analytics team at Stitch Fix, I work with data scientists to develop applications and visualizations to help our internal business partners make data-driven decisions. I envisioned a platform that would assist data scientists in the data exploration process, allowing them to visually explore and rapidly iterate through their assumptions, then share their insights with others. This would align with our team's philosophy of having engineers "deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy", and solve the pain of data exploration.

The final product, code-named Dora, is built with React, Redux and Victory, backed by Elasticsearch to enable fast and iterative data exploration, and uses Apache Spark to move data from our Amazon S3 data warehouse into the Elasticsearch cluster.

Tim Specht

Co-Founder and CTO at Dubsmash

Sep 13, 2018

Needs advice on

Elasticsearch

Algolia

Memcached

Although we were using Elasticsearch in the beginning to power our in-app search, we moved this part of our processing over to Algolia a couple of months ago; this has proven to be a fantastic choice, letting us build search-related features with more confidence and speed.

Elasticsearch is only used for searching in internal tooling nowadays; hosting and running it reliably took up too much of our time in the past, and fine-tuning the results to reach a great user experience was never easy for us either. With Algolia we can flexibly change ranking methods on the fly and can instead focus our time on fine-tuning the experience within our app.

Memcached is used in front of most of the API endpoints to cache responses in order to speed up response times and reduce server costs on our side.

#SearchAsAService
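The response caching described in this post follows a common cache-aside pattern, sketched below under stated assumptions. The `TTLCache` class is an in-memory stand-in written for this example, not a real Memcached client, and the endpoint, cache key, and 60-second expiry are hypothetical; nothing here reflects Dubsmash's actual implementation.

```python
import time


class TTLCache:
    """In-memory stand-in for a Memcached client (get/set with expiry)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def cached_response(cache, key, compute, ttl_seconds=60):
    """Return the cached response for key, computing and storing it on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    response = compute()
    cache.set(key, response, ttl_seconds)
    return response


calls = []


def expensive_endpoint():
    """Hypothetical endpoint handler; records each real invocation."""
    calls.append(1)
    return {"status": "ok"}


cache = TTLCache()
first = cached_response(cache, "GET /feed", expensive_endpoint)
second = cached_response(cache, "GET /feed", expensive_endpoint)
```

The second call is served from the cache, so the handler runs only once until the entry expires; that skipped work is where the faster responses and lower server costs come from.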
