What is Zookeeper and what are its top alternatives?
Top Alternatives to Zookeeper
- Consul
Consul is a tool for service discovery and configuration. Consul is distributed, highly available, and extremely scalable. ...
- etcd
etcd is a distributed key value store that provides a reliable way to store data across a cluster of machines. It’s open-source and available on GitHub. etcd gracefully handles master elections during network partitions and will tolerate machine failure, including the master. ...
- Yarn
Yarn caches every package it downloads so it never needs to again. It also parallelizes operations to maximize resource utilization so install times are faster than ever. ...
- Eureka
Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. ...
- Ambari
This project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. It provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. ...
- Kafka
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. ...
- Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. ...
- Kubernetes
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the users declared intentions. ...
Zookeeper alternatives & related posts
- Great service discovery infrastructure60
- Health checking35
- Distributed key-value store29
- Monitoring26
- High-availability23
- Web-UI12
- Token-based acls10
- Gossip clustering6
- Dns server5
- Not Java3
- Docker integration1
related Consul posts
As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.
We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.
Since the beginning, Cal Henderson has been the CTO of Slack. Earlier this year, he commented on a Quora question summarizing their current stack.
Apps- Web: a mix of JavaScript/ES6 and React.
- Desktop: And Electron to ship it as a desktop application.
- Android: a mix of Java and Kotlin.
- iOS: written in a mix of Objective C and Swift.
- The core application and the API written in PHP/Hack that runs on HHVM.
- The data is stored in MySQL using Vitess.
- Caching is done using Memcached and MCRouter.
- The search service takes help from SolrCloud, with various Java services.
- The messaging system uses WebSockets with many services in Java and Go.
- Load balancing is done using HAproxy with Consul for configuration.
- Most services talk to each other over gRPC,
- Some Thrift and JSON-over-HTTP
- Voice and video calling service was built in Elixir.
- Built using open source tools including Presto, Spark, Airflow, Hadoop and Kafka.
- For server configuration and management we use Terraform, Chef and Kubernetes.
- We use Prometheus for time series metrics and ELK for logging.
etcd
- Service discovery11
- Fault tolerant key value store6
- Secure2
- Bundled with coreos2
- Consol integration1
- Privilege Access Management1
- Open Source1
related etcd posts
- Incredibly fast85
- Easy to use22
- Open Source13
- Can install any npm package11
- Works where npm fails8
- Workspaces7
- Incomplete to run tasks3
- Fast2
- 16
- Sends data to facebook7
- Should be installed separately4
- Cannot publish to registry other than npm3
related Yarn posts
Our whole Node.js backend stack consists of the following tools:
- Lerna as a tool for multi package and multi repository management
- npm as package manager
- NestJS as Node.js framework
- TypeScript as programming language
- ExpressJS as web server
- Swagger UI for visualizing and interacting with the API’s resources
- Postman as a tool for API development
- TypeORM as object relational mapping layer
- JSON Web Token for access token management
The main reason we have chosen Node.js over PHP is related to the following artifacts:
- Made for the web and widely in use: Node.js is a software platform for developing server-side network services. Well-known projects that rely on Node.js include the blogging software Ghost, the project management tool Trello and the operating system WebOS. Node.js requires the JavaScript runtime environment V8, which was specially developed by Google for the popular Chrome browser. This guarantees a very resource-saving architecture, which qualifies Node.js especially for the operation of a web server. Ryan Dahl, the developer of Node.js, released the first stable version on May 27, 2009. He developed Node.js out of dissatisfaction with the possibilities that JavaScript offered at the time. The basic functionality of Node.js has been mapped with JavaScript since the first version, which can be expanded with a large number of different modules. The current package managers (npm or Yarn) for Node.js know more than 1,000,000 of these modules.
- Fast server-side solutions: Node.js adopts the JavaScript "event-loop" to create non-blocking I/O applications that conveniently serve simultaneous events. With the standard available asynchronous processing within JavaScript/TypeScript, highly scalable, server-side solutions can be realized. The efficient use of the CPU and the RAM is maximized and more simultaneous requests can be processed than with conventional multi-thread servers.
- A language along the entire stack: Widely used frameworks such as React or AngularJS or Vue.js, which we prefer, are written in JavaScript/TypeScript. If Node.js is now used on the server side, you can use all the advantages of a uniform script language throughout the entire application development. The same language in the back- and frontend simplifies the maintenance of the application and also the coordination within the development team.
- Flexibility: Node.js sets very few strict dependencies, rules and guidelines and thus grants a high degree of flexibility in application development. There are no strict conventions so that the appropriate architecture, design structures, modules and features can be freely selected for the development.
So when starting a new project you generally have your go to tools to get your site up and running locally, and some scripts to build out a production version of your site. Create React App is great for that, however for my projects I feel as though there is to much bloat in Create React App and if I use it, then I'm tied to React, which I love but if I want to switch it up to Vue or something I want that flexibility.
So to start everything up and running I clone my personal Webpack boilerplate - This is still in Webpack 3, and does need some updating but gets the job done for now. So given the name of the repo you may have guessed that yes I am using Webpack as my bundler I use Webpack because it is so powerful, and even though it has a steep learning curve once you get it, its amazing.
The next thing I do is make sure my machine has Node.js configured and the right version installed then run Yarn. I decided to use Yarn because when I was building out this project npm had some shortcomings such as no .lock
file. I could probably move from Yarn to npm but I don't really see any point really.
I use Babel to transpile all of my #ES6 to #ES5 so the browser can read it, I love Babel and to be honest haven't looked up any other transpilers because Babel is amazing.
Finally when developing I have Prettier setup to make sure all my code is clean and uniform across all my JS files, and ESLint to make sure I catch any errors or code that could be optimized.
I'm really happy with this stack for my local env setup, and I'll probably stick with it for a while.
- Easy setup and integration with spring-cloud21
- Web ui9
- Monitoring8
- Health checking8
- Circuit breaker7
- Netflix battle tested components6
- Service discovery6
- Open Source4
related Eureka posts
- Ease of use1
related Ambari posts
- High-throughput126
- Distributed119
- Scalable92
- High-Performance86
- Durable66
- Publish-Subscribe38
- Simple-to-use19
- Open source18
- Written in Scala and java. Runs on JVM12
- Message broker + Streaming system8
- Robust4
- KSQL4
- Avro schema integration4
- Suport Multiple clients3
- Partioned, replayable log2
- Flexible1
- Extremely good parallelism constructs1
- Fun1
- Simple publisher / multi-subscriber model1
- Non-Java clients are second-class citizens32
- Needs Zookeeper29
- Operational difficulties9
- Terrible Packaging4
related Kafka posts
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.
We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.
- Performance884
- Super fast541
- Ease of use512
- In-memory cache443
- Advanced key-value cache323
- Open source193
- Easy to deploy182
- Stable164
- Free155
- Fast121
- High-Performance42
- High Availability40
- Data Structures34
- Very Scalable32
- Replication24
- Pub/Sub22
- Great community22
- "NoSQL" key-value data store19
- Hashes15
- Sets13
- Sorted Sets11
- Lists10
- BSD licensed9
- NoSQL9
- Integrates super easy with Sidekiq for Rails background8
- Async replication8
- Bitmaps8
- Open Source7
- Keys with a limited time-to-live7
- Lua scripting6
- Strings6
- Awesomeness for Free5
- Hyperloglogs5
- Written in ANSI C4
- LRU eviction of keys4
- Networked4
- Outstanding performance4
- Runs server side LUA4
- Transactions4
- Feature Rich4
- Performance & ease of use3
- Data structure server3
- Object [key/value] size each 500 MB2
- Simple2
- Scalable2
- Temporarily kept on disk2
- Dont save data if no subscribers are found2
- Automatic failover2
- Easy to use2
- Existing Laravel Integration2
- Channels concept2
- Cannot query objects directly15
- No secondary indexes for non-numeric data types3
- No WAL1
related Redis posts
We use MongoDB as our primary #datastore. Mongo's approach to replica sets enables some fantastic patterns for operations like maintenance, backups, and #ETL.
As we pull #microservices from our #monolith, we are taking the opportunity to build them with their own datastores using PostgreSQL. We also use Redis to cache data we’d never store permanently, and to rate-limit our requests to partners’ APIs (like GitHub).
When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We handle any side-effects of S3’s eventual consistency model within our own code. This ensures that we deal with user requests correctly while writes are in process.
I'm working as one of the engineering leads in RunaHR. As our platform is a Saas, we thought It'd be good to have an API (We chose Ruby and Rails for this) and a SPA (built with React and Redux ) connected. We started the SPA with Create React App since It's pretty easy to start.
We use Jest as the testing framework and react-testing-library to test React components. In Rails we make tests using RSpec.
Our main database is PostgreSQL, but we also use MongoDB to store some type of data. We started to use Redis for cache and other time sensitive operations.
We have a couple of extra projects: One is an Employee app built with React Native and the other is an internal back office dashboard built with Next.js for the client and Python in the backend side.
Since we have different frontend apps we have found useful to have Bit to document visual components and utils in JavaScript.
Kubernetes
- Leading docker container management solution164
- Simple and powerful128
- Open source106
- Backed by google76
- The right abstractions58
- Scale services25
- Replication controller20
- Permission managment11
- Cheap8
- Supports autoscaling8
- Simple8
- No cloud platform lock-in5
- Reliable5
- Self-healing5
- Quick cloud setup4
- Promotes modern/good infrascture practice4
- Scalable4
- Open, powerful, stable4
- Runs on azure3
- Captain of Container Ship3
- Cloud Agnostic3
- Custom and extensibility3
- Backed by Red Hat3
- A self healing environment with rich metadata3
- Gke2
- Everything of CaaS2
- Sfg2
- Expandable2
- Golang2
- Easy setup2
- Poor workflow for development15
- Steep learning curve15
- Orchestrates only infrastructure8
- High resource requirements for on-prem clusters4
- Too heavy for simple systems2
- Additional vendor lock-in (Docker)1
- More moving parts to secure1
- Additional Technology Overhead1
related Kubernetes posts
How Uber developed the open source, end-to-end distributed tracing Jaeger , now a CNCF project:
Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.
Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:
https://eng.uber.com/distributed-tracing/
(GitHub Pages : https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)
Bindings/Operator: Python Java Node.js Go C++ Kubernetes JavaScript OpenShift C# Apache Spark
Our first experience with .NET core was when we developed our OSS feature management platform - Tweek (https://github.com/soluto/tweek). We wanted to create a solution that is able to run anywhere (super important for OSS), has excellent performance characteristics and can fit in a multi-container architecture. We decided to implement our rule engine processor in F# , our main service was implemented in C# and other components were built using JavaScript / TypeScript and Go.
Visual Studio Code worked really well for us as well, it worked well with all our polyglot services and the .Net core integration had great cross-platform developer experience (to be fair, F# was a bit trickier) - actually, each of our team members used a different OS (Ubuntu, macos, windows). Our production deployment ran for a time on Docker Swarm until we've decided to adopt Kubernetes with almost seamless migration process.
After our positive experience of running .Net core workloads in containers and developing Tweek's .Net services on non-windows machines, C# had gained back some of its popularity (originally lost to Node.js), and other teams have been using it for developing microservices, k8s sidecars (like https://github.com/Soluto/airbag), cli tools, serverless functions and other projects...