Alternatives to Spring Batch

Hadoop, Talend, Spring Boot, Apache Spark, and Kafka are the most popular alternatives and competitors to Spring Batch.
130
159
0

What is Spring Batch and what are its top alternatives?

Spring Batch is designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. It also provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
Spring Batch is a tool in the Frameworks (Full Stack) category of a tech stack.
Spring Batch is an open source tool with 1.9K GitHub stars and 1.9K GitHub forks.
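
To make the skip and restart features mentioned above more concrete, here is a minimal, framework-free sketch in Python. It is not Spring Batch's API: `run_job`, `SkipLimitExceeded`, the `checkpoint` dictionary, and the default chunk size and skip limit are all hypothetical names and values chosen for illustration. The sketch processes records in chunks, skips records that fail (up to a limit), and saves its position after each chunk so that a rerun can resume from there.

```python
from typing import Callable, Dict, List


class SkipLimitExceeded(Exception):
    """Raised when more records fail than the job is allowed to skip."""


def run_job(
    records: List[str],
    process: Callable[[str], int],
    checkpoint: Dict[str, int],
    chunk_size: int = 3,
    skip_limit: int = 2,
) -> List[int]:
    """Process records chunk by chunk, skipping bad ones and saving progress.

    `checkpoint` is a hypothetical stand-in for a persistent job store: it
    holds the index of the next unprocessed record and the skip count so far.
    """
    written: List[int] = []
    start = checkpoint.get("next_index", 0)   # resume point on restart
    skipped = checkpoint.get("skipped", 0)

    for chunk_start in range(start, len(records), chunk_size):
        chunk = records[chunk_start:chunk_start + chunk_size]
        results: List[int] = []
        for record in chunk:
            try:
                results.append(process(record))
            except ValueError:
                # Treat ValueError as a skippable failure (an assumption).
                skipped += 1
                if skipped > skip_limit:
                    raise SkipLimitExceeded(record)
        # "Commit" the chunk: keep its output and persist the progress.
        written.extend(results)
        checkpoint["next_index"] = chunk_start + len(chunk)
        checkpoint["skipped"] = skipped
    return written
```

Calling `run_job(["1", "2", "x", "4", "5"], int, {})` skips the unparseable `"x"` and returns the four parsed integers. Passing a checkpoint that already contains `next_index` makes the job start from that position instead of from the beginning, which is the idea behind a restartable job.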

Top Alternatives to Spring Batch

  • Hadoop

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...

  • Talend

    It is an open source software integration platform that helps you turn data into business insights. It uses native code generation that lets you run your data pipelines across all cloud providers and get optimized performance on all platforms. ...

  • Spring Boot

    Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. ...

  • Apache Spark

    Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...

  • Kafka

    Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. ...

  • AWS Batch

    It enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. ...

  • Node.js

    Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. ...

  • Django

    Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. ...

Spring Batch alternatives & related posts

Hadoop

2K
2K
55
Open-source software for reliable, scalable, distributed computing
PROS OF HADOOP
  • 38
    Great ecosystem
  • 11
    One stack to rule them all
  • 4
    Great load balancer
  • 1
    Amazon AWS
  • 1
    Java syntax
CONS OF HADOOP
    None listed

    related Hadoop posts

    Conor Myhrvold
    Tech Brand Mgr, Office of CTO at Uber · 7 upvotes · 990.8K views

    Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

    Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

    https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

    (Direct GitHub repo: https://github.com/uber/marmaray) Kafka, Kafka Manager

    Shared insights on Kafka and Hadoop

    The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

    For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

    Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
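
The data-loss failure mode described above can be pictured with a small sketch. This is an illustration under stated assumptions, not Pinterest's code and not the Kafka client API: `BufferedProducer`, `send`, `flush`, and the `deliver` callback are hypothetical names. Messages wait in a bounded in-memory buffer while the broker is unreachable, and once the buffer is full, new messages are dropped.

```python
from collections import deque
from typing import Callable, Deque


class BufferedProducer:
    """Hypothetical producer that buffers messages in memory until delivered."""

    def __init__(self, deliver: Callable[[str], bool], capacity: int) -> None:
        self._deliver = deliver      # returns False while the broker is down
        self._capacity = capacity    # maximum number of buffered messages
        self._buffer: Deque[str] = deque()
        self.dropped = 0             # messages lost to buffer overflow

    def send(self, message: str) -> None:
        """Try to drain the buffer, then queue the message or drop it."""
        self.flush()
        if len(self._buffer) >= self._capacity:
            self.dropped += 1        # overflow: this message is lost
            return
        self._buffer.append(message)
        self.flush()

    def flush(self) -> None:
        """Deliver buffered messages in order until one delivery fails."""
        while self._buffer:
            if not self._deliver(self._buffer[0]):
                return               # broker unavailable; keep buffering
            self._buffer.popleft()
```

With a capacity of 2 and a broker that is down, the third message sent is counted in `dropped` and never delivered, even after the broker recovers. A longer outage or a smaller buffer loses proportionally more.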


    Talend

    110
    171
    0
    A single, unified suite for all integration needs
    PROS OF TALEND
      None listed
    CONS OF TALEND
      None listed


        Spring Boot

        16.1K
        14K
        910
        Create Spring-powered, production-grade applications and services with absolute minimum fuss
        PROS OF SPRING BOOT
        • 134
          Powerful and handy
        • 127
          Easy setup
        • 118
          Java
        • 85
          Spring
        • 82
          Fast
        • 42
          Extensible
        • 34
          Lots of "off the shelf" functionalities
        • 29
          Cloud Solid
        • 23
          Caches well
        • 21
          Many recipes around for obscure features
        • 20
          Modular
        • 20
          Productive
        • 19
          Integrations with most other Java frameworks
        • 18
          Spring ecosystem is great
        • 18
          Fast Performance With Microservices
        • 16
          Community
        • 16
          Auto-configuration
        • 13
          Easy setup, Community Support, Solid for ERP apps
        • 13
          One-stop shop
        • 12
          Cross-platform
        • 12
          Easy to parallelize
        • 11
          Powerful 3rd party libraries and frameworks
        • 11
          Easy setup, good for building ERP systems, well documented
        • 10
          Easy setup, Git Integration
        • 3
          It's so much easier to start a project on Spring
        • 3
          Kotlin
        CONS OF SPRING BOOT
        • 18
          Heavy weight
        • 17
          Annotation ceremony
        • 10
          Many config files needed
        • 8
          Java
        • 5
          Reactive
        • 4
          Excellent tools for cloud hosting, since 5.x

        related Spring Boot posts

        Is learning Spring and Spring Boot for web app back-end development still relevant in 2021? Feel free to share your views in comparison to Django, Node.js, ExpressJS, or other frameworks.

        Please share some good beginner resources for learning the Spring/Spring Boot framework to build web apps.

        Praveen Mooli
        Engineering Manager at Taylor and Francis · 14 upvotes · 1.9M views

        We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.

        To build our #Backend capabilities we decided to use the following:

        1. #Microservices - Java with Spring Boot, Node.js with ExpressJS, and Python with Flask
        2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda
        3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas

        To build #Webapps we decided to use Angular 2 with RxJS.

        #Devops - GitHub, Travis CI, Terraform, Docker, Serverless


        Apache Spark

        2.3K
        2.7K
        132
        Fast and general engine for large-scale data processing
        PROS OF APACHE SPARK
        • 58
          Open-source
        • 48
          Fast and Flexible
        • 7
          One platform for every big data problem
        • 6
          Easy to install and to use
        • 6
          Great for distributed SQL like applications
        • 3
          Works well for most data science use cases
        • 2
          Machine learning library, streaming in real time
        • 2
          In memory Computation
        • 0
          Interactive Query
        CONS OF APACHE SPARK
        • 3
          Speed

        related Apache Spark posts

        Eric Colson
        Chief Algorithms Officer at Stitch Fix · 21 upvotes · 2M views

        The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.

        Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

        At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

        For more info:

        #DataScience #DataStack #Data


        Kafka

        14.8K
        13.9K
        562
        Distributed, fault tolerant, high throughput pub-sub messaging system
        PROS OF KAFKA
        • 120
          High-throughput
        • 114
          Distributed
        • 86
          Scalable
        • 79
          High-Performance
        • 64
          Durable
        • 35
          Publish-Subscribe
        • 18
          Simple-to-use
        • 14
          Open source
        • 10
          Written in Scala and Java. Runs on the JVM
        • 6
          Message broker + Streaming system
        • 4
          Avro schema integration
        • 2
          Supports multiple clients
        • 2
          Robust
        • 2
          KSQL
        • 2
          Partitioned, replayable log
        • 1
          Fun
        • 1
          Extremely good parallelism constructs
        • 1
          Simple publisher / multi-subscriber model
        • 1
          Flexible
        CONS OF KAFKA
        • 27
          Non-Java clients are second-class citizens
        • 26
          Needs Zookeeper
        • 7
          Operational difficulties
        • 2
          Terrible Packaging

        related Kafka posts

        John Kodumal

        As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.

        We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.


        AWS Batch

        65
        184
        4
        Fully Managed Batch Processing at Any Scale
        PROS OF AWS BATCH
        • 2
          Scalable
        • 2
          Containerized
        CONS OF AWS BATCH
        • 1
          More overhead than lambda
        • 1
          Image management


        Node.js

        116.9K
        95.8K
        8.4K
        A platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications
        PROS OF NODE.JS
        • 1.4K
          Npm
        • 1.3K
          Javascript
        • 1.1K
          Great libraries
        • 1K
          High-performance
        • 794
          Open source
        • 482
          Great for apis
        • 473
          Asynchronous
        • 419
          Great community
        • 388
          Great for realtime apps
        • 294
          Great for command line utilities
        • 80
          Node Modules
        • 78
          Websockets
        • 67
          Uber Simple
        • 55
          Great modularity
        • 55
          Allows us to reuse code in the frontend
        • 39
          Easy to start
        • 34
          Great for Data Streaming
        • 30
          Realtime
        • 25
          Awesome
        • 23
          Non blocking IO
        • 17
          Can be used as a proxy
        • 15
          High performance, open source, scalable
        • 14
          Non-blocking and modular
        • 13
          Easy and Fun
        • 12
          Same lang as AngularJS
        • 11
          Easy and powerful
        • 10
          Future of BackEnd
        • 9
          Fast
        • 8
          Cross platform
        • 8
          Fullstack
        • 8
          Scalability
        • 7
          Mean Stack
        • 7
          Simple
        • 5
          React
        • 5
          Great for webapps
        • 5
          Easy concurrency
        • 4
          Friendly
        • 4
          Easy to use and fast and goes well with JSONdb's
        • 4
          Typescript
        • 4
          Fast, simple code and async
        • 3
          Fast development
        • 3
          Great speed
        • 3
          It's amazingly fast and scalable
        • 3
          Scalable
        • 3
          Isomorphic coolness
        • 3
          Control everything
        • 2
          Less boilerplate code
        • 2
          It's fast
        • 2
          Blazing fast
        • 2
          Scales, fast, simple, great community, npm, express
        • 2
          Not Python
        • 2
          TypeScript Support
        • 2
          Easy to learn
        • 2
          Easy to use
        • 2
          Javascript2
        • 2
          Great community
        • 2
          One language, end-to-end
        • 2
          Easy
        • 2
          Performant and fast prototyping
        • 2
          Super easy for backend connectivity
        • 1
          Lovely
        • 0
          Event Driven
        CONS OF NODE.JS
        • 46
          Bound to a single CPU
        • 40
          New framework every day
        • 34
          Lots of terrible examples on the internet
        • 28
          Asynchronous programming is the worst
        • 22
          Callback
        • 16
          Javascript
        • 11
          Dependency based on GitHub
        • 10
          Dependency hell
        • 10
          Low computational power
        • 7
          Can block whole server easily
        • 6
          Very very Slow
        • 6
          Callback functions may not fire in the expected sequence
        • 3
          Unneeded over complication
        • 3
          Unstable
        • 3
          Breaking updates
        • 1
          No standard approach

        related Node.js posts

        Nick Rockwell
        SVP, Engineering at Fastly · 44 upvotes · 1.6M views

        When I joined NYT there was already broad dissatisfaction with the LAMP (Linux, Apache HTTP Server, MySQL, PHP) stack and the front end framework, in particular. So, I wasn't passing judgment on it. I mean, LAMP's fine, you can do good work in LAMP. It's a little dated at this point, but it's not ... I didn't want to rip it out for its own sake, but everyone else was like, "We don't like this, it's really inflexible." And I remember from being outside the company when that was called MIT FIVE when it had launched. And been observing it from the outside, and I was like, you guys took so long to do that and you did it so carefully, and yet you're not happy with your decisions. Why is that? That was more the impetus. If we're going to do this again, how are we going to do it in a way that we're gonna get a better result?

        So we're moving quickly away from LAMP, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences.

        React is now talking to GraphQL as a primary API. There's a Node.js back end, to the front end, which is mainly for server-side rendering, as well.

        Behind there, the main repository for the GraphQL server is a big table repository, that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline.

        Conor Myhrvold
        Tech Brand Mgr, Office of CTO at Uber · 38 upvotes · 4M views

        How Uber developed the open source, end-to-end distributed tracing Jaeger, now a CNCF project:

        Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

        Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

        https://eng.uber.com/distributed-tracing/

        (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

        Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark


        Django

        25.2K
        22K
        3.7K
        The Web framework for perfectionists with deadlines
        PROS OF DJANGO
        • 626
          Rapid development
        • 465
          Open source
        • 398
          Great community
        • 349
          Easy to learn
        • 261
          Mvc
        • 212
          Beautiful code
        • 209
          Elegant
        • 190
          Free
        • 189
          Great packages
        • 175
          Great libraries
        • 65
          Restful
        • 63
          Comes with auth and crud admin panel
        • 63
          Powerful
        • 57
          Great documentation
        • 55
          Great for web
        • 41
          Python
        • 36
          Great orm
        • 32
          Great for api
        • 25
          All included
        • 20
          Web Apps
        • 19
          Fast
        • 16
          Used by top startups
        • 14
          Clean
        • 13
          Sexy
        • 12
          Easy setup
        • 10
          Convention over configuration
        • 7
          ORM
        • 7
          The Django community
        • 7
          Allows for very rapid development with great libraries
        • 5
          It's elegant and practical
        • 5
          Great MVC and templating engine
        • 4
          Fast prototyping
        • 4
          Easy structure, useful built-in library
        • 4
          King of backend world
        • 4
          Have not found anything that it can't do
        • 4
          Mvt
        • 4
          Easy to develop end to end AI Models
        • 4
          Full stack
        • 3
          Easy
        • 3
          Easy to use
        • 3
          Cross-Platform
        • 3
          Batteries included
        • 2
          Scaffold
        • 2
          Many libraries
        • 2
          Python community
        • 2
          Just the right level of abstraction
        • 2
          Great performance
        • 2
          Zero code burden to change databases
        • 2
          Full-Text Search
        • 2
          Map
        • 2
          Modular
        • 2
          Very quick to get something up and running
        • 1
          Easy to change database manager
        • 1
          Test
        • 0
          Node js
        CONS OF DJANGO
        • 25
          Underpowered templating
        • 19
          Underpowered ORM
        • 18
          Autoreload restarts whole server
        • 15
          URL dispatcher ignores HTTP method
        • 10
          Internal subcomponents coupling
        • 7
          Not nodejs
        • 7
          Admin
        • 6
          Configuration hell
        • 3
          Python
        • 3
          Documentation not as clean and nice as Laravel's
        • 3
          Bloated admin panel included
        • 3
          Not typed
        • 2
          Overwhelming folder structure
        • 1
          Ineffective multithreading

        related Django posts

        Dmitry Mukhin

        Simple controls over complex technologies, as we put it, wouldn't be possible without neat UIs for our user areas including start page, dashboard, settings, and docs.

        Initially, there was Django. Back in 2011, considering our Python-centric approach, that was the best choice. Later, we realized we needed to iterate on our website more quickly. And this led us to detaching Django from our front end. That was when we decided to build an SPA.

        For building user interfaces, we're currently using React as it provided the fastest rendering back when we were building our toolkit. It’s worth mentioning Uploadcare is not a front-end-focused SPA: we aren’t running at high levels of complexity. If it were, we’d go with Ember.js.

        However, there's a chance we will shift to the faster Preact, with its motto of using as little code as possible, and because it makes more use of browser APIs. One of our future tasks for our front end is to configure our Webpack bundler to split up the code for different site sections. For styles, we use PostCSS along with its plugins such as cssnano which minifies all the code.

        All that allows us to provide a great user experience and quickly implement changes where they are needed with as little code as possible.


        Hey, so I developed a basic application with Python. But to use it, you need a Python interpreter. I want to add a GUI to make it more appealing. What should I choose to develop a GUI? I have very basic skills in front end development (CSS, JavaScript). I am fluent in Python. I'm looking for a tool that is easy to use and doesn't require too much code knowledge. I have recently tried out Flask, but it is kinda complicated. Should I stick with it, move to Django, or is there another nice framework to use?
