I have an integration service that pulls data from third party systems saves it and returns it to the user of the service. We can pull large data sets with the service and response JSON can go up to 5MB with gzip compression. I currently use Rails 6 and Ruby 2.7.2 and Puma web server. Slow clients tend to prevent other users from accessing the system. Am considering a switch to Unicorn.
Consider trying to use puma workers first.
puma -w basically. That will launch multiple puma processes to manage the requests, like unicorn, but also run threads within those processes. You can turn the number of workers and number of threads to find the right memory footprint / request per second balance.