Hey there! We are looking at Datadog, Dynatrace, AppDynamics, and New Relic as options for our web application monitoring.
Current Environment: .NET Core Web app hosted on Microsoft IIS
Future Environment: Web app will be hosted on Microsoft Azure
Tech Stacks: IIS, RabbitMQ, Redis, Microsoft SQL Server
Requirement: Infra Monitoring, APM, Real-User Monitoring (user activity monitoring, i.e., time spent on a page, most active page, etc.), Service Tracing, Root Cause Analysis, and Centralized Log Management.
Please advise on the above. Thanks!
We are looking for a centralised monitoring solution for our application deployed on Amazon EKS. We would like to monitor using metrics from Kubernetes, AWS services (NeptuneDB, AWS Elastic Load Balancing (ELB), Amazon EBS, Amazon S3, etc.), and custom metrics from our application microservices.
We expect to use around 80 microservices (not replicas). I think there will be a total of 200-250 microservices in the system, running on 10-12 slave nodes.
We tried Prometheus, but maintenance looks like a big issue. We need to manage scaling, maintain the storage, and deal with multiple exporters and Grafana. I feel this by itself needs a few dedicated resources (at least 2-3 people) to manage. Not sure if I am thinking in the correct direction. Please confirm.
You mentioned that Datadog and Sysdig charge per host. Do they charge per slave node?
Can't say anything about Sysdig. I clearly prefer Datadog as
- they provide plenty of easy-to-"switch-on" plugins for various technologies (incl. most of AWS)
- easy-to-code (Python) agent plugins / API for your own metrics
- brilliant dashboarding / alarms with many customization options
- pricing is OK, there are cheaper options for specific use cases but if you want superior dashboarding / alarms I haven't seen a good competitor (despite your own Prometheus / Grafana / Kibana dog food)
IMHO NewRelic is "promising since years" ;) good ideas but bad integration between their products. Their Dashboard query language is really nice but lacks critical functions like multiple data sets or advanced calculations. Needless to say you get all of that with Datadog.
Need help setting up a monitoring / logging / alarm infrastructure? Send me a message!
You are right. Building something on open source for your stack is heavy lifting. A lot of people I know start with such a set-up but quickly run into frustration, as they need to dedicate their best people to building monitoring that does the job in a professional way.
As you are microservice focussed and are looking for 'low implementation and maintenance effort', you might want to have a look at INSTANA, which was built with modern tool stacks in mind. https://www.instana.com/apm-for-microservices/
We have a public sandbox available if you just want to have a look at the product, and of course also a free trial: https://www.instana.com/getting-started-with-apm/
Let me know if you need anything on top.
I have hands-on production experience with both New Relic and Datadog. I personally prefer Datadog over NewRelic because of the UI, the documentation, and the overall user/developer experience.
NewRelic, however, can do basically the same things as Datadog, and some features, like alerting, have been present in NewRelic for longer than in Datadog. The cool thing about NewRelic is the pricing they updated last summer: you no longer pay per host but for the data you send to New Relic. This can be a huge cost saver depending on your particular setup.
I'd go for Datadog, but given you have lots of containers I would also make a cost calculation. If the price difference is significant and there's a budget constraint NewRelic might be the better choice.
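A rough way to run that cost calculation is sketched below. It compares a per-host bill with a bill based on ingested data. Every number in it is a placeholder assumption (host count, price per host, data volume, price per GB, free allowance), and the function names are made up for illustration; plug in the figures from each vendor's current price list before drawing any conclusion.

```python
def per_host_cost(hosts, price_per_host):
    """Monthly bill when the vendor charges a flat price per monitored host."""
    return hosts * price_per_host


def usage_cost(gb_ingested, price_per_gb, free_gb=0.0):
    """Monthly bill when the vendor charges per GB of data sent to it."""
    billable_gb = max(0.0, gb_ingested - free_gb)
    return billable_gb * price_per_gb


def compare(hosts, price_per_host, gb_ingested, price_per_gb, free_gb=0.0):
    """Return both monthly costs and which pricing model is cheaper."""
    host_bill = per_host_cost(hosts, price_per_host)
    usage_bill = usage_cost(gb_ingested, price_per_gb, free_gb)
    cheaper = "per_host" if host_bill <= usage_bill else "usage"
    return {"per_host": host_bill, "usage": usage_bill, "cheaper": cheaper}


if __name__ == "__main__":
    # Placeholder inputs only: 12 hosts, and hypothetical prices and volumes.
    result = compare(hosts=12, price_per_host=15.0,
                     gb_ingested=500.0, price_per_gb=0.25, free_gb=100.0)
    print(result)
```

With many containers per host, the data volume tends to grow faster than the host count, so it is worth rerunning the comparison with your expected growth rather than only today's numbers.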
Coming from a Ruby background, we've been users of New Relic for quite some time. When we adopted Elixir, the New Relic integration was young and missing essential features, so we gave AppSignal a try. It worked for quite some time, we even implemented a
I hadn't heard much about Datadog until about a year ago. Ironically, the NewRelic salesperson I had a series of trainings with was trash-talking Datadog a lot. That drew my attention to Datadog, and I gave it a try at another client project where we needed log handling, dashboards, and alerting.
In 2019, Datadog was already offering log management, and from that perspective it was ahead of NewRelic. Other than that, from my perspective, the two products offer a very similar set of tools. Therefore I wouldn't say there's a significant difference between the two; the decision is likely a matter of taste. The pricing is also very similar.
The reasons why we chose Datadog over NewRelic were:
- The presence of a log handling feature (logging has since become GA at NewRelic as well, in fall 2019).
- The setup was easier even though I already had experience with NewRelic, including participation in NewRelic trainings.
- The UI of Datadog is more compact and my experience is smoother.
- The NewRelic UI is very fragmented, and New Relic One only makes this worse for me.
- The log feature of Datadog is very well designed; I find tagging logs with services very useful. The log filtering is also awesome.
Bottom line is that both tools are great, and it makes sense to explore both and make the decision based on your use case. In our case, Datadog was the clear winner due to its UI, ease of setup, and the awesome logging and alerting features.
I chose Datadog APM because of the much better APM insights it provides (flamegraph, percentiles by default).
The drawback of this decision is that we had to move our production monitoring to TimescaleDB + Telegraf instead of NR Insight.
NewRelic is definitely easier when starting out. The agent is only a lib and doesn't require a daemon.
We're a real-time financial services messaging company, so being able to monitor our servers and applications in real-time is important to us. We also like a good deal, so $15/server seemed a bargain.
What were we looking for?
We wanted to monitor our MS infrastructure (servers, SQL) and apps (C#) to understand performance issues and be able to rectify them. We also wanted to be able to do long-term trending. And we wanted to go from nothing to live in a short time.
Installing the Datadog agent on the servers was a breeze and enabling the integrations for SQL and Windows trivial.
Using the StatsD based API was also very easy - no worrying about JSON or UDP calls. The ability to add tags to all metrics is also a key benefit. We run multiple (100+) instances of a single application and being able to distinguish events from each one via tagging, or to see aggregates, is extremely useful.
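To make the tagging point concrete, here is a minimal sketch of what a tagged StatsD metric looks like on the wire, using only the Python standard library. It assumes a StatsD-compatible listener (such as the Datadog agent) on UDP port 8125 on localhost. The metric names, the tags, and the helpers `format_metric` and `send_metric` are made up for illustration and are not part of any Datadog library; in practice a client library hides all of this.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram; the '|#tag,...' suffix is the DogStatsD tag extension."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; nothing is reported back if no listener is running."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric: one counter increment from application instance 42.
    line = format_metric("orders.processed", 1, "c", ["instance:42", "env:prod"])
    print(line)
    send_metric(line)
```

Because every instance sends the same metric name with a different `instance` tag, a dashboard can either split the series by tag or aggregate across all instances.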
In all, it took 2 days of R&D to instrument our key applications sufficiently for production deployment. Deploying the agent to our production servers took 30 minutes, giving our Ops team complete visibility for the first time.
What have we learned?
Since we've been live, Datadog has given us numerous insights into the way our system behaves, from uneven server loads and sporadic memory usage to performance tuning a key application that resulted in a 50% increase in throughput. Knowing what's taking the time has been a boon.
The other nice surprise has been the evolving nature of Datadog. It seems like every couple of weeks there's a new feature on the site.
- I like the transparent pricing. Services that won't show me the price without having to talk to a sales person are really annoying.
- Support has been good. We've contacted them several times with questions and always had a quick response (time zone considered...we're in London) and a helpful answer.
So what's bad?
Probably the weakest aspect at the moment is the long-term trending of data. Whilst you can wind the time bar back to see what happened last week, you can't ask questions like "show me the peak period each day for the last x months". The "get data" API is also fairly weak. Neither is a concern at the moment, and I'm sure they're on the to-do list.
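For readers wondering what such a "peak period each day" question amounts to, here is a small sketch that answers it offline. It assumes the samples have already been exported as (timestamp, value) pairs by some means; the function name `daily_peak_hours` and the sample data are invented for illustration, and an hour is used as the "period".

```python
from collections import defaultdict
from datetime import date, datetime


def daily_peak_hours(samples):
    """Return {day: (hour, total)} giving the busiest hour of each day."""
    # Sum all values that fall into the same (day, hour) bucket.
    hourly = defaultdict(float)
    for ts, value in samples:
        hourly[(ts.date(), ts.hour)] += value

    # Keep the bucket with the highest total for each day.
    peaks = {}
    for (day, hour), total in hourly.items():
        if day not in peaks or total > peaks[day][1]:
            peaks[day] = (hour, total)
    return peaks


if __name__ == "__main__":
    # Invented sample data covering two days.
    samples = [
        (datetime(2024, 1, 1, 9, 15), 10.0),
        (datetime(2024, 1, 1, 9, 45), 5.0),
        (datetime(2024, 1, 1, 14, 0), 12.0),
        (datetime(2024, 1, 2, 8, 0), 3.0),
        (datetime(2024, 1, 2, 20, 30), 7.0),
    ]
    for day, (hour, total) in sorted(daily_peak_hours(samples).items()):
        print(day, f"{hour:02d}:00", total)
```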
I've been a systems administrator most of my career. Everywhere I went, I'd have to rebuild the same monitoring + graphing system. And then make sure that every machine wrote to that system and every application handed up the proper metrics through whatever mechanism seemed good at the time.
Then, as CTO of SimpleReach, single-handedly managing over 200 servers in addition to everything else, I found Datadog. We were already using statsd to instrument our applications, now it was just a matter of getting that data to Datadog. We use Chef, so I installed the Datadog agent on every machine in about 10 minutes and we were up and running.
The best part was that we had a deploy problem the next day with one of our main applications and troubleshooting took minutes instead of hours (and Datadog immediately paid for itself). Now no new features go out without instrumentation and no machine gets created without being monitored.
Datadog just scales with us. Great service and I highly recommend it to anyone not looking to reinvent the wheel with monitoring and instrumentation.
Datadog makes running a service with 800,000 unique users a month possible as a single developer/maintainer. I bought a separate monitor just to keep my Datadog dashboards always visible and rely on triggers to keep watch over 20+ servers.
We use Datadog to monitor our servers and some application metrics. Easy to get started and scale to many servers. Datadog support engineers are always quick to respond to bugs and other challenges.
Free Heroku add-on. Not particularly useful for us. Rails profilers tend to do a better job at the app level. And I can never really figure out what’s going on with Heroku by looking at New Relic. I don’t know if we’re just not using New Relic correctly or if it really does just suck for our use case. But I guess some insight is better than none.
How do you know what parts of the workflow need improvement? Measure it. With New Relic in place, we have graphs of our API performance and can directly see if a server or zone is causing trouble, and the impact of our changes. There’s no comparison between a real-time performance graph and “Strange, the site seems slow, I should tail the logs”.
We just started looking into Datadog, but from what we see, it's like New Relic meets Loggly. It's really easy to plug in different services (like the ones on this list) and get detailed analysis of what is happening on your servers and services. It makes tracking down sparse, difficult-to-understand problems possible.
We monitor and troubleshoot our app's performance using New Relic, which gives us a great view into each type of request that hits our servers. It also gives us a nice weekly summary of error rates and response times so that we know how well we've done in the past week.
Monitoring day-to-day operations of multiple high-performance computing assets distributed across several networks. Monitoring vendor provided data and setting up alerts when things do not show up on time.
Datadog was used as a monitoring agent and for its included statsd daemon. This way we are able to have automated system stats and include whatever other metrics we want to track.
I'm trying to wring more instrumentation out of New Relic as it pertains to Rack, but for the time being, New Relic is monitoring/alerting uptime and some basic performance metrics.
Just like we care about errors, we care about metrics - especially around performance. You'd be crazy not to use it - and not surprisingly, it's a one-click add-on in Heroku.
Datadog is used because it has a great free tier and it provides us with great insights and integrations into our infrastructure and tools.
Powerful all-in-one monitoring solution as a service. Good integration with AWS. Very affordable price for small-scale startups.