What is Datadog?
Who uses Datadog?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Datadog in their tech stack.
We just launched the Segment Config API (try it out for yourself here) — a set of public REST APIs that enable you to manage your Segment configuration. Behind the scenes the Config API is built with Go , GRPC and Envoy.
At Segment, we build new services in Go by default. The language is simple so new team members quickly ramp up on a codebase. The tool chain is fast so developers get immediate feedback when they break code, tests or integrations with other systems. The runtime is fast so it performs great at scale.
For the newest round of APIs we adopted the GRPC service #framework.
The Protocol Buffer service definition language makes it easy to design type-safe and consistent APIs, thanks to ecosystem tools like the Google API Design Guide for API standards,
uber/prototool for formatting and linting .protos and
lyft/protoc-gen-validate for defining field validations, and
grpc-gateway for defining REST mapping.
With a well designed .proto, its easy to generate a Go server interface and a TypeScript client, providing type-safe RPC between languages.
For the API gateway and RPC we adopted the Envoy service proxy.
segmentapis.com endpoint is an Envoy front proxy that rate-limits and authenticates every request. It then transcodes a #REST / #JSON request to an upstream GRPC request. The upstream GRPC servers are running an Envoy sidecar configured for Datadog stats.
The result is API #security , #reliability and consistent #observability through Envoy configuration, not code.
We experimented with Swagger service definitions, but the spec is sprawling and the generated clients and server stubs leave a lot to be desired. GRPC and .proto and the Go implementation feels better designed and implemented. Thanks to the GRPC tooling and ecosystem you can generate Swagger from .protos, but it’s effectively impossible to go the other way.
Our primary source of monitoring and alerting is Datadog. We’ve got prebuilt dashboards for every scenario and integration with PagerDuty to manage routing any alerts. We’ve definitely scaled past the point where managing dashboards is easy, but we haven’t had time to invest in using features like Anomaly Detection. We’ve started using Honeycomb for some targeted debugging of complex production issues and we are liking what we’ve seen. We capture any unhandled exceptions with Rollbar and, if we realize one will keep happening, we quickly convert the metrics to point back to Datadog, to keep Rollbar as clean as possible.
We use Segment to consolidate all of our trackers, the most important of which goes to Amplitude to analyze user patterns. However, if we need a more consolidated view, we push all of our data to our own data warehouse running PostgreSQL; this is available for analytics and dashboard creation through Looker.
My team is divided on using Centreon or Zabbix for enterprise monitoring and alert automation. Can someone let us know which one is better? There is one more tool called Datadog that we are using for cloud assets. Of course, Datadog presents us with huge bills. So we want to have a comparative study. Suggestions and advice are welcome. Thanks!
We are looking for a centralised monitoring solution for our application deployed on Amazon EKS. We would like to monitor using metrics from Kubernetes, AWS services (NeptuneDB, AWS Elastic Load Balancing (ELB), Amazon EBS, Amazon S3, etc) and application microservice's custom metrics.
We are expected to use around 80 microservices (not replicas). I think a total of 200-250 microservices will be there in the system with 10-12 slave nodes.
We tried Prometheus but it looks like maintenance is a big issue. We need to manage scaling, maintaining the storage, and dealing with multiple exporters and Grafana. I felt this itself needs few dedicated resources (at least 2-3 people) to manage. Not sure if I am thinking in the correct direction. Please confirm.
You mentioned Datadog and Sysdig charges per host. Does it charge per slave node?
Hey there! We are looking at Datadog, Dynatrace, AppDynamics, and New Relic as options for our web application monitoring.
Current Environment: .NET Core Web app hosted on Microsoft IIS
Future Environment: Web app will be hosted on Microsoft Azure
Tech Stacks: IIS, RabbitMQ, Redis, Microsoft SQL Server
Requirement: Infra Monitoring, APM, Real - User Monitoring (User activity monitoring i.e., time spent on a page, most active page, etc.), Service Tracing, Root Cause Analysis, and Centralized Log Management.
Please advise on the above. Thanks!
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests, and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, with each associated with a celery queue.
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signal.
Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
- 14-day Free Trial for an unlimited number of hosts
- 200+ turn-key integrations for data aggregation
- Clean graphs of StatsD and other integrations
- Slice and dice graphs and alerts by tags, roles, and more
- Easy-to-use search for hosts, metrics, and tags
- Alert notifications via e-mail and PagerDuty
- Receive alerts on any metric, for a single host or an entire cluster
- Full API access in more than 15 languages
- Overlay metrics and events across disparate sources
- Out-of-the-box and customizable monitoring dashboards
- Easy way to compute rates, ratios, averages, or integrals
- Sampling intervals of 10 seconds
- Mute all alerts with 1 click during upgrades and maintenance
- Tools for team collaboration