Needs advice on Elasticsearch and Splunk

We are currently exploring Elasticsearch and Splunk for our centralized logging solution, and I would like some feedback on these two tools. We expect upwards of 10 TB of logging data.

3 upvotes·896.9K views
Replies (1)
DevOps Engineer at Axual·

Well, first of all, you will quickly learn that sanitizing your logging will save you $$$. So start small rather than going for a big bang. I've evaluated Splunk for logs in a data center use case and it was wonderful. But getting all the data in is easy; making sense of it and making it human-friendly is another matter. For instance, your Cisco switch can send you all the data on IP addresses and what they do, but you'll need to add the context of your DTAP environments, hostnames, application names, and much more to make it really usable. The same goes for VMware host logs. Once you add the names of your (virtual) switches, environments, and purposes as context, you really start to get deep insight into what happens. I was able to identify misconfigured SAN switches, host NICs, failing bonding, and similar problems. However, the storage and network guys didn't like me, because I was on their turf and telling them about their mistakes.
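To make the "add context" step above a bit more concrete, here is a minimal sketch in Python. It is not how Splunk or Elastic do it internally (both have their own lookup and enrichment features); the lookup table, the field names such as `src_ip` and `environment`, and the `enrich` function are all made up for illustration.

```python
# Hypothetical lookup table: IP address -> context. In practice this might be
# exported from a CMDB or inventory system; all names and values are assumptions.
CONTEXT_BY_IP = {
    "10.0.1.15": {"hostname": "app01", "environment": "production", "application": "billing"},
    "10.0.2.20": {"hostname": "app02", "environment": "test", "application": "billing"},
}


def enrich(event: dict) -> dict:
    """Return a copy of the event with context fields added when the source IP is known."""
    context = CONTEXT_BY_IP.get(event.get("src_ip"), {})
    enriched = dict(event)  # copy, so the raw event stays untouched
    for key, value in context.items():
        # Do not overwrite fields the device already sent.
        enriched.setdefault(key, value)
    return enriched


raw = {"src_ip": "10.0.1.15", "bytes": 512}
print(enrich(raw))
```

The point is only that a bare IP address becomes far more useful once it carries an environment, a hostname, and an application name.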

So make sure you have a delegate from EVERY department whose logs you will gather, and let them be the messenger to those teams!

That said, you can do the same with Elastic, Datadog, or New Relic instead of Splunk. It depends on the functionality you really need and on your use case. They all have great plugins and modules for specific systems and applications. The first step is getting the data in, and that's easy, especially when using cloud SaaS services. Maintaining your logging platform is a BIG job too, so don't underestimate that! I'd advise you to go to the cloud in a limited setup for a year and learn about adding logic and context to the data, trimming your dataset, and sanitizing your logs. That will come at a cost, but you won't have to worry about the platform. If the cost is still too high after a year, spend 6 months on transferring it to on-prem while you continue with the team to onboard more logs from more systems.
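As a rough illustration of trimming and sanitizing before anything is shipped to a paid platform, here is a small Python sketch. Which levels to drop and what to redact are assumptions on my side (here: DEBUG/TRACE events and email addresses); the `level` and `message` field names and the `sanitize` function are hypothetical too.

```python
import re
from typing import Optional

# Assumption: these levels are noise for central logging and can be dropped.
DROP_LEVELS = {"DEBUG", "TRACE"}

# Simple pattern for email addresses; deliberately not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def sanitize(event: dict) -> Optional[dict]:
    """Return a cleaned copy of the event, or None if it should not be shipped."""
    if str(event.get("level", "")).upper() in DROP_LEVELS:
        return None  # trimmed: never leaves the host, never gets billed
    cleaned = dict(event)
    cleaned["message"] = EMAIL_RE.sub("<redacted-email>", str(cleaned.get("message", "")))
    return cleaned


events = [
    {"level": "debug", "message": "cache miss"},
    {"level": "INFO", "message": "login by bob@example.com"},
]
shipped = [e for e in map(sanitize, events) if e is not None]
print(shipped)
```

Every event you drop at the source is volume you don't pay for, which is where the savings come from.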

You'll need 2 years to get fully up and running and to have everyone on board and convinced of letting you have all their logs. After that, you can focus on reducing OPEX at the cost of higher CAPEX, because the move from the cloud to on-prem will be costly and slow.

4 upvotes·113 views
Avatar of Jai Soma