What does everyone use? Or even better, what's your favorite monitoring package? I like Nagios very much: it's reliable as hell and, aside from the initial config, easy to use. I did, though, just deploy a Zenoss machine (Zenoss 2.0.6 under CentOS) that is really impressing me. It uses a mix of SNMP- and SSH-based tools for monitoring, and it can use Nagios plugins too. It's Zope powered, which I can take or leave, but it requires a somewhat beefy machine to run on (the instance I built is on a 1000MHz Opteron x4 w/ 3G of memory... Zope moves pretty fast ;).
Is anyone looking toward "self healing" types of scenarios (or doing it)? Things that restart when they die, load balancing of network/services, things along those lines? That's what I really want to get into place for the long term (to make on call easier :).
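To make the "things that restart when they die" idea concrete, here is a minimal sketch in Python. It is not taken from any of the tools named in this thread; the function name `supervise` and its parameters are made up for illustration, and it assumes the service runs in the foreground and signals failure with a non-zero exit status.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.1):
    """Run cmd, restarting it each time it exits non-zero.

    Returns the number of restarts performed. Stops when the command
    exits cleanly or the restart budget is used up.
    (Hypothetical helper, not part of Nagios, Zenoss, or Monit.)
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return restarts      # clean exit, nothing to heal
        if restarts >= max_restarts:
            return restarts      # budget spent: stop and let a human look
        restarts += 1
        time.sleep(delay)        # short pause so a crash loop cannot spin
```

The restart budget is the important design choice here: without it, a service that dies instantly on startup would be restarted forever and nobody would ever get told.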
-
Re: Monitoring
Thu, October 11, 2007 - 9:53 PM
I took a liking to Zabbix early on: very lightweight, agent-based, excellent SNMP functionality, a decent event trigger system, and excellent graphing/trending capabilities. The web interface is great as well. I've never used Zenoss, but I've had plenty of experience with Zope (running Plone). I liked Plone as a CMS, but Zope is a hairy, hoggy memory, I/O, and CPU beast. Not surprised at all by the specs on your CentOS box. I've also been a long-time rrdtool hacker, and netflow, of course....
As far as self-healing goes, I've used a mixture of cfengine and pikt to reflexively recover from process failures, handle incremental MySQL backups, process and filter log files, assist in spam tuning, run periodic benchmarks and regression tests, kick off regular Nessus scans of local hosts, etc. I've gotten a lot of mileage out of cfengine, in particular. I tend to stick with scriptable infrastructure tools when I can find them. I've also had to handle a variety of network devices, from 3500XLs, 6500s, and VPN Concentrators to Nokia IPSO boxes, NIDS, etc... I use Neo and disco (scripted) to monitor switches and routers, and svn to store configs and roll back to known good branches when things get hairy...
I'm not doing much systems engineering work at the moment (I handled most of the tasks above as a sec eng., I'm a pen-tester now). But I'll be replicating quite a bit of my old setup on my home network soon...
-
Re: Monitoring
Sun, October 14, 2007 - 12:11 AM
OK... I am far too intoxicated to chime in at this point... so I'll try again when I'm sober. (Hope y'all don't take this too seriously - my post that is)
-
Re: Monitoring
Mon, November 5, 2007 - 4:56 PM
OK, Zenoss, once it's grown on you, is still OK, but just OK. With anything but SNMP-based setups, it can be a false-positive fest. Flapping-host and service port checks are waaaaay hair-trigger. A crappy day in the network neighborhood (groan) can easily generate a ton of bogus alerts. I still haven't found a way to mellow it out. SNMP checking is flawless with it, though... which is fine unless you don't want 161 open or, like us, use SNMP for other nefarious purposes. The kicker was when I hit the Zenoss IRC channel and was shunned because I didn't want to do SNMP checking; I only want checks via SSH, written by us. And we do use SNMP in-house for other things; running Net-SNMP in parallel on another port broke the crap out of things the developers really like, and I had a crappy time sorting it back out. Supposedly the latest release has fixed the SSH checking options (broken in 2.0.4 and 2.0.5 for us) and all should be well... Too late for me though, I have pressure from above to ditch it.
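One common way to mellow out hair-trigger checks is to alert only after several failures in a row, which is the idea behind Nagios's `max_check_attempts` directive. The sketch below illustrates that logic in plain Python; it is not a Zenoss setting or API, and the function name `hard_failures` and its parameters are assumptions made for the example.

```python
def hard_failures(results, max_attempts=3):
    """Yield the index of each check at which an alert would fire.

    results: iterable of booleans, True meaning the check passed.
    An alert fires only once max_attempts consecutive checks have
    failed, and only once per outage; a passing check resets the count.
    (Illustrative helper, not part of any monitoring package.)
    """
    streak = 0
    for i, ok in enumerate(results):
        if ok:
            streak = 0           # recovery: forget earlier failures
            continue
        streak += 1
        if streak == max_attempts:
            yield i              # failure is now "hard", alert exactly once


# A single blip (index 1) is ignored; the sustained outage alerts once.
history = [True, False, True, False, False, False, False, True]
alerts = list(hard_failures(history, max_attempts=3))
```

The trade-off is detection delay: with a one-minute check interval and three attempts, a real outage takes about three minutes to page anyone.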
My interim solution has been to keep Nagios kicking around, with select machines running Monit plugged into Nagios. Monit is pretty darned cool. I also cobbled together an older VALinux 1220 machine from parts and put Munin on it (Zenoss had a nice RRDTool interface). I think no matter what, we are going to keep Munin around; we like it. Munin is a great utility that is easy as pie to set up; I even managed to get the plugin installed on old Solaris. Good times!
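For anyone writing their own checks to plug into Nagios, the plugin contract is small: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Below is a rough sketch of that mapping in Python. The names `check_threshold` and `run_check` and the `check_value.py` usage string are hypothetical, and the value being checked is passed in as an argument rather than measured, to keep the example self-contained.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value, warn, crit):
    """Map a measured value to an (exit_code, status_line) pair."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - value is {value}"
    if value >= warn:
        return WARNING, f"WARNING - value is {value}"
    return OK, f"OK - value is {value}"


def run_check(argv):
    """Parse VALUE WARN CRIT from argv, print a status line, return the code."""
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:
        # Covers both missing arguments and non-numeric arguments.
        print("UNKNOWN - usage: check_value.py VALUE WARN CRIT")
        return UNKNOWN
    code, line = check_threshold(value, warn, crit)
    print(line)
    return code

# A real plugin script would finish with: sys.exit(run_check(sys.argv))
```

Returning UNKNOWN on bad input, rather than crashing, matters because an unhandled traceback would otherwise exit with status 1 and be reported as a WARNING.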
I am glad, though, that there are choices, as nothing has come close to doing everything we want. I'll just keep trying things and eventually, some day, I will be able to sleep all night again. Check out Monit; it really is a great little snitching/restarting daemon-watching daemon.
-
-
Re: Monitoring
Fri, November 9, 2007 - 11:11 PM
Interesting to hear your results with Zenoss. I'd like to give it a shot in a lab environment just to get a sense of its functionality. I have to say though, I never took much of a liking to Nagios. I've been stubborn - my first experiences with it were negative. It's bulky and I never took to the visualizations. I'm going to give it a try again soon... I'll definitely take a look at Monit and similar...
At some point, I'll host a bakeoff on my private net and post my observations... thanks for your impressions!
-
-