Posted: Wed Mar 18, 2009 1:21 am Post subject: System Monitoring KPI's
Good Afternoon,
I am trying to put together some KPIs for a System Monitoring team I have set up. Their main purpose is to do event management and investigate monitoring alerts on a 3-tier hosted architecture.
The problem I have is that I have writer's block and can't come out of my change management bubble.
Any ideas on KPIs for this?
So far I have:
Ensure the monitoring dashboard is at least 90% in green (good) state.
Joined: Sep 16, 2006 Posts: 3117 Location: London, UK
Posted: Wed Mar 18, 2009 3:31 am Post subject:
Before you do KPIs:
What is the NMS threshold set as?
Is there a setting within the tool to clear the alert if the value goes up and down just outside of the threshold?
You should do the following before setting KPIs that are impossible to meet:
What is being monitored?
Is the monitored item a single tier or through the tiers?
Are you monitoring the drill-through or each item?
For example, an NMS check to log into the web server: create query, execute query, present response within timings.
What should be checked?
What are the actions after the NMS tool shows an alert?
The NMS tool I used had a threshold of 5 minutes. Because ICMP is lowest priority and SNMP is higher, the alerts that were ICMP based (ping etc.) were set to a high threshold with a % successful.
The SNMP protocol, or the TCP/IP or UDP traffic that is used, would also have a different effect.
What do they do if the tool itself is isolated?
With the threshold set to 5 minutes, the NOC I worked in would still get hundreds if not thousands of ... link up / ... link down or host not responding alerts when the traffic was high. Did we create tickets? Nope, unless there were trends and other indications.
Merely using event mgmt to generate and action tickets is Pavlovian:
ping... open ticket
ping... close ticket
ping... open ticket
ping... close ticket
all during the shift.
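One way to avoid the open/close loop is a hold-down: only raise a ticket when a "down" state has lasted at least as long as the threshold. The sketch below is a rough illustration, not how any particular NMS does it. The function name, the event format and the 300-second default (taken from the 5-minute threshold above) are all assumptions.

```python
from typing import Iterable, List, Optional, Tuple

HOLD_DOWN_SECONDS = 300  # assumed: the 5-minute threshold mentioned above


def tickets_to_raise(events: Iterable[Tuple[float, str]],
                     hold_down: float = HOLD_DOWN_SECONDS,
                     end_time: Optional[float] = None) -> List[float]:
    """Return the times (in seconds) at which a ticket would be raised.

    events: time-ordered (timestamp_seconds, state) pairs, state is "down" or "up".
    A ticket is raised only if a "down" state lasts at least `hold_down` seconds.
    end_time: optional end of the observation window, used for a still-open "down".
    """
    tickets: List[float] = []
    down_since: Optional[float] = None
    for ts, state in events:
        if state == "down":
            if down_since is None:
                down_since = ts  # start of a new outage
        elif state == "up":
            if down_since is not None and ts - down_since >= hold_down:
                tickets.append(down_since + hold_down)
            down_since = None  # outage over, short flaps are ignored
    # An outage still open at the end of the window also counts if long enough.
    if down_since is not None and end_time is not None \
            and end_time - down_since >= hold_down:
        tickets.append(down_since + hold_down)
    return tickets


# Two short flaps are ignored; only the outage starting at t=600 gets a ticket.
flapping = [(0, "down"), (20, "up"), (60, "down"), (90, "up"),
            (600, "down"), (1000, "up")]
print(tickets_to_raise(flapping))
```

Trends across many short flaps would still need looking at separately; this only stops the ticket-per-ping behaviour.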
What is the purpose of the team? The purpose should drive the KPIs.
_________________
John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
Posted: Wed Mar 18, 2009 4:12 am Post subject: System Monitoring KPI's
UK VIKING, you raise a good point about the drill through etc.
The monitoring tool we use is called NimBus. Basically we have a DB, ALB and various App servers, depending on the service we offer.
Therefore on a top-level monitoring screen we have icons that represent services (approx. 40), and each of the 40 has a drill-down to show the DB, ALB and App status.
Typical metrics are CPU, disk space, user concurrency, and some SPs that generate perfmon user counters relating to core functions on SQL speed.
Each probe/metric has a threshold and will have a status colour depending on status: green is good, red is an urgent problem. I.e. we don't really want to be alerted if the DB space drops from 90% to 89%; realistically we only want to know when we get to 10%.
A KPI I wanted to introduce is that all space alerts are dealt with within x time of alerting. The trouble is managing this and knowing when a breach has occurred.
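As a rough sketch of how a breach of the "dealt with within x time" target could be detected, assuming the alert raise and handle times can be exported from the tool: the `Alert` record, its field names and both function names below are made up for illustration and are not anything NimBus provides.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Alert(NamedTuple):
    name: str                    # hypothetical alert label
    raised: datetime             # when the tool raised the alert
    handled: Optional[datetime]  # when it was dealt with; None if still open


def breaches(alerts: List[Alert], target: timedelta, now: datetime) -> List[Alert]:
    """Alerts that took (or, if still open, have already taken) longer than target."""
    return [a for a in alerts if (a.handled or now) - a.raised > target]


def percent_within_target(alerts: List[Alert], target: timedelta,
                          now: datetime) -> float:
    """KPI value: share of alerts not in breach. Open alerts that are still
    inside the target count as 'within target' until they run over."""
    if not alerts:
        return 100.0
    in_breach = len(breaches(alerts, target, now))
    return 100.0 * (len(alerts) - in_breach) / len(alerts)


now = datetime(2009, 3, 18, 12, 0)
start = now - timedelta(hours=3)
alerts = [
    Alert("db-space-a", start, start + timedelta(minutes=10)),  # handled in time
    Alert("db-space-b", start, start + timedelta(minutes=45)),  # handled late
    Alert("db-space-c", now - timedelta(minutes=20), None),     # open, still in time
    Alert("db-space-d", now - timedelta(hours=2), None),        # open, overdue
]
target = timedelta(minutes=30)
print([a.name for a in breaches(alerts, target, now)])
print(percent_within_target(alerts, target, now))
```

Running `breaches` on a schedule would give the "knowing when a breach has occurred" part; the percentage is what would go on the KPI report.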
I am also stuck on what other KPIs to introduce. I hope the above detail helps jog some ideas.
Joined: Mar 04, 2008 Posts: 1883 Location: Newcastle-under-Lyme
Posted: Wed Mar 18, 2009 9:39 am Post subject:
Don't you just love KPIs? Easy to say; impossible to apply.
So, 90% in green: is that averaged over a minute, hour, day, week, ...? More importantly, what does 90% mean in terms of service levels and quality of service?
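To make the averaging question concrete, here is a small sketch that computes the time-weighted percentage of green for one dashboard item over whatever window you choose. The function name and the input format (a list of status changes, with times in hours) are assumptions for illustration. It assumes the first status change is at or before the start of the window; any time before the first change counts as not green.

```python
from typing import List, Tuple


def percent_green(changes: List[Tuple[float, bool]],
                  start: float, end: float) -> float:
    """Time-weighted % of the window [start, end) spent green.

    changes: time-ordered (timestamp, is_green) status changes; each state
    holds until the next change (or until `end` for the last one).
    """
    if end <= start:
        raise ValueError("end must be after start")
    green_time = 0.0
    for i, (ts, is_green) in enumerate(changes):
        next_ts = changes[i + 1][0] if i + 1 < len(changes) else end
        lo = max(ts, start)      # clip the segment to the window
        hi = min(next_ts, end)
        if is_green and hi > lo:
            green_time += hi - lo
    return 100.0 * green_time / (end - start)


# One item: green from hour 0, red from hour 9 to 10, then green again.
changes = [(0, True), (9, False), (10, True)]
print(percent_green(changes, 0, 24))  # whole day
print(percent_green(changes, 8, 12))  # four hours around the outage
print(percent_green(changes, 9, 10))  # the outage hour itself
```

The same one-hour outage passes a 90% target over the day, fails it over the four-hour window, and scores zero for the hour itself, so the KPI means little until the window is stated.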
So, space incident resolved within x time of alert; but the crucial aspect is the rate at which space is disappearing; some things creep up and will take another week to get from 85% to 86%, others will take twenty microseconds. You don't want to drive your staff to treat the two cases the same; so how do you express the KPI?
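One possible way to express it is to look at estimated time until full rather than the raw percentage. The sketch below uses a plain straight-line extrapolation from the first and last sample, which is a simplifying assumption; the function names and the 4-hour urgency cut-off are made up for illustration.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]],
                     full_at: float = 100.0) -> Optional[float]:
    """Estimate hours until the disk is full.

    samples: time-ordered (hour, used_percent) readings.
    Uses a straight line through the first and last sample.
    Returns None if usage is flat or shrinking.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples at different times")
    rate = (u1 - u0) / (t1 - t0)  # percent per hour
    if rate <= 0:
        return None
    return (full_at - u1) / rate


def urgency(hours_left: Optional[float], urgent_below: float = 4.0) -> str:
    """Hypothetical rule: urgent if the disk would fill within `urgent_below` hours."""
    if hours_left is not None and hours_left < urgent_below:
        return "urgent"
    return "routine"


creeping = [(0, 85.0), (168, 86.0)]  # a week to go from 85% to 86%
racing = [(0, 85.0), (1, 95.0)]      # ten points in an hour
print(urgency(hours_until_full(creeping)))
print(urgency(hours_until_full(racing)))
```

A KPI could then be phrased per urgency class, so the creeping case and the racing case get different response targets.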
Is the resolution process always the same? Do different resolution processes require different amounts of time to perform? Do different resolution processes have long or short term effectiveness? Do some resolution processes require additional work later while others fix the situation properly?
What do you want to achieve? Is it defensive (keep everything going and you are okay) or improvement based (quicker than before and using less resource)? It's probably both; so you need KPIs that measure outcomes and KPIs that measure processes. You might notice that your green example is outcomes and your space one is processes.
Sorry, this is rambling. Too late at night. I'll post it anyway, in case it helps.
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718