Event Management KPIs

Posted: Tue Oct 02, 2012 1:42 am
by viv121
Hi folks,

Am looking for some KPIs for Event Management. We have a fully established monitoring team that's supposed to send alerts for any potential eventualities. We have a degree of automation too (Patrol, Candle etc) and these tools are capable of opening auto-Incidents and flash reports to support.

What should be the KPIs for such a function? Of course, I want to base them on my SLAs/OLAs; for example, Service A requires 99.5% availability and has 24/7 online hours except for a weekly service outage period. Can I put something like "Incidents avoided due to monitoring" as a KPI? Does anyone have more of them?
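
To make the 99.5% figure concrete, here is a rough sketch of the unplanned-downtime budget it implies. The 2-hour weekly outage window and the 4-week reporting period are assumptions for illustration only, and the function names are hypothetical; substitute the values from the actual SLA.

```python
def downtime_budget_hours(target_pct, period_hours, planned_outage_hours):
    """Unplanned downtime allowed in a period while still meeting the target.

    Planned outage windows are excluded from the agreed service time,
    so they do not count against availability.
    """
    agreed_service_hours = period_hours - planned_outage_hours
    return agreed_service_hours * (1 - target_pct / 100)


def availability_pct(agreed_service_hours, unplanned_downtime_hours):
    """Availability achieved over the agreed service time, as a percentage."""
    uptime = agreed_service_hours - unplanned_downtime_hours
    return 100 * uptime / agreed_service_hours


# Assumed figures: 4-week period, 24/7 service, 2-hour outage window per week.
period_hours = 4 * 7 * 24   # 672 hours in the period
planned_hours = 4 * 2       # 8 hours of planned outage
budget = downtime_budget_hours(99.5, period_hours, planned_hours)
print(f"Unplanned downtime budget: {budget:.2f} hours")
```

Under those assumptions the budget comes to about 3.32 hours per 4 weeks, which gives the monitoring team a number to be measured against.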

Posted: Tue Oct 02, 2012 3:58 am

My first question is: what is the purpose of the KPIs?

If the automated monitoring system does not perform as expected, are you going to fire it ? demote it ? send it to the showers ?

The KPIs need to have a purpose beyond just being KPIs in themselves.

Define a purpose for creating a KPI and then the KPI will follow.

Posted: Tue Oct 02, 2012 4:29 am
by viv121
Thanks John for being the first one as always. I thought I'd wait for Diarmid before any comments. But the purpose of the KPI is the same as that of any other KPI: continual service improvement. It is similar to the Change Management KPI of failed changes, where the KPI itself doesn't mean that the change owner is fired or sent to the showers.

Event KPIs look like a tricky one. I need a few to put a value on the alerting systems we bought recently and to know what value the additional people are creating.

I thought of a few like - Incidents avoided due to Events, Changes triggered due to events, Proactive management of Problems.

Anything else we know of?

Posted: Tue Oct 02, 2012 5:40 am
by Diarmid
Sorry to be so late viv. I was at the vet.

Event Management is a term I don't fully understand. Events seem to have a hazy boundary with incidents. I will have to think as I write.

So I'll start with the obvious. Event Management is about efficiently, effectively and appropriately managing the fact that an event has occurred in terms of its implications for the service(s).

Monitoring is probably the principal pro-active element.

Analysis is what determines the impact, potential impact, risk and cause. This could lead to preventive/remedial action (if it's remedial, I would have said the event was an incident, I think), communication concerning the implications, raising a problem or alerting pro-active problem management that there is something worth a look, and some other stuff.

What can you measure? Well:
1. incidents (or perhaps service impacts - because I might want to call the event an incident) avoided.
2. incidents (ditto parenthetical comment) resolved sooner than if the alert had been from a service user with trouble.
3. problems spotted sooner as above.
4. capacity management more successful - don't know how you measure this, but good event management should mean greater control of server and service performance which should mean more accurate timing of needs such as adding processing power to the infrastructure.
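
As a rough sketch of how the first two measures in the list might be counted, assuming each event is logged with a few flags: the EventRecord fields and the rule "actioned and no service impact means an impact was avoided" are assumptions for illustration, not something a particular monitoring tool provides. Deciding what counts as "avoided" remains a judgment call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EventRecord:
    """Hypothetical per-event log entry; field names are illustrative."""
    event_id: str
    actioned: bool          # someone acted on the alert
    service_impact: bool    # users were affected anyway
    # Minutes by which the alert preceded the first user report, if any.
    minutes_before_user_report: Optional[float] = None


def event_kpis(records: List[EventRecord]) -> dict:
    """Summarise avoided impacts and detection lead time for a set of events."""
    total = len(records)
    # Assumption: an actioned event with no service impact counts as "avoided".
    avoided = sum(1 for r in records if r.actioned and not r.service_impact)
    leads = [r.minutes_before_user_report for r in records
             if r.minutes_before_user_report is not None]
    return {
        "events": total,
        "impacts_avoided": avoided,
        "avoided_pct": 100 * avoided / total if total else 0.0,
        "mean_lead_minutes": sum(leads) / len(leads) if leads else None,
    }


sample = [
    EventRecord("e1", actioned=True, service_impact=False, minutes_before_user_report=12.0),
    EventRecord("e2", actioned=True, service_impact=True, minutes_before_user_report=4.0),
    EventRecord("e3", actioned=False, service_impact=False),
    EventRecord("e4", actioned=True, service_impact=False),
]
print(event_kpis(sample))
```

The lead-time figure only makes sense for events where a user report also arrived, which is why events without one are left out of the mean.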

As I write this I begin to feel that event management cuts across various disciplines, especially incident management, operations management, service level management and capacity management. I wonder how it can be looked at across those contexts in a holistic way beyond its principles.

Told you I didn't know much about it. Still, hope this helps.

Posted: Tue Oct 02, 2012 5:51 am

I was NOT at the vet as I have had my rabies shots.

Thinking about what sort of KPIs to put on the NMS tools.

Should you not put the KPI against the team - the humans - that reacts to the auto-alert?

For example, if the NMS Alert creates a ticket, sends email to team... how long does it take the team to react to it ?
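
A minimal sketch of that reaction-time measure, assuming the ticketing system can export a created timestamp and an acknowledged timestamp per auto-generated ticket. The 15-minute target and the function name are assumptions; the real target would come from the OLA.

```python
from datetime import datetime, timedelta


def reaction_kpis(tickets, target=timedelta(minutes=15)):
    """Summarise how quickly the team acknowledges auto-generated tickets.

    tickets: list of (created, acknowledged) datetime pairs;
             acknowledged is None if nobody has reacted yet.
    target:  assumed reaction-time target, not taken from any real OLA.
    """
    times = [ack - created for created, ack in tickets if ack is not None]
    if not times:
        return None
    mean = sum(times, timedelta()) / len(times)
    within = sum(1 for t in times if t <= target)
    return {
        "mean_reaction": mean,
        "within_target_pct": 100 * within / len(times),
        "unacknowledged": len(tickets) - len(times),
    }


start = datetime(2012, 10, 2, 9, 0)
sample_tickets = [
    (start, start + timedelta(minutes=5)),
    (start, start + timedelta(minutes=25)),
    (start, None),  # still waiting for someone to pick it up
]
print(reaction_kpis(sample_tickets))
```

Reporting the unacknowledged count alongside the mean matters, since tickets nobody reacted to would otherwise drop out of the figures.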

I would also look at COBIT. They use KPIs with KGIs, which is what I was trying to imply.


Posted: Tue Oct 02, 2012 6:49 am
by Diarmid
I don't see a problem putting KPIs against a function. In fact I prefer them there. The function is more than the team; in fact sometimes there is not a team mapped against the function and this could be a case in point.

Posted: Tue Oct 02, 2012 8:35 am

I guess the issue here is what you are trying to inform with the KPIs.

You will get metrics from the tool(s) on how many tickets they created etc., either from the source system or the destination system.

My thing is that KPIs would usually be used to action something, like: too many failed changes - check the process for approving plans etc.


