Posted: Fri Sep 22, 2006 1:24 am Post subject: Event Management and Incident Tickets
I'm currently involved in implementing an ITIL-based Incident Management process in our organization. Another project is working on getting an event management tool in place. One of the issues that will soon have to be addressed is how event management and Incident Management relate. The main question at hand is whether all events detected by the event management tool should lead to the automatic generation of an incident ticket.
This issue is only briefly addressed in the ITIL Service Support book, which says:
Within the more technically oriented systems-management function, an automatically registered event such as exceeding a disk-usage threshold, is often regarded as part of 'normal' operations. These events are included in the definition of Incidents even though service delivery to Customers is not affected.
It certainly makes sense to treat events as incidents, even though they do not necessarily meet the definition of an incident (i.e. a service disruption). After all, an event, if not handled appropriately, might lead to an incident. The question, though, is whether ALL detected events should result in an incident ticket being created. And that is where I expect a lot of discussion.
My take on this is that we will have to go down to a question at an even more basic level: which events should be detected (and result in some kind of alert to an operator or an engineer) by the event monitoring tool? I would personally say that if some threshold is being met or exceeded and a corrective action (to prevent a service disruption) is required, an incident ticket should be raised and incident management should notify the right people to work on the issue (instead of the event monitoring tool sending out an alert to somebody). If some threshold is being met or exceeded and no corrective action is required, I think there is no reason to detect this as an event and send out an alert at all. These things should be monitored through, for instance, capacity management based on weekly reports.
I am very curious about opinions on this issue and would like to learn how other organizations have addressed this.
_________________
Manager of Problem Management
Fortune 100 Company
Yes, an event detected by the event management tool should be treated as an incident and should lead to the automatic generation of an incident ticket.
If your technical staff work on a system to prevent a possible breach of a threshold, require them to create a ticket for the job they performed.
What this does for your organization is record every time the system reaches the threshold. It also helps Capacity Management monitor how often the system reaches its threshold and determine whether any action is required.
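As a rough illustration of how those tickets could feed Capacity Management, here is a minimal Python sketch that counts threshold tickets per component per week. The ticket records, the component names, and the `max_per_week` cut-off are all made up for the example; a real ticketing tool would supply its own export format.

```python
from collections import Counter
from datetime import date

# Hypothetical ticket records: (component, date the threshold was reached).
tickets = [
    ("fileserver01", date(2006, 9, 4)),
    ("fileserver01", date(2006, 9, 5)),
    ("fileserver01", date(2006, 9, 7)),
    ("db01", date(2006, 9, 6)),
]

def breaches_per_week(records):
    """Count threshold tickets per (component, ISO year, ISO week)."""
    counts = Counter()
    for component, day in records:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(component, iso[0], iso[1])] += 1
    return counts

def needs_review(counts, max_per_week=2):
    """Return components with more than max_per_week tickets in any week."""
    flagged = {comp for (comp, _, _), n in counts.items() if n > max_per_week}
    return sorted(flagged)

counts = breaches_per_week(tickets)
print(needs_review(counts))
```

The weekly count is what would go into the capacity report; the cut-off only decides which components get a closer look.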
Thresholds are set to prevent a possible service disruption. They are meant to help the person in charge take note of the problem well in advance and take preventive measures, such as reducing disk usage.
Identify the critical component which needs to be monitored and set its threshold as required.
What I mean here is: why set a threshold so low that reaching it does not require any corrective action? That means the threshold is not correctly defined. I have seen incorrectly set thresholds being ignored, which resulted in a service disruption.
I agree with Juan: set two thresholds if you possibly can.
Any major disruption will have an incident leading to another incident, which will eventually result in catastrophe.
The purpose of Incident Management is to restore the service as quickly as possible.
Whether the incident management team raises a ticket and assigns it to tech support, or the event management tool raises the ticket automatically, it is all one and the same. Incident management will be notified when a ticket is created and when its service level agreement is about to be breached.
Thank you both for your input. I understand and agree with the idea of having two thresholds: one to support regular monitoring efforts within capacity management and one to help raise an incident ticket in time.
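To make the two-threshold idea concrete, here is a minimal Python sketch. The percentages and the names `Thresholds` and `classify` are assumptions for illustration only, not something taken from ITIL or from any particular monitoring tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    capacity_pct: float  # lower threshold: record for the capacity report
    incident_pct: float  # higher threshold: raise an incident ticket

def classify(usage_pct: float, t: Thresholds) -> str:
    """Decide what a usage reading should trigger."""
    if usage_pct >= t.incident_pct:
        return "incident_ticket"   # corrective action needed in time
    if usage_pct >= t.capacity_pct:
        return "capacity_report"   # no alert, only trend monitoring
    return "ignore"

# Hypothetical disk-usage thresholds: 70% for reporting, 90% for a ticket.
disk = Thresholds(capacity_pct=70.0, incident_pct=90.0)
for reading in (55.0, 78.5, 93.0):
    print(reading, classify(reading, disk))
```

Checking the higher threshold first means a reading that crosses both thresholds results in a ticket only, rather than a ticket plus a separate report entry.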
Thank you also, Juan, for correcting me in how I described incidents. You are of course absolutely right that they are not only about actual service disruptions, but also about potential service disruptions. I should have been more careful in phrasing my question there.