Posted: Thu Apr 28, 2005 12:44 am Post subject: Problems vs. Incidents
I am still stumbling a bit on the difference between a problem and an incident. As an example, if a user calls to report an error message on his/her screen which has caused his/her system to hang, our support analyst would most likely record the details of the message and the other relevant circumstances and then ask the user to reboot the system. I believe I'm describing an incident, which should be closed if the user reboots and is able to perform the function successfully once again. I don't think we'd spend a huge amount of time doing root cause analysis at this stage because it's a "one off".
How does this incident move to being a problem, and when does this happen? Is this only done when the incident manager reviews all incidents for a given period of time and determines trends? What if this is the only time this error has ever come up? Is it still a problem, or just an incident?
Posted: Thu Apr 28, 2005 1:58 am Post subject: Incidents vs Problems
Hi Big Al, you are not the only one who is still confused. Many organizations still say problem when they refer to an incident.
What you described fits the “incident” definition, as the purpose of incident management is to restore normal service as soon as possible.
Problem management is linked to all incident records, because there needs to be some way to correlate tickets and find common incidents. The problem management team would then investigate. If it is a known error, a pre-defined solution would be applied to correct the incidents, which should eliminate or greatly reduce that particular type of incident. If it is an unknown error, then full root cause analysis needs to be performed to find a solution.
E.g., one of our clients had many incidents where hard drives would fail to boot up, or would crash throughout the work day. Was it applications hanging? Was the image corrupted during desktop rollout? Through root cause analysis, it was determined that only a certain type of drive was experiencing this problem. They tracked down all the drives, notified the manufacturer, and replaced all of them, as it was a bad batch. Once all of the affected desktops had their drives replaced, those types of incidents decreased significantly.
Problem Management needs to work with Incident Management. There should be a separate Problem Manager and Incident Manager, because they are two different roles. You have to remember, Incident Management wants to restore service as soon as possible, whatever it takes (take this with a grain of salt). Problem Management wants to get to the root cause, and this normally requires time, which your client usually does not have.
Typically, incidents become problems when:
- Incident matching produces no Known Errors or previous problems
- Incident is of a high severity
- Incident is a repeat incident
- Root cause for the incident is unknown
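As a rough illustration only, the four criteria above can be written as a small check. The `Incident` fields, the severity scale, and the `HIGH_SEVERITY` threshold are all hypothetical; they are not taken from ITIL or from any particular ticketing tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    summary: str
    severity: int              # 1 = most severe (assumed scale)
    matched_known_error: bool  # did incident matching find a Known Error or problem?
    repeat_count: int          # earlier occurrences of the same symptom
    root_cause_known: bool


HIGH_SEVERITY = 2  # assumed cut-off: severity 1 or 2 counts as "high"


def escalation_reasons(incident):
    """Return the listed criteria this incident meets (empty list = none)."""
    reasons = []
    if not incident.matched_known_error:
        reasons.append("no matching Known Error or previous problem")
    if incident.severity <= HIGH_SEVERITY:
        reasons.append("high severity")
    if incident.repeat_count > 0:
        reasons.append("repeat incident")
    if not incident.root_cause_known:
        reasons.append("root cause unknown")
    return reasons


# The one-off hang from the first post: low severity, never seen before.
one_off = Incident(
    summary="Error message, system hang, fixed by reboot",
    severity=4,
    matched_known_error=False,
    repeat_count=0,
    root_cause_known=False,
)
print(escalation_reasons(one_off))
```

Returning the reasons rather than a plain yes/no leaves the decision, and the priority, to whoever owns the problem queue.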
The ITIL Service Support book actually presents a flow chart that describes when an incident should be escalated to a problem. See page 102 - Incident Matching Process flow. This whole section discusses this issue.
In your example, if the root cause is not known, then it is a problem.
No one says all problems require major effort to solve or need to be addressed right away. This would probably move to the Problem Manager's queue as a low priority problem. Business need would determine whether much effort would be spent working on it or whether to leave it as "under watch". This enables problem management to respond faster if more of these incidents occur.
Azard did a great job of explaining the traditional approach to Problem Management. This is a textbook ITIL answer - and if you're interested in a traditional textbook ITIL process implementation, this is perfect.
However, I respectfully disagree with this methodology as I believe some of the traditional assumptions are outdated. For example, traditional definitions assume that multiple incidents must occur before a problem is declared. This feeds the philosophy that PM is implemented by parent-child ticketing systems... where multiple incidents are rolled up into a single problem record. This is inherently reactive. Technology has advanced far enough that this methodology isn't needed. While I believe that parent-child ticketing is valuable and needed, it shouldn't be the ONLY means to PM, nor should other incident management tools be morphed into PM systems.
A second outdated assumption is that PM is implemented by a functional group of people that do nothing but PM. This sounds extremely inefficient and expensive to me. Most organizations are having enough problems affording the resources required to provide daily reactive support services.
As Randy suggested, if the root cause is unknown, it is a problem... and I suggest it is a problem at the first incident and should be prioritized appropriately. I suggest that if the incident goes through the PM process at the first occurrence, future incidents will be avoided. This is the goal of PM.
I believe PM should be proactive. Technology has come to a point where we can proactively scan for error conditions without waiting for an incident to be reported. Those errors can go through a root cause diagnosis process and be remediated to prevent further incidents from occurring. I believe most of this can be done through computer automation and analytics.
Jacob, you definitely bring some good points to the table. The approach I mentioned may be what you consider “traditional”, but your assumptions, I believe, are based on what you infer rather than what I have actually written. Just to let you know, I take no offense at being called a "traditionalist".
So let me see if I can clarify.
I understand the resource issue, which is why I only identified the Incident Manager and Problem Manager as different roles. I think they should remain different roles. Ideally you would have two separate groups, one for Incident Management and the other for Problem Management, but that doesn’t mean you need a team dedicated to Problem Management as large as the one for Incident Management.
I also think you would create a problem ticket for every incident ticket. And I do agree that problem tickets should be prioritized. Problems should be solved in the order that gives the greatest benefit to the organization. Some problems may be such a low priority that it could take a very long time to find a permanent solution.
While it is hard to put all your thoughts down without writing a few pages covering every little detail, your points are taken. Ideally, it would be great to have an in depth conversation over a drink or two.
Posted: Fri Jun 03, 2005 2:55 am Post subject: Problem Management
I find the "traditional" approach to PM to be the only one that is really talked about these days - where the service desk is reacting to incidents and looking at trends, etc. to determine if there is a problem worthy of going through the PM process. And yes, my reference to your previous post being traditional was composed of assumptions, etc. Thanks for letting me be liberal.
I work for a software company that is working on a proactive PM solution - so I suppose my perspective is a bit bent. I really value this kind of discussion because it helps me understand what challenges people have today and what future solutions will need to do to help solve "problems".
Wow! Great conversation, guys! I work at an internal help desk where I lead a small team we call the Call Reduction Team (CRT). The goal of that team is to monitor incident data, identify trends in that data, then apply appropriate resources. Appropriate resources could mean a CRT member sending a broadcast e-mail to inform users of best practices or posting a solution on our IT intranet site. If the problem has a large enough impact, root cause analysis is initiated. The root cause analysis meeting includes representatives from the global desktop support team, the applicable application owner(s), and a CRT member who facilitates the meeting.
We are currently trying to streamline our process for data analysis, as our current process is very manual and labor intensive. Is there a standard procedure or software solution for analyzing incident ticket data? Where can I look for more information on Incident/Problem management?
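For what it's worth, the kind of trend spotting described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: tickets are reduced to hypothetical `(opened_date, category)` pairs, and the `min_count` and `factor` thresholds are made-up defaults, not a standard procedure.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_counts(tickets):
    """Count tickets per category per ISO week.

    tickets: iterable of (opened_date, category) pairs (assumed format).
    Returns {category: Counter({(iso_year, iso_week): count})}.
    """
    counts = defaultdict(Counter)
    for opened, category in tickets:
        iso = opened.isocalendar()
        counts[category][(iso[0], iso[1])] += 1
    return counts


def rising_categories(tickets, week, min_count=3, factor=2.0):
    """Flag categories whose volume in `week` is at least `min_count` and
    more than `factor` times their average over earlier weeks.

    The average only covers earlier weeks in which the category had at
    least one ticket; weeks with zero tickets are not counted.
    """
    flagged = []
    for category, per_week in weekly_counts(tickets).items():
        current = per_week.get(week, 0)
        earlier = [n for w, n in per_week.items() if w < week]
        baseline = sum(earlier) / len(earlier) if earlier else 0.0
        if current >= min_count and current > factor * baseline:
            flagged.append(category)
    return sorted(flagged)


# Made-up sample data for illustration.
sample = [
    (date(2005, 5, 17), "password reset"),
    (date(2005, 5, 24), "password reset"),
    (date(2005, 5, 31), "password reset"),
    (date(2005, 5, 18), "printer"),
    (date(2005, 5, 31), "printer"),
    (date(2005, 6, 1), "printer"),
    (date(2005, 6, 2), "printer"),
    (date(2005, 6, 3), "printer"),
]
this_week = tuple(date(2005, 6, 1).isocalendar()[:2])
print(rising_categories(sample, this_week))
```

A flagged category is only a candidate for root cause analysis; someone still has to decide whether the impact justifies it.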
Posted: Sat Jun 04, 2005 1:17 am Post subject: Problem Management
Hutch... you bring up the major issue that I think most people have today... and that is gathering and analyzing the relevant incident data in order to determine the root cause.
This turns out to be the fundamental aim of a product my company has built - but I'll refrain from giving a commercial on this forum. Suffice it to say, I truly believe that many of the manual, human-based, and slow incident and problem management tasks will be completely computer automated - including the actual root cause analysis. In brief, a computer can gather an enormous amount of data, more than any human can analyze, and use statistical analysis and policy information to automate the root cause analysis of any error condition. What's more, the service desk can then be proactive because we no longer have to wait for the symptom to be reported by the user - rather, a machine can proactively find true error conditions through automation.
Just my 2 cents at what I believe the future holds.
"... will be completely computer automated - including the actual root cause analysis. In brief, a computer can gather an enormous amount of data, more than any human can analyze, and use statistical analysis and policy information to automate the root cause analysis of any error condition."
I don't see that happening at all.
Who will analyze problems with the computer automated root cause analysis process?
Actually, we have found a statistical analysis package that will work for us. There is a bit of a learning curve involved because SQL is required for queries and such. It is still better than a human trying to analyze, re-categorize, then analyze again, attempting to locate incident trends! I can honestly tell you, from my experience leading this problem management team so far, I hope what Jacob said about fully automated incident data analysis software is true! It would be a phenomenal contribution to the IT community!
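To give a flavor of what such a SQL query might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The package mentioned above is not named, so this is not its interface; the `incidents` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical incident table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incidents ("
    "id INTEGER PRIMARY KEY, opened TEXT, category TEXT, resolution TEXT)"
)

# Made-up sample tickets for illustration.
rows = [
    ("2005-05-02", "email", "restarted client"),
    ("2005-05-03", "email", "restarted client"),
    ("2005-05-09", "email", "rebuilt profile"),
    ("2005-05-04", "vpn", "reset token"),
    ("2005-05-10", "vpn", "reset token"),
    ("2005-05-11", "printer", "replaced toner"),
]
conn.executemany(
    "INSERT INTO incidents (opened, category, resolution) VALUES (?, ?, ?)",
    rows,
)

# Categories with at least `threshold` tickets, most frequent first.
query = """
SELECT category, COUNT(*) AS n
FROM incidents
GROUP BY category
HAVING COUNT(*) >= ?
ORDER BY n DESC, category
"""
threshold = 2
recurring = conn.execute(query, (threshold,)).fetchall()
print(recurring)
```

The same `GROUP BY` / `HAVING` shape works for other cuts of the data, such as grouping by resolution text to see which workarounds are applied over and over.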
We are going to implement a problem management process. Right now we keep Incidents in one database (our product), and 2nd level can force creation of a Change Request in a second database (Starteam).
I am afraid that implementing Problem management will lead to a large number of problems.
Let's assume there is some number of incoming Incidents solved by 1st level - I do not know the exact number.
The rest are passed to 2nd level. As far as I know, they cannot solve about 80% of these Incidents, because they are software bugs and there is simply no quick solution! Some of the remainder are questions, usually answered by developers (is this 3rd level?).
That 80% become Change Requests, and they are usually fixed in a Release, Csp or Hotfix.
I think when a developer takes over a Change Request, he starts by analysing the root cause (problem control phase).
Unfortunately, there are probably no statistics on how many of the Change Requests handled by developers lead to changes and how many are solved by configuration, or...
I think the big difference here between a "normal" service provider and a software environment is in the number of changes.
It is often possible to solve a user's Incident by a quick fix:
And it is not so often necessary to force Problem mgmt. and the Change process.
Perhaps I am speaking in the wrong context here. I am referring to the analysis of incident data, meaning tickets generated from reported computer-related issues in our environment. These incidents range from software issues to network hardware issues to desktop issues to homegrown application issues.
Problems can be identified by analyzing this type of incident data and finding common incidents, or trends in the data. Are you saying I am off base on that, "Guest"? Perhaps you could elaborate on your points. They are coming across with a bit of a negative connotation. I am here to discuss this topic with colleagues in similar situations. Pointing out more problems without presenting solutions does not help me. I look forward to any thoughts ...