Posted: Tue Jan 13, 2004 5:23 pm Post subject: Problem Management
Is there anyone on the forum who has implemented Problem Management?
If so, I would be very interested in sharing information.
Whilst I fully understand the concept I am having trouble getting started with the processes.
Any help would be appreciated.
Though not directly responsible for the implementation, I have been a key member of a solution. I would think many have faced the same dilemma: getting started, that is. Are you implementing a third party solution or an internal methodology which has yet to be defined? What resources do you have available? If a third party application is the case, I would suggest engaging the provider for implementation services. Be careful not to let them do the full implementation. Be sure you and your team are part of the process so you will have a complete understanding of the application. If you have specifics or additional questions, I will be more than happy to assist. For your info, we implemented PEREGRINE, which I quickly became a fan of. It worked well for both the service desk and the user community.
Joined: Mar 20, 2004 Posts: 3 Location: San Diego, CA
Posted: Sun Mar 21, 2004 4:08 am Post subject: PM Reply
I am currently involved in implementing ITIL for PM within a major healthcare company, and my focus/role is on process. I am charged with presenting the processes to executive-level audiences to obtain their acceptance and buy-off. A clear definition of PM and its differences from Incident Management is key for beginning to develop a solid process. What other questions do you have? I'm always interested in sharing information!
Posted: Fri Jun 18, 2004 4:06 am Post subject: How to get start?
My company is an IT solution provider and we are starting with ITIL to improve the service for our clients. We preferred to start with Problem Management as Incident Management is a mature process here. But I'd like to see what other companies are doing to implement it.
We have a system to register incidents and generate reports. The idea is to use the incident reports (and other sources) to identify the problems. I'd be happy if anybody would share their experience.
I also work for an IT solutions company and have been involved in implementing ITIL at the grass roots level.
In principle I agree with Gold's comments: ensure that the underlying structure is in place before implementing Problem Management. Without good Service Delivery and Incident Management you are just stabbing in the dark. I find that if the Problem Manager gets on well with the Incident Manager then the processes flow more naturally and you don't have to fight for information to do your job.
PS This question has also been asked on a couple of interviews I've been on recently, so is a good one to brush up on.
Joined: Nov 01, 2004 Posts: 81 Location: Sask, Canada
Posted: Tue Nov 02, 2004 9:10 am Post subject:
Hey, all. I am a recent member of this forum, but have been working as a Problem Analyst since 1999, when I implemented the process here. We are a telecommunications company of 4000, and I support our internal Help Desk, doing Problem Management on Severity 1 incidents related to our computer infrastructure. (I hope that made sense).
Progress has been slow, but steady.
I have been able to demonstrate that over half of the Severity 1's that occur in a month are related to a Problem with a Known Error, whose solution has just not yet been implemented. Currently over half of our open problems have been open for more than 3 months. (okay, yes, that is slow).
My current objective is to have the resolution to a Known Problem (actual infrastructure change) implemented within 3 months of the 1st occurrence of an incident. My question is, is this a reasonable objective? Suggestions & advice welcome!
We too are in the early stages of resolving known errors, however it is rare that an internal known error goes unresolved for more than 30 days. We have a Fault Management Board that reviews new and existing problems on a weekly basis. Mgmt involvement is key to our success.
Posted: Sun Dec 12, 2004 3:47 pm Post subject: Re: Problem Management
A very basic process...
1. The customer encounters a problem using an application or the computer....
2. They call a central number, or a specific group number to report the problem, i.e. a help desk...
3. The person answering the phone collects their name, number, location, the problem they are encountering, and similar information.
4. The person responding to the incoming call attempts to fix the problem. If the problem is fixed, they close the call. If not, they route it to another team for support... etc...
5. In all cases the customer is provided a reference / ticket number.
If the customer (user) has the capability to report the problem without calling a number, such as a web based problem reporting tool, then they in turn would be reporting and logging the problem...
From the perspective of logging a call, a good rule to follow is if the person calls, then either open the support request for them, and then direct them in the correct process to follow, or transfer them to the central number or contact to log the call...
Some would define an incident as being resolved / fixed at the point of call (i.e. the customer calls the help desk). However, if there were other reports within the tracking system indicating more than one person was having the same issue, then it could be classified as a problem, which means a solution may take more than, say, 1 day to implement... In short, you could have a field that provides the option to establish the call as an incident or a problem...
As for categories, you could use a classification system, and assign a priority code to that classification. For instance, Hardware, Desktop, Failure is assigned priority 2, which carries a 1 hour response and no more than 2 hours to resolve. However, it could be classified as a problem if it is known that the power grid for the area where the computer is located is not working...
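The classification-to-priority idea above could be sketched as a simple lookup table. This is only an illustration: the table name, the function name and the tuple layout are made up, and only the Hardware / Desktop / Failure row comes from the post.

```python
from datetime import timedelta

# Hypothetical lookup: (category, subcategory, symptom) -> (priority, response, resolve).
# Only this one row is taken from the post; a real table would hold many more.
SLA_TABLE = {
    ("Hardware", "Desktop", "Failure"): (2, timedelta(hours=1), timedelta(hours=2)),
}

def lookup_sla(category, subcategory, symptom):
    """Return (priority, response_target, resolve_target), or None if unclassified."""
    return SLA_TABLE.get((category, subcategory, symptom))

priority, respond_within, resolve_within = lookup_sla("Hardware", "Desktop", "Failure")
print(priority, respond_within, resolve_within)
```

Keeping the targets in data rather than in code means the service desk can adjust priorities without touching the logging tool itself.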
In general there should be some type of review each week of all the calls opened, to identify common solutions and to identify those issues or incidents that reappear. For example, in the first week we had 5 calls for IP release / renew. In the second week we received 10 calls, and this week we received 15 calls for a release and renew. If this were correctly tracked through the use of specific categories and assignments, then even if resolved at the point of call, you would still know you have a problem brewing...
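A weekly review like this could be partly automated. The sketch below assumes you can export call counts per category per week from your tracking tool; the function name, the three-week threshold and the "Password reset" row are invented for illustration, while the 5 / 10 / 15 figures are the ones from the example above.

```python
def rising_categories(weekly_counts, min_weeks=3):
    """Return categories whose call count rose in each of the last `min_weeks` weeks."""
    flagged = []
    for category, counts in weekly_counts.items():
        recent = counts[-min_weeks:]
        # Flag only when every week is strictly higher than the week before it.
        if len(recent) == min_weeks and all(a < b for a, b in zip(recent, recent[1:])):
            flagged.append(category)
    return flagged

weekly_counts = {
    "IP release/renew": [5, 10, 15],  # figures from the post
    "Password reset": [12, 11, 12],   # made-up comparison row
}
print(rising_categories(weekly_counts))
```

A strictly rising count is a crude signal; it is meant as a prompt for someone to look at the category, not as proof that a problem exists.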
Joined: Nov 16, 2004 Posts: 24 Location: Australia
Posted: Tue Dec 14, 2004 7:47 pm Post subject:
Whilst I agree with GSR's basic steps, there is a clear difference between Problem and Incident.
Where GSR's basic process talks about reporting a 'problem', you would be substituting that with 'incident'.
Users always report incidents. If a problem record has been raised, and the user is reporting an incident related to that problem, then the incident is just another count against the problem. (eg. the problem was raised because of 15 incidents, a new incident updates the problem/incident count to 16).
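To make the incident/problem link concrete, here is a rough sketch of the record keeping. The class name, field names and ID formats are hypothetical and not taken from any particular tool; the 15-to-16 count is the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Problem:
    problem_id: str
    summary: str
    incident_ids: list = field(default_factory=list)

    def link_incident(self, incident_id):
        """Attach an incident to this problem and return the new incident count.

        Linking the same incident twice does not inflate the count.
        """
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)
        return len(self.incident_ids)

# A problem raised on the back of 15 incidents (IDs are made up).
problem = Problem("PRB-1", "Hypothetical example", [f"INC-{n}" for n in range(1, 16)])
print(problem.link_incident("INC-16"))  # prints 16
```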
It probably won't make too much sense until you have a tool that is capable of doing this, and one that can link incidents with problems. Problem Management is primarily trend analysis and root cause analysis; Change Management will then generally be involved in the final solution.
Joined: Nov 01, 2004 Posts: 81 Location: Sask, Canada
Posted: Thu Dec 23, 2004 4:23 am Post subject:
We too are in the early stages of resolving known errors, however it is rare that an internal known error goes unresolved for more than 30 days. We have a Fault Management Board that reviews new and existing problems on a weekly basis. Mgmt involvement is key to our success.
Thank you for your response. This is very exciting, but I would like to know more about the types of problems before I hit our mgmt team over the head with this and demand a bigger stick!
Our focus is on Severity 1 incidents, which are defined here as 'customer or workgroup affecting'. The problems that I have the most difficulty getting resolved usually are from the following type:
1) the cause of the incident is unknown (eg. application just stopped. nothing in log, no unusual activity, nothing). There are frequently varying & conflicting opinions about what can be done to fix it.
2) Frequently the solution turns out to be 'get a bigger/better/separate server', and that takes a long time because each person involved in each of the steps (analysis, recommendation, selection, approval, ordering, receiving, lab installation, base test, OS test, build test, QA, app test, client test, change scheduling, implementation, and I'm sure I've missed a few) wants to make sure they do a good job. They do, and each step is important. Maybe it's slow handoffs?
3) known error in a current version of software, waiting for a maintenance cycle to implement the change.
How does your organization overcome these kinds of obstacles? The only stick I have right now is 'if it's not going to be fixed in 3 months, and it is customer affecting, it must be mitigated with workarounds/special attention/short fixes'.
I appreciate any and all advice
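For what it's worth, the 3-month rule described above could be checked mechanically against the open problem list. This is a sketch only: the function name and its arguments are invented, "3 months" is approximated as 90 days from the first incident, and the dates are made up.

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days after the first incident.
FIX_WINDOW = timedelta(days=90)

def must_mitigate(first_incident, expected_fix, customer_affecting):
    """Return True when a problem needs workarounds or short fixes under the 3-month rule.

    expected_fix is the planned date of the permanent fix, or None if unknown.
    """
    if not customer_affecting:
        return False
    deadline = first_incident + FIX_WINDOW
    return expected_fix is None or expected_fix > deadline

# Hypothetical dates, for illustration only.
print(must_mitigate(date(2004, 9, 1), date(2005, 1, 15), True))  # fix planned after the window
print(must_mitigate(date(2004, 9, 1), date(2004, 10, 1), True))  # fix planned inside the window
```

Treating an unknown fix date as "needs mitigation" is a design choice in this sketch, on the view that a problem with no planned fix should not wait quietly.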
Workarounds and Short Fixes are terms that should not be mentioned as mitigation in the Problem Management arena. These should be in place well before the 3 month period.
In the specific types you stated...
1... There will be times where Root Cause cannot be identified. In that case, your saving grace to your clients/execs is that you put in place measures to ensure that this data can be captured should the Incident Re-Occur. If it doesn't re-occur, then all the better. There should be something in your Problem Management process that will give you this out if needed. "Support is monitoring the service" is an excellent phrase to learn. However, if all the teams come to you with things that are wrong that could have caused the problem... They should be addressed as well.
2... If you have a mature Capacity Planning team, then the answer to a Problem should never be to throw more money at the problem (i.e. bigger server, etc.). If you don't have a mature capacity planning team, then these types of items may crop up, but you should identify them before critical mass through proactive Problem Management. This should also be mitigated through Change Management, SLM or Business contacts. IT should know when additional resources will be required because of changes in the business. However, if the answer does end up being more money, it should not be the Problem Manager making the case. This is where a risk acceptance document comes in handy (see below).
3... This is no longer a Problem. It is a Request For Change and should be managed by CM. If the Problem occurs more frequently, then Incident and Problem Management can discuss with CM and have the change priority raised, or have the item rushed into production as an emergency change.
A couple of other things that you may want to put into your toolkit.
1. Have a risk identification and acceptance policy drafted up for when there is resistance to implementing an RFC to resolve a Problem. Nothing will loosen the dollar signing pen quicker than if they have to reach for the risk acceptance pen.
2. As someone mentioned, ensure that Change, Problem and Incident management all get along real well. This is crucial in ensuring that the data required is available. It is also helpful to have the Major Incident Manager and the Change Manager by your side and onboard when you want to rush a fix into production.
3. Peer pressure is useful. If you have the chance to get the Engineering/support Directors in a room once a month to review open problems, you would be surprised at how hard they will work to fix something and ensure they look good in front of their peers.
Joined: Nov 01, 2004 Posts: 81 Location: Sask, Canada
Posted: Mon Jan 31, 2005 7:23 am Post subject:
Dear Guest- thank you for a great post. Please consider joining this group
I appreciate the time and effort you spent to try to answer some of my questions. Altogether, your responses indicate an organization that I would describe as being at a MUCH higher level of ITIL maturity than my own. That may account for some of the functional overlap: I beg for Problem mitigation because no one else does! And there is no capacity planning team; my using a Severity 1 incident to point out that operational fixes are no longer adequate is the closest I've gotten yet to proactive PM. Our Change group only accepts a request for change when it's within 2 weeks of being implemented, and there are no 'unknowns'. This may change this year as they implement a more ITIL-centered product, in place of their homegrown Lotus version.
Your post has definitely given me some focus on direction for improvement.
Some clarification re: cause of the incident is unknown. Currently, I give these problems a status of 'watch for it'. Implicit in this is the understanding that measures have been put in place to capture data if it reoccurs. If no reoccurrence in 6 months, it gets cancelled. Current goal is: reduce number of 'watch for it's. Nirvana: no 'watch for it's. and while I'm dreaming....
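The 'watch for it' rule above boils down to a date check, sketched here. The function name and status strings are illustrative, and "6 months" is approximated as 182 days, which is an assumption rather than anything defined in our process.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 182 days.
WATCH_PERIOD = timedelta(days=182)

def review_watch_status(last_occurrence, today):
    """Return 'cancelled' if there has been no recurrence for the whole watch period."""
    if today - last_occurrence >= WATCH_PERIOD:
        return "cancelled"
    return "watch for it"

# Hypothetical dates, for illustration only.
print(review_watch_status(date(2004, 6, 1), date(2005, 1, 31)))
print(review_watch_status(date(2004, 12, 1), date(2005, 1, 31)))
```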