Posted: Tue Jan 01, 2008 1:32 am Post subject: Suspended problems
How long does everyone else keep a problem in Suspended status? I require that a problem has a workaround, and that an honest effort has been made and resources are no longer available, before suspension is allowed.
What I'm seeing now is that problems continually sit in suspended status because, when I revisit them, the SMEs responsible still can't find the root cause.
Obviously there is less of a chance of finding root cause as time goes by since forensic data gets overwritten, jobs change and new higher priority problems arise....
Is there ever a time when you can just throw up your hands and say that it's never going to be found?
Joined: Sep 16, 2006 Posts: 3110 Location: London, UK
Posted: Wed Jan 02, 2008 7:21 pm Post subject:
What is suspended status?
Is that a tool-specific status?
If a problem can't be investigated, or there are not enough resources to investigate it, then the PMgr needs a process that defines those kinds of problems and closes them off. _________________ John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
Posted: Sat Jan 12, 2008 12:52 pm Post subject: Suspended Problems
Are these high priority problems or low priority?
If they are low, then what is really the value in pursuing them if you have a workaround? Now, if you still have incidents occurring against these problems, then I would suggest using that as grounds to raise the priority and, with it, the visibility.
But we are talking RCA, which should be urgent for 95% of all problems because the root cause leads you to determining solutions.
We don't have this where I am. All problems are fairly urgent currently, so we get to root cause quickly. We also fix them quickly, a little too quickly.
This is a delicate issue. If you have a problem that has been languishing for months and months with nobody taking care of it… it probably means that solving it is not so critical. You should review your stack of problems on a regular basis (for example once a month) and then close all the tickets that have been idle for too long. Keeping them in the stack of open problems will only overload problem management, to the point where it becomes almost impossible to work because too many problems are open.
If you think all the open problems should remain open even though they have been idle for a long time… it means you need more resources.
If you think you solve problems too quickly (as Titan said in his answer), pay attention… you are losing money! (too many resources)
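The monthly review suggested above could be sketched roughly as follows. This is only an illustration: the `ProblemRecord` fields, the `review_stale_problems` helper and the 180-day idle threshold are all assumptions made for the example, not part of ITIL or of any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProblemRecord:
    # Hypothetical fields, not taken from any real ITSM tool.
    problem_id: str
    last_activity: date          # last time anyone worked on the record
    has_workaround: bool
    incidents_since_review: int  # new incidents linked since the last review


def review_stale_problems(records, today, max_idle_days=180):
    """Split open problems into (keep, close_candidates).

    A record becomes a close candidate when it has been idle longer than
    max_idle_days, has a workaround, and has attracted no new incidents.
    The 180-day default is an arbitrary placeholder.
    """
    keep, close_candidates = [], []
    cutoff = today - timedelta(days=max_idle_days)
    for rec in records:
        idle = rec.last_activity < cutoff
        if idle and rec.has_workaround and rec.incidents_since_review == 0:
            close_candidates.append(rec)
        else:
            keep.append(rec)
    return keep, close_candidates


# Example review run with made-up records.
open_problems = [
    ProblemRecord("P1", date(2007, 3, 1), True, 0),   # idle, quiet
    ProblemRecord("P2", date(2007, 3, 1), True, 4),   # idle, still causing incidents
    ProblemRecord("P3", date(2008, 1, 10), True, 0),  # recently worked on
]
keep, to_close = review_stale_problems(open_problems, today=date(2008, 1, 23))
print([r.problem_id for r in to_close])
```

Records that are idle but still generating incidents stay open on purpose, since recurring incidents are a reason to raise priority rather than to close.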
Joined: Sep 16, 2006 Posts: 3110 Location: London, UK
Posted: Wed Jan 23, 2008 10:58 pm Post subject:
One of the things that has to be done to have effective and efficient problem management is agreeing on what problem management is
One of the posters stated that they find the root cause quickly and resolve it quickly
To me that means he actually has good Incident mgmt, not problem management per se
If the incident is 'web site not loading properly', then the incident is sent to the Nth level incident mgmt team to try to restore service.
They see that one of the asp, html or php lines has an obscure reference to a site that is down. Root cause found !! They remove the line or edit it so it does not hang the site. Service restored !!!
This is Incident mgmt not problem mgmt
Problem mgmt is finding the underlying root cause of a problem
for example
The web site crashes every 5-8 days. The O/S is Microsoft Windows Server 2003, patching is current, and the web server is IIS 6.0.
The incident is raised and the Microsoft team reboots the system and conducts a basic check after service is restored. They find nothing
The web site crashes 13 times in thirteen weeks. The Incident mgmt team and the Problem mgmt team, at their regularly scheduled meeting, discuss any incidents that are candidates for problem mgmt
The IM says... hey, there is a recurring web site outage on server 'bob' that has happened 13 times in the last 13 weeks. The M/S people restore service every time
The PM Mgr says - well, I gather you don't know the UNDERLYING ROOT CAUSE for this, do you?
IM says No
The PM Mgr says: OK. We will look to see if the incidents meet our criteria for a Problem record being created. Please continue raising the incidents
That afternoon, the PM mgr has his regular PM team meeting and discusses the Web Site/server incidents
He says
Well, first - service is impacted as the web site has gone down during 'prime working hours' - impacting SLA
They have a workaround, which is merely to reboot the system, so not much of a workaround. On/Off solves all incidents and problems
It happens frequently enough to have an impact on SLAs
So since this incident group meets the minimum criteria, what resources can be assigned, and for how long?
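The criteria the PM Mgr walks through could be written down as a simple check. The sketch below is hypothetical: the `IncidentGroup` fields, the `meets_problem_criteria` function and the threshold of one occurrence every two weeks are assumptions for illustration, since the thread never states the team's actual minimum criteria.

```python
from dataclasses import dataclass


@dataclass
class IncidentGroup:
    # Hypothetical summary of a set of related incidents.
    occurrences: int         # how many times the incident recurred
    weeks_observed: int      # length of the observation window in weeks
    breached_sla: bool       # outage fell inside agreed service hours
    workaround_is_fix: bool  # False when the workaround only restores service


def meets_problem_criteria(group, min_per_week=0.5):
    """Return True when the group qualifies for a problem record.

    Mirrors the three points raised in the meeting: SLA impact, a
    workaround that does not remove the cause, and frequent recurrence.
    The min_per_week threshold is an assumed value.
    """
    if group.weeks_observed <= 0:
        raise ValueError("weeks_observed must be positive")
    rate = group.occurrences / group.weeks_observed
    return group.breached_sla and not group.workaround_is_fix and rate >= min_per_week


# The web site example: 13 crashes in 13 weeks, reboot-only workaround.
web_site = IncidentGroup(occurrences=13, weeks_observed=13,
                         breached_sla=True, workaround_is_fix=False)
print(meets_problem_criteria(web_site))  # prints True
```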
PM M/S O/S member - well, I can dedicate 40 man hours a month to investigating the problem.
PM Mgr: For 2-3 months?
PM M/S O/S: Yeah, but we do have the high priority M/S issues to deal with first. I would set this as a medium priority, to be reviewed in 90 days
So the PM team has accepted the IM recommendation on the web site crash
They decide to spend 40 hours a month for the next 3 months to find the underlying root cause.
30 hours into the work, the PM M/S team finds out that there is a fatal programming error, already identified by Microsoft or a 3rd party vendor, that causes the site to freeze.
Vendor/Microsoft expects to release a patch in the next 30-90 days.
So PM mgmt has a mtg, moves the problem record to suspend / hold and waits for the patch.
Once the patch comes out, a change is raised. The patch is applied and, lo and behold, the machine no longer crashes every 5-8 days. The change is closed, the problem is closed
this is problem mgmt _________________ John Hardesty
John says “this is problem management”. I’d rather say this is an example of problem management. (And btw a very well described and detailed example. You understand ITIL concepts much faster when you can picture them in a concrete case, so thanks for the effort.)
In John’s example the problem is clearly identified, the team has resources to investigate and they find the root cause.
It happens (sometimes… most of the time… it depends on your organisation) that you don’t have enough resources to deal with all problems as you would like. So you have to act and, respecting priorities, cut off the “dead branches” to concentrate on the most important stuff (as John more or less pointed out in his first reply).
According to ITIL, the main activities of problem management are:
Problem control (handling Problems in an efficient and effective way)
Error control (to be aware of errors, to monitor them and to eliminate them when feasible and cost-justifiable)
Proactive prevention of Problems (identifying and resolving Problems before Incidents occur)
In my opinion, making critical choices such as closing an unresolved problem record is one of the most delicate and important parts of problem management. Sometimes it is more critical than bringing a problem resolution to its happy end.