The Itil Community Forum: Forums

ITIL :: View topic - Major Problem / Major Incident
AndyBostian
Newbie


Joined: Jul 09, 2009
Posts: 2
Location: Little Rock, AR. USA

Posted: Fri Jul 10, 2009 8:11 am    Post subject: Major Problem / Major Incident

Disclaimer: I apologize in advance if this has already been asked/addressed, but I searched the forum and did not find a similar thread.

The questions are related more to business implementation than they are to ITIL process. I'm curious how others handle these situations in their organizations. Specifically, let's say you have a high-impact high-urgency incident (service down for an extended period of time), root cause is unknown, no solution in sight.

0.) Do you have strict business criteria to declare this a "Major Incident" (and/or "Major Problem"), or is it more of a judgment call by senior management?
1.) How do you successfully continue to run your processes in parallel? (i.e. PM subset of folks investigates for root cause while IM subset of folks looks for workaround).
2.) If you run in parallel, do you have one person with accountability over the whole "war room" situation?
2.a.) Is that person the Incident Manager, the Problem Manager, or another party?
3.) How do you manage resources when both Incident and Problem resolution efforts need the same SMEs? (developers for example).

Thanks for any feedback.
Andy
Diarmid
Senior Itiler


Joined: Mar 04, 2008
Posts: 1884
Location: Newcastle-under-Lyme

Posted: Fri Jul 10, 2009 6:05 pm

Andy,

Little Rock is a long way from here. So I won't bother throwing stones.

0.) You do both; you have criteria based on, for example, loss (for indeterminate time) or imminent loss of vital services; but senior business management can wade in if in their judgement an incident is threatening the business; part of the issue is often going to be uncertainty of recovery time.

1.) The priority is effective restoration of service; that may well require deep investigation (in effect root cause analysis) but is nothing to do with Problem Management; if there is possibility of recurrence (one way to determine that there is in fact a problem), then the activities (not the management functions) merge into the problem analysis and investigation once resources are free from service restoration; there may not be an underlying problem once the incident is dealt with and the major incident review can check and confirm if that is the case.

2.) The war room is around the incident and, if there is an underlying problem that threatens repeat incidents (for example), especially if they could occur at any time or sooner, then you move the war room to resolving the problem as soon as the incident is fixed.

2a.) In a major incident situation, you do not assign the lead role on the basis of normal roles; you want the most capable person available running the show. It would be natural for the most senior person present to take charge, and s/he may well delegate coalface management to the best and most experienced person available while retaining close contact with what is going on and with senior business management. In any event, whoever leads must have authority (at least for the duration of the incident) to acquire and deploy whatever resources are required.

3.) I think this is implicitly answered above; the priority is effective restoration of service.

Problem Management is not a front line activity. If you expect the service to drop again in an hour, or even a day, you are still deep in incident management because you are working on an imminent breach. If you expect the service to stay up but you are not sure whether it might go down again sometime, then you can apply problem management to it, and your first step would probably be to validate the workaround that was achieved and streamline its application.
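To make that rule of thumb concrete, here is a minimal sketch in Python. The names `Track`, `IMMINENT` and `choose_track` are hypothetical, and the one-day threshold is only an assumption taken from the "an hour, or even a day" wording above; your organisation would set its own.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Track(Enum):
    INCIDENT = "incident management"
    PROBLEM = "problem management"


# Assumption: a recurrence expected within a day counts as an imminent breach.
IMMINENT = timedelta(days=1)


def choose_track(service_restored: bool,
                 expected_recurrence: Optional[timedelta]) -> Track:
    """Pick the process that should own the work right now.

    expected_recurrence is the best guess at how soon the service might
    drop again, or None when nobody knows whether it will recur at all.
    """
    if not service_restored:
        # Service is still down: restoration comes first.
        return Track.INCIDENT
    if expected_recurrence is not None and expected_recurrence <= IMMINENT:
        # Service is up but expected to drop again soon: still an incident.
        return Track.INCIDENT
    # Service is up and stable for now: investigate at problem-management pace.
    return Track.PROBLEM
```

For example, `choose_track(True, timedelta(hours=1))` stays with incident management, while `choose_track(True, None)` hands over to problem management.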
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
milligna
Itiler


Joined: Oct 27, 2008
Posts: 34

Posted: Mon Jul 13, 2009 10:05 am

This is an interesting topic. I was discussing this with our problem manager here... My understanding of the problem process was that a Problem becomes a Known Error and THEN has a workaround applied. This doesn't fit Incident Management - Incident Management seems to be able to apply workarounds without understanding the root cause in order to restore service.

Does anyone else find it confusing that Problem Management highlights the linear notion of Problem->Known Error->Workaround/solution, and that Incident Management allows Workarounds without understanding the root cause?

Should we have different names for the blind-faith bandaid type of workaround that incident management comes up with (bounce the box!) and the reasoned workarounds that Problem Management comes up with after understanding Root Cause?
Diarmid
Senior Itiler


Joined: Mar 04, 2008
Posts: 1884
Location: Newcastle-under-Lyme

Posted: Mon Jul 13, 2009 9:53 pm

milligna wrote:
Should we have different names for the blind-faith bandaid type of workaround that incident management comes up with (bounce the box!) and the reasoned workarounds that Problem Management comes up with after understanding Root Cause?


I don't see the necessity. It is not strictly "blind faith" since the workaround has been applied to resolve at least one incident before it ever reaches the status of workaround. Even after "root cause" is established and a workaround derived, this is not set in stone. It is perfectly possible to change the workaround after this if a better one is found for the business. There is more than a technical component to a workaround.

Properly speaking a problem does not become a "known error", it spawns a known error or it acquires an attribute of "known error".

It is clear from the recent posts in several threads that this area is confusing. I do not propose to try to unravel ITIL, if for no other reason than that I do not have access to the books. However, the best way to do something is the best way to do it regardless of book learning.

To my mind the high emphasis on "known error" is symptomatic of an environment where development of software is important. In many service environments (probably most outside of the really big organizations), it seems overkill to start managing a sophisticated "known error" system on top of Problem Management.

I prefer to unravel the knot. You need a workaround to resolve an incident. The first one you come up with should be used until either you find a better one (possibly through "root cause analysis") or you find a serious defect in the one you are using (in which case you find another one pronto).

You don't really want staff ferreting around a bunch of incident records, a bunch of problem records and a bunch of known error records every time there is an incident. You want one unified search ("match the symptoms and tell me the workaround"). You do not care about all the technical terms at that point and the only time you are interested in workarounds is when you have incidents.
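As a rough illustration of that single "match the symptoms and tell me the workaround" lookup, here is a minimal Python sketch. The `Record` type, its fields and the `find_workarounds` function are all hypothetical and not taken from any particular service desk tool; matching is a naive keyword overlap, chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    kind: str                      # "incident", "problem" or "known error"
    symptoms: FrozenSet[str]       # lower-case symptom keywords
    workaround: Optional[str] = None


def find_workarounds(observed: Iterable[str],
                     records: Iterable[Record]) -> List[Record]:
    """Search every record type at once and return those that share at
    least one symptom with the incident and carry a workaround,
    best match (largest symptom overlap) first."""
    wanted = {s.lower() for s in observed}
    scored = []
    for record in records:
        overlap = len(wanted & record.symptoms)
        if overlap and record.workaround:
            scored.append((overlap, record))
    # Sort on the overlap count only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

The caller never needs to know whether a hit came from an incident, a problem or a known error record; the `kind` field is there only for anyone who wants the extra confidence that a "known error" match brings.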

Equally, you are only interested in "known errors" as distinct from "problems" because this gives you better information as to what is happening to cause the incident and therefore a better confidence in and understanding of using the workaround.

Service management is also interested in "known error" status as part of tracking current problems as to their current state of progress.

I prefer to think of all these terms as concepts and to use my understanding of the concepts to apply good process where and how it is needed.

The relationships that matter are the practical ones. The theory is just there to help you understand things. Draw up a process flow that will provide good incident resolution and good problem resolution and let the concepts of "known error" and "workaround" fall into their natural place.
UKVIKING
Senior Itiler


Joined: Sep 16, 2006
Posts: 3258
Location: London, UK

Posted: Mon Jul 13, 2009 10:13 pm

To add to Diarmid's post:

You, the user, want your service back. You don't care about the reason the service is out. You want it back.

So you try to restore service as quickly as possible (incident mgmt).

Think of it like this:

You have a bar full of drinkers watching the Super Bowl or a Championship or Ashes game.

You serve (insert beer) on tap. They are drinking it fast.

Your keg empties and you have 20+ beer drinkers wanting beer.

Do you a) try to find the root cause or b) replace the empty keg to restore service?

I would do b).

Then the TV reception goes out (IM part 2). You and your staff frantically try to figure out what's wrong.

You call 'the cable company'.
You call your electrician.
You call another pub to find out if the incident is only yours or affects all pubs.

One of the bar staff notices a plug is loose and pushes it in, and lo, you have the game again.

That is still incident mgmt: restoring service.

Problem mgmt: either reactive or proactive.
_________________
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
Caperz
Itiler


Joined: Jul 24, 2009
Posts: 23
Location: Sydney, Australia

Posted: Mon Jul 27, 2009 12:10 pm

AndyBostian - I agree with Diarmid and UKVIKING. I would just add that, in our organisation, the criterion for creating a problem investigation is a P2 incident record with a high urgency to resolve. This is usually when a large number of users are impacted and the service outage / interruption is impacting the business. An example would be a line outage to a site.

The key focus is to restore the service to the business as soon as possible. The speed of the service restoration, in this case, will depend on whether the line provider needs to be involved etc. If a simple restart of the modem or NTU restores service then great... otherwise it may be through a reduced-performance solution, such as a wireless service (until the line is back up again).

On the problem management side, the reason for the outage would be assessed (if a problem investigation is warranted). In some organisations and situations it is decided that, since the root cause was outside the internal IT group's control, it is not worth spending the time and effort to pursue root cause with the vendor. This all depends on how your organisation works. In other cases... service disruptions are assessed against the vendor's agreed uptime and action is taken from there.
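Assessing a disruption against the vendor's agreed uptime is simple arithmetic, sketched below in Python. The function names and the figures in the example (a 720-hour month and a 99.5% target) are illustrative assumptions, not values from any real contract.

```python
def availability_pct(period_hours: float, outage_hours: float) -> float:
    """Percentage of the measurement period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must lie between 0 and period_hours")
    return 100.0 * (period_hours - outage_hours) / period_hours


def breaches_agreement(period_hours: float, outage_hours: float,
                       agreed_pct: float) -> bool:
    """True when measured availability falls below the agreed uptime."""
    return availability_pct(period_hours, outage_hours) < agreed_pct
```

With these assumed figures, a 4-hour line outage in a 30-day month (720 hours) gives roughly 99.44% availability, which falls below a 99.5% target, whereas a 3-hour outage (about 99.58%) does not.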
_________________
ITIL V3 Capability - Operational Support & Analysis Certified
All times are GMT + 10 Hours

 