Note: ITIL is a registered trademark of OGC. This portal is totally independent and is in no way related to them. See our Feedback Page for more information.


The Itil Community Forum: Forums

ITIL :: View topic - PM Scenario

PM Scenario

 
davis3792
Newbie


Joined: Mar 31, 2006
Posts: 4

Posted: Sat Apr 01, 2006 1:41 pm    Post subject: PM Scenario

Hey folks,
I'd like to define a scenario/use case and get some opinions on how ITIL would specify it be handled.

8:00am a critical application process on a Windows server dies
8:01am an agent generates an SNMP trap and sends it to an event browser
8:05am tech personnel see the trap message and create a corresponding incident record
8:07am tech personnel log into the system and restart the application process
8:10am tech confirms that the application is back up, but no root cause has been established.

What is the next step from an Incident Management perspective?

Based on the following assumptions, what is the next step from an ITIL Problem Management perspective? What might the full lifecycle of the Problem look like?

==========
Assume that, if so inclined, someone could review system metrics and see that memory consumption had been steadily increasing on the server. No other information is immediately available.
However, someone, if diligent, could check the process occasionally and see that its memory consumption has been growing steadily since it was restarted. From there, someone could also check the vendor's knowledge base and see that a patch has been published for a known memory leak.
==========

Thanks
itilimp
Senior Itiler


Joined: Jan 20, 2006
Posts: 172
Location: England

Posted: Sun Apr 02, 2006 3:18 am

In this case, from the point of view of incident management, the incident can be resolved because the system is working and not causing a negative impact on the users' services.

However, if you are operating problem management, you would raise a new problem record against the incident so that its root cause can be identified. The actions you've mentioned, such as checking memory and known issues with the vendor, would be part of the investigation that an analyst working on problem records would go through. If the vendor does have a patch for the memory leak, at this point (correct me if I am wrong, someone!) you can create a known error record because you know what the root cause is, and you have a workaround for it, even though it is not in place yet.

You then raise a change record to apply the patch to the server concerned. The risk of this would be considered by the change manager and CAB according to your change management process. Then, assuming they approve and add it to the schedule of changes, someone applies the patch. At this point the change, problem, and known error records can be closed. (It may be that you have a set period of time after the patch goes in to monitor the change before actually closing it, from a QA point of view.)
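
For readers who think in code, the lifecycle described in this post could be sketched roughly as follows. This is only an illustration, not anything ITIL prescribes: the `Problem` class, its field names, the state names, and the rule that a change is raised only after the known error is recorded are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProblemState(Enum):
    OPEN = "open"                # root cause not yet known
    KNOWN_ERROR = "known_error"  # root cause and workaround identified
    CLOSED = "closed"            # permanent fix applied via a change


@dataclass
class Problem:
    summary: str
    incident_ids: list = field(default_factory=list)
    state: ProblemState = ProblemState.OPEN
    root_cause: str = ""
    workaround: str = ""
    change_id: str = ""

    def record_known_error(self, root_cause: str, workaround: str) -> None:
        # A known error needs both a root cause and a workaround.
        if not root_cause or not workaround:
            raise ValueError("known error needs a root cause and a workaround")
        self.root_cause = root_cause
        self.workaround = workaround
        self.state = ProblemState.KNOWN_ERROR

    def raise_change(self, change_id: str) -> None:
        # Hypothetical rule: only raise the change once the error is known.
        if self.state is not ProblemState.KNOWN_ERROR:
            raise RuntimeError("raise the change after the known error is recorded")
        self.change_id = change_id

    def close_after_change(self) -> None:
        # Close only once a change has been linked to the problem.
        if not self.change_id:
            raise RuntimeError("no change linked to this problem")
        self.state = ProblemState.CLOSED


problem = Problem("App process dies on Windows server", incident_ids=["INC-1"])
problem.record_known_error("vendor memory leak", "restart the process")
problem.raise_change("CHG-1")
problem.close_after_change()
print(problem.state)  # ProblemState.CLOSED
```

A real tool would also model CAB approval and the post-change monitoring period mentioned above; both are left out here to keep the sketch short.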
Nikhil
Newbie


Joined: Mar 21, 2006
Posts: 17
Location: India

Posted: Wed Apr 05, 2006 11:47 pm

Hi,
Should a Known Error be tracked as an attribute on the problem record, or should it be tracked as a separate record on its own?
Do let me know your views.

Nikhil
_________________
Regards,
Nikhil Kulkarni.
Application Analyst
cMango.. The Services Management Company

The taste of low quality lingers long after the satisfaction of low price.
Guerino1
Senior Itiler


Joined: Jan 01, 2006
Posts: 500
Location: New Jersey

Posted: Thu Apr 06, 2006 12:17 am

Nikhil wrote:
Should Known Error be tracked as an attribute on the problem record or should it be tracked as a separate record on its own?


Hello Nikhil,

Based on my experience, most organizations like to simply have it be represented as an attribute on the Problem record. They find it easier to simply "tag" an existing record than to enter a new one. However, it's not wrong to break it out as a separate record if you desire.

The feedback we get is that creating another record for "Known Error" is tedious and highly redundant, as most of the information is in the Problem record, already, including the fact that the Problem will have mitigation details, scheduled releases to correct it, transparency, as it ties to specific Changes, etc. To clone or re-enter all of this data into another separate record is not something that most people "want" to do. We've found that it's simply easier to select a field that tags a Problem as a KE and enhance the existing information in the Problem record to reflect actions and details around the KE.

As a result of all of this, in the system we offer, we simply allow users to tag a Problem as a KE. They have the ability to easily filter out Known Error Problem records from Unknown Error Problem records and that seems to keep partners satisfied.
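
To make the "tag" approach concrete, here is a minimal sketch of a Problem record carrying a Known Error flag, plus the two filters described above. It does not show any particular product's data model; the `ProblemRecord` class and the `is_known_error` field name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    problem_id: str
    summary: str
    is_known_error: bool = False  # the "tag" field; name is hypothetical
    root_cause: str = ""
    workaround: str = ""


def known_errors(records):
    """Return only the problems tagged as known errors."""
    return [r for r in records if r.is_known_error]


def unknown_errors(records):
    """Return the problems that are not (yet) tagged as known errors."""
    return [r for r in records if not r.is_known_error]


records = [
    ProblemRecord("PRB-1", "Memory leak in app process", is_known_error=True,
                  root_cause="vendor bug", workaround="restart the process"),
    ProblemRecord("PRB-2", "Intermittent login failures"),
]
print([r.problem_id for r in known_errors(records)])    # ['PRB-1']
print([r.problem_id for r in unknown_errors(records)])  # ['PRB-2']
```

The alternative, a separate Known Error record, would need its own type with a reference back to the Problem, which is where the duplicated data entry mentioned above comes from.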

Again, neither way is wrong. Capturing and managing data is all good. As long as you can successfully pitch either one, your enterprise will win.

Regards,
_________________
[Edited by Admin to remove link]
Guerino1
Senior Itiler


Joined: Jan 01, 2006
Posts: 500
Location: New Jersey

Posted: Thu Apr 06, 2006 12:25 am

itilimp wrote:
You then raise a change record to apply the patch to the server concerned.


Hello itilimp,

We find that most organizations break this down even further. At the point in your process description where you recommend entering a Change record to apply the patch, they actually enter a Service Request to apply the patch. The Service Request results in a Task or series of Tasks for Service Groups to execute. The Service Group doing the work will then make a decision as to whether the work necessary is big enough to warrant a Release (that groups a number of Changes together) or a Change. Upon making this decision such Service Groups will create the new Release and/or Change records and then act upon them, appropriately, as they move through the process.

I hope this helps.

Regards,
_________________
[Edited by Admin to remove link]
lexxone
Newbie


Joined: Mar 31, 2006
Posts: 5
Location: Atlanta, GA, USA

Posted: Thu Apr 06, 2006 1:40 am    Post subject: PM Scenario (expanded)

Some questions have come up as a result of reading this thread. First, I need to restate the scenario.
I've modified it to show how the various ITSM processes AND the ICTIM processes would fit. These are the acronyms in use: ICTO is ICTIM Operations; IM is Incident Management; PM is Problem Management; CM is Change Management; AU is Automation; SD is Service Desk.

1. 8:00am a critical application process on a Windows server dies - AU
2. 8:01am an agent generates an SNMP trap and sends it to an event browser - AU
3. 8:05am tech personnel see the trap message and create a corresponding incident record - ICTO (can be AU)
4. 8:07am tech personnel log into the system and restart the application process - ICTO
5. 8:10am tech confirms that the application is back up, but no root cause has been established - ICTO
6. day end SD Manager runs reports of incident volumes and distributes them - SD
7. next day PM manager reviews reports and sees multiple incidents over a period of time against this server for this issue - PM
8. next day PM tech opens a problem against this issue and begins Root Cause Analysis - PM
9. later PM tech finds root cause (memory leak), finds solution (patch) and flags the problem as a known error (including the recycle as the workaround) - PM (I am agreeing with Frank on this approach to KEs.)
10. later PM tech opens an RFC to get the patch installed - PM
11. later change manager does impact analysis and gets CAB approval - CM
12. later change implemented via tasks to ICTO personnel - ICTO (It is possible that the SD can also implement, depending on your company's stance. Our SD isn't at that maturity level yet.)

So, first question for all is, how do you integrate ICTIM into your ITIL implementations (which are usually ITIL ITSM implementations)?
As for PM work, how do you do RCA work? Do you have a specific methodology? How robust is it? For example, if the solution to the problem were a bit more complex than a vendor's patch, maybe the RCA would reveal that it wasn't cost-justified to solve the known error. In that case, step 10 would not be completed, since it's been decided to "live with it". If we follow this course, step 10 would be replaced with something like this: ICTO modifies the monitoring and alerting to accumulate, say, 10 of the SNMP alerts for this issue before an incident is created. (Of course, each time an SNMP alert pops for this issue, an ICTO tech would perform the workaround of a recycle, just without a logged ticket except for the 10th time.) In other words, to prevent incidents being created for something we've decided to live with, we might choose not to track every time we recycle due to this issue. After all, the SNMP logs would have all of the historical information, so why clog our ticket database with meaningless FYI tickets?
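
The "accumulate 10 alerts before an incident is created" idea could look something like the sketch below. It is an illustration under stated assumptions, not the configuration of any real monitoring tool: the `AlertAccumulator` class, the `issue_key` string, and the in-memory `incidents` list standing in for the ticket database are all made up for the example.

```python
from collections import defaultdict


class AlertAccumulator:
    """Create one incident per `threshold` alerts for the same issue."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.counts = defaultdict(int)
        self.incidents = []  # stand-in for the ticket database

    def on_alert(self, issue_key):
        # Every alert still triggers the workaround (the recycle);
        # only every Nth alert is logged as an incident.
        self.counts[issue_key] += 1
        if self.counts[issue_key] % self.threshold == 0:
            self.incidents.append((issue_key, self.counts[issue_key]))
            return True
        return False


acc = AlertAccumulator(threshold=10)
logged = [acc.on_alert("app-process-down") for _ in range(25)]
print(logged.count(True))  # 2
print(acc.incidents)       # [('app-process-down', 10), ('app-process-down', 20)]
```

Recording the running count on each logged incident keeps some of the history in the ticket itself, so the SNMP logs are not the only place to see how often the recycle was needed.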
Another question, and this one is for Frank Guerino: when working with known problems (or unknown error problems, as you called them), the same sort of thing could happen, i.e. doing RCA reveals that determining the root cause would be a waste of time compared to simply executing a workaround every time the issue comes up. In that case, you'd have a known problem that has been deliberately kept as a known problem, never moving to the known error and, ultimately, the elimination steps of the process. How would you deal with these?
Guerino1
Senior Itiler


Joined: Jan 01, 2006
Posts: 500
Location: New Jersey

Posted: Thu Apr 06, 2006 3:41 am    Post subject: Re: PM Scenario (expanded)

lexxone wrote:
Another question, and this one for Frank Guerino, when working with known problems (or unknown error problems as you called them), the same sort of thing could happen - i.e. doing RCA reveals that determining the root cause would be a waste of time compared to simply executing a workaround every time the issue comes up. In that case, you'd have a known problem that has been decided to be kept as a known problem - never moving to the known error and, ultimately, the elimination steps of the process. How would you deal with these?


Hello Lexxone,

It depends on the organization we deal with. Many will simply keep the Problem open, with a resolution statement that states that there is no intention to resolve the Problem in the near term. Others will do this "and" tag it as a Known Error, to ensure they can quickly bring the Problem up, on search, keeping the Root Cause blank or putting in a comment stating it hasn't been found yet and why.

However, the workaround you mention would typically be found in the Incident details, not the Problem, as it's an Incident that will be "the issue that comes up" at the Help Desk, not the Problem. The HD staff will look in their Knowledge Management System to see if an Incident has ever occurred before, what the resolution is, whether or not there are repeat occurrences of the Incident, whether or not a formal Problem has been registered against the Incident, what the state of the Problem is, what work has been scheduled (or not) to address the Problem, what Product Release the Changes are scheduled for, who's worked on the associated Incidents, Problems, Products, Releases, Changes, what documentation is associated with each of the Incidents, Problems and so on, etc., etc., etc. Success for Help Desk staff and alternate support resources is quick access to and transparency into any and all details that impact their customers' satisfaction.

I hope this answers your questions.

Regards,
_________________
[Edited by Admin to remove link]
davis3792
Newbie


Joined: Mar 31, 2006
Posts: 4

Posted: Thu Apr 13, 2006 12:45 pm

I'd like to back up for a moment and discuss the use case. And I'd like to focus on wringing out the answers one at a time, without allowing ourselves to wander.

At steps 6 and 7 you stated:
Quote:
6. day end SD Manager runs reports of incident volumes and distributes - SD

I'm ok with this.

Quote:
7. next day PM manager reviews reports and sees multiple incidents over a period of time against this server for this issue - PM


First of all, this use case didn't specify anything about multiple incidents over a period of time. Let's say this is the first time they've seen this particular problem. So there isn't a history of problems with this CI. What we have is an incident w/o a root cause.

QUESTION: All too frequently I see incidents closed (appropriately) w/o any subsequent RCA to try to reduce the chance of the problem happening again. ACCORDING TO ITIL, does every incident w/o a root cause trigger the creation of a problem? If not, why?
eisbergsk
Senior Itiler


Joined: Nov 01, 2004
Posts: 81
Location: Sask, Canada

Posted: Fri Apr 14, 2006 8:56 am

I had to look in the Book: since the goal of PM is to minimize the adverse impact on the business of Incidents & Problems that are caused by errors in the IT infrastructure, the existence or lack thereof of a Root Cause may not necessarily be THE deciding factor on whether or not to open a problem.
Isn't it up to the business itself to decide? Or am I splitting hairs?
JasonUK
Newbie


Joined: Apr 24, 2006
Posts: 3

Posted: Tue Apr 25, 2006 10:15 am

davis3792 wrote:

QUESTION: All too frequently I see incidents closed (appropriately) w/o any subsequent RCA to try to reduce the chance of the problem happening again. ACCORDING TO ITIL, does every incident w/o a root cause trigger the creation of a problem? If not, why?


It's not incident management's job to investigate root causes or create problem records.

The problem manager should be reviewing incident statistics to identify problems, prioritise them and log them, and then decide whether or not they currently justify further investigation and any resulting RFCs.

Investigating a one-off incident with low business impact would not be an efficient use of problem management resources. If incident management were to create problem records for every incident, then incident management would be slowed down and the problem manager would be overwhelmed.

Of course, there is nothing preventing incident management from highlighting problems. However, in most cases incident management are more likely to highlight problems raised by those who shout the loudest, or problems that are tedious, rather than those problems that, once solved, would give the biggest increase in the efficiency of incident management.

Remember that incident management includes escalation through to higher levels of technical support. It's assumed that through these channels all incidents can be resolved using the incident management process. During the incident management process, problem-solving skills will no doubt be used to identify and work around or fix the incident; however, this is still incident management.

Let's take a simple example...

A user logs an incident. They have no network connection.

Through incident management it's determined that the fault is with the user's network wall socket. A nearby wall socket is used to get the user running again and action is taken for the socket to be replaced.

Whilst we have a "problem", then a "known cause" followed by a "circumvention" followed by a "permanent resolution", it's not a problem in the ITIL sense; that's standard incident management.

However, our eagle-eyed problem manager has spotted that 8 of these sockets have failed in the past 4 months. This is a high failure rate, so a problem is logged and investigation begins.

We already have a circumvention as we can replace the network sockets. However we still have no idea why we have the failures.

It's determined that the underlying cause is some poor quality installation work undertaken by the contractor who installed the network points. We now have the underlying cause in addition to our workaround.

So we now have a known error.

For permanent resolution we haul back in the contractors to check and replace any additional faulty network points - This will go through change management if appropriate.

We might now put some processes in place or review our relationship with that contractor to prevent a recurrence.

The problem is resolved.
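
As a rough illustration of the "reviewing incident statistics" step in this example, the sketch below counts incidents per category over a recent window and flags the categories that recur. The function name, the `(category, date_logged)` tuple layout, the 120-day window and the threshold of 5 are all assumptions for the example; a real problem manager would also weigh business impact, which is not modelled here.

```python
from collections import Counter
from datetime import date, timedelta


def recurring_failures(incidents, today, window_days=120, min_count=5):
    """Return categories with at least `min_count` incidents in the window.

    `incidents` is a list of (category, date_logged) tuples.
    """
    cutoff = today - timedelta(days=window_days)
    recent = Counter(cat for cat, logged in incidents if logged >= cutoff)
    return {cat: n for cat, n in recent.items() if n >= min_count}


today = date(2006, 4, 25)
incidents = [("wall socket failure", today - timedelta(days=10 * i)) for i in range(8)]
incidents.append(("printer jam", today - timedelta(days=3)))
print(recurring_failures(incidents, today))  # {'wall socket failure': 8}
```

The single printer jam stays below the threshold, which matches the point made earlier that a one-off, low-impact incident does not justify a problem record.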

Does this make sense?

Jason.
bobtom115
Newbie


Joined: Jun 19, 2009
Posts: 1

Posted: Fri Jun 19, 2009 7:26 pm

Hi there, would a new one like me be welcome here?
Thanks so much indeed.


UKVIKING
Senior Itiler


Joined: Sep 16, 2006
Posts: 3292
Location: London, UK

Posted: Sat Jun 20, 2009 12:51 am

Sigh

We don't welcome people

What usually happens is one of the following

You, the new person, ask a very basic question

1 - I, as the unnominated, unofficial Grumpy ole git, will say

have you ever heard of search? Google or Yahoo?

Diamid, another GoG but not as grumpy, may warn you about me before or after

then of course.. the band of demons will join in...

Johnny rosin'd up his guitar

whoa... channeling Charlie Daniels for a minute

several people will make snarky references
one person or more will refer to the 6th book

and then you are well here
_________________
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
All times are GMT + 10 Hours