Note: ITIL is a registered trademark of OGC. This portal is totally independent and is in no way related to them. See our Feedback Page for more information.


Known Error: definition versus practice
sherlock
Newbie

Joined: Apr 07, 2006
Posts: 9
Location: Brussels, Belgium

Posted: Fri Apr 07, 2006 8:25 pm    Post subject: Known Error: definition versus practice

I would like to have your opinion on the following subject.
By definition, a known error is a problem for which a root cause is known and a workaround or alternative has been found.

In practice, I notice that for repetitive incidents a problem is raised and analysts first search for the best alternative to restore the service and close the incidents. Then they start searching for the root cause. In that case, the definition holds.

It can also happen that no workaround is available yet, but the root cause is found, and during error control (when searching for a suitable fix or replacement) we have a valid alternative.
This means that only one condition is met: the root cause is known.

I'm describing the best practices in our company, and I was wondering whether the validation and acceptance of a workaround is a separate process within problem management, sitting above problem and error control. Also, the trigger for finding and accepting a workaround comes more from incident management.

Your opinions please: how have you described this in your process definition, and does ITIL need a correction in its known error statement?
Ed
Senior Itiler

Joined: Feb 28, 2006
Posts: 411
Location: Coventry, England

Posted: Fri Apr 07, 2006 11:15 pm

sherlock

Can you please clarify this for me - you said:

By definition, a known error is a problem for which a root cause is known and a workaround or alternative has been found.

and then:

It can also happen that no workaround is available yet, but the root cause is found, and during error control (when searching for a suitable fix or replacement) we have a valid alternative.

Surely here both requirements are met i.e. root cause is known and an alternative is available?

Regards

Ed
sherlock
Newbie

Joined: Apr 07, 2006
Posts: 9
Location: Brussels, Belgium

Posted: Sat Apr 08, 2006 12:11 am

phases in problem control:

identification => classification => diagnosis to RC => KE

phases in error control:

assessment of KE => solution => closure of error and problem(s)


Assume you don't have a workaround: you start identifying the problem, start RCA and in the end find the root cause.
You fix the failing component and start planning the change.
So you have passed the "KE" phase and are working in error control.
At that point you find a workaround for closing your incidents.
Only then are the two conditions met.

The definition in the KE phase assumes you have a root cause and a workaround, while you can have found your failing component without having an alternative available.

I hope it's clear ;)
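
The phase ordering listed in this post can be sketched as a small state machine. This is only an illustration: the phase names follow the two lists above, while the `Phase` enum, `PROBLEM_CONTROL` set, and the `next_phase` and `subprocess_of` helpers are hypothetical names invented here, not part of ITIL or of any tool. It also assumes a strictly linear flow, which real processes may not follow.

```python
from enum import Enum


class Phase(Enum):
    # Problem control phases, in the order given in the post
    IDENTIFICATION = 1
    CLASSIFICATION = 2
    DIAGNOSIS = 3        # diagnosis to root cause
    KNOWN_ERROR = 4
    # Error control phases
    ASSESSMENT = 5       # assessment of the known error
    SOLUTION = 6
    CLOSURE = 7          # closure of error and problem(s)


# Assumption: the known error phase is counted under problem control,
# as in the post's first list.
PROBLEM_CONTROL = {
    Phase.IDENTIFICATION,
    Phase.CLASSIFICATION,
    Phase.DIAGNOSIS,
    Phase.KNOWN_ERROR,
}


def next_phase(phase):
    """Return the phase that follows `phase` in the assumed linear flow."""
    if phase is Phase.CLOSURE:
        raise ValueError("closure is the final phase")
    return Phase(phase.value + 1)


def subprocess_of(phase):
    """Say whether a phase belongs to problem control or error control."""
    return "problem control" if phase in PROBLEM_CONTROL else "error control"
```

In the scenario described, the workaround only turns up somewhere around `Phase.SOLUTION`, after the record has already passed `Phase.KNOWN_ERROR`, which is the case the strict definition does not seem to anticipate.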
Fabien
Senior Itiler

Joined: Sep 27, 2005
Posts: 207

Posted: Sat Apr 08, 2006 7:38 am

That's an interesting question. I don't have an answer per se but I would be tempted to call it a known problem :)
_________________
BR,
Fabien Papleux

Accenture
Technology Consulting | Service Excellence
Red Badge Certified

Twitter @itilgeek
Ed
Senior Itiler

Joined: Feb 28, 2006
Posts: 411
Location: Coventry, England

Posted: Tue Apr 11, 2006 4:47 pm

I've tried to make it come out different, but can't - I still think it is a Known Error - I guess someone out there will tell me different?

I don't think that because we've passed out of root cause analysis the rules stop working - that is too much silo mentality, the one thing we as ITIL champions are trying to avoid.

Still a KE.

Regards

Ed
Itilitarian
Newbie

Joined: Apr 27, 2006
Posts: 8

Posted: Fri Apr 28, 2006 6:20 am

A condition that you need to bear in mind is that the relationship between Incident and Problem can be somewhat linear, i.e. incident resolved, then problem mgt takes over.

Ed, you can have a condition where you know root cause but no workaround in place. I would call that an Incident! The workflow and corresponding resolution strategy would still fall under the jurisdiction of Incident Mgt. According to the ITIL books...

Investigation and diagnosis may become an iterative process, starting with a different specialist support group and following elimination of a previous possible cause. It may involve multisite support groups and support staff from different vendors. It may continue overnight with a new shift of support staff taking over the next day. All this demands a rigorous, disciplined approach and a comprehensive record of actions taken with corresponding results.

Tip:

If it is not clear which support group should investigate or resolve a User-related Incident, the Service Desk, as the owner of all Incidents, should coordinate the Incident Management process. If there are differences of opinion or there are any other issues arising, then the Service Desk should escalate the Incident to the Problem Management team.


In other words, as long as the service interruption/degradation exists (by virtue of there not being a suitable workaround or alternative), work is performed under Incident Mgt. If all else fails, engage Problem!

Hope this helps!
JasonUK
Newbie

Joined: Apr 24, 2006
Posts: 3

Posted: Fri Apr 28, 2006 6:57 am

Quote:
Assume you don't have a workaround: you start identifying the problem, start RCA and in the end find the root cause.
You fix the failing component and start planning the change.


A problem is the underlying cause of one or more incidents. It will become a known error when the root cause and a temporary workaround or a permanent fix is identified.

You have a root cause and you have identified a permanent fix (replacing the component). Therefore you have a known error.
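
The two readings debated so far can be written as predicates, which makes it easier to see where they disagree. This is a minimal sketch, assuming a made-up `ProblemRecord` type; the field and function names are hypothetical and are not taken from ITIL or from any service desk product.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    # Hypothetical flags, introduced only for this illustration
    root_cause_known: bool = False
    workaround_identified: bool = False
    permanent_fix_identified: bool = False


def is_known_error_strict(p):
    """Root cause known AND (workaround OR permanent fix identified)."""
    return p.root_cause_known and (
        p.workaround_identified or p.permanent_fix_identified
    )


def is_known_error_root_cause_only(p):
    """Looser reading: knowing the root cause is enough."""
    return p.root_cause_known


# Sherlock's scenario as Jason reads it: root cause found, permanent fix
# (replacing the component) identified, no workaround yet.
scenario = ProblemRecord(root_cause_known=True, permanent_fix_identified=True)
```

Under both predicates `scenario` counts as a known error. The readings only diverge for a record whose root cause is known but which has neither a workaround nor a fix.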

Jason.
Guerino1
Senior Itiler

Joined: Jan 01, 2006
Posts: 500
Location: New Jersey

Posted: Fri Apr 28, 2006 11:50 am

JasonUK wrote:
A problem is the underlying cause of one or more incidents. It will become a known error when the root cause and a temporary workaround or a permanent fix is identified.

You have a root cause and you have identified a permanent fix (replacing the component). Therefore you have a known error.

Jason.


Jason,

I figured I'd add some more information to the mix.

Many development organizations would not agree with this definition. To many organizations, a "Known Error" is one for which they can reproduce the problem, typically with a repeatable testcase of some form. This is the only criterion for it to be a "Known Error". In other words, to many organizations, a KE is a Problem that has been verified to be an accurate and repeatable error.

A KE does not have to have an identified fix, as the fix may not be scheduled for a number of Releases in the future and coming up with the fix, itself, may not be possible until someone spends a great deal of time analyzing/debugging the problem and evaluating options to fix it.

A KE does not have to have a workaround, as many Problems may never have a valid or acceptable workaround. Example: It may be acceptable to have a memory leak in a product, where memory randomly gets reset by other functionality in the product, making it benign to the End Users. In this case, the development team(s) may make a decision not to worry about it for a very long period of time, as it's hurting no one.

A Problem, until it is verified to be a "Known Error" or a "Repeatable Error" is typically a perceived Problem or can even be an anticipated Problem that needs to be addressed, at some point. In these cases, a Problem may never actually progress to become a "Known Error".

Development teams will typically work with stakeholders such as Product Managers, Marketing, Sales, and Customers to prioritize which Problems will or won't be addressed to improve future Releases of Products and/or Services. They will use this Problem list, in conjunction with their list of new feature Requirements and Risks that will drive work in these future Releases. These teams will not typically close a Problem until they have some formal signoff that proves that the Problem has been fixed completely, typically from the stakeholder(s) that originally identified the Problem or were victims of the Incidents that were symptomatically caused by the Problem.

I hope this helps.

Regards,
_________________
[Edited by Admin to remove link]
Itilitarian
Newbie

Joined: Apr 27, 2006
Posts: 8

Posted: Fri Apr 28, 2006 2:14 pm

Guerino1 wrote:

A KE does not have to have a workaround, as many Problems may never have a valid or acceptable workaround. Example: It may be acceptable to have a memory leak in a product, where memory randomly gets reset by other functionality in the product, making it benign to the End Users. In this case, the development team(s) may make a decision not to worry about it for a very long period of time, as it's hurting no one.



Frank, I agree with your point and I'd take your explanation a little further. The K.E. is likely related to the concept of Proactive Problem Mgt. This memory leak may have accompanied the application during the transition into production. If it is benign to users, and no service-related impact is present, there really isn't a need for a workaround, because by default the workaround is intended to reduce the impact of repeat instances of an existing Problem or KE. However, the second this benign KE exerts enough impact on a user to record an Incident, any tactic used to reduce the impact or restore service would essentially become the Workaround in the KE record, and that "proactive" problem would be recognized as a conventional reactive one with an associated workaround.

So Sherlock, to answer your question: in the 'traditional' or 'reactive' sense of problem mgt, there should always be a workaround to a given KE, and the workaround addresses the action to take on the repeat occurrence of incidents. However, in the absence of Incidents caused by that KE, your KE can still exist without a workaround if it's recognized as a "Proactive Problem".
Ed
Senior Itiler

Joined: Feb 28, 2006
Posts: 411
Location: Coventry, England

Posted: Fri Apr 28, 2006 5:07 pm

Itilitarian wrote:
Ed, you can have a condition where you know root cause but no workaround in place. I would call that an Incident! The workflow and corresponding resolution strategy would still fall under the jurisdiction of Incident Mgt.


My point here was that if you take Sherlock's situation:

"It can also happen that no workaround is available yet, but the root cause is found, and during error control (when searching for a suitable fix or replacement) we have a valid alternative.
This means that only one condition is met: the root cause is known"

I disagree with him because you have an alternative. This makes it a Known Error for me.

Regards

Ed
rjp
Senior Itiler

Joined: Mar 12, 2005
Posts: 255
Location: Melbourne, Australia

Posted: Fri Apr 28, 2006 6:17 pm

I'm with Frank on this one.

The term 'known error' is pretty unambiguous. If you know which CI(s) is in error then you know that and record it. It would be very odd to identify the root cause of an incident (or potential incident through proactive PM) and decide that you could not record and manage it under the error control subprocess.

I don't think for a second the Problem Management authors intended this.

A known error without a corresponding work around or solution/rfc would still be a known error - it would not be a 'special' kind of problem or incident. However, one of two things is certain:

If you are instantiating and recording known errors by setting a status value on your problem record, the incident records should stay in an unresolved state if no workaround is available. (Of course, if you just want to keep your information management consistent, you could have a boilerplate 'workaround' that says 'decided to live with it' to cover such cases, provided the real activity behind that 'workaround' was a negotiated and documented agreement from the customer on that course of 'inaction'.)

If you are keeping separate error records from your problem records (there are some good reasons to do so), you would 'resolve' your problem. A problem is an 'unknown' cause of one or more actual or potential incidents - so if it's not unknown there is no 'problem' in terms of this process (natural language is a different thing again.) But as above you would leave the incident(s) open or have a special case handler.
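
The two recording approaches described in this post could be modelled roughly as follows. This is a sketch under stated assumptions: every class, field, and function name here is hypothetical, and the status strings are invented for illustration rather than taken from any tool.

```python
from dataclasses import dataclass
from typing import Optional


# Option 1: a known error is just a status value on the problem record.
@dataclass
class ProblemWithStatus:
    problem_id: str
    status: str = "open"              # assumed values: "open", "known_error"
    root_cause: Optional[str] = None
    workaround: Optional[str] = None


# Option 2: the known error is a separate record linked to the problem.
@dataclass
class Problem:
    problem_id: str
    resolved: bool = False


@dataclass
class KnownErrorRecord:
    error_id: str
    problem_id: str
    root_cause: str
    workaround: Optional[str] = None


def record_as_status(problem, root_cause):
    """Option 1: flag the existing problem record as a known error."""
    problem.root_cause = root_cause
    problem.status = "known_error"
    return problem


def record_as_separate_error(problem, root_cause, error_id):
    """Option 2: resolve the problem and open a linked error record."""
    problem.resolved = True
    return KnownErrorRecord(error_id, problem.problem_id, root_cause)


def incidents_stay_open(record):
    """In both options, incidents stay unresolved while no workaround exists."""
    return record.workaround is None
```

Either way the open question is the same: the record exists without a workaround, and the linked incidents wait until one is recorded (even if it is only the documented 'decided to live with it' case).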
Itilitarian
Newbie

Joined: Apr 27, 2006
Posts: 8

Posted: Fri Apr 28, 2006 11:36 pm

rjp wrote:
However, one of two things is certain:

If you are instantiating and recording known errors by setting a status value on your problem record the incident records should stay in an unresolved state if no workaround is available.

If you are keeping separate error records from your problem records (there are some good reasons to do so), you would 'resolve' your problem. A problem is an 'unknown' cause of one or more actual or potential incidents - so if it's not unknown there is no 'problem' in terms of this process (natural language is a different thing again.) But as above you would leave the incident(s) open or have a special case handler.


RJP, I agree with your comments, and that was really the point that I was making, that the Incident should remain unresolved. The reason I stated that IM would be the jurisdiction under which the issue gets addressed is because if you read the "TIP" from The Book in my initial post, you get the distinct impression that resolution activities for that WIP incident are still handled as an incident. If the investigation goes 'pear-shaped', then Problem can be engaged, even while the incident remains unresolved.

However, I do have a question about your comment:

"A known error without a corresponding work around or solution/rfc would still be a known error " - RJP

Back to Sherlock's original question, how could this be if the definition of a KE is "An Incident or Problem for which the root cause is known AND for which a temporary Work-around or a permanent alternative has been identified."
rjp
Senior Itiler

Joined: Mar 12, 2005
Posts: 255
Location: Melbourne, Australia

Posted: Sat Apr 29, 2006 3:27 am

Itilitarian wrote:

Quote:
Back to Sherlock's original question, how could this be if the definition of a KE is "An Incident or Problem for which the root cause is known AND for which a temporary Work-around or a permanent alternative has been identified."


It can't. But I for one feel very comfortable putting this down to a poor choice of words. There are more glaring clunkers in the books than this one. After all the aim here is intelligent application, not exegesis.

However, some exegetical light begins to dawn... looking again at the offending passage I was struck by a slightly different discordant note: The conflating of Incidents and Problems - the Incident Life Cycle doesn't produce known errors. But I believe the problem and incident management chapters didn't have the same authors anyway.

Perhaps the following is worth considering:

The sentence reflects the majority case of how incidents will be viewed from a problem management perspective.

The most pressing problems are going to be raised on those incidents that were not resolved in incident management.

In these cases resolution of those incidents becomes dependent on Problem management, which normally will, a) find the root cause and raise an RFC, but also b) work equally hard to get a workaround so that Service restoration doesn't have to wait for the implementation of the RFC.

In the majority of these cases the work around will be arrived at once the error is identified, (otherwise Incident Management would have got there first.)

But if the question is really about closing incidents there is only one hard rule - you can't close an incident while the Service is disrupted. And the corollary - you must close it if Service is restored - whether by work around or not. So in the end it's moot. The guidelines in the Incident Management chapter trump Problem Management on this point.
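
The closing rule in the last paragraph is simple enough to state as code. A minimal sketch, assuming a hypothetical `Incident` record and `update_closure` helper; neither name comes from ITIL or from any tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    service_restored: bool = False
    # How service came back, e.g. "workaround" or "permanent fix".
    # Deliberately ignored by the rule below.
    restored_by: Optional[str] = None
    closed: bool = False


def update_closure(incident):
    """Closed if and only if service is restored, however it was restored."""
    incident.closed = incident.service_restored
    return incident
```

Note that `restored_by` plays no part in the decision, which is the point being made: whether a known error has a workaround does not change when the incident may be closed.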
JasonUK
Newbie

Joined: Apr 24, 2006
Posts: 3

Posted: Sun Apr 30, 2006 7:30 pm

Quote:
Many development organizations would not agree with this definition. To many organizations, a "Known Error" is one for which they can reproduce the problem, typically with a repeatable testcase of some form. This is the only criterion for it to be a "Known Error". In other words, to many organizations, a KE is a Problem that has been verified to be an accurate and repeatable error.

A KE does not have to have an identified fix, as the fix may not be scheduled for a number of Releases in the future and coming up with the fix, itself, may not be possible until someone spends a great deal of time analyzing/debugging the problem and evaluating options to fix it.


The critical part of problem management according to ITIL is that the problem manager has responsibility for a problem throughout its lifecycle.

In a development environment, steps to reproduce an error are an essential component of the problem record, but I would not consider a reproducible problem a known error.

A known error indicates that the analysing/debugging has been completed (i.e. a root cause) and a fix proposed (might well be a patch). This indicates the end of the investigative processes that locate the problem and identify a fix. If you do not identify a fix and/or workaround then you still have a problem.

Perhaps where you are going wrong is assuming that problem management themselves need to perform the root cause analysis and develop the fix. It may well be that the Problem Manager highlights the problem with the development manager to get resources assigned to the problem.

Although at this point the problem has been delegated to the development manager, the problem manager still needs to track its progress and ensure that it is ultimately fixed.

If the problem has little or no business impact then of course it can be left alone; invoking the problem/change/release process for a very minor fault is not justified. Bashing one minor bug has the potential to generate side effects that will cause problems with a greater business impact.

Very low business impact problems should still be noted and kept open as problems as it may well be worth resolving in a major update to your software package when a major testing process will be undertaken.

Jason.
BGolum
Newbie

Joined: Oct 16, 2006
Posts: 1

Posted: Tue Oct 17, 2006 4:39 am    Post subject: Open vs. closed

JasonUK wrote:
Quote:
Many development organizations would not agree with this definition. To many organizations, a "Known Error" is one for which they can reproduce the problem, typically with a repeatable testcase of some form. This is the only criterion for it to be a "Known Error". In other words, to many organizations, a KE is a Problem that has been verified to be an accurate and repeatable error.

A KE does not have to have an identified fix, as the fix may not be scheduled for a number of Releases in the future and coming up with the fix, itself, may not be possible until someone spends a great deal of time analyzing/debugging the problem and evaluating options to fix it.


The critical part of problem management according to ITIL is that the problem manager has responsibility for a problem throughout its lifecycle.

In a development environment, steps to reproduce an error are an essential component of the problem record, but I would not consider a reproducible problem a known error.

A known error indicates that the analysing/debugging has been completed (i.e. a root cause) and a fix proposed (might well be a patch). This indicates the end of the investigative processes that locate the problem and identify a fix. If you do not identify a fix and/or workaround then you still have a problem.

Perhaps where you are going wrong is assuming that problem management themselves need to perform the root cause analysis and develop the fix. It may well be that the Problem Manager highlights the problem with the development manager to get resources assigned to the problem.

Although at this point the problem has been delegated to the development manager, the problem manager still needs to track its progress and ensure that it is ultimately fixed.

If the problem has little or no business impact then of course it can be left alone; invoking the problem/change/release process for a very minor fault is not justified. Bashing one minor bug has the potential to generate side effects that will cause problems with a greater business impact.

Very low business impact problems should still be noted and kept open as problems as it may well be worth resolving in a major update to your software package when a major testing process will be undertaken.



Jason,

I believe we are all in agreement. The nuance that allows this is the distinction between an open known error and a closed known error.

The open known error would be a reproducible case which results in an error.

The closed known error would include the appropriate fix to resolve the reproducible case.
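
The open/closed distinction can be sketched as a small classification. The `classify` function and its labels are hypothetical, and treating a non-reproducible case as still just a "problem" is an assumption borrowed from the reproducibility view quoted above, not something stated in this post.

```python
def classify(reproducible, fix_identified):
    """Label a record using the open vs. closed known error distinction."""
    if not reproducible:
        # Assumption: without a reproducible case it is not yet a known error.
        return "problem"
    if fix_identified:
        return "closed known error"
    return "open known error"
```

For example, `classify(True, False)` gives "open known error" and `classify(True, True)` gives "closed known error".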

Brian
All times are GMT + 10 Hours
Page 1 of 2

 