Note: ITIL is a registered trademark of OGC. This portal is totally independent and is in no way related to them. See our Feedback Page for more information.


The Itil Community Forum: Forums

Definition of a Known Error...

 
Jamiey
Newbie

Joined: Mar 14, 2006
Posts: 2

Posted: Tue Mar 14, 2006 7:28 pm    Post subject: Definition of a Known Error...

Hello all.

A query regarding how to define known errors.

Generally, I would take a known error to mean that you have found the root cause of the problem (diagnosed what the issue is) and have a potential workaround.

However, I have a grey-area situation: we know what the issue is (a database is corrupted), what the workaround is (rebuild it) and how to permanently fix it (the system is running on legacy infrastructure and will have to be replaced at some point in the next few years), but we don't actually know what is originally causing the database to become corrupted.
Am I correct in believing that, as the root cause has not been identified, I cannot record this as a "known error"?

Thanks in advance...

rjp
Senior Itiler

Joined: Mar 12, 2005
Posts: 255
Location: Melbourne, Australia

Posted: Tue Mar 14, 2006 10:09 pm

That is a really interesting question....

I would say, consider it from a management point of view. The Incident > Problem > Known Error > RFC > Change chain of processes is there for a specific purpose: to ensure that services are operating as expected.

In this context it is sufficient to consider a 'root cause' as the combination of 'CI in error status' + 'Action required to restore functionality'. So in the service restoration chain the root cause is the cause of the incident, not the cause of the failure that caused the incident.

The technical 'cause' may need to be understood in some cases, but if the fix is clear without that, there is often no additional value in pursuing it. If you do, you are (perhaps) doing the vendor's job for them.

Or in slightly different words:

The 'root cause' is adequately discovered when you can answer 'Why did this happen' and 'What must be done to remedy it'. In your example the root cause was found. There is always a deeper why: Don't go any deeper than is necessary to ensure services are effectively delivered at optimal cost.

Guerino1
Senior Itiler

Joined: Jan 01, 2006
Posts: 500
Location: New Jersey

Posted: Wed Mar 15, 2006 8:32 am

Hi Jamiey,

Some companies I deal with will do the following:

1) Register a Problem as a placeholder for the unknown defect causing the corruption. Because they do not know the formal root cause, they will not classify it as a Known Error.

2) Raise a Risk to address what will happen if the hardware/system is not upgraded.

3) When an Incident of data corruption occurs, they will then bind the Incident to both the Problem and the Risk.
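
As a rough illustration of the three steps above, the record-keeping could be modelled along these lines. This is only a sketch: the class names, field names and sample text are assumptions made up for the example, not taken from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    summary: str
    root_cause: Optional[str] = None   # None means "not yet determined"
    workaround: Optional[str] = None

    @property
    def is_known_error(self) -> bool:
        # Under the practice described above, a Problem without a
        # formal root cause is not classified as a Known Error.
        return self.root_cause is not None


@dataclass
class Risk:
    summary: str


@dataclass
class Incident:
    summary: str
    problems: List[Problem] = field(default_factory=list)
    risks: List[Risk] = field(default_factory=list)


# Step 1: placeholder Problem for the unknown defect.
problem = Problem("Unknown defect corrupting the database",
                  workaround="Rebuild the database")

# Step 2: Risk covering what happens if the system is not upgraded.
risk = Risk("Legacy hardware/system is not upgraded")

# Step 3: each corruption Incident is bound to both records.
incident = Incident("Database corrupted")
incident.problems.append(problem)
incident.risks.append(risk)
```

Because the placeholder Problem has no root cause recorded, `problem.is_known_error` stays false until someone fills it in, while every Incident still carries a link to both the Problem and the Risk.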

I hope this helps.

Regards,

Itilitarian
Newbie

Joined: Apr 27, 2006
Posts: 8

Posted: Fri Apr 28, 2006 2:21 pm    Post subject: Re: Definition of a Known Error...

Jamiey wrote:
We know what the issue is (a database is corrupted); what the workaround is (rebuild it) and how to permanently fix it (the system is running on legacy infrastructure and will have to be replaced at some point in the next few years) but we don't actually know what is originally causing the database to become corrupted.


Hi Jamiey,

I have a question: How could you know how to "permanently fix it" if you "don't actually know what is causing the database to become corrupted"?

How do you know that legacy infrastructure is causing the corruption? If you're not certain, you're still dealing with a Problem rather than a K.E.

eisbergsk
Senior Itiler

Joined: Nov 01, 2004
Posts: 81
Location: Sask, Canada

Posted: Fri Apr 28, 2006 3:17 pm

My 2 cents (or, in another week, 2 kronor, as I will be gallivanting around Sweden) :-)
I've had a few of these: always major incidents, we don't really know the root cause, but the upgrade(s) will either fix it or change the infrastructure so that any problem will be 'new' :-/
I like Frank Guerino's answer, but really, it's just semantics. Good semantics, but semantics nonetheless. I want action.
I will not accept inability to determine root cause as a license to do nothing - no one gets off my PM-hook that easily.
If the answer is 'upgrade', I want problem resolution in 3 months (that's our current slack target). If not resolution, I want mitigation so that no more severe incidents occur before the problem is resolved. Even when waiting for a patch/upgrade from an external vendor, we should ask for mitigation in the meantime.
I poked our managers with this stick again yesterday, and I think the repetition is actually starting to pay off. Either that or they're wearing good padding. But they seemed to get it, & I'm actually looking forward to their reports when I get back in a month.
Best of luck to you!
/Sharon

pac111
Newbie

Joined: Jan 20, 2007
Posts: 1

Posted: Sun Jan 21, 2007 4:29 am

If you don't know the underlying root cause, it's a problem, not a known error.

PS: rebuilding the DB isn't really a workaround either, but a recovery process.

A typical workaround would be to restore the DB from a backup (perhaps leaving it in read-only mode temporarily so that at least the app is available), and then apply the transaction log against it to recover any lost data, etc.

UKVIKING
Senior Itiler

Joined: Sep 16, 2006
Posts: 3313
Location: London, UK

Posted: Sun Jan 21, 2007 6:36 am

Also, what is being corrupted?

The data in the database, which is merely content,

or the database itself and its structure (tables, fields)?

Also, what about the storage medium for the structure (the hard drives, etc.)?
_________________
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter

dboylan
Senior Itiler

Joined: Jan 03, 2007
Posts: 189
Location: Redmond, WA

Posted: Mon Jan 22, 2007 12:34 am    Post subject: Re: Definition of a Known Error...

One of the principles of the ITIL Framework is to take away the IT Cowboy mentality. In the old days, we would do exactly as you suggest: loosely define an issue and then take a shotgun approach to fixing it. And many times we were wrong.

The reason Problem Management is so specific about requiring that the Root Cause (and the Configuration Item at fault) be identified is that it forces us in IT to accurately determine which component is failing before we write up the Request for Change.

It may be that an untrained user is doing something in the app that should never be done during the production day, and that this is causing the corruption. In that case the Root Cause is a procedure that is being followed, and the CI at Fault may be the Training Material or the New User Training Syllabus.

In fact, by following your example of replacing the entire app with an upgraded version, you may bring the Problem over into the new system. Do you have to work every Problem through to resolution? No. It may be too costly to do the required investigation to identify the Root Cause and CI at Fault. Should you stop implementing a new system just because the old system still has identified Problems? No, the chances are that the majority of outstanding Problems will be addressed if a new system is implemented.

But once the new system is implemented, there needs to be a time period when the old system's Problems are left open to see if they will reoccur in the new system. Because, truthfully, you never did the work required to successfully take them into the Known Error realm.

Some organizations' managers think that having Problems not followed through to Known Errors is a terrible thing. They shouldn't believe this. It is natural for a mature Problem Management process to uncover many undiagnosed Problems. It is then up to the Process Manager to determine which Problems need the additional resources/time/money spent on them to do full Root Cause analysis.

Globis
Itiler

Joined: Apr 17, 2007
Posts: 36
Location: Cape Town

Posted: Wed May 02, 2007 11:25 pm

ITIL says:

A known error is an incident or problem for which the root cause is known and for which a temporary or permanent alternative has been identified.
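
Read literally, that definition is a two-part test, which can be sketched as below. The function and parameter names are invented for illustration, and the sample strings are hypothetical. Whether `root_cause` counts as "known" is a human judgment that the code cannot make; it only records the outcome of that judgment.

```python
from typing import Optional


def is_known_error(root_cause: Optional[str],
                   workaround: Optional[str] = None,
                   permanent_fix: Optional[str] = None) -> bool:
    """Apply the quoted definition: the root cause is known AND a
    temporary or permanent alternative has been identified."""
    has_alternative = bool(workaround) or bool(permanent_fix)
    return bool(root_cause) and has_alternative


# Strict reading of the original case: the cause of the corruption is unknown.
strict = is_known_error(root_cause=None,
                        workaround="Rebuild the database",
                        permanent_fix="Replace the legacy system")

# Looser reading: the failing component is accepted as the root cause.
loose = is_known_error(root_cause="Database application on legacy platform",
                       workaround="Rebuild the database",
                       permanent_fix="Replace the legacy system")
```

The same facts give a different answer depending only on what is accepted as the root cause, which is where this thread disagrees.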

Much of the preceding discussion has hinged on what defines knowing the root cause.

Root cause analysis can only go so far. If you know that your database is being corrupted (let's assume this means the tables are corrupted) under a certain set of circumstances, then after a reasonable amount of investigation you can legitimately conclude that it is something to do with the database application itself, if you can find no other reason. So you contact the supplier and open a problem record with them, and they say (predictably) that you need to upgrade to version xyz.

In this case you still don't know exactly what is causing the problem, i.e. down to the specific bug in the code, but you do know enough to say that the application is the root cause, so of course it is a known error.

Much of IT is now supplied as a black box, whether it is a software app, an outsourced MPLS service, etc. So your own RCA can often only identify which black box is the problem, and then you are at the mercy of the supplier. Often they will say upgrade; perhaps they already have a known error for the problem.

So I think RJP was exactly right when he said:

Quote:
The 'root cause' is adequately discovered when you can answer 'Why did this happen' and 'What must be done to remedy it'. In your example the root cause was found.


BTW, we had exactly this problem a few years ago with MySQL 4, when tables in our network management app were periodically corrupted. After a lot of investigation we still could not work out why this happened, but we knew we could fix it by rebuilding the tables.

We knew that sooner or later we would have to upgrade to 5.0, so we accelerated testing on the new platform, upgraded, and the problem went away.

For dboylan to suggest (perhaps accidentally) that Jamiey's approach is somehow 'cowboy mentality' is well wide of the mark, especially as he cannot know the extent of the testing and investigation that was done.

Dave

dboylan
Senior Itiler

Joined: Jan 03, 2007
Posts: 189
Location: Redmond, WA

Posted: Sun May 06, 2007 4:37 am

Globis wrote:
For dboylan to suggest (perhaps accidentally) that Jamiey's approach is somehow 'cowboy mentality' is well wide of the mark, especially as he cannot know the extent of the testing and investigation that was done.


I am sorry if I implied that Jamiey's approach was that of an IT Cowboy. I was trying to (perhaps poorly) draw an analogy of how IT used to handle issues in the past by saying "We don't know the underlying cause of the issue. Let's start spending money to fix it." and how ITIL says we should attempt to resolve errors in the infrastructure.

ITIL says that we must determine Root Cause before we can define a Problem as a Known Error. If that is not possible, then we might implement fixes for the Problem because the Root Cause is unknowable. Having done that, we need to be aware that any attempted resolution might not succeed because we never defined the cause. Hence the need to keep the Problem open until sufficient time has passed and we can be reasonably sure the error is fixed.
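
The "keep the Problem open until sufficient time has passed" rule could be sketched as a simple closure check. This is only an illustration: the function name and the 90-day default are assumptions, and the right observation period would depend on how often the error used to occur.

```python
from datetime import date, timedelta
from typing import Optional


def may_close_problem(fix_date: date,
                      today: date,
                      last_recurrence: Optional[date] = None,
                      observation_days: int = 90) -> bool:
    """Return True only if the attempted fix has survived the whole
    observation window without the error coming back."""
    # A recurrence on or after the fix date means the fix did not work.
    if last_recurrence is not None and last_recurrence >= fix_date:
        return False
    return (today - fix_date) >= timedelta(days=observation_days)
```

For example, a fix applied on 1 January with no recurrence could be closed on 1 May under the default window, but not on 1 February.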

ranjithraghunathan
Itiler

Joined: May 09, 2007
Posts: 22
Location: Bangalore

Posted: Wed May 09, 2007 8:19 pm

Theoretically, a known error database (KEDB) maintains a record of all the known/identified resolutions for a problem.
From a KEDB perspective, there is only one relation that Known Errors have with Problem Requests. This relation is the final resolution or workaround that is an outcome of the problem request. Only when one of these is obtained shall the problem request be closed.

Hence, until a resolution or workaround is obtained, the incident or problem does not qualify as a known error in any sense. There is no implication that a known error is a necessary outcome of a problem request.

Moreover, if a resolution as a result of Root Cause Analysis has been obtained and has been implemented through a Change Request, the initial problem statement never appears in the KEDB, as the problem has been permanently fixed.

Ranjith Raghunathan
ITIL Foundation Certified

jpgilles
Senior Itiler

Joined: Mar 29, 2007
Posts: 123
Location: France

Posted: Thu May 10, 2007 1:54 am

ranjithraghunathan wrote:
Moreover if a resolution as a result of Root Cause Analysis has been obtained and has been implemented through a Change Request, the initial problem statement never appears in the KEDB as the problem has been permanently fixed.


Hmmm, you might be sticking a little too closely to the concepts.

A knowledge base is supposed to provide increased knowledge as time goes on and experience accumulates... So I would not remove anything from the KEDB even if the problem is (supposedly permanently) solved, as I would fear lacking some valuable information later on... If you have not (yet) seen incidents and problems recurring weeks or months after the initial resolution... you are probably quite young ;-)

I would only purge older (?) records from KBs as part of the necessary clean-up required to avoid "problems" (space, performance, costs, ...).
_________________
JP Gilles

ranjithraghunathan
Itiler

Joined: May 09, 2007
Posts: 22
Location: Bangalore

Posted: Thu May 10, 2007 6:32 pm

JP, I suggested this because, after a Change, there could be configuration changes to the IT components. It is sometimes possible that a resolution that was available before the change was implemented causes further complications if it is prescribed to the user.

For example, an executable file may work as a patch to fix a problem, and a later release then fixes the problem permanently.
However, if the problem recurs and the old executable file is sent to the customer, it may corrupt the application, because the permanent fix was implemented without considering that the old executable might be used again if the problem recurred. Phew! Not sure if I am clear.

I am definitely young in the field, btw. :-P

Please provide your thoughts, JP.

Ranjith Raghunathan
ITIL Foundation Certified

jpgilles
Senior Itiler

Joined: Mar 29, 2007
Posts: 123
Location: France

Posted: Thu May 10, 2007 8:12 pm

I am not sure I follow your last point...

The interesting information I want to keep in the KB is not only the solution that was identified (which may no longer be suitable as the context has evolved) but rather all the findings from the investigation phase.

Let me take an example:

18 months ago, when moving a server to version X.Y of the OS, traffic performance problems arose (very slow response times). A lot of investigation was done (network, cabling, CPU, server tuning, ...) before finding that the firmware on the network card needed a fix. That solved the issue.

Last week the server got upgraded again, and traffic performance problems arose again... If the way you describe incidents and problems is well planned and strictly followed (another of my BIG subjects...), PM would straight away direct investigations toward the network card firmware... If the information on the first problem has been deleted, they may spend hours going through the same investigations... (people may have changed, and even if not, I would not rely on people's memory: that is what a KB is supposed to provide...).
The real solution there might be to replace the card with a newer version/model in order to avoid implementing fixes every time... (*)

CONCLUSION: the solution that was supposed to be permanent in the first place (problem solved and no more incidents) just proved not to be so permanent....
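
To make the point concrete, here is a small sketch of a knowledge base that keeps resolved records searchable instead of deleting them. The record layout and the `search` helper are hypothetical, and the entries simply restate examples from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KBRecord:
    symptom: str
    findings: str      # what the investigation turned up
    resolved: bool     # resolved records are kept, not deleted


def search(kb: List[KBRecord], keyword: str) -> List[KBRecord]:
    """Return every record whose symptom mentions the keyword,
    whether or not the underlying problem is marked resolved."""
    needle = keyword.lower()
    return [record for record in kb if needle in record.symptom.lower()]


kb = [
    KBRecord("Slow traffic after OS upgrade to X.Y",
             "Network card firmware needed a fix",
             resolved=True),
    KBRecord("Database tables periodically corrupted",
             "Cause unknown; rebuilding the tables restores service",
             resolved=False),
]

# After the next upgrade, the same symptom leads straight to the old findings.
hits = search(kb, "slow traffic")
```

The `resolved` flag also speaks to the concern about outdated fixes: a record can stay in the KB for its findings while being clearly marked as no longer the current workaround.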

Anyhow, if you spend some time in support, you will build up your own experience, which will enrich your knowledge of the ITIL framework with concrete aspects...

(*) the example is just for illustration....I acknowledge that proper change management would have determined that the firmware needed to be adapted or the card changed for the OS version to come .....
_________________
JP Gilles

svien
Newbie

Joined: Feb 13, 2007
Posts: 8

Posted: Fri Jul 20, 2007 3:43 am

We have many Known Errors in our problem DB which are waiting on vendor upgrades/patches. A lot of the time, problem management finds the root cause, which is the application itself, so we have to note in the workaround that we are waiting for the new version/patch from the vendor to remove the error. There isn't actually a workaround, but it is a known error at the vendor level.
Would you agree with this approach?
If we were not to mark these types of problems as KEs because we do not know what line of code the vendor has updated (as suggested in other posts in this thread), I believe we would have a lot of open problems.
How do other folks mark their vendor application related problems?

All times are GMT + 10 Hours