Posted: Tue Oct 29, 2013 5:06 am Post subject: Problem Record / Known Error Relationship
Over the last year I've worked to implement problem management in my IT dept. While I think we've come a long way (from virtually nothing), I think our process may need some re-working to mature further. One of these re-workings is to implement a KEDB and better utilize the known error process.
Previously, we only used known errors when the resolution for a problem was not implemented (for whatever reason, e.g. it was too expensive). In that case we classify the PR as a “Known Error” because it’s known, the business accepts the risk, and we have a workaround. We then close the PR (it’s “known”, so there is no use letting it age needlessly).
So up to this point our process looks like:
A PR is opened, worked, root cause identified, workaround applied (in the interim, if needed), “fixed” (eliminated), PR updated and closed. (Note that at no time does this become a KE.)
Known Error Process:
A PR is opened, worked, root cause identified, workaround applied, but it is not “fixed” b/c the business either accepts the occurrence, or the fix would not be viable, etc. etc., so it is a “known error” in that we know about it, have a workaround, but are not eliminating the root cause.
However, this has caused a few issues:
1) Root causes for problems are usually identified and fixed in short order (e.g. a router goes down, it’s found to be faulty hardware, the hardware is replaced), and in these instances the transition stage to known error seems redundant. However:
2) Sometimes things are just “fixed” instead of having the root cause eliminated, so we often find ourselves closing PRs when the situation has been remedied without asking the question: what have we done to make sure this never happens again?
3) Sometimes the root cause cannot be determined, so what do we do with the problem record at that point?
So I guess my question is: what advice do you have on defining and transitioning between PRs and KEs, and what is the relationship between the closure of either?
Separately, do you do any kind of reviews (monthly? by criticality?) in which you look at the active problems and decide what is viable to work on and what is not, or what is an acceptable "root cause" (e.g. sometimes we know X MPLS will just fail, and we don't have a workaround)? And for the ones you either aren't going to dedicate time to, or where you accept that the workaround is good enough, what do you do with those active problem records (since "technically" they haven't been eliminated)?
Interesting question and I imagine that everyone will have a slightly different opinion.
In my humble opinion, once the root cause of a Problem is known, it's no longer a problem but is a Known Error, simply because you know what's causing it now, irrespective of whether you have a workaround or a solution.
In our tool, it's a matter of changing the status from "Problem" to "Known Error", which in itself becomes a KEDB. The trick is to clean it up when Known Errors are eliminated through attrition (system with KE no longer in use), etc...
Known Errors can either be ignored (if there's no impact), managed via application of a workaround, or resolved ("fixed"). How you manage those records is up to you, but we use a mixture of status and closure code which works well enough in our environment. E.g. For one you'd manage via a workaround rather than fix, the status is "Closed" and the Closure Code is "Managed via workaround". A simple search will show you what you need and also provide you with the workaround.
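To make the status and closure code idea concrete, here is a rough sketch of how such records and the "simple search" might look. It is only an illustration under assumptions: the class, field, and function names (ProblemRecord, record_root_cause, managed_via_workaround) and the sample records are made up for this example and do not come from any particular tool; only the status values and the "Managed via workaround" closure code are the ones described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemRecord:
    # Hypothetical record layout; a real tool will have its own fields.
    record_id: str
    summary: str
    status: str = "Problem"              # "Problem", "Known Error", or "Closed"
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    closure_code: Optional[str] = None

    def record_root_cause(self, root_cause: str) -> None:
        # Once the root cause is known, the record becomes a Known Error,
        # whether or not a workaround or fix exists yet.
        self.root_cause = root_cause
        self.status = "Known Error"

    def close(self, closure_code: str) -> None:
        # The closure code records how the Known Error was dealt with.
        self.status = "Closed"
        self.closure_code = closure_code


def managed_via_workaround(records: List[ProblemRecord]) -> List[ProblemRecord]:
    # The "simple search": closed records being managed via a workaround.
    return [
        r for r in records
        if r.status == "Closed" and r.closure_code == "Managed via workaround"
    ]


# Made-up sample records, for illustration only.
pr1 = ProblemRecord("PR-1", "Router drops traffic intermittently")
pr1.record_root_cause("Faulty hardware")
pr1.close("Fixed")

pr2 = ProblemRecord("PR-2", "Nightly report job times out",
                    workaround="Re-run the job manually in the morning")
pr2.record_root_cause("Report query too slow on the current server")
pr2.close("Managed via workaround")

pr3 = ProblemRecord("PR-3", "Root cause still under investigation")

for record in managed_via_workaround([pr1, pr2, pr3]):
    print(record.record_id, "-", record.workaround)
```

With this layout, the KEDB is just a filtered view of the same records rather than a separate store, which is roughly how it works in our tool.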
We review weekly, just to make sure our priorities are right and that our workarounds are as robust as possible.