Posted: Mon Feb 27, 2006 2:01 pm Post subject: Incident classification
I am curious how you reached the conclusion that no root cause is being drawn based on what I described. Point being, I'm interested in knowing exactly what you mean. I agree you need to classify the errors to do root cause analysis, but to do that, don't you need to break the error into components? Just curious what exactly you meant there. I obviously simplified what I described and perhaps too much. I don't agree that anyone really can or should take what is being used in one environment and duplicate it in another. Environments and goals differ. I simply posted that in response to a call for examples on how people have classified helpdesk requests.
There ARE best practice guidelines for doing so, many and varied, although most of the basics remain the same. I believe ITIL relies heavily on the six sigma guidelines for incident classification?
My point about "common sense" was probably another over-simplification. The point is that ITIL is yet another set of guidelines for best practice. It's not magic, and while it does have some philosophical differences from other "best practice" guidelines, it basically calls for the use of best practices with a focus on improving customer support, correct? I'm also unaware of any hard and fast rules or materials that indicate exactly how one should set up categories for a helpdesk, for example; ITIL instead relies on the implementor's application of its principles in light of his/her organizational goals to set these things up.
Using best practices, one needs to break calls into classifications by which metrics can be drawn and RCA be done based upon those as well as other information contained in the ticket, correct? How do you see classification being broken down? I'd like to see some other examples as well in order to discuss the merits of such a system and how it works in practical application. Without good classification, I'm sure we'd all agree that RCA and service improvement in general is a difficult proposition.
Also, if one has used ISO guidelines and research from industry leaders (e.g. HDI, Gartner, etc.), is one not better suited and prepared for an ITIL implementation? I'd also like to hear some input on how to go about implementing ITIL for problem management, for example, and on the benefits and pitfalls of doing so, without having organizational commitment to configuration management and the other areas. I'm facing that currently and see problems inherent with the implementation when configuration management in particular is missing. Any help appreciated!
Joined: Mar 12, 2005 Posts: 255 Location: Melbourne, Australia
Posted: Mon Feb 27, 2006 11:06 pm Post subject:
Fair enough. Let me start by saying my post was not intended to be personal or critical. However, on reading it, I think it did come across that way - so apologies. Yes your post was answering the question directly. If I could I would remove the 'caution' sentence, and rephrase a couple of others.
Let me respond to your last paragraph first, as it will balance my previous post somewhat. ITIL processes are extremely interdependent. The common practice of implementing them in various sequences poses real challenges for those of us at the coal face. So, for example if you have no brief to implement supporting levels of configuration and service level management, and have been told to implement just incident management (or just incident, problem and change management), then it is going to be overloaded (in an information architecture sense) with data that the missing processes cannot provide, but which is nonetheless required. In such a case there is no alternative but to compensate practically, and a classification approach like the one you outlined would be entirely appropriate in many contexts.
Now to address the rest, I don't want to get into a specific Q&A around your post. Rather I'd just like to summarise some of the experience my comments were based on.
About two and a half years ago I watched a system go in (somewhere out there) to handle incidents. A classification approach was deployed that classified all the bits and pieces that the people responsible for fixing them were concerned with. By the time it weighed in as a tree with about 400 different possible choices (excluding service requests), it was still growing as deployment widened, and was starting to break down at the service desk. It also was not really assisting in providing effective reports. Worse, it was doing nothing to focus technical support staff on restoring services and most were still jumping straight to RCA. Moreover, it was a constant struggle to actually keep the schema consistent.
The team sat down and decided to go back to basics - ITIL basics - to see what we could do. We decided there were some central ideas that could guide us. These were:
To stick to the ITIL definition of an incident: A disruption in the agreed delivery of a service - any disruption - with no assumption made that behind every incident there is an error.
To stick to the ITIL definition of a service: Basically as capabilities used by the business and understood to be delivered as a whole.
To stick to the ITIL incident life cycle as the process template. And the wider service support process chain of
Incident (disruption) -> Resolution -> Problem raising -> RCA -> Error Identification and recording.
Less centrally we also decided that incident classification was anything on the record which specified the incident - not just what might be called the 'primary classification', which is the subject of this thread, (and many others). This meant that primary classification did not have to carry all the classification information. We only had to pick a schema that was best suited to what primary classification is for. We came to the conclusion that primary classification is best suited to:
a) Classifying the impact on the Services. For example if a network switch fails, but an end-user reports that they cannot post a leave application form, what needs to be captured is what the business lost (the capability of the service), not what ICT lost (the equipment downtime).
b) The appropriate assignment for an incident report. What level of skills and to which function it should go.
c) The basic report type - whether it is a break fix event, a problem with incorrect information in a record system, whether it is a lack of training, or information, and, of course whether it was simply a service request.
We also decided that asking the submitter (or logger) to decide which component (or even type of component) was the target of investigation at the first point of contact was effectively requiring them to guess at the beginning, what RCA, which is much further down the support chain, would come up with. And under the existing schema classification accuracy (a KPI for incident management) did suffer, and the number of reassignments was too high and directly attributable to a 'where-is-the-error' based primary classification schema.
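For anyone wanting to track the two measures mentioned above, here is a minimal Python sketch of one way they could be computed. The record fields, the category strings and the sample data are all invented for illustration; they are not taken from the system described in this post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClosedIncident:
    logged_category: str  # category chosen at first contact (hypothetical field)
    final_category: str   # category confirmed at closure (hypothetical field)
    reassignments: int    # times the record moved between support groups


def classification_accuracy(incidents: List[ClosedIncident]) -> float:
    """Fraction of incidents whose logged category survived to closure."""
    if not incidents:
        return 0.0
    correct = sum(1 for i in incidents if i.logged_category == i.final_category)
    return correct / len(incidents)


def mean_reassignments(incidents: List[ClosedIncident]) -> float:
    """Average number of reassignments per incident."""
    if not incidents:
        return 0.0
    return sum(i.reassignments for i in incidents) / len(incidents)


# Made-up sample data, for illustration only.
sample = [
    ClosedIncident("Mail server fault", "Mail server fault", 0),
    ClosedIncident("Mail server fault", "Network fault", 2),
    ClosedIncident("Printer fault", "Printer fault", 0),
    ClosedIncident("Desktop fault", "Desktop fault", 1),
]
print(classification_accuracy(sample))  # 0.75
print(mean_reassignments(sample))       # 0.75
```

Comparing these two figures before and after a schema change is one simple way to see whether a 'where-is-the-error' style of primary classification is costing accuracy.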
We did a technical ownership audit of our services. Rewrote our services catalogue, and classified every incident first by the disrupted (or requested) service.
We did an Ishikawa diagram for all our services and found that all Root Causes could be represented clearly in three simple branches representing a) The state of the infrastructure - break/fix, capacity and configuration errors, b) Errors in the state of the information floating around in various systems and in the business processes managing that information, c) end user competency, ability, knowledge, etc. Straight-up service requests (kind of 'not having' as the source of a disruption) were an effective 'fourth' branch.
We then built our primary classification around the task of simply identifying what service is being reported as disrupted, and what kind of disruption it was.
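As a rough illustration of that idea (service plus kind of disruption, with the component left for later), a record could be sketched in Python as follows. This is only a sketch under assumptions: the class names, the field names and the catalogue entries are hypothetical, and only the leave application example comes from the post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DisruptionType(Enum):
    # The three branches described above, plus service requests.
    INFRASTRUCTURE = "infrastructure"    # break/fix, capacity, configuration
    INFORMATION = "information"          # incorrect data or business process
    USER_COMPETENCY = "user competency"  # training, knowledge, ability
    SERVICE_REQUEST = "service request"  # 'not having' something


# Hypothetical service catalogue entries, for illustration only.
SERVICE_CATALOGUE = {"Leave Management", "E-mail", "Printing"}


@dataclass
class Incident:
    service: str                # what the business lost
    disruption: DisruptionType  # what kind of disruption was reported
    description: str
    # Filled in later by technical support or RCA, not at first contact.
    configuration_item: Optional[str] = None

    def __post_init__(self) -> None:
        if self.service not in SERVICE_CATALOGUE:
            raise ValueError(f"unknown service: {self.service!r}")

    def primary_classification(self) -> str:
        return f"{self.service} / {self.disruption.value}"


incident = Incident(
    service="Leave Management",
    disruption=DisruptionType.INFRASTRUCTURE,
    description="Cannot post a leave application form",
)
print(incident.primary_classification())

# The failed component is recorded separately, once someone actually knows it.
incident.configuration_item = "network switch"
```

The point of keeping `configuration_item` outside the primary classification is that the person logging the call never has to guess it.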
We brought our entire classification tree back to about 70 total combinations (still excluding service requests). Reports showed incidents (not errors) and the impact to the business; assignment became more accurate; incident resolution times decreased thanks to fewer reassignments; and staff became more focused on service restoration. First point of contact and escalation figures remained unaffected. But our classification schema is equally suited to classifying the knowledge base articles in the KB system we are planning to deploy, and we expect to gain there in both KB maintenance loads and effectiveness.
It wasn't perfect - nothing is. We have had to work hard to provide good back up for the technical side of the process by carefully growing our configuration management capabilities alongside these initiatives. We still face a struggle to get the Service Catalogue development into a managed business cycle. Changes of management and shifting agendas have had to be negotiated. We've spent money improving our asset management and monitoring capabilities, and linking the infrastructure data into our service management system. We still have to rely on tribal knowledge too much. And so on...
But the bottom line is we put in an extremely simple classification schema, which contains no trace of the things that RCA looks for, and improved both our incident management and service desk KPIs. But most importantly we focused our process on the services and are capturing what the business cares about - where the incidents are actually hurting them - something common SLA availability and response metrics don't actually show.
So far I have been doing this about three years - which is not a great deal, there are others with much more under their belt. But what I have found is that the best returns have come where I have been able to stay very close to the central concepts in ITIL. It's where I have actually had to 'adopt and adapt' that it has become hard and costly.
Posted: Tue Feb 28, 2006 1:33 am Post subject: Classification (rjp)
No worries! I didn't take it personally, so no problem. Was just confused as to what you meant. I would also agree that I need to be careful in framing my words on a forum such as this and being clear as to what I'm meaning to convey. This is a challenging subject, so it's better to be clear and avoid confusing anyone any further.
I appreciate the response on this. So, what I'm seeing is that you took classifications and, per ITIL, based them on the SERVICE or the end-user IMPACT, rather than the actual component. So to illustrate, you might have something more like Outlook/non-functional/unable to load inbox rather than Software/Outlook/mailbox issue? Again, maybe not quite right, but the point is that the classifications are more end-user and INCIDENT based rather than for the benefit of IT and identifying the component at fault? At least that's what I'm getting from your post and that would make sense under what I've seen from ITIL.
Again, appreciate any and all input. Feel free to email me as well if needed.
RJP - excellent detailed explanation of the why and how to build what you call "primary classification". I dig the whole thing!
Fabien, I'd be careful in putting the IS System (eg: SMTP server) at second level in your classification system, as it is quite doubtful that when an Incident is reported it can always be clearly identified what is the IT component involved. Sounds to me like that information would be better stored in another field (configuration item... or other). _________________ Better have remorse than regrets
I just re-read your previous posts and I understand what you're saying. We haven't gone through the entire classification tree yet and we may want to split the level 1 (customer-facing service) into 2 levels as you indicated.
However, the IT system mentioned may just be a bad example. The idea is not to identify a component but an overall IT system like "network".
Here is how we thought of this piece: once we identify a customer-facing service, we can start classifying incidents based on symptoms. In order to go further, we would need to identify which part of the service may be failing. So for each of those services, there is a number of IT systems involved, e.g. for an e-mail service, there is client, lan, mail server, WAN, Internet gateway, ...
Initial support procedure should assist in defining what that IT system is.
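To make that concrete, here is a small Python sketch of how a service could carry its list of IT systems, with the system left empty until initial support has narrowed it down. The function name, the dictionary layout and the "cannot send" symptom are assumptions for illustration; only the e-mail system list comes from the post.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: each customer-facing service lists the IT systems
# that deliver it. The e-mail entry mirrors the example given above.
SERVICE_SYSTEMS: Dict[str, List[str]] = {
    "E-mail": ["client", "LAN", "mail server", "WAN", "Internet gateway"],
}


def classify(service: str, symptom: str, system: Optional[str] = None) -> dict:
    """Build a classification; the system may stay unknown at first contact."""
    systems = SERVICE_SYSTEMS.get(service)
    if systems is None:
        raise ValueError(f"unknown service: {service!r}")
    if system is not None and system not in systems:
        raise ValueError(f"{system!r} is not part of the {service} service")
    return {"service": service, "symptom": symptom, "system": system}


# At first contact only the service and the symptom are known.
first_contact = classify("E-mail", "cannot send")
# Once the initial support procedure has identified the IT system, add it.
after_initial_support = classify("E-mail", "cannot send", system="mail server")
print(first_contact)
print(after_initial_support)
```

Keeping the system optional means the service desk is never forced to pick one before the initial support procedure has actually pointed at it.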
I'm very interested in your comments on this. _________________ BR,
Technology Consulting | Service Excellence
Red Badge Certified
Posted: Mon Jul 24, 2006 4:18 pm Post subject: Great discussion!
Hi there, long time since I've been in here! I originally posted a question that was along the lines of what's been discussed here ... my original post... http://www.itilcommunity.com/modules.php?name=Forums&file=viewtopic&t=480 . Funnily enough, my account had expired so I recreated with the same username. There was no reply for a few weeks so I figured there wasn't going to be one - thanks rjp for your reply (as long overdue as the thanks are!).
It would be great if this other discussion could be 'stickied' somewhere or put somewhere as a reference as I think the info and points are valuable. I have moved on since I made that original post and am in a new job now where I have little involvement with the central decisions and changes. Makes it tough for someone that enjoys change for the better like I do but you never know, if I make a strong enough case I may be able to get a very big ball rolling if I can get the right people on side. Hopefully this and other forums will assist as I look to do that.
This discussion and the points made, particularly where services are concerned, are fantastic. Thanks for the help.
Does anyone know of any other similar discussions/resources on other forums they could direct me to?
Posted: Thu Jul 27, 2006 12:27 pm Post subject: Further questions
I like the ideas presented in terms of logging from the user's perspective and keeping it service based. My issue with this though is that where I work currently, the 'services' seem to be vast.
With a view to the service>system>issue methodology would anyone have any ideas how to categorise at the 'service and system' levels for around 30 applications - not including office apps. I think this is the trickiest part as at times there doesn't really appear to be a delineation between service and system i.e. the product ArcGIS is effectively a service and a system?
Does anyone have any pointers on how to manage a great raft of apps?
It's reassuring to know others are struggling with this issue as much as I am.
RJP, like your suggestions, makes a lot of sense. Something I'm keen to do is follow the ITIL principles and ensure that the users and those within the Service Desk itself can understand and use the classifications we use.
I'm thinking along the lines of a 2 tiered classification along the following lines :
System -> Function
Where "System" is a name users and Service Desk technicians alike will relate to (e.g. eMail, Printing, Logon, Office Application), but even that comes to a sticking point! "Office Application" could be Word, Excel, PowerPoint, Access etc. I could put these application names in my first tier, but then it starts to grow into a list of all the applications we support (bearing in mind we support PCs and Macs). I think it's important to keep the list short and understandable to the users, but I keep hitting this flaw in my plan.
"Function" would be things like "Send/Receive", "Out of paper", "Account locked".
This would serve my purpose, in as much as I'm able to record where the weaknesses are in either the infrastructure or the skills (users or Helpdesk technicians), but I just can't make it fit together without going 3 tiered, which I don't really want to do as we're just starting out with ITIL where I am (I've used it before as the guiding principle, but it's new where I am now).
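One possible way around the "Office Application" problem, sketched here in Python, is to keep the two tiers short and put the specific product in a separate field instead of a third tier. This is only an assumption about how it might be done, not something the posts above settled on; the two functions listed under "Office Application" and the `product` field are invented for illustration.

```python
from typing import Optional

# Tier 1 (System) -> allowed tier 2 values (Function). Entries are illustrative;
# the two "Office Application" functions are invented for this sketch.
SYSTEM_FUNCTIONS = {
    "eMail": {"Send/Receive"},
    "Printing": {"Out of paper"},
    "Logon": {"Account locked"},
    "Office Application": {"Will not start", "File will not open"},
}


def classify(system: str, function: str, product: Optional[str] = None) -> dict:
    """Validate a System -> Function pair; product is optional free-form detail."""
    if function not in SYSTEM_FUNCTIONS.get(system, set()):
        raise ValueError(f"invalid classification: {system} -> {function}")
    return {"system": system, "function": function, "product": product}


ticket = classify("Office Application", "File will not open", product="Excel")
print(f"{ticket['system']} -> {ticket['function']} ({ticket['product']})")
```

The first tier then stays a short list users recognise, while reports can still be broken down by product when needed.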
It's a tricky one, isn't it? The comments I've read around the place about this being a 'black art' are very true. I really don't know how you will manage to do your categories without using a 3rd tier. I can't see any way to do it, as I just don't think you can get all the information you want, but I guess it depends on exactly what information you want too.
I feel like I am gradually getting somewhere with mine and I will try to remember to post what I come up with in here in case it benefits anyone else and/or no doubt it could do with comment once I have put something together.
The really tricky part here is coming up with a list which contains terms everyone will relate to across all IT teams (Service Desk, SysAdmins (Apple & Windows), Network techies etc.).
I like the idea that each incident is the result of an error (i.e. something unexpected, even if it's actually a training issue), as this removes the need to record the error within the classification structure. This prevents the classification structure from becoming a long list of possible / encountered errors with a particular Service -> System etc.
I'll try putting something together which works as best I can, then like you dsquared, post it here for comment.
Joined: Oct 26, 2006 Posts: 2 Location: Bogotá, Colombia
Posted: Fri Oct 27, 2006 3:47 am Post subject:
--DeanoB-- (and everyone else)
I've been a reader for a couple of months and this is my first post, so apologies for any newbie and language-related problems here (Spanish is my native tongue).
I'm leading the implementation of service desk and incident management process in my organization.
We have faced many of the doubts and troubles mentioned here, with the categories definition.
Getting to DeanoB's post, we have decided to create two separate lists to deal with the "common language" issue, and we call them external categories (those an end user can see, and the service desk guys should know) and internal categories (those that the second level should deal with: usually service administrators such as the DBA, server admins, middleware admins, etc., whom we call specialists).
Our web tool for registering incidents and requests can be configured so that we can define which categories are visible based on someone's role (service desk analyst, end user, specialist, etc.).
We use a three-level list for both (internal and external), where the first level is the service (which does not always match one system or application).
This way an end user may request access for a new user to the finance system. This would be under the external category: Finance system->users management
The service desk creates two or more "internal" requests in order to do the creation: Permissions - Data Base and Permissions - Firewall
Both are managed as children of the external request.
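For readers who want to picture the structure, here is a minimal Python sketch of an external request with internal child requests and role-based visibility. It does not describe our actual tool: the class, the function and the role-to-visibility mapping are assumptions for illustration (in particular, that specialists see only internal categories).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping of roles to the category lists they are allowed to see.
ROLE_VISIBILITY = {
    "end user": {"external"},
    "service desk analyst": {"external", "internal"},
    "specialist": {"internal"},  # assumption: specialists work internal requests only
}


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Request:
    category: str
    visibility: str  # "external" or "internal"
    parent: Optional["Request"] = None
    children: List["Request"] = field(default_factory=list)

    def add_child(self, category: str) -> "Request":
        """Create an internal request managed as a child of this one."""
        child = Request(category, "internal", parent=self)
        self.children.append(child)
        return child


def visible_to(role: str, requests: List[Request]) -> List[str]:
    """Return the categories of the requests a given role may see."""
    allowed = ROLE_VISIBILITY.get(role, set())
    return [r.category for r in requests if r.visibility in allowed]


external = Request("Finance system -> users management", "external")
db = external.add_child("Permissions - Data Base")
fw = external.add_child("Permissions - Firewall")

print(visible_to("end user", [external, db, fw]))
print(visible_to("specialist", [external, db, fw]))
```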
So far it seems to be working. _________________ Alejandro Segura
ITIL Project Coordinator
Universidad de los Andes
Posted: Fri Oct 27, 2006 10:41 pm Post subject: Incident categorisation
A very interesting and informative thread, even if one of the most lengthy.
Can I add a word of caution or even advice when putting together a categorisation structure for incidents.
Believe me the real test is when you have to produce a report using the data in the structure you have created. So in my mind this has to be a bottom up approach i.e. what reports may I require in the future from the data I am collecting now. And we obviously have to provide a structure that will enable rapid Incident matching for the Service Desk.
Another point is that you will have different audiences so you may have to report historic information held in different ways.
When initially logging an Incident the reporting categories should reflect the business impact, allowing meaningful reporting against SLAs.
When closing the Incident you have to reflect the technical issues, which may be of use to problem solvers in the future.
I would always recommend a toolset that will allow more than one form of categorisation. Some award-winning toolsets fall down in this area, so beware...
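As a rough sketch of what "more than one form of categorisation" could look like in data terms, the Python below keeps an opening (business impact) category and a closing (technical) category on the same record and reports on each separately. The field names and sample categories are hypothetical and not taken from any particular toolset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncidentRecord:
    opening_category: str                   # business impact, set when logging
    closing_category: Optional[str] = None  # technical issue, set at closure

    def close(self, technical_category: str) -> None:
        self.closing_category = technical_category


def business_report(records: List[IncidentRecord]) -> Counter:
    """Counts by business impact, e.g. for reporting against SLAs."""
    return Counter(r.opening_category for r in records)


def technical_report(records: List[IncidentRecord]) -> Counter:
    """Counts by technical issue, for closed incidents only."""
    return Counter(
        r.closing_category for r in records if r.closing_category is not None
    )


# Made-up sample records, for illustration only.
records = [
    IncidentRecord("E-mail unavailable"),
    IncidentRecord("E-mail unavailable"),
    IncidentRecord("Cannot print"),
]
records[0].close("Mail server disk full")
records[2].close("Printer driver")

print(business_report(records))
print(technical_report(records))
```

Working backwards from the reports you expect to need, as suggested above, then becomes a matter of deciding which of the two fields each report reads from.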
All times are GMT + 10 Hours