Joined: Oct 11, 2012 Posts: 2 Location: Sacramento, California - USA
Posted: Fri Oct 12, 2012 9:37 am Post subject: Triage Simulations for (Incident / Problem) ?
Greetings & Salutations,
This topic may well be moved, but I'm unsure where it belongs, so I thought I would start here and see whether anyone has helpful suggestions or recommendations from previous experience.
Some brief background:
I work within a fairly large organization that has adopted ITIL methodology in our stability/support environment. I'm relatively well versed in Knowledge Management for Incident Management from previous roles. I'm currently seeking information on common industry or best practices for improving analyst performance during triage of an incident or problem. I believe my scenarios involve problem-type events, as they are typically associated with multiple incidents or high impact counts. Hence, I don't think this would fall under Incident Management.
My current focus, and this can certainly evolve once I better understand similar situations that others have faced, is to institute a solution, likely involving simulated exercises, wherein I can track analyst performance by way of certain CTQs (time to resolve the incident, time to identify severity, customer satisfaction... others?) in the form of a simulated and/or audited triage performance. In case it's not clear, my group is responsible for providing triage management services during high-impact technology events, such as core systems failures (e.g. network, site, or large system down).
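To make the time-based CTQs above a bit more concrete, here is a rough sketch of how they might be computed from timestamps logged during a simulated exercise. The record layout and names (`TriageRun`, `opened`, `severity_set`, `resolved`) are assumptions made up for illustration, not anything ITIL prescribes, and the timestamps are invented sample data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TriageRun:
    """One simulated triage exercise (hypothetical record layout)."""
    analyst: str
    opened: datetime        # when the simulated incident was raised
    severity_set: datetime  # when the analyst assigned a severity
    resolved: datetime      # when the incident was declared resolved


def time_to_identify_severity(run: TriageRun) -> timedelta:
    """Elapsed time from incident open to severity assignment."""
    return run.severity_set - run.opened


def time_to_resolve(run: TriageRun) -> timedelta:
    """Elapsed time from incident open to resolution."""
    return run.resolved - run.opened


# Invented sample data for a single exercise.
run = TriageRun(
    analyst="analyst-a",
    opened=datetime(2012, 10, 12, 9, 0),
    severity_set=datetime(2012, 10, 12, 9, 12),
    resolved=datetime(2012, 10, 12, 10, 30),
)
print(time_to_identify_severity(run))  # 0:12:00
print(time_to_resolve(run))            # 1:30:00
```

Customer satisfaction is left out on purpose, since it would come from a survey rather than from the exercise log.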
So I think what I'm looking for (and again, this is subject to further development) is any ITIL or related guidance on instituting administered simulations or active audits of the performance of analysts who participate in our triage events. Whenever we have these 'incidents' (perhaps calling them Global Incidents is more appropriate?), various participants join and assist in triage efforts to assess impacts, identify possible solutions, vet potential root causes, etc. We have a Level 1 support rep and a Level 2 app-specific support rep, in addition to area-specific tech partners such as DB, Network, or Hardware support (these could be classified as Level 3). The goal of the simulation exercise is to further enhance these team members' capabilities from a readiness, response, consistency and/or accuracy perspective.
Given the requirements, I am thinking I can approach this by working with leads within each participant area to come up with check points and test scenarios and look at implementing some type of recurring simulations with a 'test group', and then coming up with some type of scoring system. I think I have a fairly decent high level plan to pursue this, but again, perhaps ITIL has something to say about this in a particular area I can further review. As well, any resident expert opinions are graciously welcome.
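As a minimal sketch of the kind of scoring system I have in mind: each test scenario gets a list of weighted checkpoints agreed with the area leads, and a run is scored by the weighted share of checkpoints the analyst hit. Everything here (the `Checkpoint` type, `score_run`, the checkpoint names and weights) is hypothetical and only meant to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """A step the area leads expect to see during triage (hypothetical)."""
    name: str
    weight: int  # relative importance, to be agreed with the area leads


def score_run(checkpoints: list[Checkpoint], completed: set[str]) -> float:
    """Return the weighted percentage of checkpoints the analyst hit."""
    total = sum(c.weight for c in checkpoints)
    if total == 0:
        return 0.0  # no checkpoints defined, nothing to score
    earned = sum(c.weight for c in checkpoints if c.name in completed)
    return 100.0 * earned / total


# Made-up scenario: a simulated site outage.
site_down = [
    Checkpoint("confirm_impact_scope", 3),
    Checkpoint("assign_severity", 3),
    Checkpoint("engage_network_support", 2),
    Checkpoint("send_status_update", 2),
]

score = score_run(
    site_down,
    {"confirm_impact_scope", "assign_severity", "send_status_update"},
)
print(score)  # 80.0
```

A weighted checklist like this only measures whether the expected steps were taken, not whether the judgement calls were good, so it would need to sit alongside other measures.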
I am a bit concerned that the overhead of implementing such a solution would be significant from an administration and simulation content upkeep perspective, so again, I'm looking for any tips or suggestions in addition to anything else I may want to consider. Part of me also sees some built-in shortcomings with any process we implement, given the type of work being performed and the somewhat variable types of scenarios that occur; however, maybe ITIL has already approached this subject.
If you've read this far, I certainly appreciate it.
Joined: Mar 04, 2008 Posts: 1893 Location: Helensburgh
Posted: Fri Oct 12, 2012 9:49 pm Post subject:
I'm not sure how much you will be able to find to help you at this level of detail save through consultancy. Certainly ITIL says nothing at so detailed a level - if you have the books then you have all of ITIL - and it is rather a specialist subject for there to be much of value in the public domain. I could be wrong - others may know differently. This is not really
If I understand you correctly (if you are using the word triage as I understand it, borrowed from medical emergency procedures), then there is nothing of problem management in what you are talking about. It is all associated with prioritizing incident resolution activities. Perhaps recognizing that will help to simplify and clear your thinking of what is involved.
However, even if you are in a situation where it is appropriate to "triage" problems (that would be some massive scale of operation and volatility), you cannot properly deal with the two (problems and incidents) together because their approach and driving imperatives are so different.
Since you are only talking about improved performance during triage, then I infer that you are not talking about improving technical knowledge skills, but rather of the way in which triage is conducted. But you do not seem to want to improve the process, rather the skill with which it is performed. I wonder if it is wise to look at one without the other?
You mention a few possible measures. I do not see a place for customer satisfaction as a measure in this area, although communication with users and customers is undoubtedly important and you may find ways to measure its effectiveness (and expeditiousness).
The essence of triage is making informed judgements quickly and then passing the appropriate information and guidance on to the "resolvers". Is this the skill set you are aiming to measure? There will be times, of necessity, when the judgement made proves incorrect simply because not all the facts are known.
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
Joined: Oct 11, 2012 Posts: 2 Location: Sacramento, California - USA
Posted: Sat Oct 13, 2012 8:52 am Post subject:
Thanks for the inputs. I agree this is definitely in the incident space, so that helps me narrow the scope just a bit.
I've revisited this request with my manager today and the more I learn about the request, the more it becomes apparent this is part of a larger effort that is a work in progress. I don't have all the pieces to understand the portion of the puzzle I've been asked to contribute, so my efforts seem a little out of step. Apparently the solution may not even involve performing simulated exercises, but rather audits or other measurement approaches, TBD.
Re: Customer satisfaction - We do have support partner group pulse surveys which enable us to gauge participant and impacted user perception of how the triage was handled. (Did the call stay on focus? Were the right questions asked, leading to resolution? Was escalation pursued where appropriate?) I think the fallacy is to rely too heavily on this measure as a true CTQ; I tend to look at it as a perception check. We often get feedback from different partners/participants that seems contradictory. (e.g. one survey respondent may say "the call was ineffective, slow" while another says "excellent work from the triage manager, it was a difficult situation")
That said, the effort is a work in progress. We continue to look for ways to get closer to greater maturity in readiness and response. I think there is a sense that everything can be predetermined and scripts or runbooks fully defined. We've made some progress in this area, such as pre-defined scripts for analysts to follow (i.e. ask these questions, gather these details); however, how to measure the impact of this and similar efforts is something I've yet to discover.