ITIL :: View topic - Resolution vs Recovery vs Restoration
Posted: Sat Apr 04, 2009 8:41 am Post subject: Resolution vs Recovery vs Restoration
My first post here and I am hoping that all of you can help me out.
I am struggling with the ITIL definitions of Resolution, Recovery and Restoration. Now then, the incident management life cycle indicates that [Resolution and Recovery] occur after [Investigation and Diagnosis]. Notice that Recovery is noted second in the step of [Resolution and Recovery]. I think there is significance to this as I read the definitions of these terms.
So here is the quandary: according to the ITIL definitions, Recovery is "returning a service to a working state before returning to users" and Restoration is "taking action to return service to users after recovery from an incident." If this is true and Resolution comes before Recovery as noted in the life cycle of an incident, how can Resolution be the action taken to repair the root cause of an incident or problem or to implement a workaround as defined by ITIL?
Are all of these terms to be used interchangeably?
Here is another spin, one thing I found in ITSM documentation about Resolution and Recovery is "ITIL incident resolution and recovery is performed once the incident is fully understood. Finding a resolution to an incident means that a way of rectifying the issue has been identified. The act of applying the resolution is the recovery phase. Recovery can be performed by IT support staff or by providing the end user with a set of instructions to follow. Incident resolution and recovery is the seventh step in the incident management process. It follows investigation and diagnosis."
If one wants to find the mean time to resolution would that be at the time the resolution is identified?
If one wants to find when service has been restored to the end user would that be mean time to restoration?
If one wants to find when service has been recovered, but maybe not all end users have been released to that service, would that be mean time to recovery?
What a tangled mess these ITIL definitions present.
Joined: Mar 04, 2008 Posts: 1894 Location: Helensburgh
Posted: Mon Apr 06, 2009 8:24 am Post subject:
It seems quite straightforward.
"Resolution is the action taken to repair the root cause of an incident or problem, or to implement a workaround"
In other words resolution is the action taken to achieve recovery. Therefore resolution is not a state because performing resolution (i.e. resolving the issue) achieves the state of recovery. Thus mean time to resolution is not particularly meaningful or at best is identical to mean time to recovery.
Restoration is once again delivering the service to the customer.
In very many cases recovery and restoration will be achieved simultaneously because making the system work often instantaneously makes it available. However ITIL recognizes that this is not always so and asserts that there may be more actions required in order to achieve availability to customer.
As far as measurements are concerned, restoration is customer facing and is relevant to SLAs, while recovery is service oriented and is of interest for internally measuring service processes. The customer does not care that you have mended the fault, only that you are once again delivering the service reliably.
Of course in the real world, these terms are used very loosely because they seem to convey a degree of meaning that people think they understand and so long as they are used consistently in your organization, that is sufficient.
The ITIL books provide definitions to ensure that when the terms are referenced you will know the way they are intended in that context.
I cannot comment on your "another spin" quote (because you do not state its source), other than to say that it would not be wise to attempt to derive definitions from it when the glossary has already provided clear distinctions for the terms. _________________ "Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
Joined: Sep 16, 2006 Posts: 3590 Location: London, UK
Posted: Mon Apr 06, 2009 5:44 pm Post subject:
Alessi
As Diarmid said it succinctly, I won't add to what he said. I will add to your confusion instead.
You have lies, damn lies and statistics
MTTF - mean time to failure
MTTR - mean time to repair
these are fairly easy to calculate
Server 01 went down at 11:34 GMT on March 11, 2009
Server 01 was back up at 13:47 GMT on March 11, 2009
The server had been installed on March 1, 2009 at 11:30. The MTTF would be 10 days 4 minutes. The MTTR would be 2 hours 13 minutes.
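The arithmetic above can be checked with a minimal Python sketch. The variable names are illustrative only; with a single failure on record, each "mean" is simply the one observed interval.

```python
from datetime import datetime, timedelta

# Timestamps taken from the example above (all GMT).
installed = datetime(2009, 3, 1, 11, 30)
went_down = datetime(2009, 3, 11, 11, 34)
back_up = datetime(2009, 3, 11, 13, 47)

# With one failure, the mean equals the single interval.
time_to_failure = went_down - installed
time_to_repair = back_up - went_down

print(time_to_failure)  # 10 days, 0:04:00
print(time_to_repair)   # 2:13:00
```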
The hard part is as follows
The server in question is the application front end (Single point of failure) for 500 customers.
The server goes down at 1134. Customers call at 1145 complaining that the website is down. The SD creates a ticket when the first customer calls. The server outage is escalated to the right group, who fix it and bring the server up at 1347. The application on the server starts at 1350.
When the SD compiles the outage, what is the duration calculated from:
NMS down / up time for the server
Service up / down time (from logs in the server)
Ticket open / ticket close
?
The answer tomorrow _________________ John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
I appreciate the replies that have been posted so far.
My apologies in regard to the ITSM reference. The ITSM definition of resolution came from the itsm.certification.info website, in their encyclopedia of terms.
In an effort to hopefully get more clarification or maybe better explain where I am struggling with these definitions, let me state the following.
As a technical team we are interested in knowing how quickly we are able to identify a resolution to an incident, whether it is a permanent fix or a workaround. Using the definition of resolution, would it suffice to say that mean time to resolution (MTTR) would be measured at the point we identified the resolution?
Considering the above to be true, we are also tracking our incident "life cycle" as the time elapsed from the contact time to the time of restoration of service i.e. mean time to restoration.
Mean time to Resolution - point in time at which the action to resolve the incident is identified
Mean time to Restoration - point in time at which service is restored to the end user
Does this make sense? Would these be considered ITIL compliant definitions?
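As a rough illustration of the two metrics proposed above, here is a minimal Python sketch. It assumes both clocks start at the contact time and that each incident record carries three timestamps. The incident data and the helper name `mean_delta` are hypothetical, not taken from any ticketing tool.

```python
from datetime import datetime, timedelta

# Hypothetical records: (contact, resolution identified, service restored).
incidents = [
    (datetime(2009, 4, 1, 9, 0), datetime(2009, 4, 1, 9, 20), datetime(2009, 4, 1, 9, 45)),
    (datetime(2009, 4, 2, 14, 0), datetime(2009, 4, 2, 14, 40), datetime(2009, 4, 2, 15, 15)),
]

def mean_delta(deltas):
    """Average a sequence of timedelta values."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)

# Contact time to the point at which the resolving action is identified.
mean_time_to_resolution = mean_delta(
    identified - contact for contact, identified, _ in incidents
)
# Contact time to the point at which service is restored to the end user.
mean_time_to_restoration = mean_delta(
    restored - contact for contact, _, restored in incidents
)

print(mean_time_to_resolution)   # 0:30:00
print(mean_time_to_restoration)  # 1:00:00
```

Whatever the metrics are called, the start and end events have to be recorded the same way for every incident, otherwise the averages are not comparable.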
Joined: Mar 10, 2008 Posts: 403 Location: Sunderland
Posted: Wed Apr 08, 2009 2:01 am Post subject:
Just to be awkward I will put a different spin on it.
A database is locked causing screen freezes for users.
Resolution is the act of bouncing the database to clear the locks.
Recovery is the act of correcting any inconsistencies in the database following the bounce that would lead to further online problems.
Restoration is the act of bringing service back online for the users.
It's just a load of old wordy bollix dreamt up by the 30+ authors of the ITIL books, so don't worry about it too much. Apply what works for you and do it consistently; in doing so you drive out meaningful performance metrics.
Joined: Sep 16, 2006 Posts: 3590 Location: London, UK
Posted: Wed Apr 08, 2009 2:21 am Post subject:
Alessi
You are mixing apples with sheep
You are looking at this from the technical resolution team's point of view
ITIL metrics are for the service desk (sort of)
At 0731, a server's web service goes down
At 0736, the threshold for creating an incident ticket is met
At 0737, the ticket is raised
At 0740, a user of the service on the server complains. A second ticket is created
At 0745, the SD matches the two tickets
At 0745, the SD assigns a technical resolution team to solve the incident
At 0750, you, alessi, grab that ticket to work on
from 0750 to 0810, you troubleshoot
at 0810, you realize what is wrong.
at 0811, you restart the web service
at 0815, the web service is restored
at 0820, you close your task
at 0825, the SD picks up the ticket as you have closed your tasks/action
at 0830, the SD contacts the user to inform them that the service is back
so what are the SLA's / OLA's,
the SD has the monitoring tool set to a 5 minute threshold
your team has an OLA of responding within 10 minutes of receiving tasks
Your team does NOT have a FIX TIME (other thread)
So what is the down time of the service and where do you start / end the calculation
For the Service Desk..from the time they open the ticket to close the ticket is one set of stats
for the Availability team, the NMS down / up time is another set of stats
for the technical team, the time the ticket was assigned, picked up, closed and assigned back to the SD are their stats
for the user... his time is also different _________________ John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
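The point that each party runs a different clock over the same incident can be sketched in Python using the timeline above. This is only an illustration under stated assumptions: the date is arbitrary, the NMS "up" time is taken to be 0815 when the web service is restored, and the Service Desk ticket is assumed to close when the user is informed at 0830 (the timeline does not give a ticket close time). The event names and the helper functions are hypothetical.

```python
from datetime import datetime, timedelta

def at(hhmm):
    """Parse a 24-hour 'HHMM' string on an arbitrary, hypothetical date."""
    return datetime.strptime("2009-04-08 " + hhmm, "%Y-%m-%d %H%M")

events = {
    "service_down": at("0731"),
    "ticket_raised": at("0737"),
    "user_complains": at("0740"),
    "task_assigned": at("0745"),
    "task_picked_up": at("0750"),
    "service_restored": at("0815"),   # assumed to be the NMS "up" time
    "task_closed": at("0820"),
    "user_informed": at("0830"),      # assumed to be the SD ticket close
}

def minutes(start, end):
    """Elapsed whole minutes between two named events."""
    return int((events[end] - events[start]) / timedelta(minutes=1))

stats = {
    "availability team (NMS down to up)": minutes("service_down", "service_restored"),
    "service desk (ticket raised to user informed)": minutes("ticket_raised", "user_informed"),
    "technical team response (assigned to picked up)": minutes("task_assigned", "task_picked_up"),
    "technical team handling (picked up to task closed)": minutes("task_picked_up", "task_closed"),
    "user (complaint to being informed)": minutes("user_complains", "user_informed"),
}

for name, value in stats.items():
    print(f"{name}: {value} min")

# The 10-minute response OLA from the example.
ola_met = stats["technical team response (assigned to picked up)"] <= 10
print("OLA met:", ola_met)
```

Under these assumptions the same incident yields 44, 53, 5, 30 and 50 minutes depending on who is measuring, which is why the start and end events need to be agreed before any "mean time" is reported.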
Is the phrase really "mixing apples with sheep"? Since we have a similar phrase in German, I would have thought it would be "comparing apples with pears".
Aucade, the English saying is "comparing apples and oranges," which is very close to the German version. John is making the point that the items mixed up (compared) are even less similar than just types of fruit.
Caveat: I'm American and I'm not up to date on the latest Briticisms (but I know John's a transplant).
Joined: Mar 04, 2008 Posts: 1894 Location: Helensburgh
Posted: Wed Apr 08, 2009 6:31 am Post subject:
alessi wrote:
As a technical team we are interested in knowing how quickly we are able to identify a resolution to an incident, whether it is a permanent fix or a workaround. Using the definition of resolution, would it suffice to say that mean time to resolution (MTTR) would be measured at the point we identified the resolution?
If I understand what you are saying then your question is not about definitions at all. You seem to want to measure your team's performance (as you should). So what you are measuring is start point to end point of your scope. You can use any useful terminology that you like to do this.
If you think about it, someone has the scope of applying the solution that you identify. You cannot both (or either) hijack the global service concept of "time to resolution", because each of you performs only a part of it.
alessi wrote:
Would these be considered ITIL compliant definitions?
There is no compliance with ITIL. As I said before, the definitions in the book are there to explain what the book is saying.
BorisBear wrote:
It's just a load of old wordy bollix dreamt up by the 30+ authors of the ITIL books, so don't worry about it too much. Apply what works for you and do it consistently; in doing so you drive out meaningful performance metrics.
This is the correct answer in summary form.
Therefore:
alessi wrote:
Mean time to Resolution - point in time at which the action to resolve the incident is identified
Mean time to Restoration - point in time at which service is restored to the end user
Does this make sense?
If they work for you, do it.
If someone comes along and says "but this is not ITIL" your answer is "no, this is not guidance; this is a working Service Management System" _________________ "Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
UKVIKING - Let me clarify my statement that we are a technical team.
In ITIL's definition we are a SD. However, we are a technical team as well. We provide the SD function yet we are also tier 1, 2, and sometimes tier 3 support depending on what the incident involves.
I would not classify our team as a typical SD. So tracking our metrics of how quickly we can identify a resolution and how quickly we can restore service are our top metrics. Our team does the whole lifecycle of an incident - detect, respond, diagnose, repair, recover, restore.
Does this change your opinion of what I was asking about Mean Time To Resolution and Mean Time To Restoration?
Joined: Sep 16, 2006 Posts: 3590 Location: London, UK
Posted: Wed Apr 08, 2009 7:05 am Post subject:
No it does not
It does not change anything
Your team is doing incident mgmt
Are you customer /user facing or are you a technical response team that gets handed work or finds it independently ?
The stats I put were used in various roles / aspects of a technical / non- technical SD/NOC
You must have a ticket system, monitoring tools, task tools, etc
If there is an incident / alert that your team must deal with, the clock for you starts when you assign it to a member of the team
the incident must have 'hang time' to assess where it goes and whether it is actually an incident or what
the hang time is the SD role that you play in the IM cycle
The Availability stats tell you how reliable a piece of kit is, how poorly the NMS tool is configured, or both
Aucade: it was apples and oranges.. but I was trying to prove a point _________________ John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
Joined: Mar 04, 2008 Posts: 1894 Location: Helensburgh
Posted: Wed Apr 08, 2009 7:07 am Post subject:
alessi wrote:
UKVIKING... ...Does this change your opinion of what I was asking about Mean Time To Resolution and Mean Time To Restoration?
It shouldn't do.
You have to distinguish for yourself, between overall service measures and measures of activities that contribute to the service.
To take the extreme case, if a single person has three roles, you need to measure performance and have targets in each role as well as an overall target, just as if it were three people and a manager.
It is confusing to mix the levels you are looking at.
Don't wrestle with the ITIL terminology. Just understand the service requirements and establish clear definitions for yourself. _________________ "Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
Joined: Mar 04, 2008 Posts: 1894 Location: Helensburgh
Posted: Wed Apr 08, 2009 7:11 am Post subject:
John,
what are you doing here at this time of night? Is London closed?
Anyway, our overlapped responses dovetail rather well. _________________ "Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
Thank you to everyone who replied to this thread. Thank you for your insight and for answering my questions. It was truly helpful.
I understand what you guys were saying about the mixing of the SD and the IM, I truly do. I could go into a very lengthy dissertation on the detailed function of our team, but that is not necessary and would only lead to a longer discussion that would need its own thread. I will just leave it at the fact that our team is a very unique combination of SD and IM.
Again, I really appreciate everyone's input and thank you for helping me.