Note: ITIL is a registered trademark of OGC. This portal is totally independent and is in no way related to them.


The ITIL Community Forum: ITIL Discussion

Designing for availability

BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Fri Jan 30, 2009 10:17 pm    Post subject: Designing for availability

SOOOooooooo............My customer wants 99.5% availability from their perspective.

The service in question is based on a Genesys front end with a SAP engine. The infrastructure footprint covers locally deployed components/hardware in the UK and most of the SAP kit in Germany.

Our method of calculating whether we can deliver the required availability is along the lines of 'resilient servers x network x routers x desktop.....etc' (i.e. the traditional, straight-out-of-an-ITIL-book view of designing for availability). This is far too simplistic for my liking, as it doesn't account for the internal and external support required to keep these components running and to fix them when they go wrong.
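As a baseline, the 'resilient servers x network x routers x desktop' calculation treats the service as components in series: it is up only when every component is up, so (assuming independent failures) the availabilities simply multiply. A minimal Python sketch; the component names and figures are hypothetical, not measurements from Boris's Genesys/SAP service:

```python
from math import prod


def serial_availability(components: dict[str, float]) -> float:
    """End-to-end availability when every component must be up.

    Assumes failures are independent, which is the usual textbook simplification.
    """
    return prod(components.values())


# Hypothetical component availabilities, for illustration only.
chain = {
    "resilient_servers": 0.9995,
    "network": 0.999,
    "routers": 0.9995,
    "desktop": 0.998,
}

a = serial_availability(chain)
print(f"End-to-end availability: {a:.2%}")
```

With these made-up figures the chain comes out at roughly 99.6%, leaving very little headroom against 99.5% before any support delays are counted, which is exactly the gap Boris is worried about.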

So, in the interests of challenging our SLM guys and support groups as to whether the customer requirement is truly underpinned, I'm looking to apply some more exacting science to the calculations. My thoughts are along these lines: where support hours don't cover online availability hours, what is the likelihood of a component failing, and how much downtime are we likely to be exposed to?
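Boris's question about support hours can be made concrete with a small model. The sketch below is a simplification, not an established method: it assumes failures are equally likely at any minute of the week, that nobody starts work until the support window opens, and that a fix then takes a fixed time. The support window (Monday to Friday, 08:00 to 18:00), failure rate and fix time are all hypothetical inputs:

```python
from statistics import mean

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY


def support_calendar(days=range(5), start_hour=8, end_hour=18):
    """One flag per minute of the week, Monday 00:00 first: True while support is staffed.

    Default (an assumption): Monday to Friday, 08:00 to 18:00.
    """
    return [
        (m // MINUTES_PER_DAY) in days
        and start_hour * 60 <= m % MINUTES_PER_DAY < end_hour * 60
        for m in range(MINUTES_PER_WEEK)
    ]


def wait_for_support(support):
    """For each minute of the week, minutes until support is next available (0 if staffed)."""
    if not any(support):
        raise ValueError("support is never available")
    n = len(support)
    wait = [0] * n
    nxt = None  # wait at the current minute while walking backwards
    # Walk backwards over two weeks so gaps that wrap past Sunday midnight come out right.
    for i in range(2 * n - 1, -1, -1):
        j = i % n
        if support[j]:
            nxt = 0
        elif nxt is not None:
            nxt += 1
        if nxt is not None:
            wait[j] = nxt
    return wait


def expected_annual_downtime(failures_per_year, fix_minutes, support):
    """Expected downtime in minutes per year.

    Assumes each failure is equally likely at any minute of the week, that work only
    starts once support is available, and that a fix then takes fix_minutes.
    """
    per_failure = mean(wait_for_support(support)) + fix_minutes
    return failures_per_year * per_failure


# Hypothetical inputs: 4 failures a year, 2 hours to fix once someone is working on it.
office_hours = expected_annual_downtime(4, 120, support_calendar())
round_the_clock = expected_annual_downtime(4, 120, [True] * MINUTES_PER_WEEK)
print(round(office_hours), round_the_clock)  # 3787 480
```

With those made-up inputs, office-hours support works out at roughly 3,800 minutes of expected downtime a year against 480 minutes with round-the-clock cover, while a 24x7 service at 99.5% only allows about 2,628 minutes (0.5% of 525,600). Real failure rates and fix times would have to come from component data and the support groups' history.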

I have never really seen this type of thing done particularly well, so if anybody has had success in scientifically covering support as well as hardware/software components in underpinning availability, I would be very interested in knowing how you did it.
UrgentJensen
Senior Itiler


Joined: Feb 23, 2005
Posts: 458
Location: London

Posted: Fri Jan 30, 2009 11:27 pm

Boris,

Support is easy to factor in; start with third party contracts and negotiate a lower 'out of hours' downtime rate with your customers to reflect the reduced operational impact.

Factoring in support time is doable if you are managing your support team time/effort accurately.

UJ
_________________
Did I just say that out loud?

(Beige badge)
BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Sat Jan 31, 2009 1:01 am

UJ

This is only 'out of hours' for some of the IT Support Organisation, not for the Business Customer. They want 99.5% for the entire hours of service, and they expect us to work out how that can be delivered and charge them for it appropriately. That isn't to say we can't offer them something different instead.

The key here is that it's from the customer's perspective, not the IT organisation's.
UKVIKING
Senior Itiler


Joined: Sep 16, 2006
Posts: 3292
Location: London, UK

Posted: Sat Jan 31, 2009 1:35 am

BorisBear,

The following is a real situation.

One of the companies I worked for had an application that was supposed to be available 24x7 with 99.5% availability.

The application was supported by the third-party application developers.
The application resided on a Unix OS supported by a Sun-approved local in-country IT provider (also a third party).
The company packaged this, and all we did was manage the NMS tools and escalate out.

A funny thing happened on the way to the service.
Sun has a hierarchical support system with response times (Diamond, Gold, Silver, etc.); however, their vendor had a different interpretation of support and response.

One of the application servers died on a Thursday night. The server was not on any of the Diamond, Platinum or ultimate support/response models. It was on the lowest. The response was within 24 hours, and the fix time was even longer.

The server did not get fixed until the following Wednesday/Thursday. However, it was fixed within spec of the contract hours for that MACHINE, not for the company or the application/service.

So the 99.5% was not met because somebody decided not to spend extra money on the maintenance/support contract.
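The arithmetic shows how far out of reach the target was. A rough Python check, treating the outage as 6 to 7 days (an approximation of 'Thursday night to the following Wednesday/Thursday') and assuming no other downtime that year:

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_30_DAY_MONTH = 30 * 24


def availability(downtime_hours: float, period_hours: float) -> float:
    """Fraction of the period the service was up (24x7 service assumed)."""
    return 1 - downtime_hours / period_hours


# "Thursday night to the following Wednesday/Thursday" read as roughly 6 to 7 days.
for days in (6, 7):
    outage = days * 24
    print(
        f"{days}-day outage: {availability(outage, HOURS_PER_YEAR):.2%} for the year, "
        f"{availability(outage, HOURS_PER_30_DAY_MONTH):.2%} for a 30-day month"
    )
```

Even on those generous assumptions the year comes out at about 98.4% or 98.1%, well short of 99.5%, which allows only 43.8 hours of downtime a year on a 24x7 service.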
_________________
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
Diarmid
Senior Itiler


Joined: Mar 04, 2008
Posts: 1884
Location: Newcastle-under-Lyme

Posted: Sat Jan 31, 2009 1:52 am

BorisBear wrote:
The key here is that its from the customer's perspective not the IT organisation.


The customer perspective of availability is the only one there is. Availability means availability to the customer.

I think you are already on the right lines in identifying the factors involved. The two main questions that need answering before working out costs and preparing commitment are:

1. What is the time frame for the 99.5%? It makes a big difference whether it is a year, a month, a week, a day or an hour (!).

2. What is the tolerance to time down and how does it vary?
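To see why the time frame matters, here is the downtime a 99.5% target allows within a single period, for each of the time frames above. A minimal sketch, assuming a 24x7 service and a 30-day month:

```python
# Assumes a 24x7 service; for restricted service hours, use the scheduled minutes instead.
PERIOD_MINUTES = {
    "year": 365 * 24 * 60,
    "30-day month": 30 * 24 * 60,
    "week": 7 * 24 * 60,
    "day": 24 * 60,
    "hour": 60,
}


def allowed_downtime_minutes(target: float, period_minutes: int) -> float:
    """Downtime that still meets the target within a single period."""
    return (1 - target) * period_minutes


for name, minutes in PERIOD_MINUTES.items():
    print(f"{name:>12}: {allowed_downtime_minutes(0.995, minutes):8.1f} minutes")
```

That is 2,628 minutes (43.8 hours) in a year but only 216 minutes in a month, 50.4 minutes in a week, 7.2 minutes in a day and 18 seconds in an hour, so a single long outage that an annual target could absorb would break a monthly one.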

The easy case for you is if you have a robust system, they want 99.5% over a year, and they can survive a failure during unsocial hours; then you might get away with fixing by 06:00, say. Probably not realistic, but it illustrates some issues.

The more critical their business imperative, the higher the risk/cost is going to be.

Do you have historical data on outages and recovery times? What is your minimum recovery time for a DB crash? A network node failure? An OS crash? How long does it take to detect an outage? How long to get support to the 'coal face'? How long to achieve diagnosis and initiate correction and recovery? How much do these figures vary from their averages?
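Once outage history exists, these questions (how long each stage takes and how much it varies) reduce to simple statistics per phase. A sketch with hypothetical phase names and made-up incident records; population standard deviation is used here as a simple spread measure:

```python
from statistics import mean, pstdev

# Hypothetical phase names; use whatever stages your incident records actually capture.
PHASES = ("detect", "respond", "diagnose", "recover")


def phase_summary(incidents):
    """Mean and population standard deviation of each phase, in minutes."""
    summary = {}
    for phase in PHASES:
        values = [incident[phase] for incident in incidents]
        summary[phase] = (mean(values), pstdev(values))
    return summary


def restore_times(incidents):
    """Total time to restore service for each incident (phases assumed sequential)."""
    return [sum(incident[phase] for phase in PHASES) for incident in incidents]


def share_exceeding(incidents, limit_minutes):
    """Fraction of incidents whose restore time exceeded the limit."""
    times = restore_times(incidents)
    return sum(t > limit_minutes for t in times) / len(times)


# Made-up history, in minutes per phase.
incidents = [
    {"detect": 5, "respond": 20, "diagnose": 30, "recover": 45},
    {"detect": 15, "respond": 60, "diagnose": 90, "recover": 60},
    {"detect": 2, "respond": 10, "diagnose": 20, "recover": 30},
]
for phase, (avg, spread) in phase_summary(incidents).items():
    print(f"{phase:>8}: mean {avg:6.1f}  std dev {spread:6.1f}")
print(restore_times(incidents), share_exceeding(incidents, 120))
```

The share of incidents exceeding a limit is often more telling than the mean, because a commitment such as a maximum break length is broken by the tail, not the average.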

How much contingency needs to be there for investigating mysterious outages?

If gearing up for round the clock quick response is expensive, perhaps some aspects can be put to a third party who already operate round the clock. This will almost certainly be true if your organization is not large.

The true test of how serious they are about the figure is when they see the cost and risk analysis. How near to certainty do you need to be before the risk is acceptable?
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Mon Feb 02, 2009 6:23 pm

Diarmid, I think the truth is that because we don't have historical data, we won't be able to offer 99.5% with any degree of confidence, and we will therefore probably work towards it over a period of a few months.

We do have some periods of the day where availability is more critical than at other times, so we could weight availability/downtime accordingly. I guess what I'm struggling with is how to put some science behind what the support groups do, particularly as we're very much an organisation that works in silos without joined-up ownership of service provision. We're not going to solve those problems overnight, but some tips on how to tackle the science and pin down the support groups would be good.
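Weighting downtime by how critical each part of the day is can be expressed as a weighted availability figure. A sketch; the periods, weights and downtime below are hypothetical, and this particular weighting scheme is one illustrative choice rather than a standard formula:

```python
def weighted_availability(periods):
    """Availability where downtime in more critical periods counts for more.

    periods: (scheduled_minutes, downtime_minutes, weight) tuples; the weights are a
    business decision, not something this function can derive.
    """
    scheduled = sum(minutes * weight for minutes, _, weight in periods)
    lost = sum(down * weight for _, down, weight in periods)
    return 1 - lost / scheduled


# Hypothetical 30-day month: 22 weekdays of 09:00-17:00 peak (weight 3), the rest weight 1.
peak = (22 * 8 * 60, 60, 3)                        # 10,560 scheduled minutes, 60 down
off_peak = (30 * 24 * 60 - 22 * 8 * 60, 120, 1)    # 32,640 scheduled minutes, 120 down
print(f"weighted:   {weighted_availability([peak, off_peak]):.2%}")
print(f"unweighted: {weighted_availability([(p[0], p[1], 1) for p in (peak, off_peak)]):.2%}")
```

Here the same 180 minutes of downtime reads as about 99.58% unweighted but about 99.53% weighted, because a third of it fell in the peak period.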
Diarmid
Senior Itiler


Joined: Mar 04, 2008
Posts: 1884
Location: Newcastle-under-Lyme

Posted: Mon Feb 02, 2009 7:58 pm

Boris,

Since your issue is time, you have to measure time.

If your silos are deep, then you measure from the time the incident enters the silo until it emerges and you have to obtain a commitment from the silo manager on how long things take. And you design slick interfaces between the silos.

If your silos are shallow, then you measure all the phases and possibly measure processing time (time sheets) by individual staff and manage it all co-operatively with the silo managers.
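Measuring from the time an incident enters a silo until it emerges only needs the assignment history that most incident tools keep. A minimal sketch, assuming each reassignment is logged with a timestamp; the group names and the RESOLVED marker are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def minutes_per_group(handoffs):
    """Minutes an incident spent with each support group.

    handoffs: chronological (ISO timestamp, group) assignments for one incident; the
    last entry marks resolution, and its group name is never credited with time.
    """
    totals = defaultdict(float)
    for (start, group), (end, _) in zip(handoffs, handoffs[1:]):
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[group] += elapsed.total_seconds() / 60
    return dict(totals)


# Hypothetical assignment history for one incident.
handoffs = [
    ("2009-02-02T09:00", "Service Desk"),
    ("2009-02-02T09:20", "Network"),
    ("2009-02-02T11:05", "SAP Basis"),
    ("2009-02-02T12:35", "Service Desk"),
    ("2009-02-02T12:45", "RESOLVED"),
]
print(minutes_per_group(handoffs))
# {'Service Desk': 30.0, 'Network': 105.0, 'SAP Basis': 90.0}
```

Summed over many incidents, per-group totals like these are the figures to take to each silo manager when asking for a commitment.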

Either way, you are measuring time and putting yourself in a position to ask for improvements in specific areas. A priority could be to integrate the procedures used by the silos, or to build an integrated procedure from scratch. There won't be silos then.

Without the data you can't predict.

Once you have the data, you will see your shortfall. But you can't just tell the business to spend more to address the shortfall. You also have to make improvements. Chances are that breaking down the silos will help a lot. If you set up improvement programs with targets and then meet those targets, the business will listen to you when you show them the limits of your present capacity.
UKVIKING
Senior Itiler


Joined: Sep 16, 2006
Posts: 3292
Location: London, UK

Posted: Mon Feb 02, 2009 8:22 pm

BorisBear,

One of the hares I used to split about SLAs was response time.

The tool we used for incident management accepted emails from customers (from specific email addresses, of course). Each email would create a 'ticket' in the system, which would reply to the customer when we accepted the ticket.

There was a 15-minute SLA for Priority 1 (P1) issues. One customer thought everything was a P1.

The Service Desk (SD) would make sure the tickets were accepted and the email went to the customer within 15 minutes.
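A response SLA like this is easy to measure from ticket timestamps, which is exactly why it is easy to meet on paper. A sketch with hypothetical tickets, assuming the tool records when each ticket was created and when it was accepted:

```python
from datetime import datetime, timedelta

P1_ACCEPT_LIMIT = timedelta(minutes=15)


def acceptance_compliance(tickets, limit=P1_ACCEPT_LIMIT):
    """Share of tickets accepted within the limit.

    tickets: (created, accepted) ISO timestamps. Note this measures acknowledgement,
    not restoration: it says nothing about how long the service was down.
    """
    met = sum(
        datetime.fromisoformat(accepted) - datetime.fromisoformat(created) <= limit
        for created, accepted in tickets
    )
    return met / len(tickets)


# Hypothetical P1 tickets: accepted after 10, 15 and 22 minutes.
tickets = [
    ("2009-02-02T09:00", "2009-02-02T09:10"),
    ("2009-02-02T13:30", "2009-02-02T13:45"),
    ("2009-02-02T16:00", "2009-02-02T16:22"),
]
print(f"{acceptance_compliance(tickets):.0%} accepted within 15 minutes")
```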

So if the service is M-F 0800 - 2000, then the availability is against that time frame.
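Measuring availability against that time frame means counting only the part of each outage that overlaps the agreed service hours. A sketch assuming Monday to Friday, 08:00 to 20:00, with a hypothetical outage; holidays and time zones are ignored:

```python
from datetime import date, datetime, timedelta


def service_windows(first_day, last_day, open_hour=8, close_hour=20):
    """(open, close) datetimes for each Monday-Friday from first_day to last_day inclusive."""
    windows = []
    day = first_day
    while day <= last_day:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            midnight = datetime(day.year, day.month, day.day)
            windows.append((midnight + timedelta(hours=open_hour),
                            midnight + timedelta(hours=close_hour)))
        day += timedelta(days=1)
    return windows


def downtime_in_windows(outages, windows):
    """Outage time that falls inside service hours (outages assumed not to overlap)."""
    total = timedelta()
    for w_open, w_close in windows:
        for o_start, o_end in outages:
            overlap = min(w_close, o_end) - max(w_open, o_start)
            if overlap > timedelta():
                total += overlap
    return total


# Hypothetical week of M-F 0800-2000 service with one overnight outage.
windows = service_windows(date(2009, 2, 2), date(2009, 2, 6))
outages = [(datetime(2009, 2, 3, 19, 0), datetime(2009, 2, 4, 9, 30))]
down = downtime_in_windows(outages, windows)
scheduled = sum((close - open_ for open_, close in windows), timedelta())
print(down, f"{1 - down / scheduled:.2%}")  # 2:30:00 95.83%
```

The outage lasted 14.5 hours, but only 2.5 of them fell inside service hours, so that is all that counts against the target.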

Now, I also used to split hairs over availability.

If the web site was up and able to accept requests (we used external SiteScope monitoring), then availability was 100% as long as the requests were between 0800 and 2000 (GMT in this case).

Hare splitting, I know... but Bugs Bunny was always my favorite.
asrilrm
Senior Itiler


Joined: Oct 07, 2007
Posts: 441
Location: Jakarta, INA

Posted: Mon Feb 02, 2009 10:16 pm

Hi,

Referring to the first post, it is common to exclude maintenance windows from the agreed availability.
Availability could be calculated on a monthly, quarterly or annual basis.
But providers usually calculate availability on an annual basis because it gives them flexibility in managing the service, even though reporting is monthly.

For instance, the full calendar availability is 365 x 24 x 60 = 525,600 minutes (= 31,536,000 seconds).
Let's say the maintenance window is 20% = 52,560 minutes.
That makes your 100% availability equal to 473,040 minutes.
99.5% availability means that the 0.5% of unplanned downtime (equal to 2,365 minutes) could be spread anywhere throughout the year.
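The arithmetic can be checked in a few lines of Python. This sketch uses the 10% maintenance window from Asril's follow-up correction rather than the 20% typed here:

```python
MINUTES_PER_YEAR = 365 * 24 * 60            # 525,600 minutes
SECONDS_PER_YEAR = MINUTES_PER_YEAR * 60    # 31,536,000 seconds
maintenance = MINUTES_PER_YEAR // 10        # 10% maintenance window: 52,560 minutes
agreed = MINUTES_PER_YEAR - maintenance     # agreed service time: 473,040 minutes
unplanned = agreed * 0.005                  # 0.5% unplanned downtime allowance

print(f"{SECONDS_PER_YEAR:,} s/year, {agreed:,} agreed minutes, "
      f"{unplanned:,.1f} minutes ({unplanned / 60:.1f} hours) of unplanned downtime")
```

That gives about 2,365 minutes, or roughly 39.4 hours, of unplanned downtime a year.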

Further, you could set one availability figure covering all the equipment end to end, set availability for individual pieces of equipment, or use a combination of both.

Cheers,
Asril
asrilrm
Senior Itiler


Joined: Oct 07, 2007
Posts: 441
Location: Jakarta, INA

Posted: Mon Feb 02, 2009 10:45 pm

Quote:
Let's say the maintenance window is 20% = 52,560 minutes.


Sorry, it should be 10% = 52,560 minutes.
Diarmid
Senior Itiler


Joined: Mar 04, 2008
Posts: 1884
Location: Newcastle-under-Lyme

Posted: Mon Feb 02, 2009 11:02 pm

asrilrm wrote:
99.5% availability means that the 0.5% (equals 2,365 minutes) unplanned downtime could be spread anywhere throughout the year.


Which is not good if it all happens on one day. (The Longest Day with a vengeance).

The business has to be protected, and it won't wash if you are within the literal terms of your contract but the service has been down too long or too often.

Where an annual figure is used it should be qualified by other limits over shorter periods. There can be such things as:
- no breaks lasting more than four hours
- 98% availability within any rolling four weeks
- no more than three breaks in any month
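Shorter-period limits like these are straightforward to check mechanically against an outage log. A sketch using the three example limits above as defaults; the outage data is hypothetical, and stepping the rolling window one day at a time is a simplifying assumption:

```python
from collections import Counter
from datetime import datetime, timedelta


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals (zero if none)."""
    return max(timedelta(), min(a_end, b_end) - max(a_start, b_start))


def check_limits(outages, period_start, period_end,
                 max_break=timedelta(hours=4),
                 window=timedelta(weeks=4), window_target=0.98,
                 max_breaks_per_month=3, step=timedelta(days=1)):
    """Return violations of the example limits for a list of (start, end) outages.

    Rolling windows are stepped a day at a time (an approximation), and each break is
    counted in the month in which it started.
    """
    violations = []
    for start, end in outages:
        if end - start > max_break:
            violations.append(f"break of {end - start} starting {start:%Y-%m-%d %H:%M}")

    worst = None
    w_start = period_start
    while w_start + window <= period_end:
        down = sum((overlap(s, e, w_start, w_start + window) for s, e in outages),
                   timedelta())
        avail = 1 - down / window
        if worst is None or avail < worst[0]:
            worst = (avail, w_start)
        w_start += step
    if worst is not None and worst[0] < window_target:
        violations.append(f"{worst[0]:.2%} in the {window.days}-day window from "
                          f"{worst[1]:%Y-%m-%d}")

    per_month = Counter((s.year, s.month) for s, _ in outages)
    for (year, month), count in sorted(per_month.items()):
        if count > max_breaks_per_month:
            violations.append(f"{count} breaks in {year}-{month:02d}")
    return violations


# Hypothetical outages in March 2009: one long break plus three short ones.
outages = [
    (datetime(2009, 3, 2, 9, 0), datetime(2009, 3, 2, 14, 30)),
    (datetime(2009, 3, 10, 10, 0), datetime(2009, 3, 10, 10, 20)),
    (datetime(2009, 3, 17, 15, 0), datetime(2009, 3, 17, 15, 10)),
    (datetime(2009, 3, 24, 8, 0), datetime(2009, 3, 24, 8, 5)),
]
for v in check_limits(outages, datetime(2009, 3, 1), datetime(2009, 5, 1)):
    print(v)
```

Here the 365 minutes of downtime would sit comfortably inside an annual 99.5% allowance, yet the log breaks two of the three shorter-period limits.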

These tools keep the service provider 'good' while allowing a little contingent flexibility.
JoePearson
Senior Itiler


Joined: Oct 13, 2006
Posts: 116
Location: South Africa

Posted: Tue Feb 03, 2009 9:05 pm

Good points already made - I'll try not to repeat stuff.

This is what you should be taking to the customer.

- Tell them to forget about measures like 99.5%, even if they've read such things in the press, unless they have some solid history (from what you've said, they don't have history from you) or solid industry benchmarks, that allow them to match a percent availability with an actual business risk.

In other words, focus on the length of outages as much as, or more than, total lost time. As discussed in previous replies!

- Negotiate requirements for each distinct period ... such as main business hours, critical month-end times, early evening "best efforts" support, etc (I put that phrase in quotes most deliberately) ... and do your calculations separately for each period.

- I have seen the ITIL-style calculations done well, with full consideration of the different formulae for redundant independent components and so on. But the long-term value is only created if you make actual measurements of customer service availability and internal component availability and use them to calibrate and verify your calculations.
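For reference, the usual formulae for independent components are the series and parallel rules. A minimal sketch, assuming independent failures (which shared power, networks or support staff can easily violate); the design and the 'measured' month are hypothetical:

```python
from math import prod


def series(*availabilities):
    """All components are needed: availabilities multiply (independent failures assumed)."""
    return prod(availabilities)


def parallel(*availabilities):
    """Any one component is enough: unavailabilities multiply (independent failures assumed)."""
    return 1 - prod(1 - a for a in availabilities)


def measured(downtime_minutes, scheduled_minutes):
    """Observed availability, for calibrating the prediction against reality."""
    return 1 - downtime_minutes / scheduled_minutes


# Hypothetical design: two clustered servers at 99% each, behind one 99.9% network link.
predicted = series(parallel(0.99, 0.99), 0.999)
observed = measured(downtime_minutes=95, scheduled_minutes=43_200)  # made-up month
print(f"predicted {predicted:.3%}, measured {observed:.3%}")
```

A gap between the two figures is the cue to find out which assumption (independence, component figures or repair times) is wrong.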
BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Tue Feb 03, 2009 11:08 pm

Diarmid wrote:
Where an annual figure is used it should be qualified by other limits over shorter periods. There can be such things as:
- no breaks lasting more than four hours
- 98% availability within any rolling four weeks
- no more than three breaks in any month

These tools keep the service provider 'good' while allowing a little contingent flexibility.



...is the right answer. I don't know of any of the big vendors who don't measure services at least in part on a monthly basis, especially given that service reviews and improvement initiatives need to be 'of the moment'.
BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Tue Feb 03, 2009 11:09 pm

JoePearson wrote:
In other words, focus on the length of outages as much as, or more than, total lost time. As discussed in previous replies!



Hmmmm...not sure I agree. What if the service is up and down like a gigolo's bottom with each downtime period being just a few minutes. In my experience this can be an even worse customer experience.
BorisBear
Senior Itiler


Joined: Mar 10, 2008
Posts: 403
Location: Sunderland

Posted: Tue Feb 03, 2009 11:12 pm

asrilrm wrote:
Referring the first post, it is common to exclude maintenance window from the agreed availability.



Asril, I think you have misunderstood me... I wasn't suggesting that maintenance windows be part of the availability calculation. At the core of my original question was that we have expected failure rates and availability predictions for hardware components, but we don't understand the impact of the availability and capability of support staff to deal with incidents when they occur.
All times are GMT + 10 Hours