NOTE: ® ITIL is a registered trademark of OGC. This portal is totally independent and is in no way related to them. See our Feedback Page for more information.
Posted: Fri Feb 13, 2009 10:56 pm Post subject: SLA% translated to OLA%
Hello guys,
It's me again.
Just wondering how you think about this.
I recently implemented SLA for business critical application in a global pharmaceutical company.
They decided to start with the SLA and later on OLA.
Unfortunately they ran out of budget after the SLA so the project is frozen.
Now to get to the point. Imagine you agree on an application availability KPI of 97% (excluding the service window and planned outages). On a yearly basis, not taking the service window into account, this means your application is allowed to be down for about 11 days (which is a lot!!).
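A quick way to check the 11-day figure is the minimal Python sketch below. The function name `allowed_downtime_days` is made up for illustration, and it assumes a 365-day year with nothing deducted for service windows.

```python
def allowed_downtime_days(availability_pct, days_in_period=365):
    """Return the unplanned-downtime budget, in days, for an availability target."""
    return (1 - availability_pct / 100) * days_in_period


# 97% over a 365-day year leaves a budget of just under 11 days.
print(round(allowed_downtime_days(97), 2))  # 10.95
```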
If you translate this into OLA this means your application is running on a server, probably located in a datacenter and networked to the sites.
There you have 4 OLA groups.
Imagine the datacenter disaster recovery plan mirrors the particular server and takes over activities immediately; that leaves 3 parties.
Will you divide the 3% downtime by 3 so each OLA has an availability target of 99%? To me this sounds logical, but is it achievable? Is this the way to work in practice? Simple as 1+1.
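To see how far an equal split gets you, here is a rough Python sketch. It assumes the three parties form a serial chain (the service is up only when all three are up) and that their failures are independent; both are assumptions, not something stated in the post. `equal_share` is a hypothetical helper name.

```python
def equal_share(target, parties):
    """Per-party availability that exactly meets `target` across a serial chain."""
    return target ** (1 / parties)


# Three parties at 99% each, multiplied together, just about reach 97%.
print(round(0.99 ** 3 * 100, 2))             # 97.03
# The exact equal share needed for a 97% end-to-end target.
print(round(equal_share(0.97, 3) * 100, 2))  # 98.99
```

So under these assumptions, 99% per OLA is roughly consistent with a 97% SLA, with almost no margin left over.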
thanks for sharing your experience (o; _________________ Misuno
Green ITIL V1 - V3
Joined: Oct 07, 2007 Posts: 441 Location: Jakarta, INA
Posted: Fri Feb 13, 2009 11:39 pm Post subject:
Hi,
I think you need to clarify what you mean by application availability if you only measure server availability.
If you mean a centralized application used by many clients, then availability is more complicated than merely the server's availability.
You will also have to calculate the availability of network hub/switch, router(s), communication line, then the workstation. The availability is the product of all availability of those nodes.
Anyway, if it is the server's availability that counts, then I guess any approach would do.
In my experience, I would keep SLA and OLA as separate measurements, meaning that I wouldn't tie the 3% tolerated unplanned downtime to any specific OLA.
The availability in my SLA is for a business critical application. From a customer perspective, assuming an end-to-end service, this means my application is working.
It is in most cases a centralised application running in a datacenter used in different continents.
What you wrote is interesting: "The availability is the product of all availability of those nodes." I would translate this into what I earlier asked, dividing the downtime in the different OLA's of the nodes?
thank you (o; _________________ Misuno
Green ITIL V1 - V3
Joined: Oct 07, 2007 Posts: 441 Location: Jakarta, INA
Posted: Sat Feb 14, 2009 12:02 am Post subject:
I don't know but I think it's not as simple as dividing 3% by 3.
For example:
- The server's availability is 99.9%
- The switch's availability is 99.8%
- The router's availability is 99.9%
- The communication line's availability is 98% (this usually is an external provider)
Taking all those, the availability at the end user would be 99.9% x 99.8% x 99.9% x 98% equals ...
This implicitly means that an end user could experience unavailability because either the server or the switch or the router or the comm line is not available.
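The product above can be worked out with a few lines of Python. This is only a sketch using the four example figures from the list; it assumes the nodes sit in series and fail independently, and the dictionary keys are just labels.

```python
import math

# Example availabilities from the list above, as fractions.
nodes = {"server": 0.999, "switch": 0.998, "router": 0.999, "comm line": 0.98}

# A serial chain is up only when every node is up, so multiply them.
end_user = math.prod(nodes.values())
weakest = min(nodes, key=nodes.get)

print(f"{end_user:.2%}")  # 97.61%
print(weakest)            # comm line
```

Note that the end-user figure ends up below the weakest single node, here the external comm line.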
indeed!
so when you promise your customer 97% availability for a certain application, you must be sure that all the OLAs underpinning the service are in line with your agreement.
Especially because your customer is not interested in the server/network availability, only the end product.
This is not new and therefore I am curious about how others are streamlining this. You cannot set up an SLA with greater availability than your OLAs combined can deliver, I guess...
Joined: Sep 16, 2006 Posts: 3590 Location: London, UK
Posted: Sat Feb 14, 2009 12:44 am Post subject:
All
An OLA is an Operational Level Agreement
An SLA is a Service Level Agreement
A KPI is a Key Performance Indicator
Availability statistics depend upon MTTR (mean time to restore), MTBF (Mean time between faults)
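The post does not give a formula, but the commonly used steady-state relation is availability = MTBF / (MTBF + MTTR). A minimal Python sketch, with made-up example hours that are not from this thread:

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between faults and mean time to restore."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: a fault every 970 hours on average, 30 hours to restore.
print(f"{availability(970, 30):.2%}")  # 97.00%
```

The same 97% can come from rare but long outages or frequent short ones, which is why MTBF and MTTR are usually reported alongside the percentage.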
Mizuno,
The availability statistics that can be attributed to your company / service for your application ends at the point where the application depends on the PUBLIC internet.
If the application is used inside a private network that is under the control of the company (an intranet, whether it is one site or a global private network), then you can provide end-to-end KPIs, availability and SLAs based on the OLAs that the operational teams state they can support.
But the internal private network would still depend, if it spans more than one location, on external vendors like telcos etc. The OLAs for network support would then depend on the SLAs that the telco has contractually committed to, adjusted for whether you are using a single source for telco (BAD!!! Single point of failure) and whether the telco is giving you divergent paths for circuit pairs (primary down one route, secondary down the other).
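To illustrate why a single circuit is a weak spot, the sketch below compares one line with a redundant pair. It assumes the two circuits fail independently, which is what divergent paths are meant to approximate; the 98% figure is only an example value and `parallel` is a hypothetical helper name.

```python
def parallel(availabilities):
    """Availability of redundant paths: down only when every path is down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1 - a)
    return 1 - all_down


print(f"{parallel([0.98]):.2%}")        # 98.00%
print(f"{parallel([0.98, 0.98]):.2%}")  # 99.96%
```

If both circuits share the same duct or the same telco fault domain, the independence assumption breaks and the real figure will be closer to the single-line case.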
Now let us use the 97% availability. If the application is available on a 24x7 basis, then every day the service must be available for 1396.8 of 1440 minutes, which allows 43.2 minutes of downtime per day.
So the monitoring tool would gather statistics on a daily basis on the availability of the application.
There lies the rub... What is the monitoring tool going to monitor? If the application is a web-based front-end application, where will the monitoring be done? Monitoring from inside the corporate network environment would be one set of monitoring tools. This would be useful for monitoring internally available tools... but
What if the web-based application is available not only via the intranet but also via the public internet, and only in certain countries?
Well then, to get accurate availability monitoring you would have to use monitoring that can fake connectivity from the major ISPs in every country where your site is available. Most ISPs do sell this kind of service, and then you can monitor the sites.
So you do that for 3 countries (Japan, UK & USA) with 3 ISPs per country.
For example, how would you calculate the availability
For one day, 12 February 2009, the report of the number of minutes the site was available (one column per ISP):
        ISP 1   ISP 2   ISP 3
JAPAN   1360    1400    1410
USA     1325    1440    1400
UK      1300    1400    1405
This is what the monitoring tools report.
So based on the SLA that the site is available for 1396 minutes per day, what is the site availability ?
Trick question..... based on the above stats and the fact that all users can go to all countries.... the site is up 100%.. because your SLA does not differentiate between countries or ISPs
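The gap between the two readings can be shown with a small Python sketch over the reported minutes. The per-probe reading and the "any path counts" reading are my labels, not terms from the SLA, and the 97% threshold is the one used earlier in this thread.

```python
MINUTES_PER_DAY = 24 * 60
SLA_TARGET = 0.97

# Minutes available on 12 February 2009, three ISP probes per country.
minutes_up = {
    "JAPAN": [1360, 1400, 1410],
    "USA": [1325, 1440, 1400],
    "UK": [1300, 1400, 1405],
}

probes = [m for row in minutes_up.values() for m in row]
best = max(probes) / MINUTES_PER_DAY    # "any path counts" reading
worst = min(probes) / MINUTES_PER_DAY   # strictest per-probe reading
meeting = sum(m / MINUTES_PER_DAY >= SLA_TARGET for m in probes)

print(f"best {best:.2%}, worst {worst:.2%}, {meeting} of {len(probes)} probes meet the SLA")
# best 100.00%, worst 90.28%, 6 of 9 probes meet the SLA
```

Same data, three different answers, depending on what the SLA says is being measured.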
food for thought
So what is the availability _________________ John Hardesty
ITSM Manager's Certificate (Red Badge)
Change Management is POWER & CONTROL. /....evil laughter
Yes ukviking and asril, I thought about the end point of our (I say our as we all serve IT) services as well. Imagine the electric circuit goes down in one of your sites: it will not be counted in your SLA's unplanned downtime. Same as the public internet example. That is clear and no concern.
The example of an outsourced partner is again a nice point. (I love working in outsourced environments btw.) Imagine you have a UC with a network provider offering an availability of 96%: you will never be able to promise your customer a higher availability unless you want to pay penalties or become an untrustworthy service provider.
Gosh, your last point drives me crazy. How will you ever make an H2Oproof SLA if you think like this? One site is down but the other one is up and running: does it count against your availability contract, or will you subtract the share this site represents as a % of the total? And more than this, imagine this site is the headquarters...
To me all these holes and open questions only show that an SLA is a kind of symbolic agreement between service provider and customer (in a single organisation), which remains open to interpretation. Can you call this best practice or good practice?
Therefore I stay with my initial thought to make sure OLAs and UCs are in line with SLAs until someone comes up with a genuine idea. hihi
enjoy your weekend and if you also have st. valentine like we have in belgium enjoy a lovely valentine. _________________ Misuno
Green ITIL V1 - V3
Joined: Oct 07, 2007 Posts: 441 Location: Jakarta, INA
Posted: Sat Feb 14, 2009 9:44 am Post subject:
I've said earlier that it's not simple.
Add to that what Viking said: many service parameters are beyond our control, which makes an SLA hard to maintain.
Therefore we usually agree SLA parameters with our customer up to a point where we (as the provider) still have control.
Using your case, we usually propose that availability be measured up to the router, not including the comm. line. That way, no matter how geographically spread your users are, the SLA can still be maintained.
It worked fine along with other parameters (MTBF, MTTR, response time, etc).
That is a fine approach.
I believe at the next introduction or discussion with customers on % availability I would bring up the points that are open to interpretation, but implement availability up to a general point, still symbolic.
I don't believe anyone was ever judged on being 0.01% under the agreed service availability, but rather on services becoming unworkable or unavailable.
Thanks all for the discussion and sharing of your ideas and knowledge.