Incident Closure in a High Availability Environment
Posted: Fri Nov 28, 2008 5:39 am Post subject: Incident Closure in a High Availability Environment
At our office we have an ongoing debate about incident closure in a high-availability environment. Here is the example we are using:
There are two servers, A and B. Server B is the redundant server for Server A. Server A fails and Server B picks up the role of Server A in short order.
We agree that an incident is opened for the issue. But since Server B is up and running, should the incident be closed, or should it remain open until Server A is back online?
The argument for keeping the incident open is that, in a redundant configuration, extra expense was incurred to establish a "fault tolerant service", so it is a premium-cost service intended to guarantee a high level of availability. While end-user services have been re-established, the fault-tolerant service is not restored until the failed server is restored. There is outstanding risk while the primary server is still down, and customers are not getting what they paid for.
The argument for closing the incident is that the service has been restored. While IT is concerned about restoring redundancy, to the user there is no incident. The root cause and the eventual return to service of Server A would be handled through problem management.
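To make the two positions concrete, here is a minimal Python sketch of the Server A / Server B example. It is only an illustration: the names `ServerPair`, `ClosurePolicy`, and `can_close_incident` are hypothetical and do not come from ITIL or from any service desk tool.

```python
from dataclasses import dataclass
from enum import Enum


class ClosurePolicy(Enum):
    # Hypothetical labels for the two sides of the debate.
    SERVICE_RESTORED = "service_restored"        # close once users are served again
    REDUNDANCY_RESTORED = "redundancy_restored"  # close only when fault tolerance is back


@dataclass
class ServerPair:
    primary_up: bool  # Server A
    standby_up: bool  # Server B

    def service_available(self) -> bool:
        # Users are served as long as at least one server is running.
        return self.primary_up or self.standby_up

    def fault_tolerant(self) -> bool:
        # The premium "fault tolerant" service needs both servers running.
        return self.primary_up and self.standby_up


def can_close_incident(pair: ServerPair, policy: ClosurePolicy) -> bool:
    """Return True if the incident may be closed under the given policy."""
    if policy is ClosurePolicy.SERVICE_RESTORED:
        return pair.service_available()
    return pair.fault_tolerant()


# State right after failover: Server A is down, Server B has taken over.
after_failover = ServerPair(primary_up=False, standby_up=True)
```

With `after_failover`, `can_close_incident` returns `True` under `SERVICE_RESTORED` and `False` under `REDUNDANCY_RESTORED`. That is the whole disagreement: which of the two conditions counts as "service restored" for a service sold as fault tolerant.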
Joined: Mar 04, 2008 Posts: 1883 Location: Newcastle-under-Lyme
Posted: Fri Nov 28, 2008 6:14 pm Post subject:
I would say quite firmly that the service has not been restored until it is once again fault tolerant. It is incident management that will drive the restoration of the second server with due speed.
Restoring the second server will allow you to close the incident, and this does not have to wait for root cause analysis to kick in unless you simply cannot get the server recovered.
In fact, it remains a separate decision whether to raise a problem or not. That depends on what your policy is and how well you understand the cause of the incident.
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718