February 18, 2013

High-Level Framework: System/Technology/Application Recovery

Filed under: Disaster Recovery — houtkin @ 8:47 am

In the perfect DR world, every technology/system/application would go through four levels of testing before going into production, and would have an architectural/design document, an as-built design, an operations model and a failover process, assuming you are lucky enough to have the staff and bandwidth to do this work. Reality dictates that this is not always possible, but we cannot expect to recover an application/system/technology without understanding the basics: the business requirement for (and use of) the application/technology/system and its criticality to the business; the enterprise architecture, the architecture of the technology and how it fits the enterprise architectural precepts; and, finally, how to successfully recover the system/application/technology.

So, the basics of a framework are an understanding of:
1. The business process that is manifested through the technology/system/application;
2. The RTO of the business process and the system/technology/application;
3. The application/system/technology architecture and how it integrates into the overall architecture of the technical environment;
4. The operational model - and how the system/technology/application is maintained.

Recovery does not necessarily mean a failover; a failover is needed only when the time to recover surpasses the RTO agreed to with the business. Every system recovery requires:
1. architectural design document;
2. as-built document;
3. operations model and related processes/procedures;
4. recovery processes/procedures;
5. testing scripts for both the infrastructure (server, OS, database) and application levels.
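The failover rule above can be expressed as a simple check. The sketch below is a minimal illustration in Python, assuming you already have estimates for how long an in-place recovery would take (rebuild/restore, plus running the test scripts); the class, field and function names are hypothetical and not part of any standard tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryEstimate:
    # Hypothetical record for one application/system/technology.
    name: str
    rto: timedelta           # RTO agreed to with the business
    restore_time: timedelta  # estimated time to rebuild/restore in place
    test_time: timedelta     # estimated time to run infrastructure and application test scripts

    @property
    def in_place_total(self) -> timedelta:
        # Recovery is not complete until testing confirms it, so both phases count.
        return self.restore_time + self.test_time


def recovery_action(est: RecoveryEstimate) -> str:
    # Fail over only when recovering in place would overrun the RTO.
    if est.in_place_total <= est.rto:
        return "recover in place"
    return "fail over"


if __name__ == "__main__":
    # Illustrative numbers only.
    payroll = RecoveryEstimate("payroll", rto=timedelta(hours=8),
                               restore_time=timedelta(hours=3), test_time=timedelta(hours=1))
    trading = RecoveryEstimate("trading", rto=timedelta(minutes=30),
                               restore_time=timedelta(hours=2), test_time=timedelta(minutes=30))
    for est in (payroll, trading):
        print(est.name, "->", recovery_action(est))
```

Counting `test_time` toward the total is deliberate: an application that has been restored but not yet tested is not recovered.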

Other considerations:
-What is recovered: the application/technology/system AND its data? If data, what is the RPO, and can your recovery methodology meet that expectation?
-What up/down-stream technical dependencies are impacted by the outage, and then by the recovery, of the technology/system/application?
-What core infrastructure comprises the application/system/technology, and which application-level procedures require failover (or not)? In other words, based on what "goes down", what is the path of least technical resistance to meeting the RTO?
-What skill set is required to recover the application at the various levels (infrastructure/database/application)?
-Recovery methodology: do you recover in isolation and then integrate into production, etc.
-Security requirements during recovery and integration back into production, e.g., access control, vulnerability management, etc.
-What is the recovery SLA with the vendor, if the system/technology/application is third-party or managed?
-What is the agreed-to scope of work with the vendor, if the system/technology/application is third-party or managed?
-What policies are in place (or not) to handle recovery?
-Governance: who determines that the application/system/technology has been fully recovered?
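Two of the considerations above lend themselves to a quick check: whether the backup/replication approach can meet the RPO, and in what order dependent systems should be brought back. The Python sketch below is illustrative only; the function names and the example environment are hypothetical, and the RPO check rests on a stated simplifying assumption about worst-case data loss. The dependency ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from datetime import timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def meets_rpo(rpo: timedelta, recovery_point_interval: timedelta,
              copy_lag: timedelta = timedelta(0)) -> bool:
    # Simplifying assumption: worst-case data loss is one full interval between
    # recovery points (backups, snapshots) plus any lag in copying them off-site.
    worst_case_loss = recovery_point_interval + copy_lag
    return worst_case_loss <= rpo


def recovery_order(depends_on: dict[str, set[str]]) -> list[str]:
    # Each key maps a system to the systems it depends on (its upstream dependencies).
    # The result places every system after all of its dependencies.
    # graphlib raises CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical environment: names are illustrative only.
    deps = {
        "order-entry": {"database", "directory"},
        "database": {"storage"},
        "directory": set(),
        "reporting": {"order-entry"},
    }
    print(recovery_order(deps))
    print(meets_rpo(rpo=timedelta(hours=1), recovery_point_interval=timedelta(minutes=15)))
    print(meets_rpo(rpo=timedelta(hours=1), recovery_point_interval=timedelta(hours=24)))
```

A `CycleError` from `recovery_order` is itself a useful finding: circular up/down-stream dependencies suggest the systems involved need a joint recovery procedure worked out in advance.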

Off the cuff - this is a baseline idea from the technical side.
