You can only do so much to ensure that your solution will work, but it must work to support the business. Keep in mind that 1) the scenario can never be planned for accurately (we are not fortune tellers, and we cannot control disasters or how they manifest); and 2) businesses and their priorities change, which affects the technical side of our work.
What I have found, however, is that there are some fundamentals that, if considered when designing, implementing and validating the solution, give it a consistent level of integrity and help you answer your question in a positive way.
1. A clearly written and agreed-to definition of what a successful disaster recovery event means to the business. This should be reviewed twice a year, because even management goes through reorganizations.
2. A clear understanding of what the business defines as mission critical, and of the technology that supports those services and applications.
3. Enterprise architecture. The ability of the DR solution to integrate into the existing architecture, or to adhere to its architectural precepts, helps cut down the risk that the solution will not work.
4. The architecture of each application and its components: any problems integrating them into the overall architectural environment, and the extra steps required to ensure that they can fail over together in order to meet the RTO/RPO.
5. An extremely detailed failover plan, minute by minute and technology by technology, including all processes and procedures to fail over and fall back, that becomes the fundamental training source and guideline for a disaster. It must be reviewed quarterly and tested multiple times during the year, both as walkthroughs and through real testing. It should also be audited once by an outside organization and updated for all new or upgraded technology integrated into the environment.
6. The skill set of the staff required to support the DR solution and how often they are trained in the process, including consideration of outsourcing for the event if staff are not available.
7. A business continuity plan for the IT department, to ensure that disaster recovery teams with primary/secondary responsibilities are identified and have practiced their roles. Remember that IT representatives are people too and need to be considered in planning, in the same way that the business is.
8. All third-party software/carrier/infrastructure contracts are up to date and define roles/responsibilities for systems/technology/applications during a DR event, as well as how the vendors plan to handle their own DR event and their notification plans, so you are aware of their issues before they become a problem for your business.
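As a rough illustration of points 4 and 5, a minute-by-minute failover plan can be checked against the agreed RTO by adding up the estimated duration of each step. The sketch below is only that: the step names, durations and the 120-minute RTO are made-up values, and it assumes the steps run strictly one after another (steps that can run in parallel would need a different calculation).

```python
from dataclasses import dataclass


@dataclass
class FailoverStep:
    """One line of a hypothetical failover plan."""
    name: str
    minutes: int  # estimated duration of this step, in minutes


def total_minutes(steps):
    """Total plan duration, assuming the steps run sequentially."""
    return sum(step.minutes for step in steps)


def meets_rto(steps, rto_minutes):
    """True if the sequential plan fits within the recovery time objective."""
    return total_minutes(steps) <= rto_minutes


# Illustrative values only; a real plan comes from your own walkthroughs and tests.
plan = [
    FailoverStep("Declare disaster and notify teams", 15),
    FailoverStep("Promote standby database", 30),
    FailoverStep("Redirect network traffic", 20),
    FailoverStep("Validate applications", 40),
]
RTO_MINUTES = 120  # assumed business-agreed RTO

print(f"Plan takes {total_minutes(plan)} minutes; "
      f"meets RTO: {meets_rto(plan, RTO_MINUTES)}")
```

Re-running a check like this after each quarterly review or test, using the durations actually measured, shows quickly whether a new or upgraded technology has pushed the plan past what the business agreed to.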
Most importantly, one thing that I learned from 9/11 is that until your approved DR solution is in place, you have to identify temporary solutions that are agreed to by the business. You cannot build out technology overnight. However, you can have agreements for temporary solutions in place should an event occur while your overall solution is being built.