CA Community







Disasters, they DO happen…

Published: July 29 2010, 09:09 AM
by Marcel den Hartog

The world has watched the BP oil spill unfold over the past weeks. We all have our opinions about it and, like most others, I tried to look at it from a professional point of view.

Risk management is something we all have to deal with, either directly or indirectly. And most of us are aware of the importance of IT to the companies we work for. But how well thought through is our risk management really? Do we plan for the unthinkable? Or do we just plan for the things we know DO happen from time to time? To use the example of BP: if they had really thought of the unthinkable, they would have drilled a second pipe in every well deeper than "n" feet/km. Would BP's management have accepted the extra costs? Would the shareholders have been happy with the lower dividend that would result? Did anybody ever even think about that?

Let me use another example: about 5 years ago I presented to an audience of 100+ Oracle DBAs. All Unix & Windows people. And I asked a simple question: "How many of you have ever deleted a production database by accident?" More than 80% raised their hands (but only after I admitted that it had happened to me as well ....).
And when I asked if they had changed the security settings on the production systems so they would be prevented from doing it again, only 2 kept their hands up. Think about the effect and the costs of a deleted production database. Think about the low cost of setting the right access levels and taking away the "God mode" superuser access from people who should not even be on production systems.... Has this changed in the past years? I don't think so. As a matter of fact, I know it has not. And this is a risk we can all think of off the top of our heads.....

There is an important lesson for all of us in the oil spill example. Not just about thinking about the unthinkable, but also in the way the disaster was handled. We have all experienced a disaster in our careers. And most of us have been lucky enough to see more than one. We have probably implemented all kinds of stuff to prevent THAT particular disaster from happening again, but what have we implemented to handle new disasters in a more efficient and streamlined way? We are all technical people and we tend to focus on technical things that solve technical problems. We often forget that a lot (if not most) of the time spent on solving a disaster is not spent on technical issues, but on the panic that occurs immediately after the discovery and on the unclear responsibilities of everybody involved.

Here are some hints and tips from the field:

1. Analyze past disasters, and see how you can create a playbook that deals with the human part of disasters. Who makes the decision to roll the production database back by 24 hours if that is necessary? Answer that at the production level, the middle management level and the executive management level. Who informs whom, and who approves what?

2. Think of the unthinkable by networking. Both internal and external!! These days, we spend less and less time going to conferences, listening to webcasts and talking to peers. And I have seen brilliant presentations with lots of "aha" moments when peers or other specialists described a disaster they were confronted with and how they solved it. Implementing the lessons learned by others is cheap and often closer to reality than anything you can come up with yourself.

3. Work with the financial people to figure out the cost of a disaster. 1 hour, 2 hours, 1 day of downtime? How much money, how many customers and how much goodwill does the company lose in case of a major hiccup? I am sure that if someone at BP had told an executive that drilling a second pipe as a precaution could save the company $6 billion if disaster struck, at least a few people would have given it some serious thought.....

4. Test. TEST!!! With budgets being constrained almost everywhere, testing is among the first things we drop. It takes too much time, it is too expensive. Once you know the COST of downtime, it becomes easier to defend the things you need to do to prevent it. I still remember a new CEO discussing a DR testing plan with a Mainframer:

a. CEO: "Has it ever happened?" (answer: "No")

b. CEO: "How much will it cost us if it DOES happen?" (answer: "I don't know")

c. CEO: "How much does it cost to prevent it from happening?" (very detailed answer...)

d. CEO: "Forget it, too expensive!" Literally a 5-second conversation on a topic that could bring the company to bankruptcy in less than a day.
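To make tip 3 and the CEO conversation concrete, here is a minimal sketch of the arithmetic in Python. Everything in it is an assumption for illustration only: the class and function names, the cost per hour, the outage length and the probability are made up, and your financial people will have a better model. The point is simply that question b gets a number, so that question c can be compared against something.

```python
from dataclasses import dataclass


@dataclass
class DowntimeScenario:
    """One disaster scenario. Every figure is a hypothetical estimate."""
    name: str
    hours_down: float          # how long the business is affected
    cost_per_hour: float       # lost revenue, penalties, goodwill (currency units)
    annual_probability: float  # estimated chance of this happening in a given year

    def impact(self) -> float:
        """Cost if the disaster DOES happen (the CEO's question b)."""
        return self.hours_down * self.cost_per_hour

    def expected_annual_loss(self) -> float:
        """Impact weighted by how likely the disaster is in one year."""
        return self.impact() * self.annual_probability


def worth_preventing(scenario: DowntimeScenario, annual_prevention_cost: float) -> bool:
    """True when prevention costs less per year than the expected loss."""
    return annual_prevention_cost < scenario.expected_annual_loss()


if __name__ == "__main__":
    # Made-up numbers: a deleted production database, one day of downtime.
    scenario = DowntimeScenario(
        name="Production database deleted",
        hours_down=24,
        cost_per_hour=50_000,
        annual_probability=0.05,
    )
    dr_test_cost = 40_000  # hypothetical yearly cost of DR testing

    print(f"Impact if it happens: {scenario.impact():,.0f}")
    print(f"Expected loss per year: {scenario.expected_annual_loss():,.0f}")
    print(f"DR testing worth it: {worth_preventing(scenario, dr_test_cost)}")
```

An expected-loss comparison like this understates the unthinkable: a disaster that bankrupts the company in a day deserves attention even at a tiny probability, so treat the result as a conversation starter rather than a verdict.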

Let me leave you with a positive thought. IBM's new zEnterprise Mainframe is designed to "expand the mainframe's role in coordinating management of other, distributed systems that have surrounded it in the modern datacenter". Good news for Mainframers because our DR procedures are unsurpassed, and good news for the "other" systems because they can now benefit from the more integrated and powerful Mainframe management capabilities.

 


 

By: Marcel den Hartog
Marcel den Hartog is Principal Product Marketing EMEA for CA Technologies Mainframe solutions. In this role, he is a frequent speaker at both internal (customer) and external events, where he talks about CA Technologies mainframe strategy, vision and market trends. Marcel joined CA Technologies in...