Managing service quality is not elusive, but it may seem so if attempted manually. Service quality management requires real-time knowledge of every component involved in providing that service. For a service such as email or order management, this means you must track the status of the routers, switches, LAN and WAN links, interfaces to the LAN and WAN, applications, physical and virtual servers, databases, and so on. This is not a simple task. When something goes wrong with any of these components, it immediately affects at least a portion of the service's users and puts IT under pressure to fix it quickly in order to restore service quality. Is there any way to proactively avoid the impact on service users, and on IT as it copes with unexpected service issues?
I contend that managing risk is fundamental to managing service quality. Without risk management, every service issue will be unexpected, putting your IT staff in constant firefighting mode, scrambling to correct the latest infrastructure issue. In fact, managing risk is the only way to really improve your service quality, because risk management proactively averts many service issues.
Let's look at one way to think about risk management. When an organization is building or adding to its infrastructure, robustness is typically an important consideration. IT organizations build in alternate network paths, primary and secondary routers, redundant servers and server farms, virtual servers, and so forth to avoid those worrisome single points of failure. Redundancy works well - as long as it is preserved - and that's where risk management enters the picture.
For example, if an order management application is served by three servers and one goes down, your risk has just increased substantially, even if service quality remains high and the two remaining servers are able to meet the business requirement. Now the end of the quarter arrives and the two servers can no longer meet the business demand, meaning order management is degraded - at the worst possible time. The robustness originally built into the infrastructure was compromised when the first server failed, and it was never restored. If risk to service delivery is treated as seriously as impact to service quality, the robustness of the infrastructure will be preserved and many instances of service impact will be avoided altogether.
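To make the distinction between service quality and risk concrete, here is a minimal sketch of the three-server example. The `ServerPool` class, its field names, and all capacity and demand figures are hypothetical, chosen only for illustration. They are not taken from any particular infrastructure management product.

```python
from dataclasses import dataclass


@dataclass
class ServerPool:
    """Hypothetical model of the redundant servers behind one service."""
    designed_servers: int     # servers the pool was originally built with
    healthy_servers: int      # servers currently up
    capacity_per_server: int  # orders per hour one server can handle (assumed)

    def capacity(self) -> int:
        """Total orders per hour the healthy servers can handle."""
        return self.healthy_servers * self.capacity_per_server

    def quality_ok(self, demand: int) -> bool:
        """Service quality: can the healthy servers meet demand right now?"""
        return self.capacity() >= demand

    def redundancy_lost(self) -> int:
        """Risk signal: how many of the designed servers are missing."""
        return self.designed_servers - self.healthy_servers

    def at_risk(self, peak_demand: int) -> bool:
        """Risk: would the healthy servers fall short of the expected peak?"""
        return self.capacity() < peak_demand


# Illustrative numbers only: one of three servers is down.
pool = ServerPool(designed_servers=3, healthy_servers=2, capacity_per_server=500)
normal_demand = 900        # assumed everyday load, orders per hour
quarter_end_demand = 1300  # assumed end-of-quarter peak, orders per hour

print("Quality OK today:", pool.quality_ok(normal_demand))
print("Servers of redundancy lost:", pool.redundancy_lost())
print("At risk for quarter end:", pool.at_risk(quarter_end_demand))
```

With these assumed figures, the quality check passes today while the risk check already flags the end-of-quarter shortfall. A monitor that watches only service quality would stay silent until the peak arrives.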
Managing risk means paying attention to the "heads up" you receive from your infrastructure management solution. From a service quality perspective, service outages are more often than not preceded by service degradation, and service degradation is often preceded by a weakening of one or more infrastructure components in the delivery path of that service. With a full view of your infrastructure, you can see the potential points of failure and restore infrastructure integrity before service is affected. Proactively treating the source of problems, rather than treating the symptoms after the fact, is a far better approach to managing service quality.