Dictionary.com defines downtime as “an interval during which a machine is not productive, as during repair, malfunction, maintenance.” For Gartner, downtime is the total time a system is out of service.
However you define it, downtime is when something isn’t doing what it is supposed to be doing during the period in which it is expected to do it. It reads like a tongue-twister.
Downtime is a recurring issue for many businesses today, whether digital, physical or phygital (a term first coined by Australian agency Momentum).
Even in the digital economy, where redundancy is a fact of life, at least for regulated businesses, downtime remains a clear and present danger to a business’s operations and brand.
Gartner’s Andrew Lerner estimated the average cost of network downtime at about US$300,000 per hour.
FutureCIO spoke to Darius Liu, co-founder and chief operating officer of ADDX, for his take on what downtime means to the chief operating officer (COO).
What is downtime in the digital economy?
Darius Liu: Downtime happens when customers of a digital platform can’t access services during regular service hours. Typically, it means they have trouble logging into their accounts either to retrieve information or to carry out transactions.
All financial institutions regulated by the Monetary Authority of Singapore (MAS) are required to report downtime incidents to the regulator within an hour of discovery. Under MAS guidelines, they must also ensure that total unscheduled downtime for each critical system does not exceed four hours in any rolling 12-month period.
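To make the four-hour limit concrete, here is a minimal Python sketch that checks an outage log against a rolling downtime budget. It is an illustration only, not MAS’s own methodology: it approximates 12 months as 365 days, assumes outages do not overlap, and uses made-up outage data. MAX_DOWNTIME, worst_rolling_downtime and within_budget are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical limits mirroring the four-hour, 12-month figures quoted above.
MAX_DOWNTIME = timedelta(hours=4)
WINDOW = timedelta(days=365)  # assumption: a 12-month period approximated as 365 days


def worst_rolling_downtime(outages):
    """Largest total downtime inside any rolling WINDOW.

    outages: list of (start, end) datetime pairs for unscheduled outages,
    assumed not to overlap. Windows are anchored at each outage start;
    for non-overlapping outages the worst window is found among these anchors.
    """
    worst = timedelta(0)
    for anchor, _ in outages:
        window_end = anchor + WINDOW
        total = timedelta(0)
        for start, end in outages:
            # Count only the part of each outage that falls inside the window.
            overlap = min(end, window_end) - max(start, anchor)
            if overlap > timedelta(0):
                total += overlap
        worst = max(worst, total)
    return worst


def within_budget(outages):
    """True if no rolling window exceeds the hypothetical MAX_DOWNTIME."""
    return worst_rolling_downtime(outages) <= MAX_DOWNTIME


# Illustrative outage log (made-up data).
outages = [
    (datetime(2021, 1, 10, 9, 0), datetime(2021, 1, 10, 11, 0)),  # 2 h
    (datetime(2021, 6, 1, 14, 0), datetime(2021, 6, 1, 15, 30)),  # 1.5 h
    (datetime(2022, 2, 1, 8, 0), datetime(2022, 2, 1, 9, 0)),     # 1 h
]
print(worst_rolling_downtime(outages), within_budget(outages))
```

In this made-up log, the worst window holds 3.5 hours, so the platform stays inside the budget; one more hour-long outage in early 2021 would push it over.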
Why should the COO care about downtime?
Darius Liu: Customers must trust a company before they will use its services. They often have to hand over money to make a purchase or to store, send or invest their funds.
Downtime can cause customers a lot of anxiety, leaving them wondering what is happening to their money or whether a service will be delivered. It can seriously damage customer trust and, by extension, the firm’s brand and reputation.
For financial institutions in the wealth and investment sector, keeping disruptions to a minimum is even more critical, because time is of the essence in the financial markets.
An inability to buy or sell securities at the desired moment could result in real financial losses. It could also put a financial institution’s licence at risk. Moreover, wealth and investment platforms are essentially built on a foundation of trust: no user is going to participate in a platform that is not robust.
For these reasons, it is important to leave no stone unturned in our efforts to build a strong, secure and resilient tech system and to provide uninterrupted services to investors on our platform.
Is the COO any less accountable in the digital sharing economy – particularly where the use of microservices is the norm?
Darius Liu: The outsourcing of a service or a microservice to a vendor does not in any way absolve a company of its responsibility to customers and regulators. The buck stops with the company providing the service, not its vendors.
Companies must carry out proper due diligence on vendors and service providers. They should not outsource blindly and should instead maintain a clear understanding of the vendor’s technology and back-end processes to formulate a downtime recovery plan.
Companies should also maintain redundancy and develop in-house competencies to complement the work of vendors.
Given the greater use of public cloud services to run business-critical applications, especially since COVID-19, how does the COO mitigate the risk of downtime when the infrastructure is outside their control?
Darius Liu: Even though in theory the use of a cloud service places the infrastructure outside your company’s control, we take the view that, all things considered, it lowers the risk of downtime, because a cloud computing provider operates at a much larger scale.
It hires a bigger and more specialised team to maintain its servers and can provide its clients with access to servers in different geographical areas, to reduce risks tied to local factors.
To mitigate risks further, we run crisis simulation tests at regular intervals to ensure our teams know exactly what recovery steps need to be taken if access to cloud services is disrupted. We also have robust backup and recovery protocols to mitigate worst-case scenarios.
What questions should a COO ask their cloud provider about downtime?
Darius Liu: The number of availability zones your company’s platform will operate from is an important issue you need to discuss with your cloud provider. In cloud computing, an availability zone refers to a cluster of servers located in the same geographical area.
The cluster is independent and isolated from other availability zones in terms of power source and network connectivity. We sign up for multiple availability zones. This costs more, but it means that if one zone goes down, the platform can continue running from another zone.
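The benefit of running from several zones can be shown with simple probability. The sketch below assumes, optimistically, that zones fail independently, and uses a hypothetical per-zone availability of 99.5%. Zones within one region can share dependencies, so treat the result as a best case rather than a figure any provider guarantees.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one zone is up, assuming zones fail independently."""
    return 1 - (1 - zone_availability) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year during which no zone is available."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical per-zone availability of 99.5%.
for n in (1, 2, 3):
    a = combined_availability(0.995, n)
    print(f"{n} zone(s): ~{expected_downtime_hours(a):.2f} hours of downtime per year")
```

With these made-up inputs, a single zone implies roughly 44 hours of downtime a year, while two independent zones bring it down to about 13 minutes. That gap is what the extra cost of a second zone buys.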
Downtime at exchanges still happens. What can we learn from your experience managing downtime?
Darius Liu: One important principle is the need to reduce concentration risk through redundancy. That is why we operate from multiple separate cloud service availability zones. But the cloud service isn’t the only service we need.
There are many other services required to operate an exchange. One example is the use of SMS providers to enable two-factor authentication for account log-ins. In line with the principle of redundancy, we retain the ability to switch between multiple SMS providers.
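Switching between SMS providers can be sketched as a priority list tried in turn. This is a hypothetical Python illustration, not ADDX’s implementation. SmsSender, send_otp, the stand-in providers and the placeholder phone number are all made up, and a real integration would call each vendor’s own API.

```python
from typing import Callable, List, Sequence, Tuple


class SmsDeliveryError(Exception):
    """Raised when no configured provider could deliver the message."""


# Hypothetical provider interface: sends (phone, text), raises on failure.
SmsSender = Callable[[str, str], None]


def send_otp(phone: str, code: str, providers: Sequence[Tuple[str, SmsSender]]) -> str:
    """Try each provider in priority order; return the name of the one that succeeded."""
    errors: List[str] = []
    for name, send in providers:
        try:
            send(phone, f"Your login code is {code}")
            return name
        except Exception as exc:  # a real system would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise SmsDeliveryError("; ".join(errors))


# Stand-in providers for illustration only.
def primary_provider(phone: str, text: str) -> None:
    raise TimeoutError("primary gateway timed out")


delivered = []


def backup_provider(phone: str, text: str) -> None:
    delivered.append((phone, text))


used = send_otp("+6500000000", "123456", [("primary", primary_provider), ("backup", backup_provider)])
print(used)
```

Keeping the provider order in configuration rather than code would let operations staff demote a failing vendor without a redeploy.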
If you are ever curious about the cost of downtime, a Data Foundry blog post offers two formulas, one for lost productivity and one for lost revenue. See if they work for you.
Productivity cost = E x % x C x H
E = number of employees affected
% = percentage they are affected
C = average cost of employees per hour
H = number of downtime hours
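Because the productivity formula is a straight product, it is easy to script. Below is a minimal sketch with a hypothetical helper and made-up inputs. It expresses the percentage affected as a fraction.

```python
def productivity_cost(employees: int, share_affected: float, cost_per_hour: float, hours: float) -> float:
    """Productivity cost = E x % x C x H, with % given as a fraction (0.5 for 50%)."""
    return employees * share_affected * cost_per_hour * hours


# Made-up inputs: 200 staff, half of them affected, US$40 per hour, 3-hour outage.
print(productivity_cost(200, 0.5, 40.0, 3))  # 12000.0
```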
Revenue loss = (GR/TH) x I x H
GR = gross annual revenue
TH = total annual business hours
I = percentage impact
H = hours of downtime
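The revenue formula works out an hourly revenue figure first and then scales it by the impact and the outage length. Below is a minimal sketch with a hypothetical helper and made-up inputs. It treats the impact term (I, the percentage impact) as a fraction.

```python
def revenue_loss(gross_annual_revenue: float, annual_business_hours: float,
                 impact: float, hours: float) -> float:
    """Revenue loss = (GR / TH) x I x H, with I given as a fraction (0.5 for 50%)."""
    hourly_revenue = gross_annual_revenue / annual_business_hours
    return hourly_revenue * impact * hours


# Made-up inputs: a 24/7 platform (8,760 business hours a year) earning
# US$8.76 million a year, with half of its revenue affected for 3 hours.
print(revenue_loss(8_760_000, 8_760, 0.5, 3))  # 1500.0
```

For an always-on digital platform, TH is the full 8,760 hours in a year. A business that only trades during office hours would use a much smaller TH, which raises the hourly revenue figure.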