The Uptime Institute’s 2022 Outage Analysis report warns that downtime costs and consequences are worsening as industry efforts to curb outage frequency are falling short of expectations.
Uptime’s 2022 Data Center Resiliency Survey reveals that 80% of data centre managers and operators have experienced some type of outage in the past three years – a marginal increase over the norm, which has fluctuated between 70% and 80%.
What causes data centre downtime?
Sebastian Krueger, vice president for APAC at Paessler, says availability plays an important role in monitoring large or distributed data centres, and that the availability of business services goes hand in hand with the availability of IT services. CIOs and IT teams are jointly responsible for avoiding downtime at all costs.
“In today’s hyper-competitive environment, not only does downtime lead to annoyed users, but can also quickly escalate to financial and reputational losses for a company. The availability of business services goes in conjunction with the availability of IT services, particularly for Managed Service Providers (MSPs), where customer satisfaction is critical.”
Sebastian Krueger
By the numbers
For context, 99% availability means about 87.7 hours of downtime per year, roughly 3.65 days. When it comes to the availability of websites, servers, and databases, organisations focus on one number: 99.999%, which means just over 5 minutes of downtime per year.
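The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration, not part of Uptime's or Paessler's material: the function name annual_downtime_hours is hypothetical, and it assumes an average year of 365.25 days (8,766 hours), which is what yields the 87.7-hour figure for 99%.

```python
# Assumption: an average year of 365.25 days, i.e. 8766 hours.
HOURS_PER_YEAR = 365.25 * 24


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the downtime budget for the common "nines" levels.
    for level in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(level)
        print(f"{level}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why the gap between 99% and 99.999% is the gap between days and minutes.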
Krueger says that in large data centres it becomes essential to keep a constant watch over a range of different components, across IT and building services engineering, to detect glitches as early as possible and avoid problems, damage, and breakdowns.
“While network failures, hardware or software malfunctions, power outages and human errors have been some of the common causes of data centre downtime, cyber-attacks have also emerged as a formidable challenge today for data centre operators,” he continued.
“Similar to mechanical failures, the threat posed by natural disasters is inevitable. Therefore, understanding your data centre’s geographical location, and the potential risks involved can go a long way to minimising any downtime,” expanded Krueger.
What are the options that organisations use to minimise downtime?
Sebastian Krueger: Predictive maintenance anticipates future problems with IT infrastructure by analysing real-time data from sensors and IoT devices, giving organisations time to identify and work on anticipated risks. It employs technologies such as machine learning to model and analyse that data and optimise the execution process, drastically reducing infrastructure downtime.
In the context of data centre infrastructure, predictive maintenance is generally focused on hardware devices and systems such as cables, generators, and air conditioning systems. After all, data centres work 24/7 and larger data centres do tend to have more hardware equipment that requires regular maintenance. An important part of implementing and maintaining a predictive maintenance programme, however, is a monitoring solution.
By providing a centralised overview of the entire data centre infrastructure, a holistic monitoring solution supports a predictive maintenance programme in monitoring the sensors and IoT devices that provide real-time data.
A monitoring solution, however, is not limited to supporting a predictive maintenance programme. In data centres, monitoring solutions cover all IT components, including external facilities and security, and provide customisable alerts and reporting.
How can network infrastructure monitoring help organisations tackle these challenges?
Sebastian Krueger: A comprehensive network infrastructure monitoring approach can play an effective role in reducing downtime. With the right solution, IT administrators receive customisable notifications so they can act swiftly if a malfunction is impending or has already occurred.
This way, one can intervene immediately, often before the error even arises, to avoid a situation where the technical support team will be inundated with countless calls and emails, adversely affecting overall productivity.
Because downtime leads to financial losses, organisations should ensure that data centre availability is covered in their overall network infrastructure strategy.
Detecting problems at an early stage can prevent data centre downtime. This is where ‘predictive maintenance’ gains relevance: businesses monitor their physical assets to identify likely snags and take corrective action before components malfunction or fail.
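As a rough illustration of the idea, the sketch below fits a straight line to recent sensor readings and estimates how long remains before an alert threshold is crossed. It is a deliberately simple stand-in, not the machine-learning approach Krueger describes and not any vendor's API: the function name hours_until_threshold, the sample readings, and the threshold value are all hypothetical.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float],
    readings: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Estimate hours from the last sample until readings reach the threshold.

    Fits an ordinary least-squares line to (hours, readings) and extrapolates.
    Returns None when the trend is flat or falling, i.e. no crossing is forecast.
    """
    n = len(hours)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two samples of equal length")

    mean_t = sum(hours) / n
    mean_y = sum(readings) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")

    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, readings)) / var_t
    intercept = mean_y - slope * mean_t

    if slope <= 0:
        return None  # readings are not rising, so no crossing is forecast

    crossing_time = (threshold - intercept) / slope
    # A crossing in the past means the threshold is already reached: report 0.
    return max(0.0, crossing_time - hours[-1])


if __name__ == "__main__":
    # Hypothetical inlet temperatures (degrees C) sampled hourly.
    sample_hours = [0.0, 1.0, 2.0, 3.0]
    sample_temps = [20.0, 21.0, 22.0, 23.0]
    remaining = hours_until_threshold(sample_hours, sample_temps, threshold=30.0)
    print(f"Estimated hours until threshold: {remaining}")
```

A real predictive maintenance programme would use richer models and many more signals, but the principle is the same: act on the forecast crossing rather than waiting for the failure.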
What trends do you expect to see in the next few years with Southeast Asia as the new global data centre hotbed?
Sebastian Krueger: The data centre industry will aggressively adopt more climate-friendly practices and approach the climate effort more purposefully. Operationally, operators will embrace sustainable energy strategies that use digital solutions to match energy consumption with 100% renewable energy. AI and machine learning will play a critical role in optimising the performance of networks, contributing to accelerated AI adoption in 2023.
Data centre availability will remain the top priority, even at the edge, but the need for lower latency to support emerging concepts such as healthy buildings, smart cities, distributed energy resources, and 5G will continue to rise. 2023 will also witness the next step in integration as data centres collaborate with providers to optimise the integration of larger systems.
Hosted monitoring solutions for customers who move to the cloud, with all the benefits that entails, will be key to monitoring this hybrid infrastructure. They will deliver the obvious advantages of cost flexibility, which can be mapped to a subscription model, alongside your own managed servers.