Initial reports from BA pointed to a power surge at a data centre at Heathrow that shut down servers, and the backup systems failed to kick in. This superficial description raised a lot of questions that are slowly being answered.
“There was a loss of power to the UK data centre which was compounded by the uncontrolled return of power which caused a power surge taking out our IT systems. So we know what happened we just need to find out why,” said BA after its initial statements about a power surge. “It was not an IT failure and had nothing to do with outsourcing of IT, it was an electrical power supply which was interrupted.”
The power supplier, UK Power Networks, has however categorically denied that there was a power surge. The BA statement points to problems with the power sequencing when starting up systems, although whether these were the main servers or the backup is unclear. This led to the messaging systems being compromised so that systems could not communicate, leading to the cancellation of all BA flights around the world on Saturday afternoon and Sunday.
Since then, staff at the data centre have pointed to the infrastructure as the problem. While the servers and power systems were upgraded, the cooling systems had not kept up. This led to temperature spikes, with servers and power supplies overheating and shutting down. This would be more consistent with the explanations given by BA, and also with the power sequencing problems if some systems did not respond. It raises questions about the disaster recovery strategy though. It is possible the backup systems were in the same data centre and so suffered from the same infrastructure problem. If the backup was offsite and was hit by the same problem, that suggests the specification of the cooling system itself was at fault.
The company has now commissioned a detailed report on what actually happened. “We are undertaking an exhaustive investigation to find out the exact circumstances and most importantly ensure that this can never happen again,” said the company.
What this also highlights is the increasing need for intelligent power systems that monitor not only the current and voltage but also the temperature profile of the racks. Connecting this to the Internet of Things and effective big data analytics would have given some early warning of the coming problems.
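As a rough illustration of the kind of early warning such monitoring could provide, the Python sketch below fits a trend to recent rack inlet temperatures and flags a rack whose temperature is projected to cross a limit, alongside simple current and voltage checks. It is a minimal sketch and not a description of any system BA used: the `RackReading` fields, the function names and every threshold (35 °C, 30 minutes, 230 V ±10%, 32 A) are assumptions chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RackReading:
    """One sample from a hypothetical rack sensor (all fields are assumptions)."""
    minute: float        # sample time, in minutes
    inlet_temp_c: float  # rack inlet temperature, degrees Celsius
    current_a: float     # rack supply current, amps
    voltage_v: float     # rack supply voltage, volts


def slope_per_minute(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against times (units per minute)."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    if denom == 0:
        return 0.0
    return sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / denom


def early_warnings(
    readings: Sequence[RackReading],
    temp_limit_c: float = 35.0,       # assumed inlet temperature limit
    horizon_min: float = 30.0,        # assumed look-ahead window
    nominal_voltage_v: float = 230.0, # assumed nominal supply voltage
    voltage_tolerance: float = 0.10,  # assumed +/-10% band
    current_limit_a: float = 32.0,    # assumed circuit rating
) -> List[str]:
    """Return human-readable warnings for one rack, oldest reading first."""
    if not readings:
        return []
    warnings: List[str] = []
    latest = readings[-1]

    # Temperature: warn if already over the limit, or if the fitted
    # trend would carry it over the limit within the look-ahead window.
    trend = slope_per_minute(
        [r.minute for r in readings],
        [r.inlet_temp_c for r in readings],
    )
    projected = latest.inlet_temp_c + trend * horizon_min
    if latest.inlet_temp_c >= temp_limit_c:
        warnings.append(f"temperature over limit: {latest.inlet_temp_c:.1f} C")
    elif projected >= temp_limit_c:
        warnings.append(
            f"temperature projected to reach {projected:.1f} C "
            f"within {horizon_min:.0f} min"
        )

    # Voltage: warn if outside the tolerance band around nominal.
    if abs(latest.voltage_v - nominal_voltage_v) > nominal_voltage_v * voltage_tolerance:
        warnings.append(f"voltage out of band: {latest.voltage_v:.1f} V")

    # Current: warn if at or above the assumed circuit rating.
    if latest.current_a >= current_limit_a:
        warnings.append(f"current at or over limit: {latest.current_a:.1f} A")

    return warnings
```

Fed a rack whose inlet temperature climbs by a degree a minute, the function raises a temperature warning well before the limit is actually reached, which is the point of watching the trend rather than only the instantaneous value.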
BA’s management is now pointing to a single contractor who is said to have unplugged the power to the data centre, bypassing the backup diesel generator. Plugging the power back in caused a power surge and an unstructured startup, said Willie Walsh, CEO of BA’s parent company IAG. The site maintenance contractor CBRE has denied this.
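To make the contrast with an "unstructured startup" concrete, the Python sketch below shows one way a structured power-up sequence might be derived: each system lists what it depends on, and systems are brought up in stages so that nothing starts before its dependencies. This is only an illustration. The component names and dependencies are hypothetical and do not describe BA's actual architecture; the ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def startup_stages(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group systems into power-up stages.

    depends_on maps each system to the set of systems that must be
    running before it starts. Systems in the same stage can be started
    together; each stage waits for the previous one to complete.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)                 # mark the whole stage as started
    return stages


# Hypothetical dependency map, for illustration only.
example = {
    "network": {"cooling"},
    "storage": {"cooling"},
    "messaging": {"network", "storage"},
    "applications": {"messaging"},
}

for number, stage in enumerate(startup_stages(example), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

In this made-up example, cooling comes up first, then network and storage together, then messaging, and only then the applications that rely on it. Restoring power to everything at once skips that ordering.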
This again raises a number of key questions, from why a single point of failure was allowed on the power system, to why this did not cause a failover to the backup servers. In scrabbling for an explanation, the company may have highlighted significant systematic neglect of its power architecture, which would surely mean senior managers losing their jobs. However, the scenario described seems highly unlikely. We, and the power industry, await the report with interest.