THN PoP
Incident Report for Ai Networks Ltd
Resolved
Good afternoon.

As some customers will already be aware, at 21:49 yesterday (11/07/19), an issue on one of our core routers caused disruption to many of our connectivity services, affecting Leased Lines, MPLS and site-to-site connectivity.

Our engineers worked continuously through the night to resolve this issue through a combination of traffic re-routing, detailed investigation and root cause analysis in conjunction with the hardware vendor and an external partner, whilst simultaneously provisioning alternative equipment as a means of bringing services back online. We were able to bring some services back online during the night, but unfortunately not all of them.

We’re pleased to report that as of 12:20 today (12/07/19), the issues have been fully resolved.

Due to the length and complexity of this issue, our engineers need time for some much-needed rest. We will then collate all of our investigation notes and compile a thorough incident report, which we aim to have with affected customers by close of play on Monday (15/07/19).

We sincerely apologise for the inconvenience this has caused our customers. Rest assured, we will be working hard to fully understand this issue and will do everything we can to prevent a recurrence.

If you have any further questions after receiving the Incident Report, please get in touch.

Kind regards,
Ai Networks
Posted Jul 12, 2019 - 15:13 BST
Monitoring
All affected services are now working, and we are moving to a monitoring phase before marking the incident as resolved.

If you do still have an affected service, please contact our support team.
Posted Jul 12, 2019 - 12:20 BST
Update
Please be advised that we are about to take steps to resolve the wider networking issues currently being experienced. This will cause some disruption for a few minutes while devices are brought online and traffic reconverges. This is expected and there is no need to get in touch, though please do let us know if the disruption continues for more than 5 minutes. Apologies for any inconvenience this may cause.
Posted Jul 12, 2019 - 12:03 BST
Update
Our engineers are still working with Brocade in an attempt to resolve the issue.

Additionally, we are still working in parallel to bring a new device online to help resolve the remaining issues.

We apologise that we do not have anything more concrete at this time, but can assure you that our team are treating this as their number one priority.

Next update at 12:00.
Posted Jul 12, 2019 - 11:11 BST
Update
In an attempt to address the issue, we are taking two courses of action.

Firstly, we’re working with Brocade (the vendor) to fix the problem, as we have been since the issue started. They previously sent a replacement line card for the router, but fitting this did not resolve the issue.

Secondly, we are installing a completely new device in case the work with Brocade doesn’t fix the issue. This is a time-consuming process as we don’t want to disrupt the wider network.

The second item will take at least another hour to complete.

Next update will be before 11am.
Posted Jul 12, 2019 - 09:59 BST
Update
Work is continuing to resolve the issue on the failed device.

We apologise for the trouble this is causing.

We will post another update before 10am.
Posted Jul 12, 2019 - 09:05 BST
Update
We are continuing to work on a fix for this issue.
Posted Jul 12, 2019 - 08:26 BST
Update
Most services are now running as usual.

We’re working to bring back the remaining affected leased lines and hope to have these back online before 9am.
Posted Jul 12, 2019 - 07:39 BST
Update
Onsite engineers and the vendor support team are still working on bringing the new line card into service at THN.

Where possible, we have migrated traffic away from THN to ensure stability across the wider network while we continue to fix the router.
Posted Jul 12, 2019 - 05:53 BST
Update
Our engineers are continuing to work on the new card installation, and aim to have this resolved as soon as possible.
Posted Jul 12, 2019 - 02:39 BST
Update
We are continuing to work on a fix for this issue.
Posted Jul 12, 2019 - 02:38 BST
Update
The replacement line card has arrived on-site and engineers are working to install it in the chassis and update configuration.
Posted Jul 12, 2019 - 01:11 BST
Update
We have identified a fault with a router at THN and are working with the vendor for replacement kit to be shipped out. We also have parts in stock in Stevenage and these are being driven to site now.

Once the part is on site, engineers will carry out the replacement and begin work to bring services back online. We expect either the vendor part or the stock part to be on site within 1 hour.
Posted Jul 12, 2019 - 00:14 BST
Update
We have confirmed an issue with one of our core customer-facing devices in Telehouse North. This will affect any site-to-site services terminating at this site, as well as any that use it to interlink with other providers. In addition, a number of leased line circuits and some DSL services are affected. On-site engineers have investigated the issue, and it has been escalated to two additional engineers who are en route to the site. This page will be updated as more information becomes available.
Posted Jul 11, 2019 - 23:10 BST
Identified
We have identified an issue with our THN PoP. Several leased lines appear to be affected as well as some other services.
Posted Jul 11, 2019 - 22:07 BST
This incident affected: London Network, National MPLS Network, and Northern Europe Network.