Wednesday, December 03, 2008
Network issue
There was a severe network issue in London affecting lots of our broadband services for several minutes, and 21CN customers are still affected. We expect this to be resolved very soon.
2 Follow-up Messages (Posted by AAISP Staff):
We expect to announce some planned work for Sunday to sort out what happened today, when an Ethernet loop was created through a third party enabling a port. This should not be able to take down our network like this, obviously.
We are also looking into exactly why there was a side effect on the 21CN lines, which meant they took longer to come back.
Post mortem for the technically minded...
We have two data centres in London. The plan was the new one would replace the old one but delays in BT's 21CN network upgrades mean we are running two networks until late next year.
As a temporary measure we set up Ethernet connectivity between the two sites via a third party link provider. As we are now stuck with two data centres for some time, we have since upgraded this link to have dual Ethernet connections at both ends. This provides redundancy for link failure and port failure, and we connect each link to a different switch at our end to reduce the impact of switch failure.
One end was set up some time ago. However, making dual Ethernet links like this means a loop, and protocols such as spanning tree are used to stop this. We confirmed with the link providers that this would work and connected the links outside normal hours. All was fine.
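For readers less familiar with why redundant Ethernet links form a loop, here is a minimal Python sketch. It is not the spanning tree protocol itself (there is no root bridge election and no BPDU exchange); it only illustrates the underlying idea that any link joining two switches which are already connected closes a loop and has to be blocked. The switch names, the local interconnects at each site, and the function `links_to_block` are all hypothetical and are not taken from our actual configuration.

```python
def links_to_block(switches, links):
    """Return the links that would close a loop.

    Links are considered in the order given; a link is kept if it joins
    two previously unconnected groups of switches, and blocked otherwise.
    This is a simplified illustration, not real spanning tree.
    """
    # Each switch starts in its own group (union-find structure).
    parent = {s: s for s in switches}

    def find(s):
        # Walk up to the group representative, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    blocked = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected: this link would create a loop.
            blocked.append((a, b))
        else:
            # Link joins two separate groups, so it is safe to forward on.
            parent[root_a] = root_b
    return blocked


# Hypothetical topology: two switches per site and two inter-site links.
switches = ["old-sw1", "old-sw2", "new-sw1", "new-sw2"]
links = [
    ("old-sw1", "old-sw2"),  # assumed local interconnect, old data centre
    ("new-sw1", "new-sw2"),  # assumed local interconnect, new data centre
    ("old-sw1", "new-sw1"),  # first inter-site link
    ("old-sw2", "new-sw2"),  # second inter-site link (redundant path)
]

blocked = links_to_block(switches, links)
print(blocked)
```

With only the first inter-site link present nothing needs blocking; adding the second one closes the loop, so one link must be held in a blocking state until a failure makes it necessary. If the blocking does not happen, broadcast frames circulate indefinitely and flood the network.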
The second end was delayed by various factors and was only finally connected today. We had our side all set up and, as we had done this before, we expected no issues.
The link was enabled and a loop created. This broke service for several minutes until we got the link disabled. Now the link suppliers say that only some spanning tree systems work. Nice of them to say that now!
The resulting network flood then seems to have tripped some DoS handling code (we think), which is why 21CN lines took longer to come up. That is being investigated now.
We always learn from issues like this and are reviewing processes and investigating where improvements can be made.
Sorry for the inconvenience.