TCLUG Archive
Post-op (fixed line-wrapping) (Was: MrNet sucking yet again)
Eesh, what nasty line wrapping! Sorry 'bout that..
--- BEGIN ---
Date: Thu, 6 Apr 2000 02:46:57 -0500 (CDT)
From: Dan Boehlke <dboehlke@MR.Net>
To: tech-announce@MR.Net
Subject: NETWORK-STATUS: Onvoy core routing problems and congestion
issues addressed.
Greetings,
We have found and resolved several major issues with Onvoy's core
network. These all contributed to what we consider to be a catastrophic
failure of our network infrastructure.
Thanks to help from many of our customers we have tracked down many of
the routing problems, including routing loops within the core
infrastructure. We have closely examined our BGP routing configurations
and believe that we have resolved any architectural issues with its
design. We are still tracing down some isolated issues with individual
customers. Many of these have been long-standing issues and
complexities that were brought to light because of the configuration
changes this last weekend that were intended to increase the performance
of the network by retiring the Cisco 7507 routers from core service, and
allowing the Juniper routers to fully replace them.
We discovered that traffic between the Juniper routers was bursting in
such a way as to overwhelm the output buffers on the OC3 ports which
run the inter-hub-room OC3 lines on our ATM switches. The Juniper
routers have OC12 ports and while they were not sending more than an
OC3's worth of data, they were doing so in such a short interval as to
cause cell loss in the ATM part of the core network. We have applied
traffic shaping to each virtual circuit to keep its traffic within the
bandwidth allocated in the network. This resolved the bursting issue
and restored our confidence in the ATM network core. The cell loss
issue in the core has been building over the last few weeks as traffic
increased. Shaping on the edge is one of the best ways to deal with the
management of Internet traffic in an ATM network. We also moved a major
customer from a port on the ATM switch at MSC to an ATM router port so
that we could be sure that their traffic was not part of the problem; it
turns out that their connection did not contribute to the problem.
Once the traffic shaping was in place things got a little better;
however, we were still suffering cell loss at MSC. Investigation
revealed that the OC12 port on the Juniper router needed to be
replaced. Upon replacement the error cleared and so did all the packet
loss and latency issues with the MSC routers.
We apologize for the length of this outage. We called in our best
people to work on it. They have spent many hours under great pressure
to get this resolved. It has not been easy to get to the bottom of the
issues; the combination of issues hid the true causes of the problems,
which made them that much harder to find.
Thanks to our customers who supplied us with crucial information on
their view of the problems. Without this we would not be as sure as we
are that the issues are resolved. I was impressed by the patience shown
by our contacts and their willingness to help.
We are still dealing with some isolated routing issues. Many of these
are due to minor configuration issues. We are still working to find
them; however, do not hesitate to report any problems that you may still
see with the network. Thanks.
Onvoy Technical Support
trouble@onvoy.com
612-362-5890
Please contact your customer care representative or sales person with
any non-technical issues. Thanks.
--
Dan Boehlke, Senior Network Engineer Onvoy
Internet: dboehlke@onvoy.com Formerly MEANS and MRNet
Phone: 612-362-5890 2829 SE University Ave. Suite 200
WWW: http://www.onvoy.com/~dboehlke/ Minneapolis, MN 55414
--- END ---
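The bursting problem described in the message (less than an OC3's worth
of data on average, but sent at OC12 speed) can be illustrated with a
toy simulation. This is only a rough sketch and not Onvoy's actual
configuration: the 100-cell output buffer, the burst length, and the
function names are all made up for illustration, and OC12 is treated as
exactly four times OC3. Time is counted in OC3 cell slots, so the
congested OC3 port drains one cell per slot.

```python
def simulate(arrivals, buffer_cells):
    """Count cells dropped at an output port that drains 1 cell per slot.

    arrivals: number of cells arriving in each OC3 cell slot.
    buffer_cells: hypothetical size of the port's output buffer.
    """
    queue = 0
    dropped = 0
    for arriving in arrivals:
        # Accept only as many cells as the buffer has room for.
        accepted = min(arriving, buffer_cells - queue)
        dropped += arriving - accepted
        queue += accepted
        # The OC3 port transmits one cell per slot.
        if queue > 0:
            queue -= 1
    return dropped


def shape(arrivals, cells_per_slot=1):
    """Pace traffic at the sender so it never exceeds cells_per_slot.

    Excess cells wait in the sender's own backlog (assumed large enough)
    instead of being pushed into the network all at once.
    """
    backlog = 0
    shaped = []
    for arriving in arrivals:
        backlog += arriving
        sent = min(backlog, cells_per_slot)
        backlog -= sent
        shaped.append(sent)
    # Flush whatever is still waiting after the input ends.
    while backlog > 0:
        sent = min(backlog, cells_per_slot)
        backlog -= sent
        shaped.append(sent)
    return shaped


# Illustrative traffic: 100 slots at OC12 speed (4 cells per OC3 slot),
# then 400 idle slots. Average load is 400 cells / 500 slots = 0.8 OC3.
BURST = [4] * 100 + [0] * 400
BUFFER_CELLS = 100  # hypothetical buffer size

unshaped_drops = simulate(BURST, BUFFER_CELLS)
shaped_drops = simulate(shape(BURST), BUFFER_CELLS)

print("cells dropped without shaping:", unshaped_drops)
print("cells dropped with shaping:   ", shaped_drops)
```

In this toy model the same cells get through without loss once they
are paced at the edge, even though the average load never changed.
Real ATM shaping works per virtual circuit with parameters such as
peak cell rate, which this sketch does not attempt to model.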
--
_ _ _ _ _ ___ _ _ _ ___ _ _ __ Error 103: Dead mouse in
/ \/ \(_)| ' // ._\ / - \(_)/ ./| ' /(__ hard drive.
\_||_/|_||_|_\\___/ \_-_/|_|\__\|_|_\ __)
[ Mike Hicks | http://umn.edu/~hick0088/ | mailto:hick0088@umn.edu ]