In this blog, I will tell you more about how one of our customers, a large B2B service provider, implemented five of these types of use cases in only two weeks. You will learn that there is a lot of low-hanging fruit that network operations teams can benefit from.
Network Compliance is more than you think
The first thing you need to understand is that Network Compliance can be so much more than dealing with security and regulatory requirements. Maybe compliance is not the right word, but I’m struggling to find another term. If you have suggestions, please feel free to share them with me! It should be clear that approaching your network challenges from a compliance perspective can be very beneficial. The best part is that it is non-intrusive; you don’t have to change anything in your network for it to work. When I speak to customers, a lot of them see compliance as a burden, while we see it as an opportunity to drive network availability and agility.
Focus on the actual state of your network
Recently one of our customers, the network operations team of a large B2B service provider, shared their concerns with us about the different types of outages they experienced. They had monitoring software in place but were still struggling with critical errors in their daily operations. An analysis showed that, despite robust network designs, devices in the network were still misconfigured. The reasons varied: power outages, human error, customers’ actions, etc.
Together we came up with specific ways to check the network more proactively and on a deeper level than monitoring alone. Monitoring software tells you when devices are up or down, or when they go beyond a certain threshold, but it does not tell you the actual configuration state or whether that state complies with the way you designed it. It’s a deeper level of monitoring. For the sake of clarity, let’s call this advanced monitoring :-)
Define use cases with the most impact
We used an agile approach and defined three use cases that should have the most impact on preventing critical outages. The bonus was that we also found two other use cases that could be implemented quickly and increased customer satisfaction tremendously!
We implemented the following compliance policies:
1. Prevent redundancy errors
Although the redundant networks were designed to fail over, in daily operations there were always situations where this didn’t work. The reason was that the configuration had changed or had not been configured correctly in the first place, whether through a reset, a human error, or a new vendor version. We implemented a policy that checks both the configuration and the device diagnostic information on a daily basis. This way we can make sure that redundancy is configured according to the latest templates and that the backup is actually up and running as it should be.
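To make the idea concrete, here is a minimal sketch in Python of a check that combines both signals: the configuration versus a template, and the operational state of the backup. It is not the customer's actual policy. The function name, the template lines, and the state values ("standby", "active") are assumptions for illustration, and the config and state are passed in as plain strings rather than collected from a device.

```python
from dataclasses import dataclass


@dataclass
class RedundancyResult:
    device: str
    missing_config: list[str]  # template lines not found in the running config
    backup_state: str          # normalized state reported by the device
    backup_ok: bool            # True if the state is one we consider healthy

    @property
    def compliant(self) -> bool:
        # Compliant only if the config matches the template AND the backup is healthy.
        return not self.missing_config and self.backup_ok


def check_redundancy(device, running_config, template_lines, backup_state,
                     ok_states=("standby", "active")):
    """Hypothetical policy check: config template plus diagnostic state."""
    present = {line.strip() for line in running_config.splitlines()}
    missing = [line for line in template_lines if line.strip() not in present]
    state = backup_state.strip().lower()
    return RedundancyResult(device, missing, state, state in ok_states)
```

The point of returning both fields is that a device can look fine on one signal and fail on the other: a correct configuration with a backup that is down is just as non-compliant as a healthy backup with a drifted configuration.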
2. Prevent EVPN ethernet-segment carving errors
The existing monitoring software only checked whether the ports on the Ethernet switches were up, not whether the load was balanced according to plan. We implemented a policy that checks daily that the load is balanced correctly across the Ethernet switches, since these caused the most problems. Result: no more errors and outages due to load balancing problems.
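As an illustration of what "balanced according to plan" could mean in code, the sketch below compares a planned assignment with the observed one. It assumes that carving results can be collected as a mapping from service to the switch that forwards for it; the function names and that data shape are hypothetical and not taken from the customer's policy.

```python
from collections import Counter


def carving_deviations(planned, observed):
    """Return services whose observed forwarder differs from the plan.

    Both arguments map a service identifier to a switch name.
    The result maps each deviating service to (planned, observed).
    """
    return {
        service: (switch, observed.get(service))
        for service, switch in planned.items()
        if observed.get(service) != switch
    }


def services_per_switch(observed, switches):
    """Count how many services each switch currently forwards for."""
    counts = Counter(observed.values())
    return {switch: counts.get(switch, 0) for switch in switches}
```

A port-level up/down check would pass in the situation this catches: every port is up, but one switch carries services that the plan assigned to its peer.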
3. Unsaved fixes waiting to cause outages
We found that when engineers solved problems on a device, they sometimes forgot to save their running config. After a power outage or a reset, the device would then boot with the startup configuration and not the one that solved the problem. A similar thing happened when engineers worked on a device and temporarily removed the ACL (Access Control List). Sometimes they forgot to put it back, causing serious security problems. We implemented a policy that checks daily whether the running configuration is in sync with the startup configuration, and a separate policy that checks whether the ACL has been put back.
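Both checks can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the two configurations are already available as text, comment lines start with "!", interface blocks use one-space indentation, and the function names are made up for this post. It uses the standard library's difflib.ndiff for the comparison.

```python
import difflib


def normalize(config_text):
    """Drop blank lines and '!' comment lines; strip trailing whitespace."""
    lines = []
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if line and not line.lstrip().startswith("!"):
            lines.append(line)
    return lines


def unsaved_changes(running, startup):
    """Lines that differ between startup and running config.

    '- ' marks lines only in the startup config, '+ ' lines only in running.
    An empty list means the two are in sync.
    """
    diff = difflib.ndiff(normalize(startup), normalize(running))
    return [line for line in diff if line[:2] in ("- ", "+ ")]


def interface_blocks(config_text):
    """Map each interface name to the indented lines beneath it."""
    blocks, current = {}, None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            blocks[current] = []
        elif current is not None and line.startswith(" "):
            blocks[current].append(line.strip())
        else:
            current = None  # left the interface block
    return blocks


def interfaces_missing_acl(running, expected):
    """Interfaces whose block lacks the expected ACL line.

    `expected` maps interface name to the exact ACL statement it should carry.
    """
    blocks = interface_blocks(running)
    return [name for name, acl_line in expected.items()
            if acl_line not in blocks.get(name, [])]
```

Keeping the two checks separate mirrors the two policies: the first catches any unsaved change, while the second catches a removed ACL even if the engineer did save the configuration afterwards.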
The three policies above already prevented a large number of outages. But we also implemented two more that had an immediate impact on time to market and cost reduction:
4. Proactively check MPLS label availability
This service provider deals with large business customers that require new connections on a daily basis. When a new connection is implemented, labels have to be assigned, and sometimes the PE router runs out of MPLS labels. This causes enormous delays for the customer and missed or delayed revenue for the service provider. We implemented a policy that checks the availability of labels on a daily basis. When the number of free labels drops below a certain threshold, a trigger is sent to the appropriate team to take preventive action. Result: no more delays because of label availability.
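The threshold logic itself is simple. Here is a minimal sketch, assuming the label pool size and the number of allocated labels have already been collected per PE router. The class name, the field names, and the 10% default threshold are assumptions for illustration, not the values used at the customer.

```python
from dataclasses import dataclass


@dataclass
class LabelPool:
    router: str
    total: int   # labels available in the router's label range
    in_use: int  # labels currently allocated

    @property
    def free_ratio(self) -> float:
        # Treat an empty or unknown pool as having no free labels.
        if self.total <= 0:
            return 0.0
        return (self.total - self.in_use) / self.total


def routers_low_on_labels(pools, min_free_ratio=0.10):
    """Return the routers whose share of free labels is below the threshold."""
    return [pool.router for pool in pools if pool.free_ratio < min_free_ratio]


pools = [LabelPool("pe1", 1000, 950), LabelPool("pe2", 1000, 400)]
low = routers_low_on_labels(pools)  # hand this list to whatever notifies the team
```

Using a ratio rather than an absolute count means the same policy works across routers with differently sized label ranges.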
5. Prevent smart licensing errors
The operations team uses smart licensing for Cisco devices. Sometimes there is no license available, resulting in unregistered nodes. We implemented a policy that does a daily check of all the nodes that use smart licensing. All unregistered nodes are picked up by a scenario that automatically fixes the license error by issuing a new token. There is no manual intervention anymore; everything is done automatically.
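The check-then-fix flow can be sketched as follows. This is a simplified illustration: the status strings, the function name, and the register callback are assumptions, and the callback stands in for whatever scenario actually issues the new token on the device.

```python
from typing import Callable


def remediate_unregistered(statuses: dict[str, str],
                           register: Callable[[str], bool]) -> dict[str, bool]:
    """Call `register` for every node that is not registered.

    `statuses` maps node name to its reported licensing status.
    Returns, per remediated node, whether the callback reported success.
    """
    results = {}
    for node, status in statuses.items():
        if status.strip().upper() != "REGISTERED":
            results[node] = register(node)
    return results
```

Passing the remediation step in as a callback keeps the daily check read-only by default: the same function can run in report-only mode with a callback that just logs the node name.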
Rinse and repeat!
The above use cases show that it’s really worth looking at your network from a compliance perspective. It helps you identify the actual state of your network, resulting in fewer outages and thus better network agility, availability and security.
We defined and implemented the five policies in just two weeks. The next step is to define more policies that positively impact network operations and to define strategies that automatically mitigate non-compliance issues. Are you curious to see what compliance can do for your network? Make sure to reach out to me or one of our team members. We’re happy to help!