Detect CPU spikes with Embedded Resource Manager
David Winter wanted to detect high-CPU spikes and act on them. The first part (high CPU utilization) could be done with SNMP, but since IOS release 12.3(14)T, the right tool for the job is the Embedded Resource Manager (ERM).
The ERM syntax is a bit baroque (and not well documented), so let's work through the example: this is the configuration you need to detect high overall CPU utilization on the main CPU in the box:
resource policy
 policy HighGlobalCPU global
  system
   cpu total
    critical rising 95 falling 70 interval 10
    major rising 75 falling 50 interval 10
 !
 user global HighGlobalCPU
And here are the usage/configuration guidelines:
- The whole ERM subsystem is configured under the resource policy section;
- You always have to configure a policy and a user to which the policy applies. In our example, the user is global (as we're measuring the global CPU load);
- The policy we're defining must have the global keyword to indicate we're measuring overall utilization (otherwise you can't attach it to the global user);
- We're measuring the load on the main CPU, so we're configuring the system subsection of the policy (on distributed platforms you could specify slot name to measure utilization on a specific linecard);
- The cpu section selects which CPU load is measured: interrupt load, process load or total CPU load;
- Within each resource section in the policy (in our example, total CPU load on the main system) you can define minor, major and critical thresholds, each with its own rising and falling values (a syslog message is generated whenever a threshold is crossed);
- After the policy is defined, it's applied to the global user.
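The guidelines above mention process-level measurements and minor thresholds, neither of which appears in the example. A minimal sketch of how they might be added to the same policy follows. It assumes the minor keyword takes the same arguments as the major and critical lines shown earlier, and the threshold values are purely illustrative, so check the exact syntax on your IOS release before relying on it:

```
resource policy
 policy HighGlobalCPU global
  system
   ! assumption: process-level CPU load is selected with "cpu process"
   cpu process
    ! illustrative values only; minor is the lowest of the three severities
    minor rising 60 falling 40 interval 10
```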
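With the CPU load measurement policy defined, the router generates a SYS-4-CPURESRISING syslog message every time the overall CPU load exceeds one of the configured rising thresholds. When the utilization drops back below the corresponding falling threshold, it generates a SYS-6-CPURESFALLING message. The falling message is logged at the informational severity (level 6), so you will only see it if your logging level includes informational messages.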
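To act on the spike, and not just log it, one option is an Embedded Event Manager (EEM) applet triggered by the rising syslog message. The sketch below is only an illustration: the applet name CaptureHighCPU and the file name flash:highcpu.txt are hypothetical, and it assumes your IOS release supports EEM applets with CLI actions:

```
event manager applet CaptureHighCPU
 ! trigger on the ERM rising-threshold message
 event syslog pattern "CPURESRISING"
 action 1.0 cli command "enable"
 ! hypothetical file name; each run overwrites the previous capture
 action 2.0 cli command "show processes cpu sorted | redirect flash:highcpu.txt"
 action 3.0 syslog msg "High CPU detected, process list saved to flash:highcpu.txt"
```

Matching on the message mnemonic and not on the full facility-severity string keeps the applet working whichever threshold (major or critical) was crossed.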
This article is part of the "You've asked for it" series.
2 comments:
Interesting article. I tried on 2 different routers and could see the CPURESRISING logs, but not the falling logs. Any ideas? If the fall is within the interval, will the fall not be logged?
The FALL should be logged when the CPU load goes below the falling value ... and please note that the falling value should be less than the rising value.