Whether you’re running a server for an application or just your local machine, you want your computer running its best. That’s why you should make sure to monitor your computer’s CPU, disk, and memory usage. Monitoring things like memory usage and disk usage is actually pretty easy. How much disk space do you have left? Is your disk fragmented? Do you have enough RAM for the applications you’re using? Are you regularly swapping memory?
Those are easy questions to answer, and they’re questions you should ask. But some questions are more complicated, especially when it comes to CPU metrics. Sure, there are simple metrics like overall CPU usage. But if you’re using Linux, you have more esoteric (and sometimes more useful) metrics, like CPU load. In this post, we’re going to talk about what the CPU load metric actually means, and how you can use it to optimize your servers.
CPU Load vs. CPU Usage
The first thing to understand is that CPU load is not the same thing as CPU usage. Even though they might sound similar, they’re quite different. CPU usage is a measurement, expressed as a percentage, of how much time the CPU spends actively computing something. For instance, if you had a program that required uninterrupted processing power for 54 out of the last 60 seconds, your CPU usage on one core would be 90%. If, instead, the program required only six seconds of processing time on one core, the usage would be 10%.
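The arithmetic behind those two figures can be sketched in a few lines of Python. The function name cpu_usage_percent is a hypothetical helper made up for this illustration, not part of any monitoring tool.

```python
def cpu_usage_percent(busy_seconds: float, window_seconds: float) -> float:
    """Return the share of a time window that one core spent busy, as a percentage."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 100.0 * busy_seconds / window_seconds


if __name__ == "__main__":
    # The two examples from the text: 54 and 6 busy seconds out of 60.
    print(cpu_usage_percent(54, 60))  # 90.0
    print(cpu_usage_percent(6, 60))   # 10.0
```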
Many companies seek to keep the CPU usage of their servers as high as they safely can. Most servers are sold by overall computing power, and if your server is only sitting at 30% CPU usage, you’re probably paying for more processor power than you need. You could downgrade your processor to a lower tier, save money, and likely see little reduction in the quality of your server’s performance.
CPU load is different. Instead of measuring the percentage of time that the CPU is working, CPU load measures how many programs are using or waiting for a processor core at one time. A completely idle processor would have a load value of 0. In our previous example, we would have one-minute load values of 0.9 and 0.1, respectively. That’s pretty intuitive. However, what’s unintuitive about CPU load is that a value of 1 isn’t necessarily a fully loaded measurement.
What constitutes “full” CPU load depends on how many hardware threads (logical CPUs) your processor offers. Most CPUs these days contain more than one core, and each core may run more than one thread. If your CPU has four cores, then a CPU load of 1 isn’t a fully loaded CPU. It’s only one-quarter loaded. That CPU would be considered fully loaded at a load value of 4, not 1.
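One way to make load values comparable across machines is to divide them by the number of logical CPUs. The sketch below assumes a Unix-like system where Python's os.getloadavg() is available. The helper name load_per_core is made up for this example.

```python
import os


def load_per_core(load: float, logical_cpus: int) -> float:
    """Divide a load value by the CPU count; 1.0 means fully loaded."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return load / logical_cpus


if __name__ == "__main__":
    # The four-core example from the text.
    print(load_per_core(1.0, 4))  # 0.25
    print(load_per_core(4.0, 4))  # 1.0

    # The same calculation for the machine running this script.
    cpus = os.cpu_count() or 1
    try:
        one_minute = os.getloadavg()[0]
    except (AttributeError, OSError):
        one_minute = None  # load average not available on this platform
    if one_minute is not None:
        print(f"{one_minute:.2f} over {cpus} CPUs = {load_per_core(one_minute, cpus):.2f} per core")
```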
Deciphering Linux CPU Loads
Running a program that displays Linux CPU load totals can be a bit confusing at first. If you run a command like uptime or top, you might see a CPU load value that looks like 1.68 0.55 5.91. If you don’t know what you’re looking for, this looks like noise. Thankfully, this is actually pretty easy to decipher. There are three values here, and we’ll look at them in sequence.
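On Linux, the same three numbers are also exposed in the file /proc/loadavg, followed by a running/total task count and the most recently assigned process ID. Here is a minimal sketch of reading that line from Python. The LoadAvg type and the parse_loadavg function are names invented for this illustration.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float
    runnable: int   # scheduling entities currently runnable
    total: int      # scheduling entities that exist on the system
    last_pid: int   # most recently created process ID


def parse_loadavg(text: str) -> LoadAvg:
    """Parse one line in the /proc/loadavg format."""
    fields = text.split()
    if len(fields) < 5:
        raise ValueError(f"unexpected loadavg line: {text!r}")
    runnable, total = fields[3].split("/")
    return LoadAvg(
        float(fields[0]), float(fields[1]), float(fields[2]),
        int(runnable), int(total), int(fields[4]),
    )


if __name__ == "__main__":
    # Sample line built from the values used in this post; the task counts
    # and process ID are made up.
    sample = "1.68 0.55 5.91 2/345 12345\n"
    print(parse_loadavg(sample))
    try:
        with open("/proc/loadavg") as handle:
            print(parse_loadavg(handle.read()))
    except OSError:
        pass  # not a Linux system, or /proc is not mounted
```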
The first value is 1.68. This is the value of CPU load during the last minute. Like we mentioned before, this is a measure of how many programs were using or waiting for CPU time during the last minute. So, during the last minute on this machine, there was an average of 1.68 programs either using CPU processing time or waiting for CPU processing time. If this is a single-core CPU, that means the computer is overloaded. Users are waiting for their programs to run on the CPU, and experiencing degraded performance. If, instead, this is a dual-core or quad-core computer, users were able to get CPU time just as quickly as they needed it during the last minute.
The second value is 0.55. This is the measurement over the last 5 minutes. On a single-core machine, a measurement below 1 means that the CPU spent some of the time in that window completely idle. In this case, the CPU was idle for almost half the time, and on a multi-core machine the idle share is even larger. If we’re optimizing our CPU to be constantly doing something, that’s not a good sign.
The final number, 5.91, is a measurement of the last 15 minutes. If you’re using an eight-core CPU, then this number isn’t particularly shocking. If you’re using a dual-core CPU, then a number like 5.91 means your CPU is very overloaded. Users are regularly waiting for CPU time, and are probably experiencing significantly degraded performance.
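The walk-through above boils down to comparing each value with the core count, and comparing the one-minute value with the 15-minute value to see which way things are heading. A small sketch of that reasoning follows. The function names and the wording of the labels are arbitrary choices for this example.

```python
def describe(load: float, cores: int) -> str:
    """Label a load value relative to the number of cores available."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return "overloaded" if load > cores else "within capacity"


def trend(one_min: float, fifteen_min: float) -> str:
    """Compare recent load with the longer-term figure."""
    if one_min > fifteen_min:
        return "rising"
    if one_min < fifteen_min:
        return "falling"
    return "steady"


if __name__ == "__main__":
    one_min, five_min, fifteen_min = 1.68, 0.55, 5.91
    for cores in (1, 2, 8):
        print(cores, "cores:", describe(one_min, cores), "/", describe(fifteen_min, cores))
    print("trend:", trend(one_min, fifteen_min))
```

With the sample values, the 15-minute figure of 5.91 is overloaded on one or two cores and within capacity on eight, and the trend is falling because the last minute was much quieter than the last 15.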
Unix vs. Linux CPU Loads
If you’re used to using the CPU load measurement in Unix, some of these numbers might seem kind of high to you. That’s because Unix and Linux measure CPU load differently. Unix measures CPU load expressly as a measure of programs that are actively using or waiting for CPU processing. Linux measures things a little differently, and understanding how is key to good system administration.
Linux counts programs that are using or waiting for CPU time, and it also counts programs in an uninterruptible wait state. For instance, a program that’s blocked loading something from a disk wouldn’t count toward CPU load on Unix. On Linux, it does. (A program waiting on an ordinary network socket usually sleeps interruptibly and doesn’t count, although one blocked on a network filesystem such as NFS can.) Whether this is a better or worse approach depends on your use case. If you’re looking to measure nothing but demand for the CPU, Unix provides a cleaner number. But if you have a program that’s stuck for 30 seconds waiting on a slow disk, the Linux load average will reflect that. The Unix number wouldn’t surface it at all.
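To see what kinds of waiting are happening on a Linux box, you can look at the state letter the kernel reports for each process in /proc/<pid>/stat: R means running or runnable, D means uninterruptible sleep (typically disk I/O), and S means ordinary interruptible sleep. The sketch below tallies those letters. It only looks at one entry per process rather than every thread, so treat it as a rough picture and not a reconstruction of the load average. The function names are invented for this example.

```python
from collections import Counter
from pathlib import Path


def task_state(stat_line: str) -> str:
    """Extract the state letter from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split after the last closing parenthesis.
    """
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes by state letter; returns an empty Counter without /proc."""
    counts: Counter = Counter()
    for stat_path in Path(proc_root).glob("[0-9]*/stat"):
        try:
            counts[task_state(stat_path.read_text())] += 1
        except (OSError, IndexError):
            continue  # the process exited while we were reading, or the line was odd
    return counts


if __name__ == "__main__":
    states = count_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible (D):", states.get("D", 0))
    print("sleeping (S):", states.get("S", 0))
```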
Optimizing CPU Loads
There’s no silver bullet for optimizing your CPU loads. It’s a process that you need to approach holistically. Much like CPU usage, when you’re on a server you want your load to sit close to full capacity, which means close to the number of cores, but not over it. Sometimes, the process of optimizing your CPU loads is pretty easy. Maybe your server will only ever utilize two threads at the same time. In that case, optimizing server loads is no more complicated than having a dual-core processor. You may need a faster processor to make sure those programs complete in time, but you’ll never need more than two cores.
Most systems don’t actually limit themselves to exactly two active threads, though. Most servers are much more complicated. And one of the real challenges is keeping the number of cores as low as possible, because that saves you money, without going too low and degrading user experiences. It could be that you need to optimize the code your server runs. Sometimes, you’ll need to determine how much program time is consumed by waiting on things like database queries. It could be that you really do need to up the core count on your processor. Bottlenecked servers can have a number of causes, and can be fixed in a number of ways.
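A quick way to estimate how much of a task's time goes to waiting, as opposed to computing, is to compare wall-clock time with CPU time. The sketch below uses Python's time.perf_counter() and time.process_time() for that. The measure helper is made up for this example, and time.sleep() stands in for a slow database query.

```python
import time
from typing import Callable, Tuple


def measure(task: Callable[[], object]) -> Tuple[float, float]:
    """Run task once and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return wall_seconds, cpu_seconds


def fake_query() -> None:
    """Placeholder for a database call: it waits but does no real computing."""
    time.sleep(0.2)


if __name__ == "__main__":
    wall, cpu = measure(fake_query)
    waiting = max(wall - cpu, 0.0)
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s, waiting {waiting:.3f}s")
```

If most of the wall-clock time turns out to be waiting, adding cores is unlikely to help, and the query or the code around it is the better place to look.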
How’s Your CPU Load?
Monitoring and optimizing your CPU load is an important part of getting the most from your server hardware. Unfortunately, it’s also a time-consuming, manual process. As noted, commands like uptime and top only report load averages over the last 1, 5, and 15 minutes. That’s a good thing; you wouldn’t want windows much wider than that. You’d have a difficult time getting any useful information out of them. However, keeping an eye on CPU load over rolling 15-minute periods is kind of a headache.
Fortunately, there are good tools to monitor these kinds of things for you. For instance, Scalyr can monitor CPU load just by ingesting some logs. You won’t just get a simple numerical printout. Instead, you’ll get a nice visual that shows your overall CPU load over time. This can be especially useful, since it helps you pinpoint specific windows that are the most demanding on your CPU. Optimizing your server’s performance means understanding it in detail, and not just at a high level.
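To get a feel for what “load over time” means before reaching for a monitoring product, here is a minimal sketch that samples the one-minute load average at a fixed interval and reports the busiest sample. It assumes a Unix-like system where os.getloadavg() works. All function names are invented for this example, and a real setup would ship the samples to a log or a monitoring service instead of keeping them in memory.

```python
import os
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (unix timestamp, one-minute load average)


def read_one_minute_load() -> float:
    """Return the current one-minute load average."""
    return os.getloadavg()[0]


def collect_samples(
    count: int,
    interval_seconds: float,
    read: Callable[[], float] = read_one_minute_load,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> List[Sample]:
    """Take `count` readings, pausing `interval_seconds` between them."""
    samples: List[Sample] = []
    for index in range(count):
        samples.append((clock(), read()))
        if index < count - 1:
            sleep(interval_seconds)
    return samples


def busiest(samples: List[Sample]) -> Sample:
    """Return the sample with the highest load value."""
    return max(samples, key=lambda sample: sample[1])


if __name__ == "__main__":
    history = collect_samples(count=3, interval_seconds=1)
    when, load = busiest(history)
    print(f"peak load {load:.2f} at {time.ctime(when)}")
```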
Part of being a good systems administrator means understanding your server from every direction. Knowing how to interpret CPU load and how to optimize it is a key part of maximizing your company’s server investment. Before you can do that, you need to understand what your server’s CPU load actually is. There’s no better day than today to get started monitoring your server CPU load. You can start formulating a plan for optimization today.
This post was written by Eric Boersma. Eric is a software developer and development manager who’s done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he’s learned along the way, and he enjoys listening to and learning from others as well.