Percent CPU for processes

The ps program gives a snapshot of the processes running on your Unix-like system. On most Linux installations, this will be the ps program from the procps project.

While you can get a lot of information from the tool, a lot of the fields need further explanation or can give “wrong” or confusing information; or putting it another way, they provide the right information that looks wrong.

One of these confusing fields is the %CPU or pcpu field. You can see this as the third field with the ps aux command. You only really need the u option to see it, but ps aux is a pretty common invokation.

More than 100%?

This post was inspired by procps issue 186 where the submitter said that the sum of the processes cannot be more than the number of CPUs times 100%. If you have 1 CPU then the sum of %CPU for all processes should be 100% or less, have 16 CPUs then 1600% is your maximum number.

Some people reason for the oddity of over 100% CPU as some rounding thing gone wrong and at first I did think that; except I know we get a lot of reports about comparing the top header CPU load vs process load not lining up and its because “they’re different”.

The trick here, is ps is reporting a percentage of what? Or, perhaps to give a better clue, a percentage of when?

PCPU Calculations

So to get to the bottom of this, let’s look at the relevant code. In ps/output.c we have a function pr_pcpu that prints the percent CPU. The relevant lines are:

  total_time = pp->utime + pp->stime;
  if(include_dead_children) total_time += (pp->cutime + pp->cstime);
  seconds = cook_etime(pp);
  if(seconds) pcpu = (total_time * 1000ULL / Hertz) / seconds;

OK, ignoring the include _dead_time line (you get this from the S option and means you include the time this process waited for its children processes) and the scaling (process times are in Jiffies, we have the CPU as 0 to 999 for reasons) you can reduce this down to.

%CPU = ( Tutime + Tstime ) / Tetime

So we find the amount of time the CPU(s) have been busy either in userland or the system, add them together, then divide the sum by the total time. The utime and stime increment like a car’s odometer. So if a process uses one Jiffy of CPU time in userland, that counter goes to 1. If it does it again a few seconds later, then that counter goes to 2.

To give an example, if a process has run for ten seconds and within those ten seconds the CPU has been busy in userland for that process, then we get 10/10 = 100% which makes sense.

Not all Start times are the same

Let’s take another example, a process still consumes ten seconds CPU time but been running for twenty seconds, the answer is 10/20 or 50%. With our single CPU example system both of these cannot be running at the same time otherwise we have 150% CPU utilisation which is not possible.

However, let’s adjust this slightly. We have assumed uniform utilisation. But take the following scenario:

  • At time T: Process P1 starts and uses 100% CPU
  • At time T+10 seconds: Process P1 stops using CPU but still runs, perhaps waiting for I/O or sleeping.
  • Also at time T+10 seconds: Process P2 starts and uses 100% CPU
  • At time T+20 we run the ps command and look at the %CPU column

The output for ps -o times,etimes,pcpu,comm would look something like:

    TIME ELAPSED %CPU COMMAND
      10      20   50 P1
      10      10  100 P2

What we will see is P1 has 10/20 or 50% CPU and P2 has 10/10 or 100% CPU. Add those up, and you have 150% CPU, magic!

The key here is the ELAPSED column. P1 has given you the CPU utilisation across 20 seconds of system time and P2 the CPU utilisation across only 10 seconds. You directly add them together you get the wrong answer.

What’s the point of %CPU?

Probably the %CPU column gives results that a lot of people are not expecting, so what’s the point of it? Don’t use it to see why the CPU is running hot; you can see above those two processes were working the CPU hard at different times. What it is useful for is to see how “busy” a process is, but be warned its an average. It’s helpful for something that starts busy but if the process idles or hardly uses CPU for a week then goes bananas you won’t see it.

The top program, because a lot of its statistics are deltas from the last refresh, is a much better program for this sort of information about what is happening right now.

The sudo tty bug and procps

There have been recent reports of a security bug in sudo (CVE-2017-1000367) where you can fool sudo into thinking what controlling terminal it is running on to bypass its security checks.  One of the first things I thought of was, is procps vulnerable to the same bug? Sure, it wouldn’t be a security bypass, but it would be a normal sort of bug. A lot of programs  in procps have a concept of a controlling terminal, or the TTY field for either viewing or filtering, could they be fooled into thinking the process had a different controlling terminal?

Was I going to be in the same pickle as the sudo maintainers? The meat between the stat parsing sandwich? Can I find any more puns related somehow to the XKCD comic?

TLDR: No.

Read more The sudo tty bug and procps

procps 3.3.12

The procps developers are happy to announce that version 3.3.12 of procps was released today. This version has a mixture of bug fixes and enhancements. This unfortunately means another API bump but we are hoping this will be fixed with the new library API coming soon.

procps is developed on gitlab and the new version of procps can be found at https://gitlab.com/procps-ng/procps/tree/newlib

procps 3.3.12 can be found at https://gitlab.com/procps-ng/procps/tags/v3.3.12

Read more procps 3.3.12

procps 3.3.11

I have updated NEWS, bumped the API and tagged in git; procps version 3.3.11 is now released!

This release we have fixed many bugs and made procps more robust for those odd corner cases. See the NEWS file for details.  The most significant new feature in this release is the support for LXC containers in both ps and top.

The source files can be found at both sourceforge and gitlab at:

My thanks to the procps co-maintainers, bug reporters and merge/patch authors.

What’s Next?

There has been a large amount of work on the library API. This is not visible to this release as it is on a different git branch called newlib. The library is getting a complete overhaul and will look completely different to the old libproc/libprocps set. A decision hasn’t been made when newlib branch will merge into master, but we will do it once we’re happy the library and its API have settled. This change will be the largest change to procps’ library in its 20-odd year history but will mean the library will use common modern practices for libraries.