pidof is a program that reports the PID of a process that has the given command line. It has an option x which means “scripts too”. The idea behind this is that if you have a shell script, pidof will find it. Recently an issue was raised saying pidof was not finding a shell script. Trying it out, pidof indeed could not find the sample script but found other scripts. What was going on?
I looked at two interesting issues today around the ps program in the procps project. One had a solution and the other I’m puzzled about.
ps User-defined Format
Issue #9 was quite the puzzle. The output of ps changed depending on whether a different option had a hyphen before it or not.
First, the expected output
$ ps p $$ -o pid=pid,comm=comm
pid comm
31612 bash
Next, the unusual output.
$ ps -p $$ -o pid=pid,comm=comm
pid,comm=comm
31612
I have updated NEWS, bumped the API and tagged in git; procps version 3.3.11 is now released!
This release we have fixed many bugs and made procps more robust for those odd corner cases. See the NEWS file for details. The most significant new feature in this release is the support for LXC containers in both ps and top.
The source files can be found at both sourceforge and gitlab at:
My thanks to the procps co-maintainers, bug reporters and merge/patch authors.
There has been a large amount of work on the library API. This is not visible in this release as it is on a different git branch called newlib. The library is getting a complete overhaul and will look completely different to the old libproc/libprocps set. A decision hasn’t been made on when the newlib branch will merge into master, but we will do it once we’re happy the library and its API have settled. This will be the largest change to procps’ library in its 20-odd year history, but it will mean the library uses common modern practices for libraries.
I’m getting close to releasing version 3.3.11 of procps. When it gets near that time, I generally browse the Debian Bug Tracker again for procps bugs. Bug number #733758 caught my eye. With the free command, if you used the s option before the c option, the s option failed with “seconds argument ‘N’ failed”, where N was the number you typed in. That error is meant for when you type letters instead of a number of seconds. It seemed reasonably simple to test and simple to fix.
Take me to the code
The relevant code looks like this:
case 's':
    flags |= FREE_REPEAT;
    args.repeat_interval = (1000000 * strtof(optarg, &endptr));
    if (errno || optarg == endptr || (endptr && *endptr))
        xerrx(EXIT_FAILURE, _("seconds argument `%s' failed"), optarg);
It seems a pretty stock-standard sort of function: use strtof() to convert the string into a float.
You need to check both errno AND optarg == endptr because:
- A valid but large float means errno = ERANGE
- An invalid float (e.g. “FOO”) means optarg == endptr
At first I thought the logic was wrong, but after tracing through it, it was fine. I then compiled free using the upstream git source, and the program worked fine with the s flag and no c flag. Doing a diff between the upstream HEAD and Debian’s 3.3.10 source showed nothing obvious.
I then shifted the upstream git to 3.3.10 too and re-compiled. The Debian source failed, the upstream parsed the s flag fine. I ran diff, no change. I ran md5sum, the hashes matched; what is going on here?
I’ll set when I want
The man page says in the case of under/overflow “ERANGE is stored in errno”. What this means is that if there isn’t an under/overflow then errno is NOT set to 0; it’s just not set at all. This is quite useful when you have a chain of functions and you just want to know something failed, but don’t care what.
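To make the trap concrete, here is a minimal sketch. The helper name errno_after_valid_parse is made up for illustration and is not part of procps. It seeds errno with a stale value, parses a perfectly valid number, and returns whatever errno holds afterwards.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only (not procps code).
 * Returns the errno value seen after strtof() parses a valid number,
 * given a stale errno left behind by some earlier call. */
int errno_after_valid_parse(int stale)
{
    char *end;

    errno = stale;            /* pretend an earlier library call failed */
    (void)strtof("5", &end);  /* valid input, no under/overflow */
    return errno;             /* strtof() never resets errno to 0 */
}
```

Calling errno_after_valid_parse(ENOENT) gives a non-zero result even though the parse succeeded, which is exactly what a bare "if (errno ..." test after strtof() trips over.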
Most of the time, you generally would have a “Have I failed?” test and then check errno for why. A typical example is socket calls where anything less than 0 means failure. You check the return value first and then errno. strtof() is one of those funny ones where most people check errno directly; it’s simpler than checking for +/- HUGE_VALF. You can see though that there are traps.
What’s the difference?
OK, so a simple errno=0 above the call fixes it, but why would the Debian source tree have this failure and the upstream not? Even with the same code? The difference is how they are compiled.
The upstream compiles free like this:
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -include ./config.h -I./include -DLOCALEDIR=\"/usr/local/share/locale\" -Iproc -g -O2 -MT free.o -MD -MP -MF .deps/free.Tpo -c -o free.o free.c
mv -f .deps/free.Tpo .deps/free.Po
/bin/bash ./libtool --tag=CC --mode=link gcc -std=gnu99 -Iproc -g -O2 ./proc/libprocps.la -o free free.o strutils.o fileutils.o -ldl
libtool: link: gcc -std=gnu99 -Iproc -g -O2 -o .libs/free free.o strutils.o fileutils.o ./proc/.libs/libprocps.so -ldl
While Debian has some hardening flags:
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -include ./config.h -I./include -DLOCALEDIR=\"/usr/share/locale\" -D_FORTIFY_SOURCE=2 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -MT free.o -MD -MP -MF .deps/free.Tpo -c -o free.o free.c
mv -f .deps/free.Tpo .deps/free.Po
/bin/bash ./libtool --tag=CC --mode=link gcc -std=gnu99 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security ./proc/libprocps.la -Wl,-z,relro -o free free.o strutils.o fileutils.o -ldl
libtool: link: gcc -std=gnu99 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z -Wl,relro -o .libs/free free.o strutils.o fileutils.o ./proc/.libs/libprocps.so -ldl
It’s not the compiling of free itself that is doing it, but the library. Most likely something called before strtof() sets errno, and this code then trips over the stale value. In fact, if you run the upstream free linked to the Debian procps library, it fails.
Moral of the story: set errno to 0 before the function is called if you are going to depend on it for checking whether the function succeeded.
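As a rough sketch of that advice, the checks from the free code could be wrapped up like this. The function parse_seconds and its return convention are my own invention for the example, not what procps ships; the only real change from the original logic is the errno = 0 line.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only (not procps code).
 * Parses arg as a float number of seconds.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int parse_seconds(const char *arg, float *out)
{
    char *end = NULL;
    float val;

    errno = 0;                 /* clear any stale value first */
    val = strtof(arg, &end);
    if (errno != 0             /* under/overflow: ERANGE */
        || end == arg          /* nothing parsed, e.g. "FOO" */
        || *end != '\0')       /* trailing junk, e.g. "5x" */
        return -1;

    *out = val;
    return 0;
}
```

With this in place, an earlier failing call that left errno set no longer turns a valid “5” into an error, while “FOO” and out-of-range values are still rejected.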
I have previously written about the gitlab CI runners that use docker. Yesterday I made some changes to procps and pushed them to gitlab which would then start the CI. This morning I checked and it said build failed – ok, so that’s not terribly unusual. The output from the runner was:
gitlab-ci-multi-runner 0.3.3 (dbaf96f)
Using Docker executor with image csmall/testdebian ...
Pulling docker image csmall/testdebian ...
Build failed with Error: image csmall/testdebian: not found
Hmm, I know I have that image, so it must just be the runner. Let’s see what images I have:
$ docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
Now, I know I have images; I had about 10 or so of them. Where did they go? I even looked in the /var/lib/docker directories and can see the json configs. What have you done with my images, docker?