pidof lost a shell

pidof is a program that reports the PID of a process that has the given command line. It has an option, -x, which means “scripts too”: if you are running a shell script, pidof should find that as well. Recently an issue was raised saying pidof was not finding a shell script. Trying it out, pidof indeed could not find the sample script, yet it found other scripts. What was going on?

What is a script?

It seems pretty simple: a shell script is a text file that is interpreted by a shell program. At the top of the file is a “hash bang line”, which starts with #! followed by the name of the shell that is going to interpret the text.

When you use the -x option, pidof uses the following test:

          if (task.cmd &&
              !strncmp(task.cmd, cmd_arg1base, strlen(task.cmd)) &&
              (!strcmp(program, cmd_arg1base) ||
               !strcmp(program_base, cmd_arg1) ||
               !strcmp(program, cmd_arg1)))

This means a process matches if its comm (task.cmd) matches the start of the basename (the path stripped off) of argv[1], and one of the following holds:

  • The given name matches the basename of argv[1]; or
  • The basename of the given name matches argv[1]; or
  • The given name matches argv[1]
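
To make the test easier to experiment with, here is a minimal standalone sketch of the same logic. The function names script_matches and base_name are made up for this illustration and are not taken from the pidof source; comm, argv1 and program stand in for task.cmd, cmd_arg1 and the name given on the pidof command line.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the part of a path after the last '/'. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Sketch of the -x test, not the actual pidof code.
 *   comm    - process comm (task.cmd in the snippet above)
 *   argv1   - argv[1] of the process (cmd_arg1)
 *   program - the name given to pidof
 */
bool script_matches(const char *comm, const char *argv1, const char *program)
{
    const char *argv1_base = base_name(argv1);
    const char *program_base = base_name(program);

    /* Primary test: comm must match the start of basename(argv[1]).
     * Only strlen(comm) bytes are compared, as in the original. */
    if (strncmp(comm, argv1_base, strlen(comm)) != 0)
        return false;

    /* Then any one of the three bullet points is enough. */
    return strcmp(program, argv1_base) == 0 ||
           strcmp(program_base, argv1) == 0 ||
           strcmp(program, argv1) == 0;
}
```

For example, script_matches("pidofx", "/tmp/pidofx", "pidofx") is true, while script_matches("bash", "/tmp/pidofx", "pidofx") is false because the primary test fails before the bullet points are even considered.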

The Hash Bang Line

Most scripts I have come across start with a line like

#!/bin/sh

This means use the normal shell interpreter (dash on my system). The test script was different; its first line was

#!/usr/bin/env sh

This means run the program “sh” via env, which looks it up on the PATH. Could this be the difference? The first type of script has the following procfs files:

$ cat -e /proc/30132/cmdline
/bin/sh^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30132/stat
(pidofx)

The first command picks up argv[1], “/tmp/pidofx”, while the second finds the comm, “pidofx”. The primary match is satisfied, as is the first bullet point, because the basename of argv[1] is “pidofx”.

What about the script that uses env?

$ cat -e /proc/30232/cmdline
bash^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30232/stat
(bash)

The comm “bash” does not match the basename of argv[1] so this process is not found.
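
The two procfs fields can also be picked apart in code. The sketch below is only an illustration and assumes the file contents have already been read into memory; the function names cmdline_argv1 and stat_comm are invented here. It takes the comm as the text between the first “(” and the last “)”, which, unlike cut with a space delimiter, still works when the comm itself contains a space.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return argv[1] from a /proc/PID/cmdline image (arguments separated
 * by NUL bytes), or NULL if there is no complete second argument.
 */
const char *cmdline_argv1(const char *buf, size_t len)
{
    const char *end0 = memchr(buf, '\0', len);   /* end of argv[0] */
    if (end0 == NULL)
        return NULL;

    const char *arg1 = end0 + 1;
    size_t rest = len - (size_t)(arg1 - buf);

    /* argv[1] must exist and be NUL-terminated inside the buffer. */
    if (rest == 0 || memchr(arg1, '\0', rest) == NULL)
        return NULL;
    return arg1;
}

/*
 * Copy the comm field of a /proc/PID/stat line into out.
 * Returns 0 on success, -1 if the line is malformed or out is too small.
 */
int stat_comm(const char *stat_line, char *out, size_t outsz)
{
    const char *lparen = strchr(stat_line, '(');
    const char *rparen = strrchr(stat_line, ')');
    if (lparen == NULL || rparen == NULL || rparen < lparen)
        return -1;

    size_t n = (size_t)(rparen - lparen - 1);
    if (n >= outsz)
        return -1;
    memcpy(out, lparen + 1, n);
    out[n] = '\0';
    return 0;
}
```

Given the cmdline bytes “/bin/sh^@/tmp/pidofx^@”, cmdline_argv1 returns a pointer to “/tmp/pidofx”, which is the value the matching test then takes the basename of.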

How many execve?

So the proc filesystem reports the scripts differently depending on the first line, but why? The fields depend on which program the process is running, and that in turn depends on the execve calls.

A typical script has a single execve call, as the strace output shows:

29332 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0

The env version has a few more:

29477 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0
29477 execve("/usr/local/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
29477 execve("/usr/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
29477 execve("/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = 0

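The first execve is the same for both, but then env is called and goes on its merry way to find sh. After trying /usr/local/bin and /usr/bin, it finds sh in /bin and execs it. Because there are two successful execve calls, the procfs fields are different: the kernel sets comm from the file named in the most recent successful execve, so it is the script's name in the first case and the shell's name in the second.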
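
One way to picture the effect on comm is the simplified model below. It is a sketch, not kernel code: it assumes that after each successful execve the comm becomes the basename of the path that was executed, cut to 15 characters (the kernel's comm buffer is 16 bytes including the terminating NUL), and that a failed execve such as the ENOENT attempts above changes nothing. The name comm_after_exec is made up for the example.

```c
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16   /* 15 characters plus the terminating NUL */

/*
 * Simplified model of what a successful execve(path, ...) does to comm:
 * take the basename of path and truncate it to fit the comm buffer.
 */
void comm_after_exec(const char *path, char comm[COMM_LEN])
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;

    /* snprintf truncates to COMM_LEN - 1 characters and adds the NUL. */
    snprintf(comm, COMM_LEN, "%s", base);
}
```

Feeding it the successful calls from the strace output, "./pidofx" gives a comm of “pidofx”, and the later "/bin/sh" replaces that with “sh”. A script with a plain #!/bin/sh line stops after the first step, which is why its comm still carries the script name.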

What Next?

So the mystery of pidof missing scripts now has a reasonable explanation. The problem is how to fix pidof. There doesn’t seem to be a fix that isn’t a kludge. Hard-coding potential script names seems just evil, but there doesn’t seem to be a way to differentiate between a script using env and, say, “vi ./pidofx”.

If you have any ideas, comment below or in the issue on GitLab.