ps standards and locales

I looked at two interesting issues today around the ps program in the procps project. One had a solution and the other I’m puzzled about.

ps User-defined Format

Issue #9 was quite the puzzle. The output of ps changed depending on whether a different option had a hyphen before it.

First, the expected output:

$ ps p $$ -o pid=pid,comm=comm
 pid comm
31612 bash

Next, the unusual output:

$ ps -p $$ -o pid=pid,comm=comm
pid,comm=comm
 31612

The only difference is that the second command uses -p rather than p. In the second case everything after the first = appears to be treated as the header of a single pid column. The second unexpected thing is that it was designed that way: the Unix98 standard apparently permits this sort of craziness. To me it is a useless feature that is more likely to confuse people than to help them. Within ps, the sort of flags you start with determines whether a sysv parser or a bsd parser is used. One of them triggered the Unix98 compatibility option while the other did not, hence the change in behavior. The next version of procps will ship with a ps that produces the user-defined output format of the first example. I had a look at the latest standard, IEEE 1003.1-2013, and it doesn’t seem to have anything like that in it.

 

Short Month Length

This one has got me stuck. A user has reported in issue #5 that, when they use their locale, columns such as start time become misaligned because their short month name is longer than 3 characters. They also mention that some other columns, for memory and so on, are not wide enough, but that’s a generic problem that is impossible to fix sensibly.

OK, for the month problem the fix would be to find out the month length and set the width of those columns to that plus a bit more for the other fixed components; simple really. Except: how do you know what the short month length is for a specific locale? I always assumed it was three! I haven’t found anything that has this information. Note, I’m not looking for strlen() but for some function that returns the maximum length of the short month names (e.g. Jan, Feb, Mar etc.). This also got me thinking about how safe some of those date-to-string functions are if you have a static buffer as the destination. It’s not safe to assume the result will be “DD MMM YYYY” because there might be more Ms.
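On the static buffer worry, here is a minimal sketch of one way to avoid guessing the output size. It relies on strftime() returning 0 when the result does not fit, and retries with a larger heap buffer. The name format_time_alloc and the 4096-byte cap are assumptions made for illustration; nothing here is taken from procps.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not procps code): format *tm with fmt into a heap
 * buffer that is doubled until strftime() succeeds. The caller frees the
 * result. Returns NULL on allocation failure or if the cap is reached. */
char *format_time_alloc(const char *fmt, const struct tm *tm)
{
    size_t size = 16;

    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;

        /* strftime() returns 0 if the text plus its NUL does not fit. */
        if (strftime(buf, size, fmt, tm) != 0)
            return buf;

        free(buf);

        /* Arbitrary cap; it also stops the loop for formats whose
         * legitimate output is the empty string. */
        if (size >= 4096)
            return NULL;
        size *= 2;
    }
}
```

A caller would pass something like "%d %b %Y" and free() the returned string afterwards. Note that this only addresses the byte length of the result, not how many terminal columns it occupies.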

 

So if you know how to work out the short month name length, let me know!

Comments

3 responses to “ps standards and locales”

  1. Karellen

    The static string buffer wasn’t safe for 3-character strings anyway, even if all the month names are only 3 characters long, because if one of your characters is not in the ASCII codeset, and therefore takes more than one byte to encode (assuming you’re using UTF-8), you’re still hosed.

    You also need to take that into account, as well as zero-width characters (like combining characters and zero-width spaces) and double-width characters, when determining “string width” – which is a totally different thing than “string length”.

    Or, rather, you probably don’t need to take that into account. Rather, you need to be aware of it and use the right library which will take it into account for you. I imagine that ncurses might have something along those lines.

  2. Joachim Breitner

    There are only a finite number of months, so if there really is no better way, you can calculate the length once, by printing each month into a sensibly large buffer and taking the maximum.

  3. Marius Gedminas

    Loop through all 12 months, call strftime() (or was there an nl_langinfo API for this?), compute strlen() (or perhaps wcswidth()), take the max?
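The suggestions in the comments above can be sketched roughly as follows, assuming a POSIX system. The sketch asks nl_langinfo() for the twelve abbreviated month names (ABMON_1 to ABMON_12), converts each one to wide characters and measures it with wcswidth(), so that multi-byte, zero-width and double-width characters are counted as terminal columns rather than bytes. The function names display_width and max_abmon_width and the 64-character scratch buffer are assumptions made for illustration, not anything taken from procps.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* needed for wcswidth() on glibc */
#endif
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: width in terminal columns of a multibyte string in
 * the current locale. Falls back to the byte length if the string cannot
 * be converted or contains non-printable characters. */
static int display_width(const char *s)
{
    wchar_t wcs[64];
    size_t n = mbstowcs(wcs, s, 63);   /* converts at most 63 characters */

    if (n == (size_t)-1)
        return (int)strlen(s);
    wcs[n] = L'\0';                    /* n <= 63, so this is in bounds */

    int w = wcswidth(wcs, n);
    return w >= 0 ? w : (int)strlen(s);
}

/* Hypothetical helper: widest abbreviated month name in the current
 * locale. The program must have called setlocale(LC_ALL, "") first,
 * otherwise the "C" locale names (Jan, Feb, ...) are measured. */
int max_abmon_width(void)
{
    static const nl_item abmon[12] = {
        ABMON_1, ABMON_2,  ABMON_3,  ABMON_4,
        ABMON_5, ABMON_6,  ABMON_7,  ABMON_8,
        ABMON_9, ABMON_10, ABMON_11, ABMON_12
    };
    int max = 0;

    for (int i = 0; i < 12; i++) {
        int w = display_width(nl_langinfo(abmon[i]));
        if (w > max)
            max = w;
    }
    return max;
}
```

The result could be computed once at startup and added to the width of the fixed parts of the date to size the column. Whether nl_langinfo() and strftime() with %b always agree for every locale is something to check rather than assume.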
