Year: 2012

  • pam bugs hurt

    I did some upgrades of what seemed like a million packages today on my Debian sid computer.  I was doing this remotely and when I tried to ssh back in I got kicked out after entering my password, hmmm.

    OK, so I waited until I could get in front of it and tried to log in at the console. I was greeted with the message “Cannot make/remove an entry” and kicked out, even as root; double hmmm.

    So it was boot-into-single-user time which, thankfully, worked. bash loaded OK so it wasn’t a shell problem.  However su failed, which means we are in PAM problem territory.  I found a bug report that sounded like my problem, but with no clear solution except to upgrade in Ubuntu.

     

    The solution was pretty simple for me.  Something was funky with /etc/pam.d/common-session and a simple ‘/usr/sbin/pam-auth-update’ fixed it.

  • JFFNMS 0.9.3

    JFFNMS version 0.9.3 has been released today.  This is a vast improvement over the earlier 0.9.x releases and anyone using that train is strongly recommended to upgrade.

    So what changed? What didn’t change!  A nice summary would be fixing a lot of things that were broken or needed some tweaking. A really, really big thanks to Marek for all the testing and bug reports, and for the patient “just run this and tell me what it says” tests he did too.  If something wasn’t right before and works now, it is quite likely working because Marek told me how it broke.

    A brief overview of what has changed:

    • TFTP transfers work again
    • A lot of the weird polling effects due to caching fixed
    • Lots of the selects in sub-tables now work
    • The PHP string-to-float brokenness in SLAs worked around
    • Even more SNMP library cruft removed or escaped
    • HostMIB apps match properly
    • Interface autodiscovery delete and update fields are working again

    You can download the file from SourceForge at

    https://sourceforge.net/projects/jffnms/files/JFFNMS%20Releases/

  • procps-ng 3.3.3 released

    [Image: top in colour mode]

    This weekend procps-ng version 3.3.3 was tagged and released for distribution.  There have been many patches and fixes involved in this release as we move from an unchanging static sort of code into something that is easier to maintain and build on various architectures.  The good thing is I’m down to 1 or 2 patches in the Debian archive which is a big change from the 30 or 40-odd I used to carry.  For the sole metric of getting that number down, the project fork has been a success.

     

    There were some post-release bugs I found and these were more to do with the various options turned on or off rather than what you’d see if you did the basic ./configure && make.  One of them was how the version numbers are defined in git, but it would only appear if certain files were older than others (such as aclocal.m4 versus config.h.in).  Others appeared only when certain features were turned on.  The make check doesn’t see all of this because it uses the default configure flags.

    One annoying thing about the autotools is conditionally installing man pages. This is where you don’t or cannot compile a binary so you don’t install the corresponding man page.  The automake documentation is of course obscure about this, and I cannot see a way of distributing a man page without also installing it, so we have this fiddle where a file goes into dist_man_MANS or EXTRA_DIST depending on whether its binary is built.

    Interestingly, there has been some bike-shedding around Fedora-land (see the link below) regarding the name procps-ng versus procps.  Debian is lucky that we do have different upstream and package names (though not ideal) so apt-get install procps still gives you procps.  There has also been discussion about merging procps-ng into util-linux, which won’t be happening in the short term.

  • PHP floats and locales

    I recently had a bug report in JFFNMS that the SLA checks were failing with bizarre calculations, things like 300% disk drive utilization.  Briefly, JFFNMS is written in PHP and checks values that come out of rrdtool and makes various comparisons, such as whether you have used more than 80% of your disk or whether there have been too many errors.

    The logs showed strange input variables coming in; all were integers below 10.  I don’t know of many 1 or 3 kB sized disk drives. What was going on?  I ran a rrdtool fetch command on the relevant file and got output of something like 1,780000e+07 which for an 18GB drive seemed OK. Notice the comma; in this locale that’s the decimal point… hmm.

    In lib/api.rrdtool.inc.php there is this line around the rrdtool_fetch area:

    $value[] = ($line1[$i]=="nan")?0:(float)$line1[$i];

    A quick check and I found that my 1,7…e+07 was coming back as 1.  We had a float conversion problem.  Or more specifically, PHP has a float conversion problem.  I built a small check script like the following:

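    setlocale(LC_NUMERIC, 'pl_PL.UTF-8');
    $linfo = localeconv();
    $pi = '3,14';
    // Show the locale's decimal point, then the string cast to a float
    print "Decimal is \"" . $linfo['decimal_point'] . "\". Pi is $pi and " . (float)($pi) . "\n";
    // Floats are printed using the locale's decimal point
    print "Half is " . (1/2) . "\n";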

    Which gave the output of:

    Decimal is ",". Pi is 3,14 and 3
    Half is 0,5

    So… PHP is saying that the decimal point is a comma and it uses it when printing, BUT if a string comes in with a comma, it’s not a decimal point. Really?? Are they serious here?  I tried various combinations and could not make it parse correctly.

    The fix was made easier for me because I know rrdtool fetch only outputs values in scientific notation. That means if there is a string with a comma, then it must be a decimal point as it could never be used for a thousands mark.  By using str_replace to replace any comma with a period the code worked again. It no longer needs the locale to be set correctly, nor does the locale of the shell where rrdtool runs have to match the locale in PHP.

  • JFFNMS 0.9.3 1st release candidate

    I have been putting a lot of testing into JFFNMS lately.  I have been very lucky to have had someone with the time and patience to try out various sub versions and give me access to their results.

    The end result of all this testing is a much, much less buggy JFFNMS.  There have been a stack of problems with caching results, for example, where status would not be updated or, even worse, the status of one device impacted on another.

    The poller parent scheduler had a problem too where it would almost always sit in the first child starving the others of work which slowed things down. The scheduler now is a lot fairer across the children giving a speed up. I’ve heard speed-ups of 15x for this one change alone.

    I also had a curious bug where if a device was set to not gather state it still did and created events but not alerts.  This meant your event table was spammed with down-interface events even for interfaces you know are down and have turned state checking off for.  0.9.3 now does it the right way.

    The first RC is now uploaded and can be found at https://sourceforge.net/projects/jffnms/files/jffnms%20RC/ to try out.

    I’m a little worried that the pollers now run too fast and could overwhelm the usually crummy control stack found in network devices for parsing SNMP.  I’m interested to hear how people find it.

  • Back Online

    I’d have to say (in fact I often do) that Telstra sucks.  This is the very large anonymous telco that supposedly provides a telecommunications service.  They also effectively force me to have a phone service because it is far cheaper to buy a line with a phone service than just a line.

    My website, email and pretty much everything else has been offline for almost exactly 2 days, and this is the third time in 8 days it has happened.  The service interruption impacts at least a large chunk, if not all, of the suburb I live in, so you’d think the second time they’d look into it more, but apparently not.  Apparently they don’t even have technical staff on call on Sundays.

    The weird thing when I called up was that they didn’t know about the fault. If it was a single line that would be fine, but a large bit of an exchange goes and they don’t know about it?  It seems a little too casual to me as it wouldn’t be that difficult to have some sort of alarming.  I just hope that perhaps someone knew but their call centre doesn’t get that information pushed down to it.

    I have a few sponsored Debian packages that now need working on and uploading so that’s what I’ll be getting into once I get some more time.  procps is almost ready as well with the final minor patches to be integrated.

     

  • JFFNMS 0.9.2 Released

    [Image: JFFNMS Interfaces and Events]

    JFFNMS version 0.9.2 was released today both as an upstream tar.gz file and a new Debian package.  This version fixes some bugs including making sure it works with PHP 5.4.

    The biggest change in PHP 5.4 is that you can no longer use call-time pass-by-reference.  Previously you could call a function like myfunc(&$blah); which would pass a reference to $blah and not a copy of it. Now the function definition has to declare that it wants a reference, for example function myfunc(&$blah), rather than the caller choosing each time.

     

  • psmisc 22.16 Released

    psmisc version 22.16 was released today.  It is a bugfix release that basically fixes a problem around strings in C.  Process name lengths are only supposed to be 16 characters long, so a 17 byte buffer is OK; until you have processes with brackets, which means the string is 18 characters.

    The next wrinkle is that at times the brackets are stripped out so matches fail because the lengths don’t quite line up. You’ll see this with the Debian 22.15-2 version of psmisc where killall won’t find long-named processes.

    So, 22.16 fixes all that.

    Test Processes

    It really shows that psmisc needs a set of tests like procps has already. The difficulty with both is that it’s not simple in the DejaGNU framework to make test processes. These are not the programs within the package but other processes that the programs can work on.  There really needs to be an equivalent to touch for processes just for this sort of thing.  Creating processes is rather simple; the tricky part is ensuring they go away, or that they die with certain signals.

  • VMware Player on Debian

    For various reasons, having VMware running on my desktop would be kind of useful.  VMware provides a free (as in beer) version of their software called VMware Player. I downloaded the file VMware-Player-4.0.2-591240.x86_64.bundle off their website and tried to build it.

    It failed to build. Given my previous lack of success with VMware server, I wasn’t too surprised.  What was surprising was it wasn’t too hard to fix it.  The problem was that the vmnet module would not compile and that was due to three things:

    • net_device_ops no longer has set_multicast_list (in netif.c)
    • the Linux module header (linux/module.h) needs to be included to define THIS_MODULE
    • skb_frag_t has been redefined and needs an adjustment

    The patch is only a few lines and means I can compile VMware on my Debian sid computer running kernel 3.2.0-1.

    vmnet.patch

    To use it, you will need to find where the modules are built; for me it is /usr/lib/vmware/modules/source

    1. mv vmnet.tar vmnet2.tar
    2. tar xf vmnet2.tar
    3. patch -p0 < vmnet.patch
    4. tar cf vmnet.tar vmnet-only

    With that you can run the player which will try to build the modules and you’re done!

     

     

  • Unlucky sometimes

    Sometimes life throws little curves at you to see if you are still awake. Today has been one of those days.

    fglrx is (apparently) fixed

    I’ve had a long-running problem with fglrx on my laptop.  The problem stems from the ATI closed-source drivers on one of those laptops that has both an ATI and an Intel graphics chip. It means I am basically using the slow Intel chip only.  This morning I had enough, backed up my home and started to rebuild the laptop with Debian 6.0.3.

    So I kicked off the very very slow process of reformatting the crypto drive (it has taken 5 hours and is still going), let it gurgle on its merry way and started to read my email.  One of the emails said that my bug about fglrx not working is closed; apparently it is fixed.  If I had read that 10 minutes earlier, a simple ‘apt-get install fglrx-driver’ would have perhaps fixed it; oh well.

    My problem now is whether I move to the latest driver and hope their fix is my fix, or leave it with some ancient version.  My preference is the former; I only hope it works!

    psmisc 22.15 and buffer overflows

    psmisc has a program called pstree which prints the set of processes in a tree fashion.  It hasn’t changed much for quite a while.  I released version 22.15 and the Debian package 22.15-1.  For 22.15-1 I also adopted the hardening CFLAGS as suggested for procps.

    I was a little surprised when I received a bug of severity important.  The report said I had introduced a buffer overflow in 22.15-1, but no relevant code had changed.  The compiler options had done their job and stopped a buffer being overflowed.

    But where exactly was the overflow?  Running gdb on pstree quickly showed that it was line 267 of pstree.c which uses strcpy().  That function set off warning bells. The relevant code is:

        PROC *new;
    
        if (!(new = malloc(sizeof(PROC)))) {
            perror("malloc");
            exit(1);
        }
        strcpy(new->comm, comm);

     

    Now comm is the short command name you find in /proc/<pid>/stat.  It is fixed in the kernel at 16 characters.  The PROC structure has this field as 17 characters long, one extra for the NUL.  I went and checked the Linux source and yes, it is still 16 characters long.  The clue was in the name of the program that it died on.

    #6  new_proc (comm=0x6111b0 "{console-kit-dae}", pid=1571, uid=0)
        at pstree.c:267

     

    That string is 17 characters long. The problem is that 16 characters is for the name only. If the name is in brackets or braces, then that 16 character limit doesn’t apply.  The buffer overflow bug has been there for a long time, but only with the compiler flags did it become visible.

    The names are read out of the /proc filesystem, and if someone can fiddle with that you have bigger problems, so it doesn’t seem to be too much of an issue.  It should be fixed (and is in Debian 22.15-2) but it is a nice example of the compiler catching bad things.

     
