PHP floats and locales

I recently had a bug report for JFFNMS saying that the SLA checks were failing with bizarre calculations, such as 300% disk drive utilization. Briefly, JFFNMS is written in PHP; it takes values that come out of rrdtool and makes various comparisons, such as whether you have used more than 80% of your disk or whether there have been too many errors.

The logs showed strange input values coming in; all were integers below 10. I don’t know of many 1 kB or 3 kB disk drives. What was going on? I ran rrdtool fetch on the relevant file and got output like 1,780000e+07, which seemed OK for an 18GB drive. Notice the comma: in this locale that’s a decimal point… hmm.

In lib/ there is this line around the rrdtool_fetch area:

$value[] = ($line1[$i]=="nan")?0:(float)$line1[$i];

A quick check showed that my 1,7…e+07 was coming back as 1. We had a float conversion problem, or more specifically, PHP has a float conversion problem. I built a small check script like the following:

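setlocale(LC_ALL, ""); // take the locale from the environment (e.g. LANG)
$pi = "3,14"; // a string written with the locale's decimal comma
$linfo = localeconv();
print "Decimal is \"".$linfo['decimal_point']."\". Pi is $pi and ".(float)$pi."\n";
print "Half is ".(1/2)."\n";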

Which gave the output of:

Decimal is ",". Pi is 3,14 and 3

Half is 0,5

So… PHP says the decimal point is a comma and uses it when printing floats, BUT if a string comes in with a comma, it's not a decimal point. Really?? Are they serious here? I tried various combinations and could not make it parse correctly.
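A minimal sketch of the parsing side, assuming PHP 7 or later. The de_DE locale name is an assumption (setlocale() just returns false if it is not installed), and the parse result does not depend on it anyway: PHP's string-to-float conversion only accepts '.', so the cast stops at the comma and keeps the leading 1. Note that since PHP 8.0 float-to-string output is also locale-independent, so the "Half is 0,5" line above only reproduces on older PHP versions.

```php
<?php
// Sketch assuming PHP 7+; the locale names below are assumptions and
// setlocale() simply returns false if none of them is installed.
setlocale(LC_NUMERIC, 'de_DE.UTF-8', 'de_DE');

$fromRrdtool = '1,780000e+07'; // value as printed by rrdtool in a comma locale
$dotted      = '1.780000e+07'; // the same value with a '.' decimal point

// String-to-float conversion ignores LC_NUMERIC and only accepts '.',
// so the first cast stops at the comma and keeps just the leading "1".
$bad  = (float)$fromRrdtool;
$good = (float)$dotted;

// %F is PHP's locale-independent float format.
printf("comma: %F\ndot:   %F\n", $bad, $good);
```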

The fix was made easier for me because I know rrdtool fetch only outputs values in scientific notation. That means if a string contains a comma, it must be a decimal point, because it could never be a thousands separator. Using str_replace to replace any comma with a period got the code working again, and it no longer depends on the locale being set correctly, or on the shell where rrdtool runs using the same locale as PHP.
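The comma-to-period fix can be sketched as a small helper. The function name rrd_value_to_float is hypothetical (this is not the actual JFFNMS code), and it relies on the same assumption stated above: rrdtool fetch prints values in scientific notation, so a comma can only be a decimal separator.

```php
<?php
// Hypothetical helper, not the actual JFFNMS code. Assumes the input comes
// from `rrdtool fetch`, which prints values in scientific notation, so a
// comma can only ever be a decimal separator, never a thousands mark.
function rrd_value_to_float(string $raw): float
{
    $raw = trim($raw);
    if ($raw === 'nan') {
        return 0.0; // same treatment of unknown values as the original line
    }
    // Normalise the decimal separator; the cast then works in any locale.
    return (float)str_replace(',', '.', $raw);
}

// Example: values split out of one line of rrdtool fetch output.
$line1 = ['1,780000e+07', 'nan', '2,500000e+01'];
$value = array_map('rrd_value_to_float', $line1);
```

Here $value becomes 17800000.0, 0.0 and 25.0, and a value that already uses a period passes through unchanged.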

JFFNMS 0.9.3 1st release candidate

I have put a lot of testing into JFFNMS lately. I have been very lucky to have someone with the time and patience to try out various sub-versions and share their results with me.

The end result of all this testing is a much, much less buggy JFFNMS. There was a stack of problems with caching results, for example, where status would not be updated or, even worse, the status of one device affected another.

The poller's parent scheduler had a problem too: it would almost always stick with the first child, starving the others of work and slowing things down. The scheduler is now a lot fairer across the children, which gives a real speed-up; I’ve heard of speed-ups of 15x from this one change alone.
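The difference between always favouring the first child and sharing work fairly can be pictured with a generic round-robin dispatch. This is a hypothetical sketch, not the JFFNMS poller: the function assign_round_robin and the idea of pre-assigning job queues are assumptions made only to illustrate the fairness idea.

```php
<?php
// Hypothetical illustration only, not JFFNMS's poller code: give each job
// to the next child in turn, so no child sits idle while child 0 does
// all the work.
function assign_round_robin(array $jobs, int $children): array
{
    if ($children < 1) {
        throw new InvalidArgumentException('need at least one child');
    }
    $queues = array_fill(0, $children, []);
    $next = 0;
    foreach ($jobs as $job) {
        $queues[$next][] = $job;
        $next = ($next + 1) % $children; // rotate to the next child
    }
    return $queues;
}

// Seven poll jobs shared across three child pollers.
$queues = assign_round_robin(range(1, 7), 3);
```

Child 0 gets jobs 1, 4 and 7, child 1 gets 2 and 5, and child 2 gets 3 and 6. A real scheduler would more likely hand work to whichever child is idle; round-robin is just the simplest fair policy to show.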

I also had a curious bug where a device set not to gather state still gathered it, creating events but not alerts. This meant your event table was spammed with interface-down events, even for interfaces you know are down and have turned state checking off for. 0.9.3 now does it the right way.

The first RC is now uploaded and can be found at to try out.

I’m a little worried that the pollers now run too fast and could overwhelm the usually crummy control-plane stacks that network devices use to handle SNMP. I’m interested to hear how people find it.

