Tag: jffnms

  • PHP uniqid() not always a unique ID

    For quite some time modern versions of JFFNMS have had a problem. In large installations hosts would randomly appear as down with the reachability interface going red. All other interface types worked, just this one.

    Reachability interfaces are odd, because they call fping or fping6 to do the work. To run a ping program you need root access to a socket, and doing that is far too difficult and scary in PHP, which is what JFFNMS is written in.

    To capture the output of fping, the program is executed and its output written to a temporary file. For my tiny setup this worked fine, and for a lot of small setups it was also fine. For larger setups, it was not fine at all. There were random failed interfaces and, most bizarrely of all, a file that seemed to disappear. The program checked that the file existed and then ran stat in a loop to see if data was there. The file-exists check worked but the stat said file not found.

    At first I thought it was some odd load related problem, perhaps the filesystem not being happy and having a file there but not really there. That was, until someone said “Are these numbers supposed to be the same?”

    The numbers he was referring to were the filename IDs of the temporary files. They were most DEFINITELY not supposed to be the same. They were supposed to be unique. Why were they always unique for me and not for large setups?

    The problem is with the uniqid() function. It is basically a hex representation of the time.  Large setups often have large numbers of child processes for polling devices. As the number of poller children increases, the chance that two child processes start the reachability poll at the same time and have the same uniqid increases. It’s why the problem happened, but not all the time.
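
    To see how little is inside one of these IDs, here is a minimal sketch that splits a default uniqid() value back into the clock reading it encodes. It assumes the default 13-character form (no prefix, no more_entropy), and decode_uniqid() is a hypothetical helper name, not something from JFFNMS.

    ```php
    <?php
    // Split a default uniqid() value into the clock reading it encodes.
    // Assumes the default 13-character form: 8 hex digits of seconds
    // followed by 5 hex digits of microseconds.
    function decode_uniqid(string $id): array
    {
        return [
            'seconds'      => hexdec(substr($id, 0, 8)), // Unix timestamp
            'microseconds' => hexdec(substr($id, 8, 5)), // fraction of that second
        ];
    }

    $before = time();
    $id     = uniqid();
    $parts  = decode_uniqid($id);
    $after  = time();

    printf("%s -> %d s + %d us\n", $id, $parts['seconds'], $parts['microseconds']);
    ```

    Nothing in the value identifies the process that asked for it, so two children reading the clock in the same microsecond end up with the same string.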

    The stat error was another symptom of this bug. What would happen was:

    • Child 1 starts the poll, temp filename abc123
    • Child 2 starts the poll in the same microsecond, temp filename is also abc123
    • Child 1 and Child 2 each start their wait loop, see that the temp file exists and go into a loop of stat and wait until there is a result
    • Child 1 finishes, grabs the details, deletes the temporary file
    • Child 2 loops, tries to run stat but finds no file

    Who finishes first is entirely dependent on how quickly fping returns, and that is dependent on how quickly the remote host responds to pings, so it’s kind of random.
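
    The sequence above can be replayed in a single process. This is only a sketch of the failure mode, not the poller's code: the path, the variable names and the sample fping line are all made up for illustration.

    ```php
    <?php
    // Single-process replay of the collision; the path below is illustrative.
    $tmp = sys_get_temp_dir() . '/jffnms_demo_abc123';

    // Both children derived the same name, so the fping output lands in one file.
    file_put_contents($tmp, "192.0.2.1 is alive\n");

    // Each child's wait loop first checks that the file exists.
    $child1_sees_file = file_exists($tmp);
    $child2_sees_file = file_exists($tmp);

    // Child 1 finishes first: it reads the result and deletes the file.
    $child1_result = file_get_contents($tmp);
    unlink($tmp);

    // Child 2 loops again and calls stat() on a file that is now gone.
    clearstatcache(true, $tmp);
    $child2_stat = @stat($tmp); // false, i.e. "file not found"

    var_dump($child1_sees_file, $child2_sees_file, $child2_stat);
    ```

    The exists check passes for both children, yet the later stat fails for whichever one loses the race.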

    The fix is a minor patch that uses tempnam() instead of uniqid() and adds the interface ID to the mix for good measure (no two children will poll the same interface; the parent’s scheduler makes sure of that). The initial response is that it is looking good.
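
    A minimal sketch of that approach, assuming a hypothetical helper name and an illustrative "fping_<id>_" prefix (this is not the actual patch):

    ```php
    <?php
    // Hypothetical helper: the function name and the "fping_<id>_" prefix
    // are illustrative, not taken from the JFFNMS patch.
    function reachability_tempfile(int $interface_id): string
    {
        // tempnam() creates the file itself and returns a name that was not
        // in use, so two callers cannot be handed the same path.
        $path = tempnam(sys_get_temp_dir(), "fping_{$interface_id}_");
        if ($path === false) {
            throw new RuntimeException('could not create temporary file');
        }
        return $path;
    }

    $a = reachability_tempfile(42);
    $b = reachability_tempfile(42);
    $both_created = file_exists($a) && file_exists($b);

    var_dump($a !== $b); // distinct paths even for the same interface ID

    unlink($a);
    unlink($b);
    ```

    Because tempnam() creates the file as part of picking the name, uniqueness no longer depends on the clock. Note that Windows only honours the first three characters of the prefix, so the interface ID part is mainly useful on Unix-like systems.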

     

  • jffnms 0.9.4

    JFFNMS version 0.9.4 was released today. This version fixes some bugs that have recently appeared in previous versions.

    Alarmed Interfaces and Events

    The triggers rules editor had a problem where some of the rules clicked off the triggers would not appear or could not be edited correctly.

    Most of the Admin screens have the ability to sort the rows. Unfortunately the sorting did not work, but the functionality has now been restored.

    Most users are probably unaware of this, but the database schema is first created for MySQL and is then converted for PostgreSQL. The conversion process is far from ideal and hasn’t worked until this release. More testing is required for PostgreSQL support but it should be a lot better.

     

  • JFFNMS 0.9.3 1st release candidate

    I have been putting a lot of testing into JFFNMS lately.  I have been very lucky to have had someone with the time and patience to try out various sub versions and give me access to their results.

    The end result of all this testing is a much, much less buggy JFFNMS.  There have been a stack of problems with caching results, for example, where status would not be updated or, even worse, the status of one device impacted on another.

    The poller parent scheduler had a problem too, where it would almost always sit in the first child, starving the others of work, which slowed things down. The scheduler is now a lot fairer across the children, giving a speed-up. I’ve heard of speed-ups of 15x from this one change alone.
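
    As a rough illustration of the difference, here is a toy comparison of two dispatch policies. Neither function is taken from the JFFNMS source; the names, the capacity figure and the simplification that no item completes during the run are all assumptions.

    ```php
    <?php
    // Hypothetical dispatch policies; neither is taken from the JFFNMS source.
    // $load maps a child index to the number of items it currently holds.

    // "First child with room": child 0 wins every time it has spare capacity.
    function pick_first_free(array $load, int $capacity): ?int
    {
        foreach ($load as $child => $count) {
            if ($count < $capacity) {
                return $child;
            }
        }
        return null;
    }

    // "Least busy child": the same items get spread across all children.
    function pick_least_busy(array $load, int $capacity): ?int
    {
        $child = array_search(min($load), $load, true);
        return $load[$child] < $capacity ? $child : null;
    }

    // Hand out $items one at a time; nothing completes during the run.
    function dispatch(callable $pick, int $children, int $items, int $capacity): array
    {
        $load = array_fill(0, $children, 0);
        for ($i = 0; $i < $items; $i++) {
            $child = $pick($load, $capacity);
            if ($child === null) {
                break; // every child is full
            }
            $load[$child]++;
        }
        return $load;
    }

    $unfair = dispatch('pick_first_free', 4, 8, 8);
    $fair   = dispatch('pick_least_busy', 4, 8, 8);

    printf("first free: %s\n", implode(',', $unfair));
    printf("least busy: %s\n", implode(',', $fair));
    ```

    With four children and eight items, the first policy piles everything onto child 0 while the second gives each child two items.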

    I also had a curious bug where, if a device was set to not gather state, it still did and created events but not alerts.  This meant your event table was spammed with down-interface events, even for interfaces you know are down and have turned state checking off for.  0.9.3 now does it the right way.

    The first RC is now uploaded and can be found at https://sourceforge.net/projects/jffnms/files/jffnms%20RC/ to try out.

    I’m a little worried that the pollers now run too fast and could overwhelm the usually crummy control stack found in network devices for parsing SNMP.  I’m interested to hear how people find it.

  • JFFNMS 0.9.1 Released

    JFFNMS 0.9.1 has the database extracts and updates missing from 0.9.0. This is the most problematic part of the project release: ensuring that the database updates correctly.

    Version 0.9.1 is functionally the same as 0.9.0, just with fewer bugs!

     

  • JFFNMS – 3rd RC lucky?

    I’ve just uploaded to SourceForge the third and hopefully last RC for the JFFNMS network management system, version 0.9.0. The reason for the delay was Easter, and I also wanted to test the engines for a long while to make sure I was not getting any orphan children or items. Previous versions had processes that never died or, if they died, the parent didn’t realise and didn’t handle the item, permanently leaving the item “out for work”.
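
    For anyone unfamiliar with the problem, this is the general shape of a parent that notices when a child goes away and takes its item back. It is a sketch only, assuming the pcntl extension (PHP CLI on a Unix-like system); the item IDs and the rule that item 102 fails are invented for the demo, and none of this is the JFFNMS engine code.

    ```php
    <?php
    // Assumes the pcntl extension (PHP CLI on a Unix-like system).
    // Item ids and the "item 102 fails" rule are made up for the demo.
    $out_for_work = []; // child pid => item id

    foreach ([101, 102, 103] as $item) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            fwrite(STDERR, "fork failed\n");
            exit(1);
        }
        if ($pid === 0) {
            // Child: stand-in for polling one item, then exit with a status.
            exit($item === 102 ? 1 : 0);
        }
        $out_for_work[$pid] = $item; // parent remembers who has what
    }

    $done    = [];
    $requeue = [];
    while ($out_for_work) {
        $pid = pcntl_wait($status); // blocks until some child exits
        if ($pid <= 0) {
            break; // no children left to wait for
        }
        $item = $out_for_work[$pid];
        unset($out_for_work[$pid]);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $done[] = $item;
        } else {
            $requeue[] = $item; // died or failed: hand it out again
        }
    }

    sort($done);
    printf("done: %s, requeue: %s\n", implode(',', $done), implode(',', $requeue));
    ```

    The point is the pid-to-item map: every exit, clean or not, removes an entry, so no item stays marked as out for work after its child has gone.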

    PHP5 has much better job and process handling and the new version takes advantage of this.  It’s run well on my own test setup for a week or two. You can find the RC code or just the older releases at https://sourceforge.net/projects/jffnms/files/

  • JFFNMS at RC2, ncurses at 5.8

    After some reports back about [JFFNMS](http://www.jffnms.org/) 0.9.0rc1 I have now updated it to rc2. Thanks to all who gave me information about how it worked in YOUR setup.  I cannot be sure but I’d say the second RC will be the last until the release itself.

    Sven has also given me the nod and ncurses 5.8 migrated into unstable.  We’ve had one report that the new version of ncurses might not play well with stfl (see #616711 ) but generally speaking it should work ok.

    Finally, congratulations to the Debian project on [winning two categories at the Linux New Media Awards](http://www.debian.org/News/2011/20110304). It was especially good to hear the presentation by Karsten Gerloff who is president of the Free Software Foundation Europe.

    ## ncurses bug update
    It seems that the ncurses bug is more serious and is to do with the newwin() function in the library. If you get crashes when a program starts and it’s linked to ncurses 5.8 (even if it is not a Debian system) you may have this problem.

    It doesn’t happen to all ncurses programs, as the stfl example code and mutt work ok.


  • JFFNMS and IPv6


    One of the many Free Software projects I work on is JFFNMS, which is a network management system written in PHP. Given that the last IPv4 address blocks have now been allocated to APNIC, it’s probably timely to look at how to manage network devices in a new IPv6 world.
    First you need to get the basics sorted out, and for that it is best to use the net-snmp command line utilities to check all is well. Then it’s on to what to do in JFFNMS itself.
    Now fixed with proper markup, I hope.


  • JFFNMS 0.8.5 released

    After my usual battle with PHP and database exports, jffnms 0.8.5 is now released. This program is a network management system written in PHP.  The worst part of the whole maintenance process for it would have to be the release.

    Why is it so difficult to track changes within a database and PHP code? You don’t get nice compile-time errors, only run-time problems, and the database is just diabolical to keep up to date with what you have changed.  Someone needs to invent a git for database states!

    Looking around other PHP-based programs, I don’t think anyone else has solved this issue. Well, it’s out there; enjoy it or not, and if you have comments about the program let me know.