Wednesday, 25 November 2009

Package Installation Do's and Don'ts

An important note from Gerry Haskins: Do not apply packages from one Update onto a system installed with a different Update.

Cherry-picking packages from a newer Solaris 10 update and installing them on a system running an older update results in an unsupported configuration and is likely to lead to system corruption.

Note also that adding a package from the same update means that you will have to re-apply all the patches already on the system to bring the new package to the same level as the rest of the system.

Tuesday, 17 November 2009

Behaviour Driven Infrastructure

I've been following the development of Puppet for many years, and this gem of a thread caught my attention recently. Martin Englund asks the Puppet Users mailing list:
"how do you validate that puppet has done what it is supposed to, and even troublesome, how you validate that it has done what you intended it to do?"
This is something I've struggled with over the years with my JASS/SST-based JumpStart build system. I've gone so far as to automate build testing with Buildbot, passing or failing a build by using regular expressions to search for errors in the installation output. But my testing ends when the console login prompt appears. Validating whether a built system functions as intended is beyond what I currently test, and even beyond the capabilities of Puppet.
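
The pass/fail step described above can be sketched roughly as follows. This is not my actual Buildbot setup, just a minimal illustration in Python; the error patterns and the function names find_errors and build_passed are hypothetical, and a real build would tune the patterns to its own installer output.

```python
import re

# Hypothetical patterns, chosen for illustration only.
ERROR_PATTERNS = [
    re.compile(r"\bERROR\b", re.IGNORECASE),
    re.compile(r"\bpanic\b", re.IGNORECASE),
    re.compile(r"installation failed", re.IGNORECASE),
]


def find_errors(log_text):
    """Return (line_number, line) pairs for lines matching any error pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append((number, line))
    return hits


def build_passed(log_text):
    """A build passes only when no line of the log matches an error pattern."""
    return not find_errors(log_text)
```

A check like this can only say that the installer did not complain. It says nothing about whether the finished system behaves as intended, which is exactly the gap described above.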

This is where Martin's Behaviour Driven Infrastructure (BDI) approach comes into play. Martin is using Cucumber, a Behaviour Driven Development testing tool, to describe a system's behaviour in natural language that is readable and easy to understand for non-technical users (e.g. your IT helpdesk or even your business stakeholders).

Where this really gets interesting is combining Puppet, Cucumber and a monitoring system such as Nagios to do Test Driven Infrastructure. For example, you can use cucumber-nagios to integrate your Cucumber tests with Nagios, then write a test for a new feature you want the system to have, e.g. "Should be able to send email". Initially Nagios would mark the system as having a fault, because you haven't yet implemented the feature that makes the test pass. You then implement the feature using Puppet so that the test passes.
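
To make the "failing first" idea concrete, here is a rough stand-in for such a test written as a plain Nagios-style check in Python. It is not cucumber-nagios and does not use its API; the function name check_smtp and its parameters are my own invention, and it only confirms that an SMTP server answers with a 220 greeting, which is a much weaker claim than "can send email". The exit codes 0 (OK) and 2 (CRITICAL) follow the usual Nagios plugin convention.

```python
import socket

# Nagios plugin exit codes: 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2


def check_smtp(host="localhost", port=25, timeout=5.0,
               connect=socket.create_connection):
    """Return (exit_code, message) based on whether an SMTP server greets us.

    The connect parameter exists only so the check can be exercised
    without a real network connection.
    """
    try:
        with connect((host, port), timeout) as conn:
            banner = conn.recv(512).decode("ascii", errors="replace").strip()
    except OSError as exc:
        return CRITICAL, f"SMTP CRITICAL - cannot connect to {host}:{port} ({exc})"
    if banner.startswith("220"):
        return OK, f"SMTP OK - {banner}"
    return CRITICAL, f"SMTP CRITICAL - unexpected greeting: {banner!r}"
```

Before the mail service is configured, this returns CRITICAL and Nagios shows a fault. Once the configuration management run has set up the service, the same check should return OK. A wrapper script would print the message and exit with the returned code.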

Over time I would expect test coverage to grow so that system behaviours like DNS resolution, LDAP authentication, host-based firewall policies, etc. are all tested. Any change to the system that broke one of these tests could then be quickly pinpointed and fixed.

A perfect example in the Solaris world is the application of the latest recommended patches to a box. At a minimum it tends to break the sendmail and snmpd configs on my systems, so I know to manually back up these files before applying the patches and restore them afterwards. With BDI combined with a monitoring system you could be alerted to these breakages, or to any others that you weren't previously aware of, and rapidly respond to them.
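
Short of full behavioural tests, even a crude before-and-after comparison would catch the config clobbering I described. The following is a minimal sketch in Python, assuming you pass it whichever config file paths you care about; the helper names snapshot and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 of its contents, or None if it is missing."""
    digests = {}
    for path in paths:
        p = Path(path)
        if p.is_file():
            digests[str(path)] = hashlib.sha256(p.read_bytes()).hexdigest()
        else:
            digests[str(path)] = None
    return digests


def changed_files(before, after):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(path for path in before if before[path] != after.get(path))
```

Take one snapshot before patching and another afterwards, and changed_files lists what the patches touched. It only tells you that a file changed, not whether the service still works, so it complements a behavioural test rather than replacing one.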

Brilliant!