Tuesday, 26 May 2009

Python ipaddr performance

Last weekend while I was cleaning up my IP address summarization script (I added a setup.py, created a Cheese Shop entry and a downloadable archive) I had a look at the state of IP address manipulation in Python and found a new module called ipaddr. What sparked my interest in this module was that it had already been integrated into upcoming Python 2.7 and 3.1 as a standard library.

It seemed to contain all the functionality of IPy but with much cleaner code. It also has a function called collapse_address_list() that is similar to my summarize function but it can handle a list of non-contiguous networks and IP addresses and collapse them down.

When I wrote my summarize() function I spent quite a while optimizing the algorithm so that it can handle large address ranges in a reasonable amount of time. For example, on my development box it can summarize the two worst case ranges: IPv4 to in 0.1s and IPv6 :: to ffff:ffff:ffff:ffff:ffff:ffff:ffff:fffe in 0.2s.

I had to see how well ipaddr performed so I wrote a little benchmarking script that summarized a /24. The results were very poor with it taking 64.27 seconds to perform 1000 runs. I then ported my summarize() code to use ipaddr and modified collapse_address_list() to use it. The result was a 50 times improvement in performance (1000 runs took 1.21 seconds).

ipaddr is developed by Google and the coders use Reitveld for reviewing patches. So I've submitted my patches there. Feel free to review and comment on the code.

Here's hoping that they are accepted and benefit everyone by making it into the next release(s) of Python!

1 comment:

scholarlysunrise said...

Nice work Matt.

Hope your patch is robust and gets accepted.