How does one analyze 182,000+ lines of Excel looking for correlations? I suppose there may be some Excel magic, but when IP addresses are involved - and I know Excel hasn't had an out-of-the-box IP network sort order, methinks I'm in for a long slog.
The background: I have an export from a customer IP address management tool and we needed to verify the block type assignments were correct. To do this, I needed to understand the top-level aggregates (IP network and mask) for each top-level container (site). There were around 50 aggregates in each of 6 containers. I then needed to verify if the 30,000+ subnetworks in each container did or didn't fall within the 50 associated aggregate blocks.
Where to begin? Looking at, analyzing and deciding on 1 line per second would still take almost 6 and a half working days (8 hours / per day); and 1 per second is a gross underestimate of the actual time required. I needed to automate this somehow.
I had all the data in Excel columns:
[...] | Container | Mask | Type | Name | Network | [...]
The first step was to put them in an order that lends itself to the type of analysis I was going to do. I needed to create a sort order for IP Networks. It's trivial to convert an IP address to a decimal number by multiplying octets by powers of 256. The tricky part is to have Excel do it for you by determining and breaking apart the octets, multiplying and providing the result in a new column. Luckily, the web is littered with such formulas.
With a new column for IP sort order, I could sort first by "Container" then by "IP Sort" within each container sort. This put the 182,000+ lines in an order I could work with:
[...] IP Sort | Container | Mask | Type | Name | Network | [...]
Top-site1 Top-site1 [ ~ 50 of these ] sub-site1 sub-site1 [ ~ 30,000 of these, ordered by IP net ] [...] Top-site2 Top-site2 [ ~ 50 of these ] sub-site2 sub-site2 [ ~ 30,000 of these, ordered by IP net ]
At this point, my Excel expertise was exhausted. This probably required some type of Visual Basic assistance, and if that means we're falling back on scripting, I'm using my go to: Perl. I could have used Perl to access data directly from the Excel, but I needed something quick so a cut and paste into text files was going to do.
First I copied the columns shown above from 'Mask' to 'Network' for each top-level group into files. That gave me 6 files of about 50 lines each. Next, I copied the same columns for the subnetworks into files. That gave me 6 more files with about 30,000 lines each. Getting this data from Excel was pretty easy since the previous sort meant I just needed to highlight consecutive rows and cut-and-paste.
C:\> dir /b *.txt site1-SUB.txt site1-TOP.txt site2-SUB.txt site2-TOP.txt [...]
With my input files ready, I started on my script. First, I'd create an object that would store the data from the columns:
package IPAM; sub new { my $class = shift; my (@p) = @_; my %data; $data{mask} = $p[0]; $data{type} = $p[1]; $data{name} = $p[2]; $data{network} = $p[3]; return bless \%data, $class; } 1;
Next, I'd read my top level input file for a given site, followed by my subnetwork file for that site:
package main; my $nets = readIn( $ARGV[0] ); my $subs = readIn( $ARGV[1] ); sub readIn { my ($file) = @_; my @temp; open my $fh, '<', $file; while (<$fh>) { chomp $_; # Excel data from cut/paste is tab-delimited my ( $mask, $type, $name, $net ) = split /\t/, $_; push @temp, IPAM->new( $mask, $type, $name, $net ); } close $fh; return \@temp; }
Now the "hard" part. I had all my data in, I just needed to know if $subs->{network} was indeed a subnet of $nets->{network} for every given combination. Two loops and some CPAN magic:
use Net::IPv4Addr qw( :all ); for my $net ( @{$nets} ) { for my $sub ( @{$subs} ) { if (ipv4_in_network( $net->{address} . "/" . $net->{mask}, $sub->{address} . "/" . $sub->{mask} ) ) { print $net->{address} . "/" . $net->{mask} . " [" . $net->{blocktype} . "] - " . $sub->{address} . "/" . $sub->{mask} . "\n"; } } }
All that was left to do was run the script. Even that part was automated:
C:\> for /f "delims=-" %i in ('dir /b *.txt') do @ More? echo %i && chksubip.pl %i-TOP.txt %i-SUB.txt
Wow - 6.5 days of estimated work reduced to 20 minutes of script writing and 2 minutes for the script to loop over all that data. I have my life back!
No comments :
Post a Comment