Tuesday, March 05, 2013

Thinking Outside the "Package" for Packets

I responded to a query on a community forum for Perl about creating a DNS update with a spoofed source address. I was ASSURED it was for a contest and the code was what was essential. The contest site recommended scapy, but the poster was looking for a Perl solution. I had the easy answer: Perl Packet Crafter.

The poster went on to detail some code that created the DNS update but when he sent the packet, he couldn't change the source IP address. Obviously. He used Net::DNS as follows:

use strict;
use warnings;
use Net::DNS;

my $update = Net::DNS::Update->new('evil.zz');
$update->push(prerequisite => nxrrset('hacker11.evil.zz. A'));
$update->push(update => rr_add('hacker11.evil.zz. 86400 A 127.0.0.1'));

my $res = Net::DNS::Resolver->new;
$res->nameservers('192.168.200.113');

my $reply = $res->send($update);

The last line sends the update by taking the created DNS data and letting standard socket routines create the Layer 3 header. Perl Packet Crafter (PPC) can create the custom Layer 3 header and with Net::Frame::Layer::DNS (which I wrote), one can easily create the required packet. However, with most of the work done, is it necessary to recode the original script using Net::Frame::Layer::DNS? Turns out ... no.

Because of the layered nature of the Net::Frame suite of modules on which PPC is based, one can easily create any or all layers of a frame with the objects or simply by hand crafting an octet stream. Or ... even using another Perl module that can output the required stream.

For Net::DNS, there is an undocumented sub called make_query_packet() in the Net::DNS::Resolver::Base code that creates a Net::DNS::Packet object on which the data() method can be called to create the necessary octet stream. It may sound complicated, but all it means is replace the last line of code above with:

my $dnsdata = $res->make_query_packet($update);

Now, in PPC, you can use the $dnsdata->data call to create the DNS payload in a UDP packet. It looks like the following:

VinsWorldcom@C:\tmp\> ppc.pl -i "Wireless Network Connection"
Welcome to Perl Packet Crafter (PPC)
Copyright (C) Michael Vincent 2012

Wireless Network Connection

ppc> use Net::DNS;
ppc> $update = Net::DNS::Update->new('evil.zz');
ppc> $update->push(prerequisite => nxrrset('hacker11.evil.zz. A'));
ppc> $update->push(update => rr_add('hacker11.evil.zz. 86400 A 127.0.0.1'));
ppc> $res = Net::DNS::Resolver->new;
ppc> $res->nameservers('192.168.200.113');
ppc> $dnsdata = $res->make_query_packet($update);

We've used PPC to enter the original code with the modified last line - instead of send, use the make_query_packet() routine to create the Net::DNS::Packet object. Continuing, we create the packet in PPC:

ppc> $ether = ETHER;
ppc> $ipv4 = IPv4(src=>'1.1.1.1',dst=>'192.168.200.113',protocol=>NF_IPv4_PROTOCOL_UDP);
ppc> $udp = UDP(dst=>53,payload=>$dnsdata->data);
ppc> $packet = packet $ether,$ipv4,$udp;

And to be sure before we send it, we can use Net::Frame::Layer::DNS for a nice decode:

ppc> use Net::Frame::Layer::DNS qw(:consts);
ppc> decode $packet;
ETH: dst:55:66:88:78:aa:30  src:c0:c1:c2:08:46:56  type:0x0800
IPv4: version:4  hlen:5  tos:0x00  length:90  id:23417
IPv4: flags:0x00  offset:0  ttl:128  protocol:0x11  checksum:0x53fe
IPv4: src:1.1.1.1  dst:192.168.200.113
UDP: src:50281  dst:53  length:70  checksum:0x5661
DNS: id:5329  qr:0  opcode:5  flags:0x00  rcode:0
DNS: qdCount:1  anCount:1
DNS: nsCount:1  arCount:0
DNS::Question: name:evil.zz
DNS::Question: type:6  class:1
DNS::RR: name:hacker11.[@12(evil.zz)]
DNS::RR: type:1  class:254  ttl:0  rdlength:0
DNS::RR: name:[@25(hacker11.[@12(evil.zz)])]
DNS::RR: type:1  class:1  ttl:86400  rdlength:4
DNS::RR::A: address:127.0.0.1

Job done!

Note this can also be done with other Perl modules like Net::DHCP::Packet with the new() and serialize() calls, for example.
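
The layering idea above is not tied to Perl. As a rough illustration only, here is a Python sketch (not PPC, not Net::Frame) that wraps an arbitrary octet stream in hand-built UDP and IPv4 headers with whatever source address you choose. The function names checksum() and udp_in_ipv4() are made up for this example, the TTL of 128 simply mirrors the decode above, and actually transmitting the result would still need a raw socket or a tool like PPC.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def udp_in_ipv4(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Return IPv4 header + UDP header + payload as one octet stream (hypothetical helper)."""
    src_b, dst_b = socket.inet_aton(src), socket.inet_aton(dst)
    udp_len = 8 + len(payload)

    # The UDP checksum covers a pseudo header (src, dst, zero, protocol 17, length).
    pseudo = src_b + dst_b + struct.pack("!BBH", 0, 17, udp_len)
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0)
    udp_sum = checksum(pseudo + udp + payload) or 0xFFFF  # 0 means "no checksum"
    udp = struct.pack("!HHHH", sport, dport, udp_len, udp_sum)

    # Minimal 20 byte IPv4 header: version 4, hlen 5, TTL 128, protocol 17 (UDP).
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 128, 17, 0, src_b, dst_b)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    return ip + udp + payload
```

The payload argument is just bytes, so it does not matter whether it came from Net::DNS, Net::DHCP::Packet or a hex editor. That is the same point the PPC session makes with $dnsdata->data.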

Friday, March 01, 2013

Blog about NOT Logging

A question came across a mailing list I subscribe to about limiting the syslog messages sent from a Cisco router to a syslog server. The question arose since a certain Cisco blade switch has a known bug where it reports the redundant power supply is faulty even though it doesn't have one. The message - sent every 5 minutes - was becoming quite a bother to the operations folks since there were 80 such devices all reporting the same erroneous message.

The asker had already found the 'logging discriminator ...' command, but couldn't apply it. A quick test in Dynamips and I had the answer for him.

The 'discriminator' option as we applied it looked for a regular expression in the syslog message body and was configured to "drop" the message (not send it to the syslog server). It worked with the following configuration:

logging discriminator NOREPORT msg-body drops "Redundant power supply faulty or in standby mode"
logging host 1.1.1.1 discriminator NOREPORT

Satisfied we had a working fix, it was time for some more investigation.

I've selectively enabled SNMP traps with the 'snmp-server enable traps XXX' commands, but I didn't know it was possible with syslog messages - I never really tried, to be honest. In fact, all logging is enabled with a simple command:

logging 192.168.100.254

There are options for which facility or severity to send, but not many options for creative tuning - it's all of a certain class or nothing. The 'discriminator' option seemed pretty useful. However ...

The 'discriminator NAME' doesn't work like an access-list where you can add multiple lines. You get one (1) discriminator and you get one (1) time to apply it to the syslog host. So how long can the regular expression be? Not very - as soon as I started to get fancy with the regular expression to block multiple messages, I got errors:

R1(config)#$msg-body drops "((Configured from)|(Interface         ))"
R1(config)#$msg-body drops "((Configured from)|(Interface          ))"
% unmatched ()

With the grouping parentheses and the logical or vertical bar (pipe), I could only get a maximum of 38 characters. When I tried 39, I started getting the "unmatched" error and looking at the 'show run', my configuration line was truncated at 38 characters:

R1(config)#do sh run | i logg
logging discriminator NOREPORT msg-body drops ((Configured from)|(Interface          )

Notice the last parenthesis is left off (there should be two of them). This severely limits the creativity when trying to selectively block syslog messages. There are other alternatives, like 'mnemonic', which will block an entire category of syslog messages by regular expression. So fewer characters to fit within the 38, but entire classes of messages dropped.
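
One way to avoid finding the limit the hard way is to check a candidate expression offline before pasting it into the router. The Python sketch below is only an approximation: it uses Python's re module as a stand-in for the IOS regular expression engine (an assumption that holds for simple alternations like the ones above, not for every pattern), the 38 character limit is just what I observed in this one test, and the helper names and sample messages are made up for illustration.

```python
import re

# Limit observed in the Dynamips test above; other IOS versions may differ.
OBSERVED_LIMIT = 38


def build_pattern(fragments):
    """Join message fragments into one grouped alternation (hypothetical helper)."""
    return "(" + "|".join("(%s)" % f for f in fragments) + ")"


def fits(pattern, limit=OBSERVED_LIMIT):
    """True if the pattern is short enough to survive without truncation."""
    return len(pattern) <= limit


def forwarded(pattern, messages):
    """Return the messages that would NOT be dropped by a 'drops' discriminator."""
    rx = re.compile(pattern)
    return [m for m in messages if not rx.search(m)]


# Sample message bodies, invented for this sketch.
samples = [
    "Configured from console by console",
    "Interface FastEthernet0/0, changed state to up",
    "Neighbor 10.0.0.2 Down",
]
pattern = build_pattern(["Configured from", "Interface"])
```

Here fits(pattern) tells you whether the expression stays within the observed limit and forwarded(pattern, samples) shows what would still reach the syslog server.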

Maybe there's a better way?

Thursday, January 24, 2013

Winsock or Winsuck

In a previous post I talked about updating Netcat to support both IPv4 and IPv6 in a single Windows executable. I added a lot of new features including multicast listener with source specific multicast in IPv4 only.

I added source specific multicast in IPv4 only because the Winsock API did not provide the required structures for IPv6 source specific multicast in a "standard" way. Berkeley sockets did it as you'd expect by adding IPv6 complements for the existing IPv4 functions and structures.

IPv4              IPv6
ip_mreq_source    ipv6_mreq_source

Additionally, the address family independent structures and options are also available.

I recently looked at ssmping and found they did source specific multicast for IPv6 on Windows so I decided to look at the source for guidance and revisit my Netcat.

It still irks me that instead of:

    struct ip_mreq_source mreq;
    mreq.imr_multiaddr = *mgroup;

    ...

    mreq.imr_sourceaddr = *rad;

I have to do:

    #define MCAST_JOIN_SOURCE_GROUP         45

    ...

    GROUP_SOURCE_REQ mreq;
    struct sockaddr_in6 g6;
    g6.sin6_addr = *mgroup6;
    memcpy(&mreq.gsr_group, &g6, sizeof(struct sockaddr_in6));

    ...

    g6.sin6_addr = *rad;
    memcpy(&mreq.gsr_source, &g6, sizeof(struct sockaddr_in6));

Moving data back and forth with pointers and addresses, changing the storage structures and those memory copies - argh.

This is why I like Perl. I'm not a programmer by trade. This memory management is hidden from me in Perl and furthermore, Perl code will work across Windows and Linux. Instead in C, I have a bunch of '#ifdef' compiler directives to determine if I'm compiling on Win32 and if I have the proper version. The IPv6 source specific multicast routines aren't available and the address family independent ones are only available on Windows Vista or later (or maybe Windows Server 2008 - the documentation is pretty confusing). Not only do I need separate code for Windows or Linux within those compiler directive branches, I also need legacy and new Windows code if the Windows version is less than Vista.

Ultimately, it works. So I am happy. And since I'm not a "real" programmer and I only do this occasionally, I deal with it - and complain on my blog! Don't even get me started on support for QoS in 'setsockopt()'. How annoying is this for people who program for a living and need to support multiple platforms?

Wednesday, January 23, 2013

Testing DHCPv6: Part 2

In a previous post, I talked about getting a DHCPv6 client on a Linux image to use with Qemu and GNS3. That post was mainly focused on documenting the steps to get the DHCPv6 client on the host and ultimately working. I neglected to talk about the DHCPv6 server that I configured on the Cisco router.

Of course, I was just bitten by a minor configuration miss that I learned in that exercise and quickly forgot until gently reminded as I watched my DHCPv6 simulation NOT work.

The DHCPv6 server configuration on the router is pretty simple:

ipv6 dhcp pool DHCPv6
 address prefix 2001:DB8:A64:6800::/64
 dns-server 2001:4860:4860::8888
 domain-name dynamips.com

But what I forgot was that SLAAC is mandatory for IPv6 nodes and will work as long as router advertisements are present. So simply setting the M-bit in the router advertisements to force DHCPv6 is not enough. You need to NOT advertise the IPv6 prefix on the interface.

So the interface configuration looks like:

interface FastEthernet2/0
 ipv6 address 2001:DB8:A64:6800::1/64
 ipv6 enable
 ipv6 nd prefix 2001:DB8:A64:6800::/64 no-advertise
 ipv6 nd managed-config-flag
 no ipv6 redirects
 no ipv6 unreachables
 ipv6 dhcp relay destination 2001:DB8:A01:100::1

In the above example, I'm sending the DHCPv6 requests to the DHCPv6 server running on another Cisco router. I've set the M-bit and I've also stopped router advertisements of the prefix with the 'ipv6 nd prefix 2001:DB8:A64:6800::/64 no-advertise' command.
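
To make the interaction between the M-bit and the advertised prefix concrete, here is a small Python sketch of how a typical client decides where its addresses come from. The flag values are the ones defined in RFC 4861 (M and O in the router advertisement, A in the Prefix Information option); the function name and the simplified decision logic are my own assumptions and not any particular client's implementation.

```python
RA_FLAG_M = 0x80   # Managed address configuration: use DHCPv6 for addresses
RA_FLAG_O = 0x40   # Other configuration: use DHCPv6 for DNS and similar
PIO_FLAG_A = 0x40  # Autonomous flag in a Prefix Information option


def address_methods(ra_flags, prefix_flags):
    """Return how a client would obtain addresses (simplified, hypothetical).

    ra_flags     - the flags octet from the router advertisement
    prefix_flags - the flags octet of each Prefix Information option present
    """
    methods = set()
    # SLAAC happens for every advertised prefix with the A flag set,
    # regardless of the M-bit.
    if any(flags & PIO_FLAG_A for flags in prefix_flags):
        methods.add("SLAAC")
    if ra_flags & RA_FLAG_M:
        methods.add("DHCPv6")
    return methods
```

With the M-bit set and the prefix still advertised, address_methods(0x80, [0xC0]) yields both SLAAC and DHCPv6. With the prefix suppressed via 'no-advertise', the list of prefix options is empty and only DHCPv6 remains.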

And now it works!

Thursday, December 20, 2012

Multicast IPv6 Anycast RP Design

Like most folks, I had heard of multicast but never configured it. That changed back in 2006 when we deployed music on hold over multicast for our Cisco voice infrastructure and also installed VBrick IP television which required multicast. I did the research and created a PIM sparse mode design based on the Cisco Solution Reference Network Design for campus networks.

We used anycast rendezvous points (RP) peered with Multicast Source Discovery Protocol (MSDP). We used a site local multicast scope and properly filtered on the edge. We peered with the multicast RPs in our headquarters site and passed organization local scope groups. Overall, we had a pretty robust, fault tolerant multicast design.

I did more multicast work at my next client through 2011 including dense-mode and simplified multicast forwarding (SMF), custom code for Internet Group Management Protocol (IGMP) joins for multicast on mobile ad-hoc networks (MANET) and lots of other unique designs.

Of course, this was all on IPv4.

Multicast for IPv6 is a different beast and although I haven't seen a requirement to deploy it for a client yet, I wanted to do some testing to get up to speed before the need manifests. I was fortunate to attend the Cisco IPv6 Fundamentals, Design and Deployment class last week. Although much was review for me given my hands-on IPv6 experience over the last 4 years, the multicast module and lab were very informative.

I understood my IPv4 multicast reference design would not map to IPv6. While IPv6 supports anycast RP, there is no MSDP for IPv6. Some (very basic) background:

In a Protocol Independent Multicast Sparse-Mode (PIM-SM) design, multicast sources send their multicast streams to the Rendezvous Point (RP). Multicast listeners are directed to the RP to make the connection and start receiving traffic. To make the design redundant and fault tolerant, we use anycast RP. Anycast is the practice of configuring the same unicast address on multiple devices. Traffic is routed towards the closest instance of the address based on routing protocol metrics. This of course can result in multicast sources sending traffic to one anycast RP instance and multicast listeners being routed to a different anycast RP instance. No connection would be made in this case.

To overcome this issue, we use Multicast Source Discovery Protocol (MSDP) to peer the anycast RPs so they share information about the multicast sources. This way, no matter which instance of the anycast RP a listener connects to, it will be able to establish the stream from the multicast source and start receiving traffic.

With the above explanation, the absence of MSDP in IPv6 is a problem with an anycast RP design. Forgoing anycast RP and just using a single RP does not provide fault tolerance should the RP go down. What to do?

RFC 4610 addresses this issue and Cisco implements it in IOS 15 and other flavors like XE. However, I learned in class that Cisco had another approach they called "Anycast RP with prefix arbitration".

The concept is simple using standard routing rules of longest match prefix. Simply configure the anycast address on multiple devices with different masks. Instead of equal cost multipath routing in redundant environments, all traffic will be routed to the anycast instance with the longest prefix (anycast RP primary). Should that RP go down, all traffic will be routed to the anycast instance with the second longest prefix (anycast RP secondary), and so on (anycast RP tertiary ...). This is very similar to "floating static routes": static routes with a manually configured admin distance to bring up a BRI interface when the primary frame-relay goes down (remember the 1990s?).

  • Configure primary anycast RP with longest prefix
  • Configure secondary anycast RP with second longest prefix
  • Configure tertiary anycast RP with third longest prefix
  • And so on ...
  • Advertise the anycast network from each device via routing protocol

This eliminates the need for MSDP to peer the anycast RPs since all traffic - both sources and listeners - will be routed to the same anycast RP instance. Or will it?

Consider an instance where multicast sources and / or listeners are directly connected to the devices which host the primary and secondary anycast RPs. Connected routes override any longest prefix match since they are connected. So there is a possibility where listeners won't be able to find sources. And that's what happened to me when I finished the documented multicast lab and decided to test this design.

The lab was only two routers with a client connected off each, so it's pretty obvious why it failed. I decided to test a more real-world scenario. Consider the following:

The relevant configurations:

C1#show run interface Loopback0
interface Loopback0
 description Anycast RP (Primary)
 no ip address
 ipv6 address 2001:DB8:AFE:FE00::1/120
 ipv6 enable
 ipv6 eigrp 1
end

C2#show run interface Loopback0
interface Loopback0
 description Anycast RP (Secondary)
 no ip address
 ipv6 address 2001:DB8:AFE:FE00::1/119
 ipv6 enable
 ipv6 eigrp 1
end

With anycast RP primary configured on C1 (/120 mask) and the anycast RP secondary configured on C2 (/119 mask), we expect all traffic to be routed to C1. Indeed it is from both D1 and D2:

D1#show ipv6 route 2001:db8:afe:fe00::1
Routing entry for 2001:DB8:AFE:FE00::/120
  Known via "eigrp 1", distance 90, metric 156160, type internal
  Route count is 1/1, share count 0
  Routing paths:
    FE80::C804:11FF:FE44:1C, FastEthernet1/0
      Last updated 00:00:12 ago

D2#show ipv6 route 2001:DB8:AFE:FE00::1
Routing entry for 2001:DB8:AFE:FE00::/120
  Known via "eigrp 1", distance 90, metric 156160, type internal
  Route count is 1/1, share count 0
  Routing paths:
    FE80::C804:11FF:FE44:1D, FastEthernet1/0
      Last updated 00:03:05 ago

"FastEthernet1/0" is the telltale that D1 and D2 will send their traffic for the anycast RP to C1 (see diagram above). When we shutdown the Loopback0 interface (anycast RP primary) on C1, traffic fails over:

D1#show ipv6 route 2001:db8:afe:fe00::1
Routing entry for 2001:DB8:AFE:FE00::/119
  Known via "eigrp 1", distance 90, metric 156160, type internal
  Route count is 1/1, share count 0
  Routing paths:
    FE80::C805:11FF:FE44:1C, FastEthernet1/1
      Last updated 00:03:25 ago

D2#show ipv6 route 2001:DB8:AFE:FE00::1
Routing entry for 2001:DB8:AFE:FE00::/119
  Known via "eigrp 1", distance 90, metric 156160, type internal
  Route count is 1/1, share count 0
  Routing paths:
    FE80::C805:11FF:FE44:1D, FastEthernet1/1
      Last updated 00:03:27 ago

So the solution works! But what routes do C1 and C2 see? When operating in steady state (each anycast RP interface on C1 and C2 is up), not all routing points to C1:

C1#show ipv6 route 2001:DB8:AFE:FE00::1
Routing entry for 2001:DB8:AFE:FE00::1/128
  Known via "connected", distance 0, metric 0, type receive
  Route count is 1/1, share count 0
  Routing paths:
    receive via Loopback0
      Last updated 00:00:28 ago

C2#show ipv6 route 2001:DB8:AFE:FE00::1
Routing entry for 2001:DB8:AFE:FE00::1/128
  Known via "connected", distance 0, metric 0, type receive
  Route count is 1/1, share count 0
  Routing paths:
    receive via Loopback0
      Last updated 00:19:13 ago

Notice C1 and C2 each see the anycast RP address as their local Loopback0 interface. In the case of C1, that's correct. In the case of C2, it's correct (according to routing rules), but not desired (according to our anycast RP with prefix arbitration design). There isn't a way to "fix" it as it isn't broken, it just highlights a design constraint on the "Anycast RP with prefix arbitration" design:

  • Never directly connect multicast sources or listeners to a device acting as an anycast RP

This may not be a problem in the design I tested as most people won't connect end stations to the core layer. But consider collapsed core designs or instances where the RPs may be configured on distribution layer switches that connect servers which are multicast sources.
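
The whole design, and its constraint, comes down to longest prefix match. The Python sketch below replays the lookups from the 'show ipv6 route' output above using the standard ipaddress module. The route tables are hand-typed simplifications of what D1 and C2 hold, and best_route() is a made-up helper, not how IOS performs the lookup.

```python
import ipaddress

ANYCAST_RP = "2001:db8:afe:fe00::1"


def best_route(table, dest):
    """Return the next hop of the longest matching prefix, or None (hypothetical helper)."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# What a distribution router (D1 or D2) learns via EIGRP.
d1 = {
    "2001:db8:afe:fe00::/120": "C1 (primary)",
    "2001:db8:afe:fe00::/119": "C2 (secondary)",
}

# What C2 itself holds: the same learned /120 plus its own receive route.
c2 = {
    "2001:db8:afe:fe00::/120": "C1 (primary)",
    "2001:db8:afe:fe00::1/128": "local Loopback0",
}

steady = best_route(d1, ANYCAST_RP)             # primary wins on the /120
d1_failed = {p: nh for p, nh in d1.items() if not p.endswith("/120")}
failover = best_route(d1_failed, ANYCAST_RP)    # secondary takes over on the /119
on_c2 = best_route(c2, ANYCAST_RP)              # the local /128 beats everything
```

The last lookup is the constraint in one line: on C2 the /128 receive route is always the longest match, so anything attached directly to C2 never reaches the primary RP.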

"Anycast RP with prefix arbitration" is a pretty easy, straight forward design, but like anything new, test first and understand the limitations.

Tuesday, December 18, 2012

IPv6 TFTP from IPv4

No, the title doesn't imply a great new translation technology for that indispensable file transfer protocol TFTP. Instead, this is to highlight an "oversight" - I won't go so far as to call it a "bug" - in Cisco IOS.

I'm testing with Advanced IP Services version 12.4(24)T on a 7200 series router - just in case that matters.

For many services on Cisco routers and switches, I've been using the "source interface" command to explicitly tell the device what address to source the updates from. Normally, I point it to a loopback interface. This makes looking at logs pretty easy when DNS resolves the loopback address to the device name.

So for example:

ip flow-export source Loopback0
logging source-interface Loopback0
snmp-server trap-source Loopback0
snmp-server source-interface informs Loopback0

In most cases, we'll even use "update-source LoopbackX" for iBGP neighbors.

This makes looking at a Syslog and SNMP Trap aggregator easy. As long as they resolve addresses to names, I see content like:

Router1  Informational  Local7  Interface FastEthernet0/0 up
SwitchA  Emergency      Local6  Power supply 1 down  

Instead of:

10.254.254.1  Informational  Local7  Interface FastEthernet0/0 up
10.254.254.2  Emergency      Local6  Power supply 1 down  

Now that we've shown why this is good practice, I'll also add that we track nightly TFTP backups of configurations in TFTP logs and the same principle applies. So we use the 'ip tftp source-interface Loopback0' command. Notice however that while the logging and SNMP commands above don't start with 'ip', the TFTP 'source-interface' command does. Big deal? With IPv6 it turns out ... YES, it is.

Granted, the backup routine we tested connected to the devices via IPv4 and requested a TFTP backup via SNMP to the IPv4 address of the TFTP server - so we didn't lose a night's worth of backups and wake up to an error log. The benefits of testing first! However, with IPv6 enabled and an IPv6 address on the Loopback0 interface, IPv6 TFTP should work. And in the test, it didn't.

Here's the relevant configuration:

ip tftp source-interface Loopback0

interface Loopback0
 ip address 10.254.254.1 255.255.255.255
 ipv6 address 2001:DB8:AFE:FE00::1/128
 ipv6 enable

interface FastEthernet2/0
 description To TFTP Server
 ip address 192.168.100.1 255.255.255.0
 ipv6 address 2001:DB8:192:168::1/64
 ipv6 enable

The TFTP server in the test lives at:

192.168.100.254
2001:db8:192:168::254

Again, IPv4 TFTP worked as expected. The Loopback0 address (10.254.254.1) shows in the TFTP logs. But with IPv6, something strange happened:

R1#copy run tftp
Address or name of remote host []? 2001:db8:192:168::254
Destination filename [r1-confg]?
.....
%Error opening tftp://2001:db8:192:168::254/r1-confg (Timed out)
R1#

And the resultant TFTP server log shows:

TFTP# crapps.pl -S tftpd -6
Starting MODE       -> TFTP Server
Listening on        -> [::]:69 (udp)
TFTP Root directory -> .

afe:fe01::  62506  WRQ  OCTET  r1-confg  STARTED
afe:fe01::  62506  WRQ  OCTET  r1-confg  STARTED
afe:fe01::  62506  WRQ  OCTET  r1-confg  File './r1-confg' already exists
afe:fe01::  62506  WRQ  OCTET  r1-confg  Timeout occurred on DATA packet 1
...

Who the heck is "afe:fe01::"? My Loopback0 IPv6 address is "2001:db8:afe:fe00::1". True, but my Loopback0 IPv4 address is "10.254.254.1", which in hex is 0a.fe.fe.01, and those four octets used directly as the leading bits of an IPv6 address give "afe:fe01::". I remember a saying about computers doing exactly what you tell them to. The Cisco router is sourcing the TFTP from the 'ip tftp source-interface Loopback0' - 'ip' as in "IPv4".
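
If you want to double-check that mapping, a few lines of Python with the standard ipaddress module reproduce it. This only illustrates the arithmetic; the function name is made up, and it says nothing about how IOS arrives at the address internally.

```python
import ipaddress


def v4_as_leading_v6(v4):
    """Place the 4 octets of an IPv4 address in the top 32 bits of an IPv6 address."""
    raw = ipaddress.IPv4Address(v4).packed + bytes(12)  # 4 octets + 12 zero octets
    return str(ipaddress.IPv6Address(raw))


mystery = v4_as_leading_v6("10.254.254.1")
```

The value of mystery is the same "afe:fe01::" that showed up in the TFTP server log.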

So is IPv6 TFTP broken? No, you just need to remove the 'source-interface' command:

R1#config term
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#no ip tftp source-interface Loopback0
R1(config)#end
R1#copy run tftp
Address or name of remote host []? 2001:db8:192:168::254
Destination filename [r1-confg]?
!!
8774 bytes copied in 5.540 secs (1584 bytes/sec)
R1#

And confirmed on the TFTP server:

TFTP# crapps.pl -S tftpd -6
Starting MODE       -> TFTP Server
Listening on        -> [::]:69 (udp)
TFTP Root directory -> .

2001:db8:192:168::1  52000  WRQ  OCTET  r1-confg  STARTED
2001:db8:192:168::1  52000  WRQ  OCTET  r1-confg  SUCCESS [8774 bytes]

Much better. Of course now the source is the interface of the router that the TFTP traverses, in this case, FastEthernet2/0. This will also be the same for IPv4 TFTP now.

Nightly TFTP backups are one of those automated tasks we set and forget. Sure there are monitors in place to catch changes and email alerts, but how often does something go wrong? Imagine waking up to an error log and no backups. Not the end of the world, but certainly not something you want to see before your first cup of coffee. Test and test again, especially when incorporating IPv6.

Thursday, December 13, 2012

BGP Redistribution Redo

I've always been wary of redistributing an IGP into BGP rather than using explicit network statements. Sure it automates the process, but if I'm going to be redistributing BGP into the IGP (say for MPLS CPE) - hopefully with a 'route-map' - there is a potential for funky redistribution loops and issues. Better to break the cycle and statically configure BGP with 'network' statements to explicitly advertise only what you want.

Of course, I'll break my own rules in the name of testing or in this case, a lab exercise. I'm in a Cisco IPv6 training class and our BGP routing lab instructed us to:

"Configure IBGP between R1 and R2 using the parameters that are listed in the table."

Parameter                R1                R2
Source                   Loopback 1        Loopback 1
Redistribute into BGP    IPv6 Connected    IPv6 Connected
                         Set origin IGP    Set origin IGP

The original configuration was pretty straightforward. EIGRP was used as the IGP to which we were to add BGP. EIGRP was already advertising the connected routes so the EIGRP admin distance was modified so IBGP would be preferred. I was already cringing, but decided to play along.

The relevant parts of the provided R1 configuration follow. You can assume R2 was identical with the appropriate corresponding addresses for interfaces and peers.

interface Loopback1
 ipv6 address 2001:DB9:121:100::1/64
!
interface Loopback2
 ipv6 address 2001:DB9:121:200::1/64
!
interface Serial0/0/0
 no ip address
 encapsulation frame-relay IETF
 frame-relay lmi-type cisco
!
interface Serial0/0/0.1 point-to-point
 description To R2
 ipv6 address 2001:DB9:123:1::1/64
 ipv6 eigrp 1
 frame-relay interface-dlci 122
!
ipv6 router eigrp 1
 eigrp router-id 10.12.1.1
 no shutdown
 passive-interface Loopback1
 passive-interface Loopback2
 distance 250 255

At this point, the BGP configuration was pretty easy. I added the following:

router bgp 65012
 bgp router-id 10.12.1.1
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 2001:DB9:122:100::1 remote-as 65012
 neighbor 2001:DB9:122:100::1 update-source Loopback1
 !
 address-family ipv6
  neighbor 2001:DB9:122:100::1 activate
  redistribute connected route-map BGPCONN
  no synchronization
 exit-address-family
!
route-map BGPCONN permit 10
 match source-protocol connected
 set origin igp

It's no surprise that BGP came up and all was working. From R1:

W1P2R1#show bgp ipv6 unicast summary
BGP router identifier 10.12.1.1, local AS number 65012

Neighbor    V       AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
2001:DB9:122:100::1
            4   65012    1027    1044     47   0    0 00:02:05 4

And I could see my R1 routes on R2:

W1P2R2#show ipv6 route 2001:db9:121:200::/64
Routing entry for 2001:DB9:121:200::/64
  Known via "bgp 65012", distance 200, metric 0, type internal
  Backup from "eigrp 1 [250]"
  Route count is 1/1, share count 0
  Routing paths:
    2001:DB9:121:100::1
      Last updated 00:00:59 ago

But a subsequent 'show' command revealed an issue:

W1P2R2#show ipv6 route 2001:db9:121:200::/64
Routing entry for 2001:DB9:121:200::/64
  Known via "eigrp 1", distance 250, metric 20640000, type internal
  Route count is 1/1, share count 0
  Routing paths:
    FE80::219:55FF:FE35:1B90, Serial0/0/0.1
      Last updated 00:00:00 ago

The route was no longer learned via BGP, but through EIGRP. Weird. The previous command shows the admin distance was correct; iBGP [200] should be preferred over EIGRP [250 - modified]. What was happening?

I did the "up-arrow, enter" troubleshooting technique; that is, I ran the previous 'show ipv6 route ...' command over and over. Of course it worked. I noticed the "Last updated" timer reset every minute as the route flip-flopped between EIGRP and BGP. I started thinking about the BGP walker process and how it runs every 60 seconds.

Since we were in a lab, I had no issues with running 'debug bgp ipv6 unicast updates' on R2 and verified my instinct was correct.

*Dec 13 15:05:50.176: BGP(1): no valid path for 2001:DB9:121:1::/64
*Dec 13 15:05:50.176: BGP(1): no valid path for 2001:DB9:121:100::/64
*Dec 13 15:05:50.176: BGP(1): no valid path for 2001:DB9:121:200::/64
*Dec 13 15:05:50.176: BGP(1): no valid path for 2001:DB9:121:300::/64
*Dec 13 15:05:50.196: BGP(1): nettable_walker 2001:DB9:121:1::/64 no best path
*Dec 13 15:05:50.196: BGP(1): nettable_walker 2001:DB9:121:100::/64 no best path
*Dec 13 15:05:50.196: BGP(1): nettable_walker 2001:DB9:121:200::/64 no best path
*Dec 13 15:05:50.200: BGP(1): nettable_walker 2001:DB9:121:300::/64 no best path

*Dec 13 15:06:50.220: BGP(1): Revise route installing 2001:DB9:121:1::/64 -> 2001:DB9:121:100::1 (::) to main IPv6 table
*Dec 13 15:06:50.220: BGP(1): Revise route installing 2001:DB9:121:100::/64 -> 2001:DB9:121:100::1 (::) to main IPv6 table
*Dec 13 15:06:50.220: BGP(1): Revise route installing 2001:DB9:121:200::/64 -> 2001:DB9:121:100::1 (::) to main IPv6 table
*Dec 13 15:06:50.220: BGP(1): Revise route installing 2001:DB9:121:300::/64 -> 2001:DB9:121:100::1 (::) to main IPv6 table

BGP on R2 put the routes for the advertised R1 connected interfaces - including the network containing the address for the R1 Loopback1 interface it was peering with - into the routing table. The routes had a next hop of the R1 Loopback1 interface. When these routes entered the routing table, they displaced the EIGRP routes for the same networks. A minute later when BGP walker ran again, there was no IGP (EIGRP) path to the next hop for those routes, so BGP removed them (as seen in the first 8 lines of the 'debug'). The routes were quickly replaced with the EIGRP routes as seen in the above "show ipv6 route ..." commands. Sixty seconds later when BGP walker ran again, the BGP routes were learned and now had a valid next hop as the IGP routes were in the routing table and BGP introduced the same routes - as seen in the last 4 lines of the above 'debug'. Every sixty seconds this repeated and flip-flopped the routes between EIGRP and BGP.
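
The flip-flop is easier to see as a tiny state machine. The Python sketch below is a toy model of my explanation above, not of IOS internals: each loop iteration stands for one pass of the BGP walker, the two admin distances are the ones from this lab, and the function and variable names are invented for the example.

```python
EIGRP_AD = 250  # EIGRP internal distance as modified in the lab
IBGP_AD = 200   # iBGP distance


def walker_cycles(cycles, advertise_peer_net):
    """Return which protocol owns a redistributed R1 route on R2 after each pass.

    advertise_peer_net - True if R1 also redistributes the Loopback1 network
                         (the network holding the BGP peering address).
    """
    peer_net_owner = "eigrp"  # who currently owns the route to the peer's network
    history = []
    for _ in range(cycles):
        # The BGP next hop is only valid while the IGP provides the path to it.
        next_hop_ok = peer_net_owner == "eigrp"
        bgp_usable = next_hop_ok and IBGP_AD < EIGRP_AD
        history.append("bgp" if bgp_usable else "eigrp")
        # If the peer's own network is advertised too, BGP displaces the IGP
        # route that its next hop depends on, which breaks the following pass.
        peer_net_owner = "bgp" if (bgp_usable and advertise_peer_net) else "eigrp"
    return history


flapping = walker_cycles(4, advertise_peer_net=True)
stable = walker_cycles(4, advertise_peer_net=False)
```

In this model, advertising the peering network makes the route alternate between BGP and EIGRP on every pass, while leaving it out keeps the route in BGP.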

Knowing the problem, the fix was easy. Just use the existing 'route-map' to block the Loopback1 network from being advertised as a connected route. On R1, I used:

route-map BGPCONN deny 5
 match ipv6 address PEER
!
ipv6 access-list PEER
 permit ipv6 2001:DB9:121:100::/64 any

The complementary statements on R2 and a 'clear bgp ipv6 unicast *' had everything working perfectly!

The lesson learned just reinforced my original statement regarding redistribution - proceed with caution.

 

Copyright © VinsWorld. All Rights Reserved.