This is just where I keep my network notes. Maybe you keep yours in a notebook, or a text file. Mine are just on this public webpage.

Some of this is copied from other sites (I try and cite those where possible) and other stuff is from lab reproductions.

My test setup is a CML cluster.

Contact Me

Email: ariadne@haske.org

This document is built on lab work and Interconnections; see [1].

Terms

Bridge: A device that participates in the spanning tree algorithm.
Root Bridge: The bridge that wins the STP election.
Bridge ID: Three fields, next to each other. Bridge Priority, Extension ID (the VLAN), MAC Address
BPDU: Bridge Protocol Data Unit. The frame used in 802.1D STP.
STP: Spanning Tree Protocol. Frequently cited as 802.1D.
802.1D: An IEEE standard. The oldest Ethernet STP.
Root ID: The bridge ID of the bridge that has won, and keeps winning, the elections.
Designated ports: AKA DP. Sends BPDUs downstream.
Root Port: AKA RP, AKA upstream. Receives BPDUs from the upstream switch. Each bridge can have only one RP. The RP is picked by the port selection algo.
TCN: Topology Change Notification. Sent upstream, via its RP, by the bridge that sees an STP change. This is its own message type.
TCA Bit: Topology Change Acknowledge. Sent by the upstream bridge to let the TC-reporting bridge know it relayed the TCN upstream. This is inside a config BPDU.
TC Bit: Topology Change. The root bridge sets the TC bit to tell other bridges to shorten their MAC address table aging time to the forward delay. This is inside a config BPDU.

How STP makes a loop free topology.

STP elects root and designated ports, aka RP, and DPs. It also moves STP ports into Blocking.

A bridge can only have one RP.
All ports on the root are DPs.
Ports on the root bridge never enter blocking.
Blocked ports must keep receiving BPDUs to stay blocked (the election must continue, forever)
If two would-be DPs send and receive BPDUs from each other:
- There is a loop.
- The port that has the inferior BPDU will block.

All bridges start by sending BPDUs on all STP ports, listing themselves as root.
STP ports (bridges) compare BPDUs.
The bridge with the lowest Bridge ID is root (lowest priority; if priority is default, lowest MAC, usually the oldest switch).
All ports on root bridge are DP, and BPDU cost field is set to zero.
Root sends BPDUs.
DPs send configuration BPDUs.
RPs receive configuration BPDUs.
Root bridge sends BPDU, cost is 0, with port identifiers set.
A non-root bridge can only have one RP.
Non-root bridge gets BPDUs. It uses the port selection Algo to pick one RP.
Non-root bridge starts STP elections on all other ports, by sending BPDUs. It takes the cost inside the received BPDU, and adds its port cost.
If a DP gets a BPDU, STP blocks the port if the received BPDU is better.

Port Selection Algo

All choices are made based on the received BPDU.
Modifications are made on the upstream switch.

Lowest cost to root.
Lowest system priority of advertising switch.
Lowest MAC of advertising switch.
Port Identifier Byte of advertising switch (port priority + port number)
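
The tie-breakers above can be sketched as a tuple comparison in Python. This is a rough illustration, not any vendor's implementation. The `Bpdu` type and the `pick_root_port` helper are hypothetical names, the local port names are made up, and the sketch assumes every local port has the same cost, so the cost inside the received BPDU stands in for the cost to root.

```python
from typing import NamedTuple

class Bpdu(NamedTuple):
    """Fields of a received BPDU, in tie-breaker order (hypothetical type)."""
    root_path_cost: int    # 1. lowest cost to root
    sender_priority: int   # 2. lowest priority of the advertising switch
    sender_mac: str        # 3. lowest MAC of the advertising switch
    sender_port_id: int    # 4. lowest port identifier (port priority + number)

def pick_root_port(received: dict[str, Bpdu]) -> str:
    """Return the local port whose received BPDU compares lowest."""
    # Tuples compare field by field, which mirrors the ordered tie-breakers.
    # MACs are compared as strings, assuming one consistent lowercase format.
    return min(received, key=lambda port: received[port])

received = {
    "Eth0/1": Bpdu(0, 32768, "52:54:00:4b:99:08", 0x8001),
    "Eth0/2": Bpdu(0, 32768, "52:54:00:4b:99:08", 0x8002),
    "Eth0/3": Bpdu(0, 32768, "52:54:00:4b:99:08", 0x0003),
}
print(pick_root_port(received))
```

Only the last field differs here, so the port that hears the lowest sender port identifier becomes the RP.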

Spanning Tree Protocol
    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Spanning Tree (0)
    BPDU Type: Configuration (0x00)
    BPDU flags: 0x01, Topology Change
    Root Identifier: 32768 / 1 / 52:54:00:10:43:6f
    Root Path Cost: 0
    Bridge Identifier: 32768 / 1 / 52:54:00:10:43:6f
    Port identifier: 0x0002     < ------------------------- first byte is "port priority" the default on Cisco is 128, or 0x80
    Message Age: 0
    Max Age: 20
    Hello Time: 2
    Forward Delay: 15

Timers

Hello Time is usually 2 seconds between BPDUs.
Forward Delay is typically 15 seconds. It is the time a port spends in each of listening and learning (listening -> learning -> forwarding).

Device Priority.

4 bits, so 16 values, going in an arithmetic sequence from 0 to 61440 in steps of 4096.

switch(config)# spanning-tree vlan 60 priority ?
% Bridge Priority must be in increments of 4096.
% Allowed values are: 
  0     4096  8192  12288 16384 20480 24576 28672
  32768 36864 40960 45056 49152 53248 57344 61440
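
The allowed values fall out of the 4-bit field. Here is a small Python sketch that generates them and splits a 16-bit priority value into the configured priority and the 12-bit Extension ID (the VLAN). The `split_priority` helper is a hypothetical name used only for illustration.

```python
# 4 bits of priority means 16 values, each a multiple of 4096.
STEP = 4096
allowed = [n * STEP for n in range(16)]
print(allowed)

def split_priority(wire_value: int) -> tuple[int, int]:
    """Split the 16-bit field into (configured priority, extension ID)."""
    # Top 4 bits are the priority, bottom 12 bits are the VLAN.
    return wire_value & 0xF000, wire_value & 0x0FFF

print(split_priority(32769))  # default priority on VLAN 1
print(split_priority(8212))   # priority 8192 on VLAN 20
```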

Root bridges election in Spanning Tree.

Two bridges send each other BPDUs, they compare bridge IDs to see who will keep sending BPDUs

The bridge with the lower ID (priority + MAC address) wins. The non-root bridge copies this bridge ID into its BPDU, and sends that downstream.

The default priority is 32768, or 0x8000 on the wire. Because the 802.1D committee exists, the advertised priority is this plus the VLAN ID.

Always configure a root bridge. Otherwise the oldest device, which probably has the lowest MAC address, wins the root bridge election.

Path Cost

The root bridge BPDU gets stuff tacked onto it on the way down. The root bridge advertises itself as 0 cost.

Cost is the value of the link, towards the root bridge.

 ┌───────┐                                                                    
 │  SW1  │                                                                    
 └───┬───┘                                                                    
     │                                                                        
     │                                                                        
     │  Cost in BPDU from SW1 is 0                                                     
     │                                                                        
Eth0 │ ◄──── Interface is Assigned a cost of 100 by SW2 based on link Speed
 ┌───┴───┐                                                                    
 │  SW2  │                                                                    
 └───┬───┘                                                                    
Eth1 │                                                                        
     │                                                                        
     │   Cost in BPDU on-the-wire is now 100, SW2 Eth0 Cost                   
     │                                                                        
Eth0 │                                                                        
 ┌───┴───┐                                                                    
 │  SW3  │
 └───────┘

Portfast

For end Hosts

Does not protect against BPDUs

Loop Prevention

Best practice is to set the root to 0 and the secondary to 4096.

A unidirectional failure on a root or alternate port can cause a spanning tree loop, because switches that stop hearing BPDUs will unblock ports while the unidirectional link still forwards frames. To prevent this, turn on STP loop guard. If a port that should be receiving BPDUs stops getting them, it enters the STP loop-inconsistent state, which blocks the port.

This is done per interface, and is pretty tedious.

switch(config)# interface Ethernet 1/1
switch(config-if)# spanning-tree guard loop


Port Types

Designated ports: send BPDUs downstream.
Root Ports are the best port towards the root bridge: the lowest total cost, then the lowest advertised bridge priority and MAC, then the lowest advertised port ID (port priority + port number).

Root Path Cost

Root Path Cost: the interface's cost plus the advertised cost to the root. The root sends a cost of 0.
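
As a quick worked example in Python, assuming a chain of switches like the one in the Path Cost diagram. The `root_path_cost` helper is a hypothetical name, and the second hop's cost of 19 is an assumed 100 Mbps short-mode port, not something taken from the lab.

```python
def root_path_cost(advertised_cost: int, interface_cost: int) -> int:
    """Cost a bridge advertises after receiving a BPDU on its root port."""
    return advertised_cost + interface_cost

# The root advertises 0. SW2 receives it on a port with cost 100.
sw2_cost = root_path_cost(0, 100)

# Assumed next hop: a switch below SW2 with a port cost of 19.
next_hop_cost = root_path_cost(sw2_cost, 19)

print(sw2_cost, next_hop_cost)
```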

STP Path Calculations

spanning-tree pathcost method long

Speed	Short-Mode Cost	Long-Mode Cost
10 Mbps	100	2000000
100 Mbps	19	200000
1 Gbps	4	20000
10 Gbps	2	2000
20 Gbps	1	1000
40 Gbps	1	500
100 Gbps	1	200
1 Tbps	1	20
10 Tbps	1	2
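
The long-mode column in the table works out to 20 Tbps divided by the link speed. The Python sketch below reproduces that column. It is an observation about the numbers above, not a claim about how any switch computes them internally, and `long_mode_cost` is a hypothetical helper name. Short mode is a lookup table, so it is not modelled.

```python
LONG_MODE_REFERENCE_BPS = 20_000_000_000_000  # 20 Tbps

def long_mode_cost(link_bps: int) -> int:
    """Long-mode path cost for a link speed given in bits per second."""
    # Floor at 1 so a link faster than the reference never costs 0.
    return max(1, LONG_MODE_REFERENCE_BPS // link_bps)

speeds = {
    "10 Mbps": 10**7,
    "1 Gbps": 10**9,
    "40 Gbps": 4 * 10**10,
    "10 Tbps": 10**13,
}
for name, bps in speeds.items():
    print(name, long_mode_cost(bps))
```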

802.1D - Spanning Tree

The 802.1D committee wanted two intermediate states¹, one without and one with learning station addresses (listening and learning). This is why it's more complicated.

Interconnections - Radia Perlman, page 67.

┌─────────────┐
│     off     │
└──────┬──────┘
       │
       │  Turn on interface
       │
┌──────▼──────┐
│  Listening  │ Receive + Send BPDUs
└──────┬──────┘
       │
       │  forward delay (default 15s)
       │
┌──────▼──────┐
│  Learning   │ Receive + Send BPDUs + Program CAM
└──────┬──────┘
       │
       │  forward delay (default 15s)
       │
┌──────▼──────┐
│  Forwarding │ Receive + Send BPDUs + Program CAM + Forward Frames
└─────────────┘

BPDU Frame Format

This is a RSTP BPDU.

Spanning Tree Protocol

    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Rapid Spanning Tree (2)
    BPDU Type: Rapid/Multiple Spanning Tree (0x02)
    BPDU flags: 0x3c, Forwarding, Learning, Port Role: Designated
    
    0... .... = Topology Change Acknowledgment: No
    .0.. .... = Agreement: No
    ..1. .... = Forwarding: Yes
    ...1 .... = Learning: Yes
    .... 11.. = Port Role: Designated (3)
    .... ..0. = Proposal: No
    .... ...0 = Topology Change: No
    
    Root Identifier: 32768 / 1 / aa:bb:cc:00:07:00
    Root Path Cost: 100
    
    Bridge Identifier: 32768 / 1 / aa:bb:cc:00:0a:00
    Port identifier: 0x8003
    Message Age: 1
    Max Age: 20
    Hello Time: 2
    Forward Delay: 15
    Version 1 Length: 0
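
The flags byte in the capture can be decoded by hand with a few bit masks. A small Python sketch, using the bit layout shown in the capture above. `decode_rstp_flags` is a hypothetical helper name, and the role names for values 0 to 2 are filled in from the RSTP port role encoding rather than from this capture.

```python
PORT_ROLES = {0: "Unknown", 1: "Alternate/Backup", 2: "Root", 3: "Designated"}

def decode_rstp_flags(flags: int) -> dict[str, object]:
    """Decode the one-byte RSTP BPDU flags field."""
    return {
        "tc_ack":     bool(flags & 0x80),   # 1... ....
        "agreement":  bool(flags & 0x40),   # .1.. ....
        "forwarding": bool(flags & 0x20),   # ..1. ....
        "learning":   bool(flags & 0x10),   # ...1 ....
        "port_role":  PORT_ROLES[(flags >> 2) & 0x03],  # .... 11..
        "proposal":   bool(flags & 0x02),   # .... ..1.
        "tc":         bool(flags & 0x01),   # .... ...1
    }

print(decode_rstp_flags(0x3C))
```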

This is what the BPDU looks like on-the-wire

┌───────────────────────────────┬───────────────┬───────────────┐
│                               │               │               │
│          Protocol ID          │    Version    │   BPDU Type   │
│                               │               │               │
│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8│
└───────────────────────────────┴───────────────┴───────────────┘
             2 bytes                  1 byte         1 byte

┌───────────────┬───────────────────────────────────────────────►
│               │
│     Flag      │                    Root ID
│               │
│1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
└───────────────┴───────────────────────────────────────────────►
    1 byte                            8 bytes

◄───────────────────────────────────────────────────────────────►

                           Root ID

 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
◄───────────────────────────────────────────────────────────────►
                          8 bytes

◄───────────────┬───────────────────────────────────────────────►
                │
    Root ID     │              Root Path Cost
                │
 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
◄───────────────┴───────────────────────────────────────────────►
    8 bytes                       4 bytes

◄───────────────┬───────────────────────────────────────────────►
 Root Path Cost │
                │                Bridge ID
                │
 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
◄───────────────┴───────────────────────────────────────────────►
  4 bytes                         8 bytes

◄───────────────────────────────────────────────────────────────►

                           Bridge ID

 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
◄───────────────────────────────────────────────────────────────►
                          8 bytes

◄───────────────┬───────────────────────────────┬───────────────►
                │                               │ Message age
   Bridge ID    │           Port ID             │  (in 1/256s of a second)
                │                               │
 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8
◄───────────────┴───────────────────────────────┴───────────────►
    8 bytes                2 Bytes                   2 Bytes

◄───────────────┬───────────────────────────────┬───────────────►
                │           Max Age             │ Hello Time
   Message Age  │        (in 1/256ths)          │  (in 1/256ths of a second)
                │                               │
 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8
◄───────────────┴───────────────────────────────┴───────────────►
    2 Bytes                 2 Bytes                   2 Bytes

◄───────────────┬───────────────────────────────┬───────────────┐
                │  Forward Delay                │   Version 1   │
   Hello Time   │    (in 1/256ths of a second)  │    Length     │
                │                               │               │
 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8│
◄───────────────┴───────────────────────────────┴───────────────┘
    2 Bytes                 2 Bytes                   1 Byte

┌───────────────────────────────┐
│                               │
│      Version 3 Length         │
│                               │
│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│
└───────────────────────────────┘
           2 Bytes
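
The layout above maps onto Python's `struct` module fairly directly. This sketch builds a BPDU body from the values in the RSTP capture and parses it back. It assumes the bytes start at the Protocol ID (the Ethernet and LLC headers are already stripped), it stops at Version 1 Length, and `parse_bpdu` and `parse_bridge_id` are hypothetical helper names.

```python
import struct

# Protocol ID, version, type, flags, root ID, root path cost,
# bridge ID, port ID, message age, max age, hello time, forward delay.
BPDU_FORMAT = ">HBBB8sI8sHHHHH"   # 35 bytes, big-endian

def parse_bridge_id(raw: bytes) -> tuple[int, int, str]:
    """Split an 8-byte ID into (priority, extension ID, MAC)."""
    prio_ext = int.from_bytes(raw[:2], "big")
    mac = ":".join(f"{b:02x}" for b in raw[2:])
    return prio_ext & 0xF000, prio_ext & 0x0FFF, mac

def parse_bpdu(body: bytes) -> dict:
    (_proto, version, bpdu_type, flags, root, cost, bridge,
     port_id, msg_age, max_age, hello, fwd) = struct.unpack_from(BPDU_FORMAT, body)
    return {
        "version": version, "type": bpdu_type, "flags": flags,
        "root_id": parse_bridge_id(root),
        "root_path_cost": cost,
        "bridge_id": parse_bridge_id(bridge),
        "port_id": port_id,
        # Timers are carried in units of 1/256 of a second.
        "message_age": msg_age / 256, "max_age": max_age / 256,
        "hello_time": hello / 256, "forward_delay": fwd / 256,
    }

sample = struct.pack(
    BPDU_FORMAT, 0x0000, 2, 0x02, 0x3C,
    bytes.fromhex("8001aabbcc000700"), 100,
    bytes.fromhex("8001aabbcc000a00"), 0x8003,
    1 * 256, 20 * 256, 2 * 256, 15 * 256,
) + b"\x00"   # Version 1 Length

print(parse_bpdu(sample))
```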

Port elections

Bridge Priority, Vlan, Bridge MAC, Port Priority, Port Number

Default settings

Who is the root?

Both bridges temporarily send BPDUs, each listing itself as root.

+--------+                                                                                       +-------+                                                                 
|        |                                                                                       |       |                                                                 
|      1 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8001 ------- 32768 / 1 / 52:54:00:e8:3a:ff / 8001 --+ 1     |                                                                 
|  SW1 2 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8002 ------- 32768 / 1 / 52:54:00:e8:3a:ff / 8002 --+ 2 SW2 |                                                                
|      3 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8003 ------- 32768 / 1 / 52:54:00:e8:3a:ff / 8003 --+ 3     |                                                                 
|        |                                                                                       |       |                                                                 
+--------+                                                                                       +-------+

SW1 wins with 4b. SW1 has the lower MAC address.

32768 / 1 / 52:54:00:4b:99:08 / 8001 < 32768 / 1 / 52:54:00:e8:3a:ff

Setting Bridge priority to zero

Who is the root?

Both bridges temporarily send BPDUs, each listing itself as root.

+--------+                                                                                       +-------+                                                                 
|        |                                                                                       |       |                                                                 
|      1 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8001 ----------- 0 / 1 / 52:54:00:e8:3a:ff / 8001 --+ 1     |                                                                 
|  SW1 2 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8002 ----------- 0 / 1 / 52:54:00:e8:3a:ff / 8002 --+ 2 SW2 |                                                                
|      3 +-- 32768 / 1 / 52:54:00:4b:99:08 / 8003 ----------- 0 / 1 / 52:54:00:e8:3a:ff / 8003 --+ 3     |                                                                 
|        |                                                                                       |       |                                                                 
+--------+                                                                                       +-------+

SW2 wins with 0. SW2 has the lower bridge priority.

32768 / 1 / 52:54:00:4b:99:08 / 8001 > 0 / 1 / 52:54:00:e8:3a:ff
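
Both root elections above can be reproduced with a tuple comparison in Python. This is a sketch of the comparison only, using the bridge IDs from the two diagrams. `bridge_id` and `elect_root` are hypothetical helper names.

```python
def bridge_id(priority: int, vlan: int, mac: str) -> tuple[int, bytes]:
    """Build a comparable bridge ID: (priority + VLAN, MAC as raw bytes)."""
    return priority + vlan, bytes.fromhex(mac.replace(":", ""))

def elect_root(bridges: dict[str, tuple[int, bytes]]) -> str:
    """The bridge with the lowest bridge ID wins."""
    return min(bridges, key=bridges.get)

# Default settings: priorities tie, so the lower MAC wins.
default = {
    "SW1": bridge_id(32768, 1, "52:54:00:4b:99:08"),
    "SW2": bridge_id(32768, 1, "52:54:00:e8:3a:ff"),
}
# SW2 priority set to zero: priority is compared before the MAC.
tuned = dict(default, SW2=bridge_id(0, 1, "52:54:00:e8:3a:ff"))

print(elect_root(default), elect_root(tuned))
```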

Port Blocking, Port Default

Which ports block?

+-----------+                                                                                       +---------------+                                                                  
|           |                                                                                       |               |                                                                  
|      DP 1 |-- 32768 / 1 / 52:54:00:4b:99:08 / 8001 -----------------------------------------------| 1 RP          |                                                                  
|  SW1 DP 2 |-- 32768 / 1 / 52:54:00:4b:99:08 / 8002 -----------------------------------------------| 2 BLK  SW2    |                                                                  
|      DP 3 |-- 32768 / 1 / 52:54:00:4b:99:08 / 8003 -----------------------------------------------| 3 BLK         |                                                                  
|           |                                                                                       |               |                                                                  
+-----------+                                                                                       +---------------+

All ports on root bridge are DP.
SW2 gets three BPDUs, the best BPDU is on port 1, it has the lowest port number.
SW2 sets the other two ports to BLK.

Port Blocking, Port Priority

Which ports block?

+-----------+                                                                                       +---------------+                                                                  
|           |                                                                                       |               |                                                                  
|      DP 1 |-- 32768 / 1 / 52:54:00:4b:99:08 / 8001 -----------------------------------------------| 1 BLK         |                                                                  
|  SW1 DP 2 |-- 32768 / 1 / 52:54:00:4b:99:08 / 8002 -----------------------------------------------| 2 BLK  SW2    |                                                                  
|      DP 3 |-- 32768 / 1 / 52:54:00:4b:99:08 / 0003 -----------------------------------------------| 3 RP          |                                                                  
|           |                                                                                       |               |                                                                  
+-----------+                                                                                       +---------------+

All ports on root bridge are DP.
SW2 gets three BPDUs, the best BPDU is on port 3, it has the lowest port priority (0x00).
SW2 sets the other two ports to BLK.

Topology Change Notifications (TCNs)

A TCN is a kind of BPDU message.
There is no root ID or bridge ID.
The TCN is sent out the RP.

Spanning Tree Protocol
    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Spanning Tree (0)
    BPDU Type: Topology Change Notification (0x80)

Bridge sees change in STP topology, sends TCN to upstream bridge.
Upstream sees TCN, sends a regular BPDU back with the TCA bit set.
Upstream bridge sends TCN upstream, this continues until TCN reaches the root.
Root Bridge sees the TCN, marks BPDUs with TC bit set.
All bridges see TC, and set their MAC address table aging time to the forward delay (15 seconds by default).
Root bridge stops sending TCs after max age + forward delay (35 seconds by default).

The default for Cisco is keeping a mac-address in CAM for 300 seconds (5 minutes)

Receiving a BPDU with the TC bit set shortens this aging time to the forward delay, usually 15 seconds. This means traffic destined for any server that is not actively sending will be flooded onto that VLAN.
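
To make the flooding effect concrete, here is a toy Python model of a CAM entry aging out. It only models the timer comparison, and the quiet server that last sent a frame 60 seconds ago is a made-up example. `still_in_cam` is a hypothetical helper name.

```python
DEFAULT_AGING = 300   # seconds, the Cisco default noted above
FORWARD_DELAY = 15    # seconds, used as the aging time during a topology change

def still_in_cam(seconds_since_last_frame: float, aging_time: float) -> bool:
    """True if the MAC entry has not aged out yet."""
    return seconds_since_last_frame < aging_time

quiet_for = 60  # assumed: a server that last sent a frame 60 seconds ago

print(still_in_cam(quiet_for, DEFAULT_AGING))   # entry kept, unicast is switched
print(still_in_cam(quiet_for, FORWARD_DELAY))   # entry gone, unicast is flooded
```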

switch# show mac address-table aging-time 
Global Aging Time:  300

Finding TCNs

switch# show spanning-tree vlan 20 detail | s Spanning
 VLAN0020 is executing the rstp compatible Spanning Tree protocol
  Bridge Identifier has priority 32768, sysid 20, address aabb.cc00.0100
  Configured hello time 2, max age 20, forward delay 15, transmit hold-count 6
  Current root has priority 8212, address aabb.cc00.0200
  Root port is 7 (Ethernet1/2), cost of root path is 200
  Topology change flag not set, detected flag not set
  Number of topology changes 8 last change occurred 01:07:20 ago   < ----
          from Ethernet1/2                                         < ----
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15 
  Timers: hello 0, topology change 0, notification 0, aging 300
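
If the output is collected from many switches, the counter line can be pulled out with a regular expression. A rough Python sketch, assuming the line format matches the capture above. `topology_changes` is a hypothetical helper name.

```python
import re
from typing import Optional

TC_PATTERN = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago"
)

def topology_changes(output: str) -> Optional[tuple[int, str]]:
    """Return (change count, time since last change), or None if absent."""
    match = TC_PATTERN.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

sample = """\
  Topology change flag not set, detected flag not set
  Number of topology changes 8 last change occurred 01:07:20 ago
          from Ethernet1/2
"""
print(topology_changes(sample))
```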

On the device

switch# show spanning-tree vlan 20 detail | i VLAN|transitions 
 VLAN0020 is executing the rstp compatible Spanning Tree protocol
 Port 2 (Ethernet0/1) of VLAN0020 is designated forwarding 
   Number of transitions to forwarding state: 2
 Port 4 (Ethernet0/3) of VLAN0020 is alternate blocking 
   Number of transitions to forwarding state: 1
 Port 7 (Ethernet1/2) of VLAN0020 is root forwarding 
   Number of transitions to forwarding state: 2
 Port 8 (Ethernet1/3) of VLAN0020 is alternate blocking 
   Number of transitions to forwarding state: 0
 Port 12 (Ethernet2/3) of VLAN0020 is designated forwarding 
   Number of transitions to forwarding state: 2

In the logs

switch# show logging | i %LINK
*Jul  8 04:22:24.660: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 04:22:24.702: %LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
*Jul  8 04:22:24.715: %LINK-3-UPDOWN: Interface Ethernet0/2, changed state to up
*Jul  8 04:22:24.740: %LINK-3-UPDOWN: Interface Ethernet0/3, changed state to up
*Jul  8 04:22:24.769: %LINK-3-UPDOWN: Interface Ethernet1/0, changed state to up
*Jul  8 04:22:24.794: %LINK-3-UPDOWN: Interface Ethernet1/1, changed state to up
*Jul  8 04:22:24.819: %LINK-3-UPDOWN: Interface Ethernet1/2, changed state to up
*Jul  8 04:22:24.858: %LINK-3-UPDOWN: Interface Ethernet1/3, changed state to up
*Jul  8 04:22:24.888: %LINK-3-UPDOWN: Interface Ethernet2/0, changed state to up
*Jul  8 04:22:24.903: %LINK-3-UPDOWN: Interface Ethernet2/1, changed state to up
*Jul  8 04:22:24.927: %LINK-3-UPDOWN: Interface Ethernet2/2, changed state to up
*Jul  8 04:22:24.942: %LINK-3-UPDOWN: Interface Ethernet2/3, changed state to up
*Jul  8 04:22:24.965: %LINK-3-UPDOWN: Interface Ethernet3/0, changed state to up
*Jul  8 04:22:24.989: %LINK-3-UPDOWN: Interface Ethernet3/1, changed state to up
*Jul  8 04:22:25.013: %LINK-3-UPDOWN: Interface Ethernet3/2, changed state to up
*Jul  8 04:22:25.033: %LINK-3-UPDOWN: Interface Ethernet3/3, changed state to up
*Jul  8 04:22:26.685: %LINK-5-CHANGED: Interface Vlan1, changed state to administratively down
*Jul  8 04:24:58.575: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 04:25:06.138: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 04:26:59.260: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 04:27:11.982: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 04:28:43.205: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 04:31:09.988: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 04:33:53.881: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 04:34:02.140: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 05:00:52.111: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 05:00:59.749: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 05:03:48.728: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 05:03:54.050: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 05:07:04.113: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 05:07:06.713: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 05:07:31.603: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 05:07:36.280: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 05:11:32.247: %LINK-3-UPDOWN: Interface Vlan10, changed state to up
*Jul  8 06:35:29.308: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 06:35:43.756: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
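
A flapping interface stands out quickly if the link events are counted per interface. A small Python sketch, assuming log lines in the same format as above. `count_up_events` is a hypothetical helper name, and the sample is the first few lines of the log trimmed down.

```python
import re
from collections import Counter

LINK_EVENT = re.compile(
    r"%LINK-\d-(?:UPDOWN|CHANGED): Interface (\S+), changed state to (.+)"
)

def count_up_events(log_text: str) -> Counter:
    """Count how many times each interface changed state to up."""
    ups = Counter()
    for line in log_text.splitlines():
        match = LINK_EVENT.search(line)
        if match and match.group(2).strip() == "up":
            ups[match.group(1)] += 1
    return ups

sample = """\
*Jul  8 04:22:24.660: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Jul  8 04:22:24.702: %LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
*Jul  8 04:24:58.575: %LINK-5-CHANGED: Interface Ethernet0/0, changed state to administratively down
*Jul  8 04:25:06.138: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
"""
print(count_up_events(sample).most_common(1))
```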

References

[1] R. Perlman, Interconnections: Bridges, Routers, Switches, and Internetworking Protocols, 2nd ed. Boston, MA: Addison-Wesley, 1999.

[2] Layer 2 Configuration Guide, Cisco IOS-XE 17.16.X

802.1Q Frame Format

32 bits added to an Ethernet frame to multiplex VLANs.

                                   ┌────── Priority Code Point(PCP)
                                   │         Used for LAN CoS
                                   │
                                   │   ┌── Drop Eligible Indicator (DEI)
                                   │   │
                                   ▼   ▼
┌───────────────────────────────┬─────┬─┬───────────────────────┐
│    Tag Protocol Identifier    │     │ │                       │
│     (TPID) Set to 0x8100      │ PCP │ │       VLAN ID         │
│                               │     │ │                       │
│1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8│1 2 3│4│5 6 7 8 1 2 3 4 5 6 7 8│
└───────────────────────────────┴─────┴─┴───────────────────────┘
            16 bits                3   1        12 bits

VLAN ID	Purpose
0	reserved for 802.1P
1	default vlan
2-1001	normal network operations
1002-1005	reserved
1006-4094	extended vlan range
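
The 32-bit tag can be packed and unpacked with Python's `struct` module. A short sketch following the field widths in the diagram above. `pack_tag` and `unpack_tag` are hypothetical helper names, and the PCP value of 5 is just an example.

```python
import struct

TPID_8021Q = 0x8100

def pack_tag(pcp: int, dei: int, vlan_id: int) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID, then PCP (3) | DEI (1) | VLAN ID (12)."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vlan_id <= 4095):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack(">HH", TPID_8021Q, tci)

def unpack_tag(tag: bytes) -> tuple[int, int, int]:
    """Return (pcp, dei, vlan_id) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack(">HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF

tag = pack_tag(pcp=5, dei=0, vlan_id=20)
print(tag.hex(), unpack_tag(tag))
```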

BPDU Guard

Detects a BPDU, and err-disables the port.

Only works if the attached device sends a BPDU. It cannot prevent a switch from being attached to a port; 802.1X helps with that.

The global command only affects ports that have portfast already turned on.

switch(config)# spanning-tree portfast bpduguard default

... should be set so access ports go errdisable when a rogue switch is connected and require an operator to correct.

Seeing `err-disabled` status

switch# show int status

Port      Name               Status       Vlan       Duplex  Speed Type 
[output omitted]
Et2/3                        err-disabled 1            auto   auto unknown
Et3/0                        connected    trunk        auto   auto unknown
Et3/1                        connected    1            auto   auto unknown

Turning on automated recovery

switch(config)# errdisable recovery cause bpduguard

Verify

switch# show errdisable recovery 
ErrDisable Reason            Timer Status
-----------------            --------------
arp-inspection               Disabled
bpduguard                    Enabled

[output omitted]
          
unicast-flood                Disabled
vmps                         Disabled
psp                          Disabled
dual-active-recovery         Disabled
evc-lite input mapping fa    Disabled
Recovery command: "clear     Disabled

Timer interval: 300 seconds

Interfaces that will be enabled at the next timeout:

Interface       Errdisable reason       Time left(sec)
---------       -----------------       --------------
Et2/3                  bpduguard          296

SPAN

Local

monitor session 1 source interface GigabitEthernet1/0/1 both
monitor session 1 destination interface GigabitEthernet1/0/2

RSPAN

VLAN Encapsulated.
Does not support layer 2 protocols. (CDP, BPDUs)
If the source is a trunk port, you can use the filter keyword to select specific vlans.

Source Switch

vlan 3000
 remote-span
monitor session 1 source interface GigabitEthernet1/0/1 both
monitor session 1 destination remote vlan 3000

Destination switch

vlan 3000
 remote-span
monitor session 1 source remote vlan 3000
monitor session 1 destination interface GigabitEthernet1/0/2

ERSPAN

GRE Encapsulated.

These will encapsulate BPDUs and other Layer 2 protocols.

These need ip routing turned on.

These do not support QoS.

Source switch

monitor session 1 type erspan-source
 !
 ! Could also put a vlan here
 !
 source interface Gi2
 destination
  erspan-id 100
  ip address 10.0.12.2
  origin ip address 10.0.12.1
 no shutdown

Destination switch

monitor session 1 type erspan-destination
 destination interface Gi2
 source
  erspan-id 100
  !
  ! An outside address on this box, not a loopback.
  ! this is the de-encapsulation interface.
  !
  ip address 10.0.12.2
 no shutdown

References

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst9300/software/release/17-12/configuration_guide/nmgmt/b_1712_nmgmt_9300_cg/configuring_span_and_rspan.html

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst9300/software/release/17-12/configuration_guide/nmgmt/b_1712_nmgmt_9300_cg/configuring_erspan.html

https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/traffic-mirroring/b-traffic-mirroring-configuration-guide-cisco8k/erspan-overview/restrictions-for-erspan.html

OSPF is IP protocol 89.

Terms

IFF: If and only if
LSA: Link State Advertisement
LSDB: Link-state Database
OSPF Process ID: Just where the databases live. Not transmitted. Allows multiple OSPF processes.
DR: Designated Router. The network vertex for a broadcast or NBMA network. Used to simplify the number of FULL adjacencies.
Advertising Router: The router that created the LSA. The value in this field is the RID.
RID: Router ID. A unique 32-bit number to identify the router in a graph. Doesn't have to be an IP on the box, but is usually a loopback.
The Update Rule: A router can only modify an LSA, iff its RID is inside the "Advertising Router" field.
LS Sequence: Higher sequence numbers are newer LSAs. The first sequence number in any LSA is 0x80000001.
LS Checksum: Used to ensure the LSA was transmitted without corruption. Everything is checked except LS Age.
LS Age: LSAs time out in an hour (3600 seconds), and are refreshed every 30 minutes. LS Age increments while the LSA sits in the LSDB and each time it is flooded through a router.
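
LS Sequence numbers are signed 32-bit values, which is why the first one looks so large in hex. A minimal Python sketch of the "higher is newer" comparison follows. It only models the sequence number and leaves out the checksum and age tie-breakers. `as_signed` and `is_newer` are hypothetical helper names.

```python
import struct

INITIAL_SEQUENCE = 0x80000001   # the most negative usable value

def as_signed(seq: int) -> int:
    """Reinterpret the 32-bit wire value as a signed integer."""
    return struct.unpack(">i", struct.pack(">I", seq))[0]

def is_newer(seq_a: int, seq_b: int) -> bool:
    """True if the LSA instance with seq_a is newer than the one with seq_b."""
    return as_signed(seq_a) > as_signed(seq_b)

print(as_signed(INITIAL_SEQUENCE))
print(is_newer(0x80000002, INITIAL_SEQUENCE))
print(is_newer(0x00000001, 0x8000000A))
```

The last line shows why a plain unsigned comparison would get this wrong: 0x00000001 is a later sequence number than 0x8000000A.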

Packet Types

Type	Name	Purpose
1	Hello	OSPF puts the neighbor ID into its hello messages.
2	Database Description (DBD/DDP)	Used to sync a new neighbor rapidly. Describes the LSDB in bulk by listing LSA headers, not the full LSAs.
3	Link-State Request (LSR)	The router wants a specific LSA.
4	Link-State Update (LSU)	The neighbor sends a specific LSA.
5	Link-State Acknowledgment (LSAck)	To confirm a device got the intended LSAs, it transmits the headers of those LSAs back to the sender.

These can be thought of as the five steps.

We say hello, using each others names, to confirm we can both hear one another.
We share state (like the weather).
I ask how something went.
You tell me how it went.
To make sure I really got it, I'll repeat it word-for-word.

Hello Packets

These things must match for an adjacency to form

Subnet
Subnet mask
Interface MTU
Area
Area flags (NSSA, Stub)
Is DR/BDR enabled
Authentication
Hello time
Dead time

These must not match

Router ID

Check with debug ip ospf event
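
The two lists above can be turned into a quick pre-check in Python before reaching for a debug. This is a sketch of the comparison only, not how a router evaluates hellos. The `HelloParams` type, its field names, and the `mismatches` helper are all hypothetical, and the values are made up.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class HelloParams:
    router_id: str        # must NOT match
    subnet: str
    mask: str
    mtu: int
    area: str
    area_flags: str       # e.g. "normal", "stub", "nssa"
    dr_election: bool     # is DR/BDR enabled on this network type
    auth: str
    hello_interval: int
    dead_interval: int

def mismatches(local: HelloParams, remote: HelloParams) -> list[str]:
    """List the settings that would stop an adjacency from forming."""
    problems = [
        f.name for f in fields(HelloParams)
        if f.name != "router_id"
        and getattr(local, f.name) != getattr(remote, f.name)
    ]
    if local.router_id == remote.router_id:
        problems.append("router_id (duplicate)")
    return problems

r1 = HelloParams("1.1.1.1", "10.0.0.0", "255.255.255.0", 1500,
                 "0", "normal", True, "none", 10, 40)
r2 = replace(r1, router_id="2.2.2.2", hello_interval=5)
print(mismatches(r1, r2))
```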

Broadcast Network Multicast Packet to acknowledge multiple neighbors

Ethernet II, Src: aa:bb:cc:00:4b:00 (aa:bb:cc:00:4b:00), Dst: IPv4mcast_05 (01:00:5e:00:00:05)
Internet Protocol Version 4, Src: 10.0.0.6, Dst: 224.0.0.5
Open Shortest Path First
  OSPF Header
  OSPF Hello Packet
      Network Mask: 255.255.255.0
      Hello Interval [sec]: 10
      Options: 0x12, (L) LLS Data block, (E) External Routing
      Router Priority: 1
      Router Dead Interval [sec]: 40
      Designated Router: 10.0.0.2
      Backup Designated Router: 10.0.0.1
      Active Neighbor: 1.1.1.1
      Active Neighbor: 2.2.2.2
      Active Neighbor: 3.3.3.3
      Active Neighbor: 4.4.4.4
      Active Neighbor: 5.5.5.5

OSPF Adjacency State Machine

State	Description
Down	OSPF is running, no hello packets received yet.
Attempt	NBMA mode, the router has sent hello packets but has not heard back.
Init	The router sees hello packets, but not its own router-id in them.
2-Way	The router sees its own router-id in the hello packet.
ExStart	Routers decide who leads the LSDB exchange (higher router-id wins).
Exchange	Routers describe their LSDBs to each other with DBD packets.
Loading	DBDs have been exchanged, router is requesting specific LSAs.
Full	LSDBs for this area are identical on both sides.

DR and BDR

OSPF uses explicit acknowledgments (echoing back the LSA headers), so as neighbors and adjacencies grow, the amount of OSPF traffic on a network increases.

A network with six OSPF routers forming a full mesh requires 15 adjacencies, n(n-1)/2, which is 30 neighbor entries when counted from each router's side.

To mitigate the scaling problem, on broadcast segments OSPF elects a DR, and BDR, to maintain the LSDB.

The RFC calls this a "network vertex". We can also use the term DR.
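
The saving is easy to put numbers on. A small Python sketch, counting each pair of routers once (counting from each router's side doubles the full-mesh figure). It assumes one DR and one BDR on the segment, and the helper names are hypothetical.

```python
def full_mesh_adjacencies(n: int) -> int:
    """Every router pairs with every other router: n(n-1)/2."""
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n: int) -> int:
    """Each DROTHER pairs with the DR and the BDR, plus the DR-BDR pair."""
    if n < 2:
        return 0
    if n == 2:
        return 1
    return 2 * (n - 2) + 1

for routers in (3, 6, 20):
    print(routers, full_mesh_adjacencies(routers), dr_bdr_adjacencies(routers))
```

For six routers that is 15 pairs in a full mesh (30 if each end is counted separately) against 9 with a DR and BDR.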

All routers listen for hello on 224.0.0.5
DR floods LSAs to the routers with 224.0.0.5
DROTHER talks to the DR/BDR on 224.0.0.6

In the diagram (from the RFC), everything connects to N2, so problem solved.

                                    **FROM**
                +---+      +---+
                |RT3|      |RT4|              |RT3|RT4|RT5|RT6|N2 |
                +---+      +---+        *  ------------------------
                  |    N2    |          *  RT3|   |   |   |   | X |
            +----------------------+    T  RT4|   |   |   |   | X |
                  |          |          O  RT5|   |   |   |   | X |
                +---+      +---+        *  RT6|   |   |   |   | X |
                |RT5|      |RT6|        *   N2| X | X | X | X |   |
                +---+      +---+

                          Broadcast or NBMA networks

See OSPF LSAs to see what the actual contents of the LSAs are.

The DR

Forms full adjacencies.

R1# show ip ospf neighbor 

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2          50   FULL/BDR        00:00:31    10.0.0.2        Ethernet0/0
3.3.3.3           1   FULL/DROTHER    00:00:37    10.0.0.3        Ethernet0/0
4.4.4.4           1   FULL/DROTHER    00:00:34    10.0.0.4        Ethernet0/0
5.5.5.5           1   FULL/DROTHER    00:00:32    10.0.0.5        Ethernet0/0
6.6.6.6           1   FULL/DROTHER    00:00:31    10.0.0.6        Ethernet0/0

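The election is not preemptive. The first router online on the segment usually becomes the DR and stays the DR, even if a router with a higher priority joins later.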

DROTHER

Only forms full adjacencies with the DR and BDR. It stays in 2-Way with other DROTHERs.
When it sends LSAs, it sends them to the DR/BDR via 224.0.0.6.

Network LSAs

These are sent by the DR to describe the routers on this segment.

See OSPF LSAs for the actual contents of the LSA.

Identical Databases

Each router computes its own shortest-path tree (SPT) with Dijkstra's algorithm.

LSAs are flooded throughout an area, so all routers in the same area should have the same LSAs and the same database.
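
As a rough sketch of why identical databases matter, here is Dijkstra's algorithm run over a toy LSDB. The topology, the costs, and the `spf` name are made up for illustration. Real OSPF also has network (DR) vertices and stub links, which this ignores.

```python
import heapq

def spf(graph, root):
    """Return (cost to each node, parent of each node) in the shortest-path tree."""
    dist = {root: 0}
    parent = {}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return dist, parent

# Hypothetical area: every router holds this same map.
lsdb = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R3": 1, "R4": 10},
    "R3": {"R1": 1, "R2": 1},
    "R4": {"R2": 10},
}

dist, parent = spf(lsdb, "R1")
print(dist)    # {'R1': 0, 'R2': 2, 'R3': 1, 'R4': 12}
print(parent)  # {'R2': 'R3', 'R3': 'R1', 'R4': 'R2'}
```

Every router runs the same calculation over the same map, only with itself as the root.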

R1# show ip ospf database database-summary  | s Area 0
Area 0 database summary
  LSA Type      Count    Delete   Maxage
  Router        5        0        0       
  Network       5        0        0       
  Summary Net   8        0        0       
  Summary ASBR  2        0        0       
  Type-7 Ext    0        0        0       
    Prefixes redistributed in Type-7  0
  Opaque Link   0        0        0       
  Opaque Area   0        0        0       
  Subtotal      20       0        0

R2# show ip ospf database database-summary | s Area 0
Area 0 database summary
  LSA Type      Count    Delete   Maxage
  Router        5        0        0       
  Network       5        0        0       
  Summary Net   8        0        0       
  Summary ASBR  2        0        0       
  Type-7 Ext    0        0        0       
    Prefixes redistributed in Type-7  0
  Opaque Link   0        0        0       
  Opaque Area   0        0        0       
  Subtotal      20       0        0

Can also check with checksums

show ip ospf | i Checksum

LSAs

The Router ID is what is used to build the SPT. It's very important it's both

Correct
Easy to identify the router

  +-------------------------+ Three fields to differentiate LSAs
  |         LS Age          |     - LS Type
  +-------------------------+     - Link State ID
  |  Options      LS Type   |     - Advertising Router
  +-------------------------+
  |     Link State ID       |  < -- Unique number from the Advertising Router for Each LSA
  +-------------------------+
  |   Advertising Router    |  < -- Router ID
  +-------------------------+
  |    LS Sequence Number   |  < -- How old the LSA is. LSAs with higher numbers are updates to older LSAs
  +-------------------------+
  |      LS Checksum        |
  +-------------------------+
  |        Length           |
  +-------------------------+
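
A sketch of how two instances of the same LSA can be compared, following my reading of RFC 2328 section 13.1. The tuple layout and the `compare_lsa` name are assumptions for illustration. Note that the sequence number is a signed 32-bit value, so 0x80000001 is the oldest possible number.

```python
MAX_AGE = 3600       # seconds
MAX_AGE_DIFF = 900   # seconds

def signed32(value):
    """Interpret a 32-bit value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value

def compare_lsa(a, b):
    """a and b are (sequence, checksum, age) for the same LSA.

    Returns 1 if a is newer, -1 if b is newer, 0 if they are the same instance.
    """
    seq_a, seq_b = signed32(a[0]), signed32(b[0])
    if seq_a != seq_b:
        return 1 if seq_a > seq_b else -1
    if a[1] != b[1]:
        return 1 if a[1] > b[1] else -1          # larger checksum wins
    if (a[2] == MAX_AGE) != (b[2] == MAX_AGE):
        return 1 if a[2] == MAX_AGE else -1      # a flushed copy wins
    if abs(a[2] - b[2]) > MAX_AGE_DIFF:
        return 1 if a[2] < b[2] else -1          # the younger copy wins
    return 0

print(compare_lsa((0x8000007C, 0x5D18, 135), (0x8000007B, 0x5D18, 32)))  # 1
```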

OSPF Hierarchy

OSPF has four levels of routing hierarchy

O - Intra-area (same area)
O IA - Inter-area (same OSPF domain)
E1 - External type 1 (external metric plus the internal cost to reach the ASBR)
E2 - External type 2 (external metric only)

The E bit in the external LSA selects the metric type. When the bit is set the route is E2, which is less preferred than E1.

Code	Number	RFC Name	Purpose	Description
O	1	Router-LSA	interfaces on a router	Flooded, single area, never crosses an area boundary.
O	2	Network-LSA	routers on a network	Flooded, single area, only sent by the DR.
O IA	3	Summary-LSA	networks in other areas	ABRs send these to describe routes to networks.
E1, E2	4	Summary-LSA	next-hop to an ASBR	ABRs send these to describe routes to AS boundary routers.
E1, E2	5	AS-external-LSA	routes to E1 or E2 networks	ASBRs send these to describe routes external to the AS.
N1, N2	7	Type-7 LSA (NSSA)	external routes inside an NSSA	NSSA ASBRs send these to describe routes external to the AS.

Type 5 LSAs

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            LS age             |     Options   |      5        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        Link State ID                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     Advertising Router                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     LS sequence number                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |         LS checksum           |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         Network Mask                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |E|     0       |                  metric                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      Forwarding address                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      External Route Tag                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |E|    TOS      |                TOS  metric                    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      Forwarding address                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Default Route

OSPF has two ways of originating a default route.

default-information originate: advertise a default route only if one is present in the routing table.

default-information originate always: advertise a default route whether or not one is present.

Cost

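By default the reference bandwidth is 100 Mbps, so all links at 100 Mbps or faster get the same cost of 1. Raise the reference bandwidth (in Mbps) on every router to fix this.

auto-cost reference-bandwidth 40000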
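
The cost calculation can be sketched as reference bandwidth divided by interface bandwidth, rounded down, with a floor of 1. The `ospf_cost` name is made up for illustration and bandwidths are in Mbps.

```python
def ospf_cost(interface_mbps, reference_mbps=100):
    """Interface cost: reference bandwidth / interface bandwidth, never below 1."""
    cost = int(reference_mbps // interface_mbps)
    return max(1, min(cost, 65535))  # the cost field is 16 bits wide

# Default reference bandwidth: 100M, 1G and 10G all collapse to cost 1.
print([ospf_cost(bw) for bw in (10, 100, 1000, 10000)])         # [10, 1, 1, 1]

# With a 40G reference the faster links become distinguishable.
print([ospf_cost(bw, 40000) for bw in (100, 1000, 10000, 40000)])  # [400, 40, 4, 1]
```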

Network Types

OSPF Representation of routers and networks

CLI	Network Types	LSA Type 1 or 2	Use-case
`ip ospf network broadcast`	Broadcast	2 - DR Election	Ethernet, Token Ring, FDDI
`ip ospf network non-broadcast`	NBMA¹	2 - DR Election	X.25, frame-relay, ATM. Requires a full-mesh.
`ip ospf network point-to-point`	point-to-point	1 - No DR	Serial links, Unnumbered, TDM, HDLC, PPP (Full Adjacency)
`ip ospf network point-to-multipoint`	Hub and spoke on Ethernet	1 - No DR	Hub and Spoke Topologies, like DMVPN or Frame Relay

¹ RFC compliant (??) implementation. For actual NBMA networks use ip ospf network point-to-multipoint.

The DR (which should be the HUB or bad things happen) needs to have static neighbor statements.

RFC 2328                     OSPF Version 2                   April 1998

                                                  **FROM**

                                           *      |RT1|RT2|
                +---+Ia    +---+           *   ------------
                |RT1|------|RT2|           T   RT1|   | X |
                +---+    Ib+---+           O   RT2| X |   |
                                           *    Ia|   | X |
                                           *    Ib| X |   |

                     Physical point-to-point networks


                                                  **FROM**
                      +---+                *
                      |RT7|                *      |RT7| N3|
                      +---+                T   ------------
                        |                  O   RT7|   |   |
            +----------------------+       *    N3| X |   |
                       N3                  *

                              Stub networks

                                                  **FROM**
                +---+      +---+
                |RT3|      |RT4|              |RT3|RT4|RT5|RT6|N2 |
                +---+      +---+        *  ------------------------
                  |    N2    |          *  RT3|   |   |   |   | X |
            +----------------------+    T  RT4|   |   |   |   | X |
                  |          |          O  RT5|   |   |   |   | X |
                +---+      +---+        *  RT6|   |   |   |   | X |
                |RT5|      |RT6|        *   N2| X | X | X | X |   |
                +---+      +---+

                          Broadcast or NBMA networks

Area summary

These will show up as an O IA route in OSPF, and as a discard route to Null0 on the ABR.

Requires a route inside the range to be present in the RIB.

v4 example.

router ospf 1
 router-id 2.2.2.2
 area 1 range 10.0.0.0 255.255.224.0

v6 example.

router ospfv3 1
 !
 address-family ipv6 unicast
  area 1 range 2001:DB8::/56
 exit-address-family
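
To check which prefixes a range statement would cover, Python's `ipaddress` module is handy. The `covered_by_range` name and the list of test prefixes are made up for illustration.

```python
import ipaddress

def covered_by_range(summary, prefixes):
    """Return the prefixes that fall inside the summary range."""
    net = ipaddress.ip_network(summary)
    covered = []
    for prefix in prefixes:
        candidate = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed.
        if candidate.version == net.version and candidate.subnet_of(net):
            covered.append(prefix)
    return covered

# 10.0.0.0 255.255.224.0 is a /19, covering 10.0.0.0 through 10.0.31.255.
print(covered_by_range("10.0.0.0/255.255.224.0",
                       ["10.0.10.0/24", "10.0.20.0/24", "10.0.32.0/24"]))

# 2001:DB8::/56 covers the /64s from 2001:db8:0:0:: through 2001:db8:0:ff::.
print(covered_by_range("2001:DB8::/56",
                       ["2001:db8:0:10::/64", "2001:db8:0:100::/64"]))
```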

Route-Filtering

You can use the same command to tell the router to exclude these routes from the backbone, via the not-advertise keyword.

Using range

With not-advertise, the area range command acts as a route filter.

v4 example.

router ospf 1
 router-id 2.2.2.2
 area 1 range 10.0.0.0 255.255.224.0 not-advertise

v6 example.

router ospfv3 1
 !
 address-family ipv6 unicast
  area 1 range 2001:DB8::/56 not-advertise
 exit-address-family

Using filter-lists

These are a bit harder to use. The in and out keywords mean into the area and out of the area.

For this topology

             Area 0                               Area 1               
                                                               
                                 |           10.0.10.0/24            
                                 |         2001:db8:0:10/64          
                                 |                            +----+ 
                              +----+       +------------------+ R3 | 
+----+                        |    +-------+                  +----+ 
| R1 +------------------------+ R2 |                        
+----+                        |    +------+     
             10.0.0.0/24      +----+      |                   +----+ 
           2001:db8:0:0/64       |        +-------------------+ R4 | 
                                 |           10.0.20.0/24     +----+ 
                                 |         2001:db8:0:20/64

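ip prefix-list PREFIX_LIST_LOOPBACK_v4 seq 10 deny 1.1.1.1/32
ip prefix-list PREFIX_LIST_LOOPBACK_v4 seq 20 deny 2.2.2.2/32
ip prefix-list PREFIX_LIST_LOOPBACK_v4 seq 30 deny 3.3.3.3/32
! A prefix-list ends with an implicit deny, so permit everything else.
ip prefix-list PREFIX_LIST_LOOPBACK_v4 seq 40 permit 0.0.0.0/0 le 32
!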
router ospf 1
 area 0 filter-list prefix PREFIX_LIST_LOOPBACK_v4 in
 area 1 filter-list prefix PREFIX_LIST_LOOPBACK_v4 in

!
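ipv6 prefix-list PREFIX_LIST_v6 seq 10 deny FD::1/128
ipv6 prefix-list PREFIX_LIST_v6 seq 20 deny FD::3/128
ipv6 prefix-list PREFIX_LIST_v6 seq 30 deny FD::4/128
! A prefix-list ends with an implicit deny, so permit everything else.
ipv6 prefix-list PREFIX_LIST_v6 seq 40 permit ::/0 le 128
!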
router ospfv3 1
 !
 address-family ipv6 unicast
  area 0 filter-list prefix PREFIX_LIST_v6 in
  area 1 filter-list prefix PREFIX_LIST_v6 in

Sham Link

The Problem

A customer with L3VPN service via OSPF-BGP-VPNv4 decides to connect two sites together via OSPF backdoor, a direct connection they manage themselves.

When they turn on their private OSPF peering, all the traffic between these two sites now prefers the new link instead of the L3VPN cloud.

The Solution: Sham Links

Sham links are needed because the routes provided by an L3VPN are O IA. When the OSPF backdoor link comes up it will be preferred for two reasons:

OSPF has a lower AD than BGP.
O routes are preferred over O IA.

A sham link makes two PE routers at different sites in the same customer VRF form an intra-area connection.

From OSPF Sham-Link Support for MPLS VPN - Cisco.

Before you create a sham-link between PE routers in an MPLS VPN, you must:

Configure a new interface with a /32 address on the remote PE so that OSPF packets can be sent over the VPN backbone to the remote end of the sham-link. The /32 address must meet the following criteria:

Belong to a VRF

Not be advertised by OSPF

Be advertised by BGP

You can use the /32 address for other sham-links

References

https://datatracker.ietf.org/doc/html/rfc2328

Type 1 and Type 2 describe what's inside an area.

Type 1 - Here are my links.

Type 2 - Here is my attached network.

Type 1 - Router

R1# show ip ospf database router 1.1.1.1

            OSPF Router with ID (1.1.1.1) (Process ID 1)

                Router Link States (Area 0)

  LS age: 32
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 1.1.1.1
  Advertising Router: 1.1.1.1
  LS Seq Number: 8000007B
  Checksum: 0x1A77
  Length: 36
  Number of Links: 1

    Link connected to: a Transit Network
     (Link ID) Designated Router address: 10.0.0.1
     (Link Data) Router Interface address: 10.0.0.1
      Number of MTID metrics: 0
       TOS 0 Metrics: 10

DROther

R4#show ip ospf database router 4.4.4.4

            OSPF Router with ID (4.4.4.4) (Process ID 1)

                Router Link States (Area 0)

  LS age: 135
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 4.4.4.4
  Advertising Router: 4.4.4.4
  LS Seq Number: 8000007C
  Checksum: 0x5D18
  Length: 36
  Number of Links: 1

    Link connected to: a Transit Network
     (Link ID) Designated Router address: 10.0.0.1
     (Link Data) Router Interface address: 10.0.0.4
      Number of MTID metrics: 0
       TOS 0 Metrics: 10

DR describing the network

Type 2 - Network

R4# show ip ospf database network 

            OSPF Router with ID (4.4.4.4) (Process ID 1)

                Net Link States (Area 0)

  LS age: 183
  Options: (No TOS-capability, DC)
  LS Type: Network Links
  Link State ID: 10.0.0.1 (address of Designated Router)
  Advertising Router: 1.1.1.1
  LS Seq Number: 80000002
  Checksum: 0x4481
  Length: 48
  Network Mask: /24
        Attached Router: 1.1.1.1
        Attached Router: 2.2.2.2
        Attached Router: 3.3.3.3
        Attached Router: 4.4.4.4
        Attached Router: 5.5.5.5
        Attached Router: 6.6.6.6

Theory

BGP works on the premise that if a router sees its own AS number in the AS path, it must be a loop.
The default keepalive timer is 60 seconds with 180 seconds for hold time. This means worst-case is 3 minutes to fail over.
BGP aggregate-address only works if there is a more-specific prefix inside the aggregate range in the BGP table.

Working with BGP

Only consider traffic in one direction at a time
Accepting a route will affect outgoing traffic
Advertising a route will affect incoming traffic
Filter out everything except the routes needed
BGP DOES NOT LOAD BALANCE (by default it installs a single best path)

On Cisco IOS, bgp soft-reconfig-backup tells the router "if you must, save an entire table"; otherwise it relies on RFC 2918 route refresh, which requests the updates dynamically.

Soft reconfig is ancient, pre-RFC.

Soft Reconfig via Route Refresh (trusting the other device)!

clear ip bgp <neighbor_ip> soft in¹

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/configuration/xe-16/irg-xe-16-book/bgp-4-soft-configuration.html

Example of a BGP AS Path

These read left to right like a book. This prefix was most recently advertised by AS 7018 and was originated by AS 15.

7018 701 15 i
            ^ origin code i (IGP): AS 15 originated the prefix, typically with a network statement

BGP Best Path Selection

- Higher Weight
- Higher Local Preference                            
- Locally Originated                                 (Network or Aggregate Command)
- Shortest AS-PATH
- Lowest Origin Type                                 (IGP preferred over EGP, EGP over Incomplete)
- Lowest MED                                         (Neighbor ASes must be the same)
- Prefer eBGP > Confederated eBGP > iBGP
- Prefer path with lowest IGP metric to next hop
- Determine if multipath is enabled, then continue if no best path is selected yet
  - Prefer external path which is oldest
  - Prefer path from router with lower ID
  - Prefer path with shorter cluster length
  - Prefer path from lowest neighbor address
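
The first steps of the list above can be sketched as a sort key, where the smallest tuple wins. This is a simplification and not how any router implements it: the `Path` fields are made-up names, MED is compared between all paths (real BGP only compares it when the neighbor AS matches), and the later tie-breakers stop at router ID.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"

def preference_key(p):
    """Lower tuple is better; 'higher wins' attributes are negated."""
    return (
        -p.weight,
        -p.local_pref,
        not p.locally_originated,   # False sorts before True
        len(p.as_path),
        ORIGIN_RANK[p.origin],
        p.med,
        not p.ebgp,
        p.igp_metric,
        tuple(int(octet) for octet in p.router_id.split(".")),
    )

def best_path(paths):
    return min(paths, key=preference_key)

long_path = Path("via-7018", as_path=(7018, 701, 15))
short_path = Path("via-3356", as_path=(3356, 15))   # hypothetical second upstream
print(best_path([long_path, short_path]).name)      # via-3356

preferred = Path("via-7018", local_pref=200, as_path=(7018, 701, 15))
print(best_path([preferred, short_path]).name)      # via-7018
```

Because the tuple is compared left to right, a higher local preference beats a shorter AS path, which is the point of the ordering.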

Cisco - Select BGP Best Path Algorithm

BGP Path Attributes

RFC 4271 - BGP-4

Well-known mandatory
Well-known discretionary
Optional transitive
Optional nontransitive

Path Attribute	Category
Origin	Mandatory
AS_PATH	Mandatory
NEXT_HOP	Mandatory
LOCAL_PREF	Discretionary
ATOMIC_AGGREGATE	Discretionary
AGGREGATOR	Optional Transitive
COMMUNITY	Optional Transitive
MULTI_EXIT_DISC	Optional Non-Transitive
ORIGINATOR_ID	Optional Non-Transitive
CLUSTER_LIST	Optional Non-Transitive

Origin

IGP > EGP > Incomplete

IGP means the route was injected into BGP with a network statement. This is the highest preference.
Incomplete means it's likely a redistributed route.

Next Hop

eBGP, routers in different AS, destination outside AS. The Next hop will be the advertising router.
iBGP, routers in same AS, destination inside AS. The Next hop will be the advertising router.
iBGP, routers in same AS, destination outside AS. The Next hop is the external peer who advertised the address.

When the third case happens, the iBGP routers need a route to the external next hop. There are two fixes:

Advertise into the IGP the external links to the BGP peers.
Tell the AS border router to change the next hop to its own IP address. [next-hop-self]

LOCAL_PREF

Controls outgoing traffic.
Only shared between iBGP peers, used to determine the exit. Higher is better.

MULTI_EXIT_DISC

Controls incoming traffic.
Lower is better

ATOMIC_AGGREGATE

BGP can aggregate smaller prefixes into larger ones even if a smaller prefix comes from a different AS.

A router in AS 105 gets these prefixes from its peers.

192.168.0.0/24 (123 204)
192.168.1.0/24 (123 205)

If the administrator chooses, they can aggregate this, but lose path information.

192.168.0.0/23 (105) ATOMIC_AGGREGATE.

Downstream peers cannot remove this attribute.
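
The example above can be reproduced with Python's `ipaddress` module. The `aggregate` helper is a made-up name and only illustrates which AS numbers disappear from the path once the /23 is advertised.

```python
import ipaddress

def aggregate(routes):
    """routes: list of (prefix, as_path).

    Returns the collapsed prefixes and the AS numbers that are no longer
    visible once only the aggregate is advertised.
    """
    networks = [ipaddress.ip_network(prefix) for prefix, _ in routes]
    summaries = [str(net) for net in ipaddress.collapse_addresses(networks)]
    dropped = sorted({asn for _, path in routes for asn in path})
    return summaries, dropped

summaries, dropped = aggregate([
    ("192.168.0.0/24", (123, 204)),
    ("192.168.1.0/24", (123, 205)),
])
print(summaries)  # ['192.168.0.0/23']
print(dropped)    # [123, 204, 205]
```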

AGGREGATOR

AS and Router ID of the BGP router that did the atomic aggregation.

COMMUNITY

Usually used to tag routes from a specific customer.

Tag	Purpose
INTERNET	Default community.
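NO_EXPORT	Do not advertise to eBGP peers (other ASes)
NO_ADVERTISE	Do not advertise to any other router
LOCAL_AS	Do not advertise outside the local AS, or outside the local sub-AS in a confederation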

ORIGINATOR_ID

For route reflectors. The route reflector sets this to the Router_ID of the router that originated the route in the local AS. If a router sees its own Router_ID here, it knows a loop has occurred.

CLUSTER_LIST

For route reflectors
The sequence of Cluster_IDs through which the route has passed. If a route reflector sees its own Cluster_ID, a loop has occurred.

WEIGHT

Cisco specific & this router only
Routes learned from a peer are 0
Locally generated routes are 32768

Route Reflectors

A RR will not change the existing attributes of a route (NEXT_HOP, AS_PATH, LOCAL_PREF, MED). It only adds ORIGINATOR_ID and CLUSTER_LIST.

If a route is learned from a non-client iBGP peer, reflect to clients
If a route is learned from a client, reflect to everyone
If a route is learned from an eBGP peer, advertise to everyone

Only the route reflector is aware of the reflecting. The clients are dumb

If you configure route reflectors as a cluster you must manually configure the cluster_ID

BGP on older IOS releases will auto-summarize to the classful boundary by default.

Use no auto-summary.

Using redistribute under BGP will make the resulting route show up with an origin code of incomplete.

Sending a default route

neighbor A.B.C.D default-originate

To get iBGP routers to update the next-hop to be themselves when advertising to other iBGP routers use

neighbor A.B.C.D next-hop-self

This makes it so other iBGP routers don't need reachability information for the physical link to the next AS.

BGP Finite State Machine

Idle - check the config
Connect - TCP is probably broken
Active - Listening for TCP
OpenSent
OpenConfirm
Established

Fixing next-hop issues

Just because the route shows up in show ip bgp doesn't mean it will install. BGP needs to be able to reach the next-hop.

Add the transit routes to the IGP.
Use next-hop-self in BGP.
Use a route-map to set the next hops.

Route Reflection

Terms

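Cluster List - Cluster IDs (by default the Router ID) of the route reflectors the route has passed through. Used to prevent loops between RRs.
Originator ID - Router ID of the router that originated the route in the local AS. Used to prevent loops between clients.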

Three rules for route reflectors

If the route is received from a non-client peer, reflect to clients only.
If the route is received from a client peer, reflect to non-client peers, and client peers.
If the route is received from an EBGP peer, reflect to all client and non-client peers.

Notes

Route reflectors can be clients of each other. This causes extra overhead.
If multiple route reflectors serve the same cluster they should have the same Cluster_ID.

BGP Route Reflectors Loop Prevention

If a BGP router that receives a route from an iBGP neighbor in the incoming update detects the presence of its own Router-ID in the Originator-ID attribute it will reject the update.
If a BGP router that receives a route from an iBGP neighbor is configured to operate as a route reflector and in the incoming update detects the presence of its own Cluster-ID in the Cluster-list attribute it will reject the update.
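
The three reflection rules and the two loop checks can be sketched in a few lines. The function names and the string labels for peer types are made up for illustration.

```python
def reflect_targets(learned_from):
    """Which iBGP peer groups a route reflector sends a route to.

    learned_from is one of 'non-client', 'client' or 'ebgp'.
    """
    if learned_from == "non-client":
        return {"clients"}
    if learned_from in ("client", "ebgp"):
        return {"clients", "non-clients"}
    raise ValueError(f"unknown peer type: {learned_from}")

def accept_update(my_router_id, my_cluster_id, is_route_reflector,
                  originator_id, cluster_list):
    """Apply the ORIGINATOR_ID and CLUSTER_LIST loop checks to an iBGP update."""
    if originator_id == my_router_id:
        return False   # our own route came back to us
    if is_route_reflector and my_cluster_id in cluster_list:
        return False   # the route already passed through our cluster
    return True

print(reflect_targets("non-client"))                                   # {'clients'}
print(accept_update("1.1.1.1", "1.1.1.1", True, "3.3.3.3", ["2.2.2.2"]))  # True
print(accept_update("1.1.1.1", "1.1.1.1", True, "3.3.3.3", ["1.1.1.1"]))  # False
```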

Confederations

NEXT_HOP is preserved throughout the confederation.

MED is preserved for routes advertised into the confederation

LOCAL_PREF is preserved throughout the confederation

AS_PATH for private ASes is used within the confederation

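Force MEDs from confederation peers to be considered:

bgp bestpath med confed

Compare MEDs deterministically by grouping paths from the same neighbor AS first:

bgp deterministic-med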

Route Reflectors are generally preferred.

If you want to add two route reflectors to the same cluster, configure the same cluster ID on both.

Clients cannot detect inter-cluster loops. They don't have the attributes in the BGP table.

BGP redistribution into anything

EIGRP Terminology

Successor route: The current best path, with the smallest metric. The "successful" route.
Successor: The first next-hop router for the successor route.
Feasible distance (FD): Lowest metric to reach a subnet. The sum of the RD + local cost.
Reported distance (RD): The metric inside a route update from another router. The sending router included its FD, which becomes our RD.
Feasibility condition: A path qualifies as a loop-free backup only if its RD is strictly less than the current FD.
Feasible successor: A route that satisfies the feasibility condition and is maintained as a backup route.
Split Horizon: Never advertise a network, out the same interface it was learned on.
Poison Reverse: If you must advertise a network out the same interface it was received on, advertise the delay as infinity.

Example.

R2 sends an update

10.0.0.0/24 - RD is 2000

R3 Sends an update

10.0.0.0/24 - RD is 2050

R1 calculates total path metric.

R2 is 2000 + 1000 = 3000.
R3 is 2050 + 50 = 2100. < - Successor route.

R1 sees that R2's reported distance (2000) is less than the current feasible distance (2100), so it installs the path via R2 as the feasible successor.

+--------+            1000             +--------+    10.0.0.0/24      
|   R1   +-----------------------------+   R2   +---------------------                           
+-----+--+                             +-+------+      2000                                                                               
      |            +--------+            |                            
      +------------+   R3   +------------+                                                                                                        
         50        +--------+      50

Example with the EIGRP topology table

R1# show ip eigrp topology 10.0.0.0/24
EIGRP-IPv4 Topology Entry for AS(1)/ID(1.1.1.1) for 10.0.0.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2100
P 10.0.0.0/24, 1 successors, FD is 2100                <--- Feasible Distance
        via 10.0.13.3 (2100/2050), GigabitEthernet0/3  <--- Successor Route
        via 10.0.12.2 (3000/2000), GigabitEthernet0/2  <--- Feasible Successor
                       |     |
                       |     +-- Reported Distance 
                       +-------- Path Metric
        
                                                             (RD 2000 < FD 2100)
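
The same example as a short sketch. The `classify` name and the dictionary layout are made up for illustration; the numbers are the ones from the topology above.

```python
def classify(paths):
    """paths maps neighbor -> (reported_distance, cost_to_neighbor).

    Returns (successor, feasible_distance, feasible_successors).
    """
    totals = {nbr: rd + cost for nbr, (rd, cost) in paths.items()}
    successor = min(totals, key=totals.get)
    fd = totals[successor]
    # Feasibility condition: the neighbor's RD must be strictly below our FD.
    feasible = [nbr for nbr, (rd, _) in paths.items()
                if nbr != successor and rd < fd]
    return successor, fd, feasible

print(classify({"R2": (2000, 1000), "R3": (2050, 50)}))
# ('R3', 2100, ['R2'])
```

If R2 had reported 2100 or more, it would still be in the topology table but would not be a feasible successor.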

Metric calculation

metric = ([K1 * bandwidth + (K2 * bandwidth) / (256 - load) + K3 * delay] * [K5 / (reliability + K4)]) * 256

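By default K1 = 1 and K3 = 1, while K2, K4 and K5 are 0. With K5 = 0 the reliability term is skipped, so the formula reduces to metric = (bandwidth + delay) * 256.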
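
A sketch of the classic metric, assuming the usual scaling: bandwidth is 10^7 divided by the slowest link in kbps, and delay is the total path delay in tens of microseconds. The function name is made up, and the integer rounding when K5 is non-zero may not match a real router exactly.

```python
def eigrp_classic_metric(min_bw_kbps, total_delay_usec,
                         load=1, reliability=255, k=(1, 0, 1, 0, 0)):
    """Classic (32-bit) EIGRP composite metric."""
    k1, k2, k3, k4, k5 = k
    bandwidth = 10**7 // min_bw_kbps     # inverse of the slowest link
    delay = total_delay_usec // 10       # tens of microseconds
    metric = k1 * bandwidth + (k2 * bandwidth) // (256 - load) + k3 * delay
    if k5 != 0:
        # The reliability term only applies when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return metric * 256

print(eigrp_classic_metric(1_000_000, 10))   # one GigabitEthernet hop: 2816
print(eigrp_classic_metric(100_000, 100))    # one FastEthernet hop: 28160
```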

Wide metrics allow for faster links.

Unequal Cost Multi Path

EIGRP can load balance over the successor and feasible successor routes with a variance command.
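
A sketch of how variance widens the set of usable paths. The `ucmp_next_hops` name is made up, and treating a metric exactly equal to variance x FD as usable is an assumption on my part.

```python
def ucmp_next_hops(feasible_distance, feasible_successors, variance=1):
    """feasible_successors maps neighbor -> total metric via that neighbor.

    Only paths that already passed the feasibility condition belong in the
    dictionary; variance never makes a non-feasible path usable.
    """
    limit = feasible_distance * variance
    return [nbr for nbr, metric in feasible_successors.items()
            if metric <= limit]

# Hypothetical numbers: successor metric 2100, feasible successor via R2 at 3000.
print(ucmp_next_hops(2100, {"R2": 3000}, variance=1))  # []
print(ucmp_next_hops(2100, {"R2": 3000}, variance=2))  # ['R2']
```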

Timers

Hello packets are sent every 5 seconds, or every 60 seconds on T1 or slower NBMA links.
- The hold time is 3x the hello timer.

Initial Bringup

Send Hello packets, to 224.0.0.10
- Doesn't require multicast routing to be on
- Unicast Init from neighbor, set Seq, Set Ack to 0
  - Neighbor Sends back Ack as prior sequence number.
  - Update Messages

Stuck in Active

The router is too busy to answer the query (generally due to high CPU utilization).
The router has memory problems and cannot allocate the memory to process the query or build the reply packet.
The circuit between the two routers is not good; enough packets get through to keep the neighbor relationship up, but some queries or replies are lost between the routers.
Unidirectional links (a link on which traffic can only flow in one direction because of a failure).

Update Message

AS number
Prefixes
End-of-table Flag

Prefixes

Type (internal, etc)
Reliability
Load
MTU
Hop Count
Delay
Bandwidth
Flags
- Source Withdrawn
- Candidate Default
- Route is Active
- Route is Replicated
Next-hop
Prefix Length

Network

The CLI parser converts the IP into binary, then compares it to the wildcard mask.
The CLI parser will only save the matched bits of the IP.
The CLI parser will not save the zeroth network, anything starting with 0.
The CLI parser will only save the matched bits of an IP if it finds bits that are "on".
Using the "all" mask of 255.255.255.255 creates this statement 'network 0.0.0.0' and matches everything.
Using the "unique-ip" mask of 0.0.0.0 means "match this single address"
The wildcard mask only accepts contiguous numbers "Discontiguous mask is not supported."

192.0.2.5 127.255.255.255 - becomes 128.0.0.0, the rest of the bits get dropped.
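
The parser behaviour above amounts to a bitwise AND of the address with the inverted wildcard mask. A small sketch (the `normalize_network` name is made up):

```python
import ipaddress

def normalize_network(ip, wildcard):
    """Keep only the address bits the wildcard mask says must match."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    wildcard_bits = int(ipaddress.IPv4Address(wildcard))
    kept = ip_bits & ~wildcard_bits & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(kept))

print(normalize_network("192.0.2.5", "127.255.255.255"))  # 128.0.0.0
print(normalize_network("192.0.2.5", "255.255.255.255"))  # 0.0.0.0
print(normalize_network("192.0.2.5", "0.0.0.0"))          # 192.0.2.5
```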

References

https://www.cisco.com/c/en/us/support/docs/ip/enhanced-interior-gateway-routing-protocol-eigrp/16406-eigrp-toc.html

VRRP

HSRP

GLBP

Terms

GLBP - Gateway Load Balancing Protocol.
AVG - Active Virtual Gateway. The AVG responds to ARP requests with the same IP but different MAC addresses, to load balance for GLBP.
AVF - Active Virtual Forwarder. A router in a GLBP group that is forwarding packets. All AVFs have their own mac, and are responsible for forwarding traffic destined towards that MAC.

Cisco proprietary
224.0.0.102
UDP 3222
AVG is highest priority
Max of 4 active AVFs
Two states: Active, Listen
MD5 is supported

References

Cisco - Configuring GLBP

I learned this protocol using IOS-XR.

Async, no echo - Please respond to this packet with the control plane of the far device.

BFD Async without Echo

          Peer-A to Peer-B, lets agree to use BFD.
          
          Peer-A, I see your control packets.
          
          Peer-B, I also see your control packets.
          
          
          L3 SRC A
          L3 DST B
          
         +------------------------------->
+-------+                                 +-------+
|Peer-A |                                 |Peer-B |
+-------+                                 +-------+
         <-------------------------------+

Async, with echo - Just loop the BFD packets back onto the link, please.

BFD Async with Echo

The packets never leave the data plane and never touch the control plane of Peer-A or Peer-B.


           L3 SRC A
           L3 DST A

!
! Peer A tests its return path
!
+-------+                                   +-------+
|       | +-------------------------------+ |       |
|Peer-A |                                 | |Peer-B |
|       | <-------------------------------+ |       |
+-------+                                   +-------+


           L3 SRC A
           L3 DST A
!
! Peer B also tests its return path
!
+-------+                                   +-------+
|       | +-------------------------------+ |       |
|Peer-A | |                                 |Peer-B |
|       | +-------------------------------> |       |
+-------+                                   +-------+

Ports

BFD is UDP, to an application on the network device

BFD Control is sent as SRC UDP 49512 (any port in 49152-65535) --> Destination 3784

BFD Echo is sent as SRC UDP 3785 --> Destination 3785

BFD State Machine

Courtesy of the RFC

RFC 5880           Bidirectional Forwarding Detection          June 2010

(removed) 

The following diagram provides an overview of the state machine.
Transitions involving AdminDown state are deleted for clarity (but
are fully specified in sections 6.8.6 and 6.8.16).  The notation on
each arc represents the state of the remote system (as received in
the State field in the BFD Control packet) or indicates the
expiration of the Detection Timer.

                             +--+
                             |  | UP, ADMIN DOWN, TIMER
                             |  V
                     DOWN  +------+  INIT
              +------------|      |------------+
              |            | DOWN |            |
              |  +-------->|      |<--------+  |
              |  |         +------+         |  |
              |  |                          |  |
              |  |               ADMIN DOWN,|  |
              |  |ADMIN DOWN,          DOWN,|  |
              |  |TIMER                TIMER|  |
              V  |                          |  V
            +------+                      +------+
       +----|      |                      |      |----+
   DOWN|    | INIT |--------------------->|  UP  |    |INIT, UP
       +--->|      | INIT, UP             |      |<---+
            +------+                      +------+
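
The arcs in the diagram above can be written down as a small transition function. This is only a sketch of the diagram, not a BFD implementation: the names `State`, `TIMER`, `ADMIN_DOWN` and `next_state` are made up for illustration, and the event is either the remote state seen in a received control packet or the detection timer expiring.

```python
from enum import Enum


class State(Enum):
    DOWN = "Down"
    INIT = "Init"
    UP = "Up"


# Events: the remote state from a received control packet ("Down", "Init",
# "Up", "AdminDown"), or the local detection timer expiring ("Timer").
TIMER = "Timer"
ADMIN_DOWN = "AdminDown"


def next_state(local: State, event: str) -> State:
    """Follow one arc of the RFC 5880 overview diagram."""
    if local is State.DOWN:
        if event == "Down":
            return State.INIT
        if event == "Init":
            return State.UP
        return State.DOWN  # Up, AdminDown, Timer: stay Down
    if local is State.INIT:
        if event in ("Init", "Up"):
            return State.UP
        if event in (ADMIN_DOWN, TIMER):
            return State.DOWN
        return State.INIT  # remote still Down: stay Init
    # local is UP
    if event in (ADMIN_DOWN, "Down", TIMER):
        return State.DOWN
    return State.UP  # Init, Up: stay Up


# Three-way handshake from Peer-A's point of view:
a = State.DOWN
a = next_state(a, "Down")  # Peer-B says Down -> A goes Init
a = next_state(a, "Up")    # Peer-B says Up   -> A goes Up
print(a)
```

Walking both peers through the function shows why it takes three packets to come up: Down/Down, then Init, then Up.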

Async - If the other side doesn't receive the packets, the session is declared down.
BOB - BFD over Bundle
BLB - BFD over Logical Bundle - (VLANS, Sub-interfaces). This requires multipath to be enabled. Multipath doesn't inject BFD packets into the HP queue.

IOS-XR Commands

multipath include location 0/1/CPU0
bundle coexistence bob-blb logical
show tech-support routing bfd file

IOS-XR Examples

Take the session down if latency grows to 150ms for a single echo packet.

bfd fast detect 
bfd multiplier 50
bfd echo latency detect

Take the session down if latency grows to 300ms for a single echo packet.

bfd fast detect 
bfd multiplier 50
bfd echo latency detect percentage 200

Take the session down if the latency grows to 150ms for 3 consecutive echo packets.

bfd fast detect
bfd multiplier 50
bfd echo latency detect percentage 100 count 3

Disable echo mode

bfd 
interface g0/0/0/0
 echo disable

Protecting the BFD data-plane packets from QoS

192.168.100.1 <-> 192.168.100.2

!
! Config for 192.168.100.1
!
ipv4 access-list BFD-TRAFFIC
 5 permit udp host 192.168.100.1 any range 3784 3785
 10 permit udp host 192.168.100.2 any range 3784 3785
!
class-map match-any BFD-CLASS
 match access-group ipv4 BFD-TRAFFIC
!
policy-map OUT
class BFD-CLASS
 priority level 1
 police rate 10 kbps
!
interface TenGig <>
 service-policy output OUT
 bfd address-family ipv4 multiplier 3
 bfd address-family ipv4 destination 192.168.100.2
 bfd address-family ipv4 fast-detect
 bfd address-family ipv4 minimum-interval 100
!

Enabling BFD on RSVP (IOS)

A Config

ip rsvp signalling hello bfd
!
! this is very dangerous because CPU load will affect processing of BFD control packets
!
int f0/0.45
 ip rsvp signalling hello bfd
 bfd interval 50 min_rx 50 multiplier 3

Verification

show ip rsvp hello bfd nbr

Mutual Route-Redistribution

Tag EIGRP as 100
Tag OSPF as 1

Route maps should take the form DENY -> PERMIT.
Routes are tagged when they are advertised.

Route tags appear on-the-wire and can be read by other routers. ospf.lsa.asext.extrttag == 100

In this example, EIGRP becomes a Type-5 OSPF update, with a route-tag of 100. If we look for these tags, we can exclude them in redistribution updates.

route-map ospf-into-eigrp deny 10
 description previously tagged EIGRP traffic
 match tag 100
!
route-map ospf-into-eigrp permit 20
 match source-protocol ospf 1 ospfv3 1
 set tag 1
!
route-map eigrp-into-ospf deny 10
 description previously tagged OSPF traffic
 match tag 1
!
route-map eigrp-into-ospf permit 20
 match source-protocol eigrp 100
 set tag 100
!
router eigrp 100
 redistribute ospf 1 metric 1000000 100 255 1 1500 route-map ospf-into-eigrp
!
router ospf 1
 redistribute eigrp 100 subnets route-map eigrp-into-ospf
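
To convince myself the DENY -> PERMIT order actually breaks the loop, here is a toy model of the two route-maps. It is a sketch only: `Route`, `ospf_into_eigrp` and `eigrp_into_ospf` are hypothetical names that mirror the route-map names above, and the model ignores metrics and the match source-protocol clause.

```python
from dataclasses import dataclass
from typing import Optional

EIGRP_TAG = 100  # set by "eigrp-into-ospf permit 20"
OSPF_TAG = 1     # set by "ospf-into-eigrp permit 20"


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: int = 0  # 0 = untagged, native route


def ospf_into_eigrp(route: Route) -> Optional[Route]:
    """Model of route-map ospf-into-eigrp."""
    if route.tag == EIGRP_TAG:            # deny 10: came from EIGRP originally
        return None
    return Route(route.prefix, OSPF_TAG)  # permit 20: set tag 1


def eigrp_into_ospf(route: Route) -> Optional[Route]:
    """Model of route-map eigrp-into-ospf."""
    if route.tag == OSPF_TAG:              # deny 10: came from OSPF originally
        return None
    return Route(route.prefix, EIGRP_TAG)  # permit 20: set tag 100


# A native EIGRP route is redistributed into OSPF on one border router...
in_ospf = eigrp_into_ospf(Route("10.1.1.0/24"))
print(in_ospf)
# ...and a second border router refuses to feed it back into EIGRP.
print(ospf_into_eigrp(in_ospf))
```

The tag travels with the route, so the second border router can tell where the route originally came from without any shared state.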

A very basic setup, that assumes a working underlay. I implemented this on my home lab of c7200s in GNS3 running 15.2(4)S7. My underlay was IS-IS to router loopbacks.

Site 1 EIDs - 192.168.100.0/24
Site 2 EIDs - 192.168.101.0/24

xTR for Site 1 - Lo0 18.18.18.18
xTR for Site 2 - Lo0 19.19.19.19
MS/MR - Lo0 16.16.16.16

Site 1 - xTR

config

R18# show run | s lisp
router lisp
 database-mapping 192.168.100.0/24 18.18.18.18 priority 1 weight 50
 ipv4 itr map-resolver 16.16.16.16
 ipv4 itr
 ipv4 etr map-server 16.16.16.16 key cisco
 ipv4 etr
 exit

verify

R18# show ip lisp map-cache 
LISP IPv4 Mapping Cache for EID-table default (IID 0), 2 entries

0.0.0.0/0, uptime: 00:19:42, expires: never, via static send map-request
  Negative cache entry, action: send-map-request
192.168.101.0/24, uptime: 00:10:08, expires: 23:49:44, via map-reply, complete
  Locator      Uptime    State      Pri/Wgt
  19.19.19.19  00:10:08  up           1/50

Site 2 - xTR

config

R19# show run | s lisp
router lisp
 database-mapping 192.168.101.0/24 19.19.19.19 priority 1 weight 50
 ipv4 itr map-resolver 16.16.16.16
 ipv4 itr
 ipv4 etr map-server 16.16.16.16 key cisco
 ipv4 etr
 exit

verify

R19#show ip lisp map-cache 
LISP IPv4 Mapping Cache for EID-table default (IID 0), 2 entries

0.0.0.0/0, uptime: 00:11:50, expires: never, via static send map-request
  Negative cache entry, action: send-map-request
192.168.100.0/24, uptime: 00:11:29, expires: 23:48:23, via map-reply, complete
  Locator      Uptime    State      Pri/Wgt
  18.18.18.18  00:11:29  up           1/50

MS/MR

config

R16# show run | s lisp
router lisp
 site 1
  authentication-key cisco
  eid-prefix 192.168.100.0/24
  exit
 !
 site 2
  authentication-key cisco
  eid-prefix 192.168.101.0/24
  exit
 !
 ipv4 map-server
 ipv4 map-resolver
 exit

verify

R16# show lisp site name 1
Site name: 1
Allowed configured locators: any
Allowed EID-prefixes:
  EID-prefix: 192.168.100.0/24 
    First registered:     00:25:12
    Routing table tag:    0
    Origin:               Configuration
    Merge active:         No
    Proxy reply:          No
    TTL:                  1d00h
    State:                complete
    Registration errors:  
      Authentication failures:   0
      Allowed locators mismatch: 0
    ETR 10.0.0.23, last registered 00:00:28, no proxy-reply, no map-notify
                   TTL 1d00h, no merge, nonce 0x3E715231-0x150380FC
                   state complete
      Locator      Local  State      Pri/Wgt
      18.18.18.18  yes    up           1/50 

R16# show lisp site name 2
Site name: 2
Allowed configured locators: any
Allowed EID-prefixes:
  EID-prefix: 192.168.101.0/24 
    First registered:     00:25:24
    Routing table tag:    0
    Origin:               Configuration
    Merge active:         No
    Proxy reply:          No
    TTL:                  1d00h
    State:                complete
    Registration errors:  
      Authentication failures:   0
      Allowed locators mismatch: 0
    ETR 10.0.0.26, last registered 00:00:37, no proxy-reply, no map-notify
                   TTL 1d00h, no merge, nonce 0x2F281A3C-0x0760FD58
                   state complete
      Locator      Local  State      Pri/Wgt
      19.19.19.19  yes    up           1/50

A Packet (an ICMP Request)

Capture is here

Frame 4156: 134 bytes on wire (1072 bits), 134 bytes captured (1072 bits) on interface -, id 0
Ethernet II, Src: ca:17:30:54:00:08 (ca:17:30:54:00:08), Dst: ca:1a:39:b0:00:08 (ca:1a:39:b0:00:08)
Internet Protocol Version 4, Src: 10.0.0.24, Dst: 19.19.19.19
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 120
    Identification: 0x0096 (150)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 63
    Protocol: UDP (17)
    Header Checksum: 0x0aa2 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.0.24
    Destination Address: 19.19.19.19
User Datagram Protocol, Src Port: 1024, Dst Port: 4341
    Source Port: 1024
    Destination Port: 4341
    Length: 100
    Checksum: 0x0000 [zero-value ignored]
    [Stream index: 2]
    [Timestamps]
    UDP payload (92 bytes)
Locator/ID Separation Protocol (Data)
    Flags: 0xc0
    Nonce: 939002 (0x0e53fa)
    0000 0000 0000 0000 0000 0000 0000 0001 = Locator-Status-Bits: 0x00000001
Internet Protocol Version 4, Src: 192.168.100.100, Dst: 192.168.101.100
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 84
    Identification: 0xc736 (50998)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 63
    Protocol: ICMP (1)
    Header Checksum: 0x2959 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 192.168.100.100
    Destination Address: 192.168.101.100
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xc078 [correct]
    [Checksum Status: Good]
    Identifier (BE): 82 (0x0052)
    Identifier (LE): 20992 (0x5200)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Response frame: 4157]
    Timestamp from icmp data: Jul 20, 2023 18:00:03.000000000 Eastern Daylight Time
    [Timestamp from icmp data (relative): 0.551525000 seconds]
    Data (48 bytes)

0000  53 4e 08 00 00 00 00 00 10 11 12 13 14 15 16 17   SN..............
0010  18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27   ........ !"#$%&'
0020  28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37   ()*+,-./01234567
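
The LISP data header in that decode is only 8 bytes, and the lengths in the capture add up once you count it. Below is a rough sketch that unpacks the header and checks the overhead. Assumptions: the 8 header bytes are rebuilt by hand from the decoded fields above (flags 0xc0, nonce 0x0e53fa, locator-status-bits 1) rather than copied from the pcap, and `parse_lisp_data_header` is a made-up helper name.

```python
import struct


def parse_lisp_data_header(header: bytes) -> dict:
    """Unpack the 8-byte LISP data-plane header (flags, nonce, LSBs)."""
    first_word, lsb = struct.unpack("!II", header[:8])
    flags = first_word >> 24
    return {
        "flags": flags,
        "nonce_present": bool(flags & 0x80),  # N bit
        "lsb_enabled": bool(flags & 0x40),    # L bit
        "nonce": first_word & 0xFFFFFF,       # low 24 bits of the first word
        "locator_status_bits": lsb,
    }


# Rebuilt from the decode above, not taken from the capture file.
header = bytes.fromhex("c00e53fa00000001")
fields = parse_lisp_data_header(header)
print(fields)

# Encapsulation overhead: outer IPv4 + UDP + LISP.
OUTER_IPV4 = 20
UDP = 8
LISP = 8
inner_total_length = 84  # inner IP "Total Length" from the decode
outer_total_length = inner_total_length + OUTER_IPV4 + UDP + LISP
print(outer_total_length)  # matches the outer "Total Length: 120"
```

So LISP costs 36 bytes per packet over an IPv4 underlay, which is worth remembering when setting the MTU on the underlay links.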

Sources

LISP Fundamentals and Troubleshooting Basics - Cisco

Terms

Multicast: A one-to-many service using UDP packets destined to a group IP address. Hosts subscribe to the group, routers replicate for the group.
IGMP: Internet Group Management Protocol. A host uses IGMP to request a multicast stream. Switches see it (for snooping), and the LHR uses this to build the MDT.
PIM: Protocol Independent Multicast. Multicast capable routers communicate with each other via PIM.
IIL: Incoming Interface List, part of the MDT.
OIL: Outgoing Interface List, part of the MDT.
MDT: Multicast Distribution Tree. The full set of links participating in multicast, via PIM, IGMP, including IILs, and OILs.
RP: Rendezvous Point. A router designated as the root of a shared tree.
(*,G): Star comma Gee. AKA, a shared tree. These require a RP. Called Star comma Gee, because typing "show ip mroute" ... this is what shows up.
(S,G): Ess comma Gee. AKA a source tree. These do not require a RP.
Source Tree: AKA, SPT, or shortest path tree. The SPT is the best (shortest) path back to the source.
RPT: Rendezvous Point Tree, this is a *,G that points towards the RP.
ASM: Any Source Multicast. The host only knows the group it wants to receive (239.10.10.10).
SSM: Source Specific multicast. The host already knows the source, and group address (10.0.0.1, 232.10.10.10).
Upstream: Towards the source.
Downstream: Towards group members.
FHR: First hop router. The router directly attached to the source. It is the first router to receive the multicast stream.
LHR: Last Hop router receives IGMP messages from receivers, which are translated into PIM join messages.
MRIB: The multicast routing table. Shows RPTs, SPTs, RPFIs, OILs, and IILs.
MFIB: The forwarding table. This is used for programming the hardware.
RIB: Routing Information Base
DF: Designated Forwarder. Used in BIDIR-PIM.

Harder Terms

RPF - Reverse Path Forwarding

PIM is protocol independent in the sense that it doesn't care which unicast routing protocol populated the RIB. If a stream turns on, it must have a source, so it takes the form (10.0.0.1, 239.1.1.1), a (S,G).

If we do show ip route 10.0.0.1, we'll see the interface the router would use to send traffic towards that source address. This is the "upstream" interface.

As multicast traffic flows from 10.0.0.1, it should flow into the upstream interface, and out of any downstream interfaces (the OIL).

Tracing the traffic back to the source this way is called "reverse path forwarding" and the interface along this path is the RPF interface.

The PIM neighbor on the RPF interface is called the RPF neighbor.

Any multicast traffic from a given source that is not received on the RPF interface is discarded. This prevents loops.
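
The RPF check is really just a unicast lookup on the source address, compared against the interface the packet arrived on. A minimal sketch, assuming a made-up two-entry RIB (`RIB`, `rpf_interface` and `rpf_check` are illustrative names, and the interface names are invented):

```python
import ipaddress
from typing import Optional

# Hypothetical unicast RIB: prefix -> outgoing interface.
RIB = {
    "10.0.0.0/24": "GigabitEthernet0/1",
    "0.0.0.0/0": "GigabitEthernet0/0",
}


def rpf_interface(source: str, rib: dict = RIB) -> Optional[str]:
    """Longest-prefix match on the source, like 'show ip route <source>'."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, ifname in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


def rpf_check(source: str, arrival_interface: str, rib: dict = RIB) -> bool:
    """True if the packet arrived on the interface we'd use to reach the source."""
    return rpf_interface(source, rib) == arrival_interface


print(rpf_check("10.0.0.1", "GigabitEthernet0/1"))  # arrived on the RPF interface
print(rpf_check("10.0.0.1", "GigabitEthernet0/0"))  # wrong interface, discard
```

A real router caches the result per (S,G) instead of doing the lookup per packet, but the decision is the same.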

Shared Trees

(*,G) entries in the mroute table require fewer resources, since multiple sources can use the same tree.

(*,G) entries in the mroute table represent a security risk, because any source can send to this shared tree.

Theory (in v4)

Multicast is always TO a group, a destination, or a set of destinations.

Multicast comes from an older time. Unlike Unicast addresses, you can tell via bits if a v4 address is multicast.

A multicast address always starts with the bits 1110

Address Scopes	Description
`224.0.0.0/4`	Multicast Supernet
`224.0.0.0/24`	Local Control (TTL=1)
`224.0.1.0/24`	Internetwork Control (an example is NTP, Cisco RP-Announce, Cisco RP-Discovery)
`232.0.0.0/8`	Source-Specific Multicast (SSM). Via an extension PIM can build (S,G) MDTs.
`233.0.0.0/8`	GLOP! Companies with a 16-bit ASN can have globally static multicast. 233.X.Y.0/24, where X.Y is the ASN.
`239.0.0.0/8`	Organization-Local Scope. Exactly like RFC1918, but for multicast.
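
The GLOP row is easier to remember with the arithmetic spelled out: the 16-bit ASN is split into two octets and dropped into the middle of 233.X.Y.0/24. A quick sketch (`glop_prefix` is a made-up helper name; AS 5662 is the worked example from RFC 3180):

```python
import ipaddress


def glop_prefix(asn: int) -> ipaddress.IPv4Network:
    """Return the GLOP /24 for a 16-bit ASN."""
    if not 0 < asn <= 0xFFFF:
        raise ValueError("GLOP only works for 16-bit ASNs")
    high_octet = asn >> 8    # X
    low_octet = asn & 0xFF   # Y
    return ipaddress.ip_network(f"233.{high_octet}.{low_octet}.0/24")


print(glop_prefix(5662))
```

32-bit ASNs don't fit in two octets, which is why GLOP only applies to the 16-bit range.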

Common L3 Addresses

Same Broadcast Domain

Protocol	Multicast Address
all-hosts	224.0.0.1
all-routers	224.0.0.2
OSPF-hello	224.0.0.5
OSPF-DR	224.0.0.6
RIPv2	224.0.0.9
EIGRP	224.0.0.10
PIM	224.0.0.13
mDNS	224.0.0.251

Can be forwarded

Protocol	Multicast Address	Notes
ntp	224.0.1.1
cisco-rp-announce	224.0.1.39	Candidate RPs announce every 60s. Highest IP wins.
cisco-rp-discovery	224.0.1.40	Mapping agent floods RP-to-group mappings.

IANA Assignments

PIM forms adjacencies in only one direction

The multicast source is the root of the tree. Packets flow downstream from the source. Control plane traffic like PIM joins flows upstream, towards the RP or towards the source.

PIM

PIM Mode	Full Name	How it works
PIM-DM	Dense Mode	No RP. Floods everywhere, routers send prune messages to un-join. Assumes everyone wants the traffic.
PIM-SM	Sparse Mode	Complex. Requires a RP, RP Discovery, and phases. Uses register messages, and both tree types.
PIM Sparse-Dense	Sparse-Dense Mode	Runs sparse for groups with a known RP, dense for groups without. Legacy transitional mode.
Bidir-PIM	Bidirectional	Shared tree only, traffic flows both toward and away from RP. No SPT switchover. Good for many-to-many applications.
PIM-SSM	Source Specific	No RP. Receiver specifies both source and group (S,G).

PIM Message Types

Type	Message Type	Destination	Purpose
0	Hello	224.0.0.13 (all PIM routers)	Establish adjacency, negotiate parameters.
1	Register	RP address (unicast)	First-hop router notifies RP of new source, encapsulates multicast data until SPT is built.
2	Register stop	First-hop router (unicast)	RP tells first-hop router to stop sending Register messages.
3	Join/prune	224.0.0.13 (all PIM routers)	Join or prune a multicast tree, either (*,G) toward RP or (S,G) toward source.
4	Bootstrap	224.0.0.13 (all PIM routers)	BSR floods RP-set information throughout the domain so all routers know candidate RPs.
5	Assert	224.0.0.13 (all PIM routers)	Elect a single forwarder on a multi-access segment when duplicate traffic is detected.
8	Candidate RP advertisement	Bootstrap router (BSR) (unicast)	Candidate RPs advertise themselves to the BSR.
9	State refresh	224.0.0.13 (all PIM routers)	PIM-DM only. Prevents prune state from timing out and triggering a re-flood.
10	DF election	224.0.0.13 (all PIM routers)	Bidir-PIM only. Elects a Designated Forwarder per link to forward traffic toward the RP.

Auto RP

Cisco devices can announce their willingness to be an RP, via cisco-rp-announce

A different service, a mapping agent, will read these messages, pick a winner, then advertise that out via cisco-rp-discovery

5.5.5.5, Candidate RP.
4.4.4.4, mapping agent.

R4# show ip pim autorp 
AutoRP Information: 
  AutoRP is enabled.
  RP Discovery packet MTU is 1500.
  224.0.1.40 is joined on Loopback0.
  AutoRP groups over sparse mode interface is enabled

PIM AutoRP Statistics: Sent/Received
  RP Announce: 0/16, RP Discovery: 64/42

These packets are sent infrequently (candidate RPs announce every 60s), so the debug output takes a while to appear.

R4#debug ip pim auto-rp 
PIM Auto-RP debugging is on
R4#
!
! Sent to cisco-rp-discovery
!
*Apr 25 19:57:08.940: Auto-RP(0): Build RP-Discovery packet
*Apr 25 19:57:08.941: Auto-RP(0):  Build mapping (224.0.0.0/4, RP:5.5.5.5), PIMv2 v1,
*Apr 25 19:57:08.942: Auto-RP(0): Send RP-discovery packet of length 48 on GigabitEthernet0/3 (1 RP entries)
*Apr 25 19:57:08.943: Auto-RP(0): Send RP-discovery packet of length 48 on GigabitEthernet0/4 (1 RP entries)
*Apr 25 19:57:08.945: Auto-RP(0): Send RP-discovery packet of length 48 on GigabitEthernet0/0 (1 RP entries)
*Apr 25 19:57:08.948: Auto-RP(0): Send RP-discovery packet of length 48 on Loopback0(*) (1 RP entries)
*Apr 25 19:57:12.008: Auto-RP(0): Received RP-discovery packet of length 48, from 10.0.45.5, ignored
!
! Received by cisco-rp-announce
!
*Apr 25 19:58:30.159: Auto-RP(0): Received RP-announce packet of length 48, from 5.5.5.5, RP_cnt 1, ht 181
*Apr 25 19:58:30.159: (0): pim_add_prm:: 224.0.0.0/240.0.0.0, rp=5.5.5.5, repl = 0, ver =3, is_neg =0, bidir = 0, crp = 0
*Apr 25 19:58:30.160: Auto-RP(0): Update
*Apr 25 19:58:30.160:  prm_rp->bidir_mode = 0 vs bidir = 0 (224.0.0.0/4, RP:5.5.5.5), PIMv2 v1
R4# undebug all
All possible debugging has been turned off

Dense

Based on RFC 3973 Protocol Independent Multicast Dense Mode (PIM-DM)

Push Model
- Good for when every subnet probably wants this traffic
No PIM DR
- All FHR forward multicast traffic
  - Multicast traffic is flooded out every interface that isn't the RPF.
Eventually builds a SPT after prunes
IGMP joins turn into graft messages
Prunes last 3 minutes
- Flood and Prune
- Routers with no Receivers or duplicate S,G traffic prune.
- 224.0.0.13 to find neighbors
- Receivers prune back
- Router attached to LAN listens for multicast control plane.
  - Receives source traffic
    - Insert (*,G) and (S,G) into mrib
    - Incoming traffic is attached to IIL
    - OIL is all other interfaces
    - Flood to OIL
    - PIM dense always uses SPT.
Prune occurs
- Traffic flows stop, but (S,G) remains in table
- Multicast fails RPF
- No downstream neighbor or receiver
- Downstream sent prune
- LAN Prune override exception
After pruning
- Flood again, prune back, flood again, prune back

PIM Sparse

Based on RFC4601 - Protocol Independent Multicast Sparse Mode (PIM-SM)

Explicit joins everywhere. No flooding.
LHR, sends a PIM-Join towards the RP, building a (*,G).
Phased
- 1. The RPT tree
  - Receivers send their (*,G) joins towards the RP.
  - FHR encapsulates the multicast traffic and unicasts it directly to the RP.
  - PIM-Register
  - RP de-encapsulates the traffic, sending it down the RPT.
- 2. Register Stop
  - The RP sends a (S,G) join towards the source.
  - When multicast packets start showing up, without encapsulation, the RP sends a Register-Stop.
- 3. SPT tree
  - LHR sends a (S,G) join towards its upstream, until it's joined to the (S,G) tree.
  - When the LHR starts getting two copies of the traffic, it sends a (S,G,rpt) prune message, towards the RP. (A prune specific to the RPT)
If two LHRs exist, and duplicate traffic is detected, a PIM Assert election happens.
- These Asserts are every 3 minutes.
- RPTbit, 0 is preferred and means "has (S,G) tree"
  - Metric Preference (Administrative Distance)
    - Metric
      - IP address of subnet interface.
Specify the tunnel, for the pim-register messages on Cisco via ip pim register-source loopback 0
The tunnel interface encapsulates the entire multicast packet, which adds 28 bytes of overhead. Packets close to the MTU will be silently dropped on IOS-XE.

PIM-SM-register-register-stop-prune.pcap

A DR is elected by highest priority, then highest IP in the subnet.
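
The DR rule can be sketched as a sort key: priority first, IP address as the tie-breaker. Assumptions: `elect_dr` is a made-up name, neighbors are given as (ip, dr_priority) pairs, and every neighbor advertises a priority.

```python
import ipaddress


def elect_dr(neighbors: list) -> str:
    """Pick the DR from (ip, dr_priority) pairs: highest priority, then highest IP."""
    winner = max(
        neighbors,
        # Compare IPs numerically, so 10.0.0.10 beats 10.0.0.9.
        key=lambda n: (n[1], int(ipaddress.ip_address(n[0]))),
    )
    return winner[0]


# Equal priority: highest IP wins.
print(elect_dr([("10.0.0.9", 1), ("10.0.0.10", 1)]))
# Higher priority beats a higher IP.
print(elect_dr([("10.0.0.1", 10), ("10.0.0.2", 1)]))
```

Comparing the addresses as integers matters; a plain string compare would rank "10.0.0.9" above "10.0.0.10".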

DR sends the PIM join upstream.

The RP always gets the stream, even if it has no receivers to forward it to.

BIDIR-PIM

Based on RFC 5015 - Bidirectional Protocol Independent Multicast (BIDIR-PIM)

Superset of PIM-SM
No (S,G) entries
Traffic can flow up and down the same tree.
Still needs RPs
- RP must be dedicated to BIDIR-PIM.
Each bidirectional link has a DF election.
- Ingress packets on any PIM interface can be forwarded downstream onto DF links.
  - No DF links, no forwarding.
- Ingress packets to a DF can be forwarded upstream via the RPF towards the RPA.

MSDP

RPs in different multicast domains peer with each other.
RP sends a SA (source active) message.
Still needs PIM running for the S,G.
TCP port 639.
Has keepalives.

show ip msdp peer

show ip msdp sa-cache

Shared-Tree (*,G)

Shared trees are essential for multiple senders to the same group
A single tree is built for each group, regardless of source
- 3 sources, 1 tree
Selects a router as the root of the tree
If a receiver is on the same subnet as the sending host, it will need to revert to PIM Dense for that segment
This isn't always better. Shared trees will typically take suboptimal paths through a network
Source trees are better distributed, hence they are more robust
RP Selection is a hassle

Source Based Multicast (S,G)

PIM dense uses a separate tree for each multicast source and destination group.
Groups do not share trees.
- 3 Sources 3 trees.

Commands

show pim rpf hash

show pim range-list

show pim topology

show mrib route

show ip mroute

What interface should I receive this host traffic from?

show ip rpf 10.0.0.

show ip mfib

See if multicast even works

show ip pim stats

See if PIM adjacency traffic even arrives.

show ip pim interface detail

See results of DF election

show ip pim interface df

FLAGS
 A - Accepting. This interface is accepting data
 F - Forwarding. Where to send multicast traffic

Nexus 7K

show forwarding multicast route group <>

L2 Addresses

MAC addresses are 48 bits.

The first 25 bits are always the same.

0000 0001 . 0000 0000 . 0101 1110 . 0??? ????
       01 :        00 :        5E :
        ^                           ^
        |                           └─  Multicast requires this bit be 0.
        |
        └─ Individual/Group. Multicast requires this bit be 1.

So the first three bytes are 01:00:5E

The last 23 bits come from the IP address.

A Multicast IP

Mapping 232.10.10.10 → 01:00:5E:0A:0A:0A

Copy the low order 23 bits directly from the v4 address.


  232.10.10.10/8
  (in binary)
  1110 1000 . 0000 1010 . 0000 1010 . 0000 1010
               \______________________________/
               Remember these 23 bits.

Building the L2 Address.

Ethernet Multicast MAC Address

          1 :         0 :        5E :        0A :       0A  :        0A
  0000 0001 . 0000 0000 . 0101 1110 . 0000 1010 . 0000 1010 . 0000 1010
  \__________________________________/|\______________________________/
        Assigned first 25 bits        |   Same bits as above.
        (always 01:00:5E)             |  (24 bits → 23 bits, 1 bit dropped)
                                      |
                                      |
                                      └─  Multicast requires this bit be 0
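
The same mapping in code, as a sanity check on the bit diagrams. A small sketch (`multicast_mac` is a made-up helper name): mask off the low 23 bits of the group address and OR them onto the fixed 01:00:5E prefix.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not in 224.0.0.0/4")
    low23 = int(addr) & 0x7FFFFF      # keep only the low-order 23 bits
    mac = (0x01005E << 24) | low23    # fixed prefix; the 25th bit stays 0
    # Print the 48-bit value one byte at a time, most significant first.
    return ":".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


print(multicast_mac("232.10.10.10"))
```

The mask 0x7FFFFF is what throws away the top bit of the second octet, which is where the overlap between groups comes from.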

Quirks and Tech Debt.

Because we copied only 23 bits, not all 28 variable bits, 5 bits of the group address are lost, so 2^5 = 32 groups overlap.

v4 is 32 bits. Subtract the four leading bits that can never change (1110) to get 28 bits.

All these IPs share the same multicast L2 address.

All 32 IPv4 addresses mapping to 01:00:5E:0A:0A:0A
══════════════════════════════════════════════════════════════════════════════
Address           Octet 1    Octet 2    Octet 3    Octet 4
──────────────────────────────────────────────────────────────────────────────
224.10.10.10      1110 0000  0000 1010  0000 1010  0000 1010
224.138.10.10     1110 0000  1000 1010  0000 1010  0000 1010
225.10.10.10      1110 0001  0000 1010  0000 1010  0000 1010
225.138.10.10     1110 0001  1000 1010  0000 1010  0000 1010
226.10.10.10      1110 0010  0000 1010  0000 1010  0000 1010
226.138.10.10     1110 0010  1000 1010  0000 1010  0000 1010
227.10.10.10      1110 0011  0000 1010  0000 1010  0000 1010
227.138.10.10     1110 0011  1000 1010  0000 1010  0000 1010
228.10.10.10      1110 0100  0000 1010  0000 1010  0000 1010
228.138.10.10     1110 0100  1000 1010  0000 1010  0000 1010
229.10.10.10      1110 0101  0000 1010  0000 1010  0000 1010
229.138.10.10     1110 0101  1000 1010  0000 1010  0000 1010
230.10.10.10      1110 0110  0000 1010  0000 1010  0000 1010
230.138.10.10     1110 0110  1000 1010  0000 1010  0000 1010
231.10.10.10      1110 0111  0000 1010  0000 1010  0000 1010
231.138.10.10     1110 0111  1000 1010  0000 1010  0000 1010
232.10.10.10      1110 1000  0000 1010  0000 1010  0000 1010  < --- This is our SSM address.
232.138.10.10     1110 1000  1000 1010  0000 1010  0000 1010
233.10.10.10      1110 1001  0000 1010  0000 1010  0000 1010  < --- An address in the GLOP block.
233.138.10.10     1110 1001  1000 1010  0000 1010  0000 1010
234.10.10.10      1110 1010  0000 1010  0000 1010  0000 1010
234.138.10.10     1110 1010  1000 1010  0000 1010  0000 1010
235.10.10.10      1110 1011  0000 1010  0000 1010  0000 1010
235.138.10.10     1110 1011  1000 1010  0000 1010  0000 1010
236.10.10.10      1110 1100  0000 1010  0000 1010  0000 1010
236.138.10.10     1110 1100  1000 1010  0000 1010  0000 1010
237.10.10.10      1110 1101  0000 1010  0000 1010  0000 1010
237.138.10.10     1110 1101  1000 1010  0000 1010  0000 1010
238.10.10.10      1110 1110  0000 1010  0000 1010  0000 1010
238.138.10.10     1110 1110  1000 1010  0000 1010  0000 1010
239.10.10.10      1110 1111  0000 1010  0000 1010  0000 1010
239.138.10.10     1110 1111  1000 1010  0000 1010  0000 1010  < --- an Organizational scope address.
══════════════════════════════════════════════════════════════════════════════
                       ^^^^  ^
                       ||||  | 
                       └└└└──└─ I incremented these five bits to show the pattern.
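
The table above can be regenerated for any group by walking the five lost bits through all 32 values. A sketch, with `groups_sharing_mac` as a made-up helper name:

```python
import ipaddress


def groups_sharing_mac(group: str) -> list:
    """List all 32 IPv4 groups that map to the same multicast MAC as 'group'."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF  # bits copied into the MAC
    fixed = 0xE0000000                                    # the leading 1110
    return [
        str(ipaddress.IPv4Address(fixed | (lost5 << 23) | low23))
        for lost5 in range(32)                            # the 5 bits the MAC drops
    ]


aliases = groups_sharing_mac("232.10.10.10")
print(len(aliases))
print(aliases[0], aliases[1], aliases[-1])
```

This is why a switch doing L2 snooping on the MAC alone can deliver traffic for a group the host never asked for; the host's IP stack has to filter it.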

Lab Stuff.

BPF - Capture all PIM, but not PIM hello messages.

ip proto 103 and not ether[34] == 0x20
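
Why ether[34] and why 0x20: byte 34 is the first byte of the PIM header, assuming a 14-byte untagged Ethernet header and a 20-byte IPv4 header with no options. That byte carries the PIM version in the high nibble and the message type in the low nibble, so PIMv2 Hello (type 0) is 0x20. A sketch of the decode, using the type numbers from the PIM Message Types table (`decode_pim_first_byte` is a made-up name):

```python
ETHERNET_HEADER = 14          # assumes no 802.1Q tag
IPV4_HEADER_NO_OPTIONS = 20   # assumes IHL = 5
PIM_OFFSET = ETHERNET_HEADER + IPV4_HEADER_NO_OPTIONS

PIM_TYPES = {
    0: "Hello",
    1: "Register",
    2: "Register stop",
    3: "Join/prune",
    4: "Bootstrap",
    5: "Assert",
    8: "Candidate RP advertisement",
    9: "State refresh",
    10: "DF election",
}


def decode_pim_first_byte(value: int) -> tuple:
    """Split the first PIM header byte into (version, message type name)."""
    version = value >> 4
    msg_type = value & 0x0F
    return version, PIM_TYPES.get(msg_type, f"Unknown ({msg_type})")


print(PIM_OFFSET)
print(decode_pim_first_byte(0x20))
print(decode_pim_first_byte(0x23))
```

On a trunk with an 802.1Q tag the offset shifts by 4 bytes, so the filter as written would test the wrong byte there.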

Sending Multicast

iperf --client 239.10.10.10 --udp --time 3600 --interval 1 --bandwidth 1pps --ttl 15 --len 1000

Receiving Multicast

iperf --server --udp --bind 239.10.10.10 --interval 1

The C9000-L series does not support Catalyst Center, and has lower StackWise speeds.

Two Tier Collapsed Core

cisco-campus-two-tier-collapsed-core-cisco

The core and distribution switches are the same
The center is running StackWise Virtual

Three Tier

cisco-campus-three-tier-with-network-services-layer

Layer 2 Access with traditional multilayer

Layer 2 is a single wiring closet, or access uplink pair.
FHRP is used, but limits bandwidth to one uplink, vs both.

The Campus Network

Campus networks are always oversubscribed.
Over-subscription ratios between 4:1 and 20:1 are common.
Networks with over-subscription that results in queuing should implement QoS for voice traffic.
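
The ratio is just total access bandwidth divided by total uplink bandwidth. A quick sketch with made-up numbers (a 48-port 1G access switch with 10G uplinks; `oversubscription_ratio` is an illustrative name):

```python
def oversubscription_ratio(access_ports: int, access_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Total downstream capacity divided by total upstream capacity."""
    return (access_ports * access_gbps) / (uplinks * uplink_gbps)


# 48 x 1G access ports, both 10G uplinks forwarding.
both_uplinks = oversubscription_ratio(48, 1, 2, 10)
# Same switch, but only one uplink carries traffic (e.g. FHRP active on one side).
one_uplink = oversubscription_ratio(48, 1, 1, 10)
print(both_uplinks, one_uplink)
```

Losing one uplink doubles the ratio, which is the bandwidth cost of the FHRP designs mentioned earlier.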

Access Layer

9200 (160Gbps stack-wise ring)
9300 (480Gbps stack-wise ring)
9400 (modular chassis)

Considerations

mGig, so access speeds can scale
UPOE+, 90W with perpetual power (survives reboots)

Distribution Layer

9400 (modular chassis)
9500
9600 (modular chassis)

Considerations

Service heavy (FHRPs, Routing, SVIs)
Typical L2 boundary
Used to interconnect all the access layer switches in a building
Used to interconnect Access layer switches, once they can't form a full-mesh
Also contains the failure domain of the access layer.
Simplified Distribution, using stackwise virtual to remove FHRP.

Core Layer

9500
9600 (modular chassis)

Considerations

No services
Layer 3 only
Always on
Ideally, a minimum of 100G to conserve ports.

cisco-campus-lan-core

Traditional Design

cisco-campus-looped-access

Needs STP to block ports

Traditional Design - Loop Free

cisco-campus-loop-free-access

Other Designs

SD-Access

Cisco Catalyst Center
Cisco Identity Services Engine

cisco-campus-sd-access-design

Open Standards Based Overlay

MP-BGP
VXLAN

cisco-campus-bgp-evpn-vxlan

Campus LAN Best Practices - Security

DHCP Snooping, to prevent users from hooking up a DHCP server from home by accident.
Dynamic ARP inspection, to prevent an ARP spoofing attack, where the attacker sends ARP replies claiming the IPs in the subnet.
BPDU Guard, to prevent home switches.
802.1x, port authentication
Cisco Umbrella, Cisco's DNS offering.

Campus LAN Best Practices - High Availability

SSO: Stateful Switch Over, used to sync RPs in modular switches.
NSF: Non-Stop Forwarding allows graceful restarting of a L3 protocol. Allows the data-plane to continue while the new RP takes over.
MLS: Multi-layer Switch.
StackWise: Older tech, to combine switches together. Up to 8 switches can be stacked. They operate as one switch.
StackWise Virtual: Two MLS devices, are combined to become one logical device.
StackWise Virtual Link: The control/data path between the two switches. Should be two links minimum.
GIR: Graceful Insertion or Removal. Influencing paths by changing route-metrics or adjusting FHRP priorities.

Etherchannel

Use a dynamic protocol, to check on link health

References

https://www.cisco.com/c/en/us/td/docs/solutions/CVD/Campus/cisco-campus-lan-wlan-design-guide.html

Cisco Catalyst Center: Formerly Cisco DNA Center. Speaks NETCONF, SNMP, SSH southbound, REST/HTTPS northbound.
Campus Fabric: Equipment managed without Catalyst Center, can be CLI or NETCONF/RESTCONF.
ISE: Identity Services Engine. Cisco's modern AAA server.
SD-Access: Campus Fabric managed with Cisco Catalyst Center and Cisco ISE.
SGT: Scalable Group Tags, formerly called Security Group Tags. These are managed by ISE.
SGT Policy: Instead of identifying traffic based on IP or MAC, traffic can be identified by SGT.
Overlay: LISP, VXLAN and CTS (Cisco TrustSec, carries SGTs inside of VXLAN-GPO).
VXLAN-GPO: Cisco extended the VXLAN header to include SGTs (Now called Scalable Group Tags)
Underlay: Usually IS-IS, since it's IPv4 and IPv6 agnostic. Even the underlay can be automatically deployed.
Control Plane Node: Contains the LISP MS/MR databases (endpoint-to-location, or EID-to-RLOC). Each node contains the full database.
Fabric Border Node: Connects other L3 networks to SDA fabric.
Fabric WLC: Connects APs and the WLC to the SDA fabric.
Fabric Intermediate Node: Only does underlay services, like IS-IS or IP transport.
Fabric Edge Node: Connects campus host devices to the SDA fabric, usually an access layer or distribution layer device. Is a LISP xTR, with an anycast gateway, with overlay host protocols, (like DHCP).

Fabric Edge Onboarding

(Method 1) Open Auth or MAB, user connects to a port -> host pool.
(Method 2) 802.1x authenticates the device -> host pool.
Host pool has a SGT, SVI and VRF instance.
SVI is the anycast gateway (same IP address and MAC for that SVI & VRF) on all edge nodes.
Host address is now an EID (MAC, /32 IPv4, /128 IPv6), that can be registered with the control plane node.
Control plane signaling is LISP, dataplane is managed via VXLAN-GPO.

Fabric Border Nodes Types

Internal Border: WLC, Firewall, Data center
Default Border: Internet.
Internal + Default: Both.

Wireless

If the WLC can participate in the fabric, it's a fabric aware WLC. It performs PxTR (proxy lisp encap/de-encap) for hosts connected to fabric APs, and registers their EIDs with the control nodes.

Control plane traffic is CAPWAP inside of VXLAN-GPO. Dataplane traffic can just ride VXLAN-GPO

LISP

The LISP instance ID is the VRF.

Cisco Catalyst Center

NCP: Network Control Platform. This module is connected via API to the GUI, and is what talks to the network gear via NETCONF, SNMP, or SSH. Does all the underlay automation.
NDP: Network Data Platform. Data collection and analytics. Netflow, Syslog, ERSPAN, etc.
ISE: Is required. 802.1x, Mac Authentication Bypass (MAB), or Web Authentication (WebAuth). Can talk to AWS or Active Directory. ISE is tightly integrated via API calls to CatC.

Terms

DIA: Direct Internet Access. What we usually have as residential customers. No real guarantee of service, but tends to be fast.
SLA: Service Level Agreement. Business Internet, especially when used to connect sites together, tends to have an SLA.
MPLS: A kind of VPN service provided by an ISP, to connect business sites together. Comes with an SLA. More expensive than DIA.
BFD: Bidirectional Forwarding Detection

Devices

Manager: AKA vManage, AKA, the NMS. What a human interacts with, the GUI
Validator: AKA vBond. Initial Authentication and provisioning, (Cisco calls this orchestration) Responsible for NAT traversal.
Controller: AKA vSmart. Holds the current state of the network, (routes and data policy) maintains active connections to the edges and programs them.
WAN Edge: AKA vEdge. What gets programmed. Provides data-plane between sites, via circuits like DIA, or MPLS.
vEDGE: Old hardware-based Viptela gear, pre-Cisco acquisition. Unfavored.

Marketing Terms

Cisco SD-WAN Cloud OnRamp: AKA, CoR. Edges can perform analytics to SaaS or IaaS offerings to select the best path, via jitter.

Validator

Should be given a FQDN, so WAN edges have no problems finding it on connection to a DIA.

FQDNs also mean we aren't putting a static IP into a config.

Initial authentication is done with PKI, and RSA encryption.

Can not be placed behind NAT, unless the NAT device does a 1:1 static translation.

This device does the load balancing if multiple controllers are being used.

The Validator has a permanent dTLS tunnel to all the controllers.

Controllers

Keeps all the routes between sites, that are managed via the OMP protocol (like BGP, but proprietary)
Logical tunnel topologies (such as hub and spoke, regional, and partial mesh)
Service Chaining
Traffic Engineering
Segmentation per VPN

WAN Edge

Dataplane for a site
Has OMP, BGP, OSPF, EIGRP, ACLs, ARP, HA, and QoS.
Connects via dTLS to the controllers.
Connects via IPsec to other edges (dTLS is only for the control connections).

SD-WAN Policy

Policies are further classified as

Local Policy: Programmed on the edges. ACLs, QoS, routing, and AAA.
Centralized Policy: Route policy, before being sent to the edges, (Topology, VPN Membership, Application Aware Routing)

Application Aware Routing

If two edges connect to each other over an IPsec tunnel, BFD is run over the tunnel.
For AAR, or CoR, the edge will send HTTP probes and measure the jitter and/or loss.
The score for an app is the vQoE (Viptela Quality of Experience) from 0 to 10, 10 being best.

VPNs

VPN0: Underlay Signaling, transport WAN. Typically public addresses or SRC-NAT Public addresses.

VPN512: OOB Management

VPNn: Any number from 1 to 65527. Not 0. Not 512. Used for service-side (also known as LAN-side) traffic.

sd-wan commands

show sdwan control local-properties

DTLS Tunnels to SDWAN Manager and SDWAN Controllers

show sdwan control connections

show sdwan control connection-history

OMP

show sdwan omp peers

show sdwan omp routes

show sdwan omp tlocs

show sdwan omp services

show sdwan omp multicast-routes

Validator Only

show orchestrator connections

Initial Bringup

Pasting in the bootstrap

tclsh
puts [open "bootflash:name-of-bootstrap-file.cfg" w+] {
<list of certs goes here>
<must be done via an actual terminal>
<like SecureCRT>
<with character and line send delay>
}

Copy via HTTP using Python

Get the current IP

python -c "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); s.connect(('8.8.8.8', 80)); print(s.getsockname()[0]); s.close()"

Start the server with above IP

python -m http.server 8000 --bind 10.0.0.1

Copy into cisco box

copy http://10.0.0.1:8000/.cfg bootflash:/.cfg

controller-mode enable

Terms

RADIUS - Remote Authentication Dial-In User Service. Created to provide AAA for ISP users, or Dial-In for businesses.
TACACS - Terminal Access Controller Access-Control System. An AAA protocol to provide support for authenticate once, authorize many.
TACACS+ - Same as above, basically an upgraded version, not backward compatible.
EAP - Extensible Authentication Protocol, 802.1x, used for LAN Auth, only works with RADIUS.

TACACS+ Flows

Authentication Flow

tacacs-plus-authentication-flows

Authorization and Accounting Flow

tacacs-plus-auth-and-accounting-flows

Log Message Severity Levels

Keyword	Severity	Description	Mnemonic
Emergency	0	System unusable	Even
Alert	1	Immediate action required	A
Critical	2	Critical Event (Highest of 3)	Computer
Error	3	Error Event (Middle of 3)	Expert
Warning	4	Warning Event (Lowest of 3)	Will
Notification	5	Normal, More Important	Not
Informational	6	Normal, Less Important	Ignore
Debug	7	Requested by User Debug	Debugs

Mnemonic courtesy of Romelchand
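A small sketch for pulling the severity out of a log line, assuming the usual Cisco %FACILITY-SEVERITY-MNEMONIC layout. The function name and the sample message are mine; the keywords come from the table above.

```python
import re

SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notification", "Informational", "Debug"]

# Matches the %FACILITY-SEVERITY-MNEMONIC part of a message.
PATTERN = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+)")


def parse_severity(line):
    """Return (facility, level, keyword, mnemonic), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    level = int(m.group(2))
    return m.group(1), level, SEVERITIES[level], m.group(3)


print(parse_severity("%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up"))
```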

NTP

Server Only - Based on Internal Clock

ntp master <stratum>

Client/Server - Based on other NTP clocks and stratum

ntp server <address|hostname>

An Example Config

I found a list of time servers here.

ntp server pool.ntp.org
ntp server time.nist.gov
ntp server time.cloudflare.com
ntp source <loopback-should-go-here>
!
! NTP Master 7 ... if internet connectivity is lost, and external NTP fails, this box can still serve NTP.
!
ntp master 7

A caution: Using pool.ntp.org

Consider if the NTP Pool is appropriate for your use. If business, organization or human life depends on having correct time or can be harmed by it being wrong, you shouldn't "just get it off the Internet". The NTP Pool is generally very high quality, but it is a service run by volunteers in their spare time. Please talk to your equipment and service vendors about getting local and reliable service setup for you. See also our terms of service. We recommend time servers from Meinberg, but you can also find time servers from End Run, Spectracom and many others.

Stop on first match.
end-of-list, no matches, deny.

An ACL to just count traffic should always end with

permit ip any any

Block a specific host

Necessary because the default action at the end is "deny any"

access-list 1 deny host 10.0.0.1
access-list 1 permit any
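To convince myself of the first-match and implicit-deny rules, here is a rough model of a standard ACL in Python. It only looks at the source address, and the function names are hypothetical. A wildcard bit of 0 means "must match", a 1 means "don't care".

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """True when address equals base on every bit where the wildcard bit is 0."""
    a = int(ipaddress.IPv4Address(address))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def evaluate(acl, source):
    """acl is a list of (action, base, wildcard) entries, checked top down."""
    for action, base, wildcard in acl:
        if wildcard_match(source, base, wildcard):
            return action      # stop on first match
    return "deny"              # end of list, no match, implicit deny


ACL_1 = [
    ("deny", "10.0.0.1", "0.0.0.0"),            # access-list 1 deny host 10.0.0.1
    ("permit", "0.0.0.0", "255.255.255.255"),   # access-list 1 permit any
]

print(evaluate(ACL_1, "10.0.0.1"), evaluate(ACL_1, "10.0.0.2"))
```

Dropping the second entry makes every source fall through to the implicit deny, which is why the permit line is needed.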

Allow a host range

This allows packets from 192.168.10.0/24 to travel to 192.168.200.0/24

access-list 101 permit ip 192.168.10.0 0.0.0.255 192.168.200.0 0.0.0.255

Deny access except from specific hosts

Usually required for features like CoPP

access-list 10 permit 10.0.0.1
access-list 10 permit 10.0.0.2
access-list 10 permit 10.0.0.3

References

https://www.cisco.com/c/en/us/support/docs/ip/access-lists/26448-ACLsamples.html

CoPP Configuration.

This was performed on a C8000v, running 17.13.1a

A simple ACL that matches based on ICMP.

ip access-list extended ACL_ICMP_UNKNOWN
 permit icmp any any

Make a class-map to use the ACL.

class-map CLASS_MAP_ICMP_UNKNOWN
 match access-group name ACL_ICMP_UNKNOWN

Make a policy map that uses the above class-maps

policy-map POLICY_MAP_COPP
 class CLASS_MAP_ICMP_UNKNOWN
  police cir 10000 conform-action transmit  exceed-action drop
 class class-default

Apply it to the control plane.

control-plane
 service-policy input POLICY_MAP_COPP

Validate

router# show policy-map control-plane input 
 Control Plane 

  Service-policy input: POLICY_MAP_COPP

    Class-map: CLASS_MAP_RFC1918 (match-all)  
      0 packets, 0 bytes
      5 minute offered rate 0000 bps
      Match: access-group name ACL_RFC1918

    Class-map: CLASS_MAP_ICMP_UNKNOWN (match-all)  
      0 packets, 0 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: access-group name ACL_ICMP_UNKNOWN
      police:
          cir 1000000 bps, bc 31250 bytes
        conformed 0 packets, 0 bytes; actions:
          transmit 
        exceeded 0 packets, 0 bytes; actions:
          drop 
        conformed 0000 bps, exceeded 0000 bps

    Class-map: class-default (match-any)  
      0 packets, 0 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
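A sketch of the idea behind the policer: a token bucket that refills at the CIR and holds at most bc bytes. This is a simplified single-rate, two-color model under my own assumptions (bucket starts full, time passed in by hand), not the platform's actual implementation. The cir and bc numbers are the ones from the show output above.

```python
class Policer:
    """Single-rate, two-color policer: conform -> transmit, exceed -> drop."""

    def __init__(self, cir_bps, bc_bytes):
        self.rate = cir_bps / 8.0       # refill rate in bytes per second
        self.bc = bc_bytes              # bucket depth (burst) in bytes
        self.tokens = float(bc_bytes)   # assumption: bucket starts full
        self.last = 0.0

    def offer(self, now, size):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.bc, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "transmit"
        return "drop"


p = Policer(cir_bps=1000000, bc_bytes=31250)
burst = [p.offer(0.0, 1000) for _ in range(32)]   # 32 packets, all at t=0
print(burst.count("transmit"), burst.count("drop"))
```

A burst larger than bc gets clipped even if the average rate is under the CIR, which is worth remembering when picking numbers for control-plane traffic.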

Test Setup

This uses python3, scapy, and sendpfast, to send icmp packets with random sources.

Install sendpfast

sudo apt install tcpreplay

Start a python virtual environment.

python3 -m venv venv
source venv/bin/activate

Install scapy inside it.

pip install scapy

Modify dst and iface to match your lab, then paste in the following python script.

cat > flood.py << 'EOF'
from scapy.all import *
import random

def random_public_ip():
    while True:
        ip = f"{random.randint(1,223)}.{random.randint(0,255)}.{random.randint(0,255)}.{random.randint(1,254)}"
        if not (ip.startswith("10.") or 
                ip.startswith("192.168.") or 
                ip.startswith("172.") and 16 <= int(ip.split(".")[1]) <= 31):
            return ip

pkts = [Ether()/IP(src=random_public_ip(), dst="192.168.52.198")/ICMP() for _ in range(1000)]
sendpfast(pkts, pps=10000, loop=100, iface="ens18")
EOF

In a different terminal run something like this to see the packets leaving the interface.

sudo tcpdump -i ens18 icmp -n

This requires raw sockets to run.

sudo venv/bin/python3 flood.py

SA - Source Address
DA - Destination Address

                      INSIDE NETWORK                                   OUTSIDE NETWORK

           ┌────────────────────────────────────┐         ┌──────────────────────────────────────┐
           │                                    │         │                                      │
           │       ┌────────────┬─────────────┐ │         │       ┌─────────────┬──────────────┐ │
           │ ────► │    SA      │     DA      │ │ ──────► │ ────► │    SA       │     DA       │ │
  ┌──────┐ │       │Inside Local│Outside Local│ │         │       │Inside Global│Outside Global│ │ ┌───────┐
  │Inside│ │       └────────────┴─────────────┘ │  ┌───┐  │       └─────────────┴──────────────┘ │ │Outside│
  │ Host │ │                                    │  │NAT│  │                                      │ │ Host  │
  └──────┘ │ ┌────────────┬─────────────┐       │  └───┘  │ ┌─────────────┬──────────────┐       │ └───────┘
           │ │    SA      │     DA      │       │         │ │    SA       │     DA       │       │
           │ │Inside Local│Outside Local│ ◄──── │ ◄────── │ │Inside Global│Outside Global│ ◄──── │
           │ └────────────┴─────────────┘       │         │ └─────────────┴──────────────┘       │
           │                                    │         │                                      │
           └────────────────────────────────────┘         └──────────────────────────────────────┘

Based on a diagram here.

NAT Overload - Port Address Translation or PAT

This is Source NAT.¹

Packets to R3 will appear to be from 10.0.0.2

          192.168.1.0/24             10.0.0.0/24        
┌────┐.1                 .2┌────┐.2             .3┌────┐
│ R1 │─────────────────────│ R2 │─────────────────│ R3 │
└────┘E0/0             E0/0└────┘E0/1         E0/1└────┘
                           ▲    ▲                       
                           │    │                       
           Inside ─────────┘    └─────── Outside

R1

interface Ethernet0/0
 ip address 192.168.1.1 255.255.255.0

ip route 0.0.0.0 0.0.0.0 192.168.1.2

R2

interface Ethernet0/0
 ip address 192.168.1.2 255.255.255.0
 ip nat inside

interface Ethernet0/1
 ip address 10.0.0.2 255.255.255.0
 ip nat outside

ip nat inside source list 1 interface Ethernet0/1 overload

ip access-list standard 1
 10 permit 192.168.1.0 0.0.0.255

R3

interface Ethernet0/1
 ip address 10.0.0.3 255.255.255.0

ip route 0.0.0.0 0.0.0.0 10.0.0.2

R2 Debugs during NAT

Performed with the above configs via CML IOL routers version 17.12.1.

R2# debug ip nat 1
IP NAT debugging is on for access list 1

*Sep 16 21:32:21.386: NAT: Entry assigned id 4
*Sep 16 21:32:21.386: NAT*: ICMP id=5->1024
*Sep 16 21:32:21.386: NAT*: s=192.168.1.1->10.0.0.2, d=10.0.0.3 [17]
*Sep 16 21:32:21.387: NAT*: ICMP id=1024->5
*Sep 16 21:32:21.387: NAT*: s=10.0.0.3, d=10.0.0.2->192.168.1.1 [17]

R2# show ip nat translations
Pro Inside global      Inside local       Outside local      Outside global
icmp 10.0.0.2:1024     192.168.1.1:5      10.0.0.3:5         10.0.0.3:1024

Source NAT, because the source address needs to be changed to access outside hosts. As packets move through the router, they will create entries for return packets.
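A toy model of the translation table, to show how the outbound packet creates the entry that the return packet needs. The class name is made up, and handing out ports from a simple counter starting at 1024 is an assumption on my part (it just happens to line up with the debug above).

```python
class Pat:
    """Toy NAT overload table for one outside interface address."""

    def __init__(self, global_ip, first_port=1024):
        self.global_ip = global_ip
        self.next_port = first_port
        self.out = {}    # (inside local ip, port/id) -> (inside global ip, port/id)
        self.back = {}   # reverse mapping, used for return traffic

    def outbound(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.out:
            mapped = (self.global_ip, self.next_port)
            self.next_port += 1
            self.out[key] = mapped
            self.back[mapped] = key
        return self.out[key]

    def inbound(self, dst_ip, dst_port):
        # None means there is no entry, so the packet is not translated.
        return self.back.get((dst_ip, dst_port))


nat = Pat("10.0.0.2")
print(nat.outbound("192.168.1.1", 5))   # R1's ICMP id 5 leaving R2
print(nat.inbound("10.0.0.2", 1024))    # R3's reply coming back
```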

Captured on-wire.

packet #1 - who has 10.0.0.10? Tell 10.0.0.20
packet #2 - 10.0.0.10 is at ce:b1:5f:58:1d:8a

ARP Request

> Ethernet II

    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
    Source: 1a:20:4e:9e:fb:9c (1a:20:4e:9e:fb:9c)
    Type: ARP (0x0806)

> Address Resolution Protocol (request)

    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: 1a:20:4e:9e:fb:9c (1a:20:4e:9e:fb:9c)
    Sender IP address: 10.0.0.20
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 10.0.0.10

ARP Reply

> Ethernet II

    Destination: 1a:20:4e:9e:fb:9c (1a:20:4e:9e:fb:9c)
    Source: ce:b1:5f:58:1d:8a (ce:b1:5f:58:1d:8a)
    Type: ARP (0x0806)
    Padding: <lots of zeros>

> Address Resolution Protocol (reply)

    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: ce:b1:5f:58:1d:8a (ce:b1:5f:58:1d:8a)
    Sender IP address: 10.0.0.10
    Target MAC address: 1a:20:4e:9e:fb:9c (1a:20:4e:9e:fb:9c)
    Target IP address: 10.0.0.20
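The 28 bytes of ARP payload in the capture can be packed and unpacked with struct. This is a sketch for Ethernet/IPv4 only; the helper names are mine, and it skips the Ethernet header and padding. It needs Python 3.8+ for bytes.hex(":").

```python
import socket
import struct

# htype, ptype, hlen, plen, opcode, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def build_arp(opcode, sender_mac, sender_ip, target_mac, target_ip):
    # Hardware type 1 (Ethernet), protocol type 0x0800 (IPv4), sizes 6 and 4.
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, opcode,
                       mac_bytes(sender_mac), socket.inet_aton(sender_ip),
                       mac_bytes(target_mac), socket.inet_aton(target_ip))


def parse_arp(payload):
    _, _, _, _, opcode, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, payload[:28])
    return {
        "opcode": opcode,
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": tha.hex(":"),
        "target_ip": socket.inet_ntoa(tpa),
    }


request = build_arp(1, "1a:20:4e:9e:fb:9c", "10.0.0.20", "00:00:00:00:00:00", "10.0.0.10")
print(len(request), parse_arp(request))
```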

ARP Spoofing: happens when an attacker uses a known MAC address on the network, usually the network router for the subnet.
ARP Poisoning: happens when ARP tables on devices (routers, switches, hosts) contain false mappings.

Successful ARP attacks lead to traffic hijacking, traffic denial, or man-in-the-middle attacks.

Dynamic ARP Inspection

Minimum config

ip dhcp snooping vlan 10
ip arp inspection vlan 10
ip arp inspection validate src-mac dst-mac ip 
!
! Ports
!
interface GigabitEthernet0/1
 description towards DHCP server
 ip arp inspection trust
 ip dhcp snooping trust

Validation

access-1# show ip dhcp snooping binding 
MacAddress          IpAddress        Lease(sec)  Type           VLAN  Interface
------------------  ---------------  ----------  -------------  ----  --------------------
52:54:00:0D:65:73   10.10.10.102     80574       dhcp-snooping   10    GigabitEthernet0/0
Total number of bindings: 1

access-1# show ip arp inspection 

Source Mac Validation      : Enabled
Destination Mac Validation : Enabled
IP Address Validation      : Enabled

 Vlan     Configuration    Operation   ACL Match          Static ACL
 ----     -------------    ---------   ---------          ----------
   10     Enabled          Active                         

 Vlan     ACL Logging      DHCP Logging      Probe Logging
 ----     -----------      ------------      -------------
   10     Deny             Deny              Off          

 Vlan      Forwarded        Dropped     DHCP Drops      ACL Drops
 ----      ---------        -------     ----------      ---------
   10            134              0              0              0

 Vlan   DHCP Permits    ACL Permits  Probe Permits   Source MAC Failures
 ----   ------------    -----------  -------------   -------------------
   10             48              0              0                     0

 Vlan   Dest MAC Failures   IP Validation Failures   Invalid Protocol Data
 ----   -----------------   ----------------------   ---------------------
   10                   0                        0                       0
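Roughly what DAI is doing with the snooping table, as I understand it. This sketch only checks the sender IP-to-MAC binding per VLAN and skips the extra src-mac, dst-mac and ip validations. The table contents are copied from the lab output above; the function and variable names are hypothetical.

```python
# (vlan, ip) -> mac, as learned by DHCP snooping
BINDINGS = {
    (10, "10.10.10.102"): "52:54:00:0d:65:73",
}

# Ports configured with "ip arp inspection trust"
TRUSTED = {"GigabitEthernet0/1"}


def inspect_arp(vlan, interface, sender_mac, sender_ip):
    """Return 'forward' or 'drop' for an ARP packet seen on a port."""
    if interface in TRUSTED:
        return "forward"                      # trusted ports skip inspection
    bound_mac = BINDINGS.get((vlan, sender_ip))
    if bound_mac is not None and bound_mac == sender_mac.lower():
        return "forward"
    return "drop"                             # no binding, or MAC does not match


print(inspect_arp(10, "GigabitEthernet0/0", "52:54:00:0D:65:73", "10.10.10.102"))
print(inspect_arp(10, "GigabitEthernet0/0", "de:ad:be:ef:00:01", "10.10.10.102"))
```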

Reference

Cisco - Dynamic ARP Inspection

Practical Networking - Gratuitous ARP

Assured Forwarding PHB Group

Four AF classes, each should get its own resources.

AF11 (DSCP 10) 001010 AF12 (DSCP 12) 001100 AF13 (DSCP 14) 001110

AF21 (DSCP 18) 010010 AF22 (DSCP 20) 010100 AF23 (DSCP 22) 010110

AF31 (DSCP 26) 011010 AF32 (DSCP 28) 011100 AF33 (DSCP 30) 011110

AF41 (DSCP 34) 100010 AF42 (DSCP 36) 100100 AF43 (DSCP 38) 100110
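The table above follows a pattern: for AFxy, the class x fills the top three bits and the drop precedence y fills the next two, so the decimal value is 8x + 2y. A quick sketch (function names are mine):

```python
def af_dscp(af_class, drop_precedence):
    """AFxy -> DSCP decimal: class in the top 3 bits, drop precedence in the next 2."""
    if af_class not in (1, 2, 3, 4) or drop_precedence not in (1, 2, 3):
        raise ValueError("AF classes are 1-4, drop precedence is 1-3")
    return (af_class << 3) | (drop_precedence << 1)


def af_bits(af_class, drop_precedence):
    """Same value as a 6-bit binary string."""
    return format(af_dscp(af_class, drop_precedence), "06b")


for c in (1, 2, 3, 4):
    print(" ".join(f"AF{c}{d} (DSCP {af_dscp(c, d)}) {af_bits(c, d)}" for d in (1, 2, 3)))
```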

Terms

1 second, is 1000 ms.
1 millisecond: Network latency is measured in ms, or 1 thousandth of a second 0.001.
1 microsecond: 1 μs (a millionth) of a second. 0.000 001. 1000 μs is 1 ms.
1 nanosecond: 1 ns (a billionth) of a second. 0.000 000 001. 1000 ns is 1 μs.
NTP: An older time standard. Can sync time between 10 to 1 ms.
PTP: Modern time standard. Can sync time between 10 to 1 ns.
PTPv1: Defined in IEEE 1588-2002
PTPv2: Defined in IEEE 1588-2008, not backwards compatible.
PTPv2.1: Defined in IEEE 1588-2019, is backward compatible.
1588 Clock: A clock in the PTP time domain. Clocks have ports.
Terminating Clock: A clock with one port.
Ordinary Clock: a clock in a terminating device.
Boundary Clock: a clock in a transmitting device, like an ethernet switch. Connects PTP domains.
Transparent Clock: a boundary clock that can correct for delay and modifies the PTP event message.
Grandmaster: All clocks sync to this one clock.
Master: All clocks in a subdomain sync to the master. The master syncs to the grandmaster.

Time Terms

Epoch: The start of time.
Offset: The estimated difference between a slave clock's time and its master clock's time, after accounting for the delay between the master sending time and the slave receiving it.

Uses

Robotics, synchronizing movements.
Mobile Phone networks, telemetry, billing, logging
Financial Networks, trade settling fairness.
Power Networks, to sync to the 60 Hz grid.
Science networks, seismic data

Process

After PTP has time from something like a GPS device, it can pass that time along, so long as the devices in the path can mark and read the timestamps

PTP Delay and Offset Calculations

General Messages

Announce: Used to determine which Grand Master is selected Best Master
Follow_Up: Used to convey a captured timestamp of a transmitted SYNC message
Delay_Response: Used to measure delay between IEEE 1588 devices
Pdelay_Response_Follow_Up: Used between IEEE 1588 devices to measure the delay on an incoming link
Management: Used between management devices and clocks
Signaling: Used by clocks to deliver how messages are sent

Event Messages

Sync: Used to convey time
Delay_Request: Used to measure delay from downstream devices
Pdelay_Request: Used to initiate and measure delay
Pdelay_Response: Used to respond and measure delay
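The Sync and Delay_Request exchange gives four timestamps, and the delay and offset fall out of them with a bit of arithmetic. This is the textbook end-to-end calculation, assuming the path delay is the same in both directions. The function name and the sample numbers (in ns) are mine.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """
    t1: master sends Sync               (master clock)
    t2: slave receives Sync             (slave clock)
    t3: slave sends Delay_Request       (slave clock)
    t4: master receives Delay_Request   (master clock)
    Assumes a symmetric path delay.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave time minus master time
    return offset, delay


# Slave runs 100 ns ahead of the master, and the one-way delay is 50 ns.
print(ptp_offset_and_delay(t1=1000, t2=1150, t3=1200, t4=1150))
```

The slave subtracts the offset from its own clock to line up with the master. If the path is not symmetric, the error shows up directly in the offset.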

SyncE synchronizes clock frequency over an Ethernet port. It does not synchronize time-of-day, that's done by PTP, IEEE 1588.

Setting an oscillator to a frequency is syntonization.

References

ITU-T Rec. G.8261 - Architecture and the wander performance of SyncE networks

ITU-T Rec. G.8262 - Synchronous Ethernet clocks for SyncE

ITU-T Rec. G.8264 - Ethernet Synchronization Messaging Channel (ESMC)

Config Options

ITU-T G.813 Option 1 clock (QL-SEC)

EEC-option 1

ITU-T G.812 type IV clock (QL-ST3)

EEC-option 2

Terms

Synchronous Ethernet and IEEE 1588 in Telecoms

Time Interval: Distance between two events, measured in seconds, milliseconds, microseconds, nanoseconds, or picoseconds.
Frequency: Rate of a repetitive event. Measured in cycles per second. A device that produces frequency is an oscillator.
T0: System Clock (line interface output)
T1: Timing Reference signal derived from STM-N (STS-N/SyncE) input.
T2: Timing Reference signal derived from 2048/1544 kbit input [input from PDH]
T3: Timing reference signal derived from 2048 or 2048 1544 with SSM.
T4: Clock-interface output.
OSC: Internal ST3 oscillator
SSM: Synchronization Status Message
ESMC: Ethernet Synchronization Message Channel
MTIE: Maximum time interval error is a measure of the worst case phase variation of a signal with respect to a perfect signal over a given period of time.
TDEV: Time deviation is a statistical analysis of the phase stability of a signal over a given period of time.

Netflow

v5 - IPv4 flows only
v9 - template based
IPFIX - standards based

Flexible Netflow

Netflow needs four things to work:

Records
Exporters
Monitors
Interfaces

IOS-XE

flow record FLOW_RECORD_IPV4
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match interface input
 collect interface output
 collect counter bytes long
 collect counter packets long
 collect timestamp sys-uptime first
 collect timestamp sys-uptime last
!
flow exporter FLOW_EXPORTER
 !
 ! IPFix is standards based netflow.
 !
 export-protocol ipfix
 destination 10.0.52.100
 source GigabitEthernet2
 transport udp 2055
 template data timeout 60
!
flow monitor FLOW_MONITOR_IPV4
 exporter FLOW_EXPORTER
 cache timeout active 60
 record FLOW_RECORD_IPV4
!
interface GigabitEthernet1
 ip flow monitor FLOW_MONITOR_IPV4 input
 ip flow monitor FLOW_MONITOR_IPV4 output
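The way I think about the record: the match fields are the key that defines a flow, and the collect fields are what gets stored per flow. A rough Python sketch of a flow cache built that way. The packet dictionaries and names are hypothetical, and there is no timeout or export logic here.

```python
from collections import namedtuple

# The "match" fields from FLOW_RECORD_IPV4
FlowKey = namedtuple("FlowKey", "protocol src dst sport dport input_if")


def build_cache(packets):
    """Group packets by the match (key) fields and sum the collect counters."""
    cache = {}
    for p in packets:
        key = FlowKey(p["protocol"], p["src"], p["dst"],
                      p["sport"], p["dport"], p["input_if"])
        entry = cache.setdefault(
            key, {"bytes": 0, "packets": 0, "output_if": p["output_if"]})
        entry["bytes"] += p["length"]
        entry["packets"] += 1
    return cache


udp = {"protocol": 17, "src": "10.0.10.101", "dst": "10.0.20.101", "sport": 48640,
       "dport": 5000, "input_if": "Gi4", "output_if": "Gi1", "length": 1028}
ospf = {"protocol": 89, "src": "10.0.12.2", "dst": "224.0.0.5", "sport": 0,
        "dport": 0, "input_if": "Gi1", "output_if": "Null", "length": 100}

for key, entry in build_cache([udp, udp, ospf]).items():
    print(key, entry)
```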

IOS-XR

flow exporter-map EXPORTER_MAP_1
version v9
options interface-table
template data timeout 600
!
dscp 48
transport udp 2055
source Loopback1
destination <IP 1>
!
flow monitor-map MONITOR_MAP_INTERNET
  record ipv4
  exporter EXPORTER_MAP_1
  cache timeout active 60
  cache timeout inactive 5
!
sampler-map SAMPLER_MAP_INTERNET
  random 1 out-of 500
!
interface ten 1/1
  flow ipv4 monitor MONITOR_MAP_INTERNET sampler SAMPLER_MAP_INTERNET ingress
  flow ipv4 monitor MONITOR_MAP_INTERNET sampler SAMPLER_MAP_INTERNET egress

Lab validations

R1# show flow monitor FLOW_MONITOR_IPV4 statistics 
  Cache type:                               Normal (Platform cache)
  Cache size:                               200000
  Current entries:                               4
  High Watermark:                                4

  Flows added:                                   8
  Flows aged:                                    4
    - Active timeout      (    60 secs)          4


R1# show flow monitor FLOW_MONITOR_IPV4 cache sort highest counter bytes long top 10 format table
Processed 3 flows
Aggregated to 3 flows
Showing the top 3 flows

IPV4 SRC ADDR    IPV4 DST ADDR    TRNS SRC PORT  TRNS DST PORT  INTF INPUT            IP PROT  intf output                     bytes long             pkts long    time first     time last
===============  ===============  =============  =============  ====================  =======  ====================  ====================  ====================  ============  ============
10.0.10.101      10.0.20.101              48640           5000  Gi4                        17  Gi1                                 334100                   325  20:37:12.210  20:37:44.424
10.0.12.2        224.0.0.5                    0              0  Gi1                        89  Null                                   600                     6  20:36:54.026  20:37:41.568
10.0.12.1        224.0.0.5                    0              0  Null                       89  Gi1                                    600                     6  20:36:52.808  20:37:38.836

Commands

show chassis detail

show chassis rmi

Lightweight Modes

Client-Serving AP Modes

Local: This is the default mode. A local mode AP tunnels all client traffic, for all WLANs, in CAPWAP, to the controller. In this mode, the AP’s radios are operational only when the AP is connected to its controller. Local mode APs do not support mesh operation. All AP models support Local mode.
FlexConnect: In this mode, client traffic can either be tunneled in CAPWAP to the controller, or egress at the AP’s LAN port, depending on the WLAN configuration. FlexConnect mode APs do not support mesh operation. All models support FlexConnect mode.
Bridge and Flex+Bridge: These modes are used in mesh deployments, where wireless rather than wired backhaul is used for CAPWAP connectivity. Not all AP models support these modes; see the relevant mesh documentation for information about support for mesh operation.

Network Management AP Modes

Monitor: In this mode, the AP radios are dedicated to monitoring the Wi-Fi channel for RRM and rogue detection. All AP models support this mode.
Rogue Detector: In this mode, the AP radios are disabled; the AP monitors the LAN to detect on-wire rogue activity. This mode is not supported on Cisco Wave 2 or 802.11ax APs and is deprecated.
Sniffer: In this mode, the AP radio operates in promiscuous mode and captures all Wi-Fi traffic on a channel. These packets are tunneled in CAPWAP to the controller, which forwards them to a machine running OmniPeek or Wireshark for storage and analysis.
SE-Connect: In this mode, the AP provides a dedicated connection to CleanAir for spectrum analysis by software such as Spectrum Expert or Chanalyzer. SE-Connect mode is supported only on SE models with CleanAir.

Cisco Wireless Controller Configuration Guide, Release 8.10

Basic Ansible

This was done on a home lab running Debian 11. tesseract is my control-node.

Add Ansible to Sources list
Update the OS Sources
Install Ansible
Create SSH keys
Tell Ansible to use ssh-agent so you don't have to retype passwords
Use Ansible to copy the control node SSH key to the ansible hosts
Use an Ansible playbook to ping the devices
Use an Ansible playbook to upgrade the devices

Add Ansible to Sources list

$ echo "deb http://ppa.launchpad.net/ansible/ansible/ubuntu focal main" | sudo tee /etc/apt/sources.list.d/ansible.list
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 93C4A3FD7BB9C367
$ sudo apt update

Install Ansible

$ sudo apt install ansible

Define hosts, Create Host file

Do not put special characters (like -) into the group names. Hosts should be FQDNs.

ariadne@tesseract:~/ansible$ cat /etc/ansible/hosts 
[proxmox]
<hosts redacted>

[docker]
<hosts redacted>

[k8s]
<hosts redacted>

[linux]
<hosts redacted>

Define Defaults, Modify ansible.cfg

ariadne@tesseract:/etc/ansible$ cat ansible.cfg 
# [output omitted]

[defaults]
host_key_checking = False
remote_user = ariadne

Create a public SSH key to allow passwordless access

I'm using an internal linux host called tesseract. It doesn't use a password, it's a home lab.

ariadne@tesseract:~$ ssh-keygen -t rsa -b 4096 -C "ariadne@tesseract.haske.org"

Write a playbook to copy the SSH keys

ariadne@tesseract:~/ansible$ cat copy_ssh_keys_test.yml 
---
- name: Copy SSH key to hosts
  hosts: all
  become: yes

  tasks:
  - name: Set authorized key taken from file
    authorized_key:
      user: ariadne
      state: present
      key: "{{ lookup('file', '/home/ariadne/.ssh/id_rsa.pub') }}"

Run it

ariadne@tesseract:~/ansible$ ansible-playbook -k copy_ssh_keys_test.yml
SSH password: 

PLAY [Copy SSH key to hosts] ***********************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************************************************************************************************
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]

TASK [Set authorized key taken from file] **********************************************************************************************************************************************************************************************************************
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
ok: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]
changed: [hosts-redacted]

PLAY RECAP *****************************************************************************************************************************************************************************************************************************************************
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
hosts.redacted    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Write a Playbook to Upgrade Everything

ariadne@tesseract:~/ansible$ cat upgrade-everything.yml 
---
- name: Update and upgrade apt packages
  hosts: all
  become: true
  tasks:
    - name: Update apt cache and upgrade all packages
      apt:
        upgrade: yes
        update_cache: yes
        cache_valid_time: 86400 #One day

Sources

https://docs.ansible.com/ansible/latest/installation_guide/installation_distros.html#installing-ansible-on-debian

https://docs.ansible.com/ansible/latest/inventory_guide/connection_details.html

Have a valid user with AAA new-model turned on

conf t
aaa new-model
aaa authentication login default local
aaa authorization exec default local
username admin privilege 15 secret cisco123

Restconf

RESTCONF uses HTTP or HTTPS, so turn on the webserver

conf t
ip http secure-server

Turn on RESTCONF

conf t
restconf

Validate

RESTCONF relies on DMI and nginx

restconf-router# show platform software yang-management process
confd            : Running    
nesd             : Running    
syncfd           : Running    
ncsshd           : Running    
dmiauthd         : Running    
nginx            : Running    
ndbmand          : Running    
pubd             : Running

Get an IP Address

This is done from the linux commandline via curl

--insecure is added because Cisco generates its own self-signed certificates.

ariadne@tesseract:~$ curl --insecure --user admin:cisco123 \
   -H "Accept: application/yang-data+json" \
   https://192.168.52.199/restconf/data/Cisco-IOS-XE-native:native/interface/Loopback=0

{
  "Cisco-IOS-XE-native:Loopback": {
    "name": 0,
    "ip": {
      "address": {
        "primary": {
          "address": "1.1.1.1",
          "mask": "255.255.255.255"
        }
      }
    }
  }
}

Set an IP Address

Also done from the linux commandline via curl, just with a PATCH message.

ariadne@tesseract:~$ curl --insecure --user admin:cisco123 \
   -X PATCH \
   -H "Accept: application/yang-data+json" \
   -H "Content-Type: application/yang-data+json" \
   https://192.168.52.199/restconf/data/Cisco-IOS-XE-native:native/interface/Loopback=0 \
   -d '{
     "Cisco-IOS-XE-native:Loopback": {
       "name": 0,
       "ip": {
         "address": {
           "primary": {
             "address": "2.2.2.2",
             "mask": "255.255.255.255"
           }
         }
       }
     }
   }'
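The same PATCH can be built in Python with only the standard library. This sketch just constructs the request object so it can be inspected without touching a router; the function name is mine, and the URL, headers and body mirror the curl call above.

```python
import base64
import json
import urllib.request


def loopback_patch_request(host, number, address, mask, user, password):
    """Build (but do not send) a RESTCONF PATCH for a loopback address."""
    url = (f"https://{host}/restconf/data/"
           f"Cisco-IOS-XE-native:native/interface/Loopback={number}")
    body = {
        "Cisco-IOS-XE-native:Loopback": {
            "name": number,
            "ip": {"address": {"primary": {"address": address, "mask": mask}}},
        }
    }
    # HTTP basic auth, same as curl --user
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={
            "Accept": "application/yang-data+json",
            "Content-Type": "application/yang-data+json",
            "Authorization": f"Basic {token}",
        },
    )


req = loopback_patch_request("192.168.52.199", 0, "2.2.2.2",
                             "255.255.255.255", "admin", "cisco123")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen along with an ssl context that has certificate verification turned off, which is the equivalent of curl's --insecure for the self-signed certificate.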

Use NETCONF-YANG

Ensure a Valid user with AAA new-model is turned on, and available (see above)
Turn on NETCONF-YANG

conf t
netconf-yang

Validate

restconf-router#show netconf-yang status 
netconf-yang: enabled
netconf-yang ssh port: 830
netconf-yang candidate-datastore: disabled

I performed this lab inside a Python virtual environment on Linux.

Create a Python virtual environment

python3 -m venv ~/netconf-lab

Activate it

source ~/netconf-lab/bin/activate

Install ncclient

pip install ncclient

Enter the python shell

python

Connect to device:

>>> from ncclient import manager
>>> conn = manager.connect(
    host="192.168.52.199",
    port=830,
    username="admin",
    password="cisco123",
    hostkey_verify=False,
    device_params={"name": "iosxe"}
)

Paste in a payload, follow the XML

>>> payload = """
<config>
  <native xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-native">
    <interface>
      <Loopback>
        <name>5</name>
        <ip>
          <address>
            <primary>
              <address>5.5.5.5</address>
              <mask>255.255.255.255</mask>
            </primary>
          </address>
        </ip>
      </Loopback>
    </interface>
  </native>
</config>
"""
>>> conn.edit_config(target="running", config=payload)
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:5edcd8ca-3e51-4581-8bce-87f7eb939735" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"><ok/></rpc-reply>
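
Hand-pasting XML gets old quickly. A small sketch that generates the same payload with the standard library follows. The function name loopback_config is my own, and the element layout is copied from the payload above rather than from the YANG model itself.

```python
import xml.etree.ElementTree as ET

NATIVE_NS = "http://cisco.com/ns/yang/Cisco-IOS-XE-native"


def loopback_config(number, address, mask="255.255.255.255"):
    """Return a <config> document that sets one Loopback's primary address."""
    config = ET.Element("config")
    # xmlns is written as a plain attribute; children inherit it when parsed.
    native = ET.SubElement(config, "native", xmlns=NATIVE_NS)
    loopback = ET.SubElement(ET.SubElement(native, "interface"), "Loopback")
    ET.SubElement(loopback, "name").text = str(number)
    primary = ET.SubElement(
        ET.SubElement(ET.SubElement(loopback, "ip"), "address"), "primary"
    )
    ET.SubElement(primary, "address").text = address
    ET.SubElement(primary, "mask").text = mask
    return ET.tostring(config, encoding="unicode")


payload = loopback_config(5, "5.5.5.5")
print(payload)
```

The returned string can be handed to conn.edit_config(target="running", config=payload) in place of the pasted text.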

Reference

Programmability Configuration Guide, Cisco IOS XE 17.17.x

Terms

Term	Definition
MR-APS	inter-chassis APS.
APS	Automatic Protection Switching for POS
UNI	User Network Interface
NNI	Network Node Interface
Interworking	Getting L2 information from Ethernet to work over Sonet or frame relay.
STE	Section Terminating Equipment
LTE	Line terminating equipment
PTE	Path terminating equipment
POH	Path overhead - This layer represents end-to-end status.
LOH	Line overhead - Typically major nodes in SONET like ADMs
SOH	Section overhead - Optical regenerators
SPE	Synchronous payload envelope
BIP	Bit Interleaved Parity
FEBE	Far End Block Error

Sonet

Path Payloads must match. Check Scrambling.

Network elements are expected to terminate and understand their layer, and layer overhead

If a SONET receiver at the Line level counts a BIP error, it reports it back to the sender. The sender increments the line FEBE counter.

It's been a while, the below might be wrong.

+-------------------------------------------------- PATH -------------------------------------------------+
|                                                                                                         |
|                                                                                                         |
|   +--------------- LINE --------------------+            +------------------ LINE-------------------+   |
|   |                                         |            |                                          |   |
v   v                                         v            v                                          v   v

+---+      +------------+       +-----+       +------------+      +-----+       +------------+        +---+
|CPE|------|Terminal    |-------|Regen|-------|Add/Drop    |------|Regen|-------|Terminal    |--------|CPE|
+---+ DS-n | Multiplexer| OC-N  +-----+ OC-N  | Multiplexer| OC-N +-----+ OC-N  | Multiplexer|  DS-n  +---+
           +------------+                     +------------+                    +------------+

    ^      ^            ^       ^     ^       ^            ^      ^     ^       ^            ^        ^
    |      |            |       |     |       |            |      |     |       |            |        |
    +------+            +-------+     +-------+            +------+     +-------+            +--------+
    SECTION              SECTION       SECTION             SECTION       SECTION              SECTION

C2 Byte

C2 Defines the SONET payload

An old note, probably from a standard document.

The SONET standard defines the C2 byte as the path signal label. The purpose of this byte 
is to communicate the payload type that the SONET Framing OverHead (FOH) encapsulates. 
The C2 byte functions similar to Ethertype and Logical Link Control (LLC)/Subnetwork 
Access Protocol (SNAP) header fields on an Ethernet network. The C2 byte allows a single
interface to transport multiple payload types simultaneously.

This table lists common values for the C2 byte:

Hex Value	SONET Payload Contents
00	Unequipped.
01	Equipped - non-specific payload.
02	Virtual Tributaries (VTs) inside (default).
03	VTs in locked mode (no longer supported).
04	Asynchronous DS3 mapping.
12	Asynchronous DS-4NA mapping.
13	Asynchronous Transfer Mode (ATM) cell mapping.
14	Distributed Queue Dual Bus (DQDB) cell mapping.
15	Asynchronous Fiber Distributed Data Interface (FDDI) mapping.
16	IP inside Point-to-Point Protocol (PPP) with scrambling.
CF	IP inside PPP without scrambling.
E1-FC	Payload Defect Indicator (PDI).
FE	Test signal mapping (see ITU Rec. G.707).
FF	Alarm Indication Signal (AIS).

An Example:

Framing: SONET
SPE Scrambling: Enabled
C2 State: Stable   C2_rx = 0xCF (207)   C2_tx = 0x16 (22) / Scrambling Derived
S1S0(tx): 0x0  S1S0(rx): 0x2 / Framing Derived
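
The example above shows a C2 mismatch: one end sends 0x16 and receives 0xCF. A rough sketch of decoding the two values with a subset of the table, where the function names describe_c2 and c2_mismatch are made up for illustration:

```python
# Subset of the C2 table above, keyed by the hex value.
C2_LABELS = {
    0x00: "Unequipped",
    0x01: "Equipped - non-specific payload",
    0x02: "Virtual Tributaries (VTs) inside",
    0x04: "Asynchronous DS3 mapping",
    0x13: "ATM cell mapping",
    0x16: "IP inside PPP with scrambling",
    0xCF: "IP inside PPP without scrambling",
    0xFF: "Alarm Indication Signal (AIS)",
}


def describe_c2(value):
    """Return the payload label for a C2 byte."""
    if 0xE1 <= value <= 0xFC:
        return "Payload Defect Indicator (PDI)"
    return C2_LABELS.get(value, "not in this table")


def c2_mismatch(c2_rx, c2_tx):
    """True when the two ends advertise different payload types."""
    return c2_rx != c2_tx


print("rx:", describe_c2(0xCF))
print("tx:", describe_c2(0x16))
print("mismatch:", c2_mismatch(0xCF, 0x16))
```

Here the far end has scrambling off and the near end has it on, which is why the first note in this section says to check scrambling.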

Monitoring at each Network Element is usually helpful

POS - Spawned interface from SONET controller.

controller SONET0/2/0/0

clock source internal

Sonet YELLOW is RDI (Remote Defect indication)

Packet Over Sonet

Document: Troubleshooting Bit Error on SONET Links
URL: http://www.cisco.com/en/US/tech/tk482/tk607/technologies_tech_note09186a0080094a79.shtml
Section: When Do Particular BIP Errors Occur?

In addition, you must understand that BIP errors have different error detection resolutions, which are explained here:

B1: B1 can detect up to eight parity errors per frame. This level of resolution is not acceptable at OC-192 rates. Even-numbered errors can elude the parity check on links with high error rates.

B2: B2 can detect a far higher number of errors per frame. The exact number increases as the number of STS-1s (or STM-1s) increases in the SONET frame. For example, an OC-192/STM-64 produces a 192 x 8 = 1536 bit-wide BIP field. In other words, B2 can count up to 1536 bit errors per frame. There is considerably less chance of an even-numbered error that eludes the B2 parity calculation. B2 offers superior resolution when compared to B1 or B3. Therefore, a SONET interface can report B2 errors only for a particular monitored segment.

B3: B3 can detect up to eight parity errors in the entire SPE. This number produces acceptable resolution for a channelized interface because, (for example) each STS-1 in an STS-3 has a path overhead and B3 byte. However, this number produces poor resolution over concatenated payloads in which a single set of path overhead must cover a relatively large payload frame.
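
To see why an even number of errors can elude the check, here is a toy BIP-8 in Python. It is only a sketch: real B1/B3 bytes are computed over a whole frame or SPE and carried in the following one, while this version just compares the parity of a sent and a received byte string. The function names are my own.

```python
from functools import reduce


def bip8(data: bytes) -> int:
    """Even parity over each of the 8 bit positions, i.e. the XOR of every byte."""
    return reduce(lambda a, b: a ^ b, data, 0)


def bip8_errors(sent: bytes, received: bytes) -> int:
    """Count parity violations, 0 to 8, between what was sent and received."""
    return bin(bip8(sent) ^ bip8(received)).count("1")


frame = bytes(range(16))

one_flip = bytearray(frame)
one_flip[3] ^= 0x01            # single bit error in bit position 0

two_flips = bytearray(one_flip)
two_flips[9] ^= 0x01           # second error in the same bit position

print(bip8_errors(frame, bytes(one_flip)))   # one violation is counted
print(bip8_errors(frame, bytes(two_flips)))  # the pair cancels out
```

Two errors in the same bit position cancel, so the second case reports zero violations even though the data is damaged.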

Packet over SONET commands

Displays information about the automatic protection switching feature

show aps

Displays information about the hardware

show controller sonet slot/port-adapter/port

Displays information about the interface

show controllers pos

G709

G709 is an optical specification that is specifically designed for FEC (Forward Error Correction). It uses Reed-Solomon to produce redundant information that can be used to rebuild the frame.

OTU - Optical channel Transport Unit
ODU - Optical channel Data Unit
OPU - Optical channel Payload Unit

SRP - Spatial Reuse protocol

This is used for fiber rings. The destination node pulls the packet off the ring so it doesn't loop endlessly.

Likely taken from a standards document someplace

Spatial Reuse Protocol (SRP) is a media-independent MAC layer protocol that operates over two counterrotating
fiber-optic rings. The dual rings provide survivability of data in case of a failed node or a break in 
connecting cables by rerouting the data path over the alternate ring. SRP provides a more efficient use of 
bandwidth by having packets traverse only the part of the ring necessary to get to the destination node. Once
the packet has reached the destination node, it is removed from the ring, allowing other parts of the ring
to reuse the bandwidth. Data packets travel on one ring, while associated control packets travel in the opposite
direction on the alternate ring, ensuring that the data takes the shortest path to its destination.

RPR - Resilient Packet Ring

802.17

Steering - Nodes are told the affected node is down and don't include it.
Wrapping - The nodes closest to the break route the traffic onto the other direction of the ring.

Side A Always connects to Side B.

Example of a working connection.

Node2# show controller srp 4/0
SRP4/0 - Side A (Outer RX, Inner TX)
SECTION
  LOF = 0          LOS    = 0                            BIP(B1) = 3
LINE
  AIS = 0          RDI    = 0          FEBE = 36599      BIP(B2) = 46
PATH
  AIS = 0          RDI    = 0          FEBE = 4440       BIP(B3) = 26
  LOP = 0          NEWPTR = 0          PSE  = 0          NSE     = 0

Active Defects: None
Active Alarms:  None
Alarm reporting enabled for: SLOS SLOF PLOP 

Framing           : SONET
Rx SONET/SDH bytes: (K1/K2) = 0/0        S1S0 = 0  C2 = 0x16
Tx SONET/SDH bytes: (K1/K2) = 0/0        S1S0 = 0  C2 = 0x16  J0 = 0x1 
Clock source      : Internal
Framer loopback   : None
Path trace buffer : Stable 
  Remote hostname : Node1
  Remote interface: SRP4/0
  Remote IP addr  : <removed>
  Remote side id  : B
BER thresholds:           SF = 10e-3  SD = 10e-6
IPS BER thresholds(B3):   SF = 10e-3  SD = 10e-6
TCA thresholds:           B1 = 10e-6  B2 = 10e-6  B3 = 10e-6

SRP4/0 - Side B (Inner RX, Outer TX)
SECTION
LOF = 0          LOS    = 0                            BIP(B1) = 65535
LINE
AIS = 0          RDI    = 0          FEBE = 65535      BIP(B2) = 65535
PATH
AIS = 0          RDI    = 0          FEBE = 65535      BIP(B3) = 65535
LOP = 0          NEWPTR = 3          PSE  = 0          NSE     = 0
Active Defects: None
Active Alarms:  None
Alarm reporting enabled for: SLOS SLOF PLOP 
Framing           : SONET
Rx SONET/SDH bytes: (K1/K2) = 0/0        S1S0 = 0  C2 = 0x16
Tx SONET/SDH bytes: (K1/K2) = 0/0        S1S0 = 0  C2 = 0x16  J0 = 0x1 
Clock source      : Internal
Framer loopback   : None
Path trace buffer : Stable 
Remote hostname : Node3
Remote interface: SRP4/0
Remote IP addr  : <removed>
Remote side id  : A
BER thresholds:           SF = 10e-3  SD = 10e-6
IPS BER thresholds(B3):   SF = 10e-3  SD = 10e-6
TCA thresholds:           B1 = 10e-6  B2 = 10e-6  B3 = 10e-6

References

SONET Primer

T1 Framing

D4 Frame is 24 timeslots + framing bit.

100011011100

Ethernet II --  14 octets.
MPLS        --   4 octets.
CESoPSN     --   4 octets.
TDM Payload -- 192 octets.

Each Ethernet II frame takes up 1712 bits on the wire.
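
The 1712 figure is just the four sizes above added up and converted to bits. A quick sketch of the arithmetic, counting only the fields listed in this note (preamble, FCS and inter-frame gap are left out, as they are in the figure above):

```python
# Octet counts copied from the list above.
HEADERS = {"Ethernet II": 14, "MPLS": 4, "CESoPSN": 4}
TDM_PAYLOAD = 192


def frame_octets(payload_octets=TDM_PAYLOAD):
    """Headers plus TDM payload, in octets."""
    return sum(HEADERS.values()) + payload_octets


def frame_bits(payload_octets=TDM_PAYLOAD):
    """Same total expressed in bits."""
    return frame_octets(payload_octets) * 8


def overhead_fraction(payload_octets=TDM_PAYLOAD):
    """Share of the frame that is header rather than TDM data."""
    return sum(HEADERS.values()) / frame_octets(payload_octets)


print(frame_octets(), "octets")
print(frame_bits(), "bits")
print(f"{overhead_fraction():.1%} overhead")
```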

T1 Channel Associated Signaling (CAS) [Used for voice]
    Every 6th frame will have the lowest order bit stolen on each channel for signaling information.
    Super Framing does this in frames 6 (A bit) and 12 (B bit) of each 12-frame superframe.
    Extended Super Framing does this in frames 6, 12, 18, and 24, which makes four bits: A, B, C, D

Link Down

On RX

175 contiguous pulse positions with no positive or negative polarity.

On TX

Sends a yellow alarm (Far End Alarm)
Next device downstream gets a blue alarm

This device marks the link as T1 LOS (Loss of Signal).

T1 Clocking Types

Command	Description
`clock source line`	derive reference from external device.
`clock source internal`	use local PLL for reference.
`network-clock-participate`	join the TDM backplane of the router.
`network-clock-select`	Tells the TDM backplane to use certain T1 as a reference clock, and share it.

network-clock-select requires a T1 line to be in clock source line mode.

network-clock-participate is required for network-clock-select

Mainboard voice DSPs MUST use the backplane clock. They can't opt out.

All network-clock-participate devices share the same clocking-domain.

T1 Clocking Information

T1 reads from RX and TX buffers at the clock rate. Slips are reported when data is read at the wrong clock. Sometimes it might sample the same bit twice, sometimes it might miss bits completely.

UDP Packet Format

User Datagram Protocol - RFC 768

UDP tries to detect corrupted packets by including a checksum. From the RFC:

Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets.

...

If the computed checksum is zero, it is transmitted as all ones (the equivalent in one's complement arithmetic). An all zero transmitted checksum value means that the transmitter generated no checksum (for debugging or for higher level protocols that don't care).
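
The RFC wording translates to a few lines of Python. This is a sketch for IPv4 only, with function names of my own choosing. The sample values are the ports and addresses from the TFTP read request capture further down this page, and the payload bytes are my reconstruction of that request from the fields Wireshark shows (opcode 1, file startup-config, type octet).

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding with a zero octet if needed."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum over the pseudo header, the UDP header and the data."""
    length = 8 + len(payload)               # UDP header is 8 octets
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, length)  # zero, protocol 17, UDP length
    )
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF               # zero is transmitted as all ones


# Reconstructed TFTP read request: opcode, filename, transfer mode.
rrq = b"\x00\x01" + b"startup-config\x00" + b"octet\x00"
print(hex(udp_checksum("10.0.10.22", "10.0.10.33", 52775, 69, rrq)))
```

With those bytes it should print 0x4aed, the value Wireshark marks as correct in that capture.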

 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
┌────────────────────────────────┬───────────────────────────────┐
│          Source Port           │       Destination Port        │
├────────────────────────────────┼───────────────────────────────┤
│            Length              │           Checksum            │
├────────────────────────────────┴───────────────────────────────┘
│          Data Octets
└────────────────────────────────►

TFTP Read Request

Frame 115: 69 bytes on wire (552 bits), 69 bytes captured (552 bits) on interface -, id 0
    Internet Protocol Version 4, Src: 10.0.10.22, Dst: 10.0.10.33
    User Datagram Protocol, Src Port: 52775, Dst Port: 69
        Source Port: 52775
        Destination Port: 69
        Length: 31
        Checksum: 0x4aed [correct]
        [Checksum Status: Good]
        [Stream index: 0]
        [Timestamps]
        UDP payload (23 bytes)
    Trivial File Transfer Protocol
        Opcode: Read Request (1)
        Source File: startup-config
        Type: octet

TFTP Data Packet

Frame 116: 562 bytes on wire (4496 bits), 562 bytes captured (4496 bits) on interface
    Internet Protocol Version 4, Src: 10.0.10.33, Dst: 10.0.10.22
    User Datagram Protocol, Src Port: 52590, Dst Port: 52775
        Source Port: 52590
        Destination Port: 52775
        Length: 524
        Checksum: 0xde83 [correct]
        [Checksum Status: Good]
        [Stream index: 1]
        [Timestamps]
        UDP payload (516 bytes)
    Trivial File Transfer Protocol
        Opcode: Data Packet (3)
        [Destination File: startup-config]
        [Read Request in frame 115]
        Block: 1
        [Full Block Number: 1]
    Data (512 bytes)
    
    0000  0a 21 0a 21 20 4c 61 73 74 20 63 6f 6e 66 69 67   .!.! Last config
    0010  75 72 61 74 69 6f 6e 20 63 68 61 6e 67 65 20 61   uration change a
    0020  74 20 30 35 3a 31 31 3a 31 35 20 55 54 43 20 53   t 05:11:15 UTC S
    0030  61 74 20 4a 75 6c 20 38 20 32 30 32 33 0a 21 0a   at Jul 8 2023.!.
    0040  76 65 72 73 69 6f 6e 20 31 35 2e 32 0a 73 65 72   version 15.2.ser
    0050  76 69 63 65 20 74 69 6d 65 73 74 61 6d 70 73 20   vice timestamps 
    0060  64 65 62 75 67 20 64 61 74 65 74 69 6d 65 20 6d   debug datetime m
    0070  73 65 63 0a 73 65 72 76 69 63 65 20 74 69 6d 65   sec.service time
    0080  73 74 61 6d 70 73 20 6c 6f 67 20 64 61 74 65 74   stamps log datet
    0090  69 6d 65 20 6d 73 65 63 0a 6e 6f 20 73 65 72 76   ime msec.no serv
    00a0  69 63 65 20 70 61 73 73 77 6f 72 64 2d 65 6e 63   ice password-enc
    00b0  72 79 70 74 69 6f 6e 0a 73 65 72 76 69 63 65 20   ryption.service 
    00c0  63 6f 6d 70 72 65 73 73 2d 63 6f 6e 66 69 67 0a   compress-config.
    00d0  21 0a 68 6f 73 74 6e 61 6d 65 20 53 57 33 0a 21   !.hostname SW3.!
    00e0  0a 62 6f 6f 74 2d 73 74 61 72 74 2d 6d 61 72 6b   .boot-start-mark
    00f0  65 72 0a 62 6f 6f 74 2d 65 6e 64 2d 6d 61 72 6b   er.boot-end-mark
    0100  65 72 0a 21 0a 21 0a 6c 6f 67 67 69 6e 67 20 64   er.!.!.logging d
    0110  69 73 63 72 69 6d 69 6e 61 74 6f 72 20 45 58 43   iscriminator EXC
    0120  45 53 53 20 73 65 76 65 72 69 74 79 20 64 72 6f   ESS severity dro
    0130  70 73 20 36 20 6d 73 67 2d 62 6f 64 79 20 64 72   ps 6 msg-body dr
    0140  6f 70 73 20 45 58 43 45 53 53 43 4f 4c 4c 20 0a   ops EXCESSCOLL .
    0150  6c 6f 67 67 69 6e 67 20 62 75 66 66 65 72 65 64   logging buffered
    0160  20 35 30 30 30 30 0a 6c 6f 67 67 69 6e 67 20 63    50000.logging c
    0170  6f 6e 73 6f 6c 65 20 64 69 73 63 72 69 6d 69 6e   onsole discrimin
    0180  61 74 6f 72 20 45 58 43 45 53 53 0a 21 0a 6e 6f   ator EXCESS.!.no
    0190  20 61 61 61 20 6e 65 77 2d 6d 6f 64 65 6c 0a 21    aaa new-model.!
    01a0  0a 21 0a 21 0a 21 0a 21 0a 6e 6f 20 69 70 20 69   .!.!.!.!.no ip i
    01b0  63 6d 70 20 72 61 74 65 2d 6c 69 6d 69 74 20 75   cmp rate-limit u
    01c0  6e 72 65 61 63 68 61 62 6c 65 0a 21 0a 21 0a 21   nreachable.!.!.!
    01d0  0a 6e 6f 20 69 70 20 64 6f 6d 61 69 6e 2d 6c 6f   .no ip domain-lo
    01e0  6f 6b 75 70 0a 69 70 20 63 65 66 0a 6e 6f 20 69   okup.ip cef.no i
    01f0  70 76 36 20 63 65 66 0a 21 0a 21 0a 21 0a 73 70   pv6 cef.!.!.!.sp

Alpine Hosts

hostname pc-20
ip link set dev eth0 up
ip address add 10.0.20.20/24 dev eth0
ip route add default via 10.0.20.1

iperf

Server iperf --port 2000 --server

Client iperf --port 2000 --client 10.0.0.1 --num 10k --reverse --udp

CML On Proxmox

... seems to work fine!

If you have enterprise CML, there is a front network and a back network.

The back network uses ipv6 link-local addresses which do not play well with Proxmox port channels and vlan tags.

It seems much safer to have a dedicated port for the back network.

Subnet with fingers

I just memorize these sequences, ungainly, but works.

Decimal masks - 128, 192, 224, 240, 248, 252, 254, 255

Wildcard masks - 127, 63, 31, 15, 7, 3, 1, 0
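
If the sequences slip your mind, both can be regenerated. A small sketch with function names of my own: each mask octet is 256 minus the block size, and each wildcard octet is 255 minus the mask octet.

```python
def mask_octets():
    """Mask octet for 1 to 8 network bits: 128, 192, ... 255."""
    return [256 - (1 << (8 - bits)) for bits in range(1, 9)]


def wildcard_octets():
    """Wildcard octet is the bitwise inverse of the mask octet."""
    return [255 - octet for octet in mask_octets()]


print(mask_octets())
print(wildcard_octets())
```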

RFC 791 - Classful Networking

In early Internet addressing (1980s), the address itself indicated the subnet mask through its high-order bits. There were only three network sizes.

/8 - Address starts with 0-127 - 128 networks

/16 - Address starts with 128-191 - 16,384 networks

/24 - Address starts with 192-223 - 2,097,152 networks

In the long ago, the hope was to use the first few bits of an address to tell the subnet mask. Even though we never do this in the modern era a few parts of classful networking are still here.

/24 is a very popular prefix
/16 is a very popular prefix
All multicast addresses start with 1110

Internet Protocol
Specification

  Addressing

    To provide for flexibility in assigning address to networks and
    allow for the  large number of small to intermediate sized networks
    the interpretation of the address field is coded to specify a small
    number of networks with a large number of host, a moderate number of
    networks with a moderate number of hosts, and a large number of
    networks with a small number of hosts.  In addition there is an
    escape code for extended addressing mode.

    Address Formats:

      High Order Bits   Format                           Class
      ---------------   -------------------------------  -----
            0            7 bits of net, 24 bits of host    a
            10          14 bits of net, 16 bits of host    b
            110         21 bits of net,  8 bits of host    c
            111         escape to extended addressing mode
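
The high-order-bit test from the RFC table is easy to sketch in Python. The function name classful is made up, and it returns the class letter together with the implied prefix length.

```python
def classful(address: str):
    """Return (class, prefix length) implied by the first octet's high bits."""
    first = int(address.split(".")[0])
    if first >> 7 == 0b0:        # 0xxxxxxx
        return ("a", 8)
    if first >> 6 == 0b10:       # 10xxxxxx
        return ("b", 16)
    if first >> 5 == 0b110:      # 110xxxxx
        return ("c", 24)
    return ("extended", None)    # 111xxxxx, the escape code


for ip in ("10.0.10.22", "172.16.0.1", "192.168.52.199", "224.0.0.5"):
    print(ip, classful(ip))
```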

RFC1918 Dungeons

These are the most famous IPv4 networks.

RFC 1918        Address Allocation for Private Internets   February 1996

3. Private Address Space

   The Internet Assigned Numbers Authority (IANA) has reserved the
   following three blocks of the IP address space for private internets:

     10.0.0.0        -   10.255.255.255  (10/8 prefix)
     172.16.0.0      -   172.31.255.255  (172.16/12 prefix)
     192.168.0.0     -   192.168.255.255 (192.168/16 prefix)

   We will refer to the first block as "24-bit block", the second as
   "20-bit block", and to the third as "16-bit" block. Note that (in
   pre-CIDR notation) the first block is nothing but a single class A
   network number, while the second block is a set of 16 contiguous
   class B network numbers, and third block is a set of 256 contiguous
   class C network numbers.
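
Python's ipaddress module can check an address against the three blocks. A minimal sketch, with rfc1918_block as a made-up helper name:

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def rfc1918_block(address: str):
    """Return the private block containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for network in RFC1918:
        if ip in network:
            return str(network)
    return None


print(rfc1918_block("172.31.255.255"))
print(rfc1918_block("172.32.0.1"))
# The RFC's "16 contiguous class B" and "256 contiguous class C" counts:
print(len(list(RFC1918[1].subnets(new_prefix=16))))
print(len(list(RFC1918[2].subnets(new_prefix=24))))
```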

IP Protocol Numbers

When IP encapsulates another protocol, it sets the protocol field to a number that identifies the next layer.

IP Protocol Number	Description
1	ICMP
2	IGMP
6	TCP
17	UDP
46	RSVP
47	GRE
50	ESP (IPSec)
51	AH (IPSec)
88	EIGRP
89	OSPF
103	PIM
112	VRRP
115	L2TP

TFTP (69), SNMP (161), and SNMP traps (162) are UDP port numbers, not IP protocol numbers.
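
The protocol field is the tenth octet of the IPv4 header. A short sketch of reading it, using a hand-built header (the lengths, TTL and zero checksum are placeholders, not taken from a capture) and a lookup dict that covers only a few of the numbers:

```python
import struct

# A few well-known IP protocol numbers.
PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP", 47: "GRE", 89: "OSPF"}


def ip_protocol(header: bytes) -> str:
    """Name the payload protocol from byte offset 9 of an IPv4 header."""
    number = header[9]
    return PROTOCOLS.get(number, f"protocol {number}")


# Hand-built 20-byte IPv4 header: version/IHL, TOS, total length, ID,
# flags/fragment, TTL, protocol, checksum (left as 0), source, destination.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 51, 0, 0, 64, 17, 0,
    bytes([10, 0, 10, 22]),
    bytes([10, 0, 10, 33]),
)

print(ip_protocol(header))
```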

Protocol Numbers - IANA

Cisco Administrative Distance

Protocol	Administrative Distance
Connected	0
Static	1
EIGRP Summary	5
eBGP	20
EIGRP Internal	90
OSPF	110
IS-IS	115
RIP	120
ODR	160
EIGRP External	170
iBGP	200
Unknown/Infinite¹	255
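
When the same prefix is learned from several sources, the lowest distance wins. A toy sketch of that choice, where the dictionary keys and the function name best_source are my own labels and not IOS keywords:

```python
# Default distances copied from the table above.
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "eigrp-summary": 5,
    "ebgp": 20,
    "eigrp-internal": 90,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "odr": 160,
    "eigrp-external": 170,
    "ibgp": 200,
}


def best_source(candidates):
    """Pick the source with the lowest distance; 255 is never installed."""
    usable = [c for c in candidates if ADMIN_DISTANCE.get(c, 255) < 255]
    if not usable:
        return None
    return min(usable, key=lambda c: ADMIN_DISTANCE[c])


print(best_source(["rip", "ospf", "ibgp"]))
print(best_source(["eigrp-external", "ospf"]))
```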

Troubleshooting TechNotes - What is Administrative Distance? - Cisco

Administrative distance can also be used to do route-filtering.

IO Pathways

Device controller tells the CPU it's done (put data into a buffer) by sending an interrupt.

IO goes from controller - local buffer - CPU

Interrupts

Hardware interrupts

A buffer has been filled

Traps or exceptions are software generated interrupts

User requests
Errors

Most operating systems are interrupt driven.

Storage Structures

Main Memory (DRAM)

Random Access
Lost with power outage (volatile)

Secondary Storage

Larger
Not lost with power outage (non-volatile)

Caching

Copying data from secondary storage to main memory

Faster

Storage Hierarchy: registers > cache > main memory (DRAM) > solid-state disks > spinning disks > optical disks > magnetic tapes.

Direct Memory Access (DMA)

Some amount of DRAM is owned directly by an IO controller, and uses the DRAM for the buffer. When done, the IO controller sends an interrupt.

Processing

Asymmetric - each processor does a specific task.
Symmetric - each processor performs all tasks.

Multithreading

While one thread is asking for memory, execute the other thread. Go back and forth.

Dual Mode

User mode and Kernel mode, with a mode bit. Kernel mode is also called privileged.

System Calls

System calls are how user-mode apps interact with the kernel. APIs wrap them, so applications can reach kernel services without invoking system calls directly (which may not be allowed).

Win32 for Windows
POSIX API (Unix, Linux, Mac OS X)
Java API for Java Virtual Machine (JVM)

Load Averages

Windows will show a percentage of CPU. Linux systems instead show the number of processes waiting to access the CPU. It can get to double digits.

Threading

A single-thread process has a program counter that says "go here to read the next instruction please"

Memory Management

Copying from storage into dram, into cache. Only stuff in L1 cache can be executed.

           0.5 ns - CPU L1 dCACHE reference
           1   ns - speed-of-light (a photon) travel a 1 ft (30.5cm) distance
           5   ns - CPU L1 iCACHE Branch mispredict
           7   ns - CPU L2  CACHE reference
          71   ns - CPU cross-QPI/NUMA best  case on XEON E5-46*
         100   ns - MUTEX lock/unlock
         100   ns - own DDR MEMORY reference
         135   ns - CPU cross-QPI/NUMA best  case on XEON E7-*
         202   ns - CPU cross-QPI/NUMA worst case on XEON E7-*
         325   ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
      10,000   ns - Compress 1K bytes with Zippy PROCESS
      20,000   ns - Send 2K bytes over 1 Gbps NETWORK
     250,000   ns - Read 1 MB sequentially from MEMORY
     500,000   ns - Round trip within a same DataCenter
  10,000,000   ns - DISK seek
  10,000,000   ns - Read 1 MB sequentially from NETWORK
  30,000,000   ns - Read 1 MB sequentially from DISK
 150,000,000   ns - Send a NETWORK packet CA -> Netherlands
|   |   |   |
|   |   | ns|
|   | us|
| ms|

Source Stack Overflow

Debugging

Kernighan's Law

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it? -- Brian Kernighan, 1974

Write easy to understand code, planning on future debugging.

Communications Models

Message Passing (modern)

Puts a message into a shared queue, gives it a number, and tells the other app "Go read this message"

Shared Memory (ancient)

Applications can just overwrite each other's data.

Scheduling

FCFS - First come First Served. Not really used anymore
SJF - Shortest Job first, kind-of how QoS works.
Priority - Give processes an integer, rank them.
RR - Round Robin, using time quantum, called q like 10-100 milliseconds
CFS - Completely Fair Scheduler
- Involved, emulates time-slices
- N tasks, each task gets 1/N time.
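
Round robin is the easiest of these to sketch. A toy version follows; the process names, burst times and quantum are made-up numbers, and a real scheduler also has to handle arrivals, blocking and priorities.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Return the (name, run_time) slices in the order they execute."""
    queue = deque(bursts.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue with what is left.
            queue.append((name, remaining - run))
    return timeline


print(round_robin({"A": 25, "B": 10, "C": 15}, quantum=10))
```

Each process runs for at most one quantum and then goes to the back of the line, so short jobs such as B finish early without starving the long ones.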

Multilevel Queue - Done in Linux

Foreground, Background
- Foreground gets 80% as RR
Background
- FCFS

Process Environment

Argument vector - the command line arguments used to invoke the running program
Environment vector - the list of "NAME=VALUE" pairs

Static and Dynamic Linking

Static - the library functions are embedded in the executable.
Dynamic - the library functions are at a place in memory, and shared.

Terms

STDM - Synchronous Time-Division Multiplexing
DS0 - Digital Signal, Level 0. One timeslot. A timeslot carries 8 bits. Frame rate is 8000 Hz. 8 * 8000 = 64 kbps.
B8ZS - Binary Eight Zero Substitution. A special way to encode 0000 0000 for DS0 lines.
T1 Frame - T-Carrier, Level 1. Aggregates 24 DS0 frames, or 192 bits. The T1 gets an extra bit, for framing so 193. 193 * 8000 is 1.544 Mbps.
Super Frame - 12 T1 frames.
Framing Search - Each T1 frame uses the extra bit to encode part of the superframe bit pattern 0101 1101 0001 or (5, 13, 1).
APS - Automatic Protection Switching. The device engaging in APS sends the data on both links, the working link and the protected link. The receiving device decides which to use.
DS1 - Digital Signal, Level 1.
T1 - T-Carrier, Level 1, Carries 24 DS0 frames, or 192 bits. The T1 gets an extra bit, for framing so 193. 193 * 8000 is 1.544 Mbps.
ACR - Access Circuit Redundancy
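
The DS0 and T1 rates in the terms above are simple multiplication. A tiny sketch, with constant and function names of my own:

```python
FRAME_RATE_HZ = 8000       # frames per second
BITS_PER_TIMESLOT = 8
TIMESLOTS_PER_T1 = 24
FRAMING_BITS = 1


def ds0_rate_bps():
    """One timeslot, 8000 times a second."""
    return BITS_PER_TIMESLOT * FRAME_RATE_HZ


def t1_frame_bits():
    """24 timeslots plus the framing bit."""
    return TIMESLOTS_PER_T1 * BITS_PER_TIMESLOT + FRAMING_BITS


def t1_rate_bps():
    """One 193-bit frame, 8000 times a second."""
    return t1_frame_bits() * FRAME_RATE_HZ


print(ds0_rate_bps())    # DS0 rate in bits per second
print(t1_frame_bits())   # bits in one T1 frame
print(t1_rate_bps())     # T1 line rate in bits per second
```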

The common STDM system in the US is T-Carrier.

Cisco CEM Terms

ACR - Adaptive Clock Recovery. A technique to recover the clock based on the fill level of the jitter buffer.

Access Circuit Redundancy

References

T-Carrier and SONET

All you Wanted to Know about T1 But Were afraid to Ask

OCx CEM Interface Module Config Guide IOS-XE 17 ASR 900 Series

Rocky Linux, Certbot, Let's Encrypt, DNS and Snap

This setup means a device can have a valid SSL certificate and still be inaccessible from the Internet, so https://host.example.com works internally without SSL warnings.

Let's Encrypt is a Certificate Authority provided by the non-profit Internet Security Research Group as a free service.

This is a partial set of instructions to get valid SSL certificates via Let's Encrypt via certbot. It doesn't include autorenew. I did this on Rocky Linux but other instructions exist for other platforms.

These instructions follow RFC 8555#section-8.4 -> DNS Challenge.

I'm using cloudflare with a domain I own, but there is a good sized list of supported DNS plugins.

Instructions

Remove the older certbot

sudo dnf remove certbot
Update the package list

sudo dnf update
Install the EPEL repository

sudo dnf install epel-release
Install snapd, via the EPEL repository

sudo dnf install snapd
Enable the snap socket

sudo systemctl enable --now snapd.socket
Enable Classic Snap

sudo ln -s /var/lib/snapd/snap /snap
Install Classic Certbot, via Snap

sudo snap install --classic certbot
Link it like a regular binary.

sudo ln -s /snap/bin/certbot /usr/bin/certbot
Tell Certbot it can have root

sudo snap set certbot trust-plugin-with-root=ok
Obtain the cloudflare plugin

sudo snap install certbot-dns-cloudflare
Re-establish connection to box, to refresh binary paths

<exit>

<reconnect>
Get an API token from cloudflare.
- Limit permissions to Zone - DNS - Edit
- Limit the Zone to Include - Specific Zone - <domain>
Create a cloudflare.key file with the API token

dns_cloudflare_api_token = <token here>
Set the permissions on the key to be restrictive

sudo chmod o-rwx cloudflare.key

Get the certificates

sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /opt/certbot/cloudflare.key \
  -d host.example.com

Move cloudflare.key into the new /etc/letsencrypt/ directory.

sudo mv /opt/certbot/cloudflare.key /etc/letsencrypt/cloudflare.key
Check work

ls -la /etc/letsencrypt/

References

EFF - Install Certbot via Snap

Snapcraft - Installing Snap on Rocky Linux

Read The Docs - Certbot - DNS Plugins

#
# This is the config for portainer, and the reverse proxy, traefik
#

#
# This is a VM that hosts portainer. These are services started by docker compose.
#
# sudo docker compose up -d
# sudo docker compose down
#
# the network user-bridge needs to be specified in advance
#
# My wiki host is wiki.<mydomain>.org
# My wiki backup host is wiki-backup.<mydomain>.org
#
# The A and AAAA records point to the IP of the VM.
#
#
# My external DNS is handled by cloudflare. I'm using dns-challenge for getting LetsEncrypt SSL certs.
#
#


ariadne@docker-host:~/docker/portainer-traefik$ cat docker-compose.yml 
version: '3.1'
services:
  portainer:
    container_name: portainer
    image: portainer/portainer-ce:latest
    command: -H unix:///var/run/docker.sock
    restart: always
      #    ports:
      #- 8000:8000
      #- 9443:9443
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data
    networks:
      - user-bridge
    labels:
      - "traefik.enable=true"
      # using-the-fqdn
      - "traefik.http.routers.using-the-fqdn.rule=Host(`<docker-host>.<redacted>.org`)"
      - "traefik.http.routers.using-the-fqdn.entrypoints=websecure"
      - "traefik.http.routers.using-the-fqdn.service=using-the-fqdn"
      - "traefik.http.routers.using-the-fqdn.tls.certresolver=letsencrypt"
      - "traefik.http.services.using-the-fqdn.loadbalancer.server.port=9000"
  traefik:
    image: "traefik:v2.10"
    container_name: traefik
    restart: always
    command:
      # - "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      # create entry point "web"
      - "--entrypoints.web.address=:80"
      # create entry point "websecure"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      # create cert resolver "letsencrypt"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=cloudflare"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53"
      # - "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory" # Staging CA Server
      - "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-v02.api.letsencrypt.org/directory" # Production CA Server
      - "--certificatesresolvers.letsencrypt.acme.email=<redacted>"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    networks:
      - user-bridge
    environment:
      - "CF_DNS_API_TOKEN=<redacted>"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    labels:
      # create router "http-catchall"
      - "traefik.http.routers.http-catchall.rule=hostregexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.entrypoints=web"
      # create middleware "middlewares"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true"
volumes:
  portainer_data:

networks:
  user-bridge:
    external: true


#
# This is the config for the db, wiki, and duplicati backup services
#
ariadne@grove:~/docker/home-wiki$ cat docker-compose.yml 
version: "3.1"

services:
  db:
    image: postgres:15-alpine
    restart: "no"
    environment:
      POSTGRES_DB: wiki
      POSTGRES_PASSWORD: <redacted>
      POSTGRES_USER: wikijs
    logging:
      driver: "none"
    volumes:
      - /mnt/wiki-drive:/var/lib/postgresql/data
    networks:
      - user-bridge

  wiki:
    image: ghcr.io/requarks/wiki:2
    restart: always
    environment:
      DB_TYPE: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_USER: wikijs
      DB_PASS: wikijsrocks
      DB_NAME: wiki
    ports:
      - "3000:3000"
    networks:
      - user-bridge
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.wiki.rule=Host(`wiki.<redacted>.org`)"
      - "traefik.http.routers.wiki.entrypoints=web,websecure"
      - "traefik.http.routers.wiki.tls.certresolver=letsencrypt"
      - "traefik.http.services.wiki.loadbalancer.server.port=3000"

  duplicati:
    image: duplicati/duplicati:latest
    restart: always
    ports:
      - "8200:8200"
    command: "/usr/bin/duplicati-server --webservice-port=8200 --webservice-interface=any --webservice-allowed-hostnames=*"
    volumes:
      - /mnt/wiki-drive:/wiki-drive:rw        # What we want to back up 
      - /opt/duplicati/data:/data:rw          # Config Storage on the host
    networks:
      - user-bridge
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.duplicati.rule=Host(`wiki-backup.<redacted>.org`)"
      - "traefik.http.routers.duplicati.entrypoints=web,websecure"
      - "traefik.http.routers.duplicati.tls.certresolver=letsencrypt"
      - "traefik.http.services.duplicati.loadbalancer.server.port=8200"

networks:
  user-bridge:
    external: true

Windows 10 P2V - Physical to Virtual

My Setup

I am adding a compute node to an existing proxmox cluster.

I bought a used i7 Windows 10 machine with a 512 GB NVMe drive. On the outside are two COA stickers, one for Windows 10 Pro, and another for Office Home & Student 2019.

The current OS boots and the copy of Office works.

Goal: I want to keep this install of Windows 10 working, and copy the OS into Proxmox. I want to virtualize this OS.

This will give me a working licensed copy of Office.

Theory

I just need to get the "data" onto the VM.

The physical machine and the VM need to be able to ping each other.
Installing the drivers ahead of time should make the OS bootable.
Copying the data should preserve the OS and applications.
Copying the partitions should make recovery easier.
Rebuilding the boot information should make the OS bootable.

A lot of this is to enable a clean "recovery" of the OS once it's copied over. My copy of Windows 10 relies on:

FAT32
NTFS - This filesystem should really only be checked using Microsoft's own tools.
BCD - Boot Configuration Data
GPT
EFI
MSR

Data Loss

These tools can cause data loss.

A typo will destroy a filesystem.

Before doing this, practice both making and recovering bare metal restores (BMRs) ... I used Clonezilla.

BMR is usually device-to-image, or image-to-device.

Here are the docs for using Clonezilla.

My Windows 10 BMR is 11GB stored as bzip2.

If Possible Just Clone the Disk

I wanted to go from a larger drive (512GB) to a smaller drive (64GB). That meant instead of copying the devices, I needed to copy the partitions, after resizing them.

drive-to-drive cloning would be much easier.

Download ISOs

Most of the time was spent inside of recovery OSes, working with unmounted filesystems.

SystemRescue - Linux recovery media with NTFS support.

Windows 10 Installation Media - This is also the recovery disk. It can be made on the host being virtualized. It is needed to fix BCD (Boot Configuration Data) and EFI problems.

Clonezilla - A bare metal recovery tool.

Preparing Windows 10 to be virtualized

My Windows 10 machine had some extras on it I didn't want to virtualize.

Create a restore image with Clonezilla

This is the failsafe image, before touching anything. I saved mine to a samba share, but it can be saved anywhere it will fit that isn't on the device.
Turn off the hibernation file

Via the command prompt as an administrator:

powercfg -h off
Clean up the hard disk

Into the search box type:

Disk Cleanup
Set the virtual memory pagefile to 1024MB

A file of this size is needed for coredumps, errors, and logging.

Follow these instructions.
(Optional) Run WinDirStat to look for odd or large files

Delete or Uninstall them.

Windows Directory Statistics - WinDirStat
Run chkdsk on C:

Via the command prompt as an administrator:

chkdsk C: /R

/R - "Locates bad sectors and recovers readable information (implies /F, when /scan not specified)"

Reboot
(Optional) - Create another restore image with Clonezilla

This is the cleaned image, to save all the clean up work.
Boot GParted

This is where it gets dangerous. GParted can be used to resize offline NTFS partitions.
Resize the "Basic data partition"

My data partition was 410GiB. I resized it down to 48GiB. The data on the partition is 25GiB.
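
A quick size check before shrinking anything. This is only a sketch: the EFI, MSR and data sizes are the ones from my notes, the 64GB VM disk is treated as 64 GiB, and the Recovery size and GPT overhead are placeholder assumptions. Swap in the real numbers from GParted.

```shell
#!/bin/sh
# All sizes in MiB.
disk_mib=$((64 * 1024))   # target virtual disk, assumed to be 64 GiB
efi_mib=100               # EFI system partition
msr_mib=16                # Microsoft Reserved partition
data_mib=$((48 * 1024))   # data partition after the resize
recovery_mib=1024         # ASSUMPTION: placeholder, not stated in these notes
gpt_overhead_mib=2        # ASSUMPTION: 1 MiB alignment at the start, backup GPT at the end

used_mib=$((efi_mib + msr_mib + data_mib + recovery_mib + gpt_overhead_mib))
free_mib=$((disk_mib - used_mib))

echo "used=${used_mib}MiB free=${free_mib}MiB"
if [ "$free_mib" -ge 0 ]; then
    echo "fits"
else
    echo "too big"
fi
```

With the placeholder values this prints used=50294MiB free=15242MiB and then fits.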
Move the "Recovery" partition

I used the GUI to slide it over.
Save your work with GParted

Click the green checkmark. This writes the changes to disk.
Boot into Windows 10

Check to make sure the OS is still sane. Does the Internet work?
Run chkdsk again on C:

This is done to make sure the filesystem is OK.

Via the command prompt as an administrator:

chkdsk C: /R

/R - "Locates bad sectors and recovers readable information (implies /F, when /scan not specified)"

Reboot
(Optional) - Create another restore image with Clonezilla

This is the prepared image.
Boot into SystemRescue

Creating the Virtual Machine

I used PVE - Proxmox Virtual Environment as my hypervisor. Any hypervisor should work.

I used the Proxmox GUI to assign the VM a hard disk of 64GB.

I booted the VM with SystemRescue, and made sure it could get a working IP address.

Preparing the Hard Drive on the Virtual Machine

There are four partitions on my Windows 10 machine. I want to copy them over the network using netcat.

Both - Boot SystemRescue
Both - Open GParted
Destination - Using GParted, recreate the partition structure on the new hard disk

I used a mix of fdisk and the GUI for this.
- Created a GPT Partition Table
- Copied the partitions including the start and stop sectors, exactly.
- Copied the flags
I started with four partitions on both and ended with four partitions. They all fit on this smaller disk.
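
An alternative to retyping the start and stop sectors by hand is sfdisk's dump format. This is a sketch of that idea, not what I actually ran. The file name table.dump is a placeholder, and the helper functions are my own names. The dump's last-lba header describes the larger source disk, so it is dropped before applying the table to the smaller disk. The second helper finds the highest end sector so it can be compared with the size of the new disk.

```shell
#!/bin/sh
# strip_last_lba: remove the "last-lba:" header from an sfdisk dump on stdin,
# so the dump can be applied to a disk smaller than the source.
strip_last_lba() {
    grep -v '^last-lba:'
}

# max_end_sector: print the highest end sector (start + size - 1) found in
# an sfdisk dump on stdin.
max_end_sector() {
    awk '
        /start=/ {
            start = 0; size = 0
            n = split($0, field, ",")
            for (i = 1; i <= n; i++) {
                if (field[i] ~ /start=/) { sub(/.*start= */, "", field[i]); start = field[i] + 0 }
                else if (field[i] ~ /size=/) { sub(/.*size= */, "", field[i]); size = field[i] + 0 }
            }
            last = start + size - 1
            if (last > max) max = last
        }
        END { printf "%.0f\n", max }
    '
}

# Intended use (commented out on purpose, these write to real disks):
#   Source:       sfdisk -d /dev/nvme0n1 > table.dump
#   Destination:  blockdev --getsz /dev/sda          # disk size in 512-byte sectors
#                 max_end_sector < table.dump        # must be <= disk size - 34
#                 strip_last_lba < table.dump | sfdisk /dev/sda
```

The "- 34" leaves room for the backup GPT at the end of the disk, assuming 512-byte sectors and a standard 128-entry table. Check the result with GParted or fdisk before copying any data.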
Destination - Turn off the firewall

systemctl stop iptables
Destination - Get the IP Address

ip a
Destination - Start a netcat listener

This needs to be done for each partition, one at a time.

nc -l -p 19000 | bzip2 -d | dd of=/dev/sda1
Source - Pipe dd into bzip2 into netcat, and send the stream to the Destination

This needs to be done for each partition, one at a time.

dd bs=16M if=/dev/nvme0n1p1 | bzip2 -c | nc <ip_address> <port>
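
These notes don't include a verification step, so here is a minimal sketch of one, assuming sha256sum is available on both sides (the function name part_sha256 is mine). Because the destination partitions were created with the same start and stop sectors, the source and destination partitions are the same size, and their digests should match after the copy.

```shell
#!/bin/sh
# part_sha256 PATH: print only the SHA-256 hex digest of a partition or file.
part_sha256() {
    sha256sum < "$1" | cut -d' ' -f1
}

# Intended use, one partition at a time, then compare the two digests by eye:
#   Source:       part_sha256 /dev/nvme0n1p1
#   Destination:  part_sha256 /dev/sda1
```

Reading a whole partition takes a while, but a mismatch here is much cheaper to find than one discovered after the boot repair.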

Windows 10 Recovery

I went from an NVMe drive to an IDE drive. I still needed to recover the boot data.

Destination - Load the ISO for the Windows Recovery Environment.

Click Repair your computer

Click Troubleshoot

Click Command Prompt

I followed this guide to repair the boot info.

Look at the new VM disk

diskpart

This leads to the DISKPART> prompt.
Verify the disk is GPT.

list disk

In the "Gpt" column there should be an asterisk (*).
Select Disk 0

This is the only hard disk in this VM.

sel disk 0

List the partitions and Volumes

This is the Windows equivalent of fdisk.

list partition

list volume

This is my lab system.

DISKPART> list partition 

   Partition ###   Type            Size        Offset
   -------------   --------------  ----------  -------
   Partition 1     System          100 MB      1024 KB
   Partition 2     Reserved        16 MB       101 MB
   Partition 3     Primary         46 GB       117 MB

DISKPART> list volume 

   Volume ###  Ltr     Label       Fs      Type        Size        Status      Info 
   ----------  ---     ----------  -----   ----------  -------     ----------  -------
   Volume 0    D       ESD-ISO     UDF     CD-ROM      4667 MB     Healthy  
   Volume 1    C                   NTFS    Partition     46 GB     Healthy  
   Volume 2                        FAT32   Partition    100 MB     Healthy     Hidden

These are the three required partitions.

NTFS - The data partition, apps and the OS
EFI - Extensible Firmware Interface. Where the modern boot system lives. Usually 100MB, FAT32
MSR - Microsoft Reserved partition. Usually 16MB, with no filesystem, so it shows up as "Reserved" in list partition but not in list volume. Used by Windows to help manage the partitions

At this point, I could just follow along with the Windows OS Hub article, to restore the BCD bootloader configuration.

References

Windows OS Hub - How to Repair EFI/GPT Bootloader on Windows 10 or 11

Microsoft - Disk cleanup in Windows

Ten Forums - How to Manage Virtual Memory Pagefile in Windows 10

Microsoft - BCD Boot Command Line Options

Windows OS Hub - How to repair deleted EFI partition in windows 7

Ariadne's Network Notes