ERS 8600: CPU Switch-over using the command "config sys set action cpuswitchover"


Doc ID    SOLN222909
Version:    2.0
Status:    Published
Published date:    19 Sep 2014
Created Date:    08 Mar 2013
Author:    Jayant Sanduja
 

Details

Product: ERS 8600

Software version: 7.1.3.0

A planned CPU switchover on an ERS 8600, performed with the command "config sys set action cpuswitchover", was not successful and was followed by a network outage.

Problem Clarification

When a CPU switchover is performed using "config sys set action cpuswitchover", a network outage is possible (observed once in a customer scenario). In that case, both SF modules were manually reseated and the box was then booted with CPU 5.

 

Cause

N/A

Solution

Investigation of the logs and show tech output captured when the CPU switchover was performed using the command "config sys set action cpuswitchover" produced the following findings:

Line 25870: CPU5 [01/15/13 22:17:47] SNMP INFO HA-CPU: Peer connection is established.
Line 25871: CPU5 [01/15/13 22:17:47] HW INFO VRF name: Global Router (VRF id 0): HA-CPU: Table Sync in progress
Line 25881: CPU5 [01/15/13 22:17:48] SNMP INFO HA-CPU: Synchronized state.
Line 27055: CPU5 [01/15/13 22:18:13] HW INFO HA-CPU: Table Sync Completed on Secondary CPU

 <NP> 24:e00b73262be8dda9376916e552c70bc7a4ab2dddeae557fe</NP> CPU5 [01/15/13 21:37:17] HW ERROR Lost connection to standby state = 0 event = 0
<NP> 24:261340581186ac6bca964417d3fdbea827e5446d5f33fd56</NP> CPU5 [01/15/13 21:39:33] SW INFO Found serial number <00:15:e8:ae:d0:00> in file <license.dat>
<NP> 24:261340581186ac6bca964417d3fdbea827e5446d5f33fd56</NP> CPU5 [01/15/13 21:39:33] SW INFO License Successfully Loaded From <license.dat> License Type -- PREMIER
<NP> 24:2ff1e7b15c229788e686357f99951d2c209002f849c84183</NP> CPU5 [01/15/13 21:39:34] SW INFO System boot
<NP> 24:2ff1e7b15c229788e686357f99951d2c209002f849c84183</NP> CPU5 [01/15/13 21:39:34] SW INFO ERS System Software Release 7.1.3.0
<NP> 24:2ff1e7b15c229788e686357f99951d2c209002f849c84183</NP> CPU5 [01/15/13 21:39:34] SW INFO CPU card entering hot-standby mode...
<NP> 24:e00b73262be8dda9376916e552c70bc7a4ab2dddeae557fe</NP> CPU5 [01/15/13 21:39:37] HW INFO HA-CPU: Table Sync in progress(Standby)...Please wait
Warning: Please do not reset either of the CPU/SSF cards until table synchronization is fully completed
<NP> 24:261340581186ac6bfc802a60d1d9b8a62521ff247578a320</NP> CPU5 [01/15/13 21:39:38] SW INFO License Successfully Loaded From <license.dat> License Type -- PREMIER
<NP> 32:e00b73262be8dda9531e05d3ef8ce8dbaaa728da85a64a15dd58733a49572454</NP> CPU5 [01/15/13 21:39:58] HW INFO HA-CPU: Table Sync is complete (Standby CPU)
<NP> 32:e00b73262be8dda9531e05d3ef8ce8dbaaa728da85a64a15dd58733a49572454</NP> CPU5 [01/15/13 21:39:58] HW INFO Table Sync took 21398586 usecs.
Table Sync Real Execution took 21343657 usecs.

The log messages show that the CPU switchover itself completed successfully. The peer connection between the two CPUs was established and table synchronization completed, so the configuration on both CPUs is the same and synchronized.
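
A log review like this can be partly automated. The following is a minimal Python sketch, assuming log lines in the format shown above; the names parse_line and sync_summary are illustrative and are not part of any Avaya tool. It looks for the "Lost connection to standby" and "Table Sync is complete" events, reports the time between them, and converts the reported sync duration from microseconds to seconds.

```python
import re
from datetime import datetime

# Matches the "CPU5 [01/15/13 21:39:58] HW INFO message" part of a log line.
# search() is used so that prefixes such as "Line 25870: " or "<NP>...</NP> "
# in front of the CPU field are tolerated.
LOG_RE = re.compile(
    r"(?P<cpu>CPU\d+) \[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<module>\w+) (?P<level>\w+) (?P<msg>.*)"
)


def parse_line(line):
    """Return (cpu, timestamp, module, level, message), or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
    return m.group("cpu"), ts, m.group("module"), m.group("level"), m.group("msg")


def sync_summary(lines):
    """Summarize HA table-sync events found in an iterable of log lines."""
    lost = complete = sync_usecs = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines without a CPU/timestamp header
        _cpu, ts, _module, _level, msg = parsed
        if "Lost connection to standby" in msg and lost is None:
            lost = ts
        elif "Table Sync is complete" in msg:
            complete = ts
        else:
            m = re.search(r"Table Sync took (\d+) usecs", msg)
            if m:
                sync_usecs = int(m.group(1))
    return {
        "synced": complete is not None,
        "lost_to_complete_s": (
            (complete - lost).total_seconds() if lost and complete else None
        ),
        "sync_seconds": sync_usecs / 1_000_000 if sync_usecs is not None else None,
    }


# Sample lines copied from the log excerpt in this article (<NP> tags omitted).
SAMPLE = [
    "CPU5 [01/15/13 21:37:17] HW ERROR Lost connection to standby state = 0 event = 0",
    "CPU5 [01/15/13 21:39:34] SW INFO System boot",
    "CPU5 [01/15/13 21:39:37] HW INFO HA-CPU: Table Sync in progress(Standby)...Please wait",
    "CPU5 [01/15/13 21:39:58] HW INFO HA-CPU: Table Sync is complete (Standby CPU)",
    "CPU5 [01/15/13 21:39:58] HW INFO Table Sync took 21398586 usecs.",
]

print(sync_summary(SAMPLE))
```

For the sample lines, the time from loss of the standby connection to completion of the table sync is 161 seconds, and the reported sync duration is about 21.4 seconds.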

Things to consider when performing a CPU switchover with HA mode enabled:

- Before performing the CPU switchover, first run save config standby <filename> and save bootconfig standby <filename>, even if the boot flag that automatically saves to the standby is already enabled.

- Run config sys set action cpuswitchover.

- Wait for the switchover to complete (about 30 seconds).

- When it is complete, a logon prompt appears on the console session, and the master LED lights on the old secondary SF/CPU module. Log on to the new master.
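
The order of the steps above can be sketched as a small Python helper that lists the CLI commands in sequence. This is only an illustration: the function name switchover_commands and the example file names are assumptions, the helper only builds strings and does not connect to a switch, and the switchover command is written as in the title of this article.

```python
def switchover_commands(config_file, boot_file):
    """Return the CLI commands from this article in the order they are run.

    config_file and boot_file stand in for the <filename> arguments.
    """
    return [
        f"save config standby {config_file}",
        f"save bootconfig standby {boot_file}",
        "config sys set action cpuswitchover",
    ]


# Example file names are hypothetical; substitute the ones used on the switch.
for command in switchover_commands("config.cfg", "boot.cfg"):
    print(command)
```

Saving to the standby comes first so that both CPUs hold the same configuration before the switchover is triggered.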

 

Note: Do not hot swap or insert modules in a switch while the switch boots. If you do, the switch may not recognize the module, which causes module initialization failure.

 

CPU High Availability (CPU-HA) mode enables switches with two CPUs to recover quickly from a failure of the master SF/CPU. HA and non-HA mode characteristics are as follows:

 

• In HA mode, also called “hot standby,” the two CPUs are synchronized. This means the CPUs have the same configuration and forwarding tables, with the master automatically updating the forwarding tables of the secondary in real time. When the master SF/CPU fails, the secondary takes over "master" responsibility very quickly, thereby minimizing traffic interruption for the failure condition.

• In non-HA mode, also called “warm standby,” the two CPUs are not synchronized. In this mode, when the master fails, the secondary SF/CPU must boot before taking "master" responsibility, and then must also re-learn the forwarding table information. This operation causes an interruption to traffic.

 

Further information about the standby-to-master delay:

 

Configure the standby-to-master delay to set the number of seconds a standby SF/CPU waits before trying to become the master SF/CPU. The time delay you configure applies during a cold start; it does not apply to a failover start.

 

Configure the standby-to-master delay with the following command:

 

config bootconfig delay <seconds>

                                                                                                                           

If the issue persists, collect the following information:

1.    Show tech output of CPU 6 and CPU 5, taken when the switchover is done.

2.    Logs of both CPUs.

3.    Critical log file, if generated.

4.    If there is config loss, the show config output from before the switchover and the show config output containing the partial configuration.

 

Note: Unless the above data is available, it is very difficult to provide a root cause analysis (RCA).


Avaya -- Proprietary. Use pursuant to the terms of your signed agreement or Avaya policy