Abstract:
A small number of SSN1GSCC02 and SSN3GSCC02 boards used on the OSN 3500 and OSN 7500 have degraded clock signals. The cause is the marginal (critical-state) quality of the clock signals on the SPI of the orderwire chip, combined with individual variances among boards (for example, in PCB layout and build-out resistor precision). As a result, these boards have a low probability of experiencing warm resets. If a warm reset occurs on the active SSN1GSCC02 or SSN3GSCC02 board, the NE housing that board may become transiently unreachable by the NMS.
[Problem Description]
Trigger condition:
SSN1GSCC02 and SSN3GSCC02 boards manufactured before August 9, 2013 have a low probability of encountering this problem.
Symptom:
Due to abnormal orderwire interrupts, the SCC board occasionally experiences a warm reset and may report a HARD_BAD alarm (0xff 0xff 0xff 0x00 0x01) that cannot be cleared. If the active SCC board is warm reset but does not report the HARD_BAD alarm, the NE is transiently unreachable by the NMS; if the active SCC board reports the HARD_BAD alarm, an active/standby switchover is triggered. If the standby SCC board is warm reset, it reports the COMMUN_FAIL alarm (0x01 0x00 0x03 0xff 0xff), which clears once the standby SCC board starts working.
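The symptom rules above amount to a small decision table. The following Python sketch encodes it so the three cases can be read side by side; the `Role` enum and the `expected_symptom` function are hypothetical names introduced only for illustration, and the board behavior they return is taken from the description above.

```python
from enum import Enum


class Role(Enum):
    """Role of the SCC board that experienced the warm reset (hypothetical model)."""
    ACTIVE = "active"
    STANDBY = "standby"


def expected_symptom(role: Role, hard_bad_reported: bool) -> str:
    """Map a warm reset of an SCC board to the symptom described above."""
    if role is Role.ACTIVE:
        if hard_bad_reported:
            # HARD_BAD (0xff 0xff 0xff 0x00 0x01) on the active board
            return "active/standby switchover triggered"
        return "NE transiently unreachable by the NMS"
    # Standby board: the description covers only the COMMUN_FAIL case,
    # so hard_bad_reported is not used here.
    return "COMMUN_FAIL (0x01 0x00 0x03 0xff 0xff) until the standby board starts working"


if __name__ == "__main__":
    print(expected_symptom(Role.ACTIVE, hard_bad_reported=False))
    print(expected_symptom(Role.ACTIVE, hard_bad_reported=True))
    print(expected_symptom(Role.STANDBY, hard_bad_reported=False))
```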
Identification method:
The problem has been triggered if both of the following conditions are met:
1. The system control board is an SSN1GSCC02 or SSN3GSCC02 board manufactured before August 9, 2013. Query the type and production date (derived from the bar code) of the system control board using either of the following methods:
Method 1:
On the Navigator, run the :cfg-get-bdinfo:sccbdid command, where sccbdid is the ID of the SCC board (24 in the following example):
cfg-get-bdinfo:24
BOARD-ALL-INFO
VERSION-INFO
[ArchivesInfo Version]
ArchivesInfoVersion=2.0
$[Log]
$Log1=14336,03020DCM0,2009-12-03
[Board Properties]
BoardType=SSN3GSCC02
BarCode=020DCM109B000016
BOM=BOM03020DCM00
Manufactured=2009-12-03
ManufactureCode=1
The type and production date of the system control board can be obtained from BoardType and BarCode respectively.
Method 2:
On the NMS (for example, U2000), choose Inventory > Project Document > Board Manufacture Information from the menu bar.
In the example output above, the type and bar code of the system control board are SSN3GSCC02 and 020DCM109B000016 respectively. The 9th character of the bar code is a hexadecimal digit indicating the year, and the 10th indicates the month. Here the 9th character is 9 (2009) and the 10th is B, which equals decimal 11. Therefore, the board queried above was manufactured in November 2009.
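Condition 1 can be checked mechanically from the Method 1 output. The Python sketch below collects the key=value pairs of the [Board Properties] section, decodes the year and month from the 9th and 10th bar-code characters, and compares them with the August 9, 2013 cutoff. It assumes the year character is a hexadecimal offset from 2000 (the example only confirms that 9 means 2009); the function names are hypothetical. Because the bar code carries no day, a board made in August 2013 is reported as "check-date", in which case the Manufactured field can be consulted.

```python
from __future__ import annotations

AFFECTED_TYPES = {"SSN1GSCC02", "SSN3GSCC02"}
# Affected boards were manufactured before August 9, 2013; the bar code
# only encodes year and month, so compare at month granularity.
CUTOFF_YEAR_MONTH = (2013, 8)

# Output of cfg-get-bdinfo:24 copied from Method 1
SAMPLE_OUTPUT = """\
cfg-get-bdinfo:24
BOARD-ALL-INFO
VERSION-INFO
[ArchivesInfo Version]
ArchivesInfoVersion=2.0
$[Log]
$Log1=14336,03020DCM0,2009-12-03
[Board Properties]
BoardType=SSN3GSCC02
BarCode=020DCM109B000016
BOM=BOM03020DCM00
Manufactured=2009-12-03
ManufactureCode=1
"""


def parse_board_properties(text: str) -> dict[str, str]:
    """Collect the key=value pairs of the [Board Properties] section."""
    props: dict[str, str] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(("[", "$[")):
            # A section header: only [Board Properties] is of interest
            in_section = line == "[Board Properties]"
        elif in_section and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def barcode_year_month(barcode: str) -> tuple[int, int]:
    """Decode (year, month) from the 9th and 10th bar-code characters.

    Assumption: the 9th character is a hexadecimal offset from 2000;
    the example above only confirms that '9' means 2009.
    """
    if len(barcode) < 10:
        raise ValueError(f"bar code too short: {barcode!r}")
    year = 2000 + int(barcode[8], 16)
    month = int(barcode[9], 16)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month character {barcode[9]!r}")
    return year, month


def condition1(props: dict[str, str]) -> str:
    """Return 'yes', 'no', or 'check-date' for identification condition 1."""
    if props.get("BoardType") not in AFFECTED_TYPES:
        return "no"
    made = barcode_year_month(props["BarCode"])
    if made < CUTOFF_YEAR_MONTH:
        return "yes"
    if made == CUTOFF_YEAR_MONTH:
        # August 2013: the bar code has no day, so check the exact date
        return "check-date"
    return "no"


if __name__ == "__main__":
    props = parse_board_properties(SAMPLE_OUTPUT)
    print(props["BoardType"], barcode_year_month(props["BarCode"]), condition1(props))
```

With the sample output, the board decodes to (2009, 11) and condition 1 evaluates to "yes", matching the manual reading above.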
2. The SSN1GSCC02 or SSN3GSCC02 board is reset three or more times within 24 hours due to abnormal orderwire interrupts. To query the reset records of the board, run the :mon-get-errlog:sccbdid command on the Navigator, as shown below:
mon-get-errlog:24
No.1035: 2013-08-14 08:16:49 BOARD=024 TYPE=0xf0000040 SOFTTYPE=002
No.1036: 2013-08-14 08:31:46 BOARD=024 TYPE=0xf0000040 SOFTTYPE=002
No.1037: 2013-08-14 15:47:49 BOARD=024 TYPE=0xf0000010 SOFTTYPE=002
No.1038: 2013-08-14 16:56:39 BOARD=024 TYPE=0xf0000010 SOFTTYPE=002
084# 2013-08-14 08:14:45 fatal task errorcode=0xffffffff, Line 00000 in reboot:interrup
085# 2013-08-14 08:29:43 fatal task errorcode=0xffffffff, Line 00000 in reboot:interrup
086# 2013-08-14 15:45:26 ExcptErr: pc=0x00000000, SR=0x81030, FaultAddr=0x00000700, DataBuff=0x019894c4, taskname=interrupt
087# 2013-08-14 16:54:16 ExcptErr: pc=0x00000000, SR=0x81030, FaultAddr=0x00000700, DataBuff=0x019894c4, taskname=interrupt
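In the query results, if the reset type (TYPE) is 0xf0000040 or 0xf0000010 and the string interrup appears in the corresponding reset cause, the SSN1GSCC02 or SSN3GSCC02 board was reset due to an abnormal orderwire interrupt. In the example, each reset cause (084# to 087#) is logged about two minutes before its reset record (No.1035 to No.1038), so all four resets qualify, and all of them occurred within 24 hours.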
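Condition 2 can be sketched the same way. The Python code below parses the reset records and reset causes, keeps resets whose TYPE is 0xf0000040 or 0xf0000010 and whose cause contains interrup, and then checks whether at least three of them fall within any 24-hour span. How a cause entry is matched to a reset record is an assumption here: the hypothetical `PAIR_WINDOW` pairs each record with the latest cause logged up to 10 minutes before it, which fits the roughly two-minute gap in the example but is not a documented rule.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Reset records and reset causes copied from the example above
SAMPLE_LOG = """\
No.1035: 2013-08-14 08:16:49 BOARD=024 TYPE=0xf0000040 SOFTTYPE=002
No.1036: 2013-08-14 08:31:46 BOARD=024 TYPE=0xf0000040 SOFTTYPE=002
No.1037: 2013-08-14 15:47:49 BOARD=024 TYPE=0xf0000010 SOFTTYPE=002
No.1038: 2013-08-14 16:56:39 BOARD=024 TYPE=0xf0000010 SOFTTYPE=002
084# 2013-08-14 08:14:45 fatal task errorcode=0xffffffff, Line 00000 in reboot:interrup
085# 2013-08-14 08:29:43 fatal task errorcode=0xffffffff, Line 00000 in reboot:interrup
086# 2013-08-14 15:45:26 ExcptErr: pc=0x00000000, SR=0x81030, FaultAddr=0x00000700, DataBuff=0x019894c4, taskname=interrupt
087# 2013-08-14 16:54:16 ExcptErr: pc=0x00000000, SR=0x81030, FaultAddr=0x00000700, DataBuff=0x019894c4, taskname=interrupt
"""

TS_FMT = "%Y-%m-%d %H:%M:%S"
RESET_RE = re.compile(r"^No\.\d+: (\S+ \S+) BOARD=\d+ TYPE=(0x[0-9a-fA-F]+)")
CAUSE_RE = re.compile(r"^\d+# (\S+ \S+) (.*)$")
ORDERWIRE_TYPES = {"0xf0000040", "0xf0000010"}
# Hypothetical pairing rule: a cause belongs to the reset record logged
# within PAIR_WINDOW after it (about 2 minutes in the example).
PAIR_WINDOW = timedelta(minutes=10)


def orderwire_resets(text: str) -> list[datetime]:
    """Return timestamps of resets caused by abnormal orderwire interrupts."""
    resets, causes = [], []
    for line in text.splitlines():
        line = line.strip()
        if m := RESET_RE.match(line):
            resets.append((datetime.strptime(m.group(1), TS_FMT), m.group(2).lower()))
        elif m := CAUSE_RE.match(line):
            causes.append((datetime.strptime(m.group(1), TS_FMT), m.group(2)))
    hits = []
    for ts, reset_type in resets:
        if reset_type not in ORDERWIRE_TYPES:
            continue
        # Latest cause logged no more than PAIR_WINDOW before this reset record
        candidates = [c for c in causes if ts - PAIR_WINDOW <= c[0] <= ts]
        if candidates and "interrup" in max(candidates, key=lambda c: c[0])[1]:
            hits.append(ts)
    return sorted(hits)


def condition2(text: str, threshold: int = 3) -> bool:
    """True if at least `threshold` qualifying resets fall within 24 hours."""
    hits = orderwire_resets(text)
    for i, start in enumerate(hits):
        in_window = [t for t in hits[i:] if t - start < timedelta(hours=24)]
        if len(in_window) >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(len(orderwire_resets(SAMPLE_LOG)), condition2(SAMPLE_LOG))
```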
[Root Cause]
A small number of SSN1GSCC02 and SSN3GSCC02 boards have degraded clock signals because of the marginal (critical-state) quality of the clock signals on the SPI of the orderwire chip, combined with individual variances among boards (for example, in PCB layout and build-out resistor precision). As a result, registers are read or written incorrectly, an abnormal orderwire interrupt occurs, and the SSN1GSCC02 or SSN3GSCC02 board is reset. In addition, orderwire interrupts become nested, causing a stack overflow. The overflow mistakenly modifies the register values that control power switching, and the HARD_BAD alarm is reported.
This problem is fixed in SSN1GSCC02 and SSN3GSCC02 boards manufactured after August 9, 2013.
[Impact and Risk]
1. On a traditional NE housing two system control boards, a warm reset does not trigger an active/standby switchover. If the active system control board, or the only system control board, of a traditional NE experiences a warm reset, the NE becomes transiently unreachable by the NMS. If the standby system control board experiences a reset, there is no impact or risk.
2. If an active/standby switchover occurs while data is being backed up in batches, data loss may occur.
3. If the active system control board of an ASON NE experiences a warm reset, the NE temporarily loses its rerouting capability, so services may be interrupted.
Preventive measures:
Replace the affected system control board with a Huawei OSN 3500 SSN1GSCC02 or SSN3GSCC02 board manufactured in September 2013 or later, or with an SSN4GSCC board. For ASON NEs, replace the faulty SCC board with an SSN4GSCC board.
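The replacement rule can be summarized as a small check. This Python sketch uses a hypothetical `CandidateBoard` record (board type plus production year and month, for example decoded from the bar code) and accepts SSN4GSCC for any NE, but SSN1GSCC02 or SSN3GSCC02 only for non-ASON NEs and only if manufactured in September 2013 or later. It does not model the OSN 3500 qualifier or any board type not mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CandidateBoard:
    """Hypothetical description of a replacement system control board."""
    board_type: str
    year: int   # production year, e.g. decoded from the bar code
    month: int  # production month (1-12)


def acceptable_replacement(board: CandidateBoard, is_ason_ne: bool) -> bool:
    """Check a candidate board against the preventive measures above."""
    if board.board_type == "SSN4GSCC":
        # Acceptable for any NE and required for ASON NEs
        return True
    if is_ason_ne:
        return False
    if board.board_type in {"SSN1GSCC02", "SSN3GSCC02"}:
        # Only boards manufactured in September 2013 or later
        return (board.year, board.month) >= (2013, 9)
    # Other board types are not covered by the preventive measures
    return False


if __name__ == "__main__":
    print(acceptable_replacement(CandidateBoard("SSN3GSCC02", 2013, 11), is_ason_ne=False))
    print(acceptable_replacement(CandidateBoard("SSN3GSCC02", 2013, 11), is_ason_ne=True))
```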