Monday, August 25, 2014

Notice on Orderwire Interruptions in GSCC Boards of OSN3500 and OSN7500

Abstract:
A few SSN1GSCC02 and SSN3GSCC02 boards of the OSN 3500 and OSN 7500 have degraded clock signals because the clock signal quality on the SPI of the orderwire chip is marginal and boards vary individually (for example, in PCB layout and build-out resistor precision). There is a low possibility that these boards experience warm resets. If a warm reset occurs on an active SSN1GSCC02 or SSN3GSCC02 board, the NE housing the board may become transiently unreachable by the NMS.
[Problem Description]
Trigger condition:
There is a low possibility that SSN1GSCC02 and SSN3GSCC02 boards manufactured earlier than August 9, 2013 encounter this problem.
Symptom:
After a warm reset caused by an abnormal orderwire interruption, the SCC board occasionally reports a HARD_BAD (0xff 0xff 0xff 0x00 0x01) alarm that cannot be cleared. If the active SCC board is warm reset but does not report the HARD_BAD alarm, the NE is transiently unreachable by the NMS; if the active SCC board reports the HARD_BAD alarm, active/standby switching is triggered. If the standby SCC board is warm reset, it reports the COMMUN_FAIL (0x01 0x00 0x03 0xff 0xff) alarm, and the COMMUN_FAIL alarm clears after the standby SCC board starts working.
Identification method:
When the following two conditions are met, it can be determined that the problem is triggered:
1. The SSN1GSCC02 or SSN3GSCC02 board manufactured earlier than August 9th, 2013 serves as the system control board. Query the type and production date (obtained from the bar code) of a system control board using either of the following methods:
Method 1:
On the Navigator, run the :cfg-get-bdinfo:sccbdid command, as shown below:
cfg-get-bdinfo:24
BOARD-ALL-INFO
VERSION-INFO
[ArchivesInfo Version]
ArchivesInfoVersion=2.0
$[Log]
$Log1=14336,03020DCM0,2009-12-03
[Board Properties]
BoardType=SSN3GSCC02
BarCode=020DCM109B000016
BOM=BOM03020DCM00
Manufactured=2009-12-03
ManufactureCode=1
The type and production date of the system control board can be obtained from BoardType and BarCode respectively.
Method 2:
On the NMS (for example, U2000), choose Inventory > Project Document > Board Manufacture Information from the menu bar.
The type and production date of the system control board can be obtained from BoardType and BarCode respectively.
In the example above, the type and bar code of the system control board are SSN3GSCC02 and 020DCM109B000016 respectively. The 9th character of a bar code indicates the year and the 10th character indicates the month, each as a hexadecimal digit. Here the 9th character is 9 and the 10th is B, which equals decimal 11. Therefore, the system control board queried above was manufactured in November 2009.
2. An SSN1GSCC02 or SSN3GSCC02 board is reset three times or more within 24 hours due to abnormal orderwire interruptions. On the Navigator, run the :mon-get-errlog:sccbdid command to query the reset record of an SSN1GSCC02 or SSN3GSCC02 board, as shown below:
mon-get-errlog:24
No.1035:    2013-08-14 08:16:49 BOARD=024      TYPE=0xf0000040    SOFTTYPE=002
No.1036:    2013-08-14 08:31:46 BOARD=024      TYPE=0xf0000040    SOFTTYPE=002
No.1037:    2013-08-14 15:47:49 BOARD=024      TYPE=0xf0000010    SOFTTYPE=002
No.1038:    2013-08-14 16:56:39 BOARD=024      TYPE=0xf0000010     SOFTTYPE=002
084# 2013-08-14 08:14:45       fatal task errorcode=0xffffffff,     Line 00000 in reboot:interrup
085# 2013-08-14 08:29:43       fatal task errorcode=0xffffffff,     Line 00000 in reboot:interrup
086# 2013-08-14 15:45:26    ExcptErr: pc=0x00000000,   SR=0x81030,   FaultAddr=0x00000700,   DataBuff=0x019894c4,    taskname=interrupt
087# 2013-08-14 16:54:16   ExcptErr: pc=0x00000000,    SR=0x81030,   FaultAddr=0x00000700,   DataBuff=0x019894c4,    taskname=interrupt
In the query results, if the reset type (TYPE) is 0xf0000040 or 0xf0000010 and the string interrup appears in the corresponding reset cause, the SSN1GSCC02 or SSN3GSCC02 board was reset due to an abnormal orderwire interruption.
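The two identification conditions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a Huawei tool: all function names are hypothetical, the year digit is assumed to be an offset from 2000 (consistent with 9 meaning 2009 in the example, but not confirmed for other years), and each reset record is assumed to pair with a cause record logged no more than five minutes before it, as in the sample output.

```python
import re
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
TS = r"(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)"
RESET_TYPES = {"0xf0000040", "0xf0000010"}
# Line shapes are taken from the sample mon-get-errlog output only.
RESET_RE = re.compile(r"No\.\d+:\s+" + TS + r"\s+BOARD=\d+\s+TYPE=(0x[0-9a-fA-F]+)")
CAUSE_RE = re.compile(r"\d+#\s+" + TS + r"\s+(.*)")


def barcode_year_month(barcode):
    """Decode (year, month) from the 9th and 10th bar code characters."""
    year = 2000 + int(barcode[8], 16)  # ASSUMPTION: year digit is an offset from 2000
    month = int(barcode[9], 16)        # hexadecimal digit, for example B -> 11
    return year, month


def orderwire_resets(log_text, pairing=timedelta(minutes=5)):
    """Return timestamps of resets attributed to orderwire interruptions."""
    resets, causes = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = RESET_RE.match(line)
        if m:
            if m.group(2).lower() in RESET_TYPES:
                resets.append(datetime.strptime(m.group(1), FMT))
            continue
        m = CAUSE_RE.match(line)
        if m and "interrup" in m.group(2):
            causes.append(datetime.strptime(m.group(1), FMT))
    # ASSUMPTION: a cause record precedes its reset record by at most `pairing`.
    return sorted(
        t for t in resets
        if any(timedelta(0) <= t - c <= pairing for c in causes)
    )


def reset_condition_met(log_text, threshold=3, window=timedelta(hours=24)):
    """True if `threshold` or more qualifying resets fall within `window`."""
    times = orderwire_resets(log_text)
    return any(
        times[i + threshold - 1] - times[i] <= window
        for i in range(len(times) - threshold + 1)
    )
```

Because the bar code yields only a year and a month, a board decoded as August 2013 cannot be placed before or after August 9 from the bar code alone; the Manufactured field in the query output would be needed for that case.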
[Root Cause]
A few SSN1GSCC02 and SSN3GSCC02 boards have degraded clock signals because the clock signal quality on the SPI of the orderwire chip is marginal and boards vary individually (for example, in PCB layout and build-out resistor precision). As a result, the register is incorrectly read or written, an abnormal orderwire interruption occurs, and the SSN1GSCC02 or SSN3GSCC02 board is reset. In addition, orderwire interruptions become nested and a stack overflow occurs. As a result, the register values of power switching control are mistakenly modified, and the HARD_BAD alarm is reported.
This problem is addressed in SSN1GSCC02 and SSN3GSCC02 boards manufactured later than August 9th, 2013.
[Impact and Risk]
1. On a traditional NE, the system control board experiences a warm reset, and no active/standby switching is triggered even if the NE houses two system control boards. If the active system control board or the single system control board of a traditional NE experiences a warm reset, the NE becomes transiently unreachable by the NMS. If the standby system control board of the NE experiences a reset, there is no impact or risk.
2. During the active/standby switching, if data is being backed up in batches, data loss may occur.
3. If the active system control board of an ASON NE experiences a warm reset, the NE temporarily loses its rerouting capability, so services may be interrupted.
Preventive measures:
Replace the problematic system control board with an SSN1GSCC02 or SSN3GSCC02 board manufactured in September 2013 or later, or with an SSN4GSCC board. For ASON NEs, replace the faulty SCC board with an SSN4GSCC board.

Tuesday, August 12, 2014

Cautions for the Failure to Download Configuration Data on OptiX OSN 1500

Abstract: A user fails to download configuration data to an OSN 1500 with an interface board of dynamic ports due to the lack of logical configurations on tributary interface boards. As a result, services may be interrupted.
Symptom
1. The download fails or partially fails. Dynamic port and service data fail to be downloaded to NEs, as shown in the following figure.
[Figure: download result]
2. The download succeeds. However, the interface board becomes dimmed on the NE Panel. This is only a display issue and does not affect NMS functionality.
Before the download:
[Figure: screenshot before the download]
After the download:
[Figure: screenshot after the download]
Identification Method
The preceding symptoms occur if boards are installed in slots 6 and 7 on the OSN 1500A. (Slots 6 and 7 are used to install interface boards.)
The preceding symptoms occur if boards are installed in slots 14, 15, 16, and 17 on the OSN 1500B. (Slots 14, 15, 16, and 17 are used to install interface boards.)
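The slot rule above can be expressed as a small lookup. This is an illustrative sketch only: the function names and the input format (a list of occupied slot numbers) are hypothetical, and the slot sets are copied from the two statements above.

```python
# Interface-board slots per NE type, as stated in the identification method.
INTERFACE_SLOTS = {
    "OSN 1500A": {6, 7},
    "OSN 1500B": {14, 15, 16, 17},
}


def occupied_interface_slots(ne_type, installed_slots):
    """Return the occupied slots that are interface-board slots for this NE type."""
    return sorted(INTERFACE_SLOTS[ne_type] & set(installed_slots))


def may_show_symptoms(ne_type, installed_slots):
    """True if at least one interface-board slot holds a board."""
    return bool(occupied_interface_slots(ne_type, installed_slots))
```

For example, `may_show_symptoms("OSN 1500A", [1, 6, 12])` is true because slot 6 is an interface-board slot on the OSN 1500A, whereas the same slot list on an OSN 1500B is not flagged.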
[Root Cause]
For OSN 1500A and OSN 1500B versions earlier than 5.36.30.10: Interface boards are not applied to NEs due to incorrect logical configurations. As a result, dynamic port and service data fail to be downloaded.
For OSN 1500A 5.36.30.10 and later versions: The NMS does not apply interface boards last. Therefore, if the processing board on which interface board application depends is not added, configuration data fails to be downloaded.
[Impact and Risks]
1. NE type involved: OSN 1500
Board types affected:
N1D75S, N1D12S, N1D12B, N1MU04, N1TSB8, N1ETF8, N1EFF8, N1EU04, N1OU08, N2OU08, N1EU08, N1TSB4, N1ETS8, N1DM12, N1D34S, N1C34S, L75S, and L12S
2. Scenario 1: The download fails or partially fails. Logical boards are not downloaded to the OSN 1500. As a result, services depending on the logical boards fail to be downloaded and the download process ends.
3. Scenario 2: The download succeeds. However, logical boards are not downloaded to the OSN 1500. This does not affect services and alarm reporting but affects board display.
[Measures and Solutions]
Preventive Measure
Do not download data to an OSN 1500 with interface boards on the NMS. Use DC to back up and restore the database of the OSN 1500.
Restoration Measure
If an OSN 1500 is equipped with interface boards and the configuration data fails to be downloaded, perform the following restoration measures:
  •  If the database of the OSN 1500 has been backed up, use DC to restore the database.
  •  If the database of the OSN 1500 has not been backed up, restore the download in either of the following ways:
− Install a patch to resolve this problem and download the configuration data to the OSN 1500 again.
− Change the OSN 1500 to a preconfigured NE on the U2000, record the associated services of its interface boards, delete the interface boards and associated services, then download configuration data to the OSN 1500, and add interface boards and associated services to it.
Solution
The problem has been resolved in the following NMS versions:
  •  NMS versions involved for the OSN 1500A and OSN 1500B (versions earlier than 5.36.30.10):
− U2000 V100R005C00CP6032 and later
− U2000 V100R006C00CP3011 and later
− U2000 V100R006C02CP3001 and later
  •  NMS versions involved for the OSN 1500A (5.36.30.10 and later):
− U2000 V100R002C01CP5035 and later
− U2000 V100R006C02SPC302 and later

Thursday, August 7, 2014

Please Pay Attention to TN11SCC Failure to Restart from a Power Outage in Huawei OptiX OSN 6800

Summary:
There is a small probability that TN11SCC boards equipped with memory chips of a specific lot fail to restart from a power outage.
[Problem Description]
Trigger condition:
1. A TN11SCC board uses a memory chip of a specific lot.
2. The TN11SCC board experiences a power outage (for example, the board is removed and reinserted, or the subrack is powered off).
Symptom:
An OSN 6800 NE is unreachable by the NMS. As confirmed onsite, after the TN11SCC board on the NE is powered on again, the STAT indicator blinks green quickly and the PROG indicator blinks red quickly at the same time on the front panel of the board. The board fails to restart.
Identification method:
A TN11SCC board has the problem described in this notice when the following conditions are met:
1. An abnormality occurs after the SCC board or the subrack is powered on again from a power outage.
2. The BOM number of the TN11SCC board is included in the attachment of this notice.
[Root Cause]
The TN11SCC boards use memory chips of a specific lot that have a bug. Due to this bug, electrical discharge inside the chips when the board loses power causes a chip failure. Consequently, the boards fail to restart. The following table provides statistics on the failure rates for different working durations of the boards.
Working Duration (Year)    Chip Failure Rate
2                          0.002%
3                          0.67%
4                          1.58%
5                          2.61%
6                          3.73%
7                          4.90%
8                          6.11%
9                          7.73%
10                         8.65%
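To illustrate how these figures might be used when planning spare parts, the sketch below estimates the number of boards expected to fail if every listed board loses power once. It is a rough estimate under assumptions the notice does not state: each board is treated independently, and the rate for a board's working duration is treated as its failure probability at that power outage. The function name is hypothetical.

```python
# Chip failure rate in percent, keyed by working duration in years (from the table).
FAILURE_RATE_PERCENT = {
    2: 0.002, 3: 0.67, 4: 1.58, 5: 2.61, 6: 3.73,
    7: 4.90, 8: 6.11, 9: 7.73, 10: 8.65,
}


def expected_failures(board_ages_years):
    """Sum per-board failure probabilities for boards that all lose power once.

    ASSUMPTION: boards fail independently and the table rate applies per board.
    Ages outside the table (below 2 or above 10 years) are rejected because
    the notice gives no data for them.
    """
    total = 0.0
    for age in board_ages_years:
        if age not in FAILURE_RATE_PERCENT:
            raise ValueError(f"no failure-rate data for {age} years")
        total += FAILURE_RATE_PERCENT[age] / 100.0
    return total
```

For example, 200 boards that have each worked for 5 years give an estimate of 200 × 2.61% ≈ 5.2 boards failing to restart.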

[Impact and Risk]
1. If the TN11SCC board is removed and reinserted, there is a small probability that the TN11SCC board fails to restart and the NE is unreachable by the NMS.
2. If the NE is powered off and then powered on again, there is a small probability that the TN11SCC board on the NE fails to restart and consequently services are interrupted.
[Measures and Solutions]
Recovery measures:
If an OSN 6800 NE needs to be powered off and powered on again (for example, in a subrack power supply cutover scenario), back up the NE database and prepare SCC spare parts as follows:
1. For a TN11SCC board of a version earlier than V100R004C04SPC800, prepare a spare TN11SCC board of the same software version.
2. For a TN11SCC board of V100R004C04SPC800 or a later version, prepare a spare TN11SCC or TN52SCC board of the same software version.
When the TN11SCC board on the OSN 6800 NE fails to restart from a power outage, replace the board with the prepared spare part in a timely manner and restore the NE database using the backup database.
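The spare-part rule above depends on comparing software versions against V100R004C04SPC800. The following sketch shows one way to make that comparison. It assumes version strings of the form VxxxRxxxCxx with an optional SPCxxx suffix, and it treats a missing SPC suffix as 0; both are assumptions, and the function names are hypothetical. In either case the spare board must run the same software version as the board it replaces.

```python
import re

# ASSUMPTION: versions look like V100R004C04SPC800, with the SPC part optional.
VERSION_RE = re.compile(r"V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?$")


def parse_version(version):
    """Turn a version string into a comparable tuple of integers."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version format: {version!r}")
    # A missing SPC suffix is treated as 0 (assumption).
    return tuple(int(g) if g is not None else 0 for g in m.groups())


THRESHOLD = parse_version("V100R004C04SPC800")


def spare_board_types(version):
    """Return the board types that may serve as a spare for this version."""
    if parse_version(version) < THRESHOLD:
        return ["TN11SCC"]
    return ["TN11SCC", "TN52SCC"]
```

For instance, `spare_board_types("V100R004C04SPC600")` returns only TN11SCC, while `spare_board_types("V100R004C04SPC800")` returns both TN11SCC and TN52SCC.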
Workarounds:
None.
Preventive measures:
Replace involved TN11SCC boards.
Material handling after replacement:
Send the replaced TN11SCC boards back to Huawei HQ for chip repair.

Wednesday, August 6, 2014

Rectification of the Cross-Connect Board BD_STATUS Alarm on OSN 3500

Summary:
Creep occurs on the SD585 solder balls of some cross-connect boards of the OptiX OSN 3500. As a result, these boards reset repeatedly, fail to work, and report BD_STATUS alarms.
[Problem Description]
Trigger conditions:
There is a possibility that this problem occurs as a result of long-term exposure of the boards involved to high temperature.
Symptom:
1. Boards are reset repeatedly and fail to work.
2. NEs may report BD_STATUS alarms or COMMUN_FAIL alarms on cross-connect boards. System control boards may report BIOS_STATUS alarms on cross-connect boards.
Identification method:
The problem can be identified if the following two conditions are met:
1. The boards are manufactured in Feb 2010, Apr 2010, May 2010, Jul 2010, Aug 2010, or Mar 2011.
2. The cross-connect boards are repeatedly reset and fail to work. The board BOMs are found in the attached Board Delivery Information.
[Root Cause]
The SD585 chip radiator uses a thick spring, which applies high levels of stress to the chip. The solder balls of the SD585 chip may deform and short-circuit as a result of long-term exposure to high temperature. Therefore, the board repeatedly resets and fails to work.
[Impact and Risk]
1. Services are not affected because the 1+1 protection scheme is configured on the cross-connect boards. When a cross-connect board is faulty, services are switched over to the other cross-connect board.
2. In extreme cases, both cross-connect boards configured in the 1+1 protection scheme become faulty within a short time. As a result, the NE fails to work and services are interrupted.
[Measures and Solutions]
Recovery measures:
Replace the faulty cross-connect board.
Solution:
Replace the faulty boards.
[Rectification Instructions] 
Replace the faulty boards.
SSN1SXCSA02 (03030KBM) boards are out of production, so when filling in an electronic process application for batch board rectification, request SSN1SXCSA01 (03030DKF) boards as replacements for SSN1SXCSA02 boards. The two board types can be installed together in the same NE or can fully replace each other. It is recommended that the active and standby boards be of the same type.