
6 January 2012

EqualLogic Kernel Panic on a NetBSD Processor

After running EqualLogic arrays for a couple of years now, I have never experienced any real issues with them. Running what appears to be a customized version of NetBSD, they have proven to be fast, reliable systems.

I was performing what is now a routine firmware update from version 5.1.1 to 5.1.2. I had already done this on a PS6000 we have set up for replication, with no issues whatsoever, and had left upgrading our production array until I was happy that the replication array was working fine on the new firmware. Satisfied, I decided to update our production PS6000 to FW 5.1.2. During the upgrade I received a kernel panic and subsequent panic recovery; it looks like the controller crashed during the update. This resulted in the iSCSI targets being unavailable for a couple of minutes, the controllers did not fail over, and I now have a firmware mismatch between the two control modules.

Below are some of the event logs from the EqualLogic PS6000 array concerned.

Logs -

ERROR event from storage array

subsystem: SP
event: 15.4.1
time: Thu Jan 5 16:27:41 2012

NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on a NetBSD processor.

-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 15.4.5
time: Thu Jan 5 16:27:41 2012

Saved CPU registers, CPU 12
at 0000000000000000 v0 0000000004010000 v1 ffffffffbef08010
a0 0000000000000104 a1 0000000000000000 a2 0000000000000104 a3 ffffffffd250fa30
t0 0000000000000001 t1 0000000000000000 t2 ffffffffd2247000 t3 000000000000000c
t4 0000000000000000 t5 ffffffffe000d830 t6 ffffffffe05972cc t7 0000000000000000
s0 ffffffff8092caf8 s1 ffffffffd250fa18 s2 ffffffff8092caf8 s3 ffffffff81306f70
s4 000000000000ff01 s5 0000000000040000 s6 ffffffff80000400 s7 0000000000000000
t8 0000000000000000 t9 ffffffffc0042f00 k0 000000000010401f k1 000000000010405f
gp ffffffff8003e000 sp ffffffffd250f890 s8 0000000000000080 ra ffffffff804bfe7c
-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 15.4.6
time: Thu Jan 5 16:27:42 2012

Saved CP0 registers, CPU 12
sr 40488005 badva d2247000 epc 8063e75c errorepc 804bfe50
cause 0000000c errctl 00000000 cacheeri 00000000 cacheerd 00000000
buserr 0000000000000000 cacheerrdpa 0000000000000000
-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 15.4.7
time: Thu Jan 5 16:27:42 2012

Saved function call stack, CPU 12
804bfe50 804bfe7c 804c0060 8063132c 80632a50 80632850 8063a2ec 80637668
804cff78 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-----------------------------------------

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 2.4.0
time: Thu Jan 5 16:27:43 2012

Panic recovery from CPU0 with reason 'PortRetryIO - Retry of Srb 0xc0104e00, LBA 5, Outstanding 1'.
-----------------------------------------
WARNING event from storage array
subsystem: SP
event: 28.3.51
time: Thu Jan 5 16:27:58 2012

Warning health conditions currently exist.
Correct these conditions before they affect array operation.
Control modules are initializing. Control module failover cannot occur until the initialization completes.
There are 1 outstanding health conditions. Correct these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 34.4.0
time: Thu Jan 5 16:27:58 2012

Version mismatch between active and secondary control modules.
Active Firmware Version: Storage Array Firmware V5.1.1 (R189834) (H2)
Secondary Firmware Version: Storage Array Firmware V5.1.2 (R197668)

-----------------------------------------
ERROR event from storage array
subsystem: SP
event: 34.4.3
time: Thu Jan 5 16:27:58 2012

Secondary control module cannot be used to mirror the cache on the active control module.
-----------------------------------------

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ERROR event from storage array
subsystem: logevent
event: 30.4.1
time: Thu Jan 5 16:35:56 2012

Array firmware update from version V5.1.1 to V5.1.2 failed. Reason: Error verifying new firmware integrity.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I have generated .dgo diagnostic files and opened a case with EqualLogic support, and am just waiting to hear back. I must say that at this point the EqualLogic array is still functioning on the controller running FW 5.1.1: all volumes are online, iSCSI targets are in session, and there seems to be no drop in performance. As for controller failover, if the currently active controller failed I wouldn't like to say whether normal failover to the secondary controller would occur, given the mismatch in controller firmware versions.

I will update this post as I get more information on what caused the Kernel Panic.

UPDATE 06/01/2012 @ 14:48 GMT

I spoke to EqualLogic support. According to them there is a rare issue with FW versions 5.1.x where, when you run a firmware update through the Java-based Group Manager GUI, the firmware file can occasionally become corrupted. Support said this is a very rare issue and that it will be addressed in future updates.
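Given that the failure reason logged above was "Error verifying new firmware integrity", a sensible precaution (and the route a commenter below ended up taking) is to avoid the GUI upload path: check the firmware kit against the MD5 checksum Dell publishes on the download page, then push it to the array over FTP and apply it from the CLI. The snippet below is only a minimal sketch of that verify-then-upload idea in Python; the kit file name, checksum, group IP, and credentials are placeholders, not values from this case.

```python
import ftplib
import hashlib
import pathlib

# --- Placeholders: substitute your own values ---------------------------
KIT_FILE = pathlib.Path("kit_V5.1.2.tgz")          # firmware kit from Dell (name assumed)
EXPECTED_MD5 = "0123456789abcdef0123456789abcdef"  # MD5 published on the download page
GROUP_IP = "192.168.1.10"                          # EqualLogic group/member IP (example)
USER, PASSWORD = "grpadmin", "secret"              # array admin credentials

def md5sum(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# 1. Verify the kit is not corrupted before it goes anywhere near the array.
actual = md5sum(KIT_FILE)
if actual.lower() != EXPECTED_MD5.lower():
    raise SystemExit(f"Checksum mismatch: got {actual}, expected {EXPECTED_MD5}")
print("Firmware kit checksum OK")

# 2. Upload the kit to the array over FTP (binary transfer).
with ftplib.FTP(GROUP_IP, timeout=60) as ftp:
    ftp.login(USER, PASSWORD)
    with KIT_FILE.open("rb") as fh:
        ftp.storbinary(f"STOR {KIT_FILE.name}", fh)
print("Upload complete - now apply the update from the array CLI")
```

On the array side, applying an uploaded kit is done with the update command from a serial or SSH/telnet CLI session as grpadmin; the exact prompts vary between firmware versions, so treat the above purely as a sketch rather than the full EqualLogic procedure from Dell's documentation.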

Case Archived.

 

    Comments (6)
    1. I just ran into this same exact issue. Can you share how it was resolved?

    2. How can I open a .DGO file? It's encrypted…

    3. I’m experiencing the same error message right now with an attempted upgrade from 5.0.2 to 5.1.4-H1. EqualLogic Support have asked me to generate diagnostic logs. Will post again once I have something to report.

      • Correction for earlier post, I am upgrading from 5.0.2 to 5.2.4(H1).

        After talking to EqualLogic support, they instructed me to physically pull out the controller that had been updated to 5.2.4(H1), as it was not the active controller, and then run an update on the active controller (5.0.2) via the CLI/FTP. The update was successful, so I plugged the other controller back in and then did a successful test failover to it.

        The DGO files are encrypted, and EqualLogic support have the tool that can read them.

        No specific reason was given to me for the firmware update failure, but I suspect it was for the same reason.

        I’ll be doing all future updates via the CLI now.

