@@ -4,15 +4,17 @@
 February 2, 2006

 Current document maintainer:
- Linas Vepstas <linas@austin.ibm.com>
+ Linas Vepstas <linasvepstas@gmail.com>
+ updated by Richard Lary <rlary@us.ibm.com>
+ and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009


 Many PCI bus controllers are able to detect a variety of hardware
 PCI errors on the bus, such as parity errors on the data and address
 busses, as well as SERR and PERR errors. Some of the more advanced
 chipsets are able to deal with these errors; these include PCI-E chipsets,
-and the PCI-host bridges found on IBM Power4 and Power5-based pSeries
-boxes. A typical action taken is to disconnect the affected device,
+and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
+pSeries boxes. A typical action taken is to disconnect the affected device,
 halting all I/O to it. The goal of a disconnection is to avoid system
 corruption; for example, to halt system memory corruption due to DMA's
 to "wild" addresses. Typically, a reconnection mechanism is also
@@ -37,10 +39,11 @@ is forced by the need to handle multi-function devices, that is,
 devices that have multiple device drivers associated with them.
 In the first stage, each driver is allowed to indicate what type
 of reset it desires, the choices being a simple re-enabling of I/O
-or requesting a hard reset (a full electrical #RST of the PCI card).
-If any driver requests a full reset, that is what will be done.
+or requesting a slot reset.

-After a full reset and/or a re-enabling of I/O, all drivers are
+If any driver requests a slot reset, that is what will be done.
+
+After a reset and/or a re-enabling of I/O, all drivers are
 again notified, so that they may then perform any device setup/config
 that may be required. After these have all completed, a final
 "resume normal operations" event is sent out.
@@ -101,7 +104,7 @@ if it implements any, it must implement error_detected(). If a callback
 is not implemented, the corresponding feature is considered unsupported.
 For example, if mmio_enabled() and resume() aren't there, then it
 is assumed that the driver is not doing any direct recovery and requires
-a reset. If link_reset() is not implemented, the card is assumed as
+a slot reset. If link_reset() is not implemented, the card is assumed to
 not care about link resets. Typically a driver will want to know about
 a slot_reset().

@@ -111,7 +114,7 @@ sequence described below.

 STEP 0: Error Event
 -------------------
-PCI bus error is detect by the PCI hardware. On powerpc, the slot
+A PCI bus error is detected by the PCI hardware. On powerpc, the slot
 is isolated, in that all I/O is blocked: all reads return 0xffffffff,
 all writes are ignored.

@@ -139,7 +142,7 @@ The driver must return one of the following result codes:
 a chance to extract some diagnostic information (see
 mmio_enable, below).
 - PCI_ERS_RESULT_NEED_RESET:
-Driver returns this if it can't recover without a hard
+Driver returns this if it can't recover without a
 slot reset.
 - PCI_ERS_RESULT_DISCONNECT:
 Driver returns this if it doesn't want to recover at all.
@@ -169,11 +172,11 @@ is STEP 6 (Permanent Failure).

 >>> The current powerpc implementation doesn't much care if the device
 >>> attempts I/O at this point, or not. I/O's will fail, returning
->>> a value of 0xff on read, and writes will be dropped. If the device
->>> driver attempts more than 10K I/O's to a frozen adapter, it will
->>> assume that the device driver has gone into an infinite loop, and
->>> it will panic the kernel. There doesn't seem to be any other
->>> way of stopping a device driver that insists on spinning on I/O.
+>>> a value of 0xff on read, and writes will be dropped. If more than
+>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
+>>> assumes that the device driver has gone into an infinite loop
+>>> and prints an error to syslog. A reboot is then required to
+>>> get the device working again.

 STEP 2: MMIO Enabled
 -------------------
@@ -182,15 +185,14 @@ DMA), and then calls the mmio_enabled() callback on all affected
 device drivers.

 This is the "early recovery" call. IOs are allowed again, but DMA is
-not (hrm... to be discussed, I prefer not), with some restrictions. This
-is NOT a callback for the driver to start operations again, only to
-peek/poke at the device, extract diagnostic information, if any, and
-eventually do things like trigger a device local reset or some such,
-but not restart operations. This is callback is made if all drivers on
-a segment agree that they can try to recover and if no automatic link reset
-was performed by the HW. If the platform can't just re-enable IOs without
-a slot reset or a link reset, it wont call this callback, and instead
-will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
+not, with some restrictions. This is NOT a callback for the driver to
+start operations again, only to peek/poke at the device, extract diagnostic
+information, if any, and possibly do things like trigger a device-local
+reset or some such, but not restart operations. This callback is made if
+all drivers on a segment agree that they can try to recover and if no automatic
+link reset was performed by the HW. If the platform can't just re-enable IOs
+without a slot reset or a link reset, it will not call this callback, and
+instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

 >>> The following is proposed; no platform implements this yet:
 >>> Proposal: All I/O's should be done _synchronously_ from within
@@ -228,9 +230,6 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
 If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
 proceeds to STEP 4 (Slot Reset)

->>> The current powerpc implementation does not implement this callback.
-
-
 STEP 3: Link Reset
 ------------------
 The platform resets the link, and then calls the link_reset() callback
@@ -253,16 +252,33 @@ The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5

 >>> The current powerpc implementation does not implement this callback.

-
 STEP 4: Slot Reset
 ------------------
-The platform performs a soft or hard reset of the device, and then
-calls the slot_reset() callback.

-A soft reset consists of asserting the adapter #RST line and then
+In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
+platform will perform a slot reset on the requesting PCI device(s).
+The actual steps taken by a platform to perform a slot reset
+will be platform-dependent. Upon completion of slot reset, the
+platform will call the device slot_reset() callback.
+
+Powerpc platforms implement two levels of slot reset:
+soft reset (default) and fundamental (optional) reset.
+
+Powerpc soft reset consists of asserting the adapter #RST line and then
 restoring the PCI BAR's and PCI configuration header to a state
 that is equivalent to what it would be after a fresh system
 power-on followed by power-on BIOS/system firmware initialization.
+Soft reset is also known as hot-reset.
+
+Powerpc fundamental reset is supported by PCI Express cards only
+and results in the device's state machines, hardware logic, port states and
+configuration registers being initialized to their default conditions.
+
+For most PCI devices, a soft reset will be sufficient for recovery.
+Optional fundamental reset is provided to support a limited number
+of PCI Express devices for which a soft reset is not sufficient
+for recovery.
+
 If the platform supports PCI hotplug, then the reset might be
 performed by toggling the slot electrical power off/on.

@@ -274,10 +290,12 @@ may result in hung devices, kernel panics, or silent data corruption.

 This call gives drivers the chance to re-initialize the hardware
 (re-download firmware, etc.). At this point, the driver may assume
-that he card is in a fresh state and is fully functional. In
-particular, interrupt generation should work normally.
+that the card is in a fresh state and is fully functional. The slot
+is unfrozen and the driver has full access to PCI config space,
+memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
+will also be available.

-Drivers should not yet restart normal I/O processing operations
+Drivers should not restart normal I/O processing operations
 at this point. If all device drivers report success on this
 callback, the platform will call resume() to complete the sequence,
 and let the driver restart normal I/O processing.
@@ -302,11 +320,21 @@ driver performs device init only from PCI function 0:
 - PCI_ERS_RESULT_DISCONNECT
 Same as above.

+Drivers for PCI Express cards that require a fundamental reset must
+set the needs_freset bit in the pci_dev structure in their probe function.
+For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
+PCI card types:
+
++	/* Set EEH reset type to fundamental if required by hba */
++	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
++		pdev->needs_freset = 1;
++
+
 Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
 Failure).

->>> The current powerpc implementation does not currently try a
->>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
+>>> The current powerpc implementation does not try a power-cycle
+>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
 >>> However, it probably should.


@@ -348,7 +376,7 @@ software errors.

 Conclusion; General Remarks
 ---------------------------
-The way those callbacks are called is platform policy. A platform with
+The way the callbacks are called is platform policy. A platform with
 no slot reset capability may want to just "ignore" drivers that can't
 recover (disconnect them) and try to let other cards on the same segment
 recover. Keep in mind that in most real life cases, though, there will
@@ -361,8 +389,8 @@ That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
 device on the segment starting from the error detection and until the
-resume callback is sent, at which point interrupts are expected to be
-fully operational.
+slot_reset callback is called, at which point interrupts are expected
+to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
 a driver that gets an interrupt after detecting an error, or that detects
@@ -381,16 +409,23 @@ anyway :)
 >>> Implementation details for the powerpc platform are discussed in
 >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt

->>> As of this writing, there are six device drivers with patches
->>> implementing error recovery. Not all of these patches are in
+>>> As of this writing, there is a growing list of device drivers with
+>>> patches implementing error recovery. Not all of these patches are in
 >>> mainline yet. These may be used as "examples":
 >>>
->>> drivers/scsi/ipr.c
->>> drivers/scsi/sym53cxx_2
+>>> drivers/scsi/ipr
+>>> drivers/scsi/sym53c8xx_2
+>>> drivers/scsi/qla2xxx
+>>> drivers/scsi/lpfc
+>>> drivers/net/bnx2.c
 >>> drivers/next/e100.c
 >>> drivers/net/e1000
+>>> drivers/net/e1000e
 >>> drivers/net/ixgb
+>>> drivers/net/ixgbe
+>>> drivers/net/cxgb3
 >>> drivers/net/s2io.c
+>>> drivers/net/qlge

 The End
 -------
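
The voting rule that the patched text describes ("If any driver requests
a slot reset, that is what will be done") can be illustrated with a small
userspace sketch. This is not kernel code. The names ers_result,
next_step and merge_votes() are hypothetical stand-ins for
pci_ers_result_t and the platform's recovery logic, and the enum values
are illustrative only. Treating a single DISCONNECT vote as a permanent
failure for the whole segment is an assumption of the sketch, not
something the text above states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins for the kernel's pci_ers_result_t
 * values; the names and numeric values are illustrative only.
 */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT
};

/* Recovery steps, numbered as in the document. */
enum next_step {
	STEP_MMIO_ENABLED = 2,
	STEP_SLOT_RESET = 4,
	STEP_PERMANENT_FAILURE = 6
};

/*
 * Merge the error_detected() votes of all drivers on a segment.
 * A single NEED_RESET vote forces a slot reset for everyone.
 * ASSUMPTION: a single DISCONNECT vote ends recovery for the segment.
 */
enum next_step merge_votes(const enum ers_result *votes, size_t n)
{
	enum next_step step = STEP_MMIO_ENABLED;
	size_t i;

	assert(votes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (votes[i] == ERS_DISCONNECT)
			return STEP_PERMANENT_FAILURE;
		if (votes[i] == ERS_NEED_RESET)
			step = STEP_SLOT_RESET;
	}
	return step;
}
```

With two drivers voting CAN_RECOVER and NEED_RESET, merge_votes()
selects the slot reset step, which is why a multi-function card is
reset as a whole even when only one of its drivers asks for it.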