
  1. The MSI Driver Guide HOWTO
  2. Tom L Nguyen tom.l.nguyen@intel.com
  3. 10/03/2003
  4. Revised Feb 12, 2004 by Martine Silbermann
  5. email: Martine.Silbermann@hp.com
  6. Revised Jun 25, 2004 by Tom L Nguyen
  7. 1. About this guide
  8. This guide describes the basics of Message Signaled Interrupts (MSI),
  9. the advantages of using MSI over traditional interrupt mechanisms,
  10. and how to enable your driver to use MSI or MSI-X. Also included is
  11. a Frequently Asked Questions (FAQ) section.
  12. 1.1 Terminology
  13. PCI devices can be single-function or multi-function. In either case,
  14. when this text talks about enabling or disabling MSI on a "device
  15. function," it is referring to one specific PCI device and function and
  16. not to all functions on a PCI device (unless the PCI device has only
  17. one function).
  18. 2. Copyright 2003 Intel Corporation
  19. 3. What is MSI/MSI-X?
  20. Message Signaled Interrupt (MSI), as described in the PCI Local Bus
  21. Specification Revision 2.3 or later, is an optional feature for PCI
  22. devices and a required feature for PCI Express devices. MSI enables a
  23. device function to request service by sending an Inbound Memory Write
  24. on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
  25. Because MSI is generated in the form of a Memory Write, all transaction
  26. conditions, such as a Retry, Master-Abort, Target-Abort or normal
  27. completion, are supported.
  28. A PCI device that supports MSI must also support the pin IRQ assertion
  29. interrupt mechanism to provide backward compatibility for systems that
  30. do not support MSI. In systems which support MSI, the bus driver is
  31. responsible for initializing the message address and message data of
  32. the device function's MSI/MSI-X capability structure during device
  33. initial configuration.
  34. An MSI capable device function indicates MSI support by implementing
  35. the MSI/MSI-X capability structure in its PCI capability list. The
  36. device function may implement both the MSI capability structure and
  37. the MSI-X capability structure; however, the bus driver should not
  38. enable both.
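The capability-list lookup described above can be sketched in userspace C. This is only an illustration, assuming a 256-byte snapshot of a function's configuration space held in memory; find_capability() is a hypothetical helper, not a kernel API. The offsets and capability IDs are the standard PCI values.

```c
#include <stdint.h>

/* Standard PCI configuration-space offsets and capability IDs. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10	/* capability list present */
#define PCI_CAPABILITY_LIST  0x34	/* offset of first capability pointer */
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_MSIX      0x11

/*
 * Hypothetical helper: walk the capability list in a config-space
 * snapshot and return the offset of the capability with the given ID,
 * or 0 if the function does not implement it.
 */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48;	/* bound the walk in case the list is malformed */

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xFC;	/* next-capability pointer */
	}
	return 0;
}
```

A function that implements both structures returns a non-zero offset for both PCI_CAP_ID_MSI and PCI_CAP_ID_MSIX; only one of the two should then be enabled.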
  39. The MSI capability structure contains the Message Control register,
  40. Message Address register and Message Data register. These registers
  41. provide the bus driver control over MSI. The Message Control register
  42. indicates the MSI capability supported by the device. The Message
  43. Address register specifies the target address and the Message Data
  44. register specifies the characteristics of the message. To request
  45. service, the device function writes the content of the Message Data
  46. register to the target address. The device and its software driver
  47. are prohibited from writing to these registers.
  48. The MSI-X capability structure is an optional extension to MSI. It
  49. uses an independent and separate capability structure. There are
  50. some key advantages to implementing the MSI-X capability structure
  51. over the MSI capability structure as described below.
  52. - Support a larger maximum number of vectors per function.
  53. - Provide the ability for system software to configure
  54. each vector with an independent message address and message
  55. data, specified by a table that resides in Memory Space.
  56. - MSI and MSI-X both support per-vector masking. Per-vector
  57. masking is an optional extension of MSI but a required
  58. feature for MSI-X. Per-vector masking provides the kernel the
  59. ability to mask/unmask a single MSI while running its
  60. interrupt service routine. If per-vector masking is
  61. not supported, then the device driver should provide the
  62. hardware/software synchronization to ensure that the device
  63. generates MSI when the driver wants it to do so.
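To make the per-vector table and mask bit concrete, here is a minimal sketch that models an MSI-X table in ordinary memory. It assumes the 16-byte entry layout from the PCI specification (address low, address high, data, vector control with the mask in bit 0). The helper names are hypothetical. On real hardware the table lives in device MMIO space and is owned by the PCI subsystem, not by the driver.

```c
#include <stdint.h>

#define MSIX_ENTRY_CTRL_MASKBIT 0x1u	/* bit 0 of Vector Control */

/* One MSI-X table entry: independent address and data per vector. */
struct msix_table_entry {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;
};

/* Hypothetical helper: mask (masked != 0) or unmask a single vector. */
static void msix_set_mask(struct msix_table_entry *table,
			  unsigned int entry, int masked)
{
	if (masked)
		table[entry].vector_ctrl |= MSIX_ENTRY_CTRL_MASKBIT;
	else
		table[entry].vector_ctrl &= ~MSIX_ENTRY_CTRL_MASKBIT;
}

/* Hypothetical helper: report whether a single vector is masked. */
static int msix_is_masked(const struct msix_table_entry *table,
			  unsigned int entry)
{
	return (table[entry].vector_ctrl & MSIX_ENTRY_CTRL_MASKBIT) != 0;
}
```

Masking one entry leaves every other entry untouched, which is what allows a single vector to be held off while its service routine runs.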
  64. 4. Why use MSI?
  65. MSI simplifies board design by allowing board designers to remove
  66. out-of-band interrupt routing. MSI is another step towards a
  67. legacy-free environment.
  68. Due to increasing pressure on chipset and processor packages to
  69. reduce pin count, the need for interrupt pins is expected to
  70. diminish over time. Devices, due to pin constraints, may implement
  71. messages to increase performance.
  72. PCI Express endpoints use INTx emulation (in-band messages) instead
  73. of IRQ pin assertion. Using INTx emulation requires interrupt
  74. sharing among devices connected to the same node (PCI bridge) while
  75. MSI is unique (non-shared) and does not require BIOS configuration
  76. support. As a result, the PCI Express technology requires MSI
  77. support for better interrupt performance.
  78. Using MSI enables the device functions to support two or more
  79. vectors, which can be configured to target different CPUs to
  80. increase scalability.
  81. 5. Configuring a driver to use MSI/MSI-X
  82. By default, the kernel will not enable MSI/MSI-X on any device that
  83. supports this capability. The CONFIG_PCI_MSI kernel option
  84. must be selected to enable MSI/MSI-X support.
  85. 5.1 Including MSI/MSI-X support into the kernel
  86. To allow MSI/MSI-X capable device drivers to selectively enable
  87. MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
  88. below), the VECTOR based scheme needs to be enabled by setting
  89. CONFIG_PCI_MSI during kernel config.
  90. Since the target of the inbound message is the local APIC,
  91. CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
  92. 5.2 Configuring for MSI support
  93. Because the existing Linux kernel assigns vectors in a non-contiguous
  94. fashion, this version does not support multiple messages, regardless
  95. of whether a device function is capable of supporting more than one
  96. vector. Enabling MSI on a device function's MSI capability structure
  97. requires a device driver to call the function pci_enable_msi()
  98. explicitly.
  99. 5.2.1 API pci_enable_msi
  100. int pci_enable_msi(struct pci_dev *dev)
  101. With this new API, a device driver that wants to have MSI
  102. enabled on its device function must call this API to enable MSI.
  103. A successful call will initialize the MSI capability structure
  104. with ONE vector, regardless of whether a device function is
  105. capable of supporting multiple messages. This vector replaces the
  106. pre-assigned dev->irq with a new MSI vector. To avoid a conflict
  107. between the newly assigned vector and the existing pre-assigned
  108. vector, a device driver must call this API before calling request_irq().
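The required call order can be sketched as follows. The block compiles in userspace only because it supplies its own stand-ins: struct pci_dev here is a cut-down mock, pci_enable_msi() is a mock with the signature given above, and mock_request_irq() is a simplified placeholder for the kernel's request_irq(), which also takes a handler, flags, a name and a dev_id. The vector value 193 and the name example_probe() are arbitrary assumptions.

```c
/* Cut-down mock of the kernel's struct pci_dev. */
struct pci_dev {
	unsigned int irq;	/* pre-assigned (legacy) vector */
	int msi_capable;	/* mock-only: pretend hardware capability */
	int msi_enabled;	/* mock-only: current interrupt mode */
};

/* Mock of pci_enable_msi(): 0 on success, negative on failure. */
static int pci_enable_msi(struct pci_dev *dev)
{
	if (!dev->msi_capable)
		return -1;
	dev->irq = 193;		/* arbitrary stand-in for the new MSI vector */
	dev->msi_enabled = 1;
	return 0;
}

/* Mock-only: remembers which vector the handler was bound to. */
static unsigned int bound_irq;

/* Simplified placeholder for request_irq(). */
static int mock_request_irq(unsigned int irq)
{
	bound_irq = irq;
	return 0;
}

/* Hypothetical driver init path: enable MSI first, then bind the handler. */
static int example_probe(struct pci_dev *dev)
{
	/*
	 * A failure is not fatal here: dev->irq still holds the
	 * pre-assigned vector, so the driver stays in legacy mode.
	 */
	(void)pci_enable_msi(dev);

	/* dev->irq is read only after the enable attempt. */
	return mock_request_irq(dev->irq);
}
```

Reversing the two calls would bind the handler to the pre-assigned vector and then have dev->irq replaced underneath it, which is the conflict the text warns about.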
  109. 5.2.2 API pci_disable_msi
  110. void pci_disable_msi(struct pci_dev *dev)
  111. This API should always be used to undo the effect of pci_enable_msi()
  112. when a device driver is unloading. This API restores dev->irq with
  113. the pre-assigned IOAPIC vector and switches a device's interrupt
  114. mode to PCI pin-irq assertion/INTx emulation mode.
  115. Note that a device driver should always call free_irq() on the MSI vector
  116. that it has done request_irq() on before calling this API. Failure to do
  117. so results in a BUG_ON(); the device is left with MSI enabled and
  118. leaks its vector.
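The teardown order can be illustrated the same way. Again, everything here is a userspace stand-in: the struct is a mock, mock_free_irq() is a simplified placeholder for free_irq(), and the mock pci_disable_msi() counts would-be BUG_ON() hits in bug_count instead of halting, so that the failure case can be observed.

```c
/* Cut-down mock of the kernel's struct pci_dev. */
struct pci_dev {
	unsigned int irq;		/* vector currently in use */
	unsigned int legacy_irq;	/* mock-only: vector saved at enable time */
	int msi_enabled;		/* mock-only: current interrupt mode */
	int handler_bound;		/* mock-only: set between request and free */
};

/* Mock-only: counts what would be BUG_ON() hits in the kernel. */
static int bug_count;

/* Simplified placeholder for free_irq(). */
static void mock_free_irq(struct pci_dev *dev)
{
	dev->handler_bound = 0;
}

/* Mock of pci_disable_msi(), following the behaviour described above. */
static void pci_disable_msi(struct pci_dev *dev)
{
	if (dev->handler_bound) {
		bug_count++;	/* the kernel would hit BUG_ON() here */
		return;		/* device stays in MSI mode, vector leaks */
	}
	dev->irq = dev->legacy_irq;	/* restore pre-assigned vector */
	dev->msi_enabled = 0;
}

/* Hypothetical driver unload path: free the vector, then disable MSI. */
static void example_remove(struct pci_dev *dev)
{
	mock_free_irq(dev);
	pci_disable_msi(dev);
}
```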
  119. 5.2.3 MSI mode vs. legacy mode diagram
  120. The below diagram shows the events which switch the interrupt
  121. mode on the MSI-capable device function between MSI mode and
  122. PIN-IRQ assertion mode.
  123. ------------ pci_enable_msi    --------------------------
  124. |          | <===============  |                        |
  125. | MSI MODE |                   | PIN-IRQ ASSERTION MODE |
  126. |          | ===============>  |                        |
  127. ------------ pci_disable_msi   --------------------------
  128. Figure 1. MSI Mode vs. Legacy Mode
  129. In Figure 1, a device operates by default in legacy mode. Legacy
  130. in this context means PCI pin-irq assertion or PCI-Express INTx
  131. emulation. A successful MSI request (using pci_enable_msi()) switches
  132. a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
  133. stored in dev->irq will be saved by the PCI subsystem and a newly
  134. assigned MSI vector will replace dev->irq.
  135. To return to its default mode, a device driver should always call
  136. pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
  137. device driver should always call free_irq() on the MSI vector it has
  138. done request_irq() on before calling pci_disable_msi(). Failure to do
  139. so results in a BUG_ON(); the device is left with MSI enabled and
  140. leaks its vector. Otherwise, the PCI subsystem restores a device's
  141. dev->irq with a pre-assigned IOAPIC vector and marks the released
  142. MSI vector as unused.
  143. Once the MSI vector is marked as unused, there is no guarantee that
  144. the PCI subsystem will reserve it for a device. Depending on
  145. the availability of current PCI vector resources and the number of
  146. MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
  147. For the case where the PCI subsystem re-assigns this MSI vector to
  148. another driver, a request to switch back to MSI mode may result
  149. in being assigned a different MSI vector or a failure if no more
  150. vectors are available.
  151. 5.3 Configuring for MSI-X support
  152. Because system software can configure each vector of
  153. the MSI-X capability structure with an independent message address
  154. and message data, the non-contiguous fashion in vector assignment of
  155. the existing Linux kernel has no impact on supporting multiple
  156. messages on an MSI-X capable device function. Enabling MSI-X on
  157. a device function's MSI-X capability structure requires its device
  158. driver to call the function pci_enable_msix() explicitly.
  159. The function pci_enable_msix(), once invoked, enables either
  160. all or nothing, depending on the current availability of PCI vector
  161. resources. If the PCI vector resources are available for the number
  162. of vectors requested by a device driver, this function will configure
  163. the MSI-X table of the MSI-X capability structure of a device with
  164. requested messages. For example, a device may be capable of
  165. supporting a maximum of 32 vectors while its software driver
  166. may request only 4 vectors. It is recommended
  167. that the device driver call this function once during the
  168. initialization phase of the device driver.
  169. Unlike the function pci_enable_msi(), the function pci_enable_msix()
  170. does not replace the pre-assigned IOAPIC dev->irq with a new MSI
  171. vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
  172. into the 'vector' field of each element of the second argument.
  173. Note that the pre-assigned IOAPIC dev->irq is valid only if the device
  174. operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
  175. using dev->irq by the device driver to request for interrupt service
  176. may result in unpredictable behavior.
  177. For each MSI-X vector granted, a device driver is responsible for calling
  178. other functions like request_irq(), enable_irq(), etc. to enable
  179. this vector with its corresponding interrupt service handler. It is
  180. a device driver's choice to assign all vectors with the same
  181. interrupt service handler or each vector with a unique interrupt
  182. service handler.
  183. 5.3.1 Handling MMIO address space of MSI-X Table
  184. The PCI 3.0 specification has implementation notes that MMIO address
  185. space for a device's MSI-X structure should be isolated so that the
  186. software system can set different pages for controlling accesses to the
  187. MSI-X structure. The implementation of MSI support requires the PCI
  188. subsystem, not a device driver, to maintain full control of the MSI-X
  189. table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
  190. table/MSI-X PBA. A device driver is prohibited from requesting the MMIO
  191. address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem
  192. will fail enabling MSI-X on its hardware device when it calls the function
  193. pci_enable_msix().
  194. 5.3.2 API pci_enable_msix
  195. int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
  196. This API enables a device driver to request the PCI subsystem
  197. to enable MSI-X messages on its hardware device. Depending on
  198. the availability of PCI vector resources, the PCI subsystem enables
  199. either all or none of the requested vectors.
  200. Argument 'dev' points to the device (pci_dev) structure.
  201. Argument 'entries' is a pointer to an array of msix_entry structs.
  202. The number of entries is indicated in argument 'nvec'.
  203. struct msix_entry is defined in /driver/pci/msi.h:
  204. struct msix_entry {
  205.         u16 vector;     /* kernel uses to write alloc vector */
  206.         u16 entry;      /* driver uses to specify entry */
  207. };
  208. A device driver is responsible for initializing the field 'entry' of
  209. each element with a unique entry supported by MSI-X table. Otherwise,
  210. -EINVAL will be returned as a result. A successful return of zero
  211. indicates the PCI subsystem completed initializing each of the requested
  212. entries of the MSI-X table with message address and message data.
  213. Last but not least, the PCI subsystem will write the 1:1
  214. vector-to-entry mapping into the field 'vector' of each element. A
  215. device driver is responsible for keeping track of allocated MSI-X
  216. vectors in its internal data structure.
  217. A return of zero indicates that the requested number of MSI-X vectors
  218. was successfully allocated. A return of greater than zero indicates
  219. an MSI-X vector shortage. A return of less than zero indicates
  220. a failure. This failure may be a result of duplicate entries
  221. specified in the second argument, or a result of no available vector,
  222. or a result of failing to initialize MSI-X table entries.
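The following sketch shows how a driver might fill in the 'entry' fields and interpret the three classes of return value. It is a userspace illustration under stated assumptions: struct pci_dev is a mock carrying only a pool of free vectors, and pci_enable_msix() is a mock that imitates the all-or-nothing behaviour and the return-value convention described above. The vector numbers, the value returned on shortage, EXAMPLE_NVEC and example_setup_msix() are all hypothetical.

```c
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry {
	u16 vector;	/* kernel uses to write alloc vector */
	u16 entry;	/* driver uses to specify entry */
};

/* Cut-down mock of the kernel's struct pci_dev. */
struct pci_dev {
	int vectors_free;	/* mock-only: vectors the system can still grant */
};

/* Mock of pci_enable_msix(): grants all requested vectors or none. */
static int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
			   int nvec)
{
	int i, j;

	for (i = 0; i < nvec; i++)
		for (j = i + 1; j < nvec; j++)
			if (entries[i].entry == entries[j].entry)
				return -22;	/* -EINVAL: duplicate entry */

	if (dev->vectors_free == 0)
		return -1;			/* failure: no vector available */
	if (nvec > dev->vectors_free)
		return dev->vectors_free;	/* > 0: shortage (mock's choice of value) */

	for (i = 0; i < nvec; i++)
		entries[i].vector = (u16)(64 + i);	/* arbitrary mock vectors */
	dev->vectors_free -= nvec;
	return 0;
}

#define EXAMPLE_NVEC 4	/* hypothetical: the driver wants 4 vectors */

/* Returns the number of vectors now usable in MSI-X mode (0 = legacy). */
static int example_setup_msix(struct pci_dev *dev, struct msix_entry *entries)
{
	int i, rc;

	for (i = 0; i < EXAMPLE_NVEC; i++)
		entries[i].entry = (u16)i;	/* each element needs a unique entry */

	rc = pci_enable_msix(dev, entries, EXAMPLE_NVEC);
	if (rc == 0)
		return EXAMPLE_NVEC;	/* entries[i].vector is now valid */

	/* rc > 0: vector shortage; rc < 0: failure. Nothing was enabled. */
	return 0;
}
```

After a successful call, the driver would pass each entries[i].vector, not dev->irq, to request_irq().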
  223. 5.3.3 API pci_disable_msix
  224. void pci_disable_msix(struct pci_dev *dev)
  225. This API should always be used to undo the effect of pci_enable_msix()
  226. when a device driver is unloading. Note that a device driver should
  227. always call free_irq() on all MSI-X vectors it has done request_irq()
  228. on before calling this API. Failure to do so results in a BUG_ON();
  229. the device is left with MSI-X enabled and leaks its vectors.
  230. 5.3.4 MSI-X mode vs. legacy mode diagram
  231. The below diagram shows the events which switch the interrupt
  232. mode on the MSI-X capable device function between MSI-X mode and
  233. PIN-IRQ assertion mode (legacy).
  234. -------------- pci_enable_msix(,,n)   --------------------------
  235. |            | <===============       |                        |
  236. | MSI-X MODE |                        | PIN-IRQ ASSERTION MODE |
  237. |            | ===============>       |                        |
  238. -------------- pci_disable_msix       --------------------------
  239. Figure 2. MSI-X Mode vs. Legacy Mode
  240. In Figure 2, a device operates by default in legacy mode. A
  241. successful MSI-X request (using pci_enable_msix()) switches a
  242. device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
  243. stored in dev->irq will be saved by the PCI subsystem; however,
  244. unlike MSI mode, the PCI subsystem will not replace dev->irq with the
  245. assigned MSI-X vector because the PCI subsystem already writes the 1:1
  246. vector-to-entry mapping into the field 'vector' of each element
  247. specified in the second argument.
  248. To return to its default mode, a device driver should always call
  249. pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
  250. a device driver should always call free_irq() on all MSI-X vectors it
  251. has done request_irq() on before calling pci_disable_msix(). Failure
  252. to do so results in a BUG_ON(); the device is left with MSI-X
  253. enabled and leaks its vectors. Otherwise, the PCI subsystem switches a
  254. device function's interrupt mode from MSI-X mode to legacy mode and
  255. marks all allocated MSI-X vectors as unused.
  256. Once the MSI-X vectors are marked as unused, there is no guarantee
  257. that the PCI subsystem will reserve them for a device. Depending on
  258. the availability of current PCI vector resources and the number of
  259. MSI/MSI-X requests from other drivers, these MSI-X vectors may be
  260. re-assigned.
  261. For the case where the PCI subsystem re-assigned these MSI-X vectors
  262. to other drivers, a request to switch back to MSI-X mode may result
  263. in being assigned another set of MSI-X vectors or a failure if no
  264. more vectors are available.
  265. 5.4 Handling function implementing both MSI and MSI-X capabilities
  266. For the case where a function implements both MSI and MSI-X
  267. capabilities, the PCI subsystem enables a device to run either in MSI
  268. mode or MSI-X mode but not both. A device driver determines whether it
  269. wants MSI or MSI-X enabled on its hardware device. Once a device
  270. driver requests MSI, for example, it is prohibited from requesting
  271. MSI-X; in other words, a device driver is not permitted to ping-pong
  272. between MSI mode and MSI-X mode at run time.
  273. 5.5 Hardware requirements for MSI/MSI-X support
  274. MSI/MSI-X support requires support from both system hardware and
  275. individual hardware device functions.
  276. 5.5.1 Required x86 hardware support
  277. Since the target of the MSI address is the CPU's local APIC, enabling
  278. MSI/MSI-X support in the Linux kernel is dependent on whether existing
  279. system hardware supports local APIC. Users should verify that their
  280. system supports local APIC operation by testing that it runs when
  281. CONFIG_X86_LOCAL_APIC=y.
  282. In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
  283. however, in a UP environment, users must manually set
  284. CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
  285. CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
  286. MSI-capable device drivers to selectively enable MSI/MSI-X.
  287. Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
  288. vectors are newly allocated at runtime and MSI/MSI-X support does not
  289. depend on BIOS support. This key independence enables MSI/MSI-X
  290. support on future IOxAPIC free platforms.
  291. 5.5.2 Device hardware support
  292. The hardware device function supports MSI by indicating the
  293. MSI/MSI-X capability structure on its PCI capability list. By
  294. default, this capability structure will not be initialized by
  295. the kernel to enable MSI during the system boot. In other words,
  296. the device function is running on its default pin assertion mode.
  297. Note that in many cases hardware supporting MSI has bugs,
  298. which may result in system hangs. The software driver of specific
  299. MSI-capable hardware is responsible for deciding whether to call
  300. pci_enable_msi() or not. A return of zero indicates the kernel
  301. successfully initialized the MSI/MSI-X capability structure of the
  302. device function. The device function is now running in MSI/MSI-X mode.
  303. 5.6 How to tell whether MSI/MSI-X is enabled on device function
  304. At the driver level, a return of zero from the function call of
  305. pci_enable_msi()/pci_enable_msix() indicates to a device driver that
  306. its device function is initialized successfully and ready to run in
  307. MSI/MSI-X mode.
  308. At the user level, users can use the command 'cat /proc/interrupts'
  309. to display the vectors allocated for devices and their interrupt
  310. MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows that
  311. MSI mode is enabled on a SCSI Adaptec 39320D Ultra320 controller.
  312.            CPU0       CPU1
  313.   0:     324639          0    IO-APIC-edge  timer
  314.   1:       1186          0    IO-APIC-edge  i8042
  315.   2:          0          0          XT-PIC  cascade
  316.  12:       2797          0    IO-APIC-edge  i8042
  317.  14:       6543          0    IO-APIC-edge  ide0
  318.  15:          1          0    IO-APIC-edge  ide1
  319. 169:          0          0   IO-APIC-level  uhci-hcd
  320. 185:          0          0   IO-APIC-level  uhci-hcd
  321. 193:        138         10         PCI-MSI  aic79xx
  322. 201:         30          0         PCI-MSI  aic79xx
  323. 225:         30          0   IO-APIC-level  aic7xxx
  324. 233:         30          0   IO-APIC-level  aic7xxx
  325. NMI:          0          0
  326. LOC:     324553     325068
  327. ERR:          0
  328. MIS:          0
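The same check can be done programmatically. The sketch below assumes the caller has already read the contents of /proc/interrupts into a string; count_msi_lines() is a hypothetical helper that counts the lines whose type column contains "PCI-MSI" (a substring match, so "PCI-MSI-X" lines are counted as well).

```c
#include <string.h>

/*
 * Hypothetical helper: count lines in 'text' that mention "PCI-MSI".
 * 'text' is a NUL-terminated copy of /proc/interrupts.
 */
static int count_msi_lines(const char *text)
{
	const char *line = text;
	int count = 0;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *hit = strstr(line, "PCI-MSI");

		/* Only count a match that falls inside the current line. */
		if (hit && (size_t)(hit - line) < len)
			count++;

		line += len + (eol ? 1 : 0);
	}
	return count;
}
```

Applied to the listing above, the two aic79xx lines are the ones that match.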
  329. 6. MSI quirks
  330. Several PCI chipsets or devices are known to not support MSI.
  331. The PCI stack provides 3 possible levels of MSI disabling:
  332. * on a single device
  333. * on all devices behind a specific bridge
  334. * globally
  335. 6.1. Disabling MSI on a single device
  336. Under some circumstances it might be required to disable MSI on a
  337. single device. This may be achieved by either not calling pci_enable_msi()
  338. at all, or setting the pci_dev->no_msi flag beforehand (most of the time
  339. in a quirk).
  340. 6.2. Disabling MSI below a bridge
  341. The vast majority of MSI quirks are required by PCI bridges not
  342. being able to route MSI between busses. In this case, MSI has to be
  343. disabled on all devices behind this bridge. This is achieved by setting
  344. the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
  345. subordinate bus. There is no need to set the same flag on bridges that
  346. are below the broken bridge. When pci_enable_msi() is called to enable
  347. MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
  348. flag in all parent busses of the device.
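The parent-bus check can be sketched as a simple walk up the bus hierarchy. This is not the kernel's pci_msi_supported(); it is a userspace illustration with mock structures, a hypothetical function name msi_allowed(), and an illustrative value for PCI_BUS_FLAGS_NO_MSI.

```c
#include <stddef.h>

#define PCI_BUS_FLAGS_NO_MSI 0x1u	/* illustrative value for this sketch */

/* Cut-down mock of the kernel's struct pci_bus. */
struct pci_bus {
	struct pci_bus *parent;	/* NULL for a root bus */
	unsigned int bus_flags;
};

/* Cut-down mock of the kernel's struct pci_dev. */
struct pci_dev {
	struct pci_bus *bus;	/* bus the device sits on */
	int no_msi;		/* per-device quirk flag */
};

/* Hypothetical check: 1 if MSI may be enabled on dev, 0 otherwise. */
static int msi_allowed(const struct pci_dev *dev)
{
	const struct pci_bus *bus;

	if (dev->no_msi)
		return 0;

	/* One flagged bus anywhere above the device is enough to refuse. */
	for (bus = dev->bus; bus != NULL; bus = bus->parent)
		if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
			return 0;

	return 1;
}
```

Because the walk visits every ancestor, flagging only the subordinate bus of the broken bridge is sufficient; buses further down need no flag of their own.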
  349. Some bridges actually support dynamic MSI support enabling/disabling
  350. by changing some bits in their PCI configuration space (especially
  351. the Hypertransport chipsets such as the nVidia nForce and Serverworks
  352. HT2000). It may then be required to update the NO_MSI flag on the
  353. corresponding devices in the sysfs hierarchy. To enable MSI support
  354. on device "0000:00:0e", do:
  355. echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus
  356. To disable MSI support, echo 0 instead of 1. Note that it should be
  357. used with caution since changing this value might break interrupts.
  358. 6.3. Disabling MSI globally
  359. Some extreme cases may require disabling MSI globally on the system.
  360. For now, the only known case is a Serverworks PCI-X chipset (MSI is
  361. not supported on several busses that are not all connected to the
  362. chipset in the Linux PCI hierarchy). In the vast majority of other
  363. cases, disabling only behind a specific bridge is enough.
  364. For debugging purposes, the user may also pass pci=nomsi on the kernel
  365. command-line to explicitly disable MSI globally. But, once the
  366. appropriate quirks are added to the kernel, this option should not be
  367. required anymore.
  368. 6.4. Finding why MSI cannot be enabled on a device
  369. Assuming that MSI is not enabled on a device, you should look at
  370. dmesg to find messages that quirks may output when disabling MSI
  371. on some devices, some bridges or even globally.
  372. Then, lspci -t gives the list of bridges above a device. Reading
  373. /sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
  374. is enabled (1) or disabled (0). If 0 is found in the msi_bus file of
  375. any bridge above the device, MSI cannot be enabled.
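Reading the msi_bus attribute can also be done from a small C program. The sketch below assumes the sysfs path layout quoted above; msi_bus_path() and read_msi_bus() are hypothetical helper names, and the bridge addresses to pass in would come from the lspci -t output.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "/sys/bus/pci/devices/<bdf>/msi_bus".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int msi_bus_path(const char *bdf, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/msi_bus", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: returns 1 if the file reports MSI enabled,
 * 0 if disabled, and -1 if it cannot be read or holds anything else.
 */
static int read_msi_bus(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (f == NULL)
		return -1;
	c = fgetc(f);
	fclose(f);

	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}
```

A caller would run read_msi_bus() on each bridge above the device and treat a single 0 as the reason MSI cannot be enabled.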
  376. 7. FAQ
  377. Q1. Are there any limitations on using the MSI?
  378. A1. If the PCI device supports MSI and conforms to the
  379. specification and the platform supports the APIC local bus,
  380. then using MSI should work.
  381. Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
  382. AMD processors)? In P3 IPI's are transmitted on the APIC local
  383. bus and in P4 and Xeon they are transmitted on the system
  384. bus. Are there any implications with this?
  385. A2. MSI support enables a PCI device to send an inbound
  386. memory write (0xfeexxxxx as target address) on its PCI bus
  387. directly to the FSB. Since the message address has a
  388. redirection hint bit cleared, it should work.
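As an illustration of the 0xfeexxxxx target address, the sketch below composes a message address and message data value. It assumes the usual x86 layout (destination APIC ID in address bits 19:12, redirection hint in address bit 3, vector in data bits 7:0). The macro and function names are hypothetical, and all other fields are simply left at zero.

```c
#include <stdint.h>

#define MSI_ADDR_BASE           0xFEE00000u	/* fixed upper bits */
#define MSI_ADDR_DEST_ID_SHIFT  12		/* destination APIC ID field */
#define MSI_ADDR_REDIR_HINT     (1u << 3)	/* redirection hint bit */

/* Compose a message address with the redirection hint cleared. */
static uint32_t msi_compose_addr(uint8_t dest_apic_id)
{
	return MSI_ADDR_BASE |
	       ((uint32_t)dest_apic_id << MSI_ADDR_DEST_ID_SHIFT);
}

/* Compose message data: the vector goes in the low 8 bits. */
static uint16_t msi_compose_data(uint8_t vector)
{
	return vector;
}

/* Check that an address falls in the 0xfeexxxxx window. */
static int msi_addr_is_valid(uint32_t addr)
{
	return (addr & 0xFFF00000u) == MSI_ADDR_BASE;
}
```

For example, a message aimed at APIC ID 1 uses address 0xfee01000, and the device raises the interrupt by writing the data value to that address.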
  389. Q3. The target address 0xfeexxxxx will be translated by the
  390. Host Bridge into an interrupt message. Are there any
  391. limitations on the chipsets such as Intel 8xx, Intel e7xxx,
  392. or VIA?
  393. A3. If these chipsets support an inbound memory write with
  394. target address set as 0xfeexxxxx, in conformance with PCI
  395. specification 2.3 or later, then it should work.
  396. Q4. From the driver point of view, if the MSI is lost because
  397. of errors occurring during inbound memory write, then it may
  398. wait forever. Is there a mechanism for it to recover?
  399. A4. Since the target of the transaction is an inbound memory
  400. write, all transaction termination conditions (Retry,
  401. Master-Abort, Target-Abort, or normal completion) are
  402. supported. A device sending an MSI must abide by all the PCI
  403. rules and conditions regarding that inbound memory write. So,
  404. if a retry is signaled it must retry, etc... We believe that
  405. the recommendation for Abort is also a retry (refer to PCI
  406. specification 2.3 or later).