  1. The MSI Driver Guide HOWTO
  2. Tom L Nguyen tom.l.nguyen@intel.com
  3. 10/03/2003
  4. Revised Feb 12, 2004 by Martine Silbermann
  5. email: Martine.Silbermann@hp.com
  6. Revised Jun 25, 2004 by Tom L Nguyen
  7. 1. About this guide
  8. This guide describes the basics of Message Signaled Interrupts (MSI),
  9. the advantages of using MSI over traditional interrupt mechanisms,
  10. and how to enable your driver to use MSI or MSI-X. Also included is
  11. a Frequently Asked Questions (FAQ) section.
  12. 2. Copyright 2003 Intel Corporation
  13. 3. What is MSI/MSI-X?
  14. Message Signaled Interrupt (MSI), as described in the PCI Local Bus
  15. Specification Revision 2.3 or later, is an optional feature for PCI
  16. devices and a required feature for PCI Express devices. MSI enables a
  17. device function to request service by sending an Inbound Memory Write on
  18. its PCI bus to the FSB as a Message Signaled Interrupt transaction.
  19. Because MSI is generated in the form of a Memory Write, all transaction
  20. conditions, such as a Retry, Master-Abort, Target-Abort or normal
  21. completion, are supported.
  22. A PCI device that supports MSI must also support the pin IRQ assertion
  23. interrupt mechanism to provide backward compatibility for systems that
  24. do not support MSI. In systems that support MSI, the bus driver is
  25. responsible for initializing the message address and message data of
  26. the device function's MSI/MSI-X capability structure during device
  27. initial configuration.
  28. An MSI capable device function indicates MSI support by implementing
  29. the MSI/MSI-X capability structure in its PCI capability list. The
  30. device function may implement both the MSI capability structure and
  31. the MSI-X capability structure; however, the bus driver should not
  32. enable both.
  33. The MSI capability structure contains the Message Control register,
  34. the Message Address register and the Message Data register. These registers
  35. provide the bus driver control over MSI. The Message Control register
  36. indicates the MSI capability supported by the device. The Message
  37. Address register specifies the target address and the Message Data
  38. register specifies the characteristics of the message. To request
  39. service, the device function writes the content of the Message Data
  40. register to the target address. The device and its software driver
  41. are prohibited from writing to these registers.
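As a rough illustration of how the Message Control register "indicates the MSI capability supported by the device", the userspace sketch below decodes a 16-bit Message Control value. The bit positions are an assumption taken from the MSI capability layout in the PCI specification as understood here (bit 0 MSI Enable, bits 3:1 Multiple Message Capable, bit 7 64-bit address capable, bit 8 per-vector masking capable). The struct and function names are hypothetical and are not part of the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the 16-bit MSI Message Control register. */
struct msi_ctrl_info {
    bool enabled;             /* bit 0: MSI Enable */
    unsigned int max_vectors; /* 2^(bits 3:1): Multiple Message Capable */
    bool addr64;              /* bit 7: 64-bit address capable */
    bool per_vector_mask;     /* bit 8: per-vector masking capable */
};

/* Decode a raw Message Control value (assumed bit layout, see above). */
static struct msi_ctrl_info msi_decode_ctrl(uint16_t ctrl)
{
    struct msi_ctrl_info info;

    info.enabled = (ctrl & 0x0001) != 0;
    /* Encodings 0..5 mean 1..32 vectors; 6 and 7 are reserved. */
    info.max_vectors = 1u << ((ctrl >> 1) & 0x7);
    info.addr64 = (ctrl & 0x0080) != 0;
    info.per_vector_mask = (ctrl & 0x0100) != 0;
    return info;
}
```

For example, a value of 0x008A decodes as MSI disabled, up to 32 messages, and 64-bit addressing supported. In a real driver the value would come from PCI configuration space, which this sketch does not access.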
  42. The MSI-X capability structure is an optional extension to MSI. It
  43. uses an independent and separate capability structure. There are
  44. some key advantages to implementing the MSI-X capability structure
  45. over the MSI capability structure as described below.
  46. - Support a larger maximum number of vectors per function.
  47. - Provide the ability for system software to configure
  48. each vector with an independent message address and message
  49. data, specified by a table that resides in Memory Space.
  50. - MSI and MSI-X both support per-vector masking. Per-vector
  51. masking is an optional extension of MSI but a required
  52. feature for MSI-X. Per-vector masking gives the kernel
  53. the ability to mask/unmask an MSI while servicing its
  54. interrupt service routine. If per-vector masking is
  55. not supported, then the device driver should provide the
  56. hardware/software synchronization to ensure that the device
  57. generates MSI only when the driver wants it to do so.
  58. 4. Why use MSI?
  59. As a benefit to the simplification of board design, MSI allows board
  60. designers to remove out-of-band interrupt routing. MSI is another
  61. step towards a legacy-free environment.
  62. Due to increasing pressure on chipset and processor packages to
  63. reduce pin count, the need for interrupt pins is expected to
  64. diminish over time. Devices, due to pin constraints, may implement
  65. messages to increase performance.
  66. PCI Express endpoints use INTx emulation (in-band messages) instead
  67. of IRQ pin assertion. Using INTx emulation requires interrupt
  68. sharing among devices connected to the same node (PCI bridge) while
  69. MSI is unique (non-shared) and does not require BIOS configuration
  70. support. As a result, the PCI Express technology requires MSI
  71. support for better interrupt performance.
  72. Using MSI enables the device functions to support two or more
  73. vectors, which can be configured to target different CPUs to
  74. increase scalability.
  75. 5. Configuring a driver to use MSI/MSI-X
  76. By default, the kernel does not enable MSI/MSI-X on any device that
  77. supports this capability. The CONFIG_PCI_MSI kernel option
  78. must be selected to enable MSI/MSI-X support.
  79. 5.1 Including MSI/MSI-X support into the kernel
  80. To allow MSI/MSI-X capable device drivers to selectively enable
  81. MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
  82. below), the VECTOR based scheme needs to be enabled by setting
  83. CONFIG_PCI_MSI during kernel config.
  84. Since the target of the inbound message is the local APIC,
  85. CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
  86. 5.2 Configuring for MSI support
  87. Because the existing Linux kernel assigns vectors in a non-contiguous
  88. fashion, this version does not support multiple messages, regardless
  89. of whether a device function is capable of supporting more than one
  90. vector. Enabling MSI on a device function's MSI capability structure
  91. requires a device driver to call the function pci_enable_msi()
  92. explicitly.
  93. 5.2.1 API pci_enable_msi
  94. int pci_enable_msi(struct pci_dev *dev)
  95. With this new API, any existing device driver that would like to have
  96. MSI enabled on its device function must call this API to enable MSI.
  97. A successful call will initialize the MSI capability structure
  98. with ONE vector, regardless of whether a device function is
  99. capable of supporting multiple messages. This new MSI vector replaces
  100. the pre-assigned vector in dev->irq. To avoid a conflict between the
  101. newly assigned vector and the existing pre-assigned vector, a device
  102. driver must call this API before calling request_irq().
  103. 5.2.2 API pci_disable_msi
  104. void pci_disable_msi(struct pci_dev *dev)
  105. This API should always be used to undo the effect of pci_enable_msi()
  106. when a device driver is unloading. This API restores dev->irq with
  107. the pre-assigned IOAPIC vector and switches a device's interrupt
  108. mode to PCI pin-irq assertion/INTx emulation mode.
  109. Note that a device driver should always call free_irq() on the MSI
  110. vector it has done request_irq() on before calling this API. Failure
  111. to do so results in a BUG_ON(), and the device will be left with MSI
  112. enabled and will leak its vector.
  113. 5.2.3 MSI mode vs. legacy mode diagram
  114. The diagram below shows the events that switch the interrupt
  115. mode on the MSI-capable device function between MSI mode and
  116. PIN-IRQ assertion mode.
  117. ------------   pci_enable_msi    --------------------------
  118. |          | <===============    |                        |
  119. | MSI MODE |                     | PIN-IRQ ASSERTION MODE |
  120. |          | ===============>    |                        |
  121. ------------   pci_disable_msi   --------------------------
  122. Figure 1.0 MSI Mode vs. Legacy Mode
  123. In Figure 1.0, a device operates by default in legacy mode. Legacy
  124. in this context means PCI pin-irq assertion or PCI-Express INTx
  125. emulation. A successful MSI request (using pci_enable_msi()) switches
  126. a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
  127. stored in dev->irq will be saved by the PCI subsystem and a new
  128. assigned MSI vector will replace dev->irq.
  129. To return to its default mode, a device driver should always call
  130. pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
  131. device driver should always call free_irq() on the MSI vector it has
  132. done request_irq() on before calling pci_disable_msi(). Failure to do so
  133. results in a BUG_ON(), and the device will be left with MSI enabled and
  134. will leak its vector. Otherwise, the PCI subsystem restores the device's
  135. dev->irq with the pre-assigned IOAPIC vector and marks the released
  136. MSI vector as unused.
  137. Once a vector is marked as unused, there is no guarantee that the PCI
  138. subsystem will reserve this MSI vector for the device. Depending on
  139. the availability of current PCI vector resources and the number of
  140. MSI/MSI-X requests from other drivers, this MSI vector may be re-assigned.
  141. If the PCI subsystem has re-assigned this MSI vector to another
  142. driver, a request to switch back to MSI mode may result in the
  143. device being assigned a different MSI vector, or in a failure if no
  144. more vectors are available.
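The mode switching and call ordering described in section 5.2 can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct model_dev and the model_*() functions are hypothetical stand-ins for struct pci_dev, pci_enable_msi(), request_irq(), free_irq() and pci_disable_msi(), and a return of -1 from model_disable_msi() stands in for the BUG_ON() case. Rejecting a second enable while already in MSI mode is a choice of the model, not something the text specifies.

```c
#include <stdbool.h>

/* Userspace model only: hypothetical stand-in for struct pci_dev. */
struct model_dev {
    unsigned int irq;       /* what the driver sees as dev->irq */
    unsigned int saved_irq; /* pre-assigned vector kept while in MSI mode */
    bool msi_mode;          /* true between enable and disable */
    bool irq_requested;     /* true between request_irq() and free_irq() */
};

/* Models pci_enable_msi(): save dev->irq and install the MSI vector. */
static int model_enable_msi(struct model_dev *dev, unsigned int msi_vector)
{
    if (dev->msi_mode)
        return -1; /* model's choice: already in MSI mode */
    dev->saved_irq = dev->irq;
    dev->irq = msi_vector;
    dev->msi_mode = true;
    return 0;
}

/* Models request_irq()/free_irq() on dev->irq. */
static void model_request_irq(struct model_dev *dev)
{
    dev->irq_requested = true;
}

static void model_free_irq(struct model_dev *dev)
{
    dev->irq_requested = false;
}

/* Models pci_disable_msi(): -1 marks the case the text calls a BUG_ON(). */
static int model_disable_msi(struct model_dev *dev)
{
    if (!dev->msi_mode)
        return 0; /* nothing to undo */
    if (dev->irq_requested)
        return -1; /* free_irq() was not called first; MSI stays enabled */
    dev->irq = dev->saved_irq; /* back to the pre-assigned vector */
    dev->msi_mode = false;
    return 0;
}
```

The intended order for a driver is therefore enable, request, free, disable. Calling model_disable_msi() between model_request_irq() and model_free_irq() leaves the model in MSI mode, mirroring the leak described above.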
  145. 5.3 Configuring for MSI-X support
  146. Due to the ability of the system software to configure each vector of
  147. the MSI-X capability structure with an independent message address
  148. and message data, the non-contiguous fashion in vector assignment of
  149. the existing Linux kernel has no impact on supporting multiple
  150. messages on an MSI-X capable device function. Enabling MSI-X on
  151. a device function's MSI-X capability structure requires its device
  152. driver to call the function pci_enable_msix() explicitly.
  153. The function pci_enable_msix(), once invoked, enables either
  154. all or nothing, depending on the current availability of PCI vector
  155. resources. If the PCI vector resources are available for the number
  156. of vectors requested by a device driver, this function will configure
  157. the MSI-X table of the MSI-X capability structure of a device with
  158. the requested messages. For example, a device may be capable of
  159. supporting a maximum of 32 vectors while its software driver
  160. typically requests only 4 vectors. It is recommended
  161. that the device driver call this function once during the
  162. initialization phase of the device driver.
  163. Unlike the function pci_enable_msi(), the function pci_enable_msix()
  164. does not replace the pre-assigned IOAPIC dev->irq with a new MSI
  165. vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
  166. into the vector field of each element contained in the second argument.
  167. Note that the pre-assigned IOAPIC dev->irq is valid only if the device
  168. operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt by the
  169. device driver to use dev->irq to request interrupt service
  170. may result in unpredictable behavior.
  171. For each MSI-X vector granted, a device driver is responsible for calling
  172. other functions like request_irq(), enable_irq(), etc. to enable
  173. this vector with its corresponding interrupt service handler. It is
  174. a device driver's choice to assign all vectors with the same
  175. interrupt service handler or each vector with a unique interrupt
  176. service handler.
  177. 5.3.1 Handling MMIO address space of MSI-X Table
  178. The PCI 3.0 specification has implementation notes stating that the MMIO
  179. address space for a device's MSI-X structure should be isolated so that
  180. the software system can set different pages for controlling accesses to
  181. the MSI-X structure. The implementation of the MSI patch requires the PCI
  182. subsystem, not a device driver, to maintain full control of the MSI-X
  183. table/MSI-X PBA and MMIO address space of the MSI-X table/MSI-X PBA.
  184. A device driver is prohibited from requesting the MMIO address space
  185. of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem will fail
  186. enabling MSI-X on its hardware device when it calls the function
  187. pci_enable_msix().
  188. 5.3.2 Handling MSI-X allocation
  189. The number of MSI-X vectors allocated to a function
  190. depends on the number of MSI capable devices and MSI-X capable
  191. devices populated in the system. The policy of allocating MSI-X
  192. vectors to a function is defined as follows:
  193. #of MSI-X vectors allocated to a function = (x - y)/z where
  194. x = The number of available PCI vector resources by the time
  195. the device driver calls pci_enable_msix(). The PCI vector
  196. resources are the sum of the number of unassigned vectors
  197. (new) and the number of released vectors when any MSI/MSI-X
  198. device driver switches its hardware device back to a legacy
  199. mode or is hot-removed. The number of unassigned vectors
  200. may exclude some vectors reserved, as defined in parameter
  201. NR_HP_RESERVED_VECTORS, for the case where the system is
  202. capable of supporting hot-add/hot-remove operations. Users
  203. may change the value defined in NR_HP_RESERVED_VECTORS to
  204. meet their specific needs.
  205. y = The number of MSI capable devices populated in the system.
  206. This policy ensures that each MSI capable device has its
  207. vector reserved to avoid the case where some MSI-X capable
  208. drivers may attempt to claim all available vector resources.
  209. z = The number of MSI-X capable devices populated in the system.
  210. This policy ensures that maximum (x - y) is distributed
  211. evenly among MSI-X capable devices.
  212. Note that the PCI subsystem scans y and z during a bus enumeration.
  213. When the PCI subsystem completes configuring MSI/MSI-X capability
  214. structure of a device as requested by its device driver, y or z is
  215. decremented accordingly.
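The allocation policy (x - y)/z can be written as a small helper. This is a sketch, not the kernel's implementation: the function name is hypothetical, and returning 0 when z is zero or when x does not exceed y is an assumption of this example rather than something the text specifies.

```c
/*
 * Hypothetical helper for the policy above:
 *   x = available PCI vector resources
 *   y = number of MSI capable devices
 *   z = number of MSI-X capable devices
 * Returns the number of MSI-X vectors allocated to one function.
 */
static unsigned int msix_vectors_per_function(unsigned int x,
                                              unsigned int y,
                                              unsigned int z)
{
    if (z == 0 || x <= y)
        return 0; /* assumption: nothing left to distribute */
    return (x - y) / z; /* integer division rounds down */
}
```

With 100 available vectors, 4 MSI capable devices and 8 MSI-X capable devices, each MSI-X function would be allocated (100 - 4)/8 = 12 vectors under this policy.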
  216. 5.3.3 Handling MSI-X shortages
  217. If fewer MSI-X vectors are allocated to a function than it
  218. requested, the function pci_enable_msix() will return the
  219. maximum number of MSI-X vectors available to the caller. A device
  220. driver may then re-send its request with a number of vectors no greater
  221. than the returned value. For example, if a device driver requests 5
  222. vectors but only 3 vectors are available, the pci_enable_msix() call
  223. returns a value of 3. A function could be
  224. designed for its driver to use only 3 MSI-X table entries in
  225. different combinations such as ABC--, A-B-C, A--CB, etc. Note that this
  226. patch does not support multiple entries with the same vector. Any
  227. attempt by a device driver to use 5 MSI-X table entries with 3 vectors
  228. as ABBCC, AABCC, BCCBA, etc. will cause the function
  229. pci_enable_msix() to fail. Below are the reasons why supporting multiple
  230. entries with the same vector is an undesirable solution.
  231. - The PCI subsystem cannot determine which entry generated
  232. the message, and so which entry to mask/unmask, while handling
  233. the software driver ISR. Attempting to walk through all MSI-X
  234. table entries (2048 max) to mask/unmask any matching vector
  235. is an undesirable solution.
  236. - Walking through all MSI-X table entries (2048 max) to handle
  237. SMP affinity of any matching vector is an undesirable solution.
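The retry behaviour described in section 5.3.3 (request, receive a positive count on shortage, request again with no more than that count) might look like the sketch below. Everything here is hypothetical: msix_enable_fn stands in for a call to pci_enable_msix() using the return convention described in this guide, and mock_enable() is a toy allocator used only to exercise the loop in userspace.

```c
#include <stddef.h>

/*
 * Hypothetical callback following the return convention in this guide:
 * 0 on success, > 0 for the number of vectors available, < 0 on failure.
 */
typedef int (*msix_enable_fn)(void *ctx, int nvec);

/* Returns the number of vectors finally enabled, or a negative error. */
static int msix_enable_with_fallback(msix_enable_fn enable, void *ctx, int nvec)
{
    while (nvec > 0) {
        int rc = enable(ctx, nvec);

        if (rc == 0)
            return nvec; /* all requested vectors were granted */
        if (rc < 0)
            return rc;   /* hard failure, do not retry */
        if (rc >= nvec)
            return -1;   /* guard: the offer must shrink to terminate */
        nvec = rc;       /* shortage: retry with the offered count */
    }
    return -1;
}

/* Toy allocator used only to demonstrate the loop. */
struct mock_pool {
    int available;
};

static int mock_enable(void *ctx, int nvec)
{
    struct mock_pool *pool = ctx;

    if (pool->available <= 0)
        return -1;
    if (nvec > pool->available)
        return pool->available;
    pool->available -= nvec;
    return 0;
}
```

With a pool of 3 vectors, a request for 5 first receives 3 as the shortage indication and then succeeds with 3, matching the example in the text. A driver using this pattern still has to be able to operate with fewer vectors than it originally asked for.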
  238. 5.3.4 API pci_enable_msix
  239. int pci_enable_msix(struct pci_dev *dev, u32 *entries, int nvec)
  240. This API enables a device driver to ask the PCI subsystem
  241. to enable MSI-X messages on its hardware device. Depending on
  242. the availability of PCI vector resources, the PCI subsystem enables
  243. either all or nothing.
  244. Argument dev points to the device (pci_dev) structure.
  245. Argument entries is a pointer to unsigned integer type. The number of
  246. elements is indicated in argument nvec. The content of each element
  247. will be mapped to the following struct defined in drivers/pci/msi.h.
  248. struct msix_entry {
  249.         u16 vector;     /* kernel uses to write alloc vector */
  250.         u16 entry;      /* driver uses to specify entry */
  251. };
  252. A device driver is responsible for initializing the entry field of
  253. each element with a unique entry supported by the MSI-X table. Otherwise,
  254. -EINVAL will be returned as a result. A successful return of zero
  255. indicates that the PCI subsystem has initialized each of the requested
  256. entries of the MSI-X table with a message address and message data.
  257. Last but not least, the PCI subsystem will write the 1:1
  258. vector-to-entry mapping into the vector field of each element. A
  259. device driver is responsible for keeping track of allocated MSI-X
  260. vectors in its internal data structure.
  261. Argument nvec is an integer indicating the number of messages
  262. requested.
  263. A return of zero indicates that the requested number of MSI-X vectors
  264. was successfully allocated. A return of greater than zero indicates
  265. an MSI-X vector shortage. A return of less than zero indicates
  266. a failure. This failure may be a result of duplicate entries
  267. specified in the second argument, or a result of no available vector,
  268. or a result of failing to initialize MSI-X table entries.
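Since duplicate entries are one of the failure causes listed above, a driver may want to prepare and sanity-check its entry array before making the call. The sketch below is a userspace approximation: it redefines struct msix_entry with uint16_t in place of the kernel's u16, and msix_init_entries() and msix_check_entries() are hypothetical helpers, not kernel functions. Using entries 0..nvec-1 is just one valid choice of unique entries.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace copy of the struct shown above (u16 mapped to uint16_t). */
struct msix_entry {
    uint16_t vector; /* written by the kernel in the real API */
    uint16_t entry;  /* chosen by the driver */
};

/* Hypothetical helper: select MSI-X table entries 0..nvec-1. */
static void msix_init_entries(struct msix_entry *entries, int nvec)
{
    for (int i = 0; i < nvec; i++) {
        entries[i].vector = 0;
        entries[i].entry = (uint16_t)i;
    }
}

/* Hypothetical helper: 0 if all entry fields are unique, else -EINVAL. */
static int msix_check_entries(const struct msix_entry *entries, int nvec)
{
    for (int i = 0; i < nvec; i++)
        for (int j = i + 1; j < nvec; j++)
            if (entries[i].entry == entries[j].entry)
                return -EINVAL;
    return 0;
}
```

The quadratic check is adequate for the handful of vectors a driver typically requests. It mirrors the -EINVAL result described for duplicate entries but does not reproduce any other validation the PCI subsystem performs.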
  269. 5.3.5 API pci_disable_msix
  270. void pci_disable_msix(struct pci_dev *dev)
  271. This API should always be used to undo the effect of pci_enable_msix()
  272. when a device driver is unloading. Note that a device driver should
  273. always call free_irq() on all MSI-X vectors it has done request_irq()
  274. on before calling this API. Failure to do so results in a BUG_ON(), and
  275. the device will be left with MSI-X enabled and will leak its vectors.
  276. 5.3.6 MSI-X mode vs. legacy mode diagram
  277. The diagram below shows the events that switch the interrupt
  278. mode on the MSI-X capable device function between MSI-X mode and
  279. PIN-IRQ assertion mode (legacy).
  280. --------------  pci_enable_msix(,,n)  --------------------------
  281. |            |    <===============    |                        |
  282. | MSI-X MODE |                        | PIN-IRQ ASSERTION MODE |
  283. |            |    ===============>    |                        |
  284. --------------    pci_disable_msix    --------------------------
  285. Figure 2.0 MSI-X Mode vs. Legacy Mode
  286. In Figure 2.0, a device operates by default in legacy mode. A
  287. successful MSI-X request (using pci_enable_msix()) switches a
  288. device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
  289. stored in dev->irq will be saved by the PCI subsystem; however,
  290. unlike MSI mode, the PCI subsystem will not replace dev->irq with
  291. an assigned MSI-X vector because the PCI subsystem already writes the 1:1
  292. vector-to-entry mapping into the vector field of each element
  293. specified in the second argument.
  294. To return to its default mode, a device driver should always call
  295. pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
  296. a device driver should always call free_irq() on all MSI-X vectors it
  297. has done request_irq() on before calling pci_disable_msix(). Failure
  298. to do so results in a BUG_ON(), and the device will be left with MSI-X
  299. enabled and will leak its vectors. Otherwise, the PCI subsystem switches a
  300. device function's interrupt mode from MSI-X mode to legacy mode and
  301. marks all allocated MSI-X vectors as unused.
  302. Once they are marked as unused, there is no guarantee that the PCI
  303. subsystem will reserve these MSI-X vectors for the device. Depending on
  304. the availability of current PCI vector resources and the number of
  305. MSI/MSI-X requests from other drivers, these MSI-X vectors may be
  306. re-assigned.
  307. If the PCI subsystem has re-assigned these MSI-X vectors
  308. to another driver, a request to switch back to MSI-X mode may result
  309. in the device being assigned another set of MSI-X vectors, or in a
  310. failure if no more vectors are available.
  311. 5.4 Handling function implementing both MSI and MSI-X capabilities
  312. For the case where a function implements both MSI and MSI-X
  313. capabilities, the PCI subsystem enables a device to run either in MSI
  314. mode or MSI-X mode but not both. A device driver determines whether it
  315. wants MSI or MSI-X enabled on its hardware device. Once a device
  316. driver requests MSI, for example, it is prohibited from requesting
  317. MSI-X; in other words, a device driver is not permitted to ping-pong
  318. between MSI mode and MSI-X mode at run time.
  319. 5.5 Hardware requirements for MSI/MSI-X support
  320. MSI/MSI-X support requires support from both system hardware and
  321. individual hardware device functions.
  322. 5.5.1 System hardware support
  323. Since the target of the MSI address is the CPU's local APIC, enabling
  324. MSI/MSI-X support in the Linux kernel depends on whether the existing
  325. system hardware supports a local APIC. Users should verify that their
  326. system runs with CONFIG_X86_LOCAL_APIC=y.
  327. In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
  328. however, in a UP environment, users must manually set
  329. CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
  330. CONFIG_PCI_MSI enables the VECTOR based scheme and
  331. the option for MSI-capable device drivers to selectively enable
  332. MSI/MSI-X.
  333. Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
  334. vectors are newly allocated at run time and MSI/MSI-X support does not
  335. depend on BIOS support. This key independence enables MSI/MSI-X
  336. support on future IOxAPIC-free platforms.
  337. 5.5.2 Device hardware support
  338. The hardware device function supports MSI by indicating the
  339. MSI/MSI-X capability structure on its PCI capability list. By
  340. default, this capability structure will not be initialized by
  341. the kernel to enable MSI during the system boot. In other words,
  342. the device function runs in its default pin assertion mode.
  343. Note that in many cases hardware supporting MSI has bugs,
  344. which may result in a system hang. The software driver of specific
  345. MSI-capable hardware is responsible for deciding whether to call
  346. pci_enable_msi() or not. A return of zero indicates that the kernel
  347. successfully initialized the MSI/MSI-X capability structure of the
  348. device function. The device function is then running in MSI/MSI-X mode.
  349. 5.6 How to tell whether MSI/MSI-X is enabled on device function
  350. At the driver level, a return of zero from the function call of
  351. pci_enable_msi()/pci_enable_msix() indicates to a device driver that
  352. its device function is initialized successfully and ready to run in
  353. MSI/MSI-X mode.
  354. At the user level, users can use command 'cat /proc/interrupts'
  355. to display the vector allocated for a device and its interrupt
  356. MSI/MSI-X mode ("PCI MSI"/"PCI MSIX"). The output below shows MSI mode
  357. enabled on a SCSI Adaptec 39320D Ultra320.
  358.            CPU0       CPU1
  359.   0:     324639          0    IO-APIC-edge  timer
  360.   1:       1186          0    IO-APIC-edge  i8042
  361.   2:          0          0          XT-PIC  cascade
  362.  12:       2797          0    IO-APIC-edge  i8042
  363.  14:       6543          0    IO-APIC-edge  ide0
  364.  15:          1          0    IO-APIC-edge  ide1
  365. 169:          0          0   IO-APIC-level  uhci-hcd
  366. 185:          0          0   IO-APIC-level  uhci-hcd
  367. 193:        138         10         PCI MSI  aic79xx
  368. 201:         30          0         PCI MSI  aic79xx
  369. 225:         30          0   IO-APIC-level  aic7xxx
  370. 233:         30          0   IO-APIC-level  aic7xxx
  371. NMI:          0          0
  372. LOC:     324553     325068
  373. ERR:          0
  374. MIS:          0
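For scripted checks, the interrupt mode of a device can be read off the corresponding /proc/interrupts line by looking for the "PCI MSI" or "PCI MSIX" tag. The helper below is a minimal sketch that assumes the tag strings are exactly as shown in this guide; other kernel versions may print different labels. The enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical classification of one /proc/interrupts line. */
enum irq_mode {
    IRQ_MODE_OTHER,
    IRQ_MODE_MSI,
    IRQ_MODE_MSIX
};

static enum irq_mode classify_irq_line(const char *line)
{
    /* Test the longer tag first: "PCI MSI" is a prefix of "PCI MSIX". */
    if (strstr(line, "PCI MSIX") != NULL)
        return IRQ_MODE_MSIX;
    if (strstr(line, "PCI MSI") != NULL)
        return IRQ_MODE_MSI;
    return IRQ_MODE_OTHER;
}
```

Applied to the listing above, the two aic79xx lines (vectors 193 and 201) classify as MSI, while the aic7xxx and uhci-hcd lines classify as other.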
  375. 6. FAQ
  376. Q1. Are there any limitations on using MSI?
  377. A1. If the PCI device supports MSI and conforms to the
  378. specification and the platform supports the APIC local bus,
  379. then using MSI should work.
  380. Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
  381. AMD processors)? In P3 IPI's are transmitted on the APIC local
  382. bus and in P4 and Xeon they are transmitted on the system
  383. bus. Are there any implications with this?
  384. A2. MSI support enables a PCI device to send an inbound
  385. memory write (0xfeexxxxx as target address) on its PCI bus
  386. directly to the FSB. Since the message address has its
  387. redirection hint bit cleared, it should work.
  388. Q3. The target address 0xfeexxxxx will be translated by the
  389. Host Bridge into an interrupt message. Are there any
  390. limitations on the chipsets such as Intel 8xx, Intel e7xxx,
  391. or VIA?
  392. A3. If these chipsets support an inbound memory write with
  393. target address set as 0xfeexxxxx, in conformance with PCI
  394. specification 2.3 or later, then it should work.
  395. Q4. From the driver's point of view, if an MSI is lost because
  396. errors occur during the inbound memory write, then the driver may
  397. wait forever. Is there a mechanism for it to recover?
  398. A4. Since the target of the transaction is an inbound memory
  399. write, all transaction termination conditions (Retry,
  400. Master-Abort, Target-Abort, or normal completion) are
  401. supported. A device sending an MSI must abide by all the PCI
  402. rules and conditions regarding that inbound memory write. So,
  403. if a retry is signaled it must retry, etc. We believe that
  404. the recommendation for Abort is also a retry (refer to PCI
  405. specification 2.3 or later).
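The 0xfeexxxxx target address mentioned in A2 and A3 can be examined with a few bit operations. The sketch below assumes the x86 message address layout as documented for Intel processors (bits 31:20 equal to 0xFEE, destination ID in bits 19:12, redirection hint in bit 3); this layout is not stated in this guide, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr falls in the 0xFEExxxxx window used for MSI on x86. */
static bool msi_addr_is_fee(uint32_t addr)
{
    return (addr & 0xFFF00000u) == 0xFEE00000u;
}

/* Destination ID field, assumed to occupy bits 19:12. */
static unsigned int msi_addr_dest_id(uint32_t addr)
{
    return (addr >> 12) & 0xFFu;
}

/* Redirection hint, assumed to be bit 3 (0 means cleared, as in A2). */
static unsigned int msi_addr_redirection_hint(uint32_t addr)
{
    return (addr >> 3) & 0x1u;
}
```

For instance, 0xFEE01000 lies in the MSI window, targets destination ID 1, and has the redirection hint cleared, whereas 0xFEC00000 (a typical IOAPIC base) is outside the window.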