                 Asynchronous Transfers/Transforms API

1 INTRODUCTION

2 GENEALOGY

3 USAGE
3.1 General format of the API
3.2 Supported operations
3.3 Descriptor management
3.4 When does the operation execute?
3.5 When does the operation complete?
3.6 Constraints
3.7 Example

4 DRIVER DEVELOPER NOTES
4.1 Conformance points
4.2 "My application needs finer control of hardware channels"

5 SOURCE

1 INTRODUCTION

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the API can optimize for asynchronous operation and
the API will fit the chain of operations to the available offload
resources.

2 GENEALOGY

The API was initially designed to offload the memory copy and
xor-parity-calculations of the md-raid5 driver using the offload engines
present in the Intel(R) Xscale series of I/O processors.  It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines.  The following design
features surfaced as a result:
1/ implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities.  The
   operation will be offloaded when an engine is available and carried
   out in software otherwise.
2/ cross channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case.
   The API automatically handles cases where the transition from one
   operation to another implies a hardware channel switch.
3/ dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'

3 USAGE

3.1 General format of the API:
struct dma_async_tx_descriptor *
async_<operation>(<op specific parameters>,
		  enum async_tx_flags flags,
		  struct dma_async_tx_descriptor *dependency,
		  dma_async_tx_callback callback_routine,
		  void *callback_parameter);
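
For instance, the memcpy operation listed in the next section
instantiates this template as follows (a sketch of the declaration as
it appears in crypto/async_tx/async_memcpy.c for the async_tx
generation this document covers):

/* copy 'len' bytes from 'src' to 'dest' at the given page offsets */
struct dma_async_tx_descriptor *
async_memcpy(struct page *dest, struct page *src,
	     unsigned int dest_offset, unsigned int src_offset,
	     size_t len, enum async_tx_flags flags,
	     struct dma_async_tx_descriptor *depend_tx,
	     dma_async_tx_callback cb_fn, void *cb_param);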

3.2 Supported operations:
memcpy       - memory copy between a source and a destination buffer
memset       - fill a destination buffer with a byte value
xor          - xor a series of source buffers and write the result to a
	       destination buffer
xor_zero_sum - xor a series of source buffers and set a flag if the
	       result is zero.  The implementation attempts to prevent
	       writes to memory
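
Note that xor_zero_sum reports its flag through a pointer argument
rather than through the return value.  A sketch of the declaration,
assuming the same generation of include/linux/async_tx.h:

/* '*result' receives the zero-sum flag: zero when the xor of the
 * sources is zero, non-zero otherwise
 */
struct dma_async_tx_descriptor *
async_xor_zero_sum(struct page *dest, struct page **src_list,
		   unsigned int offset, int src_cnt, size_t len,
		   u32 *result, enum async_tx_flags flags,
		   struct dma_async_tx_descriptor *depend_tx,
		   dma_async_tx_callback cb_fn, void *cb_param);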

3.3 Descriptor management:
The return value is non-NULL and points to a 'descriptor' when the
operation has been queued to execute asynchronously.  Descriptors are
recycled resources, under control of the offload engine driver, to be
reused as operations complete.  When an application needs to submit a
chain of operations it must guarantee that the descriptor is not
automatically recycled before the dependency is submitted.  This
requires that all descriptors be acknowledged by the application before
the offload engine driver is allowed to recycle (or free) the
descriptor.  A descriptor can be acked by one of the following methods:
1/ setting the ASYNC_TX_ACK flag if no child operations are to be
   submitted
2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent
   descriptor of a new operation.
3/ calling async_tx_ack() on the descriptor (see the sketch below).
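
A minimal sketch of method 3/, using hypothetical buffers 'dest' and
'src' and a length 'len': the operation is submitted without
ASYNC_TX_ACK because a dependent operation may still be attached, and
the descriptor is acked explicitly once the chain is known to be
complete:

struct dma_async_tx_descriptor *tx;

/* no ack flag yet: a child operation may still be chained off 'tx' */
tx = async_memcpy(dest, src, 0, 0, len, 0, NULL, NULL, NULL);

/* ... later, once nothing more will depend on 'tx' ... */
async_tx_ack(tx);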

3.4 When does the operation execute?
Operations do not immediately issue after return from the
async_<operation> call.  Offload engine drivers batch operations to
improve performance by reducing the number of mmio cycles needed to
manage the channel.  Once a driver-specific threshold is met the driver
automatically issues pending operations.  An application can force this
event by calling async_tx_issue_pending_all().  This operates on all
channels since the application has no knowledge of channel to operation
mapping.

3.5 When does the operation complete?
There are two methods for an application to learn about the completion
of an operation.
1/ Call dma_wait_for_async_tx().  This call causes the CPU to spin
   while it polls for the completion of the operation.  It handles
   dependency chains and issuing pending operations.
2/ Specify a completion callback.  The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software.  The callback can be set in the call to
   async_<operation>, or when the application needs to submit a chain
   of unknown length it can use the async_trigger_callback() routine to
   set a completion interrupt/callback at the end of the chain (see the
   sketch below).
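
A minimal sketch of the async_trigger_callback() case, assuming a
hypothetical chain of page copies of unknown length 'cnt' and a
caller-supplied callback:

static void run_copy_chain(struct page **dest, struct page **src,
			   int cnt, size_t len,
			   dma_async_tx_callback cb, void *cb_param)
{
	struct dma_async_tx_descriptor *tx = NULL;
	int i;

	/* each copy depends on, and acks, the previous copy */
	for (i = 0; i < cnt; i++)
		tx = async_memcpy(dest[i], src[i], 0, 0, len,
				  tx ? ASYNC_TX_DEP_ACK : 0, tx,
				  NULL, NULL);

	/* 'cb' runs only after every copy in the chain has completed */
	async_trigger_callback(ASYNC_TX_DEP_ACK | ASYNC_TX_ACK, tx,
			       cb, cb_param);

	async_tx_issue_pending_all();
}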

3.6 Constraints:
1/ Calls to async_<operation> are not permitted in IRQ context.  Other
   contexts are permitted provided constraint #2 is not violated.
2/ Completion callback routines cannot submit new operations.  This
   results in recursion in the synchronous case and spin_locks being
   acquired twice in the asynchronous case.

3.7 Example:
Perform a xor->copy->xor operation where each operation depends on the
result from the previous operation:

void complete_xor_copy_xor(void *param)
{
	printk("complete\n");
}

void run_xor_copy_xor(struct page **xor_srcs,
		      int xor_src_cnt,
		      struct page *xor_dest,
		      size_t xor_len,
		      struct page *copy_src,
		      struct page *copy_dest,
		      size_t copy_len)
{
	struct dma_async_tx_descriptor *tx;

	/* xor the sources into xor_dest */
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len,
		       ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL);
	/* copy, dependent on (and acking) the first xor */
	tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len,
			  ASYNC_TX_DEP_ACK, tx, NULL, NULL);
	/* final xor, dependent on the copy; ASYNC_TX_ACK marks this
	 * last descriptor as having no children so it can be recycled
	 */
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len,
		       ASYNC_TX_XOR_DROP_DST | ASYNC_TX_DEP_ACK |
		       ASYNC_TX_ACK,
		       tx, complete_xor_copy_xor, NULL);

	async_tx_issue_pending_all();
}

See include/linux/async_tx.h for more information on the flags.  See
the ops_run_* and ops_complete_* routines in drivers/md/raid5.c for
more implementation examples.

4 DRIVER DEVELOPER NOTES

4.1 Conformance points:
There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:
1/ Completion callbacks are expected to happen in tasklet context
2/ dma_async_tx_descriptor fields are never manipulated in IRQ context
3/ Use async_tx_run_dependencies() in the descriptor clean up path to
   handle submission of dependent operations (see the sketch below)
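
A minimal sketch of point 3/, assuming a hypothetical driver descriptor
type 'my_desc' and channel type 'my_chan' with a completed-descriptor
list; the fixed point is the async_tx_run_dependencies() call:

struct my_desc {
	struct dma_async_tx_descriptor txd;
	struct list_head node;
};

struct my_chan {
	struct dma_chan chan;
	struct list_head completed;
};

static void my_chan_cleanup(struct my_chan *chan)
{
	struct my_desc *desc, *_desc;

	list_for_each_entry_safe(desc, _desc, &chan->completed, node) {
		struct dma_async_tx_descriptor *tx = &desc->txd;

		/* notify the client; per conformance point 1/ this
		 * routine runs in tasklet context
		 */
		if (tx->callback)
			tx->callback(tx->callback_param);

		/* submit any operations that were waiting on 'tx' */
		async_tx_run_dependencies(tx);

		/* recycle 'desc' only once the client has acked it */
	}
}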

4.2 "My application needs finer control of hardware channels"
This requirement seems to arise from cases where a DMA engine driver is
trying to support device-to-memory DMA.  The dmaengine and async_tx
implementations were designed for offloading memory-to-memory
operations; however, there are some capabilities of the dmaengine layer
that can be used for platform-specific channel management.
Platform-specific constraints can be handled by registering the
application as a 'dma_client' and implementing a 'dma_event_callback'
to apply a filter to the available channels in the system.  Before
showing how to implement a custom dma_event callback some background on
dmaengine's client support is required.

The following routines in dmaengine support multiple clients requesting
use of a channel:
- dma_async_client_register(struct dma_client *client)
- dma_async_client_chan_request(struct dma_client *client)

dma_async_client_register takes a pointer to an initialized dma_client
structure.  It expects that the 'event_callback' and 'cap_mask' fields
are already initialized.

dma_async_client_chan_request triggers dmaengine to notify the client
of all channels that satisfy the capability mask.  It is up to the
client's event_callback routine to track how many channels the client
needs and how many it is currently using.  The dma_event_callback
routine returns a dma_state_client code to let dmaengine know the
status of the allocation.

Below is an example of how to extend this functionality for
platform-specific filtering of the available channels beyond the
standard capability mask:

static enum dma_state_client
my_dma_client_callback(struct dma_client *client,
		       struct dma_chan *chan, enum dma_state state)
{
	struct dma_device *dma_dev;
	struct my_platform_specific_dma *plat_dma_dev;

	dma_dev = chan->device;
	plat_dma_dev = container_of(dma_dev,
				    struct my_platform_specific_dma,
				    dma_dev);

	/* decline channels that lack the platform-specific capability */
	if (!plat_dma_dev->platform_specific_capability)
		return DMA_DUP;
	. . .
}
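
To put the callback to use, the client is registered and channels are
requested.  A minimal sketch, assuming the client wants memcpy-capable
channels (the 'event_callback' and 'cap_mask' fields are the ones
dma_async_client_register expects to be initialized):

static struct dma_client my_client = {
	.event_callback = my_dma_client_callback,
};

static int __init my_client_init(void)
{
	/* only channels advertising memcpy capability will be offered */
	dma_cap_set(DMA_MEMCPY, my_client.cap_mask);

	dma_async_client_register(&my_client);
	dma_async_client_chan_request(&my_client);
	return 0;
}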

5 SOURCE

include/linux/dmaengine.h: core header file for DMA drivers and clients
drivers/dma/dmaengine.c: offload engine channel management routines
drivers/dma/: location for offload engine drivers
include/linux/async_tx.h: core header file for the async_tx api
crypto/async_tx/async_tx.c: async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c: copy offload
crypto/async_tx/async_memset.c: memory fill offload
crypto/async_tx/async_xor.c: xor and xor zero sum offload