compress_offload.txt
=====================
Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>
Vinod Koul <vinod.koul@linux.intel.com>

Overview

Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs, and DSPs are also integrated in
audio codecs. Processing compressed data on such DSPs results in a
dramatic reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because of the lack of a generic API available in the mainline
kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by two years of experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.

Requirements

The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  only in stereo. Likewise, WMA10 level M3 may require too much memory
  and too many CPU cycles. The new API needs to provide a generic way
  of listing these formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.

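The first requirement can be illustrated with a small sketch. It assumes
a hypothetical stream in which every frame decodes to the same number of
samples while the payload size varies from frame to frame. The function
names and the figures are illustrative only and are not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Total payload size in bytes of n compressed frames. */
static size_t total_bytes(const uint32_t *frame_bytes, size_t n)
{
	size_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += frame_bytes[i];
	return sum;
}

/*
 * Duration in milliseconds of n frames, assuming (hypothetically) that
 * each frame decodes to samples_per_frame samples at rate Hz.
 */
static uint64_t frames_duration_ms(size_t n, uint32_t samples_per_frame,
				   uint32_t rate)
{
	if (rate == 0)
		return 0;
	return (uint64_t)n * samples_per_frame * 1000 / rate;
}
```

With three 400-byte frames and six 200-byte frames, both buffers hold
1200 bytes, yet at 1152 samples per frame and 48000 Hz they last 72 ms
and 144 ms respectively. The byte count alone says nothing about the
duration.
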
Design

The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

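A minimal sketch of byte-only accounting for such a ring buffer might
look as follows. The structure and function names are hypothetical and
do not correspond to the kernel implementation; the only assumptions
taken from the text are that the buffer is a set of fragments and that
all sizes are expressed in bytes.

```c
#include <stddef.h>

/* Hypothetical ring buffer state; every quantity is a byte count. */
struct ring_sketch {
	size_t fragment_size;	/* bytes per fragment */
	size_t fragments;	/* number of fragments */
	size_t app_bytes;	/* total bytes committed by the application */
	size_t dsp_bytes;	/* total bytes consumed by the DSP */
};

/* Total capacity of the ring buffer in bytes. */
static size_t ring_size(const struct ring_sketch *r)
{
	return r->fragment_size * r->fragments;
}

/* Bytes that can still be written without overwriting unread data. */
static size_t ring_avail(const struct ring_sketch *r)
{
	return ring_size(r) - (r->app_bytes - r->dsp_bytes);
}

/* Commit up to len bytes and return how many were accepted. */
static size_t ring_commit(struct ring_sketch *r, size_t len)
{
	size_t avail = ring_avail(r);

	if (len > avail)
		len = avail;
	r->app_bytes += len;
	return len;
}

/* Account for len bytes consumed by the DSP, clamped to what is pending. */
static void ring_consume(struct ring_sketch *r, size_t len)
{
	size_t pending = r->app_bytes - r->dsp_bytes;

	if (len > pending)
		len = pending;
	r->dsp_bytes += len;
}
```

The sketch deliberately has no operation that moves app_bytes backwards,
so nothing committed to the buffer can be taken back.
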
Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

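The split between mandatory and optional routines can be sketched as a
table of callbacks. Everything below is hypothetical: the structure
name, the dummy driver, and in particular the choice of which callbacks
are treated as mandatory are assumptions made for illustration, not a
description of the actual driver interface.

```c
#include <errno.h>
#include <stddef.h>

struct stream_sketch;	/* opaque per-stream state, hypothetical */

/* Hypothetical callback table; the mandatory/optional split is assumed. */
struct compr_ops_sketch {
	int (*set_params)(struct stream_sketch *s);	  /* mandatory here */
	int (*trigger)(struct stream_sketch *s, int cmd); /* mandatory here */
	int (*get_codec_caps)(struct stream_sketch *s);	  /* optional here */
};

/* Reject a driver that lacks any callback treated as mandatory. */
static int ops_validate(const struct compr_ops_sketch *ops)
{
	if (!ops || !ops->set_params || !ops->trigger)
		return -EINVAL;
	return 0;
}

/* Call the optional callback only if the driver provides it. */
static int ops_get_codec_caps(const struct compr_ops_sketch *ops,
			      struct stream_sketch *s)
{
	if (!ops->get_codec_caps)
		return -ENOSYS;
	return ops->get_codec_caps(s);
}

/* Dummy driver that implements only the mandatory callbacks. */
static int dummy_set_params(struct stream_sketch *s)
{
	(void)s;
	return 0;
}

static int dummy_trigger(struct stream_sketch *s, int cmd)
{
	(void)s;
	(void)cmd;
	return 0;
}

static const struct compr_ops_sketch dummy_ops = {
	.set_params = dummy_set_params,
	.trigger = dummy_trigger,
	/* .get_codec_caps is left NULL: the optional callback is absent */
};
```

In this sketch ops_validate(&dummy_ops) succeeds, while
ops_get_codec_caps(&dummy_ops, NULL) reports -ENOSYS because the
optional callback is missing.
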
The main additions are:

- get_caps
This routine returns the list of audio formats supported. Querying the
codecs on a capture stream will return encoders; decoders will be
listed for playback streams.

- get_codec_caps
For each codec, this routine returns a list of capabilities. The
intent is to make sure all the capabilities correspond to valid
settings, and to minimize the risks of configuration failures. For
example, for a complex codec such as AAC, the number of channels
supported may depend on a specific profile. If the capabilities were
exposed with a single descriptor, that descriptor could advertise a
combination of profiles/channels/formats that is not actually
supported. Likewise, because embedded DSPs have limited memory and CPU
cycles, it is likely that some implementations will make the list of
capabilities dynamic and dependent on existing workloads. In addition
to codec settings, this routine returns the minimum buffer size
handled by the implementation. This information can be a function of
the DMA buffer sizes, the number of bytes required to synchronize,
etc., and can be used by userspace to define how much needs to be
written in the ring buffer before playback can start.

- set_params
This routine sets the configuration chosen for a specific codec. The
most important field in the parameters is the codec type; in most
cases decoders will ignore other fields, while encoders will strictly
comply with the settings.

- get_params
This routine returns the actual settings used by the DSP. Changes to
the settings should remain the exception.

- get_timestamp
The timestamp becomes a multi-field structure. It lists the number
of bytes transferred, the number of samples processed and the number
of samples rendered/grabbed. All these values can be used to determine
the average bitrate, to figure out if the ring buffer needs to be
refilled, or to estimate the delay due to decoding/encoding/io on the
DSP.

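The quantities derived from the timestamp can be sketched as follows.
The structure and helper names are hypothetical, and the sampling_rate
field is an assumption added so that sample counts can be converted to
time. The bitrate is only an approximation, since bytes transferred to
the DSP may not all have been decoded yet.

```c
#include <stdint.h>

/* Hypothetical multi-field timestamp modeled on the description above. */
struct tstamp_sketch {
	uint64_t bytes_transferred;	/* bytes moved between ring buffer and DSP */
	uint64_t samples_processed;	/* samples decoded/encoded by the DSP */
	uint64_t samples_rendered;	/* samples actually rendered/grabbed */
	uint32_t sampling_rate;		/* Hz; assumed field */
};

/* Approximate average bitrate in bits per second. */
static uint64_t avg_bitrate_bps(const struct tstamp_sketch *t)
{
	if (t->samples_processed == 0)
		return 0;
	return t->bytes_transferred * 8 * t->sampling_rate /
	       t->samples_processed;
}

/* Delay in milliseconds between processing and rendering on the DSP. */
static uint64_t dsp_delay_ms(const struct tstamp_sketch *t)
{
	if (t->sampling_rate == 0 ||
	    t->samples_rendered > t->samples_processed)
		return 0;
	return (t->samples_processed - t->samples_rendered) * 1000 /
	       t->sampling_rate;
}

/* Bytes still queued in the ring buffer, given the total bytes written. */
static uint64_t bytes_queued(uint64_t bytes_written,
			     const struct tstamp_sketch *t)
{
	if (bytes_written < t->bytes_transferred)
		return 0;
	return bytes_written - t->bytes_transferred;
}
```

For instance, 160000 bytes transferred for 441000 samples processed at
44100 Hz gives an average of 128000 bits per second. Comparing
bytes_queued() against the fragment size is one way userspace could
decide when to refill the ring buffer.
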
Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:
- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)

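Listing profiles as bitmasks, with one descriptor per valid
combination, can be sketched as below. The constants, structure and
descriptor values are hypothetical and chosen only to mirror the
"AAC multichannel but HE-AAC stereo" case mentioned in the
requirements; they are not the definitions used by the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile bits; the real values are defined elsewhere. */
#define SK_AAC_PROFILE_LC	(1u << 0)
#define SK_AAC_PROFILE_HE	(1u << 1)
#define SK_AAC_PROFILE_HEV2	(1u << 2)

/* One descriptor covers a set of profiles sharing the same limits. */
struct codec_desc_sketch {
	uint32_t max_channels;
	uint32_t profiles;	/* bitmask of SK_AAC_PROFILE_* */
};

/* Two descriptors: multichannel for LC, stereo only for HE variants. */
static const struct codec_desc_sketch aac_descs[] = {
	{ 6, SK_AAC_PROFILE_LC },
	{ 2, SK_AAC_PROFILE_HE | SK_AAC_PROFILE_HEV2 },
};

/* Return 1 if some descriptor allows this profile and channel count. */
static int config_supported(const struct codec_desc_sketch *d, size_t n,
			    uint32_t channels, uint32_t profile)
{
	for (size_t i = 0; i < n; i++) {
		if ((d[i].profiles & profile) && channels <= d[i].max_channels)
			return 1;
	}
	return 0;
}
```

A single descriptor with max_channels = 6 and all three profile bits set
would wrongly advertise six-channel HE-AAC, which is the failure mode
that multiple descriptors avoid.
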
Not supported:

- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- Multichannel IEC encoding. Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with
  underrun/overrun; these may be dealt with in a user-space library.

Credits:
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.