13 years ago · 57bd9b8ddd
--- a/Documentation/sound/alsa/compress_offload.txt
+++ b/Documentation/sound/alsa/compress_offload.txt
@@ -0,0 +1,188 @@
 
				+		compress_offload.txt
			
 
				+		=====================
			
 
				+	Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
			
 
				+		Vinod Koul <vinod.koul@linux.intel.com>
			
 
				+
			
 
				+Overview
			
 
				+
			
 
				+Since its early days, the ALSA API was defined with PCM support or
			
 
				+constant-bitrate payloads such as IEC61937 in mind. Arguments and
			
 
				+returned values in frames are the norm, making it a challenge to
			
 
				+extend the existing API to compressed data streams.
			
 
				+
			
 
				+In recent years, audio digital signal processors (DSPs) have been integrated
			
 
				+in system-on-chip designs, and DSPs are also integrated in audio
			
 
				+codecs. Processing compressed data on such DSPs results in a dramatic
			
 
				+reduction of power consumption compared to host-based
			
 
				+processing. Support for such hardware has not been very good in Linux,
			
 
				+mostly because of a lack of a generic API available in the mainline
			
 
				+kernel.
			
 
				+
			
 
				+Rather than requiring a compatibility break with an API change of the
			
 
				+ALSA PCM interface, a new 'Compressed Data' API is introduced to
			
 
				+provide a control and data-streaming interface for audio DSPs.
			
 
				+
			
 
				+The design of this API was inspired by the 2-year experience with the
			
 
				+Intel Moorestown SOC, with many corrections required to upstream the
			
 
				+API in the mainline kernel instead of the staging tree and make it
			
 
				+usable by others.
			
 
				+
			
 
				+Requirements
			
 
				+
			
 
				+The main requirements are:
			
 
				+
			
 
				+- Separation between byte counts and time. Compressed formats may have
			
 
				+  a header per file, per frame, or no header at all. The payload size
			
 
				+  may vary from frame-to-frame. As a result, it is not possible to
			
 
				+  estimate reliably the duration of audio buffers when handling
			
 
				+  compressed data. Dedicated mechanisms are required to allow for
			
 
				+  reliable audio-video synchronization, which requires precise
			
 
				+  reporting of the number of samples rendered at any given time.
			
 
				+
			
 
				+- Handling of multiple formats. PCM data only requires a specification
			
 
				+  of the sampling rate, number of channels and bits per sample. In
			
 
				+  contrast, compressed data comes in a variety of formats. Audio DSPs
			
 
				+  may also provide support for a limited number of audio encoders and
			
 
				+  decoders embedded in firmware, or may support more choices through
			
 
				+  dynamic download of libraries.
			
 
				+
			
 
				+- Focus on main formats. This API provides support for the most
			
 
				+  popular formats used for audio and video capture and playback. It is
			
 
				+  likely that as audio compression technology advances, new formats
			
 
				+  will be added.
			
 
				+
			
 
				+- Handling of multiple configurations. Even for a given format like
			
 
				+  AAC, some implementations may support AAC multichannel but HE-AAC
			
 
				+  stereo. Likewise, WMA10 level M3 may require too much memory and CPU
			
 
				+  cycles. The new API needs to provide a generic way of listing these
			
 
				+  formats.
			
 
				+
			
 
				+- Rendering/Grabbing only. This API does not provide any means of
			
 
				+  hardware acceleration, where PCM samples are provided back to
			
 
				+  user-space for additional processing. This API focuses instead on
			
 
				+  streaming compressed data to a DSP, with the assumption that the
			
 
				+  decoded samples are routed to a physical output or logical back-end.
			
 
				+
			
 
				+- Complexity hiding. Existing user-space multimedia frameworks all
			
 
				+  have existing enums/structures for each compressed format. This new
			
 
				+  API assumes the existence of a platform-specific compatibility layer
			
 
				+  to expose, translate and make use of the capabilities of the audio
			
 
				+  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
			
 
				+  applications are not supposed to make use of this API.
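The first requirement above, the separation between byte counts and time, can be illustrated with a minimal sketch. The `frame_desc` structure and the `samples_in_bytes()` helper are hypothetical names introduced here for illustration only; they are not part of the API. The sketch assumes each compressed frame decodes to a known number of samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one compressed frame (not part of the API):
 * its payload size in bytes and the number of PCM samples it decodes to. */
struct frame_desc {
	uint32_t bytes;
	uint32_t samples;
};

/* Count the samples carried by the leading frames that fit entirely
 * within 'budget' bytes. Frames are consumed in order. */
uint64_t samples_in_bytes(const struct frame_desc *frames, size_t count,
			  uint32_t budget)
{
	uint64_t samples = 0;
	uint32_t used = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		/* 'used' never exceeds 'budget', so this cannot wrap. */
		if (frames[i].bytes > budget - used)
			break;
		used += frames[i].bytes;
		samples += frames[i].samples;
	}
	return samples;
}
```

With a budget of 600 bytes, a stream whose frames are 200, 400 and 100 bytes long yields two frames, while a stream of 100-byte frames yields six. If every frame decodes to 1024 samples, the same byte count covers 2048 samples in one case and 6144 in the other, which is why a duration cannot be derived from a buffer size alone.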
			
 
				+
			
 
				+
			
 
				+Design
			
 
				+
			
 
				+The new API shares a number of concepts with the PCM API for flow
			
 
				+control. Start, pause, resume, drain and stop commands have the same
			
 
				+semantics no matter what the content is.
			
 
				+
			
 
				+The concept of a memory ring buffer divided into a set of fragments is
			
 
				+borrowed from the ALSA PCM API. However, only sizes in bytes can be
			
 
				+specified.
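As a rough sketch of byte-only buffer accounting, the ring buffer can be described by a fragment size and a fragment count, both in bytes. The `ex_ring_config` structure, the helper names and the use of cumulative byte counters are assumptions made for this illustration; they do not describe the actual driver interface.

```c
#include <stdint.h>

/* Hypothetical ring buffer description: all sizes are in bytes. */
struct ex_ring_config {
	uint32_t fragment_size;	/* size of one fragment, in bytes */
	uint32_t fragments;	/* number of fragments in the ring */
};

/* Total ring buffer size in bytes. */
uint64_t ex_ring_size(const struct ex_ring_config *cfg)
{
	return (uint64_t)cfg->fragment_size * cfg->fragments;
}

/* Bytes the application may still write, given the cumulative number of
 * bytes written by the application and consumed by the DSP. The caller
 * is assumed to keep app_total >= dsp_total and the fill level within
 * the ring size. */
uint64_t ex_ring_avail(const struct ex_ring_config *cfg,
		       uint64_t app_total, uint64_t dsp_total)
{
	return ex_ring_size(cfg) - (app_total - dsp_total);
}

/* Number of whole fragments that can be written without blocking. */
uint64_t ex_ring_avail_fragments(const struct ex_ring_config *cfg,
				 uint64_t app_total, uint64_t dsp_total)
{
	return ex_ring_avail(cfg, app_total, dsp_total) / cfg->fragment_size;
}
```

For example, with four fragments of 4096 bytes, 20000 bytes written and 8000 bytes consumed, 12000 bytes are pending, 4384 bytes are free, and one whole fragment can be written.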
			
 
				+
			
 
				+Seeks/trick modes are assumed to be handled by the host.
			
 
				+
			
 
				+The notion of rewinds/forwards is not supported. Data committed to the
			
 
				+ring buffer cannot be invalidated, except when dropping all buffers.
			
 
				+
			
 
				+The Compressed Data API does not make any assumptions on how the data
			
 
				+is transmitted to the audio DSP. DMA transfers from main memory to an
			
 
				+embedded audio cluster or to a SPI interface for external DSPs are
			
 
				+possible. As in the ALSA PCM case, a core set of routines is exposed;
			
 
				+each driver implementer will have to write support for a set of
			
 
				+mandatory routines and possibly make use of optional ones.
			
 
				+
			
 
				+The main additions are:
			
 
				+
			
 
				+- get_caps
			
 
				+This routine returns the list of audio formats supported. Querying the
			
 
				+codecs on a capture stream will return encoders; decoders will be
			
 
				+listed for playback streams.
			
 
				+
			
 
				+- get_codec_caps
			
 
				+For each codec, this routine returns a list of
			
 
				+capabilities. The intent is to make sure all the capabilities
			
 
				+correspond to valid settings, and to minimize the risks of
			
 
				+configuration failures. For example, for a complex codec such as AAC,
			
 
				+the number of channels supported may depend on a specific profile. If
			
 
				+the capabilities were exposed with a single descriptor, it may happen
			
 
				+that a specific combination of profiles/channels/formats may not be
			
 
				+supported. Likewise, since embedded DSPs have limited memory and CPU cycles,
			
 
				+it is likely that some implementations make the list of capabilities
			
 
				+dynamic and dependent on existing workloads. In addition to codec
			
 
				+settings, this routine returns the minimum buffer size handled by the
			
 
				+implementation. This information can be a function of the DMA buffer
			
 
				+sizes, the number of bytes required to synchronize, etc, and can be
			
 
				+used by userspace to define how much needs to be written in the ring
			
 
				+buffer before playback can start.
			
 
				+
			
 
				+- set_params
			
 
				+This routine sets the configuration chosen for a specific codec. The
			
 
				+most important field in the parameters is the codec type; in most
			
 
				+cases decoders will ignore other fields, while encoders will strictly
			
 
				+comply with the settings.
			
 
				+
			
 
				+- get_params
			
 
				+This routine returns the actual settings used by the DSP. Changes to
			
 
				+the settings should remain the exception.
			
 
				+
			
 
				+- get_timestamp
			
 
				+The timestamp becomes a multi-field structure. It lists the number
			
 
				+of bytes transferred, the number of samples processed and the number
			
 
				+of samples rendered/grabbed. All these values can be used to determine
			
 
				+the average bitrate, figure out if the ring buffer needs to be
			
 
				+refilled, or estimate the delay due to decoding/encoding/io on the DSP.
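The timestamp arithmetic described for get_timestamp can be sketched as follows. The `ex_tstamp` structure and its field names are hypothetical and only mirror the three counters listed above. The sampling rate is passed in separately, on the assumption that the caller knows it from the stream configuration.

```c
#include <stdint.h>

/* Hypothetical timestamp with the three counters described above. */
struct ex_tstamp {
	uint64_t bytes_transferred;	/* compressed bytes moved to/from the DSP */
	uint64_t samples_processed;	/* samples decoded or encoded so far */
	uint64_t samples_rendered;	/* samples actually rendered or grabbed */
};

/* Average bitrate in bits per second, or 0 if nothing was processed yet.
 * 'rate' is the sampling rate in Hz (assumed known by the caller). */
uint64_t ex_average_bitrate(const struct ex_tstamp *ts, uint32_t rate)
{
	if (ts->samples_processed == 0)
		return 0;
	return ts->bytes_transferred * 8 * rate / ts->samples_processed;
}

/* Samples processed by the DSP but not yet rendered: an estimate of the
 * decoding/io delay for playback, expressed in samples. */
uint64_t ex_dsp_delay_samples(const struct ex_tstamp *ts)
{
	if (ts->samples_rendered > ts->samples_processed)
		return 0;
	return ts->samples_processed - ts->samples_rendered;
}
```

For instance, 16000 bytes transferred for 48000 samples processed at 48000 Hz gives an average of 128000 bits per second. If 47000 of those samples have been rendered, the delay on the DSP is 1000 samples.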
			
 
				+
			
 
				+Note that the list of codecs/profiles/modes was derived from the
			
 
				+OpenMAX AL specification instead of reinventing the wheel.
			
 
				+Modifications include:
			
 
				+- Addition of FLAC and IEC formats
			
 
				+- Merge of encoder/decoder capabilities
			
 
				+- Profiles/modes listed as bitmasks to make descriptors more compact
			
 
				+- Addition of set_params for decoders (missing in OpenMAX AL)
			
 
				+- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
			
 
				+- Addition of format information for WMA
			
 
				+- Addition of encoding options when required (derived from OpenMAX IL)
			
 
				+- Addition of rateControlSupported (missing in OpenMAX AL)
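The idea of listing profiles as bitmasks, combined with several descriptors per codec, can be sketched as below. The flag values, the `ex_codec_desc` structure and the `ex_config_supported()` helper are all hypothetical and chosen for illustration; they are not the definitions used by the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile flags: one bit per profile. */
#define EX_PROFILE_AAC_LC	(1u << 0)
#define EX_PROFILE_HE_AAC	(1u << 1)

/* Hypothetical descriptor: each one lists a set of profiles and the
 * maximum channel count valid for all of them. */
struct ex_codec_desc {
	uint32_t max_channels;
	uint32_t profiles;	/* bitmask of EX_PROFILE_* flags */
};

/* Return 1 if some descriptor allows 'profile' with 'channels' channels,
 * 0 otherwise. */
int ex_config_supported(const struct ex_codec_desc *descs, size_t count,
			uint32_t profile, uint32_t channels)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if ((descs[i].profiles & profile) &&
		    channels <= descs[i].max_channels)
			return 1;
	}
	return 0;
}
```

A DSP that supports multichannel AAC but only stereo HE-AAC could then be described with two descriptors, `{6, EX_PROFILE_AAC_LC}` and `{2, EX_PROFILE_HE_AAC}`. A request for six-channel HE-AAC matches neither descriptor and is rejected, which a single merged descriptor could not express.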
			
 
				+
			
 
				+Not supported:
			
 
				+
			
 
				+- Support for VoIP/circuit-switched calls is not the target of this
			
 
				+  API. Support for dynamic bit-rate changes would require a tight
			
 
				+  coupling between the DSP and the host stack, limiting power savings.
			
 
				+
			
 
				+- Packet-loss concealment is not supported. This would require an
			
 
				+  additional interface to let the decoder synthesize data when frames
			
 
				+  are lost during transmission. This may be added in the future.
			
 
				+
			
 
				+- Volume control/routing is not handled by this API. Devices exposing a
			
 
				+  compressed data interface will be considered as regular ALSA devices;
			
 
				+  volume changes and routing information will be provided with regular
			
 
				+  ALSA kcontrols.
			
 
				+
			
 
				+- Embedded audio effects. Such effects should be enabled in the same
			
 
				+  manner, no matter if the input was PCM or compressed.
			
 
				+
			
 
				+- Multichannel IEC encoding. Unclear if this is required.
			
 
				+
			
 
				+- Encoding/decoding acceleration is not supported as mentioned
			
 
				+  above. It is possible to route the output of a decoder to a capture
			
 
				+  stream, or even implement transcoding capabilities. This routing
			
 
				+  would be enabled with ALSA kcontrols.
			
 
				+
			
 
				+- Audio policy/resource management. This API does not provide any
			
 
				+  hooks to query the utilization of the audio DSP, nor any preemption
			
 
				+  mechanisms.
			
 
				+
			
 
				+- No notion of underrun/overrun. Since the bytes written are compressed
			
 
				+  in nature and data written/read doesn't translate directly to
			
 
				+  rendered output in time, this API does not deal with underrun/overrun;
			
 
				+  these may be dealt with in a user-space library.
			
 
				+
			
 
				+Credits:
			
 
				+- Mark Brown and Liam Girdwood for discussions on the need for this API
			
 
				+- Harsha Priya for her work on intel_sst compressed API
			
 
				+- Rakesh Ughreja for valuable feedback
			
 
				+- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
			
 
				+  demonstrating and quantifying the benefits of audio offload on a
			
 
				+  real platform.