|
@@ -0,0 +1,121 @@
|
|
|
+Kernel mode NEON
|
|
|
+================
|
|
|
+
|
|
|
+TL;DR summary
|
|
|
+-------------
|
|
|
+* Use only NEON instructions, or VFP instructions that don't rely on support
|
|
|
+ code
|
|
|
+* Isolate your NEON code in a separate compilation unit, and compile it with
|
|
|
+ '-mfpu=neon -mfloat-abi=softfp'
|
|
|
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
|
|
|
+ NEON code
|
|
|
+* Don't sleep in your NEON code, and be aware that it will be executed with
|
|
|
+ preemption disabled
|
|
|
+
|
|
|
+
|
|
|
+Introduction
|
|
|
+------------
|
|
|
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
|
|
|
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
|
|
|
+register file is not preserved and restored at every context switch or taken
|
|
|
+exception like the normal register file is, so some manual intervention is
|
|
|
+required. Furthermore, special care is required for code that may sleep [i.e.,
|
|
|
+may call schedule()], as NEON or VFP instructions will be executed in a
|
|
|
+non-preemptible section for reasons outlined below.
|
|
|
+
|
|
|
+
|
|
|
+Lazy preserve and restore
|
|
|
+-------------------------
|
|
|
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
|
|
|
+lazy restore (on both SMP and UP systems). This means that the register file is
|
|
|
+kept 'live', and is only preserved and restored when multiple tasks are
|
|
|
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
|
|
|
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
|
|
|
+every context switch, resulting in a trap when subsequently a NEON/VFP
|
|
|
+instruction is issued, allowing the kernel to step in and perform the restore if
|
|
|
+necessary.
|
|
|
+
|
|
|
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
|
|
|
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
|
|
|
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
|
|
|
+subsequent use. This is handled by the function kernel_neon_begin(), which
|
|
|
+should be called before any kernel mode NEON or VFP instructions are issued.
|
|
|
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
|
|
|
+mode will hit the lazy restore trap upon next use. This is handled by the
|
|
|
+function kernel_neon_end().
|
|
|
+
|
|
|
+
|
|
|
+Interruptions in kernel mode
|
|
|
+----------------------------
|
|
|
+For reasons of performance and simplicity, it was decided that there shall be no
|
|
|
+preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
|
|
|
+implies that interruptions of a kernel mode NEON section can only be allowed if
|
|
|
+they are guaranteed not to touch the NEON/VFP registers. For this reason, the
|
|
|
+following rules and restrictions apply in the kernel:
|
|
|
+* NEON/VFP code is not allowed in interrupt context;
|
|
|
+* NEON/VFP code is not allowed to sleep;
|
|
|
+* NEON/VFP code is executed with preemption disabled.
|
|
|
+
|
|
|
+If latency is a concern, it is possible to put back to back calls to
|
|
|
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
|
|
|
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
|
|
|
+reasonably cheap if no context switch occurred in the meantime)
|
|
|
+
|
|
|
+
|
|
|
+VFP and support code
|
|
|
+--------------------
|
|
|
+Earlier versions of VFP (prior to version 3) rely on software support for things
|
|
|
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
|
|
|
+software assistance, it signals the kernel by raising an undefined instruction
|
|
|
+exception. The kernel responds by inspecting the VFP control registers and the
|
|
|
+current instruction and arguments, and emulates the instruction in software.
|
|
|
+
|
|
|
+Such software assistance is currently not implemented for VFP instructions
|
|
|
+executed in kernel mode. If such a condition is encountered, the kernel will
|
|
|
+fail and generate an OOPS.
|
|
|
+
|
|
|
+
|
|
|
+Separating NEON code from ordinary code
|
|
|
+---------------------------------------
|
|
|
+The compiler is not aware of the special significance of kernel_neon_begin() and
|
|
|
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
|
|
|
+between calls to these respective functions. Furthermore, GCC may generate NEON
|
|
|
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
|
|
|
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
|
|
|
+instructions appearing in unexpected places if no special care is taken.
|
|
|
+
|
|
|
+Therefore, the recommended and only supported way of using NEON/VFP in the
|
|
|
+kernel is by adhering to the following rules:
|
|
|
+* isolate the NEON code in a separate compilation unit and compile it with
|
|
|
+ '-mfpu=neon -mfloat-abi=softfp';
|
|
|
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
|
|
|
+ into the unit containing the NEON code from a compilation unit which is *not*
|
|
|
+ built with the GCC flag '-mfpu=neon' set.
|
|
|
+
|
|
|
+As the kernel is compiled with '-msoft-float', the above will guarantee that
|
|
|
+both NEON and VFP instructions will only ever appear in designated compilation
|
|
|
+units at any optimization level.
|
|
|
+
|
|
|
+
|
|
|
+NEON assembler
|
|
|
+--------------
|
|
|
+NEON assembler is supported with no additional caveats as long as the rules
|
|
|
+above are followed.
|
|
|
+
|
|
|
+
|
|
|
+NEON code generated by GCC
|
|
|
+--------------------------
|
|
|
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
|
|
|
+parallelism, and generates NEON code from ordinary C source code. This is fully
|
|
|
+supported as long as the rules above are followed.
|
|
|
+
|
|
|
+
|
|
|
+NEON intrinsics
|
|
|
+---------------
|
|
|
+NEON intrinsics are also supported. However, as code using NEON intrinsics
|
|
|
+relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
|
|
|
+observe the following in addition to the rules above:
|
|
|
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
|
|
|
+ uses its builtin version of <stdint.h> (this is a C99 header which the kernel
|
|
|
+ does not supply);
|
|
|
+* Include <arm_neon.h> last, or at least after <linux/types.h>
|