
x86-64: Handle byte-wise tail copying in memcpy() without a loop

While hard to measure, reducing the number of possibly/likely
mis-predicted branches can generally be expected to yield a slight
improvement.

Contrary to what might be apparent at first glance, this also
doesn't grow the function size (the alignment gap to the next
function just gets smaller).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F218584020000780006F422@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich, 13 years ago
commit 9d8e22777e
1 changed file with 10 additions and 9 deletions

arch/x86/lib/memcpy_64.S (+10 -9)

@@ -164,18 +164,19 @@ ENTRY(memcpy)
 	retq
 	.p2align 4
 .Lless_3bytes:
-	cmpl $0, %edx
-	je .Lend
+	subl $1, %edx
+	jb .Lend
 	/*
 	 * Move data from 1 bytes to 3 bytes.
 	 */
-.Lloop_1:
-	movb (%rsi), %r8b
-	movb %r8b, (%rdi)
-	incq %rdi
-	incq %rsi
-	decl %edx
-	jnz .Lloop_1
+	movzbl (%rsi), %ecx
+	jz .Lstore_1byte
+	movzbq 1(%rsi), %r8
+	movzbq (%rsi, %rdx), %r9
+	movb %r8b, 1(%rdi)
+	movb %r9b, (%rdi, %rdx)
+.Lstore_1byte:
+	movb %cl, (%rdi)
 
 .Lend:
 	retq
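
For illustration only, here is a minimal C sketch of the same tail-handling idea (the helper name and the test harness are hypothetical, not the kernel's code): a remaining length of 0 to 3 bytes is handled with straight-line loads and stores that mirror the new assembly, rather than with a byte-copy loop.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical illustration (not the kernel's implementation): copy a
 * 0..3 byte tail without a loop, mirroring the new assembly's structure.
 */
static void copy_tail_0_to_3(unsigned char *dst, const unsigned char *src,
			     size_t len)
{
	unsigned char first, second, last;

	if (len == 0)			/* subl $1, %edx ; jb .Lend */
		return;

	first = src[0];			/* movzbl (%rsi), %ecx */
	if (len > 1) {			/* jz .Lstore_1byte taken when len == 1 */
		second = src[1];	/* movzbq 1(%rsi), %r8 */
		last = src[len - 1];	/* movzbq (%rsi,%rdx), %r9  (rdx == len-1) */
		dst[1] = second;	/* movb %r8b, 1(%rdi) */
		dst[len - 1] = last;	/* movb %r9b, (%rdi,%rdx) */
	}
	dst[0] = first;			/* .Lstore_1byte: movb %cl, (%rdi) */
}

int main(void)
{
	const unsigned char src[3] = { 0xaa, 0xbb, 0xcc };
	unsigned char dst[3];

	for (size_t len = 0; len <= 3; len++) {
		memset(dst, 0, sizeof(dst));
		copy_tail_0_to_3(dst, src, len);
		printf("len=%zu: %02x %02x %02x\n", len, dst[0], dst[1], dst[2]);
	}
	return 0;
}
```

For len == 2 the "second" and "last" stores hit the same byte with the same value, which is harmless; accepting that redundancy is what removes the old loop and its data-dependent backward branch, leaving only the two fixed length checks.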