@@ -0,0 +1,126 @@
+ kmemtrace - Kernel Memory Tracer
+
+ by Eduard - Gabriel Munteanu
+ <eduard.munteanu@linux360.ro>
+
+I. Introduction
+===============
+
+kmemtrace helps kernel developers figure out two things:
+1) how different allocators (SLAB, SLUB, etc.) perform
+2) how kernel code allocates memory and how much
+
+To do this, we trace every allocation and export information to userspace
+through the relay interface. We export things such as the number of requested
+bytes, the number of bytes actually allocated (i.e. including internal
+fragmentation), whether this is a slab allocation or a plain kmalloc(), and so
+on.
+
+The actual analysis is performed by a userspace tool (see section III for
+details on where to get it). It logs the data exported by the kernel,
+processes it and (at the time of writing) can provide the following
+information:
+- the total amount of memory allocated and fragmentation per call-site
+- the amount of memory allocated and fragmentation per allocation
+- total memory allocated and fragmentation in the collected dataset
+- number of cross-CPU allocations and frees (makes sense in NUMA environments)
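
The per-call-site figures in the list above can be illustrated with a minimal
sketch in C. The record layout and every name below (struct alloc_event,
sum_for_site(), wasted_bytes()) are assumptions made for illustration; they are
neither the kmemtrace ABI nor the userspace tool's code. The sketch only shows
how requested and allocated byte counts could combine into an internal
fragmentation figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; the real kmemtrace ABI may differ. */
struct alloc_event {
	uint64_t call_site;   /* address of the code that allocated */
	size_t bytes_req;     /* bytes the caller asked for */
	size_t bytes_alloc;   /* bytes the allocator actually handed out */
};

/* Running totals for one call site. */
struct site_totals {
	size_t bytes_req;
	size_t bytes_alloc;
};

/* Sum requested and allocated bytes over all events from one call site. */
struct site_totals sum_for_site(const struct alloc_event *ev, size_t n,
				uint64_t call_site)
{
	struct site_totals t = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (ev[i].call_site == call_site) {
			t.bytes_req += ev[i].bytes_req;
			t.bytes_alloc += ev[i].bytes_alloc;
		}
	}
	return t;
}

/* Internal fragmentation: bytes handed out beyond what was requested. */
size_t wasted_bytes(struct site_totals t)
{
	return t.bytes_alloc > t.bytes_req ? t.bytes_alloc - t.bytes_req : 0;
}
```

Summing over every event instead of a single call site would give the
dataset-wide totals in the same way.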
+
+Moreover, it can potentially find inconsistent and erroneous behavior in
+kernel code, such as using slab free functions on kmalloc'ed memory or
+allocating less memory than requested (but not truly failed allocations).
+
+kmemtrace also makes provisions for tracing on one architecture and analysing
+the data on another.
+
+II. Design and goals
+====================
+
+kmemtrace was designed to handle rather large amounts of data. Thus, it uses
+the relay interface to export whatever is logged to userspace, which then
+stores it. Analysis and reporting are done asynchronously, that is, after the
+data is collected and stored. By design, it allows one to log and analyse
+on different machines and different architectures.
+
+At the time of writing, the ABI is not considered stable, though it might not
+change much. However, no guarantees are made about compatibility yet. When
+deemed stable, the ABI should still allow easy extension while maintaining
+backward compatibility. This is described further in Documentation/ABI.
+
+Summary of design goals:
+ - allow logging and analysis to be done across different machines
+ - be fast and anticipate usage in high-load environments (*)
+ - be reasonably extensible
+ - make it possible for GNU/Linux distributions to have kmemtrace
+   included in their repositories
+
+(*) - one of the reasons Pekka Enberg's original userspace data analysis
+ tool's code was rewritten from Perl to C (although this is more than a
+ simple conversion)
+
+
+III. Quick usage guide
+======================
+
+1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
+CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED).
+
+2) Get the userspace tool and build it:
+$ git clone git://repo.or.cz/kmemtrace-user.git # current repository
+$ cd kmemtrace-user/
+$ ./autogen.sh
+$ ./configure
+$ make
+
+3) Boot the kmemtrace-enabled kernel if you haven't already, preferably in the
+'single' runlevel (so that relay buffers don't fill up easily), and run
+kmemtrace:
+# '$' here denotes a root prompt, not a regular user.
+$ mount -t debugfs none /sys/kernel/debug
+$ mount -t proc none /proc
+$ cd path/to/kmemtrace-user/
+$ ./kmemtraced
+Wait a bit, then stop it with CTRL+C.
+# Check that we didn't overrun; this should be zero.
+$ cat /sys/kernel/debug/kmemtrace/total_overruns
+# Optionally, run kmemtrace_check separately on each cpu[0-9]*.out file to
+# check its correctness.
+$ ./kmemtrace-report
+
+Now you should have a nice and short summary of how the allocator performs.
+
+IV. FAQ and known issues
+========================
+
+Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero. How do I fix
+this? Should I worry?
+A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
+large the number is. You can fix it by supplying a higher
+'kmemtrace.subbufs=N' kernel parameter.
+---
+
+Q: kmemtrace_check reports errors. How do I fix this? Should I worry?
+A: This is a bug and should be reported. It can occur for a variety of
+reasons:
+ - possible bugs in relay code
+ - possible misuse of relay by kmemtrace
+ - timestamps being collected out of order
+Or you may fix it yourself and send us a patch.
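
One of the causes listed above, timestamps collected out of order, can be
spotted with a simple linear scan. The following is a minimal sketch assuming
each logged event carries a 64-bit timestamp; first_out_of_order() is a
hypothetical helper and is not taken from kmemtrace_check, whose actual checks
may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first timestamp that is lower than its
 * predecessor, or n if the whole sequence is non-decreasing.
 * (Hypothetical helper, for illustration only.)
 */
size_t first_out_of_order(const uint64_t *ts, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (ts[i] < ts[i - 1])
			return i;
	}
	return n;
}
```

A return value equal to n means no ordering problem was found in that buffer.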
+---
+
+Q: kmemtrace_report shows many errors. How do I fix this? Should I worry?
+A: This is a known issue and I'm working on it. These might be true errors
+in kernel code, which may have inconsistent behavior (e.g. allocating memory
+with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
+out this behavior may work with SLAB, but may fail with other allocators.
+
+It may also be due to lack of tracing in some unusual allocator functions.
+
+We don't want bug reports regarding this issue yet.
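
The inconsistent pairing mentioned in this answer (memory obtained through one
interface and released through the other) could be detected roughly as
follows. This is a sketch under stated assumptions: the event layout, the enums
and count_mismatched_frees() are all hypothetical and do not reflect the
report tool's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event description, for illustration only. */
enum trace_op { OP_ALLOC, OP_FREE };
enum trace_kind { KIND_KMALLOC, KIND_CACHE };

struct trace_event {
	enum trace_op op;       /* allocation or free */
	enum trace_kind kind;   /* kmalloc()/kfree() or kmem_cache_*() */
	uint64_t ptr;           /* address of the object */
};

/*
 * Count frees whose kind differs from the kind of the most recent
 * allocation of the same pointer. Frees with no matching allocation
 * in the trace are ignored. Quadratic, but enough for a sketch.
 */
size_t count_mismatched_frees(const struct trace_event *ev, size_t n)
{
	size_t mismatches = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].op != OP_FREE)
			continue;
		/* Walk backwards to the latest earlier event for this pointer. */
		for (size_t j = i; j-- > 0; ) {
			if (ev[j].ptr != ev[i].ptr)
				continue;
			if (ev[j].op == OP_ALLOC && ev[j].kind != ev[i].kind)
				mismatches++;
			break;
		}
	}
	return mismatches;
}
```

A real tool would more likely index live objects by address rather than
rescanning the trace; the backward scan is used here only to keep the example
short.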
+---
+
+V. See also
+===========
+
+Documentation/kernel-parameters.txt
+Documentation/ABI/testing/debugfs-kmemtrace
+