@@ -10,8 +10,8 @@ Tracepoints (see Documentation/trace/tracepoints.txt) can be used without
 creating custom kernel modules to register probe functions using the event
 tracing infrastructure.

-Simplistically, tracepoints will represent an important event that when can
-be taken in conjunction with other tracepoints to build a "Big Picture" of
+Simplistically, tracepoints represent important events that can be
+taken in conjunction with other tracepoints to build a "Big Picture" of
 what is going on within the system. There are a large number of methods for
 gathering and interpreting these events. Lacking any current Best Practises,
 this document describes some of the methods that can be used.
@@ -33,12 +33,12 @@ calling

 will give a fair indication of the number of events available.

-2.2 PCL
+2.2 PCL (Performance Counters for Linux)
 -------

-Discovery and enumeration of all counters and events, including tracepoints
+Discovery and enumeration of all counters and events, including tracepoints,
 are available with the perf tool. Getting a list of available events is a
-simple case of
+simple case of:

 $ perf list 2>&1 | grep Tracepoint
 ext4:ext4_free_inode [Tracepoint event]
@@ -49,19 +49,19 @@ simple case of
 [ .... remaining output snipped .... ]


-2. Enabling Events
+3. Enabling Events
 ==================

-2.1 System-Wide Event Enabling
+3.1 System-Wide Event Enabling
 ------------------------------

 See Documentation/trace/events.txt for a proper description on how events
 can be enabled system-wide. A short example of enabling all events related
-to page allocation would look something like
+to page allocation would look something like:

 $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done

-2.2 System-Wide Event Enabling with SystemTap
+3.2 System-Wide Event Enabling with SystemTap
 ---------------------------------------------

 In SystemTap, tracepoints are accessible using the kernel.trace() function
@@ -86,7 +86,7 @@ were allocating the pages.
 print_count()
 }

-2.3 System-Wide Event Enabling with PCL
+3.3 System-Wide Event Enabling with PCL
 ---------------------------------------

 By specifying the -a switch and analysing sleep, the system-wide events
@@ -107,16 +107,16 @@ for a duration of time can be examined.
 Similarly, one could execute a shell and exit it as desired to get a report
 at that point.

-2.4 Local Event Enabling
+3.4 Local Event Enabling
 ------------------------

 Documentation/trace/ftrace.txt describes how to enable events on a per-thread
 basis using set_ftrace_pid.

-2.5 Local Event Enablement with PCL
+3.5 Local Event Enablement with PCL
 -----------------------------------

-Events can be activate and tracked for the duration of a process on a local
+Events can be activated and tracked for the duration of a process on a local
 basis using PCL such as follows.

 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -131,18 +131,18 @@ basis using PCL such as follows.

 0.973913387 seconds time elapsed

-3. Event Filtering
+4. Event Filtering
 ==================

 Documentation/trace/ftrace.txt covers in-depth how to filter events in
 ftrace. Obviously using grep and awk of trace_pipe is an option as well
 as any script reading trace_pipe.

-4. Analysing Event Variances with PCL
+5. Analysing Event Variances with PCL
 =====================================

 Any workload can exhibit variances between runs and it can be important
-to know what the standard deviation in. By and large, this is left to the
+to know what the standard deviation is. By and large, this is left to the
 performance analyst to do it by hand. In the event that the discrete event
 occurrences are useful to the performance analyst, then perf can be used.

@@ -166,7 +166,7 @@ In the event that some higher-level event is required that depends on some
 aggregation of discrete events, then a script would need to be developed.

 Using --repeat, it is also possible to view how events are fluctuating over
-time on a system wide basis using -a and sleep.
+time on a system-wide basis using -a and sleep.

 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
 -e kmem:mm_pagevec_free \
@@ -180,7 +180,7 @@ time on a system wide basis using -a and sleep.

 1.002251757 seconds time elapsed ( +- 0.005% )

-5. Higher-Level Analysis with Helper Scripts
+6. Higher-Level Analysis with Helper Scripts
 ============================================

 When events are enabled the events that are triggering can be read from
@@ -190,11 +190,11 @@ be gathered on-line as appropriate. Examples of post-processing might include

   o Reading information from /proc for the PID that triggered the event
   o Deriving a higher-level event from a series of lower-level events.
-  o Calculate latencies between two events
+  o Calculating latencies between two events

 Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
 script that can read trace_pipe from STDIN or a copy of a trace. When used
-on-line, it can be interrupted once to generate a report without existing
+on-line, it can be interrupted once to generate a report without exiting
 and twice to exit.

 Simplistically, the script just reads STDIN and counts up events but it
@@ -212,12 +212,12 @@ also can do more such as
 processes, the parent process responsible for creating all the helpers
 can be identified

-6. Lower-Level Analysis with PCL
+7. Lower-Level Analysis with PCL
 ================================

-There may also be a requirement to identify what functions with a program
+There may also be a requirement to identify what functions within a program
 were generating events within the kernel. To begin this sort of analysis, the
-data must be recorded. At the time of writing, this required root
+data must be recorded. At the time of writing, this required root:

 $ perf record -c 1 \
 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -253,11 +253,11 @@ perf report.
 # (For more details, try: perf report --sort comm,dso,symbol)
 #

-According to this, the vast majority of events occured triggered on events
-within the VDSO. With simple binaries, this will often be the case so lets
+According to this, the vast majority of events triggered on events
+within the VDSO. With simple binaries, this will often be the case so let's
 take a slightly different example. In the course of writing this, it was
-noticed that X was generating an insane amount of page allocations so lets look
-at it
+noticed that X was generating an insane amount of page allocations so let's look
+at it:

 $ perf record -c 1 -f \
 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -280,8 +280,8 @@ This was interrupted after a few seconds and
 # (For more details, try: perf report --sort comm,dso,symbol)
 #

-So, almost half of the events are occuring in a library. To get an idea which
-symbol.
+So, almost half of the events are occurring in a library. To get an idea which
+symbol:

 $ perf report --sort comm,dso,symbol
 # Samples: 27666
@@ -297,7 +297,7 @@ symbol.
 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path
 0.00% Xorg [kernel] [k] ftrace_trace_userstack

-To see where within the function pixmanFillsse2 things are going wrong
+To see where within the function pixmanFillsse2 things are going wrong:

 $ perf annotate pixmanFillsse2
 [ ... ]
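
Note (not part of the patch): the helper-script section touched above says
that a script can simply read trace_pipe output and count events. The sketch
below illustrates that idea only; it is not the in-tree Perl script. The
function name count_events and the line layout encoded in LINE_RE are
assumptions made for this example (modelled on typical ftrace text output),
and real output may differ between kernel versions.

```python
import re
from collections import Counter

# Assumed line layout (an assumption of this sketch, not taken from the patch):
#   <comm>-<pid> [<cpu>] <optional flags> <timestamp>: <event>: <details>
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?:\S+\s+)?(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)


def count_events(lines):
    """Count events per name and per (comm, pid, event) for parseable lines.

    Lines that do not match the assumed layout (headers, comments, or
    formats this sketch does not know about) are skipped silently.
    """
    per_event = Counter()
    per_task = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        event = match.group("event")
        per_event[event] += 1
        per_task[(match.group("comm"), int(match.group("pid")), event)] += 1
    return per_event, per_task
```

The function accepts any iterable of text lines, so it could be fed
sys.stdin with trace_pipe piped in, or an open file holding a saved copy of a
trace. Keeping a per-task counter alongside the per-event one is a design
choice of this sketch, intended to show which processes trigger which events.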