@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
 Modified by Christoph Lameter <clameter@sgi.com>
 Modified by Paul Menage <menage@google.com>
+Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
 
 CONTENTS:
 =========
@@ -20,7 +21,8 @@ CONTENTS:
   1.5 What is memory_pressure ?
   1.6 What is memory spread ?
   1.7 What is sched_load_balance ?
-  1.8 How do I use cpusets ?
+  1.8 What is sched_relax_domain_level ?
+  1.9 How do I use cpusets ?
 2. Usage Examples and Syntax
   2.1 Basic Usage
   2.2 Adding/removing cpus
@@ -497,7 +499,73 @@ the cpuset code to update these sched domains, it compares the new
 partition requested with the current, and updates its sched domains,
 removing the old and adding the new, for each change.
 
-1.8 How do I use cpusets ?
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+Within a sched domain, the scheduler migrates tasks in two ways: periodic
+load balancing on each tick, and migration at certain scheduling events.
+
+When a task is woken up, the scheduler tries to move it to an idle CPU.
+For example, if task A running on CPU X activates another task B on
+the same CPU X, and CPU X's sibling CPU Y is idle, then the scheduler
+migrates task B to CPU Y so that task B can start on CPU Y without
+waiting for task A on CPU X.
+
+Likewise, if a CPU runs out of tasks in its runqueue, it tries to
+pull extra tasks from busy CPUs to relieve them before it goes
+idle.
+
+Of course, finding movable tasks and/or idle CPUs has a search cost,
+so the scheduler might not search all CPUs in the domain every time.
+In fact, on some architectures, the search range on events is limited
+to the socket or node where the CPU is located, while the load balance
+on tick searches all CPUs.
+
+For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
+is idle while CPU X and its siblings are busy, the scheduler can't
+migrate woken task B from X to Z because Z is outside its search range.
+As a result, task B on CPU X has to wait for task A or for the load
+balance on the next tick. For some applications in special situations,
+waiting one tick may be too long.
+
+The 'sched_relax_domain_level' file lets you request a change to this
+search range. The file takes an int value which ideally indicates the
+size of the search range in levels, as follows; otherwise it holds the
+initial value -1, which indicates that the cpuset has no request.
+
+  -1  : no request. use system default or follow request of others.
+   0  : no search.
+   1  : search siblings (hyperthreads in a core).
+   2  : search cores in a package.
+   3  : search cpus in a node [= system wide on non-NUMA system]
+ ( 4  : search nodes in a chunk of node [on NUMA system] )
+ ( 5~ : search system wide [on NUMA system] )
+
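To make the levels concrete, here is a minimal sketch, not kernel code,
that models which CPUs fall inside the search range of a given CPU at
each level. The Cpu record, the in_search_range() helper and the
three-CPU topology are hypothetical names invented for illustration.
Levels 4 and above are simplified to "system wide", which ignores the
"chunk of nodes" step on large NUMA machines, and -1 is not modelled
because the system default depends on the architecture.

```python
from collections import namedtuple

# Hypothetical topology record: which node, package and core a CPU sits in.
Cpu = namedtuple("Cpu", "name node package core")

def in_search_range(origin, other, level):
    """Return True if `other` is searched from `origin` at this level.

    Toy model of the level table; it is an assumption-laden sketch,
    not the scheduler's real search logic.
    """
    if level < 0:
        raise ValueError("-1 means 'no request'; the default is not modelled")
    if other == origin or level == 0:
        return False                      # level 0: no search at all
    same_node = origin.node == other.node
    same_package = same_node and origin.package == other.package
    same_core = same_package and origin.core == other.core
    if level == 1:
        return same_core                  # hyperthread siblings
    if level == 2:
        return same_package               # cores in a package
    if level == 3:
        return same_node                  # cpus in a node
    return True                           # simplification: 4+ is system wide

# CPU X and CPU Y are siblings; CPU Z sits on another node, "far" from X.
X = Cpu("X", node=0, package=0, core=0)
Y = Cpu("Y", node=0, package=0, core=0)
Z = Cpu("Z", node=1, package=0, core=0)
```

In this model in_search_range(X, Z, 3) is false while
in_search_range(X, Z, 5) is true, which mirrors the CPU X / CPU Z
example: a larger level is what brings the far, idle CPU into range.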
+This file is per-cpuset and affects the sched domain that the cpuset
+belongs to. Therefore, if the 'sched_load_balance' flag of a cpuset
+is disabled, 'sched_relax_domain_level' has no effect, since no
+sched domain belongs to the cpuset.
+
+If multiple cpusets overlap and hence form a single sched domain, the
+largest value among them is used. Be careful: if one cpuset requests 0
+and the others are -1, then 0 is used.
+
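The "largest value wins" rule can be sketched as follows. This is only
an illustration of the rule as described in the text, not the kernel
implementation; effective_relax_level() and NO_REQUEST are hypothetical
names.

```python
NO_REQUEST = -1  # initial value of the file: the cpuset has no request

def effective_relax_level(requests):
    """Combine the values of all cpusets that share one sched domain.

    The largest value wins. Because -1 is the smallest possible value,
    any explicit request, even 0 ("no search"), overrides it.
    """
    return max(requests, default=NO_REQUEST)
```

For example, effective_relax_level([-1, 0, -1]) gives 0, so a single
cpuset asking for "no search" silently changes the behaviour of every
cpuset that shares its sched domain and left the file at -1.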
+Note that modifying this file has both good and bad effects, and
+whether the result is acceptable depends on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+ - The migration cost between CPUs can be assumed to be considerably
+   small (for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The search cost has no impact (for you), or you can make the
+   search cost small enough, e.g. by keeping the cpuset compact.
+ - Low latency is required even if it sacrifices cache hit rate etc.
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
 --------------------------
 
 In order to minimize the impact of cpusets on critical kernel