|
- Hypervisor-Assisted Dump
- ------------------------
- November 2007
- The goal of hypervisor-assisted dump is to enable the dump of
- a crashed system, and to do so from a fully-reset system, and
- to minimize the total elapsed time until the system is back
- in production use.
- As compared to kdump or other strategies, hypervisor-assisted
- dump offers several strong, practical advantages:
- -- Unlike kdump, the system has been reset, and loaded
- with a fresh copy of the kernel. In particular,
- PCI and I/O devices have been reinitialized and are
- in a clean, consistent state.
- -- As the dump is performed, the dumped memory becomes
- immediately available to the system for normal use.
- -- After the dump is completed, no further reboots are
- required; the system will be fully usable, and running
- in its normal production mode on its normal kernel.
- The above can only be accomplished by coordination with,
- and assistance from the hypervisor. The procedure is
- as follows:
- -- When a system crashes, the hypervisor will save
- the low 256MB of RAM to a previously registered
- save region. It will also save system state, system
- registers, and hardware PTEs.
- -- After the low 256MB area has been saved, the
- hypervisor will reset PCI and other hardware state.
- It will *not* clear RAM. It will then launch the
- bootloader, as normal.
- -- The freshly booted kernel will notice that there
- is a new node (ibm,dump-kernel) in the device tree,
- indicating that there is crash data available from
- a previous boot. It will boot into only 256MB of RAM,
- reserving the rest of system memory.
- -- Userspace tools will parse /sys/kernel/release_region
- and read /proc/vmcore to obtain the contents of memory,
- which holds the previous crashed kernel. The userspace
- tools may copy this info to disk, network, NAS, SAN,
- iSCSI, etc. as desired.
- For example, the values in /sys/kernel/release_region
- would look something like this (address-range pairs):
- CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
- DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
- -- As the userspace tools complete saving a portion of
- dump, they echo an offset and size to
- /sys/kernel/release_region to release the reserved
- memory back to general use.
- An example of this is:
- "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
- which will release 256MB at the 1GB boundary.
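The release step above can be sketched as follows. This is only an illustration, not the actual userspace tool: the function names release_chunks and release_line are hypothetical, and the 256MB chunk size is simply borrowed from the echo example. Each generated line has the same "offset size" form as that example and would be written to /sys/kernel/release_region once the corresponding part of the dump has been saved.

```python
CHUNK = 256 * 1024 * 1024  # 256MB, the size used in the echo example (assumed default)

def release_chunks(start, size, chunk=CHUNK):
    """Yield (offset, length) pieces that cover [start, start + size)."""
    end = start + size
    offset = start
    while offset < end:
        length = min(chunk, end - offset)  # the last piece may be shorter
        yield offset, length
        offset += length

def release_line(offset, length):
    """Format one 'offset size' line in hex, as in the echo example."""
    return f"{offset:#x} {length:#x}"

if __name__ == "__main__":
    # Walk a 640MB region starting at the 1GB boundary.
    for off, length in release_chunks(0x40000000, 0x28000000):
        print(release_line(off, length))
```

Releasing in fixed-size pieces, rather than all at once at the end, matches the stated goal of returning dumped memory to normal use as early as possible.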
- Please note that the hypervisor-assisted dump feature
- is only available on Power6-based systems with recent
- firmware versions.
- Implementation details:
- ----------------------
- During boot, a check is made to see if firmware supports
- this feature on this particular machine. If it does, then
- we check to see if an active dump is waiting for us. If so,
- then everything but 256MB of RAM is reserved during early
- boot. This area is released once userland scripts have
- collected the dump. If there is dump data, then
- the /sys/kernel/release_region file is created, and
- the reserved memory is held.
- If there is no waiting dump data, then only the highest
- 256MB of RAM is reserved as a scratch area. This area
- is *not* released: this region will be kept permanently
- reserved, so that it can act as a receptacle for a copy
- of the low 256MB in case a crash does occur. See,
- however, "open issues" below, as to whether
- such a reserved region is really needed.
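The two reservation cases described above come down to simple arithmetic. The sketch below is not the kernel code; it assumes RAM is one contiguous range starting at address 0, and the name reserved_range is hypothetical.

```python
MB = 1 << 20
AREA = 256 * MB  # the 256MB figure the text uses for both the boot and scratch areas

def reserved_range(total_ram, dump_waiting):
    """Return (start, size) of the memory held back during early boot.

    Assumes contiguous RAM starting at address 0 (an illustration only).
    """
    if total_ram <= AREA:
        raise ValueError("need more than 256MB of RAM")
    if dump_waiting:
        # Boot in the low 256MB and hold everything above it for the dump.
        return AREA, total_ram - AREA
    # No dump waiting: keep only the highest 256MB as the scratch area.
    return total_ram - AREA, AREA

if __name__ == "__main__":
    total = 4096 * MB
    for waiting in (True, False):
        start, size = reserved_range(total, waiting)
        print(waiting, hex(start), hex(size))
```

On a 4GB machine this gives a reserved range of 256MB..4GB when a dump is waiting, and 3840MB..4GB otherwise.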
- Currently the dump will be copied from /proc/vmcore to
- a new file upon user intervention. The starting address
- to be read and the range for each data point are provided
- in /sys/kernel/release_region.
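Since the dump is currently parsed by hand, a small parser for the release_region contents may be useful. The sketch below is based solely on the single sample shown earlier in this document; the real file layout may differ, the "/" line continuation is assumed from that sample, and parse_release_region is a hypothetical name. The two numbers of each pair are returned as-is, without deciding whether the second is an end address or a length.

```python
import re

# Assumed layout, inferred only from the sample in this document:
#   NAME:0xA-0xB[, 0xC-0xD]: NAME:0xA-0xB: ...
PAIR = re.compile(r"(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")
SECTION = re.compile(r"([A-Z]+):((?:\s*0x[0-9a-fA-F]+-0x[0-9a-fA-F]+,?)+)")

SAMPLE = (
    "CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /\n"
    "DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A"
)

def parse_release_region(text):
    """Return {section name: [(first, second), ...]} with integer values."""
    flat = text.replace("/\n", " ")  # join lines continued with a trailing "/"
    regions = {}
    for name, body in SECTION.findall(flat):
        regions[name] = [(int(a, 16), int(b, 16)) for a, b in PAIR.findall(body)]
    return regions

if __name__ == "__main__":
    for name, pairs in parse_release_region(SAMPLE).items():
        print(name, [(hex(a), hex(b)) for a, b in pairs])
```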
- The tools to examine the dump will be the same as the ones
- used for kdump.
- General notes:
- --------------
- Security: please note that there are potential security issues
- with any sort of dump mechanism. In particular, plaintext
- (unencrypted) data, and possibly passwords, may be present in
- the dump data. Userspace tools must take adequate precautions to
- preserve security.
- Open issues/ToDo:
- ------------
- o The various code paths that tell the hypervisor that a crash
- occurred, vs. it simply being a normal reboot, should be
- reviewed, and possibly clarified/fixed.
- o Instead of using /sys/kernel, should there be a /sys/dump
- instead? There is a dump_subsys being created by the s390 code,
- perhaps the pseries code should use a similar layout as well.
- o Is reserving a 256MB region really required? The goal of
- reserving a 256MB scratch area is to make sure that no
- important crash data is clobbered when the hypervisor
- saves low memory to the scratch area. But, if one could assure
- that nothing important is located in some 256MB area, then
- it would not need to be reserved. This is something that can be
- improved in subsequent versions.
- o Still working with the kdump team to integrate this with kdump;
- some work remains, but this would not affect the current
- patches.
- o Still need to write a shell script, to copy the dump away.
- Currently I am parsing it manually.
|