Documentation for kdump - the kexec-based crash dumping solution
================================================================

DESIGN
======

Kdump uses kexec to reboot into a second kernel whenever a dump needs to be
taken. This second kernel is booted with very little memory. The first kernel
reserves the section of memory that the second kernel uses. This ensures that
ongoing DMA from the first kernel does not corrupt the second kernel.

All the necessary information about the core image is encoded in ELF format
and stored in a reserved area of memory before the crash. The physical address
of the start of the ELF header is passed to the new kernel through the command
line parameter elfcorehdr=.

On i386, the first 640 KB of physical memory is needed to boot, irrespective
of where the kernel loads. Hence, this region is backed up by kexec just before
rebooting into the new kernel.

In the second kernel, "old memory" can be accessed in two ways.

- The first one is through a /dev/oldmem device interface. A capture utility
  can read the device file and write out the memory in raw format. This is a
  raw dump of memory, and the analysis/capture tool must be intelligent enough
  to determine where to look for the right information. The ELF headers
  (elfcorehdr=) can come in handy here.

- The second interface is through /proc/vmcore. This exports the dump as an
  ELF format file, which can be written out using any file copy command
  (cp, scp, etc). Further, gdb can be used to perform limited debugging on
  the dump file. This method ensures correct ordering of the dump pages
  (those corresponding to the first 640 KB, which has been relocated).
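
As a rough illustration of how a capture script running in the second kernel
might find out where the ELF headers live, the sketch below pulls the
elfcorehdr= value out of a kernel command line. The helper name
elfcorehdr_addr and the sample address in the usage comment are hypothetical;
only the parameter name comes from the text above.

```shell
#!/bin/sh
# Sketch only: print the value of elfcorehdr= from a kernel command line.
# elfcorehdr_addr is a hypothetical helper name, not part of kexec-tools.
elfcorehdr_addr() {
    # $1: command line string, e.g. the contents of /proc/cmdline
    for word in $1; do
        case "$word" in
            elfcorehdr=*)
                printf '%s\n' "${word#elfcorehdr=}"
                return 0
                ;;
        esac
    done
    return 1    # parameter absent: probably not running as a capture kernel
}

# In the second kernel one might call:
#   elfcorehdr_addr "$(cat /proc/cmdline)"
```

A non-zero return status is a cheap way for a script to tell that it is not
running in a capture kernel and should not try to read old memory.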

SETUP
=====

1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
   and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch
   to it, then build the source.

2) Download the appropriate (latest) kexec/kdump (-mm) kernel patchset, apply
   it to the vanilla kernel tree, and build the kernel.

   Two kernels need to be built in order to get this feature working.

   A) First kernel:
      a) Enable "kexec system call" feature (in Processor type and features).
         CONFIG_KEXEC=y
      b) This kernel's physical load address should be the default value of
         0x100000 (1 MB) (in Processor type and features).
         CONFIG_PHYSICAL_START=0x100000
      c) Enable "sysfs file system support" (in Pseudo filesystems).
         CONFIG_SYSFS=y
      d) Boot into the first kernel with the command line parameter
         "crashkernel=Y@X". Use appropriate values for X and Y. Y denotes how
         much memory to reserve for the second kernel, and X denotes the
         physical address at which the reserved memory section starts.
         For example: "crashkernel=64M@16M".
   B) Second kernel:
      a) Enable "kernel crash dumps" feature (in Processor type and features).
         CONFIG_CRASH_DUMP=y
      b) Specify a suitable value for "Physical address where the kernel is
         loaded" (in Processor type and features). Typically this value
         should be the same as X (see option d) above), e.g., 16 MB or
         0x1000000.
         CONFIG_PHYSICAL_START=0x1000000
      c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
         CONFIG_PROC_VMCORE=y

      Note: Options a) and b) depend upon "Configure standard kernel features
            (for small systems)" (under General setup).
            Option a) also depends on CONFIG_HIGHMEM (under Processor
            type and features).
            Both options a) and b) are under "Processor type and features".
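
The relationship between X in "crashkernel=Y@X" and the second kernel's
CONFIG_PHYSICAL_START is plain arithmetic, and can be sketched as below. The
helper name crash_start_hex is hypothetical, and the sketch only understands
the "<n>M@<n>M" form used in the example above.

```shell
#!/bin/sh
# Sketch only: turn the X part of "crashkernel=Y@X" into the hex value
# expected by CONFIG_PHYSICAL_START in the second kernel.
# crash_start_hex is a hypothetical helper; only "<n>M@<n>M" is handled.
crash_start_hex() {
    spec=${1#crashkernel=}      # e.g. "64M@16M"
    case "$spec" in
        *@*) ;;
        *) echo "unsupported spec: $1" >&2; return 1 ;;
    esac
    start=${spec#*@}            # e.g. "16M"
    mb=${start%M}               # e.g. "16"
    case "$mb" in
        ''|*[!0-9]*) echo "unsupported spec: $1" >&2; return 1 ;;
    esac
    printf '0x%x\n' $((mb * 1024 * 1024))
}

crash_start_hex "crashkernel=64M@16M"    # prints 0x1000000
```

The printed value matches the CONFIG_PHYSICAL_START=0x1000000 setting shown
for the second kernel.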

3) Boot into the first kernel. You are now ready to try out kexec-based crash
   dumps.

4) Load the second kernel to be booted using:

   kexec -p <second-kernel> --crash-dump --args-linux --append="root=<root-dev>
   maxcpus=1 init 1"

   Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
            as of now.
        ii) By default ELF headers are stored in ELF32 format (for i386). This
            is sufficient to represent the physical memory up to 4 GB. To
            store headers in ELF64 format, additionally specify
            "--elf64-core-headers" on the kexec command line.
       iii) For now (or until it is fixed), it's best to build the
            second-kernel without multi-processor support, i.e., make it
            a uniprocessor kernel.
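
The choice between the default ELF32 headers and "--elf64-core-headers" can
be sketched as a small wrapper that prints (but does not run) the load
command. The helper name kexec_load_cmd and its three arguments are
hypothetical; the kexec options are only the ones quoted in step 4.

```shell
#!/bin/sh
# Sketch only: print the kexec load command from step 4, adding
# --elf64-core-headers when physical memory extends beyond 4 GB.
# kexec_load_cmd and its arguments are hypothetical names.
kexec_load_cmd() {
    kernel=$1       # vmlinux image of the second kernel
    rootdev=$2      # root device for the second kernel
    mem_bytes=$3    # highest physical address in use, in bytes
    extra=""
    if [ "$mem_bytes" -gt 4294967296 ]; then
        extra=" --elf64-core-headers"   # ELF32 headers stop at 4 GB
    fi
    printf 'kexec -p %s --crash-dump --args-linux%s --append="root=%s maxcpus=1 init 1"\n' \
        "$kernel" "$extra" "$rootdev"
}

# Example: kexec_load_cmd vmlinux /dev/hda1 8589934592
```

Printing the command rather than executing it keeps the sketch safe to try on
a machine where kexec-tools is not installed.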

5) The system reboots into the second kernel when a panic occurs. A module can
   be written to force the panic, for testing purposes.
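
As an alternative to writing a module, a crash can often be forced from the
shell through the magic SysRq interface. This is a different technique from
the one step 5 describes, and the sketch assumes a kernel built with
CONFIG_MAGIC_SYSRQ=y on which the SysRq 'c' command is wired to the crash
path. The helper name force_crash and its --yes switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: force a crash for testing via the magic SysRq 'c' command
# instead of a custom module. Assumes CONFIG_MAGIC_SYSRQ=y and that
# /proc/sysrq-trigger exists; force_crash is a hypothetical helper name.
force_crash() {
    if [ "$1" != "--yes" ]; then
        echo "dry run: would write 'c' to /proc/sysrq-trigger"
        return 0
    fi
    sync                            # flush dirty data before crashing
    echo c > /proc/sysrq-trigger    # requires root; crashes the first kernel
}

force_crash    # prints the dry-run message; pass --yes to really crash
```

The dry-run default is deliberate: the real invocation takes the running
system down immediately.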

6) Write out the dump file using

   cp /proc/vmcore <dump-file>

   Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
   view. To create the device, type:

   mknod /dev/oldmem c 1 12

   Use "dd" with suitable options for count, bs and skip to access specific
   portions of the dump.

   Entire memory: dd if=/dev/oldmem of=oldmem.001
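
To make the count/bs/skip arithmetic concrete, the sketch below copies a
page-aligned byte range out of a source such as /dev/oldmem. The helper name
dump_range and the output file name in the usage comment are hypothetical, and
a 4096-byte page size (as on i386) is assumed.

```shell
#!/bin/sh
# Sketch only: copy a page-aligned range of old memory with dd.
# dump_range is a hypothetical helper; 4096 is the assumed page size.
dump_range() {
    src=$1      # e.g. /dev/oldmem
    start=$2    # byte offset of the first page, multiple of 4096
    len=$3      # number of bytes to copy, multiple of 4096
    out=$4      # output file
    bs=4096
    if [ $((start % bs)) -ne 0 ] || [ $((len % bs)) -ne 0 ]; then
        echo "start and len must be multiples of $bs" >&2
        return 1
    fi
    # skip and count are expressed in units of bs
    dd if="$src" of="$out" bs=$bs skip=$((start / bs)) count=$((len / bs)) 2>/dev/null
}

# Example: 16 MB of old memory starting at the 1 MB mark:
#   dump_range /dev/oldmem $((1024 * 1024)) $((16 * 1024 * 1024)) oldmem.part
```

Working in whole pages keeps skip and count exact; an unaligned range would
need a smaller bs and correspondingly larger skip and count values.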

ANALYSIS
========

Limited analysis can be done using gdb on the dump file copied out of
/proc/vmcore. Use a vmlinux built with -g and run

   gdb vmlinux <dump-file>

Stack trace for the task on processor 0, register display, and memory display
work fine.

Note: gdb cannot analyse core files generated in ELF64 format for i386.
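
Because of the ELF64 limitation, it can be useful to check the format of a
dump file before handing it to gdb. A minimal sketch, assuming a POSIX od is
available: it reads the four ELF magic bytes plus the EI_CLASS byte that
follows them. The helper name elf_class is hypothetical.

```shell
#!/bin/sh
# Sketch only: report whether a dump file is ELF32 or ELF64 by reading
# the ELF magic (bytes 0-3) and the EI_CLASS byte (byte 4).
# elf_class is a hypothetical helper name.
elf_class() {
    bytes=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case "$bytes" in
        7f454c4601) echo "ELF32" ;;
        7f454c4602) echo "ELF64" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example: elf_class <dump-file>
```

If the result is ELF64 for an i386 dump, the dump would have to be captured
again without "--elf64-core-headers" before gdb can read it.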

TODO
====

1) Provide a kernel pages filtering mechanism so that the core file size is
   not excessive on systems with large amounts of memory.

2) Modify the "crash" tool to make it recognize this dump.

CONTACT
=======

Hariprasad Nellitheertha - hari at in dot ibm dot com
Vivek Goyal (vgoyal@in.ibm.com)