
mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem

There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Naveen N. Rao 12 years ago
commit
0644414e62
2 changed files with 16 additions and 1 deletion
  arch/x86/kernel/cpu/mcheck/mce.c        (+4, -1)
  arch/x86/kernel/cpu/mcheck/mce_intel.c  (+12, -0)

+ 4 - 1
arch/x86/kernel/cpu/mcheck/mce.c

@@ -89,7 +89,10 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
 static DEFINE_PER_CPU(struct mce, mces_seen);
 static int			cpu_missing;
 
-/* MCA banks polled by the period polling timer for corrected events */
+/*
+ * MCA banks polled by the period polling timer for corrected events.
+ * With Intel CMCI, this only has MCA banks which do not support CMCI (if any).
+ */
 DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
 	[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
 };
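
The relationship described in the new comment can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct cpu_banks`, `banks_init()`, `claim_cmci_banks()`, `bank_needs_polling()` and the bank count `NR_BANKS` are hypothetical names invented for illustration, and a plain 32-bit mask stands in for `mce_banks_t`. It shows the poll set starting with every bit set (as in the initializer above) and shrinking to the banks that do not support CMCI once CMCI-capable banks are claimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS 8u /* hypothetical bank count for this model */

/* Hypothetical stand-in for one cpu's view of the two per-cpu bitmaps. */
struct cpu_banks {
	uint32_t poll;  /* models mce_poll_banks: banks the timer polls */
	uint32_t owned; /* models mce_banks_owned: banks handled via CMCI */
};

/* Start like the kernel initializer: every bank is polled, none is owned. */
static void banks_init(struct cpu_banks *b)
{
	b->poll = (1u << NR_BANKS) - 1;
	b->owned = 0;
}

/*
 * Assumption: cmci_capable is the set of banks on which CMCI could be
 * enabled by this cpu. Those banks move from the poll set to the owned set.
 */
static void claim_cmci_banks(struct cpu_banks *b, uint32_t cmci_capable)
{
	uint32_t all = (1u << NR_BANKS) - 1;

	b->owned = cmci_capable & all;
	b->poll &= ~b->owned;
}

/* True if the periodic timer still has to look at this bank. */
static bool bank_needs_polling(const struct cpu_banks *b, unsigned int bank)
{
	assert(bank < NR_BANKS);
	return (b->poll >> bank) & 1u;
}
```

In this model, a cpu on which banks 2 and 3 lack CMCI support ends up with only those two bits left in `poll`, which matches the "only has MCA banks which do not support CMCI (if any)" wording of the comment.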

+ 12 - 0
arch/x86/kernel/cpu/mcheck/mce_intel.c

@@ -24,6 +24,18 @@
  * Also supports reliable discovery of shared banks.
  */
 
+/*
+ * CMCI can be delivered to multiple cpus that share a machine check bank
+ * so we need to designate a single cpu to process errors logged in each bank
+ * in the interrupt handler (otherwise we would have many races and potential
+ * double reporting of the same error).
+ * Note that this can change when a cpu is offlined or brought online since
+ * some MCA banks are shared across cpus. When a cpu is offlined, cmci_clear()
+ * disables CMCI on all banks owned by the cpu and clears this bitfield. At
+ * this point, cmci_rediscover() kicks in and a different cpu may end up
+ * taking ownership of some of the shared MCA banks that were previously
+ * owned by the offlined cpu.
+ */
 static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
 
 /*
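
The ownership hand-off that the `mce_banks_owned` comment describes can be sketched as a user-space model. Everything here is an assumption made for illustration: `struct mca_model`, `model_discover()`, `model_cpu_up()`, `model_cpu_down()` and `NR_CPUS` are hypothetical, ownership is tracked in a plain array rather than detected from hardware state, and the functions only loosely mirror what the comment attributes to `cmci_clear()` and `cmci_rediscover()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical cpu count for this model */

struct mca_model {
	uint32_t visible[NR_CPUS]; /* banks each cpu can access; may overlap */
	uint32_t owned[NR_CPUS];   /* stand-in for per-cpu mce_banks_owned */
	bool     online[NR_CPUS];
};

/* Claim every visible bank that no other cpu currently owns. */
static void model_discover(struct mca_model *m, int cpu)
{
	uint32_t taken = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int i = 0; i < NR_CPUS; i++)
		if (i != cpu)
			taken |= m->owned[i];
	m->owned[cpu] |= m->visible[cpu] & ~taken;
}

/* Bring a cpu online and let it claim unowned banks. */
static void model_cpu_up(struct mca_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->online[cpu] = true;
	model_discover(m, cpu);
}

/*
 * Take a cpu offline: drop its ownership bitmap (the role the comment
 * gives to cmci_clear()), then let the remaining online cpus pick up
 * the orphaned shared banks (the role given to cmci_rediscover()).
 */
static void model_cpu_down(struct mca_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->online[cpu] = false;
	m->owned[cpu] = 0;
	for (int i = 0; i < NR_CPUS; i++)
		if (m->online[i])
			model_discover(m, i);
}
```

With cpus 0 and 1 sharing banks 0-1 and cpus 2 and 3 sharing banks 2-3, cpu 0 owns the first pair after boot in this model; taking cpu 0 offline moves that pair to cpu 1, so each shared bank still has exactly one owner.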