switcher_32.S

/*P:900 This is the Switcher: code which sits at 0xFFC00000 astride both the
 * Host and Guest to do the low-level Guest<->Host switch. It is as simple as
 * it can be made, but it's naturally very specific to x86.
 *
 * You have now completed Preparation. If this has whetted your appetite, if
 * you are feeling invigorated and refreshed, then the next, more challenging
 * stage can be found in "make Guest". :*/

/*M:012 Lguest is meant to be simple: my rule of thumb is that 1% more LOC must
 * gain at least 1% more performance. Since neither LOC nor performance can be
 * measured beforehand, it generally means implementing a feature then deciding
 * if it's worth it. And once it's implemented, who can say no?
 *
 * This is why I haven't implemented this idea myself. I want to, but I
 * haven't. You could, though.
 *
 * The main place where lguest performance sucks is Guest page faulting. When
 * a Guest userspace process hits an unmapped page we switch back to the Host,
 * walk the page tables, find it's not mapped, switch back to the Guest page
 * fault handler, which calls a hypercall to set the page table entry, then
 * finally returns to userspace. That's two round-trips.
 *
 * If we had a small walker in the Switcher, we could quickly check the Guest
 * page table and if the page isn't mapped, immediately reflect the fault back
 * into the Guest. This means the Switcher would have to know the top of the
 * Guest page table and the page fault handler address.
 *
 * For simplicity, the Switcher should only handle the case where the
 * privilege level of the fault is 3, and probably only not-present or write
 * faults. It should also detect recursive faults, and hand the original fault
 * to the Host (which is actually really easy).
 *
 * Two questions remain. Would the performance gain outweigh the complexity?
 * And who would write the verse documenting it? :*/

/*M:011 Lguest64 handles NMI. This gave me NMI envy (until I looked at their
 * code). It's worth doing though, since it would let us use oprofile in the
 * Host when a Guest is running. :*/

/*S:100
 * Welcome to the Switcher itself!
 *
 * This file contains the low-level code which changes the CPU to run the Guest
 * code, and returns to the Host when something happens. Understand this, and
 * you understand the heart of our journey.
 *
 * Because this is in assembler rather than C, our tale switches from prose to
 * verse. First I tried limericks:
 *
 * There once was an eax reg,
 * To which our pointer was fed,
 * It needed an add,
 * Which asm-offsets.h had
 * But this limerick is hurting my head.
 *
 * Next I tried haikus, but fitting the required reference to the seasons in
 * every stanza was quickly becoming tiresome:
 *
 * The %eax reg
 * Holds "struct lguest_pages" now:
 * Cherry blossoms fall.
 *
 * Then I started with Heroic Verse, but the rhyming requirement leeched away
 * the content density and led to some uniquely awful oblique rhymes:
 *
 * These constants are coming from struct offsets
 * For use within the asm switcher text.
 *
 * Finally, I settled for something between heroic hexameter and normal prose
 * with inappropriate linebreaks. Anyway, it ain't no Shakespeare.
 */

// Not all kernel headers work from assembler
// But these ones are needed: the ENTRY() define
// And constants extracted from struct offsets
// To avoid magic numbers and breakage:
// Should they change the compiler can't save us
// Down here in the depths of assembler code.
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/page.h>
#include <asm/segment.h>
#include <asm/lguest.h>

// We mark the start of the code to copy
// It's placed in .text tho it's never run here
// You'll see the trick macro at the end
// Which interleaves data and text to effect.
.text
ENTRY(start_switcher_text)

// When we reach switch_to_guest we have just left
// The safe and comforting shores of C code
// %eax has the "struct lguest_pages" to use
// Where we save state and still see it from the Guest
// And %ebx holds the Guest shadow pagetable:
// Once set we have truly left Host behind.
ENTRY(switch_to_guest)
	// We told gcc all its regs could fade,
	// Clobbered by our journey into the Guest
	// We could have saved them, if we tried
	// But time is our master and cycles count.

	// Segment registers must be saved for the Host
	// We push them on the Host stack for later
	pushl %es
	pushl %ds
	pushl %gs
	pushl %fs
	// But the compiler is fickle, and heeds
	// No warning of %ebp clobbers
	// When frame pointers are used. That register
	// Must be saved and restored or chaos strikes.
	pushl %ebp
	// The Host's stack is done, now save it away
	// In our "struct lguest_pages" at offset
	// Distilled into asm-offsets.h
	movl %esp, LGUEST_PAGES_host_sp(%eax)

	// All saved and there's now five steps before us:
	// Stack, GDT, IDT, TSS
	// Then last of all the page tables are flipped.

	// Yet beware that our stack pointer must be
	// Always valid lest an NMI hits
	// %edx does the duty here as we juggle
	// %eax is lguest_pages: our stack lies within.
	movl %eax, %edx
	addl $LGUEST_PAGES_regs, %edx
	movl %edx, %esp

	// The Guest's GDT we so carefully
	// Placed in the "struct lguest_pages" before
	lgdt LGUEST_PAGES_guest_gdt_desc(%eax)

	// The Guest's IDT we did partially
	// Copy to "struct lguest_pages" as well.
	lidt LGUEST_PAGES_guest_idt_desc(%eax)

	// The TSS entry which controls traps
	// Must be loaded up with "ltr" now:
	// The GDT entry that TSS uses
	// Changes type when we load it: damn Intel!
	// For after we switch over our page tables
	// That entry will be read-only: we'd crash.
	movl $(GDT_ENTRY_TSS*8), %edx
	ltr %dx

	// Look back now, before we take this last step!
	// The Host's TSS entry was also marked used;
	// Let's clear it again for our return.
	// The GDT descriptor of the Host
	// Points to the table after two "size" bytes
	movl (LGUEST_PAGES_host_gdt_desc+2)(%eax), %edx
	// Clear "used" from type field (byte 5, bit 1: mask 0xFD)
	andb $0xFD, (GDT_ENTRY_TSS*8 + 5)(%edx)

	// Once our page table's switched, the Guest is live!
	// The Host fades as we run this final step.
	// Our "struct lguest_pages" is now read-only.
	movl %ebx, %cr3

	// The page table change did one tricky thing:
	// The Guest's register page has been mapped
	// Writable under our %esp (stack) --
	// We can simply pop off all Guest regs.
	popl %eax
	popl %ebx
	popl %ecx
	popl %edx
	popl %esi
	popl %edi
	popl %ebp
	popl %gs
	popl %fs
	popl %ds
	popl %es

	// Near the base of the stack lurk two strange fields
	// Which we fill as we exit the Guest
	// These are the trap number and its error
	// We can simply step past them on our way.
	addl $8, %esp

	// The last five stack slots hold return address
	// And everything needed to switch privilege
	// From Switcher's level 0 to Guest's 1,
	// And the stack where the Guest had last left it.
	// Interrupts are turned back on: we are Guest.
	iret

// We tread two paths to switch back to the Host
// Yet both must save Guest state and restore Host
// So we put the routine in a macro.
#define SWITCH_TO_HOST \
	/* We save the Guest state: all registers first \
	 * Laid out just as "struct lguest_regs" defines */ \
	pushl %es; \
	pushl %ds; \
	pushl %fs; \
	pushl %gs; \
	pushl %ebp; \
	pushl %edi; \
	pushl %esi; \
	pushl %edx; \
	pushl %ecx; \
	pushl %ebx; \
	pushl %eax; \
	/* Our stack and our code are using segments \
	 * Set in the TSS and IDT \
	 * Yet if we were to touch data we'd use \
	 * Whatever data segment the Guest had. \
	 * Load the lguest ds segment for now. */ \
	movl $(LGUEST_DS), %eax; \
	movl %eax, %ds; \
	/* So where are we? Which CPU, which struct? \
	 * The stack is our clue: our TSS starts \
	 * It at the end of "struct lguest_pages". \
	 * Or we may have stumbled while restoring \
	 * Our Guest segment regs while in switch_to_guest, \
	 * The fault pushed atop that part-unwound stack. \
	 * If we round the stack down to the page start \
	 * We're at the start of "struct lguest_pages". */ \
	movl %esp, %eax; \
	andl $(~((1 << PAGE_SHIFT) - 1)), %eax; \
	/* Save our trap number: the switch will obscure it \
	 * (In the Host the Guest regs are not mapped here) \
	 * %ebx holds it safe for deliver_to_host */ \
	movl LGUEST_PAGES_regs_trapnum(%eax), %ebx; \
	/* The Host GDT, IDT and stack! \
	 * All these lie safely hidden from the Guest: \
	 * We must return to the Host page tables \
	 * (Hence that was saved in struct lguest_pages) */ \
	movl LGUEST_PAGES_host_cr3(%eax), %edx; \
	movl %edx, %cr3; \
	/* As before, when we looked back at the Host \
	 * As we left and marked TSS unused \
	 * So must we now for the Guest left behind. */ \
	andb $0xFD, (LGUEST_PAGES_guest_gdt+GDT_ENTRY_TSS*8+5)(%eax); \
	/* Switch to Host's GDT, IDT. */ \
	lgdt LGUEST_PAGES_host_gdt_desc(%eax); \
	lidt LGUEST_PAGES_host_idt_desc(%eax); \
	/* Restore the Host's stack where its saved regs lie */ \
	movl LGUEST_PAGES_host_sp(%eax), %esp; \
	/* Last the TSS: our Host is returned */ \
	movl $(GDT_ENTRY_TSS*8), %edx; \
	ltr %dx; \
	/* Restore now the regs saved right at the first. */ \
	popl %ebp; \
	popl %fs; \
	popl %gs; \
	popl %ds; \
	popl %es

// The first path is trod when the Guest has trapped:
// (Which trap it was has been pushed on the stack).
// We need only switch back, and the Host will decode
// Why we came home, and what needs to be done.
return_to_host:
	SWITCH_TO_HOST
	iret

// We are led to the second path like so:
// An interrupt, with some cause external
// Has jerked us rudely from the Guest's code
// Again we must return home to the Host
deliver_to_host:
	SWITCH_TO_HOST
	// But now we must go home via that place
	// Where that interrupt was supposed to go
	// Had we not been ensconced, running the Guest.
	// Here we see the trickiness of run_guest_once():
	// The Host stack is formed like an interrupt
	// With EIP, CS and EFLAGS layered.
	// Interrupt handlers end with "iret"
	// And that will take us home at long long last.

	// But first we must find the handler to call!
	// The IDT descriptor for the Host
	// Has two bytes for size, and four for address:
	// %edx will hold it for us for now.
	movl (LGUEST_PAGES_host_idt_desc+2)(%eax), %edx

	// We now know the table address we need,
	// And saved the trap's number inside %ebx.
	// Yet the pointer to the handler is smeared
	// Across the bits of the table entry.
	// What oracle can tell us how to extract
	// From such a convoluted encoding?
	// I consulted gcc, and it gave
	// These instructions, which I gladly credit:
	leal (%edx,%ebx,8), %eax
	movzwl (%eax), %edx
	movl 4(%eax), %eax
	xorw %ax, %ax
	orl %eax, %edx
	// Now the address of the handler's in %edx
	// We jump to it now: its "iret" drops us home.
	jmp *%edx

// Every interrupt can come to us here
// But we must truly tell each apart.
// They number two hundred and fifty six
// And each must land in a different spot,
// Push its number on stack, and join the stream.

// And worse, a mere seven of the traps stand apart
// And push on their stack an addition:
// An error number, thirty two bits long
// So we punish the other two forty-nine
// And make them push a zero so they match.

// Yet two fifty six entries is long
// And all will look mostly the same as the last
// So we create a macro which can make
// As many entries as we need to fill.

// Note the change to .data then .text:
// We plant the address of each entry
// Into a (data) table for the Host
// To know where each Guest interrupt should go.
.macro IRQ_STUB N TARGET
	.data; .long 1f; .text; 1:
	// Trap eight, ten through fourteen and seventeen
	// Supply an error number. Else zero.
	.if (\N <> 8) && (\N < 10 || \N > 14) && (\N <> 17)
	pushl $0
	.endif
	pushl $\N
	jmp \TARGET
	ALIGN
.endm

// This macro creates numerous entries
// Using GAS macros which out-power C's.
.macro IRQ_STUBS FIRST LAST TARGET
	irq=\FIRST
	.rept \LAST-\FIRST+1
	IRQ_STUB irq \TARGET
	irq=irq+1
	.endr
.endm

// Here's the marker for our pointer table
// Laid in the data section just before
// Each macro places the address of code
// Forming an array: each one points to text
// Which handles an interrupt in its turn.
.data
.global default_idt_entries
default_idt_entries:
.text

	// The first two traps go straight back to the Host
	IRQ_STUBS 0 1 return_to_host
	// We'll say nothing, yet, about NMI
	IRQ_STUB 2 handle_nmi
	// Other traps also return to the Host
	IRQ_STUBS 3 31 return_to_host
	// All interrupts go via their handlers
	IRQ_STUBS 32 127 deliver_to_host
	// 'Cept system calls coming from userspace
	// Which belong to the Guest, not the Host's handler:
	IRQ_STUB 128 return_to_host
	IRQ_STUBS 129 255 deliver_to_host

// The NMI, what a fabulous beast
// Which swoops in and stops us no matter that
// We're suspended between heaven and hell,
// (Or more likely between the Host and Guest)
// When in it comes! We are dazed and confused
// So we do the simplest thing which one can.
// Though we've pushed the trap number and zero
// We discard them, return, and hope we live.
handle_nmi:
	addl $8, %esp
	iret

// We are done; all that's left is Mastery
// And "make Mastery" is a journey long
// Designed to make your fingers itch to code.

// Here ends the text, the file and poem.
ENTRY(end_switcher_text)