@@ -62,15 +62,15 @@ A. In kernel use of migrate_pages()
It also prevents the swapper or other scans to encounter
the page.

-2. Generate a list of newly allocates page. These pages will contain the
+2. Generate a list of newly allocated pages. These pages will contain the
contents of the pages from the first list after page migration is
complete.

3. The migrate_pages() function is called which attempts
to do the migration. It returns the moved pages in the
list specified as the third parameter and the failed
- migrations in the fourth parameter. The first parameter
- will contain the pages that could still be retried.
+ migrations in the fourth parameter. When the function
+ returns, the first list will contain the pages that could still be retried.

4. The leftover pages of various types are returned
to the LRU using putback_to_lru_pages() or otherwise
@@ -93,83 +93,58 @@ Steps:

2. Insure that writeback is complete.

-3. Make sure that the page has assigned swap cache entry if
- it is an anonyous page. The swap cache reference is necessary
- to preserve the information contain in the page table maps while
- page migration occurs.
-
-4. Prep the new page that we want to move to. It is locked
+3. Prep the new page that we want to move to. It is locked
and set to not being uptodate so that all accesses to the new
page immediately lock while the move is in progress.

-5. All the page table references to the page are either dropped (file
- backed pages) or converted to swap references (anonymous pages).
- This should decrease the reference count.
+4. The new page is prepped with some settings from the old page so that
+ accesses to the new page will discover a page with the correct settings.
+
+5. All the page table references to the page are converted
+ to migration entries or dropped (nonlinear vmas).
+ This decreases the mapcount of the page. If the resulting
+ mapcount is not zero then we do not migrate the page.
+ All user space processes that attempt to access the page
+ will now wait on the page lock.

6. The radix tree lock is taken. This will cause all processes trying
- to reestablish a pte to block on the radix tree spinlock.
+ to access the page via the mapping to block on the radix tree spinlock.

7. The refcount of the page is examined and we back out if references remain
otherwise we know that we are the only one referencing this page.

8. The radix tree is checked and if it does not contain the pointer to this
- page then we back out because someone else modified the mapping first.
-
-9. The mapping is checked. If the mapping is gone then a truncate action may
- be in progress and we back out.
-
-10. The new page is prepped with some settings from the old page so that
- accesses to the new page will be discovered to have the correct settings.
+ page then we back out because someone else modified the radix tree.

-11. The radix tree is changed to point to the new page.
+9. The radix tree is changed to point to the new page.

-12. The reference count of the old page is dropped because the radix tree
- reference is gone.
+10. The reference count of the old page is dropped because the radix tree
+ reference is gone. A reference to the new page is established because
+ the new page is referenced by the radix tree.

-13. The radix tree lock is dropped. With that lookups become possible again
- and other processes will move from spinning on the tree lock to sleeping on
- the locked new page.
+11. The radix tree lock is dropped. With that lookups in the mapping
+ become possible again. Processes will move from spinning on the tree_lock
+ to sleeping on the locked new page.

-14. The page contents are copied to the new page.
+12. The page contents are copied to the new page.

-15. The remaining page flags are copied to the new page.
+13. The remaining page flags are copied to the new page.

-16. The old page flags are cleared to indicate that the page does
- not use any information anymore.
+14. The old page flags are cleared to indicate that the page does
+ not provide any information anymore.

-17. Queued up writeback on the new page is triggered.
+15. Queued up writeback on the new page is triggered.

-18. If swap pte's were generated for the page then replace them with real
- ptes. This will reenable access for processes not blocked by the page lock.
+16. If migration entries were inserted for the page then replace them with
+ real ptes. Doing so will enable access for user space processes not
+ already waiting for the page lock.

19. The page locks are dropped from the old and new page.
- Processes waiting on the page lock can continue.
+ Processes waiting on the page lock will redo their page faults
+ and will reach the new page.

20. The new page is moved to the LRU and can be scanned by the swapper
etc again.

-TODO list
----------
-
-- Page migration requires the use of swap handles to preserve the
- information of the anonymous page table entries. This means that swap
- space is reserved but never used. The maximum number of swap handles used
- is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration.
- Reservation of pages could be avoided by having a special type of swap
- handle that does not require swap space and that would only track the page
- references. Something like that was proposed by Marcelo Tosatti in the
- past (search for migration cache on lkml or linux-mm@kvack.org).
-
-- Page migration unmaps ptes for file backed pages and requires page
- faults to reestablish these ptes. This could be optimized by somehow
- recording the references before migration and then reestablish them later.
- However, there are several locking challenges that have to be overcome
- before this is possible.
-
-- Page migration generates read ptes for anonymous pages. Dirty page
- faults are required to make the pages writable again. It may be possible
- to generate a pte marked dirty if it is known that the page is dirty and
- that this process has the only reference to that page.
-
-Christoph Lameter, March 8, 2006.
+Christoph Lameter, May 8, 2006.
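
Steps 5 to 11 of the new list (back out while the page is still mapped or referenced, verify the radix tree slot, switch it to the new page, and move the reference) can be sketched in user space roughly as below. This is only a toy model under stated assumptions: struct toy_page, struct toy_mapping, toy_move_to_new_page() and the expected_refs parameter are hypothetical names invented for illustration, the "radix tree" is a single slot, and the lock is a plain flag. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical), NOT the kernel's struct page or radix tree. */
struct toy_page {
	int refcount;	/* all references currently held on the page */
	int mapcount;	/* page table mappings still pointing at the page */
};

struct toy_mapping {
	struct toy_page *slot;	/* one slot standing in for the radix tree */
	bool tree_locked;	/* flag standing in for the radix tree lock */
};

/*
 * Model of steps 5-11. expected_refs is the number of references the
 * caller expects if nobody else uses the page (an assumption of this
 * sketch: one for the slot plus one held by the caller).
 * Returns true if the slot now points at newpage, false if we backed out.
 */
bool toy_move_to_new_page(struct toy_mapping *m, struct toy_page *old,
			  struct toy_page *newpage, int expected_refs)
{
	assert(m != NULL && old != NULL && newpage != NULL);

	/* Step 5: a mapcount that is not zero means we do not migrate. */
	if (old->mapcount != 0)
		return false;

	/* Step 6: take the tree lock. */
	m->tree_locked = true;

	/* Step 7: back out if references beyond the expected ones remain. */
	if (old->refcount != expected_refs) {
		m->tree_locked = false;
		return false;
	}

	/* Step 8: back out if the slot no longer points at the old page. */
	if (m->slot != old) {
		m->tree_locked = false;
		return false;
	}

	/* Step 9: the slot is changed to point to the new page. */
	m->slot = newpage;

	/* Step 10: the slot's reference moves from the old to the new page. */
	old->refcount--;
	newpage->refcount++;

	/* Step 11: drop the tree lock so that lookups work again. */
	m->tree_locked = false;
	return true;
}
```

A caller in this model would check the return value and leave the old page in place on failure, which corresponds to the "back out" wording in steps 7 and 8.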