@@ -137,13 +137,6 @@ shrink_page_list() where they will be detected when vmscan walks the reverse
 map in try_to_unmap(). If try_to_unmap() returns SWAP_MLOCK, shrink_page_list()
 will cull the page at that point.
 
-Note that for anonymous pages, shrink_page_list() attempts to add the page to
-the swap cache before it tries to unmap the page. To avoid this unnecessary
-consumption of swap space, shrink_page_list() calls try_to_munlock() to check
-whether any VM_LOCKED vmas map the page without attempting to unmap the page.
-If try_to_munlock() returns SWAP_MLOCK, shrink_page_list() will cull the page
-without consuming swap space. try_to_munlock() will be described below.
-
 To "cull" an unevictable page, vmscan simply puts the page back on the lru
 list using putback_lru_page()--the inverse operation to isolate_lru_page()--
 after dropping the page lock. Because the condition which makes the page
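
A minimal stand-alone sketch of the cull described in this hunk (illustrative
only, not part of the patch or of the kernel sources; the toy_* names, the
TOY_SWAP_* values and struct toy_page are invented stand-ins for the kernel's
try_to_unmap(), putback_lru_page(), SWAP_MLOCK and struct page):

    /* Toy model: if "unmapping" reports that a VM_LOCKED vma still maps
     * the page, vmscan stops reclaiming it and puts it back on an LRU
     * list instead of freeing it. */
    #include <stdbool.h>
    #include <stdio.h>

    enum toy_swap { TOY_SWAP_SUCCESS, TOY_SWAP_MLOCK };

    struct toy_page {
        bool mapped_by_vm_locked_vma;
        bool on_lru;
    };

    static enum toy_swap toy_try_to_unmap(struct toy_page *page)
    {
        if (page->mapped_by_vm_locked_vma)
            return TOY_SWAP_MLOCK;      /* leave the ptes alone */
        return TOY_SWAP_SUCCESS;
    }

    static void toy_putback_lru_page(struct toy_page *page)
    {
        page->on_lru = true;            /* inverse of isolate_lru_page() */
    }

    int main(void)
    {
        struct toy_page page = { .mapped_by_vm_locked_vma = true };

        if (toy_try_to_unmap(&page) == TOY_SWAP_MLOCK)
            toy_putback_lru_page(&page);    /* "cull" the page */
        printf("culled back to LRU: %d\n", page.on_lru);
        return 0;
    }
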
@@ -190,8 +183,8 @@ several places:
    in the VM_LOCKED flag being set for the vma.
 3) in the fault path, if mlocked pages are "culled" in the fault path,
    and when a VM_LOCKED stack segment is expanded.
-4) as mentioned above, in vmscan:shrink_page_list() with attempting to
-   reclaim a page in a VM_LOCKED vma--via try_to_unmap() or try_to_munlock().
+4) as mentioned above, in vmscan:shrink_page_list() when attempting to
+   reclaim a page in a VM_LOCKED vma via try_to_unmap().
 
 Mlocked pages become unlocked and rescued from the unevictable list when:
 
@@ -260,9 +253,9 @@ mlock_fixup() filters several classes of "special" vmas:
|
|
|
|
|
|
2) vmas mapping hugetlbfs page are already effectively pinned into memory.
|
|
2) vmas mapping hugetlbfs page are already effectively pinned into memory.
|
|
We don't need nor want to mlock() these pages. However, to preserve the
|
|
We don't need nor want to mlock() these pages. However, to preserve the
|
|
- prior behavior of mlock()--before the unevictable/mlock changes--mlock_fixup()
|
|
|
|
- will call make_pages_present() in the hugetlbfs vma range to allocate the
|
|
|
|
- huge pages and populate the ptes.
|
|
|
|
|
|
+ prior behavior of mlock()--before the unevictable/mlock changes--
|
|
|
|
+ mlock_fixup() will call make_pages_present() in the hugetlbfs vma range
|
|
|
|
+ to allocate the huge pages and populate the ptes.
|
|
|
|
|
|
3) vmas with VM_DONTEXPAND|VM_RESERVED are generally user space mappings of
|
|
3) vmas with VM_DONTEXPAND|VM_RESERVED are generally user space mappings of
|
|
kernel pages, such as the vdso page, relay channel pages, etc. These pages
|
|
kernel pages, such as the vdso page, relay channel pages, etc. These pages
|
|
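
A small stand-alone sketch of the filtering described in this hunk
(illustrative only, not part of the patch; the TOY_VM_* flag values, struct
toy_vma and toy_mlock_filter() are invented stand-ins for the real vma flags
and the mlock_fixup() checks described above):

    #include <stdbool.h>
    #include <stdio.h>

    /* Invented stand-in flag values -- not the kernel's actual bit layout. */
    #define TOY_VM_HUGETLB     0x1
    #define TOY_VM_DONTEXPAND  0x2
    #define TOY_VM_RESERVED    0x4

    struct toy_vma {
        unsigned long flags;
    };

    /* Returns true when the vma is "special" and should not be marked
     * VM_LOCKED, mirroring classes 2) and 3) above. */
    static bool toy_mlock_filter(const struct toy_vma *vma)
    {
        if (vma->flags & TOY_VM_HUGETLB)
            return true;    /* already pinned; just populate, don't mlock */
        if (vma->flags & (TOY_VM_DONTEXPAND | TOY_VM_RESERVED))
            return true;    /* kernel-page mappings such as the vdso */
        return false;
    }

    int main(void)
    {
        struct toy_vma vdso = { .flags = TOY_VM_DONTEXPAND };
        struct toy_vma anon = { .flags = 0 };

        printf("vdso filtered: %d, anon filtered: %d\n",
               toy_mlock_filter(&vdso), toy_mlock_filter(&anon));
        return 0;
    }
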
@@ -322,7 +315,7 @@ __mlock_vma_pages_range()--the same function used to mlock a vma range--
 passing a flag to indicate that munlock() is being performed.
 
 Because the vma access protections could have been changed to PROT_NONE after
-faulting in and mlocking some pages, get_user_pages() was unreliable for visiting
+faulting in and mlocking pages, get_user_pages() was unreliable for visiting
 these pages for munlocking. Because we don't want to leave pages mlocked(),
 get_user_pages() was enhanced to accept a flag to ignore the permissions when
 fetching the pages--all of which should be resident as a result of previous
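
The situation this hunk describes can be reproduced from user space with a
short, self-contained program (illustrative only, not part of the patch): the
pages are faulted in and mlocked, the protection is then dropped to PROT_NONE,
and munlock() must still succeed on the now-inaccessible but resident pages.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096 * 4;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED)
            return 1;
        if (mlock(p, len))                 /* fault in and mlock the pages */
            perror("mlock");
        memset(p, 0xaa, len);

        if (mprotect(p, len, PROT_NONE))   /* no longer accessible ... */
            perror("mprotect");
        if (munlock(p, len))               /* ... but munlock() still works */
            perror("munlock");

        munmap(p, len);
        return 0;
    }
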
@@ -416,8 +409,8 @@ Mlocked Pages: munmap()/exit()/exec() System Call Handling
 When unmapping an mlocked region of memory, whether by an explicit call to
 munmap() or via an internal unmap from exit() or exec() processing, we must
 munlock the pages if we're removing the last VM_LOCKED vma that maps the pages.
-Before the unevictable/mlock changes, mlocking did not mark the pages in any way,
-so unmapping them required no processing.
+Before the unevictable/mlock changes, mlocking did not mark the pages in any
+way, so unmapping them required no processing.
 
 To munlock a range of memory under the unevictable/mlock infrastructure, the
 munmap() hander and task address space tear down function call
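
A matching user-space sketch (illustrative only, not part of the patch): the
region is locked via MAP_LOCKED and then unmapped without an explicit
munlock(), so the munmap()/exit() tear-down path described above is what must
unlock the pages.

    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096 * 4;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

        if (p == MAP_FAILED)
            return 1;
        /* No munlock(): the unmap path itself must munlock the pages. */
        return munmap(p, len);
    }
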
@@ -517,12 +510,10 @@ couldn't be mlocked.
 Mlocked pages: try_to_munlock() Reverse Map Scan
 
 TODO/FIXME: a better name might be page_mlocked()--analogous to the
-page_referenced() reverse map walker--especially if we continue to call this
-from shrink_page_list(). See related TODO/FIXME below.
+page_referenced() reverse map walker.
 
-When munlock_vma_page()--see "Mlocked Pages: munlock()/munlockall() System
-Call Handling" above--tries to munlock a page, or when shrink_page_list()
-encounters an anonymous page that is not yet in the swap cache, they need to
+When munlock_vma_page()--see "Mlocked Pages: munlock()/munlockall()
+System Call Handling" above--tries to munlock a page, it needs to
 determine whether or not the page is mapped by any VM_LOCKED vma, without
 actually attempting to unmap all ptes from the page. For this purpose, the
 unevictable/mlock infrastructure introduced a variant of try_to_unmap() called
@@ -535,10 +526,7 @@ for VM_LOCKED vmas. When such a vma is found for anonymous pages and file
 pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
 attempt to acquire the associated mmap semphore, mlock the page via
 mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the
-pre-clearing of the page's PG_mlocked done by munlock_vma_page() and informs
-shrink_page_list() that the anonymous page should be culled rather than added
-to the swap cache in preparation for a try_to_unmap() that will almost
-certainly fail.
+pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
 If try_to_unmap() is unable to acquire a VM_LOCKED vma's associated mmap
 semaphore, it will return SWAP_AGAIN. This will allow shrink_page_list()
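
A self-contained toy model of the scan just described (illustrative only, not
part of the patch; struct toy_vma, struct toy_page and the SCAN_* values are
invented stand-ins for the real vma and page structures and the
SWAP_SUCCESS/SWAP_AGAIN/SWAP_MLOCK returns):

    /* Toy model: walk the vmas that map a page and, if a VM_LOCKED vma is
     * found and its "mmap semaphore" can be taken, re-mlock the page and
     * report it; if the semaphore cannot be taken, ask the caller to retry. */
    #include <stdbool.h>
    #include <stdio.h>

    #define TOY_VM_LOCKED 0x1

    enum scan_result { SCAN_SUCCESS, SCAN_AGAIN, SCAN_MLOCK };

    struct toy_vma {
        unsigned long flags;
        bool mmap_sem_available;
    };

    struct toy_page {
        bool pg_mlocked;
    };

    static enum scan_result toy_try_to_munlock(struct toy_page *page,
                                               struct toy_vma *vmas, int nr)
    {
        enum scan_result ret = SCAN_SUCCESS;

        for (int i = 0; i < nr; i++) {
            if (!(vmas[i].flags & TOY_VM_LOCKED))
                continue;
            if (!vmas[i].mmap_sem_available) {
                ret = SCAN_AGAIN;       /* couldn't check; caller retries */
                continue;
            }
            page->pg_mlocked = true;    /* undo munlock_vma_page()'s clear */
            return SCAN_MLOCK;          /* still mapped by a VM_LOCKED vma */
        }
        return ret;
    }

    int main(void)
    {
        struct toy_vma vmas[] = {
            { 0, true },
            { TOY_VM_LOCKED, true },
        };
        struct toy_page page = { .pg_mlocked = false };

        printf("result=%d mlocked=%d\n",
               toy_try_to_munlock(&page, vmas, 2), page.pg_mlocked);
        return 0;
    }
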
@@ -557,10 +545,7 @@ However, the scan can terminate when it encounters a VM_LOCKED vma and can
 successfully acquire the vma's mmap semphore for read and mlock the page.
 Although try_to_munlock() can be called many [very many!] times when
 munlock()ing a large region or tearing down a large address space that has been
-mlocked via mlockall(), overall this is a fairly rare event. In addition,
-although shrink_page_list() calls try_to_munlock() for every anonymous page that
-it handles that is not yet in the swap cache, on average anonymous pages will
-have very short reverse map lists.
+mlocked via mlockall(), overall this is a fairly rare event.
 
 Mlocked Page: Page Reclaim in shrink_*_list()
 
@@ -588,8 +573,8 @@ Some examples of these unevictable pages on the LRU lists are:
    munlock_vma_page() was forced to let the page back on to the normal
    LRU list for vmscan to handle.
 
-shrink_inactive_list() also culls any unevictable pages that it finds
-on the inactive lists, again diverting them to the appropriate zone's unevictable
+shrink_inactive_list() also culls any unevictable pages that it finds on
+the inactive lists, again diverting them to the appropriate zone's unevictable
 lru list. shrink_inactive_list() should only see SHM_LOCKed pages that became
 SHM_LOCKed after shrink_active_list() had moved them to the inactive list, or
 pages mapped into VM_LOCKED vmas that munlock_vma_page() couldn't isolate from
@@ -597,19 +582,7 @@ the lru to recheck via try_to_munlock(). shrink_inactive_list() won't notice
 the latter, but will pass on to shrink_page_list().
 
 shrink_page_list() again culls obviously unevictable pages that it could
-encounter for similar reason to shrink_inactive_list(). As already discussed,
-shrink_page_list() proactively looks for anonymous pages that should have
-PG_mlocked set but don't--these would not be detected by page_evictable()--to
-avoid adding them to the swap cache unnecessarily. File pages mapped into
+encounter for similar reason to shrink_inactive_list(). Pages mapped into
 VM_LOCKED vmas but without PG_mlocked set will make it all the way to
-try_to_unmap(). shrink_page_list() will divert them to the unevictable list when
-try_to_unmap() returns SWAP_MLOCK, as discussed above.
-
-TODO/FIXME: If we can enhance the swap cache to reliably remove entries
-with page_count(page) > 2, as long as all ptes are mapped to the page and
-not the swap entry, we can probably remove the call to try_to_munlock() in
-shrink_page_list() and just remove the page from the swap cache when
-try_to_unmap() returns SWAP_MLOCK. Currently, remove_exclusive_swap_page()
-doesn't seem to allow that.
-
-
+try_to_unmap(). shrink_page_list() will divert them to the unevictable list
+when try_to_unmap() returns SWAP_MLOCK, as discussed above.