
Ext4 Filesystem
===============

This is a development version of the ext4 filesystem, an advanced level
of the ext3 filesystem which incorporates scalability and reliability
enhancements for supporting large filesystems (64 bit) in keeping with
increasing disk capacities and state-of-the-art feature requirements.

Mailing list: linux-ext4@vger.kernel.org


1. Quick usage instructions:
===========================

  - Compile and install the latest version of e2fsprogs (as of this
    writing version 1.41) from:

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or

    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or grab the latest git repository from:

    git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4dev filesystem type:

        # mke2fs -t ext4dev /dev/hda1

    Or configure an existing ext3 filesystem to support extents and set
    the test_fs flag to indicate that it's ok for an in-development
    filesystem to touch this filesystem:

        # tune2fs -O extents -E test_fs /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

    (Note: we currently do not have tools to convert an ext4dev
    filesystem back to ext3, so please do not try this on production
    filesystems.)

  - Mounting:

        # mount -t ext4dev /dev/hda1 /wherever

  - When comparing performance with other filesystems, remember that
    ext3/4 by default offers higher data integrity guarantees than most.
    So when comparing with a metadata-only journalling filesystem, such
    as XFS, JFS, or ReiserFS, use `mount -o data=writeback'. And you
    might as well use `mount -o nobh' too along with it. Making the
    journal larger than the mke2fs default often helps performance with
    metadata-intensive workloads.
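
The quick-usage steps above can be collected into a small helper. The
following is only a sketch: the function name and its parameters are
hypothetical, it uses just the commands quoted in this section, and it
builds the argument lists without executing anything (running them
requires root and a real block device).

```python
def ext4dev_commands(device, mountpoint, convert_existing=False,
                     grow_inodes=False):
    """Return the command lines from the quick-usage steps, in order.

    Hypothetical helper: nothing is executed, the caller decides
    whether and how to run the returned argument lists.
    """
    commands = []
    if convert_existing:
        # Enable extents on an existing ext3 filesystem and set test_fs.
        commands.append(["tune2fs", "-O", "extents", "-E", "test_fs", device])
        if grow_inodes:
            # Optional: convert 128 byte inodes to 256 byte inodes.
            commands.append(["tune2fs", "-I", "256", device])
    else:
        # Create a fresh filesystem of type ext4dev.
        commands.append(["mke2fs", "-t", "ext4dev", device])
    commands.append(["mount", "-t", "ext4dev", device, mountpoint])
    return commands


if __name__ == "__main__":
    for argv in ext4dev_commands("/dev/hda1", "/wherever",
                                 convert_existing=True, grow_inodes=True):
        print("# " + " ".join(argv))
```

Keeping the commands as argument lists rather than shell strings avoids
quoting problems if they are later passed to something like
subprocess.run().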

2. Features
===========

2.1 Currently available

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* fix 32000 subdirectory limit
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to
  force the ordering)

2.2 Candidate features for future inclusion

* Online defrag (patches available but not well tested)
* reduced mke2fs time via lazy itable initialization in conjunction with
  the uninit_bg feature (capability to do this is available in e2fsprogs
  but a kernel thread to do lazy zeroing of unused inode table blocks
  after filesystem is first mounted is required for safety)

There are several others under discussion; whether they all make it in is
partly a function of how much time everyone has to work on them. Features like
metadata checksumming have been discussed and planned for a bit but no patches
exist yet so I'm not sure they're in the near-term roadmap.

The big performance win will come with mballoc, delalloc and flex_bg
grouping of bitmaps and inode tables. Some test results are available here:

 - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
 - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html


3. Options
==========

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

extents (*)             ext4 will use extents to address file data. The
                        file system will no longer be mountable by ext3.

noextents               ext4 will not use extents for newly created files.

journal_checksum        Enable checksumming of the journal transactions.
                        This will allow the recovery code in e2fsck and the
                        kernel to detect corruption in the journal. It is a
                        compatible change and will be ignored by older
                        kernels.

journal_async_commit    The commit block can be written to disk without
                        waiting for descriptor blocks. If enabled, older
                        kernels cannot mount the device. This will enable
                        'journal_checksum' internally.

journal=update          Update the ext4 file system's journal to the current
                        format.

journal=inum            When a journal already exists, this option is ignored.
                        Otherwise, it specifies the number of the inode which
                        will represent the ext4 file system's journal file.

journal_dev=devnum      When the external journal device's major/minor numbers
                        have changed, this option allows the user to specify
                        the new journal location. The journal device is
                        identified through its new major/minor numbers encoded
                        in devnum.

noload                  Don't load the journal on mounting.

data=journal            All data are committed into the journal prior to being
                        written into the main file system.

data=ordered (*)        All data are forced directly out to the main file
                        system prior to its metadata being committed to the
                        journal.

data=writeback          Data ordering is not preserved; data may be written
                        into the main file system after its metadata has been
                        committed to the journal.

commit=nrsec (*)        Ext4 can be told to sync all its data and metadata
                        every 'nrsec' seconds. The default value is 5 seconds.
                        This means that if you lose your power, you will lose
                        as much as the latest 5 seconds of work (your
                        filesystem will not be damaged though, thanks to the
                        journaling). This default value (or any low value)
                        will hurt performance, but it's good for data-safety.
                        Setting it to 0 will have the same effect as leaving
                        it at the default (5 seconds).
                        Setting it to very large values will improve
                        performance.

barrier=<0|1(*)>        This enables/disables the use of write barriers in
                        the jbd code. barrier=0 disables, barrier=1 enables.
                        This also requires an IO stack which can support
                        barriers, and if jbd gets an error on a barrier
                        write, it will disable barriers again with a warning.
                        Write barriers enforce proper on-disk ordering
                        of journal commits, making volatile disk write caches
                        safe to use, at some performance penalty. If
                        your disks are battery-backed in one way or another,
                        disabling barriers may safely improve performance.

orlov (*)               This enables the new Orlov block allocator. It is
                        enabled by default.

oldalloc                This disables the Orlov block allocator and enables
                        the old block allocator. Orlov should have better
                        performance - we'd like to get some feedback if it's
                        the contrary for you.

user_xattr              Enables Extended User Attributes. Additionally, you
                        need to have extended attribute support enabled in the
                        kernel configuration (CONFIG_EXT4_FS_XATTR). See the
                        attr(5) manual page and http://acl.bestbits.at/ to
                        learn more about extended attributes.

nouser_xattr            Disables Extended User Attributes.

acl                     Enables POSIX Access Control Lists support.
                        Additionally, you need to have ACL support enabled in
                        the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL).
                        See the acl(5) manual page and http://acl.bestbits.at/
                        for more information.

noacl                   This option disables POSIX Access Control List
                        support.

reservation

noreservation

bsddf (*)               Make 'df' act like BSD.
minixdf                 Make 'df' act like Minix.

check=none              Don't do extra checking of bitmaps on mount.
nocheck

debug                   Extra debugging information is sent to syslog.

errors=remount-ro(*)    Remount the filesystem read-only on an error.
errors=continue         Keep going on a filesystem error.
errors=panic            Panic and halt the machine if an error occurs.

grpid                   New objects take the group ID of the directory in
bsdgroups               which they are created.

nogrpid (*)             New objects have the group ID of their creator.
sysvgroups

resgid=n                The group ID which may use the reserved blocks.

resuid=n                The user ID which may use the reserved blocks.

sb=n                    Use alternate superblock at this location.

quota
noquota
grpquota
usrquota

bh (*)                  ext4 associates buffer heads to data pages to
nobh                    (a) cache disk block mapping information
                        (b) link pages into transaction to provide
                            ordering guarantees.
                        "bh" option forces use of buffer heads.
                        "nobh" option tries to avoid associating buffer
                        heads (supported only for "writeback" mode).

mballoc (*)             Use the multiple block allocator for block allocation.
nomballoc               Disable the multiple block allocator for block
                        allocation.

stripe=n                Number of filesystem blocks that mballoc will try
                        to use for allocation size and alignment. For RAID5/6
                        systems this should be the number of data
                        disks * RAID chunk size in file system blocks.

delalloc (*)            Defer block allocation until write-out time.
nodelalloc              Disable delayed allocation. Blocks are allocated
                        when data is copied from user to page cache.
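
Two of the options above take a computed number. The sketch below shows
the arithmetic for both, under stated assumptions. For stripe=n it follows
the formula given in the table (data disks * RAID chunk size in file system
blocks); the 4096 byte default block size is an assumption. For
journal_dev=devnum the table only says the major/minor numbers are "encoded
in devnum"; the bit layout used here is assumed to be the kernel's
new_encode_dev() layout and should be checked against the kernel source
before relying on it. Both function names are hypothetical.

```python
def stripe_blocks(data_disks, chunk_bytes, block_bytes=4096):
    """Value for stripe=n: data disks * RAID chunk size in fs blocks.

    block_bytes defaults to 4096, which is an assumption; pass the
    real filesystem block size.
    """
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return data_disks * (chunk_bytes // block_bytes)


def encode_journal_devnum(major, minor):
    """Pack major/minor into one number for journal_dev=devnum.

    ASSUMPTION: same layout as the kernel's new_encode_dev():
    low 8 minor bits, then the major, then the remaining minor bits.
    """
    if major < 0 or minor < 0:
        raise ValueError("major and minor must be non-negative")
    return (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12)


if __name__ == "__main__":
    # RAID5 over 4 disks has 3 data disks; 64 KiB chunks, 4 KiB blocks.
    print("stripe=%d" % stripe_blocks(3, 64 * 1024))
    # Device with major 8, minor 17.
    print("journal_dev=%d" % encode_journal_devnum(8, 17))
```

For small device numbers (minor below 256) the assumed encoding reduces to
major * 256 + minor.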

Data Mode
=========

There are 3 different data modes:

* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode - metadata journaling. A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash. This mode will
typically provide the best ext4 performance.

* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically
groups metadata information related to data changes with the data blocks into a
single unit called a transaction. When it's time to write the new metadata
out to disk, the associated data blocks are written first. In general,
this mode performs slightly slower than writeback but significantly faster
than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time, where it
outperforms all other modes. Currently ext4 does not have delayed
allocation support if this data journalling mode is selected.
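
The constraints that tie mount options to the data mode can be checked
before mounting. This is a minimal sketch, not the kernel's option parser:
the function name and messages are hypothetical, and it encodes only the
two rules stated in this document ("nobh" is supported only in writeback
mode, and delayed allocation is not available with data=journal), with
data=ordered as the default.

```python
DATA_MODES = ("journal", "ordered", "writeback")


def check_mount_options(options):
    """Return a list of conflicts among the given mount options.

    `options` is a list such as ["data=writeback", "nobh"].
    An empty result means no documented conflict was found.
    """
    data_mode = "ordered"  # documented default
    flags = set()
    for opt in options:
        if opt.startswith("data="):
            data_mode = opt.split("=", 1)[1]
        else:
            flags.add(opt)

    problems = []
    if data_mode not in DATA_MODES:
        problems.append("unknown data mode: " + data_mode)
    if "nobh" in flags and data_mode != "writeback":
        problems.append("nobh is supported only with data=writeback")
    if "delalloc" in flags and data_mode == "journal":
        problems.append("delalloc is not supported with data=journal")
    return problems


if __name__ == "__main__":
    for opts in (["data=writeback", "nobh"], ["nobh"],
                 ["data=journal", "delalloc"]):
        print(",".join(opts), "->", check_mount_options(opts) or "ok")
```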

References
==========

kernel source:  <file:fs/ext4/>
                <file:fs/jbd2/>

programs:       http://e2fsprogs.sourceforge.net/

useful links:   http://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
                http://ext4.wiki.kernel.org/index.php/Main_Page
                http://fedoraproject.org/wiki/Features/Ext4