@@ -13,72 +13,89 @@ Mailing list: linux-ext4@vger.kernel.org
 1. Quick usage instructions:
 ===========================
 
-  - Grab updated e2fsprogs from
-    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
-    This is a patchset on top of e2fsprogs-1.39, which can be found at
+  - Compile and install the latest version of e2fsprogs (as of this
+    writing version 1.41) from:
+
+    http://sourceforge.net/project/showfiles.php?group_id=2406
+
+        or
+
     ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
 
-  - It's still mke2fs -j /dev/hda1
+        or grab the latest git repository from:
+
+    git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
+
+  - Create a new filesystem using the ext4dev filesystem type:
+
+        # mke2fs -t ext4dev /dev/hda1
+
+    Or configure an existing ext3 filesystem to support extents and set
+    the test_fs flag to indicate that it's ok for an in-development
+    filesystem to touch this filesystem:
 
-  - mount /dev/hda1 /wherever -t ext4dev
+        # tune2fs -O extents -E test_fs /dev/hda1
 
-  - To enable extents,
+    If the filesystem was created with 128 byte inodes, it can be
+    converted to use 256 byte for greater efficiency via:
 
-        mount /dev/hda1 /wherever -t ext4dev -o extents
+        # tune2fs -I 256 /dev/hda1
 
-  - The filesystem is compatible with the ext3 driver until you add a file
-    which has extents (ie: `mount -o extents', then create a file).
+    (Note: we currently do not have tools to convert an ext4dev
+    filesystem back to ext3; so please do not try this on production
+    filesystems.)
 
-    NOTE: The "extents" mount flag is temporary. It will soon go away and
-    extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
+  - Mounting:
+
+        # mount -t ext4dev /dev/hda1 /wherever
 
   - When comparing performance with other filesystems, remember that
-    ext3/4 by default offers higher data integrity guarantees than most. So
-    when comparing with a metadata-only journalling filesystem, use `mount -o
-    data=writeback'. And you might as well use `mount -o nobh' too along
-    with it. Making the journal larger than the mke2fs default often helps
-    performance with metadata-intensive workloads.
+    ext3/4 by default offers higher data integrity guarantees than most.
+    So when comparing with a metadata-only journalling filesystem, such
+    as ext3, use `mount -o data=writeback'. And you might as well use
+    `mount -o nobh' too along with it. Making the journal larger than
+    the mke2fs default often helps performance with metadata-intensive
+    workloads.
 
 2. Features
 ===========
 
 2.1 Currently available
 
-* ability to use filesystems > 16TB
+* ability to use filesystems > 16TB (e2fsprogs support not available yet)
 * extent format reduces metadata overhead (RAM, IO for access, transactions)
 * extent format more robust in face of on-disk corruption due to magics,
 * internal redunancy in tree
-
-2.1 Previously available, soon to be enabled by default by "mkefs.ext4":
-
-* dir_index and resize inode will be on by default
-* large inodes will be used by default for fast EAs, nsec timestamps, etc
+* improved file allocation (multi-block alloc, delayed alloc)
+* fix 32000 subdirectory limit
+* nsec timestamps for mtime, atime, ctime, create time
+* inode version field on disk (NFSv4, Lustre)
+* reduced e2fsck time via uninit_bg feature
+* journal checksumming for robustness, performance
+* persistent file preallocation (e.g. for streaming media, databases)
+* ability to pack bitmaps and inode tables into larger virtual groups via the
+  flex_bg feature
+* large file support
+* Inode allocation using large virtual block groups via flex_bg
 
 2.2 Candidate features for future inclusion
 
-There are several under discussion, whether they all make it in is
-partly a function of how much time everyone has to work on them:
+* Online defrag (patches available but not well tested)
+* reduced mke2fs time via lazy itable initialization in conjunction with
+  the uninit_bg feature (capability to do this is available in e2fsprogs
+  but a kernel thread to do lazy zeroing of unused inode table blocks
+  after filesystem is first mounted is required for safety)
 
-* improved file allocation (multi-block alloc, delayed alloc; basically done)
-* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
-* nsec timestamps for mtime, atime, ctime, create time (patch exists,
-  needs some e2fsck work)
-* inode version field on disk (NFSv4, Lustre; prototype exists)
-* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
-* journal checksumming for robustness, performance (prototype exists)
-* persistent file preallocation (e.g for streaming media, databases)
+There are several others under discussion, whether they all make it in is
+partly a function of how much time everyone has to work on them. Features like
+metadata checksumming have been discussed and planned for a bit but no patches
+exist yet so I'm not sure they're in the near-term roadmap.
 
-Features like metadata checksumming have been discussed and planned for
-a bit but no patches exist yet so I'm not sure they're in the near-term
-roadmap.
+The big performance win will come with mballoc, delalloc and flex_bg
+grouping of bitmaps and inode tables. Some test results available here:
 
-The big performance win will come with mballoc and delalloc. CFS has
-been using mballoc for a few years already with Lustre, and IBM + Bull
-did a lot of benchmarking on it. The reason it isn't in the first set of
-patches is partly a manageability issue, and partly because it doesn't
-directly affect the on-disk format (outside of much better allocation)
-so it isn't critical to get into the first round of changes. I believe
-Alex is working on a new set of patches right now.
+ - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
+ - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
 
 3. Options
 ==========
@@ -224,7 +241,7 @@ stripe=n Number of filesystem blocks that mballoc will try
                         disks * RAID chunk size in file system blocks.
 
 Data Mode
----------
+=========
 There are 3 different data modes:
 
 * writeback mode
@@ -256,7 +273,8 @@ kernel source: <file:fs/ext4/>
                 <file:fs/jbd2/>
 
 programs:       http://e2fsprogs.sourceforge.net/
-                http://ext2resize.sourceforge.net
 
 useful links:   http://fedoraproject.org/wiki/ext3-devel
                 http://www.bullopensource.org/ext4/
+                http://ext4.wiki.kernel.org/index.php/Main_Page
+                http://fedoraproject.org/wiki/Features/Ext4