@@ -2,6 +2,10 @@
kinds of locks - per-inode (->i_mutex) and per-filesystem
(->s_vfs_rename_mutex).

+ When taking the i_mutex on multiple non-directory objects, we
+always acquire the locks in order by increasing address. We'll call
+that "inode pointer" order in the following.
+
For our purposes all operations fall in 5 classes:

1) read access. Locking rules: caller locks directory we are accessing.
@@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
locks victim and calls the method.

4) rename() that is _not_ cross-directory. Locking rules: caller locks
-the parent, finds source and target, if target already exists - locks it
-and then calls the method.
+the parent and finds source and target. If target already exists, lock
+it. If source is a non-directory, lock it. If that means we need to
+lock both, lock them in inode pointer order. Then call the method.

5) link creation. Locking rules:
* lock parent
@@ -30,7 +35,9 @@ rules:
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
- * if target exists - lock it.
+ * If target exists, lock it. If source is a non-directory, lock
+ it. In case that means we need to lock both source and target,
+ do so in inode pointer order.
* call the method.


@@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
renames will be blocked on filesystem lock and we don't start changing
the order until we had acquired all locks).

-(3) any operation holds at most one lock on non-directory object and
- that lock is acquired after all other locks. (Proof: see descriptions
- of operations).
+(3) locks on non-directory objects are acquired only after locks on
+ directory objects, and are acquired in inode pointer order.
+ (Proof: every operation other than rename takes a lock on at most
+ one non-directory object; rename takes locks on source and target
+ in inode pointer order when they are not directories.)

Now consider the minimal deadlock. Each process is blocked on
attempt to acquire some lock and already holds at least one lock. Let's
@@ -66,9 +75,13 @@ consider the set of contended locks. First of all, filesystem lock is
not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_mutex.

- Non-directory objects are not contended due to (3). Thus link
-creation can't be a part of deadlock - it can't be blocked on source
-and it means that it doesn't hold any locks.
+ By (3), any process holding a non-directory lock can only be
+waiting on another non-directory lock with a larger address. Therefore
+the process holding the "largest" such lock can always make progress, and
+non-directory objects are not included in the set of contended locks.
+
+ Thus link creation can't be a part of deadlock - it can't be
+blocked on source and it means that it doesn't hold any locks.

Any contended object is either held by cross-directory rename or
has a child that is also contended. Indeed, suppose that it is held by