Btrfs: pin logs earlier when doing a rename exchange operation
authorFilipe Manana <fdmanana@suse.com>
Thu, 5 May 2016 01:08:56 +0000 (02:08 +0100)
committerFilipe Manana <fdmanana@suse.com>
Fri, 13 May 2016 00:59:28 +0000 (01:59 +0100)
The btrfs_rename_exchange() started as a copy-paste from btrfs_rename(),
which had a race fixed by my previous patch titled "Btrfs: pin log earlier
when renaming", and so it suffers from the same problem.

We pin the logs of the affected roots after we insert the new inode
references, leaving a time window where concurrent tasks logging the
inodes can end up logging both the new and old references, resulting
in log trees that when replayed can turn the metadata into inconsistent
states. This behaviour was added to btrfs_rename() in 2009 without any
explanation about why not pinning the logs earlier, just leaving a
comment about the posibility for the race. As of today it's perfectly
safe and sane to pin the logs before we start doing any of the steps
involved in the rename operation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
fs/btrfs/inode.c

index c92d9b8..a7db3ed 100644 (file)
@@ -9458,6 +9458,8 @@ static int btrfs_rename_exchange(struct inode *old_dir,
                /* force full log commit if subvolume involved. */
                btrfs_set_log_full_commit(root->fs_info, trans);
        } else {
+               btrfs_pin_log_trans(root);
+               root_log_pinned = true;
                ret = btrfs_insert_inode_ref(trans, dest,
                                             new_dentry->d_name.name,
                                             new_dentry->d_name.len,
@@ -9465,8 +9467,6 @@ static int btrfs_rename_exchange(struct inode *old_dir,
                                             btrfs_ino(new_dir), old_idx);
                if (ret)
                        goto out_fail;
-               btrfs_pin_log_trans(root);
-               root_log_pinned = true;
        }
 
        /* And now for the dest. */
@@ -9474,6 +9474,8 @@ static int btrfs_rename_exchange(struct inode *old_dir,
                /* force full log commit if subvolume involved. */
                btrfs_set_log_full_commit(dest->fs_info, trans);
        } else {
+               btrfs_pin_log_trans(dest);
+               dest_log_pinned = true;
                ret = btrfs_insert_inode_ref(trans, root,
                                             old_dentry->d_name.name,
                                             old_dentry->d_name.len,
@@ -9481,8 +9483,6 @@ static int btrfs_rename_exchange(struct inode *old_dir,
                                             btrfs_ino(old_dir), new_idx);
                if (ret)
                        goto out_fail;
-               btrfs_pin_log_trans(dest);
-               dest_log_pinned = true;
        }
 
        /* Update inode version and ctime/mtime. */