[v4,00/11] block atomic writes

Message ID	20240219130109.341523-1-john.g.garry@oracle.com
Headers	show Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B53E33CC4; Mon, 19 Feb 2024 13:02:27 +0000 (UTC) From: John Garry <john.g.garry@oracle.com> To: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com, jack@suse.cz Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org, ojaswin@linux.ibm.com, linux-aio@kvack.org, linux-btrfs@vger.kernel.org, io-uring@vger.kernel.org, nilay@linux.ibm.com, ritesh.list@gmail.com, John Garry <john.g.garry@oracle.com> Subject: [PATCH v4 00/11] block atomic writes Date: Mon, 19 Feb 2024 13:00:58 +0000 Message-Id: <20240219130109.341523-1-john.g.garry@oracle.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk MIME-Version: 1.0
Series	block atomic writes \| expand [v4,00/11] block atomic writes [v4,01/11] block: Pass blk_queue_get_max_sectors() a request pointer [v4,02/11] block: Call blkdev_dio_unaligned() from blkdev_direct_IO() [v4,03/11] fs: Initial atomic write support [v4,04/11] fs: Add initial atomic write support info to statx [v4,05/11] block: Add core atomic write support [v4,06/11] block: Add atomic write support for statx [v4,07/11] block: Add fops atomic write support [v4,08/11] scsi: sd: Atomic write support [v4,09/11] scsi: scsi_debug: Atomic write support [v4,10/11] nvme: Atomic write support [v4,11/11] nvme: Ensure atomic writes will be executed atomically

John Garry Feb. 19, 2024, 1 p.m. UTC

This series introduces a proposal to implementing atomic writes in the
kernel for torn-write protection.

This series takes the approach of adding a new "atomic" flag to each of
pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively.
When set, these indicate that we want the write issued "atomically".

Only direct IO is supported and for block devices here. For this, atomic
write HW is required, like SCSI ATOMIC WRITE (16).

XFS FS support will require rework according to discussion at:
https://lore.kernel.org/linux-fsdevel/20240124142645.9334-1-john.g.garry@oracle.com/T/#m916df899e9d0fb688cdbd415826ae2423306c2e0

The current plan there is to use forcealign feature from the start. This
will take a bit of time to get done.

Updated man pages have been posted at:
https://lore.kernel.org/lkml/20240124112731.28579-1-john.g.garry@oracle.com/T/#m520dca97a9748de352b5a723d3155a4bb1e46456

The goal here is to provide an interface that allows applications use
application-specific block sizes larger than logical block size
reported by the storage device or larger than filesystem block size as
reported by stat().

With this new interface, application blocks will never be torn or
fractured when written. For a power fail, for each individual application
block, all or none of the data to be written. A racing atomic write and
read will mean that the read sees all the old data or all the new data,
but never a mix of old and new.

Three new fields are added to struct statx - atomic_write_unit_min,
atomic_write_unit_max, and atomic_write_segments_max. For each atomic
individual write, the total length of a write must be a between
atomic_write_unit_min and atomic_write_unit_max, inclusive, and a
power-of-2. The write must also be at a natural offset in the file
wrt the write length. For pwritev2, iovcnt is limited by
atomic_write_segments_max.

SCSI sd.c and scsi_debug and NVMe kernel support is added.

This series is based on v6.8-rc5.

Patches can be found at:
https://github.com/johnpgarry/linux/commits/atomic-writes-v6.8-v4

Changes since v3:
- Condense and reorder patches, and also write proper commit messages
- Add patch block fops.c patch to change blkdev_dio_unaligned() callsite
- Re-use block limits in nvme_valid_atomic_write()
- Disallow RWF_ATOMIC for reads
- Add HCH RB tag for blk_queue_get_max_sectors() patch and modify commit
  message for new position

Changes since v2:
- Support atomic_write_segments_max
- Limit atomic write paramaters to max_hw_sectors_kb
- Don't increase fmode_t
- Change value for RWF_ATOMIC
- Various tidying (including advised by Jan)


Alan Adamson (2):
  nvme: Atomic write support
  nvme: Ensure atomic writes will be executed atomically

John Garry (6):
  block: Pass blk_queue_get_max_sectors() a request pointer
  block: Call blkdev_dio_unaligned() from blkdev_direct_IO()
  block: Add core atomic write support
  block: Add fops atomic write support
  scsi: sd: Atomic write support
  scsi: scsi_debug: Atomic write support

Prasad Singamsetty (3):
  fs: Initial atomic write support
  fs: Add initial atomic write support info to statx
  block: Add atomic write support for statx

 Documentation/ABI/stable/sysfs-block |  52 +++
 block/bdev.c                         |  37 +-
 block/blk-merge.c                    |  94 ++++-
 block/blk-mq.c                       |   2 +-
 block/blk-settings.c                 | 103 +++++
 block/blk-sysfs.c                    |  33 ++
 block/blk.h                          |   9 +-
 block/fops.c                         |  43 +-
 drivers/nvme/host/core.c             |  73 ++++
 drivers/scsi/scsi_debug.c            | 589 +++++++++++++++++++++------
 drivers/scsi/scsi_trace.c            |  22 +
 drivers/scsi/sd.c                    |  93 ++++-
 drivers/scsi/sd.h                    |   8 +
 fs/aio.c                             |   8 +-
 fs/btrfs/ioctl.c                     |   2 +-
 fs/read_write.c                      |   2 +-
 fs/stat.c                            |  47 ++-
 include/linux/blk_types.h            |   2 +
 include/linux/blkdev.h               |  65 ++-
 include/linux/fs.h                   |  39 +-
 include/linux/stat.h                 |   3 +
 include/scsi/scsi_proto.h            |   1 +
 include/trace/events/scsi.h          |   1 +
 include/uapi/linux/fs.h              |   5 +-
 include/uapi/linux/stat.h            |   9 +-
 io_uring/rw.c                        |   4 +-
 26 files changed, 1166 insertions(+), 180 deletions(-)

Keith Busch Feb. 19, 2024, 6:57 p.m. UTC | #1

On Mon, Feb 19, 2024 at 01:00:59PM +0000, John Garry wrote:
> Currently blk_queue_get_max_sectors() is passed a enum req_op. In future
> the value returned from blk_queue_get_max_sectors() may depend on certain
> request flags, so pass a request pointer.
> 
> Also use rq->cmd_flags instead of rq->bio->bi_opf when possible.

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

David Sterba Feb. 19, 2024, 7:16 p.m. UTC | #2

On Mon, Feb 19, 2024 at 01:01:01PM +0000, John Garry wrote:
> From: Prasad Singamsetty <prasad.singamsetty@oracle.com>
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -301,9 +301,12 @@ typedef int __bitwise __kernel_rwf_t;
>  /* per-IO O_APPEND */
>  #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
>  
> +/* Atomic Write */
> +#define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000040)

Should this be 0x20 so it's the next bit after RWF_APPEND?

Keith Busch Feb. 19, 2024, 7:21 p.m. UTC | #3

On Mon, Feb 19, 2024 at 01:01:08PM +0000, John Garry wrote:
> From: Alan Adamson <alan.adamson@oracle.com>
> 
> Add support to set block layer request_queue atomic write limits. The
> limits will be derived from either the namespace or controller atomic
> parameters.
> 
> NVMe atomic-related parameters are grouped into "normal" and "power-fail"
> (or PF) class of parameter. For atomic write support, only PF parameters
> are of interest. The "normal" parameters are concerned with racing reads
> and writes (which also applies to PF). See NVM Command Set Specification
> Revision 1.0d section 2.1.4 for reference.
> 
> Whether to use per namespace or controller atomic parameters is decided by
> NSFEAT bit 1 - see Figure 97: Identify - Identify Namespace Data Structure,
> #NVM Command Set.
> 
> NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
> are provided for a write which straddles this per-lba space boundary. The
> block layer merging policy is such that no merges may occur in which the
> resultant request would straddle such a boundary.
> 
> Unlike SCSI, NVMe specifies no granularity or alignment rules. In addition,
> again unlike SCSI, there is no dedicated atomic write command - a write
> which adheres to the atomic size limit and boundary is implicitly atomic.
> 
> If NSFEAT bit 1 is set, the following parameters are of interest:
> - NAWUPF (Namespace Atomic Write Unit Power Fail)
> - NABSPF (Namespace Atomic Boundary Size Power Fail)
> - NABO (Namespace Atomic Boundary Offset)
> 
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(NAWUPF)
> - atomic_write_max_bytes = NAWUPF
> - atomic_write_boundary = NABSPF
> 
> If in the unlikely scenario that NABO is non-zero, then atomic writes will
> not be supported at all as dealing with this adds extra complexity. This
> policy may change in future.
> 
> In all cases, atomic_write_unit_min is set to the logical block size.
> 
> If NSFEAT bit 1 is unset, the following parameter is of interest:
> - AWUPF (Atomic Write Unit Power Fail)
> 
> and we set request_queue limits as follows:
> - atomic_write_unit_max = rounddown_pow_of_two(AWUPF)
> - atomic_write_max_bytes = AWUPF
> - atomic_write_boundary = 0
> 
> The block layer requires that the atomic_write_boundary value is a
> power-of-2. However, it is really only required that atomic_write_boundary
> be a multiple of atomic_write_unit_max. As such, if NABSPF were not a
> power-of-2, atomic_write_unit_max could be reduced such that it was
> divisible into NABSPF. However, this complexity will not be yet supported.
> 
> A helper function, nvme_valid_atomic_write(), is also added for the
> submission path to verify that a request has been submitted to the driver
> will actually be executed atomically.

Maybe patch 11 should be folded into this one. No bigged, the series as
a whole looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

Dave Chinner Feb. 19, 2024, 10:28 p.m. UTC | #4

On Mon, Feb 19, 2024 at 01:01:02PM +0000, John Garry wrote:
> From: Prasad Singamsetty <prasad.singamsetty@oracle.com>
> 
> Extend statx system call to return additional info for atomic write support
> support for a file.
> 
> Helper function generic_fill_statx_atomic_writes() can be used by FSes to
> fill in the relevant statx fields.
> 
> Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com>
> #jpg: relocate bdev support to another patch
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/stat.c                 | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/fs.h        |  3 +++
>  include/linux/stat.h      |  3 +++
>  include/uapi/linux/stat.h |  9 ++++++++-
>  4 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 77cdc69eb422..522787a4ab6a 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -89,6 +89,37 @@ void generic_fill_statx_attr(struct inode *inode, struct kstat *stat)
>  }
>  EXPORT_SYMBOL(generic_fill_statx_attr);
>  
> +/**
> + * generic_fill_statx_atomic_writes - Fill in the atomic writes statx attributes
> + * @stat:	Where to fill in the attribute flags
> + * @unit_min:	Minimum supported atomic write length
> + * @unit_max:	Maximum supported atomic write length
> + *
> + * Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from
> + * atomic write unit_min and unit_max values.
> + */
> +void generic_fill_statx_atomic_writes(struct kstat *stat,
> +				      unsigned int unit_min,
> +				      unsigned int unit_max)
> +{
> +	/* Confirm that the request type is known */
> +	stat->result_mask |= STATX_WRITE_ATOMIC;
> +
> +	/* Confirm that the file attribute type is known */
> +	stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
> +
> +	if (unit_min) {
> +		stat->atomic_write_unit_min = unit_min;
> +		stat->atomic_write_unit_max = unit_max;
> +		/* Initially only allow 1x segment */
> +		stat->atomic_write_segments_max = 1;
> +
> +		/* Confirm atomic writes are actually supported */
> +		stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
> +	}
> +}
> +EXPORT_SYMBOL(generic_fill_statx_atomic_writes);

What units are these in? Nothing in the patch or commit description
tells us....

-Dave.

Dave Chinner Feb. 19, 2024, 10:44 p.m. UTC | #5

On Mon, Feb 19, 2024 at 01:01:01PM +0000, John Garry wrote:
> @@ -3523,4 +3535,26 @@ extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len,
>  extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
>  			   int advice);
>  
> +static inline bool atomic_write_valid(loff_t pos, struct iov_iter *iter,
> +			   unsigned int unit_min, unsigned int unit_max)
> +{
> +	size_t len = iov_iter_count(iter);
> +
> +	if (!iter_is_ubuf(iter))
> +		return false;
> +
> +	if (len == unit_min || len == unit_max) {
> +		/* ok if exactly min or max */
> +	} else if (len < unit_min || len > unit_max) {
> +		return false;
> +	} else if (!is_power_of_2(len)) {
> +		return false;
> +	}

This doesn't need if else if else if and it doesn't need to check
for exact unit min/max matches. The exact matches require the
length to be a power of 2, so the checks are simply:

	if (len < unit_min || len > unit_max)
		return false;
	if (!is_power_of_2(len))
		return false;

> +	if (pos & (len - 1))
> +		return false;

This has typing issues - 64 bit value, 32 bit mask. probably should
use:

	if (!IS_ALIGNED(pos, len))
		return false;

-Dave.

Christoph Hellwig Feb. 20, 2024, 6:55 a.m. UTC | #6

On Mon, Feb 19, 2024 at 12:21:14PM -0700, Keith Busch wrote:
> Maybe patch 11 should be folded into this one. No bigged, the series as
> a whole looks good.

I did suggest just that last round already..

Ojaswin Mujoo Feb. 20, 2024, 7:12 a.m. UTC | #7

On Mon, Feb 19, 2024 at 01:01:07PM +0000, John Garry wrote:
> Add initial support for atomic writes.
> 
> As is standard method, feed device properties via modules param, those
> being:
> - atomic_max_size_blks
> - atomic_alignment_blks
> - atomic_granularity_blks
> - atomic_max_size_with_boundary_blks
> - atomic_max_boundary_blks
> 
> These just match sbc4r22 section 6.6.4 - Block limits VPD page.
> 
> We just support ATOMIC WRITE (16).
> 
> The major change in the driver is how we lock the device for RW accesses.
> 
> Currently the driver uses a per-device lock for accessing device metadata
> and "media" data (calls to do_device_access()) atomically for the duration
> of the whole read/write command.
> 
> This should not suit verifying atomic writes. Reason being that currently
> all reads/writes are atomic, so using atomic writes does not prove
> anything.
> 
> Change device access model to basis that regular writes only atomic on a
> per-sector basis, while reads and atomic writes are fully atomic.
> 
> As mentioned, since accessing metadata and device media is atomic,
> continue to have regular writes involving metadata - like discard or PI -
> as atomic. We can improve this later.
> 
> Currently we only support model where overlapping going reads or writes
> wait for current access to complete before commencing an atomic write.
> This is described in 4.29.3.2 section of the SBC. However, we simplify,
> things and wait for all accesses to complete (when issuing an atomic
> write).
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---

<snip>

> +#define DEF_ATOMIC_WR 0

<snip>

> +static unsigned int sdebug_atomic_wr = DEF_ATOMIC_WR;

<snip>

> +MODULE_PARM_DESC(atomic_write, "enable ATOMIC WRITE support, support WRITE ATOMIC(16) (def=1)");
Hi John,

The default value here seems to be 0 and not 1. Got me a bit confused
while testing :) 

Regards,
ojaswin

>  MODULE_PARM_DESC(lowest_aligned, "lowest aligned lba (def=0)");
>  MODULE_PARM_DESC(lun_format, "LUN format: 0->peripheral (def); 1 --> flat address method");
>  MODULE_PARM_DESC(max_luns, "number of LUNs per target to simulate(def=1)");
> @@ -6260,6 +6575,11 @@ MODULE_PARM_DESC(unmap_alignment, "lowest aligned thin provisioning lba (def=0)"
>  MODULE_PARM_DESC(unmap_granularity, "thin provisioning granularity in blocks (def=1)");
>  MODULE_PARM_DESC(unmap_max_blocks, "max # of blocks can be unmapped in one cmd (def=0xffffffff)");
>  MODULE_PARM_DESC(unmap_max_desc, "max # of ranges that can be unmapped in one cmd (def=256)");
> +MODULE_PARM_DESC(atomic_wr_max_length, "max # of blocks can be atomically written in one cmd (def=8192)");
> +MODULE_PARM_DESC(atomic_wr_align, "minimum alignment of atomic write in blocks (def=2)");
> +MODULE_PARM_DESC(atomic_wr_gran, "minimum granularity of atomic write in blocks (def=2)");
> +MODULE_PARM_DESC(atomic_wr_max_length_bndry, "max # of blocks can be atomically written in one cmd with boundary set (def=8192)");
> +MODULE_PARM_DESC(atomic_wr_max_bndry, "max # boundaries per atomic write (def=128)");
>  MODULE_PARM_DESC(uuid_ctl,
>  		 "1->use uuid for lu name, 0->don't, 2->all use same (def=0)");
>  MODULE_PARM_DESC(virtual_gb, "virtual gigabyte (GiB) size (def=0 -> use dev_size_mb)");
> @@ -7406,6 +7726,7 @@ static int __init scsi_debug_init(void)
>  			return -EINVAL;
>  		}
>  	}
> +
>  	xa_init_flags(per_store_ap, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>  	if (want_store) {
>  		idx = sdebug_add_store();
> @@ -7613,7 +7934,9 @@ static int sdebug_add_store(void)
>  			map_region(sip, 0, 2);
>  	}
>  
> -	rwlock_init(&sip->macc_lck);
> +	rwlock_init(&sip->macc_data_lck);
> +	rwlock_init(&sip->macc_meta_lck);
> +	rwlock_init(&sip->macc_sector_lck);
>  	return (int)n_idx;
>  err:
>  	sdebug_erase_store((int)n_idx, sip);
> -- 
> 2.31.1
>

John Garry Feb. 20, 2024, 8:13 a.m. UTC | #8

On 19/02/2024 19:16, David Sterba wrote:
> On Mon, Feb 19, 2024 at 01:01:01PM +0000, John Garry wrote:
>> From: Prasad Singamsetty<prasad.singamsetty@oracle.com>
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -301,9 +301,12 @@ typedef int __bitwise __kernel_rwf_t;
>>   /* per-IO O_APPEND */
>>   #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
>>   
>> +/* Atomic Write */
>> +#define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000040)
> Should this be 0x20 so it's the next bit after RWF_APPEND?

Support for new flag RWF_NOAPPEND - which has value 0x20 - has been 
picked up on the vfs tree for 6.9 .

I had been just basing my series on Linus' release so far while also 
trying to anticipate any merge issue.

Thanks,
John

John Garry Feb. 20, 2024, 8:19 a.m. UTC | #9

On 20/02/2024 06:55, Christoph Hellwig wrote:
> On Mon, Feb 19, 2024 at 12:21:14PM -0700, Keith Busch wrote:
>> Maybe patch 11 should be folded into this one. No bigged, the series as
>> a whole looks good.
> 
> I did suggest just that last round already..
> 

I created the helper function and folded it in, but not the callsite. 
Not folding in the callsite change did not make sense to me, but that 
was my understanding of the request. Anyway, I can fold everything into 
this patch.

Thanks,
John

Christoph Hellwig Feb. 20, 2024, 8:20 a.m. UTC | #10

> +EXPORT_SYMBOL(generic_fill_statx_atomic_writes);

EXPORT_SYMBOL_GPL for any new feature, please.

Christoph Hellwig Feb. 20, 2024, 8:31 a.m. UTC | #11

Thanks for writing a good commit message!

> NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
> are provided for a write which straddles this per-lba space boundary. The
> block layer merging policy is such that no merges may occur in which the
> resultant request would straddle such a boundary.
> 
> Unlike SCSI, NVMe specifies no granularity or alignment rules.

Well, the boundary really is sort of a granularity and alignment,
isn't it?

John Garry Feb. 20, 2024, 8:50 a.m. UTC | #12

On 20/02/2024 08:31, Christoph Hellwig wrote:
> Thanks for writing a good commit message!
> 
>> NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
>> are provided for a write which straddles this per-lba space boundary. The
>> block layer merging policy is such that no merges may occur in which the
>> resultant request would straddle such a boundary.
>>
>> Unlike SCSI, NVMe specifies no granularity or alignment rules.
> 
> Well, the boundary really is sort of a granularity and alignment,
> isn't it?

NVMe does indeed have the boundary rule, but it is not really the same 
as SCSI granularity and alignment.

Anyway, I can word that statement to be clearer.

Thanks,
John

John Garry Feb. 20, 2024, 9:01 a.m. UTC | #13

On 20/02/2024 07:12, Ojaswin Mujoo wrote:
>> +MODULE_PARM_DESC(atomic_write, "enable ATOMIC WRITE support, support WRITE ATOMIC(16) (def=1)");
> Hi John,
> 
> The default value here seems to be 0 and not 1. Got me a bit confused
> while testing 🙂

I can fix that MODULE_PARM_DESC() text.

I don't think that many disks support atomic writes, so I was leaving 
this disabled by default.

Thanks,
John

John Garry Feb. 20, 2024, 9:01 a.m. UTC | #14

On 20/02/2024 08:20, Christoph Hellwig wrote:
>> +EXPORT_SYMBOL(generic_fill_statx_atomic_writes);
> EXPORT_SYMBOL_GPL for any new feature, please.
ok, thanks.

John Garry Feb. 20, 2024, 9:40 a.m. UTC | #15

On 19/02/2024 22:28, Dave Chinner wrote:
>>   
>> +/**
>> + * generic_fill_statx_atomic_writes - Fill in the atomic writes statx attributes
>> + * @stat:	Where to fill in the attribute flags
>> + * @unit_min:	Minimum supported atomic write length
>> + * @unit_max:	Maximum supported atomic write length
>> + *
>> + * Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from
>> + * atomic write unit_min and unit_max values.
>> + */
>> +void generic_fill_statx_atomic_writes(struct kstat *stat,
>> +				      unsigned int unit_min,
>> +				      unsigned int unit_max)
>> +{
>> +	/* Confirm that the request type is known */
>> +	stat->result_mask |= STATX_WRITE_ATOMIC;
>> +
>> +	/* Confirm that the file attribute type is known */
>> +	stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
>> +
>> +	if (unit_min) {
>> +		stat->atomic_write_unit_min = unit_min;
>> +		stat->atomic_write_unit_max = unit_max;
>> +		/* Initially only allow 1x segment */
>> +		stat->atomic_write_segments_max = 1;
>> +
>> +		/* Confirm atomic writes are actually supported */
>> +		stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
>> +	}
>> +}
>> +EXPORT_SYMBOL(generic_fill_statx_atomic_writes);
> What units are these in? Nothing in the patch or commit description
> tells us....

I can append the current comments to mention that the unit is bytes.

Thanks,
John

John Garry Feb. 20, 2024, 9:52 a.m. UTC | #16

On 19/02/2024 22:44, Dave Chinner wrote:
> On Mon, Feb 19, 2024 at 01:01:01PM +0000, John Garry wrote:
>> @@ -3523,4 +3535,26 @@ extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len,
>>   extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
>>   			   int advice);
>>   
>> +static inline bool atomic_write_valid(loff_t pos, struct iov_iter *iter,
>> +			   unsigned int unit_min, unsigned int unit_max)
>> +{
>> +	size_t len = iov_iter_count(iter);
>> +
>> +	if (!iter_is_ubuf(iter))
>> +		return false;
>> +
>> +	if (len == unit_min || len == unit_max) {
>> +		/* ok if exactly min or max */
>> +	} else if (len < unit_min || len > unit_max) {
>> +		return false;
>> +	} else if (!is_power_of_2(len)) {
>> +		return false;
>> +	}
> This doesn't need if else if else if and it doesn't need to check
> for exact unit min/max matches. 

This is fastpath code, and I thought it quicker to just check if min/max 
first. Based on recent discussions, for FS support I expect typically 
len == unit_max.

But I can change to your simpler checking and later change to the 
current method if those FS assumptions hold true.

> The exact matches require the
> length to be a power of 2, so the checks are simply:
> 
> 	if (len < unit_min || len > unit_max)
> 		return false;
> 	if (!is_power_of_2(len))
> 		return false;
> 
>> +	if (pos & (len - 1))
>> +		return false;
> This has typing issues - 64 bit value, 32 bit mask. probably should
> use:
> 
> 	if (!IS_ALIGNED(pos, len))
> 		return false;

ok, good idea.

Thanks,
John

[v4,00/11] block atomic writes

Message

Comments