mbox series

[v11,00/13] Apply COR-filter to the block-stream permanently

Message ID 1602524605-481160-1-git-send-email-andrey.shinkevich@virtuozzo.com
Headers show
Series Apply COR-filter to the block-stream permanently | expand

Message

Xingtao Yao (Fujitsu)" via Oct. 12, 2020, 5:43 p.m. UTC
The iotest case test_stream_parallel still does not pass after the
COR-filter is inserted into the backing chain. As the test case may not
be initialized, it does not make a sense and was removed again.

v11:
  04: Base node overlay is used instead of base.
  05: Base node overlay is used instead of base.
  06: New.
  07: New.
  08: New.
  09: The new BDS-member 'supported_read_flags' is applied.
  10: The 'base_metadata' variable renamed to 'base_unfiltered'.
  11: New.
  12: The backing-file argument is left in the QMP interface. Warning added.
  13: The BDRV_REQ_COPY_ON_READ removed from the stream_populate();
      The 'implicit' initialization moved back to COR-filter driver.
      Base node overlay is used instead of base.

The v8 Message-Id:
<1601383109-110988-1-git-send-email-andrey.shinkevich@virtuozzo.com>

Andrey Shinkevich (13):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  copy-on-read: pass overlay base node name to COR driver
  copy-on-read: limit COR operations to base in COR driver
  block: modify the comment for BDRV_REQ_PREFETCH flag
  block: include supported_read_flags into BDS structure
  copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
  copy-on-read: skip non-guest reads if no copy needed
  stream: skip filters when writing backing file name to QCOW2 header
  stream: mark backing-file argument as deprecated
  stream: remove unused backing-file name parameter
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c           | 171 ++++++++++++++++++++++++++++++++++++++---
 block/copy-on-read.h           |  35 +++++++++
 block/io.c                     |   3 +-
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c                 | 112 ++++++++++++++++-----------
 blockdev.c                     |  25 +++---
 docs/system/deprecated.rst     |   6 ++
 include/block/block.h          |   7 +-
 include/block/block_int.h      |  13 +++-
 qapi/block-core.json           |   6 ++
 tests/qemu-iotests/030         |  51 ++----------
 tests/qemu-iotests/030.out     |   4 +-
 tests/qemu-iotests/141.out     |   2 +-
 tests/qemu-iotests/245         |  19 +++--
 14 files changed, 324 insertions(+), 134 deletions(-)
 create mode 100644 block/copy-on-read.h

Comments

Max Reitz Oct. 14, 2020, 12:01 p.m. UTC | #1
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>                                             size_t qiov_offset,
>                                             int flags)
>  {
> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -                               flags | BDRV_REQ_COPY_ON_READ);
> +    int64_t n = 0;
> +    int64_t size = offset + bytes;

Just when I hit send I noticed that “end” would be a more fitting name
for this variable.

> +    int local_flags;
> +    int ret;
> +    BDRVStateCOR *state = bs->opaque;
> +
> +    if (!state->base_overlay) {
> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> +                                   flags | BDRV_REQ_COPY_ON_READ);
> +    }
> +
> +    while (offset < size) {

(because I got a bit confused looking at this)

(Though dropping @size (or @end) and just checking when @bytes becomes 0
should work, too.)

> +        local_flags = flags;
> +
> +        /* In case of failure, try to copy-on-read anyway */
> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +        if (!ret) {
> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> +                                          state->base_overlay, true, offset,
> +                                          n, &n);
> +            if (ret) {
> +                local_flags |= BDRV_REQ_COPY_ON_READ;
> +            }
> +        }

Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is a the EOF?  Is that something to look out for?
 (I’m not sure.)

Max

> +
> +        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +                                  local_flags);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        offset += n;
> +        qiov_offset += n;
> +        bytes -= n;
> +    }
> +
> +    return 0;
>  }
>  
>  
>
Andrey Shinkevich Oct. 14, 2020, 2:28 p.m. UTC | #2
On 14.10.2020 13:44, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Provide API for the COR-filter insertion/removal.
>> Also, drop the filter child permissions for an inactive state when the
>> filter node is being removed.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/copy-on-read.h | 35 +++++++++++++++++++++
>>   2 files changed, 123 insertions(+)
>>   create mode 100644 block/copy-on-read.h
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index cb03e0f..bcccf0f 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
> 
> [...]
> 
>> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>>       bdrv_register(&bdrv_copy_on_read);
>>   }
>>   
>> +
>> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
>> +                                         QDict *node_options,
>> +                                         int flags, Error **errp)
> 
> I had hoped you could make this a generic block layer function. :(
> 
> (Because it really is rather generic)
> 
> *shrug*

Actually, I did (and still can do) that for the 'append node' function 
only but not for the 'drop node' one so far...

diff --git a/block.c b/block.c
index 11ab55f..f41e876 100644
--- a/block.c
+++ b/block.c
@@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
      g_free(bs);
  }

+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict 
*node_options,
+                                   int flags, Error **errp)
+{
+    BlockDriverState *new_node_bs;
+    Error *local_err = NULL;
+
+    new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
+    if (new_node_bs == NULL) {
+        error_prepend(errp, "Could not create node: ");
+        return NULL;
+    }
+
+    bdrv_drained_begin(bs);
+    bdrv_replace_node(bs, new_node_bs, &local_err);
+    bdrv_drained_end(bs);
+
+    if (local_err) {
+        bdrv_unref(new_node_bs);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return new_node_bs;
+}
+
+void bdrv_remove_node(BlockDriverState *bs)
+{
+    BdrvChild *child;
+    BlockDriverState *inferior_bs;
+
+    child = bdrv_filter_or_cow_child(bs);
+    if (!child) {
+        return;
+    }
+    inferior_bs = child->bs;
+
+    /* Retain the BDS until we complete the graph change. */
+    bdrv_ref(inferior_bs);
+    /* Hold a guest back from writing while permissions are being reset. */
+    bdrv_drained_begin(inferior_bs);
+    /* Refresh permissions before the graph change. */
+    bdrv_child_refresh_perms(bs, child, &error_abort);
+    bdrv_replace_node(bs, inferior_bs, &error_abort);
+
+    bdrv_drained_end(inferior_bs);
+    bdrv_unref(inferior_bs);
+    bdrv_unref(bs);
+}

So, it is an intermediate solution in this patch of the series. I am 
going to make both functions generic once Vladimir overhauls the QEMU 
permission update system. Otherwise, the COR-filter node cannot be 
removed from the backing chain gracefully.

Thank you for your r-b. If the next version comes, I can move the 
'append node' function only to the generic layer.

Andrey

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
>> +{
>> +    BlockDriverState *cor_filter_bs;
>> +    Error *local_err = NULL;
>> +
>> +    cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
>> +    if (cor_filter_bs == NULL) {
>> +        error_prepend(errp, "Could not create COR-filter node: ");
>> +        return NULL;
>> +    }
>> +
>> +    if (!qdict_get_try_str(node_options, "node-name")) {
>> +        cor_filter_bs->implicit = true;
>> +    }
>> +
>> +    bdrv_drained_begin(bs);
>> +    bdrv_replace_node(bs, cor_filter_bs, &local_err);
>> +    bdrv_drained_end(bs);
>> +
>> +    if (local_err) {
>> +        bdrv_unref(cor_filter_bs);
>> +        error_propagate(errp, local_err);
>> +        return NULL;
>> +    }
>> +
>> +    return cor_filter_bs;
>> +}
>
Max Reitz Oct. 14, 2020, 3:03 p.m. UTC | #3
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Whereas the block-stream job starts using a backing file name of the
> base node overlay after the block-stream job completes, mark the QMP
> 'backing-file' argument as deprecated.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  docs/system/deprecated.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> index 8b3ab5b..7491fcf 100644
> --- a/docs/system/deprecated.rst
> +++ b/docs/system/deprecated.rst
> @@ -285,6 +285,12 @@ details.
>  The ``query-events`` command has been superseded by the more powerful
>  and accurate ``query-qmp-schema`` command.
>  
> +``block-stream`` argument ``backing-file`` (since 5.2)
> +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
> +
> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
> +name of the base node overlay after the block-stream job completes.
> +

Hm, why?  I don’t see the problem with it.

Max
Vladimir Sementsov-Ogievskiy Oct. 14, 2020, 3:22 p.m. UTC | #4
14.10.2020 15:51, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>> COR-driver to skip unneeded reading. It can be taken into account for
>> the COR-algorithms optimization. That check is being made during the
>> block stream job by the moment.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 13 +++++++++----
>>   block/io.c           |  3 ++-
>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index b136895..278a11a 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>               }
>>           }
>>   
>> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> -                                  local_flags);
>> -        if (ret < 0) {
>> -            return ret;
>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
> 
> How about dropping the double negation and using a logical && instead of
> the binary &?
> 
>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>> +            /* Skip non-guest reads if no copy needed */
>> +        } else {
> 
> Hm.  I would have just written the negated form
> 
> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
> 
> and put the “skip” comment above that condition.
> 
> (Since local_flags is initialized to flags, it can be written as a
> single comparison, but that’s a matter of taste and I’m not going to
> recommend either over the other:
> 
> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
> BDRV_REQ_PREFETCH)
> 
> )
> 
>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> +                                      local_flags);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>>           }
>>   
>>           offset += n;
>> diff --git a/block/io.c b/block/io.c
>> index 11df188..bff1808 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>   
>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>> +                                 flags & bs->supported_read_flags);


When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.

Actually, if driver doesn't support the PREFETCH flag we should do nothing.


> 
> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
> why it isn’t.
> 


Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
Max Reitz Oct. 14, 2020, 4:26 p.m. UTC | #5
On 14.10.20 16:28, Andrey Shinkevich wrote:
> On 14.10.2020 13:44, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Provide API for the COR-filter insertion/removal.
>>> Also, drop the filter child permissions for an inactive state when the
>>> filter node is being removed.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 88
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   block/copy-on-read.h | 35 +++++++++++++++++++++
>>>   2 files changed, 123 insertions(+)
>>>   create mode 100644 block/copy-on-read.h
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index cb03e0f..bcccf0f 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>
>> [...]
>>
>>> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>>>       bdrv_register(&bdrv_copy_on_read);
>>>   }
>>>   +
>>> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
>>> +                                         QDict *node_options,
>>> +                                         int flags, Error **errp)
>>
>> I had hoped you could make this a generic block layer function. :(
>>
>> (Because it really is rather generic)
>>
>> *shrug*
> 
> Actually, I did (and still can do) that for the 'append node' function
> only but not for the 'drop node' one so far...

Ah, yeah.

> diff --git a/block.c b/block.c
> index 11ab55f..f41e876 100644
> --- a/block.c
> +++ b/block.c
> @@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
>      g_free(bs);
>  }
> 
> +BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict
> *node_options,
> +                                   int flags, Error **errp)
> +{
> +    BlockDriverState *new_node_bs;
> +    Error *local_err = NULL;
> +
> +    new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
> +    if (new_node_bs == NULL) {
> +        error_prepend(errp, "Could not create node: ");
> +        return NULL;
> +    }
> +
> +    bdrv_drained_begin(bs);
> +    bdrv_replace_node(bs, new_node_bs, &local_err);
> +    bdrv_drained_end(bs);
> +
> +    if (local_err) {
> +        bdrv_unref(new_node_bs);
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    return new_node_bs;
> +}
> +
> +void bdrv_remove_node(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    BlockDriverState *inferior_bs;
> +
> +    child = bdrv_filter_or_cow_child(bs);
> +    if (!child) {
> +        return;
> +    }
> +    inferior_bs = child->bs;
> +
> +    /* Retain the BDS until we complete the graph change. */
> +    bdrv_ref(inferior_bs);
> +    /* Hold a guest back from writing while permissions are being
> reset. */
> +    bdrv_drained_begin(inferior_bs);
> +    /* Refresh permissions before the graph change. */
> +    bdrv_child_refresh_perms(bs, child, &error_abort);
> +    bdrv_replace_node(bs, inferior_bs, &error_abort);
> +
> +    bdrv_drained_end(inferior_bs);
> +    bdrv_unref(inferior_bs);
> +    bdrv_unref(bs);
> +}
> 
> So, it is an intermediate solution in this patch of the series. I am
> going to make both functions generic once Vladimir overhauls the QEMU
> permission update system. Otherwise, the COR-filter node cannot be
> removed from the backing chain gracefully.

True.

> Thank you for your r-b. If the next version comes, I can move the
> 'append node' function only to the generic layer.

Thanks!  Even if you decide against it, I’m happy as long as there are
plans to perhaps make it generic at some point.

(I’m just afraid this function might be useful in some other case, too,
and we forget that it exists here already, and then end up
reimplementing it.)

Max
Vladimir Sementsov-Ogievskiy Oct. 14, 2020, 4:39 p.m. UTC | #6
14.10.2020 19:30, Max Reitz wrote:
> On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:

>> 14.10.2020 15:51, Max Reitz wrote:

>>> On 12.10.20 19:43, Andrey Shinkevich wrote:

>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the

>>>> COR-driver to skip unneeded reading. It can be taken into account for

>>>> the COR-algorithms optimization. That check is being made during the

>>>> block stream job by the moment.

>>>>

>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

>>>> ---

>>>>    block/copy-on-read.c | 13 +++++++++----

>>>>    block/io.c           |  3 ++-

>>>>    2 files changed, 11 insertions(+), 5 deletions(-)

>>>>

>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c

>>>> index b136895..278a11a 100644

>>>> --- a/block/copy-on-read.c

>>>> +++ b/block/copy-on-read.c

>>>> @@ -148,10 +148,15 @@ static int coroutine_fn

>>>> cor_co_preadv_part(BlockDriverState *bs,

>>>>                }

>>>>            }

>>>>    -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,

>>>> qiov_offset,

>>>> -                                  local_flags);

>>>> -        if (ret < 0) {

>>>> -            return ret;

>>>> +        if (!!(flags & BDRV_REQ_PREFETCH) &

>>>

>>> How about dropping the double negation and using a logical && instead of

>>> the binary &?

>>>

>>>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {

>>>> +            /* Skip non-guest reads if no copy needed */

>>>> +        } else {

>>>

>>> Hm.  I would have just written the negated form

>>>

>>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

>>>

>>> and put the “skip” comment above that condition.

>>>

>>> (Since local_flags is initialized to flags, it can be written as a

>>> single comparison, but that’s a matter of taste and I’m not going to

>>> recommend either over the other:

>>>

>>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=

>>> BDRV_REQ_PREFETCH)

>>>

>>> )

>>>

>>>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,

>>>> qiov_offset,

>>>> +                                      local_flags);

>>>> +            if (ret < 0) {

>>>> +                return ret;

>>>> +            }

>>>>            }

>>>>              offset += n;

>>>> diff --git a/block/io.c b/block/io.c

>>>> index 11df188..bff1808 100644

>>>> --- a/block/io.c

>>>> +++ b/block/io.c

>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn

>>>> bdrv_aligned_preadv(BdrvChild *child,

>>>>          max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);

>>>>        if (bytes <= max_bytes && bytes <= max_transfer) {

>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov,

>>>> qiov_offset, 0);

>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,

>>>> +                                 flags & bs->supported_read_flags);

>>

>>

>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)

>> NULL. This means, that we can't just drop the flag when call the driver

>> that doesn't support it.

> 

> True. :/

> 

>> Actually, if driver doesn't support the PREFETCH flag we should do nothing.

> 

> Hm.  But at least in the case of COR, PREFETCH is not something that can

> be optimized to be a no-op (unless the COR is a no-op); it still denotes

> a command that must be executed.

> 

> So if we can’t pass it to the driver, I don’t think we should do

> nothing, but to return an error.  Or maybe we could even assert that it

> isn’t set for drivers that don’t support it, because at least right now

> such a case would just be a bug.


Hmm. Reasonable..

So, let me summarize the cases:

1. bdrv_co_preadv(.. , PREFETCH | COR)

   In this case generic layer should handle both flags and pass flags=0 to driver

2. bdrv_co_preadv(.., PREFETCH)

2.1 driver supporst PREFETCH
   
   OK, pass PREFETCH to driver, no problems

2.2 driver doesn't support PREFETCH

   We can just abort() here, as the only source of PREFETCH without COR would be
   stream job driver, which must read from COR filter.

   More generic solution is to allocate temporary buffer (at least if qiov is NULL)
   and call underlying driver .preadv with flags=0 on that temporary buffer. But
   just abort() is simpler and should work for now.

> 

>>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder

>>> why it isn’t.

>>>

>>

>>

>> Could it be part of patch 07? I mean introduce new field

>> supported_read_flags and handle it in generic code in one patch, prior

>> to implementing support for it in COR driver.

> 

> Sure.

> 

> Max

> 



-- 
Best regards,
Vladimir
Andrey Shinkevich Oct. 14, 2020, 6:57 p.m. UTC | #7
On 14.10.2020 15:01, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:

>> Limit COR operations by the base node in the backing chain when the

>> overlay base node name is given. It will be useful for a block stream

>> job when the COR-filter is applied. The overlay base node is passed as

>> the base itself may change due to concurrent commit jobs on the same

>> backing chain.

>>

>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

>> ---

>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--

>>   1 file changed, 37 insertions(+), 2 deletions(-)

>>

>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c

>> index c578b1b..dfbd6ad 100644

>> --- a/block/copy-on-read.c

>> +++ b/block/copy-on-read.c

>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,

>>                                              size_t qiov_offset,

>>                                              int flags)

>>   {


[...]

>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),

>> +                                          state->base_overlay, true, offset,

>> +                                          n, &n);

>> +            if (ret) {

>> +                local_flags |= BDRV_REQ_COPY_ON_READ;

>> +            }

>> +        }

> 

> Furthermore, I just noticed – can the is_allocated functions not return

> 0 in @n, when @offset is a the EOF?  Is that something to look out for?

>   (I’m not sure.)

> 

> Max

> 


The check for EOF is managed earlier in the stream_run() for a 
block-stream job. For other cases of using the COR-filter, the check for 
EOF can be added to the cor_co_preadv_part().
I would be more than happy if we can escape the duplicated checking for 
is_allocated in the block-stream. But how the stream_run() can stop 
calling the blk_co_preadv() when EOF is reached if is_allocated removed 
from it? May the cor_co_preadv_part() return EOF (or other error code) 
to be handled by a caller if (ret == 0 && n == 0 && (flags & 
BDRV_REQ_PREFETCH)?

Andrey
Andrey Shinkevich Oct. 14, 2020, 8:49 p.m. UTC | #8
On 14.10.2020 15:51, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>> COR-driver to skip unneeded reading. It can be taken into account for
>> the COR-algorithms optimization. That check is being made during the
>> block stream job by the moment.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 13 +++++++++----
>>   block/io.c           |  3 ++-
>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index b136895..278a11a 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>               }
>>           }
>>   
>> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> -                                  local_flags);
>> -        if (ret < 0) {
>> -            return ret;
>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
> 
> How about dropping the double negation and using a logical && instead of
> the binary &?
> 

Yes, that's correct.

>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>> +            /* Skip non-guest reads if no copy needed */
>> +        } else {
> 
> Hm.  I would have just written the negated form
> 
> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
> 
> and put the “skip” comment above that condition.
> 
> (Since local_flags is initialized to flags, it can be written as a
> single comparison, but that’s a matter of taste and I’m not going to
> recommend either over the other:
> 

I played with the flags to make the idea obvious for the eye of a 
beholder: "we neither read nor write".
Comparing the BDRV_REQ_PREFETCH against the 'flags' means that the flag 
comes from outside of the function.
And the empty section means we do nothing in that case.
Eventually, I will pick up the brief expression below.
Thanks,

Andrey

> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
> BDRV_REQ_PREFETCH)
> 
> )
> 
>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> +                                      local_flags);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>>           }
>>   
>>           offset += n;
>> diff --git a/block/io.c b/block/io.c
>> index 11df188..bff1808 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>   
>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>> +                                 flags & bs->supported_read_flags);
> 
> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
> why it isn’t.
> 
> Max
> 
>>           goto out;
>>       }
>>   
>>
> 
>
Andrey Shinkevich Oct. 15, 2020, 9:01 a.m. UTC | #9
On 14.10.2020 18:43, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 18:03, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Whereas the block-stream job starts using a backing file name of the
>>> base node overlay after the block-stream job completes, mark the QMP
>>> 'backing-file' argument as deprecated.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   docs/system/deprecated.rst | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
>>> index 8b3ab5b..7491fcf 100644
>>> --- a/docs/system/deprecated.rst
>>> +++ b/docs/system/deprecated.rst
>>> @@ -285,6 +285,12 @@ details.
>>>   The ``query-events`` command has been superseded by the more powerful
>>>   and accurate ``query-qmp-schema`` command.
>>> +``block-stream`` argument ``backing-file`` (since 5.2)
>>> +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
>>> +
>>> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
>>> +name of the base node overlay after the block-stream job completes.
>>> +
>>
>> Hm, why?  I don’t see the problem with it.
>>
> 
> My wrong idea, sorry. I believed that the argument is unused when I were 
> reviewing v10. But it actually become unused during the series and it is 
> wrong.
> 

I missed searching for calls to the qmp_block_stream() in the QEMU 
dinamically generated code. Will roll back.

Andrey
Max Reitz Oct. 15, 2020, 3:56 p.m. UTC | #10
On 14.10.20 20:57, Andrey Shinkevich wrote:
> On 14.10.2020 15:01, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Limit COR operations by the base node in the backing chain when the
>>> overlay base node name is given. It will be useful for a block stream
>>> job when the COR-filter is applied. The overlay base node is passed as
>>> the base itself may change due to concurrent commit jobs on the same
>>> backing chain.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index c578b1b..dfbd6ad 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -122,8 +122,43 @@ static int coroutine_fn
>>> cor_co_preadv_part(BlockDriverState *bs,
>>>                                              size_t qiov_offset,
>>>                                              int flags)
>>>   {
> 
> [...]
> 
>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>> +                                          state->base_overlay, true,
>>> offset,
>>> +                                          n, &n);
>>> +            if (ret) {
>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>> +            }
>>> +        }
>>
>> Furthermore, I just noticed – can the is_allocated functions not return
>> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>>   (I’m not sure.)
>>
>> Max
>>
> 
> The check for EOF is managed earlier in the stream_run() for a
> block-stream job. For other cases of using the COR-filter, the check for
> EOF can be added to the cor_co_preadv_part().
> I would be more than happy if we can escape the duplicated checking for
> is_allocated in the block-stream. But how the stream_run() can stop
> calling the blk_co_preadv() when EOF is reached if is_allocated removed
> from it?

True.  Is it that bad to lose that optimization, though?  (And I would
expect the case of a short backing file to be rather rare, too.)

> May the cor_co_preadv_part() return EOF (or other error code)
> to be handled by a caller if (ret == 0 && n == 0 && (flags &
> BDRV_REQ_PREFETCH)?

That sounds like a bad hack.  I’d rather keep the double is_allocated().

But what would be the problem with losing the short backing file
optimization?  Just performance?  Or would we end up writing actual
zeroes into the overlay past the end of the backing file?  Hm, probably
not, if the COR filter would detect that case and handle it like stream
does.

So it seems only a question of performance to me, and I don’t think it
would be that bad to in this rather rare case to have a bunch of useless
is_allocated and is_allocated_above calls past the backing file’s EOF.
(Maybe I’m wrong, though.)

Max
Vladimir Sementsov-Ogievskiy Oct. 22, 2020, 8:56 a.m. UTC | #11
22.10.2020 10:50, Andrey Shinkevich wrote:
> 
> On 21.10.2020 23:43, Andrey Shinkevich wrote:
>> On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.10.2020 15:51, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>>>> COR-driver to skip unneeded reading. It can be taken into account for
>>>>> the COR-algorithms optimization. That check is being made during the
>>>>> block stream job by the moment.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
> 
> [...]
> 
>>>>> diff --git a/block/io.c b/block/io.c
>>>>> index 11df188..bff1808 100644
>>>>> --- a/block/io.c
>>>>> +++ b/block/io.c
>>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>>>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>>>> +                                 flags & bs->supported_read_flags);
>>>
>>>
>>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
>>>
>>> Actually, if driver doesn't support the PREFETCH flag we should do nothing.
>>>
>>>
>>>>
>>>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>>>> why it isn’t.
>>>>
>>>
>>>
>>> Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
>>>
>>>
>>
>> We have to add the supported flags for the COR driver in the same patch. Or before handling the supported_read_flags at the generic layer (handling zero does not make a sence). Otherwise, the test #216 (where the COR-filter is applied) will not pass.
>>
>> Andrey
> 
> I have found a workaround and am going to send all the related patches as a separate series.
> 

What is the problem?

If in a separate patch prior to modifying COR driver, we add new field supported_read_flags and add a support for it in generic layer, when no driver support it yet, nothing should be changed and all tests should pass..