diff mbox series

block: End quiescent sections when a BDS is deleted

Message ID 160344969243.4091343.14371338409686732879.stgit@bahia.lan
State Superseded
Headers show
Series block: End quiescent sections when a BDS is deleted | expand

Commit Message

Greg Kurz Oct. 23, 2020, 10:41 a.m. UTC
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.

This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.

It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.

Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 block.c                 |    9 +++++++++
 block/io.c              |   13 +++++++++++++
 include/block/block.h   |    6 ++++++
 tests/test-bdrv-drain.c |    1 +
 4 files changed, 29 insertions(+)

Comments

no-reply@patchew.org Oct. 23, 2020, 10:48 a.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/160344969243.4091343.14371338409686732879.stgit@bahia.lan/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 160344969243.4091343.14371338409686732879.stgit@bahia.lan
Subject: [PATCH] block: End quiescent sections when a BDS is deleted

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/160344969243.4091343.14371338409686732879.stgit@bahia.lan -> patchew/160344969243.4091343.14371338409686732879.stgit@bahia.lan
 - [tag update]      patchew/20201023101222.250147-1-kwolf@redhat.com -> patchew/20201023101222.250147-1-kwolf@redhat.com
Switched to a new branch 'test'
f9501f8 block: End quiescent sections when a BDS is deleted

=== OUTPUT BEGIN ===
ERROR: Use g_assert or g_assert_not_reached
#73: FILE: block/io.c:640:
+    g_assert_cmpint(bs->quiesce_counter, >, 0);

ERROR: Use g_assert or g_assert_not_reached
#74: FILE: block/io.c:641:
+    g_assert_cmpint(bs->refcnt, ==, 0);

total: 2 errors, 0 warnings, 53 lines checked

Commit f9501f846de1 (block: End quiescent sections when a BDS is deleted) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/160344969243.4091343.14371338409686732879.stgit@bahia.lan/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Greg Kurz Oct. 23, 2020, 10:57 a.m. UTC | #2
On Fri, 23 Oct 2020 03:48:39 -0700
<no-reply@patchew.org> wrote:

> Patchew URL: https://patchew.org/QEMU/160344969243.4091343.14371338409686732879.stgit@bahia.lan/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 160344969243.4091343.14371338409686732879.stgit@bahia.lan
> Subject: [PATCH] block: End quiescent sections when a BDS is deleted
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag]         patchew/160344969243.4091343.14371338409686732879.stgit@bahia.lan -> patchew/160344969243.4091343.14371338409686732879.stgit@bahia.lan
>  - [tag update]      patchew/20201023101222.250147-1-kwolf@redhat.com -> patchew/20201023101222.250147-1-kwolf@redhat.com
> Switched to a new branch 'test'
> f9501f8 block: End quiescent sections when a BDS is deleted
> 
> === OUTPUT BEGIN ===
> ERROR: Use g_assert or g_assert_not_reached
> #73: FILE: block/io.c:640:
> +    g_assert_cmpint(bs->quiesce_counter, >, 0);
> 

Ah... sorry I wasn't aware of the story behind glib commit a6a875068779,
I'll post a v2 what uses g_assert() instead.

> ERROR: Use g_assert or g_assert_not_reached
> #74: FILE: block/io.c:641:
> +    g_assert_cmpint(bs->refcnt, ==, 0);
> 
> total: 2 errors, 0 warnings, 53 lines checked
> 
> Commit f9501f846de1 (block: End quiescent sections when a BDS is deleted) has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> The full log is available at
> http://patchew.org/logs/160344969243.4091343.14371338409686732879.stgit@bahia.lan/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
Kevin Wolf Oct. 23, 2020, 2:18 p.m. UTC | #3
Am 23.10.2020 um 12:41 hat Greg Kurz geschrieben:
> If a BDS gets deleted during blk_drain_all(), it might miss a
> call to bdrv_do_drained_end(). This means missing a call to
> aio_enable_external() and the AIO context remains disabled for
> ever. This can cause a device to become irresponsive and to
> disrupt the guest execution, ie. hang, loop forever or worse.
> 
> This scenario is quite easy to encounter with virtio-scsi
> on POWER when punching multiple blockdev-create QMP commands
> while the guest is booting and it is still running the SLOF
> firmware. This happens because SLOF disables/re-enables PCI
> devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
> register after the initial probe/feature negotiation, as it
> tends to work with a single device at a time at various stages
> like probing and running block/network bootloaders without
> doing a full reset in-between. This naturally generates many
> dataplane stops and starts, and thus many drain sections that
> can race with blockdev_create_run(). In the end, SLOF bails
> out.
> 
> It is somehow reproducible on x86 but it requires to generate
> articial dataplane start/stop activity with stop/cont QMP
> commands. In this case, seabios ends up looping for ever,
> waiting for the virtio-scsi device to send a response to
> a command it never received.
> 
> Add a helper that pairs all previously called bdrv_do_drained_begin()
> with a bdrv_do_drained_end() and call it from bdrv_close().
> While at it, update the "/bdrv-drain/graph-change/drain_all"
> test in test-bdrv-drain so that it can catch the issue.
> 
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  block.c                 |    9 +++++++++
>  block/io.c              |   13 +++++++++++++
>  include/block/block.h   |    6 ++++++
>  tests/test-bdrv-drain.c |    1 +
>  4 files changed, 29 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 430edf79bb10..ddcb36dd48a8 100644
> --- a/block.c
> +++ b/block.c
> @@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
>      }
>      QLIST_INIT(&bs->aio_notifiers);
>      bdrv_drained_end(bs);
> +
> +    /*
> +     * If we're still inside some bdrv_drain_all_begin()/end() sections, end
> +     * them now since this BDS won't exist anymore when bdrv_drain_all_end()
> +     * gets called.
> +     */
> +    if (bs->quiesce_counter) {
> +        bdrv_drained_end_quiesce(bs);
> +    }
>  }
>  
>  void bdrv_close_all(void)
> diff --git a/block/io.c b/block/io.c
> index 54f0968aee27..8a0da06bbb14 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
>      }
>  }
>  
> +void bdrv_drained_end_quiesce(BlockDriverState *bs)

I think the name should make clear that this is meant as a counterpart
for bdrv_drain_all_begin(), so maybe bdrv_drain_all_end_quiesce()?

(The function is not suitable for any other kinds of drain because the
parameters it passes to bdrv_do_drained_end() are only the same as for
bdrv_drain_all_begin().)

> +{
> +    int drained_end_counter = 0;
> +
> +    g_assert_cmpint(bs->quiesce_counter, >, 0);
> +    g_assert_cmpint(bs->refcnt, ==, 0);

By the way, I didn't know about the problem with these macros either.

> +    while (bs->quiesce_counter) {
> +        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
> +    }
> +    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
> +}
> +
>  void bdrv_drain_all_end(void)
>  {
>      BlockDriverState *bs = NULL;
> diff --git a/include/block/block.h b/include/block/block.h
> index d16c401cb44e..c0ce6e690081 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
>   */
>  void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
>  
> +/**
> + * End all quiescent sections started by bdrv_drain_all_begin(). This is
> + * only needed when deleting a BDS before bdrv_drain_all_end() is called.
> + */
> +void bdrv_drained_end_quiesce(BlockDriverState *bs);
> +
>  /**
>   * End a quiescent section started by bdrv_subtree_drained_begin().
>   */
> diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
> index 1595bbc92e9e..8a29e33e004a 100644
> --- a/tests/test-bdrv-drain.c
> +++ b/tests/test-bdrv-drain.c
> @@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
>  
>      g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
>      g_assert_cmpint(b_s->drain_count, ==, 0);
> +    g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);

But here in the test case we should keep g_assert_cmpint() because it
gives better error messages when it fails (and checkpatch doesn't warn
about it in tests).

Apart from the naming and checkpatch, this looks good to me.

Kevin
Greg Kurz Oct. 23, 2020, 2:45 p.m. UTC | #4
On Fri, 23 Oct 2020 16:18:05 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 23.10.2020 um 12:41 hat Greg Kurz geschrieben:
> > If a BDS gets deleted during blk_drain_all(), it might miss a
> > call to bdrv_do_drained_end(). This means missing a call to
> > aio_enable_external() and the AIO context remains disabled for
> > ever. This can cause a device to become irresponsive and to
> > disrupt the guest execution, ie. hang, loop forever or worse.
> > 
> > This scenario is quite easy to encounter with virtio-scsi
> > on POWER when punching multiple blockdev-create QMP commands
> > while the guest is booting and it is still running the SLOF
> > firmware. This happens because SLOF disables/re-enables PCI
> > devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
> > register after the initial probe/feature negotiation, as it
> > tends to work with a single device at a time at various stages
> > like probing and running block/network bootloaders without
> > doing a full reset in-between. This naturally generates many
> > dataplane stops and starts, and thus many drain sections that
> > can race with blockdev_create_run(). In the end, SLOF bails
> > out.
> > 
> > It is somehow reproducible on x86 but it requires to generate
> > articial dataplane start/stop activity with stop/cont QMP
> > commands. In this case, seabios ends up looping for ever,
> > waiting for the virtio-scsi device to send a response to
> > a command it never received.
> > 
> > Add a helper that pairs all previously called bdrv_do_drained_begin()
> > with a bdrv_do_drained_end() and call it from bdrv_close().
> > While at it, update the "/bdrv-drain/graph-change/drain_all"
> > test in test-bdrv-drain so that it can catch the issue.
> > 
> > BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  block.c                 |    9 +++++++++
> >  block/io.c              |   13 +++++++++++++
> >  include/block/block.h   |    6 ++++++
> >  tests/test-bdrv-drain.c |    1 +
> >  4 files changed, 29 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index 430edf79bb10..ddcb36dd48a8 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
> >      }
> >      QLIST_INIT(&bs->aio_notifiers);
> >      bdrv_drained_end(bs);
> > +
> > +    /*
> > +     * If we're still inside some bdrv_drain_all_begin()/end() sections, end
> > +     * them now since this BDS won't exist anymore when bdrv_drain_all_end()
> > +     * gets called.
> > +     */
> > +    if (bs->quiesce_counter) {
> > +        bdrv_drained_end_quiesce(bs);
> > +    }
> >  }
> >  
> >  void bdrv_close_all(void)
> > diff --git a/block/io.c b/block/io.c
> > index 54f0968aee27..8a0da06bbb14 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
> >      }
> >  }
> >  
> > +void bdrv_drained_end_quiesce(BlockDriverState *bs)
> 
> I think the name should make clear that this is meant as a counterpart
> for bdrv_drain_all_begin(), so maybe bdrv_drain_all_end_quiesce()?
> 

Sure.

> (The function is not suitable for any other kinds of drain because the
> parameters it passes to bdrv_do_drained_end() are only the same as for
> bdrv_drain_all_begin().)
> 
> > +{
> > +    int drained_end_counter = 0;
> > +
> > +    g_assert_cmpint(bs->quiesce_counter, >, 0);
> > +    g_assert_cmpint(bs->refcnt, ==, 0);
> 
> By the way, I didn't know about the problem with these macros either.
> 
> > +    while (bs->quiesce_counter) {
> > +        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
> > +    }
> > +    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
> > +}
> > +
> >  void bdrv_drain_all_end(void)
> >  {
> >      BlockDriverState *bs = NULL;
> > diff --git a/include/block/block.h b/include/block/block.h
> > index d16c401cb44e..c0ce6e690081 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
> >   */
> >  void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
> >  
> > +/**
> > + * End all quiescent sections started by bdrv_drain_all_begin(). This is
> > + * only needed when deleting a BDS before bdrv_drain_all_end() is called.
> > + */
> > +void bdrv_drained_end_quiesce(BlockDriverState *bs);
> > +
> >  /**
> >   * End a quiescent section started by bdrv_subtree_drained_begin().
> >   */
> > diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
> > index 1595bbc92e9e..8a29e33e004a 100644
> > --- a/tests/test-bdrv-drain.c
> > +++ b/tests/test-bdrv-drain.c
> > @@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
> >  
> >      g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
> >      g_assert_cmpint(b_s->drain_count, ==, 0);
> > +    g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
> 
> But here in the test case we should keep g_assert_cmpint() because it
> gives better error messages when it fails (and checkpatch doesn't warn
> about it in tests).
> 

Sure, it makes sense to keep them in tests. The limitation only applies
to the core code.

> Apart from the naming and checkpatch, this looks good to me.
> 

Cool, I'll send a v2 shortly.

> Kevin
>
diff mbox series

Patch

diff --git a/block.c b/block.c
index 430edf79bb10..ddcb36dd48a8 100644
--- a/block.c
+++ b/block.c
@@ -4458,6 +4458,15 @@  static void bdrv_close(BlockDriverState *bs)
     }
     QLIST_INIT(&bs->aio_notifiers);
     bdrv_drained_end(bs);
+
+    /*
+     * If we're still inside some bdrv_drain_all_begin()/end() sections, end
+     * them now since this BDS won't exist anymore when bdrv_drain_all_end()
+     * gets called.
+     */
+    if (bs->quiesce_counter) {
+        bdrv_drained_end_quiesce(bs);
+    }
 }
 
 void bdrv_close_all(void)
diff --git a/block/io.c b/block/io.c
index 54f0968aee27..8a0da06bbb14 100644
--- a/block/io.c
+++ b/block/io.c
@@ -633,6 +633,19 @@  void bdrv_drain_all_begin(void)
     }
 }
 
+void bdrv_drained_end_quiesce(BlockDriverState *bs)
+{
+    int drained_end_counter = 0;
+
+    g_assert_cmpint(bs->quiesce_counter, >, 0);
+    g_assert_cmpint(bs->refcnt, ==, 0);
+
+    while (bs->quiesce_counter) {
+        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+    }
+    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
+}
+
 void bdrv_drain_all_end(void)
 {
     BlockDriverState *bs = NULL;
diff --git a/include/block/block.h b/include/block/block.h
index d16c401cb44e..c0ce6e690081 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -779,6 +779,12 @@  void bdrv_drained_end(BlockDriverState *bs);
  */
 void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
 
+/**
+ * End all quiescent sections started by bdrv_drain_all_begin(). This is
+ * only needed when deleting a BDS before bdrv_drain_all_end() is called.
+ */
+void bdrv_drained_end_quiesce(BlockDriverState *bs);
+
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1595bbc92e9e..8a29e33e004a 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -594,6 +594,7 @@  static void test_graph_change_drain_all(void)
 
     g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
     g_assert_cmpint(b_s->drain_count, ==, 0);
+    g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
 
     bdrv_unref(bs_b);
     blk_unref(blk_b);