Message ID | 20171110171050.19836-1-julien.grall@linaro.org |
---|---|
State | Accepted |
Commit | 0de212b03066571523f3174535bb4fb1264ca1de |
Series | [Xen-devel,for-4.10] libs/evtchn: Remove active handler on clean-up or failure |
On 11/10/2017 05:10 PM, Julien Grall wrote:
> Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
> Implement for libxenevtchn" added a call to register allowing to
> restrict the event channel.
>
> However, the call to deregister the handler was not performed if open
> failed or when closing the event channel. This will result to corrupt
> the list of handlers and potentially crash the application later one.
>
> Fix it by calling xentoolcore_deregister_active_handle on failure and
> closure.

Thanks for fixing this.

>
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>
> ---
>
> This patch is fixing a bug introduced after the code freeze by
> "xentoolcore_restrict_all: Implement for libxenevtchn".
>
> The call to xentoolcore_deregister_active_handle is done at the same
> place as for the grants. But I am not convinced this is thread safe as
> there are potential race between close the event channel and restict
> handler. Do we care about that?

Both xentoolcore__deregister_active_handle() and
xentoolcore_restrict_all() hold the same lock when mutating the list so
there shouldn't be a problem with the list itself.

However, I think it should call xentoolcore__deregister_active_handle()
_before_ calling osdep_evtchn_close() to avoid trying to restrict a
closed fd or some other fd that happens to have the same number.

I think all the other libs need to be fixed as well, unless there was a
reason it was done this way.
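The ordering suggested here (unhook the handle from the shared list first, close the fd second) can be illustrated with a small stand-alone sketch. This is not the xentoolcore implementation: every `demo_*` name is hypothetical, the list is a plain singly linked list, and "restricting" a handle is reduced to setting a flag. It assumes only POSIX `open`/`close` and pthread mutexes.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a library handle (not the real Xen types). */
struct demo_handle {
    int fd;
    int restricted;
    struct demo_handle *next;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_handle *demo_active;  /* head of the active-handle list */

static void demo_register(struct demo_handle *h)
{
    pthread_mutex_lock(&demo_lock);
    h->next = demo_active;
    demo_active = h;
    pthread_mutex_unlock(&demo_lock);
}

static void demo_deregister(struct demo_handle *h)
{
    pthread_mutex_lock(&demo_lock);
    for (struct demo_handle **p = &demo_active; *p; p = &(*p)->next) {
        if (*p == h) {
            *p = h->next;  /* unlink, so later walks never see h */
            break;
        }
    }
    pthread_mutex_unlock(&demo_lock);
}

/* Visits every registered handle under the lock; returns how many it saw. */
static int demo_restrict_all(void)
{
    int n = 0;
    pthread_mutex_lock(&demo_lock);
    for (struct demo_handle *h = demo_active; h; h = h->next) {
        h->restricted = 1;
        n++;
    }
    pthread_mutex_unlock(&demo_lock);
    return n;
}

static int demo_open(struct demo_handle *h)
{
    h->fd = -1;
    h->restricted = 0;
    demo_register(h);                   /* register first ... */
    h->fd = open("/dev/null", O_RDWR);  /* ... only then open the fd */
    if (h->fd < 0) {
        demo_deregister(h);             /* error exit: do not leave h listed */
        return -1;
    }
    return 0;
}

static int demo_close(struct demo_handle *h)
{
    demo_deregister(h);                 /* FIRST unhook from the list */
    int rc = close(h->fd);              /* THEN close the fd */
    h->fd = -1;
    return rc;
}

/* Small self-check exercising the open / restrict / close sequence. */
static void demo_selfcheck(void)
{
    struct demo_handle h;
    assert(demo_open(&h) == 0);
    assert(demo_restrict_all() == 1 && h.restricted == 1);
    assert(demo_close(&h) == 0);
    assert(demo_restrict_all() == 0);   /* closed handle is no longer visited */
}
```

Calling `demo_selfcheck()` from a `main` exercises the whole sequence. With the opposite order in `demo_close`, a concurrent `demo_restrict_all` could run between `close` and the unlink and act on a descriptor number that has already been reused for something else.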
Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
> On 11/10/2017 05:10 PM, Julien Grall wrote:
> > Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
> > Implement for libxenevtchn" added a call to register allowing to
> > restrict the event channel.
> >
> > However, the call to deregister the handler was not performed if open
> > failed or when closing the event channel. This will result to corrupt
> > the list of handlers and potentially crash the application later one.

Sorry for not spotting this during review.
The fix is correct as far as it goes, so:

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

> > The call to xentoolcore_deregister_active_handle is done at the same
> > place as for the grants. But I am not convinced this is thread safe as
> > there are potential race between close the event channel and restict
> > handler. Do we care about that?
...
> However, I think it should call xentoolcore__deregister_active_handle()
> _before_ calling osdep_evtchn_close() to avoid trying to restrict a
> closed fd or some other fd that happens to have the same number.

You are right. But this slightly weakens the guarantee provided by
xentoolcore_restrict_all.

> I think all the other libs need to be fixed as well, unless there was a
> reason it was done this way.

I will send a further patch. In the meantime I suggest we apply
Julien's fix.

Ian.
On 11/14/2017 11:51 AM, Ian Jackson wrote:
> Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
>> On 11/10/2017 05:10 PM, Julien Grall wrote:
>>> Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
>>> Implement for libxenevtchn" added a call to register allowing to
>>> restrict the event channel.
>>>
>>> However, the call to deregister the handler was not performed if open
>>> failed or when closing the event channel. This will result to corrupt
>>> the list of handlers and potentially crash the application later one.
>
> Sorry for not spotting this during review.
> The fix is correct as far as it goes, so:
>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
>>> The call to xentoolcore_deregister_active_handle is done at the same
>>> place as for the grants. But I am not convinced this is thread safe as
>>> there are potential race between close the event channel and restict
>>> handler. Do we care about that?
> ...
>> However, I think it should call xentoolcore__deregister_active_handle()
>> _before_ calling osdep_evtchn_close() to avoid trying to restrict a
>> closed fd or some other fd that happens to have the same number.
>
> You are right. But this slightly weakens the guarantee provided by
> xentoolcore_restrict_all.
>

Now that I look at it, a similar scenario can happen during open. Since
the handle is registered before it is actually opened, a concurrent
xentoolcore_restrict_all() will try to restrict a handle that is not
properly set up.

I think it is OK if xentoolcore_restrict_all() works with any open
handle, where a handle is defined as open if it has _completed_ the call
to e.g. xenevtchn_open() and has not yet called xenevtchn_close().
Hi,

On 14/11/17 11:51, Ian Jackson wrote:
> Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
>> On 11/10/2017 05:10 PM, Julien Grall wrote:
>>> Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
>>> Implement for libxenevtchn" added a call to register allowing to
>>> restrict the event channel.
>>>
>>> However, the call to deregister the handler was not performed if open
>>> failed or when closing the event channel. This will result to corrupt
>>> the list of handlers and potentially crash the application later one.
>
> Sorry for not spotting this during review.
> The fix is correct as far as it goes, so:
>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
>>> The call to xentoolcore_deregister_active_handle is done at the same
>>> place as for the grants. But I am not convinced this is thread safe as
>>> there are potential race between close the event channel and restict
>>> handler. Do we care about that?
> ...
>> However, I think it should call xentoolcore__deregister_active_handle()
>> _before_ calling osdep_evtchn_close() to avoid trying to restrict a
>> closed fd or some other fd that happens to have the same number.
>
> You are right. But this slightly weakens the guarantee provided by
> xentoolcore_restrict_all.
>
>> I think all the other libs need to be fixed as well, unless there was a
>> reason it was done this way.
>
> I will send a further patch. In the meantime I suggest we apply
> Julien's fix.

I am going to leave the decision to you and Wei. It feels a bit odd to
release-ack my patch :).

Cheers,
Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
> Now that I look at it, a similar scenario can happen during open. Since
> the handle is registered before it is actually opened, a concurrent
> xentoolcore_restrict_all() will try to restrict a handle that it not
> properly set up.

I think this is not a problem because the handle has thing->fd = -1.
So the restrict call will be a no-op (or give EBADF).

Ian.
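The EBADF behaviour mentioned above is easy to confirm in isolation. The sketch below is only an illustration: `demo_probe_fd` is a hypothetical helper, and `fcntl(fd, F_GETFD)` stands in for whatever operation a real restrict callback would issue on the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: returns 0 if fd names an open descriptor,
 * otherwise the errno value reported by the kernel.
 */
static int demo_probe_fd(int fd)
{
    if (fcntl(fd, F_GETFD) == -1)
        return errno;
    return 0;
}

/* A handle that is registered but not yet opened still has fd == -1. */
static void demo_selfcheck(void)
{
    assert(demo_probe_fd(-1) == EBADF);
}
```

Because -1 can never be a valid descriptor, an operation attempted on a registered-but-unopened handle fails cleanly and cannot touch an unrelated file.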
On Tue, Nov 14, 2017 at 12:14:14PM +0000, Julien Grall wrote:
> Hi,
>
> On 14/11/17 11:51, Ian Jackson wrote:
> > Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
> > > On 11/10/2017 05:10 PM, Julien Grall wrote:
> > > > Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
> > > > Implement for libxenevtchn" added a call to register allowing to
> > > > restrict the event channel.
> > > >
> > > > However, the call to deregister the handler was not performed if open
> > > > failed or when closing the event channel. This will result to corrupt
> > > > the list of handlers and potentially crash the application later one.
> >
> > Sorry for not spotting this during review.
> > The fix is correct as far as it goes, so:
> >
> > Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
> >
> > > > The call to xentoolcore_deregister_active_handle is done at the same
> > > > place as for the grants. But I am not convinced this is thread safe as
> > > > there are potential race between close the event channel and restict
> > > > handler. Do we care about that?
> > ...
> > > However, I think it should call xentoolcore__deregister_active_handle()
> > > _before_ calling osdep_evtchn_close() to avoid trying to restrict a
> > > closed fd or some other fd that happens to have the same number.
> >
> > You are right. But this slightly weakens the guarantee provided by
> > xentoolcore_restrict_all.
> >
> > > I think all the other libs need to be fixed as well, unless there was a
> > > reason it was done this way.
> >
> > I will send a further patch. In the meantime I suggest we apply
> > Julien's fix.
>
> I am going to leave the decision to you and Wei. It feels a bit odd to
> release-ack my patch :).

We can only commit patches that are both acked and release-acked. The
latter gives RM control over when the patch should be applied.
Sometimes it is better to wait until something else happens (like
getting the tree to a stable state).

That's how I used release-ack anyway.

For this particular patch, my interpretation of what you just said
is you've given us release-ack and we can apply this patch anytime. I
will commit it soon.
On Tue, Nov 14, 2017 at 12:15:42PM +0000, Ian Jackson wrote:
> Closing the fd before unhooking it from the list runs the risk that a
> concurrent thread calls xentoolcore_restrict_all will operate on the
> old fd value, which might refer to a new fd by then. So we need to do
> it in the other order.
>
> Sadly this weakens the guarantee provided by xentoolcore_restrict_all
> slight, but not (I think) in a problematic way. It would be possible

slightly

> to implement the previous guarantee, but it would involve replacing
> all of the close() calls in all of the individual osdep parts of all
> of the individual libraries with calls to a new function which does
>    dup2("/dev/null", thing->fd);
>    pthread_mutex_lock(&handles_lock);
>    thing->fd = -1;
>    pthread_mutex_unlock(&handles_lock);
>    close(fd);
> which would be terribly tedious.
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>
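For illustration only, the alternative described in the quoted commit message could look roughly like the sketch below. The quoted fragment is shorthand: `dup2()` takes two descriptors rather than a path, so this version opens `/dev/null` first. All `demo_*` names are hypothetical, and nothing like this was actually proposed for merging; the patch instead reorders deregistration and close.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a library handle that owns one fd. */
struct demo_thing {
    int fd;
};

/* Hypothetical stand-in for the lock protecting the handle list. */
static pthread_mutex_t demo_handles_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Close thing->fd so that a concurrent walker of the handle list only
 * ever sees either a descriptor pointing at /dev/null or fd == -1.
 * Returns 0 on success, -1 on failure.
 */
static int demo_close_keeping_guarantee(struct demo_thing *thing)
{
    int fd = thing->fd;
    int rc = 0;

    int nullfd = open("/dev/null", O_RDWR);
    if (nullfd < 0)
        return -1;

    /* Atomically repoint the descriptor number at /dev/null. */
    if (dup2(nullfd, fd) < 0)
        rc = -1;
    close(nullfd);

    /* Publish "no fd" under the same lock the list walker takes. */
    pthread_mutex_lock(&demo_handles_lock);
    thing->fd = -1;
    pthread_mutex_unlock(&demo_handles_lock);

    /* Only now release the descriptor number for reuse. */
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Small self-check: the handle ends up with fd == -1 and the old fd closed. */
static void demo_selfcheck(void)
{
    struct demo_thing t;
    t.fd = open("/dev/null", O_RDONLY);
    assert(t.fd >= 0);
    int old = t.fd;
    assert(demo_close_keeping_guarantee(&t) == 0);
    assert(t.fd == -1);
    assert(fcntl(old, F_GETFD) == -1);
}
```

The tedium the commit message refers to is that every `close()` in every osdep backend of every library would have to be routed through such a helper.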
Hi,

On 14/11/17 14:02, Wei Liu wrote:
> On Tue, Nov 14, 2017 at 12:15:42PM +0000, Ian Jackson wrote:
>> Closing the fd before unhooking it from the list runs the risk that a
>> concurrent thread calls xentoolcore_restrict_all will operate on the
>> old fd value, which might refer to a new fd by then. So we need to do
>> it in the other order.
>>
>> Sadly this weakens the guarantee provided by xentoolcore_restrict_all
>> slight, but not (I think) in a problematic way. It would be possible
>
> slightly
>
>> to implement the previous guarantee, but it would involve replacing
>> all of the close() calls in all of the individual osdep parts of all
>> of the individual libraries with calls to a new function which does
>>    dup2("/dev/null", thing->fd);
>>    pthread_mutex_lock(&handles_lock);
>>    thing->fd = -1;
>>    pthread_mutex_unlock(&handles_lock);
>>    close(fd);
>> which would be terribly tedious.
>>
>> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

I think this is 4.10 material, xentoolcore was introduced in this
release and it would be good to have it right from now.

I want to confirm that you are both happy with that?

Cheers,
Hi Wei,

On 14/11/17 13:53, Wei Liu wrote:
> On Tue, Nov 14, 2017 at 12:14:14PM +0000, Julien Grall wrote:
>> Hi,
>>
>> On 14/11/17 11:51, Ian Jackson wrote:
>>> Ross Lagerwall writes ("Re: [PATCH for-4.10] libs/evtchn: Remove active handler on clean-up or failure"):
>>>> On 11/10/2017 05:10 PM, Julien Grall wrote:
>>>>> Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
>>>>> Implement for libxenevtchn" added a call to register allowing to
>>>>> restrict the event channel.
>>>>>
>>>>> However, the call to deregister the handler was not performed if open
>>>>> failed or when closing the event channel. This will result to corrupt
>>>>> the list of handlers and potentially crash the application later one.
>>>
>>> Sorry for not spotting this during review.
>>> The fix is correct as far as it goes, so:
>>>
>>> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>>>
>>>>> The call to xentoolcore_deregister_active_handle is done at the same
>>>>> place as for the grants. But I am not convinced this is thread safe as
>>>>> there are potential race between close the event channel and restict
>>>>> handler. Do we care about that?
>>> ...
>>>> However, I think it should call xentoolcore__deregister_active_handle()
>>>> _before_ calling osdep_evtchn_close() to avoid trying to restrict a
>>>> closed fd or some other fd that happens to have the same number.
>>>
>>> You are right. But this slightly weakens the guarantee provided by
>>> xentoolcore_restrict_all.
>>>
>>>> I think all the other libs need to be fixed as well, unless there was a
>>>> reason it was done this way.
>>>
>>> I will send a further patch. In the meantime I suggest we apply
>>> Julien's fix.
>>
>> I am going to leave the decision to you and Wei. It feels a bit odd to
>> release-ack my patch :).
>
> We can only commit patches that are both acked and release-acked. The
> latter gives RM control over when the patch should be applied.
> Sometimes it is better to wait until something else happens (like
> getting the tree to a stable state).
>
> That's how I used release-ack anyway.

It feels a bit odd to release-ack my own patch. For Arm patches I
usually defer to Stefano the decision on whether the patch is suitable
for the release.

>
> For this particular patch, my interpretation of what you just said
> is you've given us release-ack and we can apply this patch anytime. I
> will commit it soon.

Thanks! I hope it will fix some osstest failures.

Cheers,
On 11/14/2017 12:15 PM, Ian Jackson wrote:
> Closing the fd before unhooking it from the list runs the risk that a
> concurrent thread calls xentoolcore_restrict_all will operate on the
> old fd value, which might refer to a new fd by then. So we need to do
> it in the other order.
>
> Sadly this weakens the guarantee provided by xentoolcore_restrict_all
> slight, but not (I think) in a problematic way. It would be possible
> to implement the previous guarantee, but it would involve replacing
> all of the close() calls in all of the individual osdep parts of all
> of the individual libraries with calls to a new function which does
>    dup2("/dev/null", thing->fd);
>    pthread_mutex_lock(&handles_lock);
>    thing->fd = -1;
>    pthread_mutex_unlock(&handles_lock);
>    close(fd);
> which would be terribly tedious.
> ...
> diff --git a/tools/libs/toolcore/include/xentoolcore.h b/tools/libs/toolcore/include/xentoolcore.h
> index 8d28c2d..b3a3c93 100644
> --- a/tools/libs/toolcore/include/xentoolcore.h
> +++ b/tools/libs/toolcore/include/xentoolcore.h
> @@ -39,6 +39,15 @@
>   *     fail (even though such a call is potentially meaningful).
>   *     (If called again with a different domid, it will necessarily fail.)
>   *
> + * Note for multi-threaded programs: If xentoolcore_restrict_all is
> + * called concurrently with a function which /or closes Xen library

"which /or closes..." - Is this a typo?

> + * handles (e.g. libxl_ctx_free, xs_close), the restriction is only
> + * guaranteed to be effective after all of the closing functions have
> + * returned, even if that is later than the return from
> + * xentoolcore_restrict_all. (Of course if xentoolcore_restrict_all
> + * it is called concurrently with opening functions, the new handles
> + * might or might not be restricted.)
> + *
>   * ====================================================================
>   * IMPORTANT - IMPLEMENTATION STATUS
>   *
> diff --git a/tools/libs/toolcore/include/xentoolcore_internal.h b/tools/libs/toolcore/include/xentoolcore_internal.h
> index dbdb1dd..04f5848 100644
> --- a/tools/libs/toolcore/include/xentoolcore_internal.h
> +++ b/tools/libs/toolcore/include/xentoolcore_internal.h
> @@ -48,8 +48,10 @@
>   *     4. ONLY THEN actually open the relevant fd or whatever
>   *
>   * III. during the "close handle" function
> - *     1. FIRST close the relevant fd or whatever
> - *     2. call xentoolcore__deregister_active_handle
> + *     1. FIRST call xentoolcore__deregister_active_handle
> + *     2. close the relevant fd or whatever
> + *
> + * [ III(b). Do the same as III for error exit from the open function. ]
>   *
>   * IV. in the restrict_callback function
>   *     * Arrange that the fd (or other handle) can no longer by used
> diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
> index 23f3f09..abffd9c 100644
> --- a/tools/xenstore/xs.c
> +++ b/tools/xenstore/xs.c
> @@ -279,9 +279,9 @@ err:
>       saved_errno = errno;
>
>       if (h) {
> +             xentoolcore__deregister_active_handle(&h->tc_ah);
>               if (h->fd >= 0)
>                       close(h->fd);
> -             xentoolcore__deregister_active_handle(&h->tc_ah);
>       }
>       free(h);
>
> @@ -342,8 +342,8 @@ static void close_fds_free(struct xs_handle *h) {
>               close(h->watch_pipe[1]);
>       }
>
> -        close(h->fd);
>       xentoolcore__deregister_active_handle(&h->tc_ah);
> +        close(h->fd);
>

Since the rest of this file uses tabs, you may as well use tabs for this
line as well.

Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Julien Grall writes ("Re: [PATCH] tools: xentoolcore_restrict_all: Do deregistration before close"):
> I think this is 4.10 material, xentoolcore was introduced in this
> release and it would be good to have it right from now. I want to
> confirm that you are both happy with that?

Yes, absolutely. Sorry, I forgot the for-4.10 tag in the Subject.

Ian.
Ross Lagerwall writes ("Re: [PATCH] tools: xentoolcore_restrict_all: Do deregistration before close"):
> On 11/14/2017 12:15 PM, Ian Jackson wrote:
> > + * Note for multi-threaded programs: If xentoolcore_restrict_all is
> > + * called concurrently with a function which /or closes Xen library
>
> "which /or closes..." - Is this a typo?

Yes, fixed, thanks.

> > -        close(h->fd);
> >       xentoolcore__deregister_active_handle(&h->tc_ah);
> > +        close(h->fd);
> >
>
> Since the rest of this file uses tabs, you may as well use tabs for this
> line as well.

I didn't change the use of tabs vs. the use of spaces.

> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Thanks,
Ian.
Hi Ian,

On 14/11/17 14:57, Ian Jackson wrote:
> Julien Grall writes ("Re: [PATCH] tools: xentoolcore_restrict_all: Do deregistration before close"):
>> I think this is 4.10 material, xentoolcore was introduced in this
>> release and it would be good to have it right from now. I want to
>> confirm that you are both happy with that?
>
> Yes, absolutely. Sorry, I forgot the for-4.10 tag in the Subject.

Release-acked-by: Julien Grall <julien.grall@linaro.org>

Cheers,
diff --git a/tools/libs/evtchn/core.c b/tools/libs/evtchn/core.c
index 14b7549a6b..2dba58bf00 100644
--- a/tools/libs/evtchn/core.c
+++ b/tools/libs/evtchn/core.c
@@ -56,6 +56,7 @@ xenevtchn_handle *xenevtchn_open(xentoollog_logger *logger, unsigned open_flags)
 
 err:
     osdep_evtchn_close(xce);
+    xentoolcore__deregister_active_handle(&xce->tc_ah);
     xtl_logger_destroy(xce->logger_tofree);
     free(xce);
     return NULL;
@@ -69,6 +70,7 @@ int xenevtchn_close(xenevtchn_handle *xce)
         return 0;
 
     rc = osdep_evtchn_close(xce);
+    xentoolcore__deregister_active_handle(&xce->tc_ah);
     xtl_logger_destroy(xce->logger_tofree);
     free(xce);
     return rc;
Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
Implement for libxenevtchn" added a call to register the handle so that
the event channel can be restricted.

However, the call to deregister the handle was not performed if open
failed or when closing the event channel. This will corrupt the list of
handles and potentially crash the application later on.

Fix it by calling xentoolcore__deregister_active_handle on failure and
closure.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---

This patch is fixing a bug introduced after the code freeze by
"xentoolcore_restrict_all: Implement for libxenevtchn".

The call to xentoolcore__deregister_active_handle is done at the same
place as for the grants. But I am not convinced this is thread safe, as
there is a potential race between closing the event channel and the
restrict handler. Do we care about that?
---
 tools/libs/evtchn/core.c | 2 ++
 1 file changed, 2 insertions(+)