Message ID | 1350570527-24187-1-git-send-email-peter.maydell@linaro.org |
---|---|
State | Superseded |
Headers | show |
Il 18/10/2012 16:28, Peter Maydell ha scritto: > The function vnc_stop_worker_thread() is buggy, beacuse it tries to > delete jobs from the worker thread's queue but the worker thread itself > will not cope with this happening (it would end up trying to remove > an already-removed list item from its queue list). Fortunately > nobody ever calls vnc_stop_worker_thread(), so we can fix this by > simply deleting all the untested racy code. Note that there is just one queue. The queue global == the arg argument of vnc_worker_thread == the queue argument of vnc_worker_thread_loop. So I'm not sure I follow your reasoning. So the bug may be that we never call vnc_stop_worker_thread from vnc_disconnect_finish. BTW vnc_jobs_join is called there so we could just assert that the queue is empty... Paolo > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > --- > Seems the easiest way to deal with this bug spotted via code > inspection :-)
On 18 October 2012 16:01, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 18/10/2012 16:28, Peter Maydell ha scritto: >> The function vnc_stop_worker_thread() is buggy, beacuse it tries to >> delete jobs from the worker thread's queue but the worker thread itself >> will not cope with this happening (it would end up trying to remove >> an already-removed list item from its queue list). Fortunately >> nobody ever calls vnc_stop_worker_thread(), so we can fix this by >> simply deleting all the untested racy code. > > Note that there is just one queue. The queue global == the arg argument > of vnc_worker_thread == the queue argument of vnc_worker_thread_loop. > So I'm not sure I follow your reasoning. vnc_worker_thread_loop does this: lock queue get pointer to first job in queue unlock queue do stuff with job lock queue QTAILQ_REMOVE(&queue->jobs, job, next) unlock queue g_free(job) vnc_jobs_clear does this: lock queue QTAILQ_REMOVE each job from the queue unlock queue So two problems: (1) any job removed by vnc_jobs_clear is never g_free()d (2) if vnc_jobs_clear removes job X from the queue while the worker loop is in its "do stuff with job" phase, then the worker loop will subsequently try to QTAILQ_REMOVE an object from a list it is not in, which will probably crash or otherwise misbehave Here's a third which I just noticed: (3) vnc_stop_worker thread sets queue->exit to true, and then does a number of things with 'queue'. However, as soon as you've set queue->exit you can't touch 'queue' again, because the worker thread might (if you were unlucky) be just about to do the 'if (queue->exit) { return -1; }', which will cause it to destroy the condvar and muteux and free the queue memory. In particular, even doing the 'vnc_unlock_queue()' in vnc_stop_worker_thread() is unsafe, because the worker thread does its exit-check without the lock held, so it could destroy the mutex before you unlocked it. > So the bug may be that we never call vnc_stop_worker_thread from > vnc_disconnect_finish. BTW vnc_jobs_join is called there so we could > just assert that the queue is empty... Yes, actually trying to find somewhere to make a clean shutdown of the worker thread and fixing all the bugs in the shutdown code would be the other approach. That seems like hard work to me :-) -- PMM
Il 18/10/2012 17:12, Peter Maydell ha scritto: > On 18 October 2012 16:01, Paolo Bonzini <pbonzini@redhat.com> wrote: >> Il 18/10/2012 16:28, Peter Maydell ha scritto: >>> The function vnc_stop_worker_thread() is buggy, beacuse it tries to >>> delete jobs from the worker thread's queue but the worker thread itself >>> will not cope with this happening (it would end up trying to remove >>> an already-removed list item from its queue list). Fortunately >>> nobody ever calls vnc_stop_worker_thread(), so we can fix this by >>> simply deleting all the untested racy code. >> >> Note that there is just one queue. The queue global == the arg argument >> of vnc_worker_thread == the queue argument of vnc_worker_thread_loop. >> So I'm not sure I follow your reasoning. > > vnc_worker_thread_loop does this: > lock queue > get pointer to first job in queue > unlock queue > do stuff with job > lock queue > QTAILQ_REMOVE(&queue->jobs, job, next) > unlock queue Uh, right... the QTAILQ_REMOVE looks completely misplaced, otoh vnc_has_job relies on that. > g_free(job) > > vnc_jobs_clear does this: > lock queue > QTAILQ_REMOVE each job from the queue > unlock queue > > So two problems: > (1) any job removed by vnc_jobs_clear is never g_free()d > (2) if vnc_jobs_clear removes job X from the queue while the worker loop > is in its "do stuff with job" phase, then the worker loop will > subsequently try to QTAILQ_REMOVE an object from a list it is not > in, which will probably crash or otherwise misbehave ... but vnc_jobs_clear should be dead code because vnc_jobs_join is called first. So the "right fix" would include anyway the half of your patch that deletes vnc_jobs_clear, and would revert the other half... > Here's a third which I just noticed: > (3) vnc_stop_worker thread sets queue->exit to true, and then does > a number of things with 'queue'. However, as soon as you've set > queue->exit you can't touch 'queue' again, because the worker > thread might (if you were unlucky) be just about to do the > 'if (queue->exit) { return -1; }', which will cause it to destroy > the condvar and muteux and free the queue memory. In particular, > even doing the 'vnc_unlock_queue()' in vnc_stop_worker_thread() > is unsafe, because the worker thread does its exit-check without > the lock held, so it could destroy the mutex before you unlocked it. Right, the solution for this is to move the destruction to the caller, because now we have joinable QemuThreads and those are a better way to synchronize. Paolo >> So the bug may be that we never call vnc_stop_worker_thread from >> vnc_disconnect_finish. BTW vnc_jobs_join is called there so we could >> just assert that the queue is empty... > > Yes, actually trying to find somewhere to make a clean shutdown > of the worker thread and fixing all the bugs in the shutdown > code would be the other approach. That seems like hard work to me :-) > > -- PMM > >
Another bug: (4) if vnc_job_push() discovers that queue->exit is true it will free the job but leak its rectangle list (5) the early-exit ("goto disconnected") code paths in vnc_worker_thread_loop() also leak the rectangle list And a couple of inefficiencies/oddities which aren't actually bugs: (6) An inefficiency (unnecessary lock/unlocks): the code locks and unlocks the queue in vnc_job_new() and vnc_job_add_rect() when it is manipulating the job->rectangles list. In the former case this is definitely pointless -- we've just alloc'd the VncJob struct so it's impossible for anybody else to be trying to use the list yet. In the latter it's not necessary since the semantics are that we create a job with vnc_job_new, fill it up with rectangles and then hand it off to the queue with vnc_job_push(). So we know only one thread will be touching the job before it goes in the queue and grabbing the queue lock is unnecessary overhead. (7) vnc_has_job() is unused, which is just as well because it's pretty hard to use non-racily, since the true/false answer it returns could be made wrong as soon as it drops the queue lock. vnc_has_job_locked() is OK (as is its usage pattern in _join). -- PMM
diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c index 087b84d..c5ce2f1 100644 --- a/ui/vnc-jobs.c +++ b/ui/vnc-jobs.c @@ -136,19 +136,6 @@ bool vnc_has_job(VncState *vs) return ret; } -void vnc_jobs_clear(VncState *vs) -{ - VncJob *job, *tmp; - - vnc_lock_queue(queue); - QTAILQ_FOREACH_SAFE(job, &queue->jobs, next, tmp) { - if (job->vs == vs || !vs) { - QTAILQ_REMOVE(&queue->jobs, job, next); - } - } - vnc_unlock_queue(queue); -} - void vnc_jobs_join(VncState *vs) { vnc_lock_queue(queue); @@ -336,16 +323,3 @@ bool vnc_worker_thread_running(void) { return queue; /* Check global queue */ } - -void vnc_stop_worker_thread(void) -{ - if (!vnc_worker_thread_running()) - return ; - - /* Remove all jobs and wake up the thread */ - vnc_lock_queue(queue); - queue->exit = true; - vnc_unlock_queue(queue); - vnc_jobs_clear(NULL); - qemu_cond_broadcast(&queue->cond); -} diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h index 86e6d88..b87857f 100644 --- a/ui/vnc-jobs.h +++ b/ui/vnc-jobs.h @@ -35,13 +35,11 @@ VncJob *vnc_job_new(VncState *vs); int vnc_job_add_rect(VncJob *job, int x, int y, int w, int h); void vnc_job_push(VncJob *job); bool vnc_has_job(VncState *vs); -void vnc_jobs_clear(VncState *vs); void vnc_jobs_join(VncState *vs); void vnc_jobs_consume_buffer(VncState *vs); void vnc_start_worker_thread(void); bool vnc_worker_thread_running(void); -void vnc_stop_worker_thread(void); /* Locks */ static inline int vnc_trylock_display(VncDisplay *vd)
The function vnc_stop_worker_thread() is buggy, beacuse it tries to delete jobs from the worker thread's queue but the worker thread itself will not cope with this happening (it would end up trying to remove an already-removed list item from its queue list). Fortunately nobody ever calls vnc_stop_worker_thread(), so we can fix this by simply deleting all the untested racy code. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- Seems the easiest way to deal with this bug spotted via code inspection :-) ui/vnc-jobs.c | 26 -------------------------- ui/vnc-jobs.h | 2 -- 2 files changed, 28 deletions(-)