Message ID | 20220215122316.7625-4-xiubli@redhat.com |
---|---|
State | New |
Series | ceph: fix cephfs rsync kworker high load issue |
On Thu, Feb 17, 2022 at 6:55 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2022-02-17 at 11:03 +0800, Yan, Zheng wrote:
> > On Tue, Feb 15, 2022 at 11:04 PM <xiubli@redhat.com> wrote:
> > >
> > > From: Xiubo Li <xiubli@redhat.com>
> > >
> > > No need to update the snapshot context when either of the following
> > > two cases holds:
> > > 1: my context seq matches the realm's seq and the realm has no parent;
> > > 2: my context seq equals or is larger than my parent's. This works
> > >    because rebuild_snap_realms() works _downward_ in the hierarchy
> > >    after each update.
> > >
> > > This fix avoids needlessly calling ceph_queue_cap_snap() for
> > > unrelated inodes. For example:
> > >
> > > There are 6 directories:
> > >
> > > /dir_X1/dir_X2/dir_X3/
> > > /dir_Y1/dir_Y2/dir_Y3/
> > >
> > > First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then
> > > make a root snapshot under /.snap/root_snap. From then on, every
> > > time we make a snapshot under /dir_Y1/..., the kclient will try to
> > > rebuild the snap context for the snap_X2 realm and will end up
> > > queuing cap snaps for dir_Y2 and dir_Y3, which makes no sense.
> > >
> > > That's because snap_X2's seq is 2 and root_snap's seq is 3. So when
> > > creating a new snapshot under /dir_Y1/... the new seq will be 4,
> > > and the MDS will send the kclient a snapshot backtrace ordered
> > > _downward_ in the hierarchy: seqs 4, 3. ceph_update_snap_trace()
> > > will then always rebuild from the last realm, which is root_snap.
> > > So when rebuilding the snap context it will always rebuild the
> > > snap_X2 realm as well and try to queue cap snaps for all the inodes
> > > in the snap_X2 realm, and we see logs like:
> > >
> > > "ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing"
> > >
> > > URL: https://tracker.ceph.com/issues/44100
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >  fs/ceph/snap.c | 16 +++++++++-------
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > index d075d3ce5f6d..1f24a5de81e7 100644
> > > --- a/fs/ceph/snap.c
> > > +++ b/fs/ceph/snap.c
> > > @@ -341,14 +341,16 @@ static int build_snap_context(struct ceph_snap_realm *realm,
> > >                 num += parent->cached_context->num_snaps;
> > >         }
> > >
> > > -       /* do i actually need to update?  not if my context seq
> > > -          matches realm seq, and my parents' does to.  (this works
> > > -          because we rebuild_snap_realms() works _downward_ in
> > > -          hierarchy after each update.) */
> > > +       /* do i actually need to update? No need when any of the following
> > > +        * two cases:
> > > +        * #1: if my context seq matches realm's seq and realm has no parent.
> > > +        * #2: if my context seq equals or is larger than my parent's, this
> > > +        *     works because we rebuild_snap_realms() works _downward_ in
> > > +        *     hierarchy after each update.
> > > +        */
> > >         if (realm->cached_context &&
> > > -           realm->cached_context->seq == realm->seq &&
> > > -           (!parent ||
> > > -            realm->cached_context->seq >= parent->cached_context->seq)) {
> > > +           ((realm->cached_context->seq == realm->seq && !parent) ||
> > > +            (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
> >
> > With this change, when you mksnap on /dir_Y1/, its snap context stays
> > unchanged. In ceph_update_snap_trace, resetting the 'invalidate'
> > variable for each realm should fix this issue.
> >
>
> This comment is terribly vague. "invalidate" is a local variable in that
> function and isn't set on a per-realm basis.
>
> Could you suggest a patch on top of Xiubo's patch instead?
>

something like this (not tested)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index af502a8245f0..6ef41764008b 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -704,7 +704,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
        __le64 *prior_parent_snaps;        /* encoded */
        struct ceph_snap_realm *realm = NULL;
        struct ceph_snap_realm *first_realm = NULL;
-       int invalidate = 0;
+       struct ceph_snap_realm *realm_to_inval = NULL;
+       int invalidate;
        int err = -ENOMEM;
        LIST_HEAD(dirty_realms);

@@ -712,6 +713,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,

        dout("update_snap_trace deletion=%d\n", deletion);
 more:
+       invalidate = 0;
        ceph_decode_need(&p, e, sizeof(*ri), bad);
        ri = p;
        p += sizeof(*ri);
@@ -774,8 +776,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
             realm, invalidate, p, e);

        /* invalidate when we reach the _end_ (root) of the trace */
-       if (invalidate && p >= e)
-               rebuild_snap_realms(realm, &dirty_realms);
+       if (invalidate)
+               realm_to_inval = realm;
+       if (realm_to_inval && p >= e)
+               rebuild_snap_realms(realm_to_inval, &dirty_realms);

        if (!first_realm)
                first_realm = realm;

> > >                 dout("build_snap_context %llx %p: %p seq %lld (%u snaps)"
> > >                      " (unchanged)\n",
> > >                      realm->ino, realm, realm->cached_context,
> > > --
> > > 2.27.0
> > >
> --
> Jeff Layton <jlayton@kernel.org>
On 2/17/22 11:28 PM, Yan, Zheng wrote:
> On Thu, Feb 17, 2022 at 6:55 PM Jeff Layton <jlayton@kernel.org> wrote:
>> On Thu, 2022-02-17 at 11:03 +0800, Yan, Zheng wrote:
>>> On Tue, Feb 15, 2022 at 11:04 PM <xiubli@redhat.com> wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> [... patch quoted in full above, trimmed ...]
>>>>
>>> With this change, when you mksnap on /dir_Y1/, its snap context stays
>>> unchanged. In ceph_update_snap_trace, resetting the 'invalidate'
>>> variable for each realm should fix this issue.

Thanks Zheng for your feedback. Yeah, there is one case where this will
happen. Your approach is simpler; I will post a V2 for this.

-- Xiubo

>> This comment is terribly vague. "invalidate" is a local variable in that
>> function and isn't set on a per-realm basis.
>>
>> Could you suggest a patch on top of Xiubo's patch instead?
>>
> something like this (not tested)
>
> [... suggested diff quoted in full above, trimmed ...]
>
>> --
>> Jeff Layton <jlayton@kernel.org>
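To see why resetting 'invalidate' per realm changes the behavior, here is a
minimal userspace model of the control flow Zheng suggested above (and that
Xiubo says the V2 will adopt). It is a sketch, not kernel code: the realm
struct, the cached_seq staleness test, and the example trace are hypothetical
stand-ins for what ceph_update_snap_trace() actually decodes from the MDS
message.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct realm {
	const char *name;
	uint64_t seq;        /* current seq for this realm, from the MDS */
	uint64_t cached_seq; /* seq of the snap context we last built */
};

/* Stand-in for rebuild_snap_realms(): rebuild this realm's context and,
 * conceptually, everything below it in the hierarchy. */
static void rebuild_from(struct realm *r)
{
	printf("rebuilding snap contexts downward from %s\n", r->name);
	r->cached_seq = r->seq;
}

/* Process a trace ordered leaf -> root, the way ceph_update_snap_trace()
 * walks the realms the MDS sent. */
static void update_trace(struct realm **trace, size_t n)
{
	struct realm *realm_to_inval = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		int invalidate = 0; /* reset per realm, never sticky */

		if (trace[i]->cached_seq < trace[i]->seq)
			invalidate = 1;
		if (invalidate)
			realm_to_inval = trace[i];
	}
	/* Rebuild once, from the highest realm that actually went stale. */
	if (realm_to_inval)
		rebuild_from(realm_to_inval);
	else
		printf("nothing stale, no rebuild\n");
}

int main(void)
{
	/* mksnap under /dir_Y1: its realm's seq moves to 4, root's stays 3. */
	struct realm root   = { "root",   3, 3 };
	struct realm dir_y1 = { "dir_Y1", 4, 3 };
	struct realm *trace[] = { &dir_y1, &root };

	update_trace(trace, 2); /* rebuilds from dir_Y1 only; root (and the
				 * unrelated dir_X realms) are left alone */
	update_trace(trace, 2); /* second delivery: nothing stale */
	return 0;
}

With a single sticky flag, the last realm in the walk (the root) would have
been passed to rebuild_from() whenever any earlier realm went stale; tracking
realm_to_inval keeps the rebuild scoped to the subtree that actually changed.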
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index d075d3ce5f6d..1f24a5de81e7 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -341,14 +341,16 @@ static int build_snap_context(struct ceph_snap_realm *realm,
                num += parent->cached_context->num_snaps;
        }

-       /* do i actually need to update?  not if my context seq
-          matches realm seq, and my parents' does to.  (this works
-          because we rebuild_snap_realms() works _downward_ in
-          hierarchy after each update.) */
+       /* do i actually need to update? No need when any of the following
+        * two cases:
+        * #1: if my context seq matches realm's seq and realm has no parent.
+        * #2: if my context seq equals or is larger than my parent's, this
+        *     works because we rebuild_snap_realms() works _downward_ in
+        *     hierarchy after each update.
+        */
        if (realm->cached_context &&
-           realm->cached_context->seq == realm->seq &&
-           (!parent ||
-            realm->cached_context->seq >= parent->cached_context->seq)) {
+           ((realm->cached_context->seq == realm->seq && !parent) ||
+            (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
                dout("build_snap_context %llx %p: %p seq %lld (%u snaps)"
                     " (unchanged)\n",
                     realm->ino, realm, realm->cached_context,
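For completeness, the reproduction scenario from the commit message can be
driven from a small program. This is a hedged sketch: it assumes a CephFS
mount at /mnt/cephfs with the snapshot feature enabled, and relies on the
documented CephFS behavior that mkdir inside a directory's special .snap
subdirectory creates a snapshot. All paths and snapshot names are
illustrative.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

static void try_mkdir(const char *path)
{
	if (mkdir(path, 0755) != 0)
		perror(path); /* report and continue; dir may already exist */
}

int main(void)
{
	const char *dirs[] = {
		"/mnt/cephfs/dir_X1",
		"/mnt/cephfs/dir_X1/dir_X2",
		"/mnt/cephfs/dir_X1/dir_X2/dir_X3",
		"/mnt/cephfs/dir_Y1",
		"/mnt/cephfs/dir_Y1/dir_Y2",
		"/mnt/cephfs/dir_Y1/dir_Y2/dir_Y3",
	};
	size_t i;

	for (i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++)
		try_mkdir(dirs[i]);

	/* seq 2: snapshot of the X subtree */
	try_mkdir("/mnt/cephfs/dir_X1/dir_X2/.snap/snap_X2");
	/* seq 3: snapshot of the root */
	try_mkdir("/mnt/cephfs/.snap/root_snap");
	/* seq 4 and up: before the fix, each snapshot made under dir_Y1
	 * rebuilt the snap_X2 realm and, with ceph debug logging enabled,
	 * produced "ceph: queue_cap_snap ... nothing dirty|writing" lines
	 * in the kernel log. */
	try_mkdir("/mnt/cephfs/dir_Y1/.snap/snap_Y1");
	return 0;
}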