Message ID | 20250513000604.1396-1-mhklinux@outlook.com |
---|---|
Headers | show |
Series | hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2" | expand |
On Mon, May 12, 2025 at 05:06:01PM -0700, mhkelley58@gmail.com wrote: > From: Michael Kelley <mhklinux@outlook.com> > > netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus > messages. This function creates a series of GPA ranges, each of which > contains a single PFN. However, if the rndis header in the VMBus > message crosses a page boundary, the netvsc protocol with the host > requires that both PFNs for the rndis header must be in a single "GPA > range" data structure, which isn't possible with > vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a > new function netvsc_build_mpb_array() to build a VMBus message with > multiple GPA ranges, each of which may contain multiple PFNs. Use > vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. > > There's no functional change since higher levels of netvsc don't > maintain or propagate knowledge of contiguous PFNs. Based on its > input, netvsc_build_mpb_array() still produces a separate GPA range > for each PFN and the behavior is the same as with > vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a > subsequent patch to provide the necessary grouping. > > Cc: <stable@vger.kernel.org> # 6.1.x > Signed-off-by: Michael Kelley <mhklinux@outlook.com> > --- > drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++---- > 1 file changed, 45 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index d6f5b9ea3109..6d1705f87682 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -1055,6 +1055,42 @@ static int netvsc_dma_map(struct hv_device *hv_dev, > return 0; > } > > +/* Build an "array" of mpb entries describing the data to be transferred > + * over VMBus. After the desc header fields, each "array" entry is variable > + * size, and each entry starts after the end of the previous entry. The > + * "offset" and "len" fields for each entry imply the size of the entry. > + * > + * The pfns are in HV_HYP_PAGE_SIZE, because all communication with Hyper-V > + * uses that granularity, even if the system page size of the guest is larger. > + * Each entry in the input "pb" array must describe a contiguous range of > + * guest physical memory so that the pfns are sequential if the range crosses > + * a page boundary. The offset field must be < HV_HYP_PAGE_SIZE. Hi Michael, Is there a guarantee that this constraint is met. And moreover, is there a guarantee that all of the entries will fit in desc? I am slightly concerned that there may be an overrun lurking here. ...
From: Simon Horman <horms@kernel.org> Sent: Wednesday, May 14, 2025 2:38 AM > > On Mon, May 12, 2025 at 05:06:01PM -0700, mhkelley58@gmail.com wrote: > > From: Michael Kelley <mhklinux@outlook.com> > > > > netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus > > messages. This function creates a series of GPA ranges, each of which > > contains a single PFN. However, if the rndis header in the VMBus > > message crosses a page boundary, the netvsc protocol with the host > > requires that both PFNs for the rndis header must be in a single "GPA > > range" data structure, which isn't possible with > > vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a > > new function netvsc_build_mpb_array() to build a VMBus message with > > multiple GPA ranges, each of which may contain multiple PFNs. Use > > vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. > > > > There's no functional change since higher levels of netvsc don't > > maintain or propagate knowledge of contiguous PFNs. Based on its > > input, netvsc_build_mpb_array() still produces a separate GPA range > > for each PFN and the behavior is the same as with > > vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a > > subsequent patch to provide the necessary grouping. > > > > Cc: <stable@vger.kernel.org> # 6.1.x > > Signed-off-by: Michael Kelley <mhklinux@outlook.com> > > --- > > drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++---- > > 1 file changed, 45 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > > index d6f5b9ea3109..6d1705f87682 100644 > > --- a/drivers/net/hyperv/netvsc.c > > +++ b/drivers/net/hyperv/netvsc.c > > @@ -1055,6 +1055,42 @@ static int netvsc_dma_map(struct hv_device *hv_dev, > > return 0; > > } > > > > +/* Build an "array" of mpb entries describing the data to be transferred > > + * over VMBus. After the desc header fields, each "array" entry is variable > > + * size, and each entry starts after the end of the previous entry. The > > + * "offset" and "len" fields for each entry imply the size of the entry. > > + * > > + * The pfns are in HV_HYP_PAGE_SIZE, because all communication with Hyper-V > > + * uses that granularity, even if the system page size of the guest is larger. > > + * Each entry in the input "pb" array must describe a contiguous range of > > + * guest physical memory so that the pfns are sequential if the range crosses > > + * a page boundary. The offset field must be < HV_HYP_PAGE_SIZE. > > Hi Michael, > > Is there a guarantee that this constraint is met. And moreover, is there a > guarantee that all of the entries will fit in desc? I am slightly concerned > that there may be an overrun lurking here. > It is indeed up to the caller to ensure that the pb array is properly constructed. netvsc_build_mpb_array() doesn't do additional validation. There are only two sources of the pb array, both of which do the right thing, so additional validation seemed redundant. An overrun is a concern, but again the callers do the right thing. As described in my response to Patch 3 of the series, netvsc_xmit() counts the number of pages ahead of time, and makes sure the count is within the limit of the amount space allocated in the "desc" argument to netvsc_build_mpb_array(). Michael
Hello: This series was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Mon, 12 May 2025 17:05:59 -0700 you wrote: > From: Michael Kelley <mhklinux@outlook.com> > > Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver > for Hyper-V synthetic networking (netvsc) occasionally reports > "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates > that Hyper-V has rejected a network packet transmit request from the > guest, and the outgoing network packet is dropped. Higher level > network protocols presumably recover and resend the packet so there is > no functional error, but performance is slightly impacted. Commit > dca5161f9bd0 is not the cause of the error -- it only added reporting > of an error that was already happening without any notice. The error > has presumably been present since the netvsc driver was originally > introduced into Linux. > > [...] Here is the summary with links: - [net,1/5] Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges https://git.kernel.org/netdev/net/c/380b75d30786 - [net,2/5] hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages https://git.kernel.org/netdev/net/c/4f98616b855c - [net,3/5] hv_netvsc: Preserve contiguous PFN grouping in the page buffer array https://git.kernel.org/netdev/net/c/41a6328b2c55 - [net,4/5] hv_netvsc: Remove rmsg_pgcnt https://git.kernel.org/netdev/net/c/5bbc644bbf4e - [net,5/5] Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer() https://git.kernel.org/netdev/net/c/45a442fe369e You are awesome, thank you!
On Wed, May 14, 2025 at 03:44:35PM +0000, Michael Kelley wrote: > From: Simon Horman <horms@kernel.org> Sent: Wednesday, May 14, 2025 2:38 AM > > > > On Mon, May 12, 2025 at 05:06:01PM -0700, mhkelley58@gmail.com wrote: > > > From: Michael Kelley <mhklinux@outlook.com> > > > > > > netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus > > > messages. This function creates a series of GPA ranges, each of which > > > contains a single PFN. However, if the rndis header in the VMBus > > > message crosses a page boundary, the netvsc protocol with the host > > > requires that both PFNs for the rndis header must be in a single "GPA > > > range" data structure, which isn't possible with > > > vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a > > > new function netvsc_build_mpb_array() to build a VMBus message with > > > multiple GPA ranges, each of which may contain multiple PFNs. Use > > > vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. > > > > > > There's no functional change since higher levels of netvsc don't > > > maintain or propagate knowledge of contiguous PFNs. Based on its > > > input, netvsc_build_mpb_array() still produces a separate GPA range > > > for each PFN and the behavior is the same as with > > > vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a > > > subsequent patch to provide the necessary grouping. > > > > > > Cc: <stable@vger.kernel.org> # 6.1.x > > > Signed-off-by: Michael Kelley <mhklinux@outlook.com> > > > --- > > > drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++---- > > > 1 file changed, 45 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > > > index d6f5b9ea3109..6d1705f87682 100644 > > > --- a/drivers/net/hyperv/netvsc.c > > > +++ b/drivers/net/hyperv/netvsc.c > > > @@ -1055,6 +1055,42 @@ static int netvsc_dma_map(struct hv_device *hv_dev, > > > return 0; > > > } > > > > > > +/* Build an "array" of mpb entries describing the data to be transferred > > > + * over VMBus. After the desc header fields, each "array" entry is variable > > > + * size, and each entry starts after the end of the previous entry. The > > > + * "offset" and "len" fields for each entry imply the size of the entry. > > > + * > > > + * The pfns are in HV_HYP_PAGE_SIZE, because all communication with Hyper-V > > > + * uses that granularity, even if the system page size of the guest is larger. > > > + * Each entry in the input "pb" array must describe a contiguous range of > > > + * guest physical memory so that the pfns are sequential if the range crosses > > > + * a page boundary. The offset field must be < HV_HYP_PAGE_SIZE. > > > > Hi Michael, > > > > Is there a guarantee that this constraint is met. And moreover, is there a > > guarantee that all of the entries will fit in desc? I am slightly concerned > > that there may be an overrun lurking here. > > > > It is indeed up to the caller to ensure that the pb array is properly > constructed. netvsc_build_mpb_array() doesn't do additional validation. > There are only two sources of the pb array, both of which do the right > thing, so additional validation seemed redundant. > > An overrun is a concern, but again the callers do the right thing. As > described in my response to Patch 3 of the series, netvsc_xmit() > counts the number of pages ahead of time, and makes sure the count is > within the limit of the amount space allocated in the "desc" argument > to netvsc_build_mpb_array(). Thanks Michael, I agree that is entirely reasonable for callers to be responsible correctly constructing the pb array. And that it's not necessary to add validation to netvsc_build_mpb_array(). Also, based on the above, I'm satisfied that the callers are correctly constructing the pb array. With the above clarified in my mind I'm now happy with this patch. Reviewed-by: Simon Horman <horms@kernel.org>
On Mon, May 12, 2025 at 05:06:00PM -0700, mhkelley58@gmail.com wrote: > From: Michael Kelley <mhklinux@outlook.com> > > vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver > and is hardcoded to create a single GPA range. To allow it to also be > used by the netvsc driver to create multiple GPA ranges, no longer > hardcode as having a single GPA range. Allow the calling driver to > specify the rangecount in the supplied descriptor. > > Update the storvsc driver to reflect this new approach. > > Cc: <stable@vger.kernel.org> # 6.1.x > Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Simon Horman <horms@kernel.org>
On Mon, May 12, 2025 at 05:06:03PM -0700, mhkelley58@gmail.com wrote: > From: Michael Kelley <mhklinux@outlook.com> > > init_page_array() now always creates a single page buffer array entry > for the rndis message, even if the rndis message crosses a page > boundary. As such, the number of page buffer array entries used for > the rndis message must no longer be tracked -- it is always just 1. > Remove the rmsg_pgcnt field and use "1" where the value is needed. > > Cc: <stable@vger.kernel.org> # 6.1.x > Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Simon Horman <horms@kernel.org>
From: Michael Kelley <mhklinux@outlook.com> Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. This patch set fixes the root cause of the problem, which is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error in the case of the rndis header. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in a ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer() and just combine them into a single GPA range. Such a fix is very contained. But conceptually it is fixing the problem at the wrong level. So this patch set takes the broader approach of maintaining the already known grouping of contiguous PFNs at a higher level in the netvsc driver code, and propagating that grouping down to the creation of the VMBus message to send to Hyper-V. Maintaining the grouping fixes this problem, and has the added benefit of allowing netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce buffering in CoCo VMs. Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc() to specify multiple GPA ranges. In current code vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI driver, and it always creates a single GPA range. Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of netvsc still don't group contiguous PFNs, this patch is functionally neutral. The VMBus message to Hyper-V still has many GPA ranges, each with a single PFN. But it lays the groundwork for the next patch. Patch 3 changes the higher levels of netvsc to preserve the already known grouping of contiguous PFNs. When the contiguous groupings are passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple PFNs are produced, as expected by Hyper-V. This is point at which the core problem is fixed. Patches 4 and 5 remove code that is no longer necessary after the previous patches. These changes provide a net reduction of about 65 lines of code, which is an added benefit. These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo VMs, and in Dv6-series VMs where the netvsp implementation is in the OpenHCL paravisor instead of the Hyper-V host. These changes are built against kernel version 6.15-rc6. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Michael Kelley (5): Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages hv_netvsc: Preserve contiguous PFN grouping in the page buffer array hv_netvsc: Remove rmsg_pgcnt Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer() drivers/hv/channel.c | 65 ++----------------------------- drivers/net/hyperv/hyperv_net.h | 13 ++++++- drivers/net/hyperv/netvsc.c | 57 ++++++++++++++++++++++----- drivers/net/hyperv/netvsc_drv.c | 62 +++++++---------------------- drivers/net/hyperv/rndis_filter.c | 24 +++--------- drivers/scsi/storvsc_drv.c | 1 + include/linux/hyperv.h | 7 ---- 7 files changed, 83 insertions(+), 146 deletions(-)