[v5,00/10] crypto: qat - enable QAT GEN4 SRIOV VF live migration for QAT GEN4

Message ID 20240306135855.4123535-1-xin.zeng@intel.com

Message

Xin Zeng March 6, 2024, 1:58 p.m. UTC
This set enables live migration for Intel QAT GEN4 SRIOV Virtual
Functions (VFs).
It is composed of 10 patches. Patches 1-6 refactor the existing QAT PF
driver code so that it can be reused by the patches that follow.
Patch 7 adds logic to the QAT PF driver to save and restore the state
of a bank (a QAT VF is a wrapper around banks) and to drain a ring
pair. Patch 8 adds a set of interfaces to the QAT PF driver for saving
and restoring the state of a VF; these are called by the qat_vfio_pci
module, which is introduced in the last patch. Patch 9 implements those
interfaces. The last patch adds a QAT-specific vfio pci extension which
intercepts the vfio device operations for a QAT VF to allow live
migration.
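
To make the Patch 7 idea concrete, here is a minimal user-space sketch
of saving and restoring a bank. It assumes a bank is just a handful of
ring CSRs and that a ring counts as drained when its head equals its
tail. The struct layout, field names and function names are
hypothetical and are not taken from the QAT driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one bank: NOT the real QAT register layout. */
#define DEMO_RINGS_PER_BANK 2

struct demo_ring_csrs {
	uint64_t base;   /* ring base address */
	uint32_t head;   /* consumer index */
	uint32_t tail;   /* producer index */
	uint32_t config; /* ring configuration word */
};

struct demo_bank {
	struct demo_ring_csrs rings[DEMO_RINGS_PER_BANK];
	uint32_t int_enable; /* interrupt enable mask */
};

/* Simplifying assumption: a ring is drained once head has caught up with tail. */
int demo_ring_is_drained(const struct demo_ring_csrs *r)
{
	return r->head == r->tail;
}

/* Snapshot a bank. Returns 0 on success, -1 if any ring still has work queued. */
int demo_bank_save(const struct demo_bank *bank, struct demo_bank *state)
{
	assert(bank && state);
	for (int i = 0; i < DEMO_RINGS_PER_BANK; i++)
		if (!demo_ring_is_drained(&bank->rings[i]))
			return -1;
	memcpy(state, bank, sizeof(*state));
	return 0;
}

/* Load a previously saved snapshot into a (destination) bank. */
void demo_bank_restore(struct demo_bank *bank, const struct demo_bank *state)
{
	assert(bank && state);
	memcpy(bank, state, sizeof(*bank));
}
```

The point of the sketch is the ordering: the snapshot is refused while
requests are still in flight, so the restored bank never starts from a
half-processed ring.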

Here are the steps required to test the live migration of a QAT GEN4 VF:
1. Bind one or more QAT GEN4 VF devices to the module qat_vfio_pci.ko 
2. Assign the VFs to the virtual machine and enable device live
migration 
3. Run a workload using a QAT VF inside the VM, for example using qatlib
(https://github.com/intel/qatlib) 
4. Migrate the VM from the source node to a destination node
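
Step 1 above can be scripted. The following is a minimal sketch that
uses the generic PCI sysfs attributes driver_override and
drivers_probe. It assumes the driver registers under the name
"qat_vfio_pci" and that the VF has already been unbound from any other
driver; the helper names are hypothetical.

```c
#include <stdio.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/driver_override".
 * Returns 0 on success, -1 if buf is too small.
 */
int demo_override_path(const char *bdf, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/driver_override", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute. Returns 0 on success, -1 on error. */
int demo_write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Bind one VF: set driver_override, then ask the PCI core to probe it. */
int demo_bind_vf(const char *bdf, const char *driver)
{
	char path[128];

	if (demo_override_path(bdf, path, sizeof(path)))
		return -1;
	if (demo_write_attr(path, driver))
		return -1;
	return demo_write_attr("/sys/bus/pci/drivers_probe", bdf);
}
```

A call such as demo_bind_vf("0000:6b:00.1", "qat_vfio_pci") would need
root privileges; the PCI address here is only a placeholder.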

Changes in v5 since v4: https://lore.kernel.org/kvm/20240228143402.89219-9-xin.zeng@intel.com
-  Remove device ID recheck as no consensus has been reached yet (Kevin)
-  Add missing state PRE_COPY_P2P in precopy_ioctl (Kevin)
-  Rearrange the state transition flow for better readability (Kevin)
-  Remove unnecessary Reviewed-by in commit message (Kevin)
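
The PRE_COPY_P2P item in the list above can be pictured as follows.
This is a minimal sketch, assuming the precopy ioctl is only valid
while the device is in one of the two pre-copy states and returns
-EINVAL otherwise. The enum and function names are hypothetical
stand-ins (the state names follow enum vfio_device_mig_state from the
vfio uapi header) and this is not code from the series.

```c
#include <errno.h>

/* Local stand-in for the vfio migration states; numeric values are not significant here. */
enum demo_mig_state {
	DEMO_STATE_ERROR,
	DEMO_STATE_STOP,
	DEMO_STATE_RUNNING,
	DEMO_STATE_STOP_COPY,
	DEMO_STATE_RESUMING,
	DEMO_STATE_RUNNING_P2P,
	DEMO_STATE_PRE_COPY,
	DEMO_STATE_PRE_COPY_P2P,
};

/*
 * Gate for a precopy-info request: allowed in PRE_COPY and in
 * PRE_COPY_P2P, rejected with -EINVAL in every other state.
 */
int demo_precopy_state_check(enum demo_mig_state state)
{
	switch (state) {
	case DEMO_STATE_PRE_COPY:
	case DEMO_STATE_PRE_COPY_P2P:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Checking only PRE_COPY would wrongly reject a request issued after the
device moved to PRE_COPY_P2P, which is the kind of gap the v5 change
closes.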

Changes in v4 since v3: https://lore.kernel.org/kvm/20240221155008.960369-11-xin.zeng@intel.com
-  Change the order of maintainer entry for QAT vfio pci driver in
   MAINTAINERS to make it alphabetical (Alex)
-  Put QAT VFIO PCI driver under vfio/pci directly instead of
   vfio/pci/intel (Alex)
-  Add id_table recheck during device probe (Alex)

Changes in v3 since v2: https://lore.kernel.org/kvm/20240220032052.66834-1-xin.zeng@intel.com
-  Use state_mutex directly instead of unnecessary deferred_reset mode
   (Jason)

Changes in v2 since v1: https://lore.kernel.org/all/20240201153337.4033490-1-xin.zeng@intel.com
-  Add VFIO_MIGRATION_PRE_COPY support (Alex)
-  Remove unnecessary module dependency in Kconfig (Alex)
-  Use direct function calls instead of function pointers in qat vfio
   variant driver (Jason)
-  Address the comments including unnecessary pointer check and kfree,
   missing lock and direct use of pci_iov_vf_id (Shameer)
-  Change CHECK_STAT macro to avoid repeated comparison (Kamlesh)

Changes in v1 since RFC: https://lore.kernel.org/all/20230630131304.64243-1-xin.zeng@intel.com
-  Address comments including the right module dependency in Kconfig,
   source file name and module description (Alex)
-  Added PCI error handler and P2P state handler (Suggested by Kevin)
-  Refactor the state check during loading ring state (Kevin)
-  Fix missed call to vfio_put_device in the error case (Breet)
-  Migrate the shadow states in PF driver
-  Rebase on top of 6.8-rc1

Giovanni Cabiddu (2):
  crypto: qat - adf_get_etr_base() helper
  crypto: qat - relocate CSR access code

Siming Wan (3):
  crypto: qat - rename get_sla_arr_of_type()
  crypto: qat - expand CSR operations for QAT GEN4 devices
  crypto: qat - add bank save and restore flows

Xin Zeng (5):
  crypto: qat - relocate and rename 4xxx PF2VM definitions
  crypto: qat - move PFVF compat checker to a function
  crypto: qat - add interface for live migration
  crypto: qat - implement interface for live migration
  vfio/qat: Add vfio_pci driver for Intel QAT VF devices

 MAINTAINERS                                   |    8 +
 .../intel/qat/qat_420xx/adf_420xx_hw_data.c   |    3 +
 .../intel/qat/qat_4xxx/adf_4xxx_hw_data.c     |    5 +
 .../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c   |    1 +
 .../qat/qat_c3xxxvf/adf_c3xxxvf_hw_data.c     |    1 +
 .../intel/qat/qat_c62x/adf_c62x_hw_data.c     |    1 +
 .../intel/qat/qat_c62xvf/adf_c62xvf_hw_data.c |    1 +
 drivers/crypto/intel/qat/qat_common/Makefile  |    6 +-
 .../intel/qat/qat_common/adf_accel_devices.h  |   88 ++
 .../intel/qat/qat_common/adf_common_drv.h     |   10 +
 .../qat/qat_common/adf_gen2_hw_csr_data.c     |  101 ++
 .../qat/qat_common/adf_gen2_hw_csr_data.h     |   86 ++
 .../intel/qat/qat_common/adf_gen2_hw_data.c   |   97 --
 .../intel/qat/qat_common/adf_gen2_hw_data.h   |   76 --
 .../qat/qat_common/adf_gen4_hw_csr_data.c     |  231 ++++
 .../qat/qat_common/adf_gen4_hw_csr_data.h     |  188 +++
 .../intel/qat/qat_common/adf_gen4_hw_data.c   |  380 +++++--
 .../intel/qat/qat_common/adf_gen4_hw_data.h   |  127 +--
 .../intel/qat/qat_common/adf_gen4_pfvf.c      |    8 +-
 .../intel/qat/qat_common/adf_gen4_vf_mig.c    | 1010 +++++++++++++++++
 .../intel/qat/qat_common/adf_gen4_vf_mig.h    |   10 +
 .../intel/qat/qat_common/adf_mstate_mgr.c     |  318 ++++++
 .../intel/qat/qat_common/adf_mstate_mgr.h     |   89 ++
 .../intel/qat/qat_common/adf_pfvf_pf_proto.c  |    8 +-
 .../intel/qat/qat_common/adf_pfvf_utils.h     |   11 +
 drivers/crypto/intel/qat/qat_common/adf_rl.c  |   10 +-
 drivers/crypto/intel/qat/qat_common/adf_rl.h  |    2 +
 .../crypto/intel/qat/qat_common/adf_sriov.c   |    7 +-
 .../intel/qat/qat_common/adf_transport.c      |    4 +-
 .../crypto/intel/qat/qat_common/qat_mig_dev.c |  130 +++
 .../qat/qat_dh895xcc/adf_dh895xcc_hw_data.c   |    1 +
 .../qat_dh895xccvf/adf_dh895xccvf_hw_data.c   |    1 +
 drivers/vfio/pci/Kconfig                      |    2 +
 drivers/vfio/pci/Makefile                     |    2 +
 drivers/vfio/pci/qat/Kconfig                  |   12 +
 drivers/vfio/pci/qat/Makefile                 |    3 +
 drivers/vfio/pci/qat/main.c                   |  662 +++++++++++
 include/linux/qat/qat_mig_dev.h               |   31 +
 38 files changed, 3344 insertions(+), 387 deletions(-)
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen2_hw_csr_data.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen2_hw_csr_data.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_vf_mig.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_vf_mig.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_mstate_mgr.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_mstate_mgr.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/qat_mig_dev.c
 create mode 100644 drivers/vfio/pci/qat/Kconfig
 create mode 100644 drivers/vfio/pci/qat/Makefile
 create mode 100644 drivers/vfio/pci/qat/main.c
 create mode 100644 include/linux/qat/qat_mig_dev.h


base-commit: 318407ed77e4140d02e43a001b1f4753e3ce6b5f

Comments

Alex Williamson March 28, 2024, 3:03 p.m. UTC | #1
On Thu, 28 Mar 2024 18:51:41 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Mar 06, 2024 at 09:58:45PM +0800, Xin Zeng wrote:
> > [...]
> 
> Patches 1-9 applied.  Thanks.

Hi Herbert,

Would you mind making a branch available for those in anticipation of
the qat vfio variant driver itself being merged through the vfio tree?
Thanks,

Alex
Herbert Xu April 2, 2024, 2:52 a.m. UTC | #2
On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:
>
> Would you mind making a branch available for those in anticipation of
> the qat vfio variant driver itself being merged through the vfio tree?
> Thanks,

OK, I've just pushed out a vfio branch.  Please take a look to
see if I messed anything up.

Cheers,
Cabiddu, Giovanni April 12, 2024, 2:19 p.m. UTC | #3
Hi Alex,

On Tue, Apr 02, 2024 at 10:52:06AM +0800, Herbert Xu wrote:
> On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:
> >
> > Would you mind making a branch available for those in anticipation of
> > the qat vfio variant driver itself being merged through the vfio tree?
> > Thanks,
> 
> OK, I've just pushed out a vfio branch.  Please take a look to
> see if I messed anything up.
What are the next steps here?

Shall we re-send the patch `vfio/qat: Add vfio_pci driver for Intel QAT
VF devices` rebased against vfio-next?
Or, wait for you to merge the branch from Herbert, then rebase and re-send?
Or, are you going to take the patch that was sent to the mailing list as is
and handle the rebase? (There is only a small conflict to sort out in the
Makefiles.)

Thanks,
Alex Williamson April 12, 2024, 10:59 p.m. UTC | #4
On Fri, 12 Apr 2024 15:19:14 +0100
"Cabiddu, Giovanni" <giovanni.cabiddu@intel.com> wrote:

> Hi Alex,
> 
> On Tue, Apr 02, 2024 at 10:52:06AM +0800, Herbert Xu wrote:
> > On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:  
> > >
> > > Would you mind making a branch available for those in anticipation of
> > > the qat vfio variant driver itself being merged through the vfio tree?
> > > Thanks,  
> > 
> > OK, I've just pushed out a vfio branch.  Please take a look to
> > see if I messed anything up.  
> What are the next steps here?
> 
> Shall we re-send the patch `vfio/qat: Add vfio_pci driver for Intel QAT
> VF devices` rebased against vfio-next?
> Or, wait for you to merge the branch from Herbert, then rebase and re-send?
> Or, are you going to take the patch that was sent to the mailing list as is
> and handle the rebase? (There is only a small conflict to sort out in the
> Makefiles.)

Hi Giovanni,

The code itself looks fine to me. The Makefile conflict is trivial, and
MAINTAINERS also requires a trivial re-ordering to keep it alphabetical
now that virtio-vfio-pci is merged.  The only thing I spot that could
use some attention is the documentation, where our acceptance criteria
request:

  Additionally, drivers should make an attempt to provide sufficient
  documentation for reviewers to understand the device specific
  extensions, for example in the case of migration data, how is the
  device state composed and consumed, which portions are not otherwise
  available to the user via vfio-pci, what safeguards exist to validate
  the data, etc.

A lot of the code here is very similar in flow to the other migration
drivers, but I think it would be good to address some of the topics
above in comments throughout the driver.  For example: how does the
driver address P2P states, what information is provided in PRE_COPY,
how is versioning handled, is user-sensitive data included in the
device migration data, and what are the typical ranges of device
migration data size?
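
As an illustration of the versioning and validation questions above,
here is a minimal sketch of the kind of check a receiver could apply
to an incoming migration blob. It assumes a fixed header carrying a
magic value, a version and explicit lengths. The layout, constants and
names are hypothetical and do not describe the QAT migration format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MIG_MAGIC   0x4D494744u /* arbitrary tag chosen for this sketch */
#define DEMO_MIG_VERSION 1u          /* newest version this receiver understands */

struct demo_mig_hdr {
	uint32_t magic;
	uint16_t version;
	uint16_t hdr_len;     /* size of this header in bytes */
	uint32_t payload_len; /* bytes of device state following the header */
};

/* Returns 0 if buf holds a well-formed blob this receiver can consume, -1 otherwise. */
int demo_mig_validate(const void *buf, size_t len)
{
	struct demo_mig_hdr h;

	if (len < sizeof(h))
		return -1;                      /* too short to hold a header */
	memcpy(&h, buf, sizeof(h));             /* copy out: buf may be unaligned */
	if (h.magic != DEMO_MIG_MAGIC)
		return -1;                      /* not our data at all */
	if (h.version == 0 || h.version > DEMO_MIG_VERSION)
		return -1;                      /* produced by a newer sender */
	if (h.hdr_len != sizeof(h))
		return -1;                      /* header size mismatch */
	if (h.payload_len != len - sizeof(h))
		return -1;                      /* truncated or padded payload */
	return 0;
}
```

A comment in the driver stating which of these fields exist, and what
the destination does when a check fails, would answer most of the
versioning and safeguard questions in one place.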

Kevin might already have an edge in understanding the theory of
operation here, and documenting the interesting aspects of the driver
in comments might drive a little more engagement.  Thanks,

Alex