Message ID | 20211001173358.863017-2-jean-philippe@linaro.org |
---|---|
State | Superseded |
Series | virtio-iommu: Add ACPI support |
On Fri, 1 Oct 2021 18:33:49 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> Add a function that generates a Virtual I/O Translation table (VIOT),
> describing the topology of paravirtual IOMMUs. The table is created when
> instantiating a virtio-iommu device. It contains a virtio-iommu node and

perhaps
s/when instantiating ... ./if a virtio-iommu device present/

> PCI Range nodes for endpoints managed by the IOMMU. By default, a single
> node describes all PCI devices. When passing the "default_bus_bypass_iommu"
> machine option and "bypass_iommu" PXB option, only buses that do not
> bypass the IOMMU are described by PCI Range nodes.

modulo comments, patch looks fine to me from ACPI point of view.

but I don't know if values used for describing PCI structures
make any sense so this might need an ACK from a person who knows
PCI innards better.

> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  hw/acpi/viot.h      |  13 +++++
>  hw/acpi/viot.c      | 112 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/Kconfig     |   4 ++
>  hw/acpi/meson.build |   1 +
>  4 files changed, 130 insertions(+)
>  create mode 100644 hw/acpi/viot.h
>  create mode 100644 hw/acpi/viot.c
>
> diff --git a/hw/acpi/viot.h b/hw/acpi/viot.h
> new file mode 100644
> index 0000000000..9fe565bb87
> --- /dev/null
> +++ b/hw/acpi/viot.h
> @@ -0,0 +1,13 @@
> +/*
> + * ACPI Virtual I/O Translation Table implementation
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#ifndef VIOT_H
> +#define VIOT_H
> +
> +void build_viot(MachineState *ms, GArray *table_data, BIOSLinker *linker,
> +                uint16_t virtio_iommu_bdf, const char *oem_id,
> +                const char *oem_table_id);
> +
> +#endif /* VIOT_H */
> diff --git a/hw/acpi/viot.c b/hw/acpi/viot.c
> new file mode 100644
> index 0000000000..e33d468e11
> --- /dev/null
> +++ b/hw/acpi/viot.c
> @@ -0,0 +1,112 @@
> +/*
> + * ACPI Virtual I/O Translation table implementation
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/viot.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +
> +struct viot_pci_ranges {
> +    GArray *blob;
> +    size_t count;
> +    uint16_t output_node;
> +};
> +
> +/* Build PCI range for a given PCI host bridge */
> +static int build_pci_range_node(Object *obj, void *opaque)
> +{
> +    struct viot_pci_ranges *pci_ranges = opaque;
> +    GArray *blob = pci_ranges->blob;
> +
> +    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +        if (bus && !pci_bus_bypass_iommu(bus)) {
> +            int min_bus, max_bus;
> +
> +            pci_bus_range(bus, &min_bus, &max_bus);
> +
> +            /* Type (PCI range) */
see [1] below

> +            build_append_int_noprefix(blob, 1, 1);
> +            /* Reserved */
> +            build_append_int_noprefix(blob, 0, 1);
> +            /* Length */
> +            build_append_int_noprefix(blob, 24, 2);

spec should be fixed to state length value for fixed length structures
like it's done in ACPI specs, I who we should poke at to make this happen.

zzzz
> +            /* Endpoint start */
> +            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 4);
> +            /* PCI Segment start */
> +            build_append_int_noprefix(blob, 0, 2);
> +            /* PCI Segment end */
> +            build_append_int_noprefix(blob, 0, 2);
zzzz
see comment [2]

> +            /* PCI BDF start */
> +            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 2);
> +            /* PCI BDF end */
> +            build_append_int_noprefix(blob, PCI_BUILD_BDF(max_bus, 0xff), 2);
> +            /* Output node */
> +            build_append_int_noprefix(blob, pci_ranges->output_node, 2);
> +            /* Reserved */
> +            build_append_int_noprefix(blob, 0, 6);
> +
> +            pci_ranges->count++;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Generate a VIOT table with one PCI-based virtio-iommu that manages PCI
> + * endpoints.
> + */

this comment needs to state spec name/version, otherwise it's not clear
what code below is based on (example: build_dmar_q35).

Also since there is no final spec yet and spec doesn't have permanent
hosting place (i.e. hosted by one of specs org), I'd consider
link in cover letter 'dead' and not suitable for long term use.

So we should shovel the spec into docs/specs and point to it in this comment

> +void build_viot(MachineState *ms, GArray *table_data, BIOSLinker *linker,
> +                uint16_t virtio_iommu_bdf, const char *oem_id,
> +                const char *oem_table_id)
> +{
> +    /* The virtio-iommu node follows the 48-bytes header */
> +    int viommu_off = 48;
> +    AcpiTable table = { .sig = "VIOT", .rev = 0,
> +                        .oem_id = oem_id, .oem_table_id = oem_table_id };
> +    struct viot_pci_ranges pci_ranges = {
> +        .output_node = viommu_off,
> +        .blob = g_array_new(false, true /* clear */, 1),
> +    };
> +
> +    /* Build the list of PCI ranges that this viommu manages */
> +    object_child_foreach_recursive(OBJECT(ms), build_pci_range_node,
> +                                   &pci_ranges);
> +
> +    /* ACPI table header */
> +    acpi_table_begin(&table, table_data);
> +    /* Node count */
> +    build_append_int_noprefix(table_data, pci_ranges.count + 1, 2);
> +    /* Node offset */
> +    build_append_int_noprefix(table_data, viommu_off, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 8);
> +
> +    /* Virtio-iommu node */
> +    /* Type (virtio-pci IOMMU) */

(1)
/* Type */
> +    build_append_int_noprefix(table_data, 3, 1);
s:3,:3 /* virtio-pci IOMMU */,:

check-patch will spit out warning but that kind of comment
is common practice with ACPI code

> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 1);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 16, 2);
> +    /* PCI Segment */
> +    build_append_int_noprefix(table_data, 0, 2);
(2)
can we fetch _SEG value from device instead of hard-coding value here?

It might be obvious to PCI folks,
but it would be better to have at least a comment explaining
where these values come from

Michael,
what do you think?

> +    /* PCI BDF number */
> +    build_append_int_noprefix(table_data, virtio_iommu_bdf, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 8);
> +
> +    /* PCI ranges found above */
> +    g_array_append_vals(table_data, pci_ranges.blob->data,
> +                        pci_ranges.blob->len);
> +    g_array_free(pci_ranges.blob, true);
> +
> +    acpi_table_end(linker, &table);
> +}
> +
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 3b5e118c54..622b0b50b7 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -51,6 +51,10 @@ config ACPI_VMGENID
>      default y
>      depends on PC
>
> +config ACPI_VIOT
> +    bool
> +    depends on ACPI
> +
>  config ACPI_HW_REDUCED
>      bool
>      select ACPI
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index 7d8c0eb43e..adf6347bc4 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -20,6 +20,7 @@ acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files(
>  acpi_ss.add(when: 'CONFIG_ACPI_PIIX4', if_true: files('piix4.c'))
>  acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_true: files('pcihp.c'))
>  acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_false: files('acpi-pci-hotplug-stub.c'))
> +acpi_ss.add(when: 'CONFIG_ACPI_VIOT', if_true: files('viot.c'))
>  acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
>  acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
>  acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
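For readers following the review of the virtio-iommu node: the sketch below emits the same 16-byte node into a plain byte buffer, with the type value named instead of written as a bare 3. It is a standalone illustration, not QEMU code. The names append_le, build_viommu_node and VIOT_NODE_VIRTIO_PCI_IOMMU are made up for this example; the only assumption carried over from QEMU is that build_append_int_noprefix() writes the value least significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the node type value 3 used in the patch. */
enum { VIOT_NODE_VIRTIO_PCI_IOMMU = 3 };

/*
 * Hypothetical stand-in for build_append_int_noprefix(): store `size`
 * bytes of `value` at buf[off], least significant byte first, and
 * return the offset just past the written bytes.
 */
size_t append_le(uint8_t *buf, size_t off, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Emit the virtio-pci IOMMU node; `buf` must hold at least 16 bytes. */
size_t build_viommu_node(uint8_t *buf, uint16_t bdf)
{
    size_t off = 0;

    off = append_le(buf, off, VIOT_NODE_VIRTIO_PCI_IOMMU, 1); /* Type */
    off = append_le(buf, off, 0, 1);    /* Reserved */
    off = append_le(buf, off, 16, 2);   /* Length */
    off = append_le(buf, off, 0, 2);    /* PCI Segment (0 in the patch) */
    off = append_le(buf, off, bdf, 2);  /* PCI BDF number */
    off = append_le(buf, off, 0, 8);    /* Reserved */
    return off;
}
```

Calling build_viommu_node() on a 16-byte buffer returns 16, which matches the Length field written inside the node.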
On Wed, Oct 06, 2021 at 10:09:50AM +0200, Igor Mammedov wrote:
> On Fri, 1 Oct 2021 18:33:49 +0100
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
>
> > Add a function that generates a Virtual I/O Translation table (VIOT),
> > describing the topology of paravirtual IOMMUs. The table is created when
> > instantiating a virtio-iommu device. It contains a virtio-iommu node and
>
> perhaps
> s/when instantiating ... ./if a virtio-iommu device present/
>
> > PCI Range nodes for endpoints managed by the IOMMU. By default, a single
> > node describes all PCI devices. When passing the "default_bus_bypass_iommu"
> > machine option and "bypass_iommu" PXB option, only buses that do not
> > bypass the IOMMU are described by PCI Range nodes.
>
>
> modulo comments, patch looks fine to me from ACPI point of view.
>
> but I don't know if values used for describing PCI structures
> make any sense so this might need an ACK from a person who knows
> PCI innards better.

For what it's worth I mainly looked at other similar tables (IORT, DMAR and
IVRS) to figure out what values I should use

[...]
> > +static int build_pci_range_node(Object *obj, void *opaque)
> > +{
> > +    struct viot_pci_ranges *pci_ranges = opaque;
> > +    GArray *blob = pci_ranges->blob;
> > +
> > +    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> > +        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> > +
> > +        if (bus && !pci_bus_bypass_iommu(bus)) {
> > +            int min_bus, max_bus;
> > +
> > +            pci_bus_range(bus, &min_bus, &max_bus);
> > +
> > +            /* Type (PCI range) */
> see [1] below
>
> > +            build_append_int_noprefix(blob, 1, 1);
> > +            /* Reserved */
> > +            build_append_int_noprefix(blob, 0, 1);
> > +            /* Length */
> > +            build_append_int_noprefix(blob, 24, 2);
>
> spec should be fixed to state length value for fixed length structures
> like it's done in ACPI specs, I who we should poke at to make this happen.

That doesn't seem to be applied rigorously. Several fixed-size structures
don't state their sizes, for example "5.2.25.7 NVDIMM Block Data Window
Region Structure", "5.2.25.9 Platform Capabilities Structure", "5.2.26.1.1
ACPI_NAMESPACE_DEVICE based Secure Device Structure".

>
> zzzz
> > +            /* Endpoint start */
> > +            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 4);
> > +            /* PCI Segment start */
> > +            build_append_int_noprefix(blob, 0, 2);
> > +            /* PCI Segment end */
> > +            build_append_int_noprefix(blob, 0, 2);
> zzzz
> see comment [2]
>
> > +            /* PCI BDF start */
> > +            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 2);
> > +            /* PCI BDF end */
> > +            build_append_int_noprefix(blob, PCI_BUILD_BDF(max_bus, 0xff), 2);
> > +            /* Output node */
> > +            build_append_int_noprefix(blob, pci_ranges->output_node, 2);
> > +            /* Reserved */
> > +            build_append_int_noprefix(blob, 0, 6);
> > +
> > +            pci_ranges->count++;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +/*
> > + * Generate a VIOT table with one PCI-based virtio-iommu that manages PCI
> > + * endpoints.
> > + */
>
> this comment needs to state spec name/version, otherwise it's not clear
> what code below is based on (example: build_dmar_q35).
>
> Also since there is no final spec yet and spec doesn't have permanent
> hosting place (i.e. hosted by one of specs org), I'd consider
> link in cover letter 'dead' and not suitable for long term use.

Yes, I'll throw those documents out once the final spec is out

> So we should shovel the spec into docs/specs and point to it in this comment

I could write "Defined in the ACPI Specification (Version TBD)"
For all I know it could be version 6.5 or 7.0...

>
> > +void build_viot(MachineState *ms, GArray *table_data, BIOSLinker *linker,
> > +                uint16_t virtio_iommu_bdf, const char *oem_id,
> > +                const char *oem_table_id)
> > +{
> > +    /* The virtio-iommu node follows the 48-bytes header */
> > +    int viommu_off = 48;
> > +    AcpiTable table = { .sig = "VIOT", .rev = 0,
> > +                        .oem_id = oem_id, .oem_table_id = oem_table_id };
> > +    struct viot_pci_ranges pci_ranges = {
> > +        .output_node = viommu_off,
> > +        .blob = g_array_new(false, true /* clear */, 1),
> > +    };
> > +
> > +    /* Build the list of PCI ranges that this viommu manages */
> > +    object_child_foreach_recursive(OBJECT(ms), build_pci_range_node,
> > +                                   &pci_ranges);
> > +
> > +    /* ACPI table header */
> > +    acpi_table_begin(&table, table_data);
> > +    /* Node count */
> > +    build_append_int_noprefix(table_data, pci_ranges.count + 1, 2);
> > +    /* Node offset */
> > +    build_append_int_noprefix(table_data, viommu_off, 2);
> > +    /* Reserved */
> > +    build_append_int_noprefix(table_data, 0, 8);
> > +
> > +    /* Virtio-iommu node */
> > +    /* Type (virtio-pci IOMMU) */
>
> (1)
> /* Type */
> > +    build_append_int_noprefix(table_data, 3, 1);
> s:3,:3 /* virtio-pci IOMMU */,:
>
> check-patch will spit out warning but that kind of comment
> is common practice with ACPI code
>
> > +    /* Reserved */
> > +    build_append_int_noprefix(table_data, 0, 1);
> > +    /* Length */
> > +    build_append_int_noprefix(table_data, 16, 2);
> > +    /* PCI Segment */
> > +    build_append_int_noprefix(table_data, 0, 2);
> (2)
> can we fetch _SEG value from device instead of hard-coding value here?

Looking for "segment" and "domain" I couldn't find any dynamic segment
number, 0 seems to be hardcoded everywhere (hw/acpi/pci.c,
hw/i386/acpi-build.c, hw/arm/virt.c, hw/arm/virt-acpi-build.c).

>
> It might be obvious to PCI folks,
> but it would be better to have at least a comment explaining
> where these values come from

I could add that "QEMU only implements segment 0"

Thanks,
Jean

>
> Michael,
> what do you think?
>
> > +    /* PCI BDF number */
> > +    build_append_int_noprefix(table_data, virtio_iommu_bdf, 2);
> > +    /* Reserved */
> > +    build_append_int_noprefix(table_data, 0, 8);
> > +
> > +    /* PCI ranges found above */
> > +    g_array_append_vals(table_data, pci_ranges.blob->data,
> > +                        pci_ranges.blob->len);
> > +    g_array_free(pci_ranges.blob, true);
> > +
> > +    acpi_table_end(linker, &table);
> > +}
> > +
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index 3b5e118c54..622b0b50b7 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -51,6 +51,10 @@ config ACPI_VMGENID
> >      default y
> >      depends on PC
> >
> > +config ACPI_VIOT
> > +    bool
> > +    depends on ACPI
> > +
> >  config ACPI_HW_REDUCED
> >      bool
> >      select ACPI
> > diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> > index 7d8c0eb43e..adf6347bc4 100644
> > --- a/hw/acpi/meson.build
> > +++ b/hw/acpi/meson.build
> > @@ -20,6 +20,7 @@ acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files(
> >  acpi_ss.add(when: 'CONFIG_ACPI_PIIX4', if_true: files('piix4.c'))
> >  acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_true: files('pcihp.c'))
> >  acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_false: files('acpi-pci-hotplug-stub.c'))
> > +acpi_ss.add(when: 'CONFIG_ACPI_VIOT', if_true: files('viot.c'))
> >  acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
> >  acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
> >  acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
>
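Since the thread turns on the hard-coded Length values, it may help to see where 24, 16 and 48 come from: each is the sum of the field widths passed to build_append_int_noprefix() in the patch. The sketch below is a standalone check of that arithmetic, not QEMU code. The helper and array names are made up for illustration, and the 36-byte figure is the size of the standard ACPI table header, assumed here to be what acpi_table_begin() emits before the VIOT-specific fields.

```c
#include <assert.h>
#include <stddef.h>

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: add up a list of field widths in bytes. */
size_t sum_widths(const size_t *widths, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += widths[i];
    }
    return total;
}

/* PCI Range node: type, reserved, length, endpoint start, segment start,
 * segment end, BDF start, BDF end, output node, reserved. */
static const size_t pci_range_widths[] = { 1, 1, 2, 4, 2, 2, 2, 2, 2, 6 };

/* virtio-pci IOMMU node: type, reserved, length, segment, BDF, reserved. */
static const size_t viommu_widths[] = { 1, 1, 2, 2, 2, 8 };

/* Table header: 36-byte standard ACPI header (assumption stated above),
 * then node count, node offset, reserved. */
static const size_t viot_header_widths[] = { 36, 2, 2, 8 };

size_t pci_range_node_len(void)
{
    return sum_widths(pci_range_widths, N_ELEMS(pci_range_widths));
}

size_t viommu_node_len(void)
{
    return sum_widths(viommu_widths, N_ELEMS(viommu_widths));
}

size_t viot_header_len(void)
{
    return sum_widths(viot_header_widths, N_ELEMS(viot_header_widths));
}

/* Abort if the sums disagree with the constants used in the patch. */
void check_viot_lengths(void)
{
    assert(pci_range_node_len() == 24);
    assert(viommu_node_len() == 16);
    assert(viot_header_len() == 48);
}
```

Calling check_viot_lengths() from a test program is enough to confirm that the constants in the patch agree with the field widths it writes.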
diff --git a/hw/acpi/viot.h b/hw/acpi/viot.h
new file mode 100644
index 0000000000..9fe565bb87
--- /dev/null
+++ b/hw/acpi/viot.h
@@ -0,0 +1,13 @@
+/*
+ * ACPI Virtual I/O Translation Table implementation
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef VIOT_H
+#define VIOT_H
+
+void build_viot(MachineState *ms, GArray *table_data, BIOSLinker *linker,
+                uint16_t virtio_iommu_bdf, const char *oem_id,
+                const char *oem_table_id);
+
+#endif /* VIOT_H */
diff --git a/hw/acpi/viot.c b/hw/acpi/viot.c
new file mode 100644
index 0000000000..e33d468e11
--- /dev/null
+++ b/hw/acpi/viot.c
@@ -0,0 +1,112 @@
+/*
+ * ACPI Virtual I/O Translation table implementation
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/viot.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_host.h"
+
+struct viot_pci_ranges {
+    GArray *blob;
+    size_t count;
+    uint16_t output_node;
+};
+
+/* Build PCI range for a given PCI host bridge */
+static int build_pci_range_node(Object *obj, void *opaque)
+{
+    struct viot_pci_ranges *pci_ranges = opaque;
+    GArray *blob = pci_ranges->blob;
+
+    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+        if (bus && !pci_bus_bypass_iommu(bus)) {
+            int min_bus, max_bus;
+
+            pci_bus_range(bus, &min_bus, &max_bus);
+
+            /* Type (PCI range) */
+            build_append_int_noprefix(blob, 1, 1);
+            /* Reserved */
+            build_append_int_noprefix(blob, 0, 1);
+            /* Length */
+            build_append_int_noprefix(blob, 24, 2);
+            /* Endpoint start */
+            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 4);
+            /* PCI Segment start */
+            build_append_int_noprefix(blob, 0, 2);
+            /* PCI Segment end */
+            build_append_int_noprefix(blob, 0, 2);
+            /* PCI BDF start */
+            build_append_int_noprefix(blob, PCI_BUILD_BDF(min_bus, 0), 2);
+            /* PCI BDF end */
+            build_append_int_noprefix(blob, PCI_BUILD_BDF(max_bus, 0xff), 2);
+            /* Output node */
+            build_append_int_noprefix(blob, pci_ranges->output_node, 2);
+            /* Reserved */
+            build_append_int_noprefix(blob, 0, 6);
+
+            pci_ranges->count++;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Generate a VIOT table with one PCI-based virtio-iommu that manages PCI
+ * endpoints.
+ */
+void build_viot(MachineState *ms, GArray *table_data, BIOSLinker *linker,
+                uint16_t virtio_iommu_bdf, const char *oem_id,
+                const char *oem_table_id)
+{
+    /* The virtio-iommu node follows the 48-bytes header */
+    int viommu_off = 48;
+    AcpiTable table = { .sig = "VIOT", .rev = 0,
+                        .oem_id = oem_id, .oem_table_id = oem_table_id };
+    struct viot_pci_ranges pci_ranges = {
+        .output_node = viommu_off,
+        .blob = g_array_new(false, true /* clear */, 1),
+    };
+
+    /* Build the list of PCI ranges that this viommu manages */
+    object_child_foreach_recursive(OBJECT(ms), build_pci_range_node,
+                                   &pci_ranges);
+
+    /* ACPI table header */
+    acpi_table_begin(&table, table_data);
+    /* Node count */
+    build_append_int_noprefix(table_data, pci_ranges.count + 1, 2);
+    /* Node offset */
+    build_append_int_noprefix(table_data, viommu_off, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 8);
+
+    /* Virtio-iommu node */
+    /* Type (virtio-pci IOMMU) */
+    build_append_int_noprefix(table_data, 3, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 1);
+    /* Length */
+    build_append_int_noprefix(table_data, 16, 2);
+    /* PCI Segment */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* PCI BDF number */
+    build_append_int_noprefix(table_data, virtio_iommu_bdf, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 8);
+
+    /* PCI ranges found above */
+    g_array_append_vals(table_data, pci_ranges.blob->data,
+                        pci_ranges.blob->len);
+    g_array_free(pci_ranges.blob, true);
+
+    acpi_table_end(linker, &table);
+}
+
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 3b5e118c54..622b0b50b7 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,10 @@ config ACPI_VMGENID
     default y
     depends on PC
 
+config ACPI_VIOT
+    bool
+    depends on ACPI
+
 config ACPI_HW_REDUCED
     bool
     select ACPI
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index 7d8c0eb43e..adf6347bc4 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -20,6 +20,7 @@ acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files(
 acpi_ss.add(when: 'CONFIG_ACPI_PIIX4', if_true: files('piix4.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_true: files('pcihp.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_PCIHP', if_false: files('acpi-pci-hotplug-stub.c'))
+acpi_ss.add(when: 'CONFIG_ACPI_VIOT', if_true: files('viot.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
 acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
 acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
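As an aside on the "PCI BDF start" and "PCI BDF end" fields in the patch: a PCI range node covers every device/function on every bus between min_bus and max_bus, which is why the start uses devfn 0 and the end uses devfn 0xff. The sketch below illustrates that encoding outside of QEMU. build_bdf mirrors the bus-in-the-high-byte, devfn-in-the-low-byte layout that PCI_BUILD_BDF() produces; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Bus number in bits 15:8, device/function in bits 7:0. */
uint16_t build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/* Hypothetical container for the two BDF fields of a PCI range node. */
struct bdf_range {
    uint16_t start;
    uint16_t end;
};

/* First function on min_bus through last function on max_bus. */
struct bdf_range bus_range_to_bdf(uint8_t min_bus, uint8_t max_bus)
{
    struct bdf_range r;

    r.start = build_bdf(min_bus, 0);
    r.end = build_bdf(max_bus, 0xff);
    return r;
}
```

For a host bridge whose bus range is 0x10 to 0x1f, this yields a BDF range of 0x1000 to 0x1fff; a bridge with only bus 0 yields 0x0000 to 0x00ff.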