Message ID | 20230201101559.15529-1-johan+linaro@kernel.org |
---|---|
Headers | show |
Series | interconnect: fix racy provider registration | expand |
On 1.02.2023 11:15, Johan Hovold wrote: > The interconnect framework currently expects that providers are only > removed when there are no users and after all nodes have been removed. > > There is currently nothing that guarantees this to be the case and the > framework does not do any reference counting, but refusing to remove the > provider is never correct as that would leave a dangling pointer to a > resource that is about to be released in the global provider list (e.g. > accessible through debugfs). > > Replace the current sanity checks with WARN_ON() so that the provider is > always removed. I spent a considerable amount of time scratching my head what WARN_ON has to do with removing list items.. I suppose "don't return early and replace pr_warn with WARN_ON" would have been clearer, but maybe that's just me in the morning.. > > Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API") > Cc: stable@vger.kernel.org # 5.1: 680f8666baf6: interconnect: Make icc_provider_del() return void > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Konrad > drivers/interconnect/core.c | 14 ++------------ > 1 file changed, 2 insertions(+), 12 deletions(-) > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c > index dc61620a0191..43c5c8503ee8 100644 > --- a/drivers/interconnect/core.c > +++ b/drivers/interconnect/core.c > @@ -1062,18 +1062,8 @@ EXPORT_SYMBOL_GPL(icc_provider_add); > void icc_provider_del(struct icc_provider *provider) > { > mutex_lock(&icc_lock); > - if (provider->users) { > - pr_warn("interconnect provider still has %d users\n", > - provider->users); > - mutex_unlock(&icc_lock); > - return; > - } > - > - if (!list_empty(&provider->nodes)) { > - pr_warn("interconnect provider still has nodes\n"); > - mutex_unlock(&icc_lock); > - return; > - } > + WARN_ON(provider->users); > + WARN_ON(!list_empty(&provider->nodes)); > > list_del(&provider->provider_list); > mutex_unlock(&icc_lock);
On 01/02/2023 11:15, Johan Hovold wrote: > There is no longer any need to explicitly destroy node links as this is > now done when the node is destroyed as part of icc_nodes_remove(). > > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- > drivers/interconnect/samsung/exynos.c | 6 ------ > 1 file changed, 6 deletions(-) Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Best regards, Krzysztof
On 01/02/2023 11:15, Johan Hovold wrote: > The current interconnect provider interface is inherently racy as > providers are expected to be registered before being fully initialised. > > This can specifically cause racing DT lookups to fail as I recently > noticed when the Qualcomm cpufreq driver failed to probe: > > of_icc_xlate_onecell: invalid index 0 > cpu cpu0: error -EINVAL: error finding src node > cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22 > qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22 > > This only happens very rarely, but the bug is easily reproduced by > increasing the race window by adding an msleep() after registering > osm-l3 interconnect provider. > > Note that the Qualcomm cpufreq driver is especially susceptible to this > race as the interconnect path is looked up from the CPU nodes so that > driver core does not guarantee the probe order even when device links > are enabled (which they not always are). > > This series adds a new interconnect provider registration API which is > used to fix up the interconnect drivers before removing the old racy > API. > So is there a dependency or not? Can you make it clear that I shouldn't take memory controller bits? Best regards, Krzysztof
On Thu, Feb 02, 2023 at 12:13:33PM +0100, Krzysztof Kozlowski wrote: > On 01/02/2023 11:15, Johan Hovold wrote: > > The current interconnect provider interface is inherently racy as > > providers are expected to be registered before being fully initialised. > > > > This can specifically cause racing DT lookups to fail as I recently > > noticed when the Qualcomm cpufreq driver failed to probe: > > > > of_icc_xlate_onecell: invalid index 0 > > cpu cpu0: error -EINVAL: error finding src node > > cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22 > > qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22 > > > > This only happens very rarely, but the bug is easily reproduced by > > increasing the race window by adding an msleep() after registering > > osm-l3 interconnect provider. > > > > Note that the Qualcomm cpufreq driver is especially susceptible to this > > race as the interconnect path is looked up from the CPU nodes so that > > driver core does not guarantee the probe order even when device links > > are enabled (which they not always are). > > > > This series adds a new interconnect provider registration API which is > > used to fix up the interconnect drivers before removing the old racy > > API. > > > > So is there a dependency or not? Can you make it clear that I shouldn't > take memory controller bits? As the fixes depend on the new API it is best if these could all go through Georgi's tree. Johan
On 01/02/2023 11:15, Johan Hovold wrote: > The current interconnect provider registration interface is inherently > racy as nodes are not added until the after adding the provider. This > can specifically cause racing DT lookups to fail. > > Switch to using the new API where the provider is not registered until > after it has been fully initialised. > > Fixes: d5ef16ba5fbe ("memory: tegra20: Support interconnect framework") > Cc: stable@vger.kernel.org # 5.11 > Cc: Dmitry Osipenko <digetx@gmail.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Best regards, Krzysztof
On Wed, Feb 01, 2023 at 11:15:47AM +0100, Johan Hovold wrote: > The current interconnect provider registration interface is inherently > racy as nodes are not added until the after adding the provider. This > can specifically cause racing DT lookups to fail. > > Switch to using the new API where the provider is not registered until > after it has been fully initialised. > > Fixes: 4e60a9568dc6 ("interconnect: qcom: add msm8974 driver") > Cc: stable@vger.kernel.org # 5.5 > Cc: Brian Masney <bmasney@redhat.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Brian Masney <bmasney@redhat.com>
On 1.02.2023 11:15, Johan Hovold wrote: > Make sure to disable clocks also in case attaching the power domain > fails. > > Fixes: 7de109c0abe9 ("interconnect: icc-rpm: Add support for bus power domain") > Cc: stable@vger.kernel.org # 5.17 > Cc: Yassine Oudjana <y.oudjana@protonmail.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Konrad > drivers/interconnect/qcom/icc-rpm.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/drivers/interconnect/qcom/icc-rpm.c b/drivers/interconnect/qcom/icc-rpm.c > index 91778cfcbc65..da595059cafd 100644 > --- a/drivers/interconnect/qcom/icc-rpm.c > +++ b/drivers/interconnect/qcom/icc-rpm.c > @@ -498,8 +498,7 @@ int qnoc_probe(struct platform_device *pdev) > > if (desc->has_bus_pd) { > ret = dev_pm_domain_attach(dev, true); > - if (ret) > - return ret; > + goto err_disable_clks; > } > > provider = &qp->provider; > @@ -514,8 +513,7 @@ int qnoc_probe(struct platform_device *pdev) > ret = icc_provider_add(provider); > if (ret) { > dev_err(dev, "error adding interconnect provider: %d\n", ret); > - clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks); > - return ret; > + goto err_disable_clks; > } > > for (i = 0; i < num_nodes; i++) { > @@ -550,8 +548,9 @@ int qnoc_probe(struct platform_device *pdev) > return 0; > err: > icc_nodes_remove(provider); > - clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks); > icc_provider_del(provider); > +err_disable_clks: > + clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks); > > return ret; > }
On 1.02.2023 11:15, Johan Hovold wrote: > Make sure to clean up and release resources properly also in case probe > fails when populating child devices. > > Fixes: 57eb14779dfd ("interconnect: qcom: icc-rpmh: Support child NoC device probe") > Cc: stable@vger.kernel.org # 6.0 > Cc: Luca Weiss <luca.weiss@fairphone.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Konrad > drivers/interconnect/qcom/icc-rpmh.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c > index fd17291c61eb..5168bbf3d92f 100644 > --- a/drivers/interconnect/qcom/icc-rpmh.c > +++ b/drivers/interconnect/qcom/icc-rpmh.c > @@ -235,8 +235,11 @@ int qcom_icc_rpmh_probe(struct platform_device *pdev) > platform_set_drvdata(pdev, qp); > > /* Populate child NoC devices if any */ > - if (of_get_child_count(dev->of_node) > 0) > - return of_platform_populate(dev->of_node, NULL, NULL, dev); > + if (of_get_child_count(dev->of_node) > 0) { > + ret = of_platform_populate(dev->of_node, NULL, NULL, dev); > + if (ret) > + goto err; > + } > > return 0; > err:
On 1.02.2023 11:15, Johan Hovold wrote: > The current interconnect provider registration interface is inherently > racy as nodes are not added until the after adding the provider. This > can specifically cause racing DT lookups to fail. > > Switch to using the new API where the provider is not registered until > after it has been fully initialised. > > Fixes: 4e60a9568dc6 ("interconnect: qcom: add msm8974 driver") > Cc: stable@vger.kernel.org # 5.5 > Cc: Brian Masney <bmasney@redhat.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > --- Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Konrad > drivers/interconnect/qcom/msm8974.c | 20 ++++++++------------ > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/drivers/interconnect/qcom/msm8974.c b/drivers/interconnect/qcom/msm8974.c > index 5ea192f1141d..1828deaca443 100644 > --- a/drivers/interconnect/qcom/msm8974.c > +++ b/drivers/interconnect/qcom/msm8974.c > @@ -692,7 +692,6 @@ static int msm8974_icc_probe(struct platform_device *pdev) > return ret; > > provider = &qp->provider; > - INIT_LIST_HEAD(&provider->nodes); > provider->dev = dev; > provider->set = msm8974_icc_set; > provider->aggregate = icc_std_aggregate; > @@ -700,11 +699,7 @@ static int msm8974_icc_probe(struct platform_device *pdev) > provider->data = data; > provider->get_bw = msm8974_get_bw; > > - ret = icc_provider_add(provider); > - if (ret) { > - dev_err(dev, "error adding interconnect provider: %d\n", ret); > - goto err_disable_clks; > - } > + icc_provider_init(provider); > > for (i = 0; i < num_nodes; i++) { > size_t j; > @@ -712,7 +707,7 @@ static int msm8974_icc_probe(struct platform_device *pdev) > node = icc_node_create(qnodes[i]->id); > if (IS_ERR(node)) { > ret = PTR_ERR(node); > - goto err_del_icc; > + goto err_remove_nodes; > } > > node->name = qnodes[i]->name; > @@ -729,15 +724,16 @@ static int msm8974_icc_probe(struct platform_device *pdev) > } > data->num_nodes = num_nodes; > > + ret = icc_provider_register(provider); > + if (ret) > + goto err_remove_nodes; > + > platform_set_drvdata(pdev, qp); > > return 0; > > -err_del_icc: > +err_remove_nodes: > icc_nodes_remove(provider); > - icc_provider_del(provider); > - > -err_disable_clks: > clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks); > > return ret; > @@ -747,9 +743,9 @@ static int msm8974_icc_remove(struct platform_device *pdev) > { > struct msm8974_icc_provider *qp = platform_get_drvdata(pdev); > > + icc_provider_deregister(&qp->provider); > icc_nodes_remove(&qp->provider); > clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks); > - icc_provider_del(&qp->provider); > > return 0; > }