
serial: 8250_dw: Revert: Do not reclock if already at correct rate

Message ID 20240317214123.34482-1-hdegoede@redhat.com
State New

Commit Message

Hans de Goede March 17, 2024, 9:41 p.m. UTC
Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
Cherry Trail (CHT) SoCs.

Before this change the RTL8723BS Bluetooth HCI, which is found
connected over the dw UART on both BYT and CHT boards, works properly:

Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
Bluetooth: hci0: RTL: fw version 0x365d462e

whereas after this change probing it fails:

Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
Bluetooth: hci0: command 0xfc20 tx timeout
Bluetooth: hci0: RTL: download fw command failed (-110)

Revert the changes to fix this regression.

Fixes: e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at correct rate")
Cc: stable@vger.kernel.org
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Note it is not entirely clear to me why this commit is causing
this issue. Maybe probe() needs to explicitly set the clk rate
which it just got (that feels like a clk driver issue), or maybe
the issue is that, unless set up beforehand by firmware /
the bootloader, serial8250_update_uartclk() needs to be called
at least once to set things up? Note that probe() does not call
serial8250_update_uartclk(); this is only called from
dw8250_clk_notifier_cb().

This requires more debugging which is why I'm proposing
a straight revert to fix the regression ASAP and then this
can be investigated further.
---
 drivers/tty/serial/8250/8250_dw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Andy Shevchenko March 18, 2024, 10:36 a.m. UTC | #1
On Sun, Mar 17, 2024 at 10:41:23PM +0100, Hans de Goede wrote:
> Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
> correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
> Cherry Trail (CHT) SoCs.
> 
> Before this change the RTL8732BS Bluetooth HCI which is found
> connected over the dw UART on both BYT and CHT boards works properly:
> 
> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
> Bluetooth: hci0: RTL: rom_version status=0 version=1
> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
> Bluetooth: hci0: RTL: fw version 0x365d462e
> 
> where as after this change probing it fails:
> 
> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
> Bluetooth: hci0: RTL: rom_version status=0 version=1
> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
> Bluetooth: hci0: command 0xfc20 tx timeout
> Bluetooth: hci0: RTL: download fw command failed (-110)
> 
> Revert the changes to fix this regression.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Note it is not entirely clear to me why this commit is causing
> this issue. Maybe probe() needs to explicitly set the clk rate
> which it just got (that feels like a clk driver issue) or maybe
> the issue is that unless setup before hand by firmware /
> the bootloader serial8250_update_uartclk() needs to be called
> at least once to setup things ?  Note that probe() does not call
> serial8250_update_uartclk(), this is only called from the
> dw8250_clk_notifier_cb()
> 
> This requires more debugging which is why I'm proposing
> a straight revert to fix the regression ASAP and then this
> can be investigated further.

Yep. When I reviewed the original submission I was puzzled by
the CLK APIs. Now I might remember that ->set_rate() can't be called
on prepared/enabled clocks and it's possible the same limitation
applies to ->round_rate().

I also tried to find documentation about the requirements for those
APIs, but failed (maybe I did not look hard enough, dunno). If you happen
to know of any, can you point me to it?
Peter Collingbourne March 18, 2024, 6:52 p.m. UTC | #2
On Mon, Mar 18, 2024 at 3:36 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> [...]
>
> Yep. When I reviewed the original submission I was got puzzled with
> the CLK APIs. Now I might remember that ->set_rate() can't be called
> on prepared/enabled clocks and it's possible the same limitation
> is applied to ->round_rate().
>
> I also tried to find documentation about the requirements for those
> APIs, but failed (maybe was not pursuing enough, dunno). If you happen
> to know the one, can you point on it?

To me it seems to be unlikely to be related to round_rate(). It seems
more likely that my patch causes us to never actually set the clock
rate (e.g. because uartclk was initialized to the intended clock rate
instead of the current actual clock rate). It should be possible to
confirm by checking the behavior with my patch with `&& p->uartclk !=
rate` removed, which I would expect to unbreak Hans's scenario. If my
hypothesis is correct, the fix might involve querying the clock with
clk_get_rate() in the if instead of reading from uartclk.

Peter
Hans de Goede March 28, 2024, 12:35 p.m. UTC | #3
Hi,

On 3/28/24 8:10 AM, Hans de Goede wrote:
> Hi,
> 
> On 3/18/24 7:52 PM, Peter Collingbourne wrote:
>> On Mon, Mar 18, 2024 at 3:36 AM Andy Shevchenko
>> <andriy.shevchenko@linux.intel.com> wrote:
>>>
>>> [...]
>>
>> To me it seems to be unlikely to be related to round_rate(). It seems
>> more likely that my patch causes us to never actually set the clock
>> rate (e.g. because uartclk was initialized to the intended clock rate
>> instead of the current actual clock rate).
> 
> I agree that the likely cause is that we never set the clk-rate. I'm not
> sure if the issue is us never actually calling clk_set_rate() or if
> the issue is that by never calling clk_set_rate() dw8250_clk_notifier_cb()
> never gets called and thus we never call serial8250_update_uartclk()
> 
>> It should be possible to
>> confirm by checking the behavior with my patch with `&& p->uartclk !=
>> rate` removed, which I would expect to unbreak Hans's scenario. If my
>> hypothesis is correct, the fix might involve querying the clock with
>> clk_get_rate() in the if instead of reading from uartclk.
> 
> Querying the clk with clk_get_rate() instead of reading it from
> uartclk will not help as uartclk gets initialized with clk_get_rate()
> in dw8250_probe(). So I believe that in my scenario clk_get_rate()
> already returns the desired rate causing us to never call clk_set_rate()
> at all which leaves 2 possible root causes for the regressions:
> 
> 1. The clk generator has non readable registers and the returned
> rate from clk_get_rate() is a default rate and the actual hw is
> programmed differently, iow we need to call clk_set_rate() at
> least once on this hw to ensure that the clk generator is programmed
> properly.
> 
> 2. The 8250 code is not working as it should because
> serial8250_update_uartclk() has never been called.

Ok, so it looks like this actually is an issue with how clk_round_rate()
works on this hw (atm, maybe the clk driver needs fixing).

I have added the following to debug this:

diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index a3acbf0f5da1..3152872e50b2 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -306,6 +306,8 @@ static void dw8250_clk_work_cb(struct work_struct *work)
 	if (rate <= 0)
 		return;
 
+	pr_info("uartclk work_cb clk_get_rate() returns: %ld\n", rate);
+
 	up = serial8250_get_port(d->data.line);
 
 	serial8250_update_uartclk(&up->port, rate);
@@ -353,11 +355,15 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
 {
 	unsigned long newrate = tty_termios_baud_rate(termios) * 16;
 	struct dw8250_data *d = to_dw8250_data(p->private_data);
+	unsigned long currentrate = clk_get_rate(d->clk);
 	long rate;
 	int ret;
 
+
 	rate = clk_round_rate(d->clk, newrate);
-	if (rate > 0 && p->uartclk != rate) {
+	pr_info("uartclk set_termios new: %ld new-rounded: %ld current %ld cached %d\n",
+		newrate, rate, currentrate, p->uartclk);
+	if (rate > 0) {
 		clk_disable_unprepare(d->clk);
 		/*
 		 * Note that any clock-notifer worker will block in
@@ -593,6 +599,8 @@ static int dw8250_probe(struct platform_device *pdev)
 	if (!p->uartclk)
 		return dev_err_probe(dev, -EINVAL, "clock rate not defined\n");
 
+	pr_info("uartclk initial cached %d\n", p->uartclk);
+
 	data->pclk = devm_clk_get_optional_enabled(dev, "apb_pclk");
 	if (IS_ERR(data->pclk))
 		return PTR_ERR(data->pclk);

And then I get the following output:

[    3.119182] uartclk initial cached 44236800
[    3.139923] uartclk work_cb clk_get_rate() returns: 44236800
[    3.152469] uartclk initial cached 44236800
[    3.172165] uartclk work_cb clk_get_rate() returns: 44236800
[   34.128257] uartclk set_termios new: 153600 new-rounded: 44236800 current 44236800 cached 44236800
[   34.130039] uartclk work_cb clk_get_rate() returns: 153600
[   34.131975] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132091] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132140] uartclk set_termios new: 153600 new-rounded: 153600 current 153600 cached 153600
[   34.132187] uartclk set_termios new: 1843200 new-rounded: 153600 current 153600 cached 153600
[   34.133536] uartclk work_cb clk_get_rate() returns: 1843200

Notice how the new-rounded value is just the current rate of the clk,
rather than a rounded value of new.

I'm not familiar enough with the clk framework to debug this further.

Peter, IMHO we really must revert your commit since it is completely
breaking UARTs on many different Intel boards. Can you please give your
ack for reverting this for now ?

Regards,

Hans


p.s.

For anyone who wants to dive into the clk_round_rate() issue deeper,
the code registering the involved clks is here:

drivers/acpi/acpi_lpss.c: register_device_clock()

And for the clocks in question fixed_clk_rate is 0 and both
the LPSS_CLK_GATE and LPSS_CLK_DIVIDER flags are set, so
for a single UART I get:

[root@fedora ~]# ls -d /sys/kernel/debug/clk/80860F0A:01*
/sys/kernel/debug/clk/80860F0A:01      /sys/kernel/debug/clk/80860F0A:01-update
/sys/kernel/debug/clk/80860F0A:01-div

With the 80860F0A:01-update clk being the clk which is
actually used / controlled by the 8250_dw.c code.
Hans de Goede March 29, 2024, 11:42 a.m. UTC | #4
Hi Peter,

On 3/29/24 3:35 AM, Peter Collingbourne wrote:
> On Thu, Mar 28, 2024 at 5:35 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> [...]
>>
>> Peter, IMHO we really must revert your commit since it is completely
>> breaking UARTs on many different Intel boards. Can you please give your
>> ack for reverting this for now ?
> 
> That's fine with me.

Great, thank you.

> I will try to dig into the code soon to figure
> out what is going on unless someone gets there first.

Thinking some more about this I think the following might
be going on (this is only a theory I have, not sure at all):

The 80860F0A:01-update clk itself does not allow
changing its rate, which is why clk_round_rate() is simply
returning its current rate.

But its parent, 80860F0A:01-div, does allow changing its
rate and it has only 1 child / consumer,
the 80860F0A:01-update clk. So because of this
clk_set_rate() propagates the clk_set_rate() call
the 8250_dw code does on 80860F0A:01-update to
80860F0A:01-div, which is ok to do since there
are no competing consumers who would be affected
by the clk-rate change.

And then the propagated clk_set_rate() call on
80860F0A:01-div successfully updates the clk-rate
and a get_rate on 80860F0A:01-update (which AFAICT
is just a gate after the divider) now returns
the new rate.

Again just a theory but this would explain the weird
clk_get_rate() behavior.

In case it helps here is the clk chain for the
8250_dw clk:

lpss_clk (fixed 100MHz) ->
80860F0A:01 (gate only?) ->
80860F0A:01-div (working set_rate()) ->
80860F0A:01-update (gate only ?) ->
8500_dw-baudclk
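
The theory above can be sketched as a toy clock tree in userspace C. This
is only an illustration of the hypothesis, not the kernel clk framework:
the toy_* types and functions are hypothetical, the divider accepts any
requested rate (a real divider would round to what the hardware can
produce), and the 44236800 starting rate is taken from the debug log
earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_FIXED, TOY_GATE, TOY_DIVIDER };

/* Toy clock node; NOT the kernel's struct clk or struct clk_hw. */
struct toy_clk {
	const char *name;
	enum toy_kind kind;
	struct toy_clk *parent;
	unsigned long rate;	/* used by TOY_FIXED and TOY_DIVIDER only */
};

/* A gate has no rate of its own; it passes its parent's rate through. */
static unsigned long toy_get_rate(const struct toy_clk *clk)
{
	assert(clk != NULL);
	if (clk->kind == TOY_GATE)
		return clk->parent ? toy_get_rate(clk->parent) : 0;
	return clk->rate;
}

/* Only the divider "rounds"; every other node reports its current rate. */
static long toy_round_rate(const struct toy_clk *clk, unsigned long want)
{
	if (clk->kind == TOY_DIVIDER)
		return (long)want;
	return (long)toy_get_rate(clk);
}

/* A gate forwards the request to its parent; the divider applies it. */
static int toy_set_rate(struct toy_clk *clk, unsigned long want)
{
	switch (clk->kind) {
	case TOY_DIVIDER:
		clk->rate = want;
		return 0;
	case TOY_GATE:
		return clk->parent ? toy_set_rate(clk->parent, want) : -1;
	default:
		return -1;	/* fixed-rate clocks cannot be changed */
	}
}

/* Build the chain from the mail and request a new rate on the leaf clock. */
static int toy_chain_demo(unsigned long want, long *rounded,
			  unsigned long *after)
{
	struct toy_clk lpss = { "lpss_clk", TOY_FIXED, NULL, 100000000UL };
	struct toy_clk gate = { "80860F0A:01", TOY_GATE, &lpss, 0 };
	struct toy_clk div = { "80860F0A:01-div", TOY_DIVIDER, &gate,
			       44236800UL };
	struct toy_clk update = { "80860F0A:01-update", TOY_GATE, &div, 0 };
	int ret;

	*rounded = toy_round_rate(&update, want);	/* before the change */
	ret = toy_set_rate(&update, want);
	*after = toy_get_rate(&update);			/* after the change */
	return ret;
}
```

Calling toy_chain_demo(153600, &rounded, &after) in this model gives
rounded == 44236800 (the leaf reports its current rate) and after == 153600
(the set_rate request reached the divider), which is the pattern the debug
log shows.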

> Acked-by: Peter Collingbourne <pcc@google.com>

Thanks. Greg can we get this merged please
(it is a regression fix for a 6.8 regression) ?


Regards,

Hans

Linux regression tracking (Thorsten Leemhuis) April 5, 2024, 6:14 a.m. UTC | #5
On 29.03.24 13:12, Greg Kroah-Hartman wrote:
> On Fri, Mar 29, 2024 at 12:42:14PM +0100, Hans de Goede wrote:
>> On 3/29/24 3:35 AM, Peter Collingbourne wrote:
>>> On Thu, Mar 28, 2024 at 5:35 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>>> On 3/28/24 8:10 AM, Hans de Goede wrote:
>>>>> On 3/18/24 7:52 PM, Peter Collingbourne wrote:
>>>>>> On Mon, Mar 18, 2024 at 3:36 AM Andy Shevchenko
>>>>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>>>>>
>>>>>>> On Sun, Mar 17, 2024 at 10:41:23PM +0100, Hans de Goede wrote:
>>>>>>>> Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
>>>>>>>> correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
>>>>>>>> Cherry Trail (CHT) SoCs.
>>>>>>>>
>>>>>>>> Before this change the RTL8732BS Bluetooth HCI which is found
>>>>>>>> connected over the dw UART on both BYT and CHT boards works properly:
>>>>>>>>
>>>>>>>> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
>>>>>>>> Bluetooth: hci0: RTL: rom_version status=0 version=1
>>>>>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
>>>>>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
>>>>>>>> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
>>>>>>>> Bluetooth: hci0: RTL: fw version 0x365d462e
>>>>>>>>
>>>>>>>> where as after this change probing it fails:
> [...]
>>> Acked-by: Peter Collingbourne <pcc@google.com>
>>
>> Thanks. Greg can we get this merged please
>> (it is a regression fix for a 6.8 regression) ?
> 
> Will queue it up soon, thanks.

You are obviously busy (we really need to enhance Git so it can clone
humans, too!), nevertheless: friendly reminder that that fix afaics
still is not queued.

Side note: there is another fix for a serial 6.8 regression I track
waiting for review here:
https://lore.kernel.org/linux-serial/20240325071649.27040-1-tony@atomide.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
Greg KH April 5, 2024, 6:42 a.m. UTC | #6
On Fri, Apr 05, 2024 at 08:14:03AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 29.03.24 13:12, Greg Kroah-Hartman wrote:
> > On Fri, Mar 29, 2024 at 12:42:14PM +0100, Hans de Goede wrote:
> >> On 3/29/24 3:35 AM, Peter Collingbourne wrote:
> >>> On Thu, Mar 28, 2024 at 5:35 AM Hans de Goede <hdegoede@redhat.com> wrote:
> >>>> On 3/28/24 8:10 AM, Hans de Goede wrote:
> >>>>> On 3/18/24 7:52 PM, Peter Collingbourne wrote:
> >>>>>> On Mon, Mar 18, 2024 at 3:36 AM Andy Shevchenko
> >>>>>> <andriy.shevchenko@linux.intel.com> wrote:
> >>>>>>>
> >>>>>>> On Sun, Mar 17, 2024 at 10:41:23PM +0100, Hans de Goede wrote:
> >>>>>>>> Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
> >>>>>>>> correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
> >>>>>>>> Cherry Trail (CHT) SoCs.
> >>>>>>>>
> >>>>>>>> Before this change the RTL8732BS Bluetooth HCI which is found
> >>>>>>>> connected over the dw UART on both BYT and CHT boards works properly:
> >>>>>>>>
> >>>>>>>> Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
> >>>>>>>> Bluetooth: hci0: RTL: rom_version status=0 version=1
> >>>>>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
> >>>>>>>> Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
> >>>>>>>> Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
> >>>>>>>> Bluetooth: hci0: RTL: fw version 0x365d462e
> >>>>>>>>
> >>>>>>>> where as after this change probing it fails:
> > [...]
> >>> Acked-by: Peter Collingbourne <pcc@google.com>
> >>
> >> Thanks. Greg can we get this merged please
> >> (it is a regression fix for a 6.8 regression) ?
> > 
> > Will queue it up soon, thanks.
> 
> You are obviously busy (we really need to enhance Git so it can clone
> humans, too!), nevertheless: friendly reminder that that fix afaics
> still is not queued.

"soon" is relative :)

> Side note: there is another fix for a serial 6.8 regression I track
> waiting for review here:
> https://lore.kernel.org/linux-serial/20240325071649.27040-1-tony@atomide.com/

It's in my queue, I'll try to get to serial stuff later today, but not
promising anything...

thanks,

greg k-h

Patch

diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index c1d43f040c43..2d1f350a4bea 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -357,9 +357,9 @@  static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
 	long rate;
 	int ret;
 
+	clk_disable_unprepare(d->clk);
 	rate = clk_round_rate(d->clk, newrate);
-	if (rate > 0 && p->uartclk != rate) {
-		clk_disable_unprepare(d->clk);
+	if (rate > 0) {
 		/*
 		 * Note that any clock-notifer worker will block in
 		 * serial8250_update_uartclk() until we are done.
@@ -367,8 +367,8 @@  static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
 		ret = clk_set_rate(d->clk, newrate);
 		if (!ret)
 			p->uartclk = rate;
-		clk_prepare_enable(d->clk);
 	}
+	clk_prepare_enable(d->clk);
 
 	dw8250_do_set_termios(p, termios, old);
 }