mbox series

[0/4] serial: 8250: Fixes for Oxford Semiconductor 950 UARTs

Message ID alpine.DEB.2.21.2106071700090.1601@angie.orcam.me.uk
Headers show
Series serial: 8250: Fixes for Oxford Semiconductor 950 UARTs | expand

Message

Maciej W. Rozycki June 10, 2021, 6:38 p.m. UTC
Hi,

 In the course of verifying support for SMSC FDC37M817 Super I/O chip's 
data rate of 460800bps with an EXSYS EX-44072 option card based on the 
Oxford Semiconductor OXPCIe952 device I have observed the inability to 
transfer any data correctly at this rate between the two devices.  Lower 
rates, including 230400bps, appeared to work correctly, also with other 
kinds of serial ports referred to with my previous patch series, which 
strongly indicated something being wrong with the Oxford device.

 In the end I have tracked the issue down to our baud base set to 4000000 
for the device being off by 2.4%.  Enough for an incorrect divisor value 
of 9 to be chosen to yield the bit rate of 460800bps being particularly 
inaccurate for the baud base selected and caused the actual rate of 
434027.78bps of being used, -5.8% off.  Consequently the ten bits of data 
sent with every 8-bit character were enough for the drift to accumulate up 
to the two ends to get out of sync, with the stop bit interpreted as bit 7 
of data.  Obviously whoever wrote this code never actually tried higher 
data rates, or only connected Oxford devices to each other causing the 
systematic errors at both ends to cancel each other.

 With the baud base corrected to the chip's default of 3906250 for the 650 
mode which we use (the reset default for the initial 450 mode is 115314, 
approximated for maximum backwards compatibility with legacy OS drivers by 
dividing the device's 62.5MHz clock by 33.875), the new calculated divisor 
value and the new actual bit rate became 8 and 488281.25bps respectively.  
Now +5.96% off, so the stop bit could be missed causing data corruption 
with continuous data streams, but at least that could be worked around by 
using two stop bits instead.  Not a good solution though.

 So I chose to implement proper clock handling for the chip.  The bit rate 
with this device is worked out from the 62.5MHz clock first by choosing an 
oversampling rate between 4 and 16 inclusive, then by the clock prescaler 
between 1 and 63.875 in increments of 0.125, and finally a 16-bit unsigned 
divisor, all of which divide the input clock by the respective value.

 By choosing the right values of these three parameters either exact or 
highly-accurate actual bit rates can be programmed for standard and many 
non-standard rates from 1bps up to 15625000bps, e.g. for the data rate of 
460800bps concerned here I was able to get the accuracy of 0.0064% by 
choosing the values of 7, 3.875, and 5 respectively for the oversampling 
rate, the clock prescaler, and the clock divisor.

 Additionally even with my considerably mighty POWER9 box I have observed 
frequent input overruns with the bit rates of 460800bps and higher, and I 
have noticed we have the receive interrupt trigger level set particularly 
high in terms of FIFO usage percentage for 16C950 UARTs and then we don't 
make the levels configurable.  Lowering the default to a saner value made
the overruns go away altogether for rates below 921600bps.  As I've only 
verified these changes in terminal environment rather than with modems I 
could not make use of hardware flow control which this chip supports and 
which I presume would prevent overruns from happening even with higher bit 
rates.

 There's more that could be done here, for example we don't make use of 
the 950 mode where FIFO trigger levels can be fine-tuned in increments of 
1, which, interestingly, could help with the lower rate modes as reception 
is quite choppy with them, owing to the minimum receive interrupt trigger 
level of 16 in the 650 mode.  I gave the 950 mode a try, but it made the 
chip freeze frequently until new data was received, so clearly I must have 
missed something in the chip's configuration, which I did not investigate.  
Something for a different time perhaps then.

 I have verified these changes for standard termios rates between 300bps 
and 460800bps with my WTI CPM-800 site manager device and my Malta board's 
serial ports, as suitable, all working flawlessly in terminal mode now.  
I have verified standard termios rates between 500000bps and 4000000bps as 
well, however for the lack of other high-speed hardware with a pair of 
Oxford devices only.  Except for input overruns noted above and growing in 
numbers as the rate increased rates of up to 3500000bps worked flawlessly.  
In particular the rate of 576000bps, still without input overruns, gave 
this nice feeling as if working with a virtual terminal rather than over a 
serial line!

 Conversely the rate of 4000000bps showed significant data corruption, 
happening randomly, i.e. some characters went through just fine, while 
other ones became garbled in no particular pattern, unlike with the rate 
inaccuracy described above.  Also with no input overruns whatsoever.  I 
have double-checked that all the three parameters making up the bit rate 
from the clock rate have been programmed correctly.

 Therefore I have concluded this is not an issue with my change (or indeed 
any other part the driver) and it is simply that the rate has exceeded 
either the maximum frequency the EX-44072 board's serial transceivers 
support (I haven't checked the chip types used and I can't persuade myself 
to disassemble the system just to have a look at the board again), or the 
bandwidth of the transmission line used (a flat 8-core telephone cable of 
a standard Cisco console cable assembly).  Not an issue to be addressed in 
software and I find it rather astonishing anyway it worked so well for up 
to 3.5MHz already!

 I have no modems, so I couldn't verify DCE interoperation, but I don't 
expect issues with the bit rates being more accurate now, or the default 
FIFO receiver trigger level tweaked to be more conservative.

 Finally the 16-bit UART_DIV_MAX limitation of the baud rate requested 
with `serial8250_get_baud_rate' makes the standard rates of 200bps and 
lower inaccessible in the regular way with the baud base of 15625000.  
That could be avoided by tweaking our 8250 driver core appropriately, but 
I have figured out with modern serial port usage that would not be the 
best use of my time.  Someone who does have a real need to use an Oxford 
device at these low rates can step in and make the necessary chances.

 Meanwhile I took advantage of the ancient spd_cust feature we thankfully 
continue supporting and actually did verify not only the standard rates 
between 50bps and 200bps, but the rates of 4bps and 2bps as well, using my 
old x86 server's serial port with the baud base of 115200.  That was, 
ahem, an interesting experience both by itself and also with 2bps, which 
revealed a phenomenon with SMSC Super I/O ports not working as documented 
(already noted in the preceding patch series).  Eventually I verified the 
2bps rate with a plain ISA multi I/O card and its 16450 UART my EISA 486 
box has as the remote console, which does support the divisor value of 
57600 required.

 See individual change descriptions for further details including figures.

 Please apply.

  Maciej

Comments

Greg KH June 15, 2021, 12:07 p.m. UTC | #1
On Thu, Jun 10, 2021 at 08:38:55PM +0200, Maciej W. Rozycki wrote:
> Hi,

> 

>  In the course of verifying support for SMSC FDC37M817 Super I/O chip's 

> data rate of 460800bps with an EXSYS EX-44072 option card based on the 

> Oxford Semiconductor OXPCIe952 device I have observed the inability to 

> transfer any data correctly at this rate between the two devices.  Lower 

> rates, including 230400bps, appeared to work correctly, also with other 

> kinds of serial ports referred to with my previous patch series, which 

> strongly indicated something being wrong with the Oxford device.

> 

>  In the end I have tracked the issue down to our baud base set to 4000000 

> for the device being off by 2.4%.  Enough for an incorrect divisor value 

> of 9 to be chosen to yield the bit rate of 460800bps being particularly 

> inaccurate for the baud base selected and caused the actual rate of 

> 434027.78bps of being used, -5.8% off.  Consequently the ten bits of data 

> sent with every 8-bit character were enough for the drift to accumulate up 

> to the two ends to get out of sync, with the stop bit interpreted as bit 7 

> of data.  Obviously whoever wrote this code never actually tried higher 

> data rates, or only connected Oxford devices to each other causing the 

> systematic errors at both ends to cancel each other.

> 

>  With the baud base corrected to the chip's default of 3906250 for the 650 

> mode which we use (the reset default for the initial 450 mode is 115314, 

> approximated for maximum backwards compatibility with legacy OS drivers by 

> dividing the device's 62.5MHz clock by 33.875), the new calculated divisor 

> value and the new actual bit rate became 8 and 488281.25bps respectively.  

> Now +5.96% off, so the stop bit could be missed causing data corruption 

> with continuous data streams, but at least that could be worked around by 

> using two stop bits instead.  Not a good solution though.

> 

>  So I chose to implement proper clock handling for the chip.  The bit rate 

> with this device is worked out from the 62.5MHz clock first by choosing an 

> oversampling rate between 4 and 16 inclusive, then by the clock prescaler 

> between 1 and 63.875 in increments of 0.125, and finally a 16-bit unsigned 

> divisor, all of which divide the input clock by the respective value.

> 

>  By choosing the right values of these three parameters either exact or 

> highly-accurate actual bit rates can be programmed for standard and many 

> non-standard rates from 1bps up to 15625000bps, e.g. for the data rate of 

> 460800bps concerned here I was able to get the accuracy of 0.0064% by 

> choosing the values of 7, 3.875, and 5 respectively for the oversampling 

> rate, the clock prescaler, and the clock divisor.

> 

>  Additionally even with my considerably mighty POWER9 box I have observed 

> frequent input overruns with the bit rates of 460800bps and higher, and I 

> have noticed we have the receive interrupt trigger level set particularly 

> high in terms of FIFO usage percentage for 16C950 UARTs and then we don't 

> make the levels configurable.  Lowering the default to a saner value made

> the overruns go away altogether for rates below 921600bps.  As I've only 

> verified these changes in terminal environment rather than with modems I 

> could not make use of hardware flow control which this chip supports and 

> which I presume would prevent overruns from happening even with higher bit 

> rates.

> 

>  There's more that could be done here, for example we don't make use of 

> the 950 mode where FIFO trigger levels can be fine-tuned in increments of 

> 1, which, interestingly, could help with the lower rate modes as reception 

> is quite choppy with them, owing to the minimum receive interrupt trigger 

> level of 16 in the 650 mode.  I gave the 950 mode a try, but it made the 

> chip freeze frequently until new data was received, so clearly I must have 

> missed something in the chip's configuration, which I did not investigate.  

> Something for a different time perhaps then.

> 

>  I have verified these changes for standard termios rates between 300bps 

> and 460800bps with my WTI CPM-800 site manager device and my Malta board's 

> serial ports, as suitable, all working flawlessly in terminal mode now.  

> I have verified standard termios rates between 500000bps and 4000000bps as 

> well, however for the lack of other high-speed hardware with a pair of 

> Oxford devices only.  Except for input overruns noted above and growing in 

> numbers as the rate increased rates of up to 3500000bps worked flawlessly.  

> In particular the rate of 576000bps, still without input overruns, gave 

> this nice feeling as if working with a virtual terminal rather than over a 

> serial line!

> 

>  Conversely the rate of 4000000bps showed significant data corruption, 

> happening randomly, i.e. some characters went through just fine, while 

> other ones became garbled in no particular pattern, unlike with the rate 

> inaccuracy described above.  Also with no input overruns whatsoever.  I 

> have double-checked that all the three parameters making up the bit rate 

> from the clock rate have been programmed correctly.

> 

>  Therefore I have concluded this is not an issue with my change (or indeed 

> any other part the driver) and it is simply that the rate has exceeded 

> either the maximum frequency the EX-44072 board's serial transceivers 

> support (I haven't checked the chip types used and I can't persuade myself 

> to disassemble the system just to have a look at the board again), or the 

> bandwidth of the transmission line used (a flat 8-core telephone cable of 

> a standard Cisco console cable assembly).  Not an issue to be addressed in 

> software and I find it rather astonishing anyway it worked so well for up 

> to 3.5MHz already!

> 

>  I have no modems, so I couldn't verify DCE interoperation, but I don't 

> expect issues with the bit rates being more accurate now, or the default 

> FIFO receiver trigger level tweaked to be more conservative.

> 

>  Finally the 16-bit UART_DIV_MAX limitation of the baud rate requested 

> with `serial8250_get_baud_rate' makes the standard rates of 200bps and 

> lower inaccessible in the regular way with the baud base of 15625000.  

> That could be avoided by tweaking our 8250 driver core appropriately, but 

> I have figured out with modern serial port usage that would not be the 

> best use of my time.  Someone who does have a real need to use an Oxford 

> device at these low rates can step in and make the necessary chances.

> 

>  Meanwhile I took advantage of the ancient spd_cust feature we thankfully 

> continue supporting and actually did verify not only the standard rates 

> between 50bps and 200bps, but the rates of 4bps and 2bps as well, using my 

> old x86 server's serial port with the baud base of 115200.  That was, 

> ahem, an interesting experience both by itself and also with 2bps, which 

> revealed a phenomenon with SMSC Super I/O ports not working as documented 

> (already noted in the preceding patch series).  Eventually I verified the 

> 2bps rate with a plain ISA multi I/O card and its 16450 UART my EISA 486 

> box has as the remote console, which does support the divisor value of 

> 57600 required.

> 

>  See individual change descriptions for further details including figures.

> 

>  Please apply.


This patch series causes the following build warning to be added:

drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’:
drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow]
 1258 |                 up->mcr_mask = ~UART_MCR_CLKSEL;
      |                                ^


Can you fix this up and resend?

thanks,

greg k-h
Maciej W. Rozycki June 15, 2021, 2:19 p.m. UTC | #2
On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote:

> This patch series causes the following build warning to be added:

> 

> drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’:

> drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow]

>  1258 |                 up->mcr_mask = ~UART_MCR_CLKSEL;

>       |                                ^

> 

> 

> Can you fix this up and resend?


 I've seen that, but that's not a problem with my change, but rather with 
<linux/serial_reg.h> making this macro (and the remaining ones from this 
group) expand to a signed constant (0x80 rather than 0x80u).

 I can fix the header, but that would be a separate change, and mind too 
that this is a user header, so it's not clear to me what the impact might 
be on user apps making use of it.

 We could use a GCC pragma to suppress the warning temporarily across this 
piece of code, but it's not clear to me either what our policy has been on 
such approach.

 Thoughts?

 NB casting UART_MCR_CLKSEL here to an unsigned type does not help as GCC
still sees the original constant through the cast; I've already tried that 
of course.

 Last but not least: do we need to have this warning enabled in the first 
place?

  Maciej
Greg KH June 15, 2021, 3:52 p.m. UTC | #3
On Tue, Jun 15, 2021 at 04:19:03PM +0200, Maciej W. Rozycki wrote:
> On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote:

> 

> > This patch series causes the following build warning to be added:

> > 

> > drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’:

> > drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow]

> >  1258 |                 up->mcr_mask = ~UART_MCR_CLKSEL;

> >       |                                ^

> > 

> > 

> > Can you fix this up and resend?

> 

>  I've seen that, but that's not a problem with my change, but rather with 

> <linux/serial_reg.h> making this macro (and the remaining ones from this 

> group) expand to a signed constant (0x80 rather than 0x80u).


As your change causes it to show up, it must have something to do with
it :)

>  I can fix the header, but that would be a separate change, and mind too 

> that this is a user header, so it's not clear to me what the impact might 

> be on user apps making use of it.


You can not change the uapi header, why would you want to?

>  We could use a GCC pragma to suppress the warning temporarily across this 

> piece of code, but it's not clear to me either what our policy has been on 

> such approach.


What pragma?

>  Thoughts?


Why does your change cause this to show up?

>  NB casting UART_MCR_CLKSEL here to an unsigned type does not help as GCC

> still sees the original constant through the cast; I've already tried that 

> of course.

> 

>  Last but not least: do we need to have this warning enabled in the first 

> place?


No idea, but that's a different discussion, with a different group of
people :)

thanks,

greg k-h
Maciej W. Rozycki June 15, 2021, 5:12 p.m. UTC | #4
On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote:

> > > This patch series causes the following build warning to be added:

> > > 

> > > drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’:

> > > drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow]

> > >  1258 |                 up->mcr_mask = ~UART_MCR_CLKSEL;

> > >       |                                ^

> > > 

> > > 

> > > Can you fix this up and resend?

> > 

> >  I've seen that, but that's not a problem with my change, but rather with 

> > <linux/serial_reg.h> making this macro (and the remaining ones from this 

> > group) expand to a signed constant (0x80 rather than 0x80u).

> 

> As your change causes it to show up, it must have something to do with

> it :)


 Of course it does, but the problem comes from the data type signedness 
difference between the `mcr_mask' member of `struct uart_8250_port', the 
type of which is (rightfully IMO) `unsigned char' (rather than `char' or 
`signed char') and the UART_MCR_CLKSEL macro, which expands to a signed 
int.  My change does not introduce this data type difference, hence it's 
not responsible for the problem, though it does expose it.

> >  I can fix the header, but that would be a separate change, and mind too 

> > that this is a user header, so it's not clear to me what the impact might 

> > be on user apps making use of it.

> 

> You can not change the uapi header, why would you want to?


 To make the data type of the constants it defines such that they can be 
assigned to program entities they are supposed to be used with without 
changing the sign at truncation time?

> >  We could use a GCC pragma to suppress the warning temporarily across this 

> > piece of code, but it's not clear to me either what our policy has been on 

> > such approach.

> 

> What pragma?


#pragma GCC diagnostic ignored "-Woverflow"

> >  Thoughts?

> 

> Why does your change cause this to show up?


 As I have noted above there is a data type signedness difference between 
`mcr_mask' and UART_MCR_CLKSEL.  So we have the value of 0x80 (128).  
Once bitwise-complemented it becomes 0xffffff7f (-129).  Once assigned to 
`mcr_mask' however it becomes 0x7f (127), which is considered an unsafe 
conversion between signed and unsigned integers by GCC, which is why the 
compiler complains about it.

 The same difference exists with say UART_MCR_OUT2 used in a similar
manner for ALPHA_KLUDGE_MCR, but GCC does not get noisy about it because 
the constant UART_MCR_OUT2 expands to is 0x08 and therefore the position 
of the bit set there does not coincide with the sign bit once truncated to 
8 bits, so the truncation does not cause a sign change.  The same warning 
would trigger however if the constant were left-shifted by 4 before the 
bitwise complement operation, so all these constants should be unsigned.  

 It does not make sense IMO to operate on signed values in the context of 
bit patterns for peripheral hardware registers.

 I'll find a way to paper it over if this is what is desired here, e.g. I 
guess this piece:

		up->mcr_mask = ~0;
		up->mcr_mask ^= UART_MCR_CLKSEL;

will do, although I find it obscurer than my original proposal, and surely 
asking for a comment (which I think is a sign of a problem).

  Maciej
David Laight June 15, 2021, 9:45 p.m. UTC | #5
From: Maciej W. Rozycki

> Sent: 15 June 2021 18:13

...
>  As I have noted above there is a data type signedness difference between

> `mcr_mask' and UART_MCR_CLKSEL.  So we have the value of 0x80 (128).

> Once bitwise-complemented it becomes 0xffffff7f (-129).  Once assigned to

> `mcr_mask' however it becomes 0x7f (127), which is considered an unsafe

> conversion between signed and unsigned integers by GCC, which is why the

> compiler complains about it.


Blame the iso C standards committee for making integer promotions
'value preserving' instead of 'sign preserving' as they were in K&R C.

Try using ^ 0xffu instead of ~.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Maciej W. Rozycki June 26, 2021, 4:13 a.m. UTC | #6
On Tue, 15 Jun 2021, David Laight wrote:

> >  As I have noted above there is a data type signedness difference between

> > `mcr_mask' and UART_MCR_CLKSEL.  So we have the value of 0x80 (128).

> > Once bitwise-complemented it becomes 0xffffff7f (-129).  Once assigned to

> > `mcr_mask' however it becomes 0x7f (127), which is considered an unsafe

> > conversion between signed and unsigned integers by GCC, which is why the

> > compiler complains about it.

> 

> Blame the iso C standards committee for making integer promotions

> 'value preserving' instead of 'sign preserving' as they were in K&R C.

> 

> Try using ^ 0xffu instead of ~.


 Hmm, this is probably the least horrible way to paper it over, thanks.  
Even using a temporary variable (let alone a cast) does not help as GCC 
sees through it, and I've given up on converting <linux/serial_reg.h> to 
have e.g.:

#define UART_MCR_CLKSEL		_UL(0x80) /* Divide clock by 4 (TI16C752, EFR[4]=1) */

as I find that too messy with many of the comments wrapping up.  And using 
a GCC pragma would require a messy compiler version check.

 I have posted an update with a path-of-least-resistance fix then along 
with the 4/4 as v2 of this series.

  Maciej