Message ID | alpine.DEB.2.21.2106071700090.1601@angie.orcam.me.uk |
---|---|
Headers | show |
Series | serial: 8250: Fixes for Oxford Semiconductor 950 UARTs | expand |
On Thu, Jun 10, 2021 at 08:38:55PM +0200, Maciej W. Rozycki wrote: > Hi, > > In the course of verifying support for SMSC FDC37M817 Super I/O chip's > data rate of 460800bps with an EXSYS EX-44072 option card based on the > Oxford Semiconductor OXPCIe952 device I have observed the inability to > transfer any data correctly at this rate between the two devices. Lower > rates, including 230400bps, appeared to work correctly, also with other > kinds of serial ports referred to with my previous patch series, which > strongly indicated something being wrong with the Oxford device. > > In the end I have tracked the issue down to our baud base set to 4000000 > for the device being off by 2.4%. Enough for an incorrect divisor value > of 9 to be chosen to yield the bit rate of 460800bps being particularly > inaccurate for the baud base selected and caused the actual rate of > 434027.78bps of being used, -5.8% off. Consequently the ten bits of data > sent with every 8-bit character were enough for the drift to accumulate up > to the two ends to get out of sync, with the stop bit interpreted as bit 7 > of data. Obviously whoever wrote this code never actually tried higher > data rates, or only connected Oxford devices to each other causing the > systematic errors at both ends to cancel each other. > > With the baud base corrected to the chip's default of 3906250 for the 650 > mode which we use (the reset default for the initial 450 mode is 115314, > approximated for maximum backwards compatibility with legacy OS drivers by > dividing the device's 62.5MHz clock by 33.875), the new calculated divisor > value and the new actual bit rate became 8 and 488281.25bps respectively. > Now +5.96% off, so the stop bit could be missed causing data corruption > with continuous data streams, but at least that could be worked around by > using two stop bits instead. Not a good solution though. > > So I chose to implement proper clock handling for the chip. The bit rate > with this device is worked out from the 62.5MHz clock first by choosing an > oversampling rate between 4 and 16 inclusive, then by the clock prescaler > between 1 and 63.875 in increments of 0.125, and finally a 16-bit unsigned > divisor, all of which divide the input clock by the respective value. > > By choosing the right values of these three parameters either exact or > highly-accurate actual bit rates can be programmed for standard and many > non-standard rates from 1bps up to 15625000bps, e.g. for the data rate of > 460800bps concerned here I was able to get the accuracy of 0.0064% by > choosing the values of 7, 3.875, and 5 respectively for the oversampling > rate, the clock prescaler, and the clock divisor. > > Additionally even with my considerably mighty POWER9 box I have observed > frequent input overruns with the bit rates of 460800bps and higher, and I > have noticed we have the receive interrupt trigger level set particularly > high in terms of FIFO usage percentage for 16C950 UARTs and then we don't > make the levels configurable. Lowering the default to a saner value made > the overruns go away altogether for rates below 921600bps. As I've only > verified these changes in terminal environment rather than with modems I > could not make use of hardware flow control which this chip supports and > which I presume would prevent overruns from happening even with higher bit > rates. > > There's more that could be done here, for example we don't make use of > the 950 mode where FIFO trigger levels can be fine-tuned in increments of > 1, which, interestingly, could help with the lower rate modes as reception > is quite choppy with them, owing to the minimum receive interrupt trigger > level of 16 in the 650 mode. I gave the 950 mode a try, but it made the > chip freeze frequently until new data was received, so clearly I must have > missed something in the chip's configuration, which I did not investigate. > Something for a different time perhaps then. > > I have verified these changes for standard termios rates between 300bps > and 460800bps with my WTI CPM-800 site manager device and my Malta board's > serial ports, as suitable, all working flawlessly in terminal mode now. > I have verified standard termios rates between 500000bps and 4000000bps as > well, however for the lack of other high-speed hardware with a pair of > Oxford devices only. Except for input overruns noted above and growing in > numbers as the rate increased rates of up to 3500000bps worked flawlessly. > In particular the rate of 576000bps, still without input overruns, gave > this nice feeling as if working with a virtual terminal rather than over a > serial line! > > Conversely the rate of 4000000bps showed significant data corruption, > happening randomly, i.e. some characters went through just fine, while > other ones became garbled in no particular pattern, unlike with the rate > inaccuracy described above. Also with no input overruns whatsoever. I > have double-checked that all the three parameters making up the bit rate > from the clock rate have been programmed correctly. > > Therefore I have concluded this is not an issue with my change (or indeed > any other part the driver) and it is simply that the rate has exceeded > either the maximum frequency the EX-44072 board's serial transceivers > support (I haven't checked the chip types used and I can't persuade myself > to disassemble the system just to have a look at the board again), or the > bandwidth of the transmission line used (a flat 8-core telephone cable of > a standard Cisco console cable assembly). Not an issue to be addressed in > software and I find it rather astonishing anyway it worked so well for up > to 3.5MHz already! > > I have no modems, so I couldn't verify DCE interoperation, but I don't > expect issues with the bit rates being more accurate now, or the default > FIFO receiver trigger level tweaked to be more conservative. > > Finally the 16-bit UART_DIV_MAX limitation of the baud rate requested > with `serial8250_get_baud_rate' makes the standard rates of 200bps and > lower inaccessible in the regular way with the baud base of 15625000. > That could be avoided by tweaking our 8250 driver core appropriately, but > I have figured out with modern serial port usage that would not be the > best use of my time. Someone who does have a real need to use an Oxford > device at these low rates can step in and make the necessary chances. > > Meanwhile I took advantage of the ancient spd_cust feature we thankfully > continue supporting and actually did verify not only the standard rates > between 50bps and 200bps, but the rates of 4bps and 2bps as well, using my > old x86 server's serial port with the baud base of 115200. That was, > ahem, an interesting experience both by itself and also with 2bps, which > revealed a phenomenon with SMSC Super I/O ports not working as documented > (already noted in the preceding patch series). Eventually I verified the > 2bps rate with a plain ISA multi I/O card and its 16450 UART my EISA 486 > box has as the remote console, which does support the divisor value of > 57600 required. > > See individual change descriptions for further details including figures. > > Please apply. This patch series causes the following build warning to be added: drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’: drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow] 1258 | up->mcr_mask = ~UART_MCR_CLKSEL; | ^ Can you fix this up and resend? thanks, greg k-h
On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote: > This patch series causes the following build warning to be added: > > drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’: > drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow] > 1258 | up->mcr_mask = ~UART_MCR_CLKSEL; > | ^ > > > Can you fix this up and resend? I've seen that, but that's not a problem with my change, but rather with <linux/serial_reg.h> making this macro (and the remaining ones from this group) expand to a signed constant (0x80 rather than 0x80u). I can fix the header, but that would be a separate change, and mind too that this is a user header, so it's not clear to me what the impact might be on user apps making use of it. We could use a GCC pragma to suppress the warning temporarily across this piece of code, but it's not clear to me either what our policy has been on such approach. Thoughts? NB casting UART_MCR_CLKSEL here to an unsigned type does not help as GCC still sees the original constant through the cast; I've already tried that of course. Last but not least: do we need to have this warning enabled in the first place? Maciej
On Tue, Jun 15, 2021 at 04:19:03PM +0200, Maciej W. Rozycki wrote: > On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote: > > > This patch series causes the following build warning to be added: > > > > drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’: > > drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow] > > 1258 | up->mcr_mask = ~UART_MCR_CLKSEL; > > | ^ > > > > > > Can you fix this up and resend? > > I've seen that, but that's not a problem with my change, but rather with > <linux/serial_reg.h> making this macro (and the remaining ones from this > group) expand to a signed constant (0x80 rather than 0x80u). As your change causes it to show up, it must have something to do with it :) > I can fix the header, but that would be a separate change, and mind too > that this is a user header, so it's not clear to me what the impact might > be on user apps making use of it. You can not change the uapi header, why would you want to? > We could use a GCC pragma to suppress the warning temporarily across this > piece of code, but it's not clear to me either what our policy has been on > such approach. What pragma? > Thoughts? Why does your change cause this to show up? > NB casting UART_MCR_CLKSEL here to an unsigned type does not help as GCC > still sees the original constant through the cast; I've already tried that > of course. > > Last but not least: do we need to have this warning enabled in the first > place? No idea, but that's a different discussion, with a different group of people :) thanks, greg k-h
On Tue, 15 Jun 2021, Greg Kroah-Hartman wrote: > > > This patch series causes the following build warning to be added: > > > > > > drivers/tty/serial/8250/8250_pci.c: In function ‘pci_oxsemi_tornado_setup’: > > > drivers/tty/serial/8250/8250_pci.c:1258:32: warning: unsigned conversion from ‘int’ to ‘unsigned char’ changes value from ‘-129’ to ‘127’ [-Woverflow] > > > 1258 | up->mcr_mask = ~UART_MCR_CLKSEL; > > > | ^ > > > > > > > > > Can you fix this up and resend? > > > > I've seen that, but that's not a problem with my change, but rather with > > <linux/serial_reg.h> making this macro (and the remaining ones from this > > group) expand to a signed constant (0x80 rather than 0x80u). > > As your change causes it to show up, it must have something to do with > it :) Of course it does, but the problem comes from the data type signedness difference between the `mcr_mask' member of `struct uart_8250_port', the type of which is (rightfully IMO) `unsigned char' (rather than `char' or `signed char') and the UART_MCR_CLKSEL macro, which expands to a signed int. My change does not introduce this data type difference, hence it's not responsible for the problem, though it does expose it. > > I can fix the header, but that would be a separate change, and mind too > > that this is a user header, so it's not clear to me what the impact might > > be on user apps making use of it. > > You can not change the uapi header, why would you want to? To make the data type of the constants it defines such that they can be assigned to program entities they are supposed to be used with without changing the sign at truncation time? > > We could use a GCC pragma to suppress the warning temporarily across this > > piece of code, but it's not clear to me either what our policy has been on > > such approach. > > What pragma? #pragma GCC diagnostic ignored "-Woverflow" > > Thoughts? > > Why does your change cause this to show up? As I have noted above there is a data type signedness difference between `mcr_mask' and UART_MCR_CLKSEL. So we have the value of 0x80 (128). Once bitwise-complemented it becomes 0xffffff7f (-129). Once assigned to `mcr_mask' however it becomes 0x7f (127), which is considered an unsafe conversion between signed and unsigned integers by GCC, which is why the compiler complains about it. The same difference exists with say UART_MCR_OUT2 used in a similar manner for ALPHA_KLUDGE_MCR, but GCC does not get noisy about it because the constant UART_MCR_OUT2 expands to is 0x08 and therefore the position of the bit set there does not coincide with the sign bit once truncated to 8 bits, so the truncation does not cause a sign change. The same warning would trigger however if the constant were left-shifted by 4 before the bitwise complement operation, so all these constants should be unsigned. It does not make sense IMO to operate on signed values in the context of bit patterns for peripheral hardware registers. I'll find a way to paper it over if this is what is desired here, e.g. I guess this piece: up->mcr_mask = ~0; up->mcr_mask ^= UART_MCR_CLKSEL; will do, although I find it obscurer than my original proposal, and surely asking for a comment (which I think is a sign of a problem). Maciej
From: Maciej W. Rozycki > Sent: 15 June 2021 18:13 ... > As I have noted above there is a data type signedness difference between > `mcr_mask' and UART_MCR_CLKSEL. So we have the value of 0x80 (128). > Once bitwise-complemented it becomes 0xffffff7f (-129). Once assigned to > `mcr_mask' however it becomes 0x7f (127), which is considered an unsafe > conversion between signed and unsigned integers by GCC, which is why the > compiler complains about it. Blame the iso C standards committee for making integer promotions 'value preserving' instead of 'sign preserving' as they were in K&R C. Try using ^ 0xffu instead of ~. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Tue, 15 Jun 2021, David Laight wrote: > > As I have noted above there is a data type signedness difference between > > `mcr_mask' and UART_MCR_CLKSEL. So we have the value of 0x80 (128). > > Once bitwise-complemented it becomes 0xffffff7f (-129). Once assigned to > > `mcr_mask' however it becomes 0x7f (127), which is considered an unsafe > > conversion between signed and unsigned integers by GCC, which is why the > > compiler complains about it. > > Blame the iso C standards committee for making integer promotions > 'value preserving' instead of 'sign preserving' as they were in K&R C. > > Try using ^ 0xffu instead of ~. Hmm, this is probably the least horrible way to paper it over, thanks. Even using a temporary variable (let alone a cast) does not help as GCC sees through it, and I've given up on converting <linux/serial_reg.h> to have e.g.: #define UART_MCR_CLKSEL _UL(0x80) /* Divide clock by 4 (TI16C752, EFR[4]=1) */ as I find that too messy with many of the comments wrapping up. And using a GCC pragma would require a messy compiler version check. I have posted an update with a path-of-least-resistance fix then along with the 4/4 as v2 of this series. Maciej