Message ID | 20120912033209.GA32156@lizard |
---|---|
State | New |
Headers | show |
On Tue, Sep 11, 2012 at 8:32 PM, Anton Vorontsov <anton.vorontsov@linaro.org> wrote: > On Tue, Sep 11, 2012 at 03:15:40PM +0100, Alan Cox wrote: >> Anton Vorontsov <anton.vorontsov@linaro.org> wrote: >> > This patch implements a new callback: clear_irqs. It is used for the >> >> This bit I still really don't like. I would like to know what the generic >> IRQ folks thing about it and if Thomas Gleixner has any brilliant ideas >> here. I don't think its a show stopper it would just be nice if there was >> a better solution first. > > Yup, good idea, Cc'ing. > > Hello Thomas, > > We're dissussing a patch that adds a clear_irq callback into UART > drivers. For convenience, the particular patch is inlined at the end of > this email. The rationale and the background for the whole thing can be > found here: http://lkml.org/lkml/2012/9/10/2 > > So, just for visual clearness, and for the fun of it, here is some > glorious ascii art of what we have: > > ,---NMI-|`````| > UART_IRQ---INT_controller | CPU | > `---IRQ-|,,,,,| > > Pretty much standard scheme. That is, on the interrupt controller level > we can reroute any IRQ to NMI, and back in 2008 folks at Google found > that rerouting the UART IRQ to NMI brings some really cool features: we > can have a very reliable and powerful debugger pretty much on every > embedded machine, and without loosing the UART/console port itself. > > So, it works like this: > > - At boot time, Linux arch/ code reroutes UART IRQ to NMI, we connect > the port to the KGDBoC, and so forth...; > - User starts typing on the serial port; > - UART raises its IRQ line; > - Through the controller, one of CPUs gets an NMI; > - In NMI context, CPU reads a character from UART; > - Then it checks if the character resulted into the special 'enter > KGDB' magic sequence: > - If yes, then the CPU invites other CPUs into the KGDB, and thus > kernel enters the debugger; > - If the character wasn't part of the magic command (or the sequence is > yet incomplete), then CPU exits NMI and continues as normal. > > The "problem" is in the last step. If we exit NMI without making UART > know that we're done with the interrupt, we will reenter the NMI > immediately, even without any new characters from the UART. The UART irq line should go low when you read the character out of the receive buffer, or the polling rx function should clear the interrupt for you. If you use a clear_irqs callback, you can drop characters if one arrives between the last character buffer read and calling clear_irqs. > The obvious solution would be to "mask/reroute NMI at INT_controller > level or queue serial port's IRQ routine from somewhere, e.g. a tasklet > or software raised IRQ". Unfortunately, this means that we have to keep > NMI disabled for this long: > > 1. We exit the NMI context with NMI source disabled/rerouted; > 2. CPU continues to execute the kernel; > 3. Kernel receives a timer interrupt, or software-raised interrupt, or > UART IRQ, which was temporary rerouted back to normal interrupts; > 4. It executes normal IRQ-entry, tracing, lots of other stuff, > interrupt handlers, softirq handlers, and thus we clear the UART > interrupt; > 5. Once the UART is cleared, we reenable NMI (in the arch-code, we can > do that in our do_IRQ()); > > As you can see, with this solution the NMI debugger will be effectively > disabled from 1. to 5., thus shall the hang happen in this sensitive > code, we would no longer able to debug it. > > And this defeats the main purpose of the NMI debugger: we must keep NMI > enabled all the time when we're not in the debugger, the NMI debugger > is always available (unless the debugger crashed :-) > > That's why I came with the clear_irq callback in the serial drivers > that we call from the NMI context, it's much simpler and keeps the > debugger robust. So, personally I too can't think of any other > plausible solution that would keep all the features intact. > > > Thanks, > > Anton. > > > - - - - > [PATCH] tty/serial/kgdboc: Add and wire up clear_irqs callback > > This patch implements a new callback: clear_irqs. It is used for the > cases when KDB-entry (e.g. NMI) and KDB IO (e.g. serial port) shares the > same interrupt. To get the idea, let's take some real example (ARM > machine): we have a serial port which interrupt is routed to an NMI, and > the interrupt is used to enter KDB. Once there is some activity on the > serial port, the CPU receives NMI exception, and we fall into KDB shell. > So, it is our "debug console", and it is able to interrupt (and thus > debug) even IRQ handlers themselves. > > When used that way, the interrupt never reaches serial driver's IRQ > handler routine, which means that serial driver will not silence the > interrupt. NMIs behaviour are quite arch-specific, and we can't assume > that we can use them as ordinary IRQs, e.g. on some arches (like ARM) we > can't handle data aborts, the behaviour is undefined then. So we can't > just handle execution to serial driver's IRQ handler from the NMI > context once we're done with KDB (plus this would defeat the debugger's > purpose: we want the NMI handler be as simple as possible, so it will > have less chances to hang). > > So, given that have to deal with it somehow, we have two options: > > 1. Implement something that clears the interrupt; 2. Implement a whole > new concept of grabbing tty for exclusive KDB use, plus implement > mask/unmask callbacks, i.e.: > - Since consoles might use ttys w/o opending them, we would have to > make kdb respect CON_ENABLED flag (maybe a good idea to do it > anyway); > - Add 'bool exclusive' argument to tty_find_polling_driver(), if set > to 1, the function will refuse to return an already opened tty; and > will use the flag in tty_reopen() to not allow multiple users > (there are already checks for pty masters, which are "open once" > ttys); > - Once we got the tty exclusively, we would need to call some new > uart->mask_all_but_rx_interrupts call before we want to use the > port for NMI/KDB, and unmask_all_but_rx_interrupts after we're done > with it. > > The second option is obviously more complex, needlessly so, and less > generic. So I went with the first one: we just consume all the > interrupts. The tty becomes silently unusable for the rest of the world > when we use it with KDB; but once we reroute the serial IRQ source back > from NMI to an ordinary IRQ (in KDB this can be done with 'disable_nmi' > command), it will behave as normal. > > p.s. Since the callback is so far used only by polling users, we place > it under the appropriate #ifdef. > > Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> > --- > drivers/tty/serial/kgdboc.c | 10 ++++++++++ > drivers/tty/serial/serial_core.c | 15 +++++++++++++++ > include/linux/kgdb.h | 1 + > include/linux/serial_core.h | 1 + > include/linux/tty_driver.h | 1 + > 5 files changed, 28 insertions(+) > > diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c > index 2b42a01..0aa08c8 100644 > --- a/drivers/tty/serial/kgdboc.c > +++ b/drivers/tty/serial/kgdboc.c > @@ -227,6 +227,15 @@ static int kgdboc_get_char(void) > kgdb_tty_line); > } > > +static void kgdboc_clear_irqs(void) > +{ > + if (!kgdb_tty_driver) > + return; > + if (kgdb_tty_driver->ops->clear_irqs) > + kgdb_tty_driver->ops->clear_irqs(kgdb_tty_driver, > + kgdb_tty_line); > +} > + > static void kgdboc_put_char(u8 chr) > { > if (!kgdb_tty_driver) > @@ -298,6 +307,7 @@ static struct kgdb_io kgdboc_io_ops = { > .name = "kgdboc", > .read_char = kgdboc_get_char, > .write_char = kgdboc_put_char, > + .clear_irqs = kgdboc_clear_irqs, > .pre_exception = kgdboc_pre_exp_handler, > .post_exception = kgdboc_post_exp_handler, > }; > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c > index dcb2d5a..93c36cb 100644 > --- a/drivers/tty/serial/serial_core.c > +++ b/drivers/tty/serial/serial_core.c > @@ -2187,6 +2187,20 @@ static void uart_poll_put_char(struct tty_driver *driver, int line, char ch) > port = state->uart_port; > port->ops->poll_put_char(port, ch); > } > + > +static void uart_clear_irqs(struct tty_driver *driver, int line) > +{ > + struct uart_driver *drv = driver->driver_state; > + struct uart_state *state = drv->state + line; > + struct uart_port *port; > + > + if (!state || !state->uart_port) > + return; > + > + port = state->uart_port; > + if (port->ops->clear_irqs) > + port->ops->clear_irqs(port); > +} > #endif > > static const struct tty_operations uart_ops = { > @@ -2219,6 +2233,7 @@ static const struct tty_operations uart_ops = { > .poll_init = uart_poll_init, > .poll_get_char = uart_poll_get_char, > .poll_put_char = uart_poll_put_char, > + .clear_irqs = uart_clear_irqs, > #endif > }; > > diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h > index 3b111a6..1fd1cf0 100644 > --- a/include/linux/kgdb.h > +++ b/include/linux/kgdb.h > @@ -295,6 +295,7 @@ struct kgdb_io { > const char *name; > int (*read_char) (void); > void (*write_char) (u8); > + void (*clear_irqs) (void); > void (*flush) (void); > int (*init) (void); > void (*pre_exception) (void); > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h > index 822c887..855fb6e 100644 > --- a/include/linux/serial_core.h > +++ b/include/linux/serial_core.h > @@ -277,6 +277,7 @@ struct uart_ops { > int (*poll_init)(struct uart_port *); > void (*poll_put_char)(struct uart_port *, unsigned char); > int (*poll_get_char)(struct uart_port *); > + void (*clear_irqs)(struct uart_port *); > #endif > }; > > diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h > index dd976cf..42f8a87 100644 > --- a/include/linux/tty_driver.h > +++ b/include/linux/tty_driver.h > @@ -282,6 +282,7 @@ struct tty_operations { > int (*poll_init)(struct tty_driver *driver, int line, char *options); > int (*poll_get_char)(struct tty_driver *driver, int line); > void (*poll_put_char)(struct tty_driver *driver, int line, char ch); > + void (*clear_irqs)(struct tty_driver *driver, int line); > #endif > const struct file_operations *proc_fops; > }; > -- > 1.7.11.5
On Tue, Sep 11, 2012 at 08:42:46PM -0700, Colin Cross wrote: [...] > > The "problem" is in the last step. If we exit NMI without making UART > > know that we're done with the interrupt, we will reenter the NMI > > immediately, even without any new characters from the UART. > > The UART irq line should go low when you read the character out of the Probably some controllers may lower the line by themselves, but not all, and probably most of them need an explicit clear. > receive buffer, or the polling rx function should clear the interrupt > for you. Yes, that's an option. But that way we add a new semantic for the polling routines, and effecitvely we just merge the two callbacks. Of course, if Alan is OK with this, I'm more than OK too. :-) (But the polling routines would need to clear all interrupts, not just rx/tx. For example, if the controller indicated some error, and nobody clears it, then we'll start reentering infinitely.) > If you use a clear_irqs callback, you can drop characters if > one arrives between the last character buffer read and calling > clear_irqs. Only if we call clear_irqs() after reading the characters, but we do it before. So if new characters are available, we will reenter NMI, which is OK. But if used incorrectly, it truly can cause dropping (or staling) of characters, so I'd better add some comments about this. Thanks! Anton.
On Tue, Sep 11, 2012 at 9:06 PM, Anton Vorontsov <anton.vorontsov@linaro.org> wrote: > On Tue, Sep 11, 2012 at 08:42:46PM -0700, Colin Cross wrote: > [...] >> > The "problem" is in the last step. If we exit NMI without making UART >> > know that we're done with the interrupt, we will reenter the NMI >> > immediately, even without any new characters from the UART. >> >> The UART irq line should go low when you read the character out of the > > Probably some controllers may lower the line by themselves, but not > all, and probably most of them need an explicit clear. Anything 8250-based will clear the interrupt automatically, assuming you read the status registers as well as the character register. >> receive buffer, or the polling rx function should clear the interrupt >> for you. > > Yes, that's an option. But that way we add a new semantic for the > polling routines, and effecitvely we just merge the two callbacks. > > Of course, if Alan is OK with this, I'm more than OK too. :-) > > (But the polling routines would need to clear all interrupts, not > just rx/tx. For example, if the controller indicated some error, and > nobody clears it, then we'll start reentering infinitely.) For exynos5, the only non-8250 based serial port I've come across, we clear all interrupts in the rx poll function (see https://android.googlesource.com/kernel/exynos/+/ef427aafffb7153dde59745e440fd7ec41ea969d/arch/arm/mach-exynos/exynos_fiq_debugger.c). >> If you use a clear_irqs callback, you can drop characters if >> one arrives between the last character buffer read and calling >> clear_irqs. > > Only if we call clear_irqs() after reading the characters, but we do > it before. So if new characters are available, we will reenter NMI, > which is OK. > > But if used incorrectly, it truly can cause dropping (or staling) of > characters, so I'd better add some comments about this. What does clear_irqs() mean for a status or tx interrupt? The tx interrupt will generally re-assert as long as the tx fifo is empty, which would require disabling it. On 8250 ports, status interrupts will re-assert until the corresponding status register is read.
On Tue, Sep 11, 2012 at 09:40:35PM -0700, Colin Cross wrote: [..] > >> the polling rx function should clear the interrupt > >> for you. > > > > Yes, that's an option. But that way we add a new semantic for the > > polling routines, and effecitvely we just merge the two callbacks. > > > > Of course, if Alan is OK with this, I'm more than OK too. :-) > > > > (But the polling routines would need to clear all interrupts, not > > just rx/tx. For example, if the controller indicated some error, and > > nobody clears it, then we'll start reentering infinitely.) > > For exynos5, the only non-8250 based serial port I've come across, we > clear all interrupts in the rx poll function (see > https://android.googlesource.com/kernel/exynos/+/ef427aafffb7153dde59745e440fd7ec41ea969d/arch/arm/mach-exynos/exynos_fiq_debugger.c). Yes, but if you'd like to merge your code, some might ask you: why? You'd answer that you need to clear the interrupts, otherwise you'll keep reentering NMI. The next that you might get is this: "this does not belong to the getc callback, it's better to factor it out". :-) And here comes clear_irq() (or alike, see below). > >> If you use a clear_irqs callback, you can drop characters if > >> one arrives between the last character buffer read and calling > >> clear_irqs. > > > > Only if we call clear_irqs() after reading the characters, but we do > > it before. So if new characters are available, we will reenter NMI, > > which is OK. > > > > But if used incorrectly, it truly can cause dropping (or staling) of > > characters, so I'd better add some comments about this. > > What does clear_irqs() mean for a status or tx interrupt? The tx > interrupt will generally re-assert as long as the tx fifo is empty, > which would require disabling it. Yup, and that's exactly what we do: http://lkml.org/lkml/2012/9/11/119 Your words made me think that clear_irq() might be indeed a somewhat inappropriate name. We have to be even stricter on its definition and behaviour. So, returning to your question "What does clear_irqs() mean", I'd answer that: the function must do whatever needed to lower the IRQ line, plus the function must leave the port in the state that it's still able to throw RX interrupts after the call. So, the 100% proper name for this function would be this: quiesce_irqs_but_rx() It's a bit long, but does exactly what the name states. Thanks! Anton.
> Of course, if Alan is OK with this, I'm more than OK too. :-) It may well be better. > (But the polling routines would need to clear all interrupts, not > just rx/tx. For example, if the controller indicated some error, and > nobody clears it, then we'll start reentering infinitely.) For a lot of devices and platforms you'd probably mask them instead ? > > > If you use a clear_irqs callback, you can drop characters if > > one arrives between the last character buffer read and calling > > clear_irqs. > > Only if we call clear_irqs() after reading the characters, but we do > it before. So if new characters are available, we will reenter NMI, > which is OK. Recursively or not... again you get platform specific magic in places we don't want. In the driver poll method for most uarts is going to be at least buried in what is usually arch specific uarts.
On Wed, Sep 12, 2012 at 10:44:20AM +0100, Alan Cox wrote: > > Of course, if Alan is OK with this, I'm more than OK too. :-) > > It may well be better. > > > (But the polling routines would need to clear all interrupts, not > > just rx/tx. For example, if the controller indicated some error, and > > nobody clears it, then we'll start reentering infinitely.) > > For a lot of devices and platforms you'd probably mask them instead ? If there is no way to clear them, yes, we obviously would want to mask them before using the port for NMI debugger. Then we'd need three callbacks: - mask_all_irqs_but_rx() -- used before we want to start using the port for the NMI debugger; - clear_rx_irq() -- (optional) clears rx IRQ for controllers that need it; - restore_irqs() -- unmasks interrupts that were previously masked. If we ever encounter a case when just clearing interrupts doesn't work, we can surely implement the above scheme... It's just so far I don't see any need to over-design this, but again, it's your call, I told my opinion on this, but I'll do whatever you guys like more. :-) > > > If you use a clear_irqs callback, you can drop characters if > > > one arrives between the last character buffer read and calling > > > clear_irqs. > > > > Only if we call clear_irqs() after reading the characters, but we do > > it before. So if new characters are available, we will reenter NMI, > > which is OK. > > Recursively or not... again you get platform specific magic in places > we don't want. I really really don't see how this is platform-specific. All we ask the serial driver is to quiesce its interrupt. Whether we can handle NMIs/IRQs recursively or not is not serial driver's worry, since its IRQ handler is not going to fire anyway. The polling routines already gave us the power to steal/inject the data, so now we're stealing the interrupt too. How we use the callback is indeed platform-specific, but so is the whole KGDB, and that knowledge is hidden there. For serial driver it's all pretty much clear: lower the interrupt, but don't turn off rx detection. Thanks! Anton.
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index 2b42a01..0aa08c8 100644 --- a/drivers/tty/serial/kgdboc.c +++ b/drivers/tty/serial/kgdboc.c @@ -227,6 +227,15 @@ static int kgdboc_get_char(void) kgdb_tty_line); } +static void kgdboc_clear_irqs(void) +{ + if (!kgdb_tty_driver) + return; + if (kgdb_tty_driver->ops->clear_irqs) + kgdb_tty_driver->ops->clear_irqs(kgdb_tty_driver, + kgdb_tty_line); +} + static void kgdboc_put_char(u8 chr) { if (!kgdb_tty_driver) @@ -298,6 +307,7 @@ static struct kgdb_io kgdboc_io_ops = { .name = "kgdboc", .read_char = kgdboc_get_char, .write_char = kgdboc_put_char, + .clear_irqs = kgdboc_clear_irqs, .pre_exception = kgdboc_pre_exp_handler, .post_exception = kgdboc_post_exp_handler, }; diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index dcb2d5a..93c36cb 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -2187,6 +2187,20 @@ static void uart_poll_put_char(struct tty_driver *driver, int line, char ch) port = state->uart_port; port->ops->poll_put_char(port, ch); } + +static void uart_clear_irqs(struct tty_driver *driver, int line) +{ + struct uart_driver *drv = driver->driver_state; + struct uart_state *state = drv->state + line; + struct uart_port *port; + + if (!state || !state->uart_port) + return; + + port = state->uart_port; + if (port->ops->clear_irqs) + port->ops->clear_irqs(port); +} #endif static const struct tty_operations uart_ops = { @@ -2219,6 +2233,7 @@ static const struct tty_operations uart_ops = { .poll_init = uart_poll_init, .poll_get_char = uart_poll_get_char, .poll_put_char = uart_poll_put_char, + .clear_irqs = uart_clear_irqs, #endif }; diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h index 3b111a6..1fd1cf0 100644 --- a/include/linux/kgdb.h +++ b/include/linux/kgdb.h @@ -295,6 +295,7 @@ struct kgdb_io { const char *name; int (*read_char) (void); void (*write_char) (u8); + void (*clear_irqs) (void); void (*flush) (void); int (*init) (void); void (*pre_exception) (void); diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 822c887..855fb6e 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -277,6 +277,7 @@ struct uart_ops { int (*poll_init)(struct uart_port *); void (*poll_put_char)(struct uart_port *, unsigned char); int (*poll_get_char)(struct uart_port *); + void (*clear_irqs)(struct uart_port *); #endif }; diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index dd976cf..42f8a87 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -282,6 +282,7 @@ struct tty_operations { int (*poll_init)(struct tty_driver *driver, int line, char *options); int (*poll_get_char)(struct tty_driver *driver, int line); void (*poll_put_char)(struct tty_driver *driver, int line, char ch); + void (*clear_irqs)(struct tty_driver *driver, int line); #endif const struct file_operations *proc_fops; };