Message ID | CAMeyCbiVVTmpV8pazAPLtew0yQSYeb5ykeHzCs_A=nBUK7d1bg@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | rn5t618_wdt: WATCHDOG_NOWAYOUT not working | expand |
On 8/12/22 02:56, Kegl Rohit wrote: > Hello! > > Our board uses the RN5T567 as /dev/watchdog source via i2c communication. > RN5T567 is using the rn5t618_wdt.c driver > > Our kernel has CONFIG_WATCHDOG_NOWAYOUT=y enabled > > # Starting the wdt works as expected > echo -n '1' > /dev/watchdog > > # Stopping the wdt works as expected no reboot will be issued > echo -n 'V' > /dev/watchdog > > # Starting the wdt again will enable the wdt again > # BUT while the wdt is triggered every second the system reboots > while true; do echo -n '1' > /dev/watchdog; sleep 1; done > > Digging deeper into the issue I could find out that the remap is > initialized to cache the register accesses RN5T618_WATCHDOG (0x0b) and > RN5T618_PWRIRQ (0x13). > > So I expect that because of this caching the IRQ status bit was never reset. > The status register must not be cached because its set by the RN5T567. > Also it is not ideal to cache the access to the watchdog register > which resets the counter via read write cycle. > > debugfs shows the regmap setting for these registers: > [root@imx7d /sys/kernel/debug]# cat regmap/0-0033/access > // third column means volatile yes or no > … > 0b: y y n n > … > 13: y y n n > > After marking these registers volatile, stopping the wdt and starting > again seems to work. > > Furthermore it is not necessary to do a RN5T618_WATCHDOG read AND > write cycle to reset the wdt counter. > The source code states: > /* The counter is restarted after a R/W access to watchdog register */ > > The RN5T567 datasheet states: > “The count value of watchdog timer is cleared by accessing (R/W) to > this register.” > > Tests showed that a single read is enough. I did not check other chip > variants which use the same driver. > > In my opinion a write cycle is even dangerous if there is some strange > situation and the write cycle disables the wdt or changes the wdt > settings stored in this register. > > I still don't know why the irq status bit is cleared on every ping() > but I kept it there. > Attached is my patch tested with RN5T567. It is sent as attachment and thus unusable. In addition to that, it is not a proper patch. > Is the rn5t618_wdt.c driver maintained? Strange that such issue was > never noticed. I'd rather assume that your usage pattern is one that no one ever tried before because it is quite unusual. Why enable NOWAYOUT just to violate it continuously ? Anyway, I think you are a bit off track here. The register 0x13 should probably be volatile. However, register 0x0b is not updated by the chip and thus does not need to be volatile. Replacing the various regmap_update_bits() with regmap_write() is just papering over the problem. To force the write to the chip even if nothing changed, regmap_write_bits() should be used instead of regmap_update_bits() where appropriate. That is probably also the core of the problem: rn5t618_wdt_start() calls regmap_update_bits(). Since no bit was changed, there is actually no write to the chip. Ultimately this causes a timeout since the watchdog is never stopped and the counter is never reset. To avoid that and to make sure that the counter is reset, it is probably sufficient to replace the regmap_update_bits() to enable the watchdog with regmap_write_bits(). The read operation in rn5t618_wdt_ping() doesn't really do anything because the register value is cached. The read only happens to have a value to write. I can not follow your logic that that write could somehow be dangerous. It writes the value that is expected for the register. There is nothing dangerous about it. If there was indeed a problem, you would have to provide specifics; you can not just make such a claim without basis in fact. The interrupt status register at 0x13 should probably be declared volatile to avoid "loss" of interrupt status for other drivers. Clearing the interrupt bit is only necessary to ensure that an expired interrupt is caught if the 'ping' comes late. The same is true for the start function. In normal operation it should not be needed. Either case, if you want your code to be considered, you'll have to split it into two patches, one for mfd and one for watchdog, and you'll have to submit proper patches as outlined in Documentation/process/submitting-patches.rst. Thanks, Guenter
Thanks for clarifying some parts. > I'd rather assume that your usage pattern is one that no one ever tried > before because it is quite unusual. Why enable NOWAYOUT just to violate > it continuously ? Why violate? NOWAYOUT should be a pretty straight forward kernel config... When starting an in-place firmware update its quite useful to stop the wdt before terminating all applications. After the upgrade is done the applications and wdt are restarted. > Anyway, I think you are a bit off track here. > The register 0x13 should probably be volatile. > However, register 0x0b is not updated by the chip and thus does not need to be volatile. That right, but if regmap_read and regmap_write is cached then it would be necessary to cause some read or write. I did not find documentation about the regmap caching. I am new to regmap, therefore i asked for others input. For example this part: /* The counter is restarted after a R/W access to watchdog register */ ret = regmap_read(wdt->rn5t618->regmap, RN5T618_WATCHDOG, &val); if (ret) return ret; ret = regmap_write(wdt->rn5t618->regmap, RN5T618_WATCHDOG, val); Will regmap_read always bypass the cache? Will regmap_write always bypass the cache? Then it is not needed to set 0x0b volatile. > Replacing the various regmap_update_bits() with regmap_write() is just > papering over the problem. To force the write to the chip even if nothing > changed, regmap_write_bits() should be used instead of regmap_update_bits() > where appropriate. Will regmap_write_bits always bypass the cache? If 0x13 is set volatile regmap_write_bits will always execute an I2C read and write transmission? This I2C read would be overhead in each ping(). The datasheet says an IRQ can be reset by simply writing '0' to the corresponding bit. Writing '1' to any other bits has no effect. So I used regmap_write(wdt->rn5t618->regmap, RN5T618_PWRIRQ, ~RN5T618_PWRIRQ_IR_WDOG) to clear only the wdt irq bit and save a i2c read transmission in each ping(). > That is probably also the core of the problem: rn5t618_wdt_start() calls > regmap_update_bits(). Since no bit was changed, there is actually no write > to the chip. Ultimately this causes a timeout since the watchdog is never > stopped and the counter is never reset. To avoid that and to make > sure that the counter is reset, it is probably sufficient to replace the > regmap_update_bits() to enable the watchdog with regmap_write_bits(). You meant regmap_update_bits(wdt->rn5t618->regmap, RN5T618_PWRIRQ (0x13), RN5T618_PWRIRQ_IR_WDOG, 0) in rn5t618_wdt_ping()? No other function will reset the wdt irq status bit. And no other function accesses the volatile RN5T618_PWRIRQ (0x13) register. So regmap_update_bits RN5T618_PWRIREN (not RN5T618_PWRIRQ 0x13!) usage in rn5t618_wdt_start is fine for me and the watchdog was stopped as I stated! > The read operation in rn5t618_wdt_ping() doesn't really do anything > because the register value is cached. The read only happens to have > a value to write. So regmap_read(wdt->rn5t618->regmap, RN5T618_WATCHDOG, &val) is cached? If it is cached and regmap_write(wdt->rn5t618->regmap, RN5T618_WATCHDOG, val) is also cached then the counter will never be reset. But the ping is working and so one of them or both must be non cached. > I can not follow your logic that that write could > somehow be dangerous. It writes the value that is expected for the > register. There is nothing dangerous about it. If there was indeed > a problem, you would have to provide specifics; you can not just make > such a claim without basis in fact. My main point is the unnecessary additional I2C write or read transmission. A single read or write of the register should be enough. Dangerous is maybe somewhat overcautious. I2C is not the most reliable thing, so if the write went wrong maybe the wdt will be disabled without knowing. So a read (+checking the enable bit) is in my opinion more defensive than a read+write with zero checks. Only i2c transfer return code is checked, not sure how reliable this is. > The interrupt status register at 0x13 should probably be declared volatile > to avoid "loss" of interrupt status for other drivers. Clearing the interrupt > bit is only necessary to ensure that an expired interrupt is caught if the > 'ping' comes late. The same is true for the start function. In normal operation > it should not be needed. Ok, I have the same thought. The datasheet says something about an additional 1 second until the wdt really fires. And maybe if the irq is cleared before this additional second is expired the pending wdt reset is cleared. So it makes sense to reset the irq status in every ping to gain some extra time e.g. if the wdt timeout is set to the minimum of 1 second. > Either case, if you want your code to be considered, you'll have to split > it into two patches, one for mfd and one for watchdog, and you'll have to submit > proper patches as outlined in Documentation/process/submitting-patches.rst. As I am not an expert and do not want to break some foreign code, I only wanted to share my findings and get feedback if they are wrong. It would be great if it gets fixed for everyone. There are some reports online which are maybe facing the same issue, but could not find the real cause. On Sat, Aug 13, 2022 at 12:08 AM Guenter Roeck <linux@roeck-us.net> wrote: > > On 8/12/22 02:56, Kegl Rohit wrote: > > Hello! > > > > Our board uses the RN5T567 as /dev/watchdog source via i2c communication. > > RN5T567 is using the rn5t618_wdt.c driver > > > > Our kernel has CONFIG_WATCHDOG_NOWAYOUT=y enabled > > > > # Starting the wdt works as expected > > echo -n '1' > /dev/watchdog > > > > # Stopping the wdt works as expected no reboot will be issued > > echo -n 'V' > /dev/watchdog > > > > # Starting the wdt again will enable the wdt again > > # BUT while the wdt is triggered every second the system reboots > > while true; do echo -n '1' > /dev/watchdog; sleep 1; done > > > > Digging deeper into the issue I could find out that the remap is > > initialized to cache the register accesses RN5T618_WATCHDOG (0x0b) and > > RN5T618_PWRIRQ (0x13). > > > > So I expect that because of this caching the IRQ status bit was never reset. > > The status register must not be cached because its set by the RN5T567. > > Also it is not ideal to cache the access to the watchdog register > > which resets the counter via read write cycle. > > > > debugfs shows the regmap setting for these registers: > > [root@imx7d /sys/kernel/debug]# cat regmap/0-0033/access > > // third column means volatile yes or no > > … > > 0b: y y n n > > … > > 13: y y n n > > > > After marking these registers volatile, stopping the wdt and starting > > again seems to work. > > > > Furthermore it is not necessary to do a RN5T618_WATCHDOG read AND > > write cycle to reset the wdt counter. > > The source code states: > > /* The counter is restarted after a R/W access to watchdog register */ > > > > The RN5T567 datasheet states: > > “The count value of watchdog timer is cleared by accessing (R/W) to > > this register.” > > > > Tests showed that a single read is enough. I did not check other chip > > variants which use the same driver. > > > > In my opinion a write cycle is even dangerous if there is some strange > > situation and the write cycle disables the wdt or changes the wdt > > settings stored in this register. > > > > I still don't know why the irq status bit is cleared on every ping() > > but I kept it there. > > Attached is my patch tested with RN5T567. > > It is sent as attachment and thus unusable. In addition to that, it is > not a proper patch. > > > Is the rn5t618_wdt.c driver maintained? Strange that such issue was > > never noticed. > > I'd rather assume that your usage pattern is one that no one ever tried > before because it is quite unusual. Why enable NOWAYOUT just to violate > it continuously ? > > Anyway, I think you are a bit off track here. > > The register 0x13 should probably be volatile. However, register 0x0b > is not updated by the chip and thus does not need to be volatile. > > Replacing the various regmap_update_bits() with regmap_write() is just > papering over the problem. To force the write to the chip even if nothing > changed, regmap_write_bits() should be used instead of regmap_update_bits() > where appropriate. > That is probably also the core of the problem: rn5t618_wdt_start() calls > regmap_update_bits(). Since no bit was changed, there is actually no write > to the chip. Ultimately this causes a timeout since the watchdog is never > stopped and the counter is never reset. To avoid that and to make > sure that the counter is reset, it is probably sufficient to replace the > regmap_update_bits() to enable the watchdog with regmap_write_bits(). > > The read operation in rn5t618_wdt_ping() doesn't really do anything > because the register value is cached. The read only happens to have > a value to write. I can not follow your logic that that write could > somehow be dangerous. It writes the value that is expected for the > register. There is nothing dangerous about it. If there was indeed > a problem, you would have to provide specifics; you can not just make > such a claim without basis in fact. > > The interrupt status register at 0x13 should probably be declared volatile > to avoid "loss" of interrupt status for other drivers. Clearing the interrupt > bit is only necessary to ensure that an expired interrupt is caught if the > 'ping' comes late. The same is true for the start function. In normal operation > it should not be needed. > > Either case, if you want your code to be considered, you'll have to split > it into two patches, one for mfd and one for watchdog, and you'll have to submit > proper patches as outlined in Documentation/process/submitting-patches.rst. > > Thanks, > Guenter
diff --git a/drivers/mfd/rn5t618.c b/drivers/mfd/rn5t618.c index 652a5e60067f..b414abcdfecb 100644 --- a/drivers/mfd/rn5t618.c +++ b/drivers/mfd/rn5t618.c @@ -47,8 +47,10 @@ static bool rn5t618_volatile_reg(struct device *dev, unsigned int reg) case RN5T618_RTC_SECONDS ... RN5T618_RTC_YEAR: case RN5T618_CHGSTATE: case RN5T618_CHGCTRL_IRR ... RN5T618_CHGERR_MONI: case RN5T618_CONTROL ... RN5T618_CC_AVEREG0: + case RN5T618_WATCHDOG: // should not be cached because of r/w to reset counter + case RN5T618_PWRIRQ: // should not be cached because its set by hardware return true; default: return false; } diff --git a/drivers/watchdog/rn5t618_wdt.c b/drivers/watchdog/rn5t618_wdt.c index 6e524c8e26a8..83d3f006f812 100644 --- a/drivers/watchdog/rn5t618_wdt.c +++ b/drivers/watchdog/rn5t618_wdt.c @@ -75,8 +75,13 @@ static int rn5t618_wdt_start(struct watchdog_device *wdt_dev) ret = rn5t618_wdt_set_timeout(wdt_dev, wdt_dev->timeout); if (ret) return ret; + /* Clear pending watchdog interrupt */ + ret = regmap_write(wdt->rn5t618->regmap, RN5T618_PWRIRQ, ~RN5T618_PWRIRQ_IR_WDOG); + if (ret) + return ret; + /* enable repower-on */ ret = regmap_update_bits(wdt->rn5t618->regmap, RN5T618_REPCNT, RN5T618_REPCNT_REPWRON, RN5T618_REPCNT_REPWRON); @@ -114,15 +119,10 @@ static int rn5t618_wdt_ping(struct watchdog_device *wdt_dev) ret = regmap_read(wdt->rn5t618->regmap, RN5T618_WATCHDOG, &val); if (ret) return ret; - ret = regmap_write(wdt->rn5t618->regmap, RN5T618_WATCHDOG, val); - if (ret) - return ret; - /* Clear pending watchdog interrupt */ - return regmap_update_bits(wdt->rn5t618->regmap, RN5T618_PWRIRQ, - RN5T618_PWRIRQ_IR_WDOG, 0); + return regmap_write(wdt->rn5t618->regmap, RN5T618_PWRIRQ, ~RN5T618_PWRIRQ_IR_WDOG); } static const struct watchdog_info rn5t618_wdt_info = { .options = WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE | -- 2.30.2