Message ID | 1402906147-26596-1-git-send-email-ulf.hansson@linaro.org |
---|---|
State | Accepted |
Commit | 7878289b269d41c8e611aa6d4519feae706e49f3 |
Headers | show |
On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue > when using the ARM model of the PL181 and running QEMU. > > The bug was reported for the following QEMU version: > $ qemu-system-arm -version > QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright > (c) 2003-2008 Fabrice Bellard > > To resolve the problem, let's restore the old behavior were the DATA > irq is handled prior the CMD irq, but only for the arm_variant, which > the problem was reported for. > > Reported-by: John Stultz <john.stultz@linaro.org> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Russell King <linux@arm.linux.org.uk> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Hey Ulf, Thanks for sending these out (though I wasn't cc'ed on this 3'rd patch). The first two apply for me, but this one doesn't. What should I be basing this off of? thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16 June 2014 20:58, John Stultz <john.stultz@linaro.org> wrote: > On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue >> when using the ARM model of the PL181 and running QEMU. >> >> The bug was reported for the following QEMU version: >> $ qemu-system-arm -version >> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright >> (c) 2003-2008 Fabrice Bellard >> >> To resolve the problem, let's restore the old behavior were the DATA >> irq is handled prior the CMD irq, but only for the arm_variant, which >> the problem was reported for. >> >> Reported-by: John Stultz <john.stultz@linaro.org> >> Cc: Peter Maydell <peter.maydell@linaro.org> >> Cc: Russell King <linux@arm.linux.org.uk> >> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> > > Hey Ulf, > Thanks for sending these out (though I wasn't cc'ed on this 3'rd > patch). The first two apply for me, but this one doesn't. What should > I be basing this off of? Hi John, Thanks for helping out testing! This patch based upon my latest mmc tree and the next branch. I tried to apply it for 3.15, and I think you will be able resolve the conflict - I should be quite trivial. Patch 1 and 2 applies on 3.15 as well. So I guess testing this on 3.15 will provide us with answers if it solves our problem. Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > This patch based upon my latest mmc tree and the next branch. I tried > to apply it for 3.15, and I think you will be able resolve the > conflict - I should be quite trivial. No worries. I just didn't want to waste time resolving it if it was logically dependent on some other change. I'll give it a shot and get back to you. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote: > On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> This patch based upon my latest mmc tree and the next branch. I tried >> to apply it for 3.15, and I think you will be able resolve the >> conflict - I should be quite trivial. > > No worries. I just didn't want to waste time resolving it if it was > logically dependent on some other change. > > I'll give it a shot and get back to you. So unfortunately I'm still seeing trouble.. [ 94.202843] EXT4-fs error (device mmcblk0p5): ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in gd; block bitmap corrupt. [ 94.203873] Aborting journal on device mmcblk0p5-8. [ 94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): panic forced after error [ 94.206553] [ 94.207420] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-00002-g044f37a-dirty #589 [ 94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>] (show_stack+0x11/0x14) [ 94.208835] [<c000f3f1>] (show_stack) from [<c042d599>] (dump_stack+0x59/0x7c) [ 94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178) [ 94.209724] [<c042a57f>] (panic) from [<c0135055>] (ext4_handle_error+0x69/0x74) [ 94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>] (__ext4_grp_locked_error+0x6b/0x160) [ 94.210747] [<c01358db>] (__ext4_grp_locked_error) from [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c) [ 94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>] (ext4_mb_init_cache+0x219/0x4e0) [ 94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>] (ext4_mb_init_group+0xbb/0x13c) [ 94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>] (ext4_mb_good_group+0xf3/0xfc) [ 94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>] (ext4_mb_regular_allocator+0x153/0x2c4) [ 94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4) [ 94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>] (ext4_ext_map_blocks+0x965/0x10f0) [ 94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>] (ext4_map_blocks+0xff/0x374) [ 94.219200] [<c0126839>] (mpage_map_and_submit_extent) from [<c0127049>] (ext4_writepages+0x2b9/0x4e8) [ 94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>] (do_writepages+0x19/0x28) [ 94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>] (__filemap_fdatawrite_range+0x3d/0x44) [ 94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from [<c008cc3f>] (filemap_flush+0x23/0x28) [ 94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>] (ext4_rename+0x2f9/0x3e4) [ 94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>] (vfs_rename+0x183/0x45c) [ 94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>] (SyS_renameat2+0x22b/0x26c) [ 94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>] (SyS_rename+0x1f/0x24) [ 94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>] (ret_fast_syscall+0x1/0x5c) That said, this mirrors the behavior when I was reverting your change by hand on-top of 3.15. While git bisect pointed to your patch and reverting it from the commit seems to resolve the issue at that point, there seems to be some other commit in the 3.14->3.15-rc1 interval that is causing problems as well. Are there any sort of debugging options for mmc that I can use to try to better narrow down whats going wrong? thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote: > On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote: >> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >>> This patch based upon my latest mmc tree and the next branch. I tried >>> to apply it for 3.15, and I think you will be able resolve the >>> conflict - I should be quite trivial. >> >> No worries. I just didn't want to waste time resolving it if it was >> logically dependent on some other change. >> >> I'll give it a shot and get back to you. > > So unfortunately I'm still seeing trouble.. > > [ 94.202843] EXT4-fs error (device mmcblk0p5): > ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in > gd; block bitmap corrupt. > [ 94.203873] Aborting journal on device mmcblk0p5-8. > [ 94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): > panic forced after error > [ 94.206553] > [ 94.207420] CPU: 0 PID: 1 Comm: init Not tainted > 3.15.0-00002-g044f37a-dirty #589 > [ 94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>] > (show_stack+0x11/0x14) > [ 94.208835] [<c000f3f1>] (show_stack) from [<c042d599>] > (dump_stack+0x59/0x7c) > [ 94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178) > [ 94.209724] [<c042a57f>] (panic) from [<c0135055>] > (ext4_handle_error+0x69/0x74) > [ 94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>] > (__ext4_grp_locked_error+0x6b/0x160) > [ 94.210747] [<c01358db>] (__ext4_grp_locked_error) from > [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c) > [ 94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>] > (ext4_mb_init_cache+0x219/0x4e0) > [ 94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>] > (ext4_mb_init_group+0xbb/0x13c) > [ 94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>] > (ext4_mb_good_group+0xf3/0xfc) > [ 94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>] > (ext4_mb_regular_allocator+0x153/0x2c4) > [ 94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from > [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4) > [ 94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>] > (ext4_ext_map_blocks+0x965/0x10f0) > [ 94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>] > (ext4_map_blocks+0xff/0x374) > [ 94.219200] [<c0126839>] (mpage_map_and_submit_extent) from > [<c0127049>] (ext4_writepages+0x2b9/0x4e8) > [ 94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>] > (do_writepages+0x19/0x28) > [ 94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>] > (__filemap_fdatawrite_range+0x3d/0x44) > [ 94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from > [<c008cc3f>] (filemap_flush+0x23/0x28) > [ 94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>] > (ext4_rename+0x2f9/0x3e4) > [ 94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>] > (vfs_rename+0x183/0x45c) > [ 94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>] > (SyS_renameat2+0x22b/0x26c) > [ 94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>] > (SyS_rename+0x1f/0x24) > [ 94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>] > (ret_fast_syscall+0x1/0x5c) > > > That said, this mirrors the behavior when I was reverting your change > by hand on-top of 3.15. While git bisect pointed to your patch and > reverting it from the commit seems to resolve the issue at that point, > there seems to be some other commit in the 3.14->3.15-rc1 interval > that is causing problems as well. > > Are there any sort of debugging options for mmc that I can use to try > to better narrow down whats going wrong? It seems like you want to debug the mmci host driver and unfortunate the debug utilities available are only dev_dbg prints. I wouldn't be surprised if the problem goes away when you enable them. :-) I have some other locally stored debug patches for mmci, but those are not re-based and I am not sure you want to deal with them as is. I guess I need to set up the QEMU environment and run the tests myself, unless we go for the revert path. How do you perform the tests, is just a simple mounting/un-mounting that triggers the problem? Any specific things that I need to think of when running QEMU? Kind regards Uffe > > thanks > -john -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > It seems like you want to debug the mmci host driver and unfortunate > the debug utilities available are only dev_dbg prints. I wouldn't be > surprised if the problem goes away when you enable them. :-) > > I have some other locally stored debug patches for mmci, but those are > not re-based and I am not sure you want to deal with them as is. > > I guess I need to set up the QEMU environment and run the tests > myself, unless we go for the revert path. Again, a simple revert *doesn't* seem to work. There's something else in the 3.14->3.15-rc1 series that is also causing problems. Reverting the patch only seems to work very close to the point that the commit was made. I'm trying to redo the bisection reverting the first problematic change each step to see if I can narrow the second problem (but I've had some interruptions so I've not made much progress there yet). > How do you perform the tests, is just a simple mounting/un-mounting > that triggers the problem? > Any specific things that I need to think of when running QEMU? Unfortunately its not quite so simple as mounting/unmounting. My environment is the Linaro Android image: http://releases.linaro.org/14.04/android/images/armv7-lsk/vexpress.img.bz2 (using the initrd in http://releases.linaro.org/14.04/android/images/armv7-lsk/boot.tar.bz2). My qemu line is: QEMU_AUDIO_DRV=none qemu-system-arm -kernel zImage -initrd initrd -M vexpress-a9 -cpu cortex-a9 -nographic -vnc :0 -m 1G -append 'root=/dev/mmcblk0p1 rw mem=1024M raid=noautodetect console=ttyAMA0,38400n8 rootwait vmalloc=256MB devtmpfs.mount=0' -sd vexpress-android.img -redir tcp:5555::5555 The first bootup is usually *very* slow (since it has to dex everything), and I usually use the zImage kernel in the boot.tar.bz2 to let that finish so I can get a sane backup image. Connecting via vnc to localhost:0 will let you know when its booted all the way and finished dexing. I'd expect to wait ~5-10 minutes. I then kill qemu, and make a backup copy of the disk image so future boots are a bit faster. And then booting again with my kernel zImage (config attached). It most frequently triggers within ~90 seconds of boot, but not always (I'll go ahead and kill qemu again if I don't see it). Usually I can reproduce it within 4 boots. One potential clue here is that it doesn't seem to trigger on the first boot w/ the affected kernel. Every time I see corruption, I revert back to the backup dex'ed disk image. thanks -john
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c index 5d20bfba..e4d4707 100644 --- a/drivers/mmc/host/mmci.c +++ b/drivers/mmc/host/mmci.c @@ -74,6 +74,7 @@ static unsigned int fmax = 515633; * @pwrreg_nopower: bits in MMCIPOWER don't controls ext. power supply * @explicit_mclk_control: enable explicit mclk control in driver. * @qcom_fifo: enables qcom specific fifo pio read logic. + * @reversed_irq_handling: handle data irq before cmd irq. */ struct variant_data { unsigned int clkreg; @@ -97,6 +98,7 @@ struct variant_data { bool pwrreg_nopower; bool explicit_mclk_control; bool qcom_fifo; + bool reversed_irq_handling; }; static struct variant_data variant_arm = { @@ -105,6 +107,7 @@ static struct variant_data variant_arm = { .datalength_bits = 16, .pwrreg_powerup = MCI_PWR_UP, .f_max = 100000000, + .reversed_irq_handling = true, }; static struct variant_data variant_arm_extended_fifo = { @@ -1236,8 +1239,13 @@ static irqreturn_t mmci_irq(int irq, void *dev_id) dev_dbg(mmc_dev(host->mmc), "irq0 (data+cmd) %08x\n", status); - mmci_cmd_irq(host, host->cmd, status); - mmci_data_irq(host, host->data, status); + if (host->variant->reversed_irq_handling) { + mmci_data_irq(host, host->data, status); + mmci_cmd_irq(host, host->cmd, status); + } else { + mmci_cmd_irq(host, host->cmd, status); + mmci_data_irq(host, host->data, status); + } /* Don't poll for busy completion in irq context. */ if (host->busy_status)
Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue when using the ARM model of the PL181 and running QEMU. The bug was reported for the following QEMU version: $ qemu-system-arm -version QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright (c) 2003-2008 Fabrice Bellard To resolve the problem, let's restore the old behavior were the DATA irq is handled prior the CMD irq, but only for the arm_variant, which the problem was reported for. Reported-by: John Stultz <john.stultz@linaro.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> --- drivers/mmc/host/mmci.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)