[v1] i2c: microchip-core: actually use repeated sends

Message ID	20240930-uneasy-dorsal-1acda9227b0d@spud
State	Superseded
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A1C518FDB4; Mon, 30 Sep 2024 13:38:54 +0000 (UTC) From: Conor Dooley <conor@kernel.org> To: linux-i2c@vger.kernel.org Cc: conor@kernel.org, Conor Dooley <conor.dooley@microchip.com>, Daire McNamara <daire.mcnamara@microchip.com>, Andi Shyti <andi.shyti@kernel.org>, Wolfram Sang <wsa@kernel.org>, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH v1] i2c: microchip-core: actually use repeated sends Date: Mon, 30 Sep 2024 14:38:27 +0100 Message-ID: <20240930-uneasy-dorsal-1acda9227b0d@spud> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v1] i2c: microchip-core: actually use repeated sends \| expand [v1] i2c: microchip-core: actually use repeated sends

Conor Dooley Sept. 30, 2024, 1:38 p.m. UTC

From: Conor Dooley <conor.dooley@microchip.com>

At present, where repeated sends are intended to be used, the
i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
devices must not malfunction in the face of this behaviour, because the
driver has operated like this for years! Try to keep track of whether or
not a repeated send is required, and suppress sending a stop in these
cases.

Fixes: 64a6f1c4987e ("i2c: add support for microchip fpga i2c controllers")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
---
CC: Conor Dooley <conor.dooley@microchip.com>
CC: Daire McNamara <daire.mcnamara@microchip.com>
CC: Andi Shyti <andi.shyti@kernel.org>
CC: Wolfram Sang <wsa@kernel.org>
CC: linux-riscv@lists.infradead.org
CC: linux-i2c@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 drivers/i2c/busses/i2c-microchip-corei2c.c | 124 ++++++++++++++++-----
 1 file changed, 96 insertions(+), 28 deletions(-)

Wolfram Sang Oct. 1, 2024, 8:50 a.m. UTC | #1

Hi Conor,

> At present, where repeated sends are intended to be used, the
> i2c-microchip-core driver sends a stop followed by a start. Lots of i2c

Oh, this is wrong. Was this just overlooked or was maybe older hardware
not able to generated correct repeated-starts?

> devices must not malfunction in the face of this behaviour, because the
> driver has operated like this for years! Try to keep track of whether or
> not a repeated send is required, and suppress sending a stop in these
> cases.

? I don't get that argument. If the driver is expected to do a repeated
start, it should do a repeated start. If it didn't, it was a bug and you
were lucky that the targets could handle this. Because most controllers
can do repeated starts correctly, we can also argue that this works for
most targets for years. In the unlikely event that a target fails after
converting this driver to proper repeated starts, the target is buggy
and needs fixing. It would not work with the majority of other
controllers this way.

I didn't look at the code but reading "keeping track whether rep start
is required" looks wrong from a high level perspective. The driver
should do repeated start when it should do repeated start.

All the best,

   Wolfram

Conor Dooley Oct. 1, 2024, 10:16 a.m. UTC | #2

On Tue, Oct 01, 2024 at 10:50:56AM +0200, Wolfram Sang wrote:
> > At present, where repeated sends are intended to be used, the
> > i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
> 
> Oh, this is wrong. Was this just overlooked or was maybe older hardware
> not able to generated correct repeated-starts?

Overlooked, because the devices that had been used until recently didn't
care about whether they got a repeated start or stop + start. The bare
metal driver upon which the Linux one was originally based had a trivial
time of supporting repeated starts because it only allows specific sorts
of transfers. I kinda doubt you care, but the bare metal implementation
is here:
https://github.com/polarfire-soc/polarfire-soc-bare-metal-library/blob/614a67abb3023ba47ea6d1b8d7b9a9997353e007/src/platform/drivers/mss/mss_i2c/mss_i2c.c#L737

It just must have been missed that the bare metal method was not replaced.

> > devices must not malfunction in the face of this behaviour, because the
> > driver has operated like this for years! Try to keep track of whether or
> > not a repeated send is required, and suppress sending a stop in these
> > cases.
> 
> ? I don't get that argument. If the driver is expected to do a repeated
> start, it should do a repeated start. If it didn't, it was a bug and you
> were lucky that the targets could handle this. Because most controllers
> can do repeated starts correctly, we can also argue that this works for
> most targets for years. In the unlikely event that a target fails after
> converting this driver to proper repeated starts, the target is buggy
> and needs fixing. It would not work with the majority of other
> controllers this way.
> 
> I didn't look at the code but reading "keeping track whether rep start
> is required" looks wrong from a high level perspective.

I think if you had looked at the code, you'd (hopefully) understand what
I meant w.r.t. tracking that.
The design of this IP is pretty old, and intended for use with other
logic implemented in FPGA fabric where each interrupt generated by
the core would be the stimulus for the state machine controlling it to
transition state. Cos of that, when controlling it from software, the
interrupt handler assumes the role of that state machine. When I talk
about tracking whether or not a repeated send is required, that's
whether or not a particular message in a transfer requires it, not
whether or not the target device requires them or not.

Currently the driver operates by iterating over a list of messages in a
transfer, and calling send() for each one, and then effectively "looping"
in the interrupt handler until the message has been sent. By looking at
the current code, you can see that the completion's "lifecycle" matches
that. Currently, at the end of each message being sent
	static irqreturn_t mchp_corei2c_handle_isr(struct mchp_corei2c_dev *idev)
	{

		<snip>

		/* On the last byte to be transmitted, send STOP */
		if (last_byte)
			mchp_corei2c_stop(idev);

		if (last_byte || finished)
			complete(&idev->msg_complete);

		return IRQ_HANDLED;
	}
a stop is put on the bus, unless !last_byte, which is only true in error
cases. Clearly I don't need to explain why that is a problem to you...
You'd think that we could do something like moving the stop out of the
interrupt handler, and to the loop in mchp_corei2c_xfer(), where we have
access to the transfer's message list and can check if a stop should be
sent or not - that's not really possible with the hardware we have.

When the interrupt handler completes, it clears the interrupt bit in the
IP, as you might expect. The controller IP uses that as the trigger to
transition state in its state machine, which is detailed in
https://ww1.microchip.com/downloads/aemDocuments/documents/FPGA/ProductDocuments/UserGuides/ip_cores/directcores/CoreI2C_HB.pdf
On page 23, row 0x28, you can see the case that (IIRC) is the
problematic one. It is impossible to leave this state without triggering
some sort of action.
The only way that I could see to make this work correctly was to get the
driver track whether or not the next message required a repeated start or
not, so as to transition out of state 0x28 correctly.

Unfortunately, then the clearing of the interrupt bit causing state
transitions kicked in again - after sending a repeated start, it will
immediately attempt to act (see state 0x10 on page 23). Without
reworking the driver to send entire transfers "in one go" (where the
completion is that of the transfer rather than the message as it
currently is) the controller will re-send the last target address +
read/write command it sent, instead of the next one. That's why there's
so many changes outside of the interrupt handler and so many additional
members in the controller's private data structure.

I hope that that at least makes some sense..

> The driver
> should do repeated start when it should do repeated start.

Yup, that's what I'm trying to do here :)

Andi Shyti Oct. 1, 2024, 12:45 p.m. UTC | #3

Hi Conor,

On Mon, Sep 30, 2024 at 02:38:27PM GMT, Conor Dooley wrote:
> From: Conor Dooley <conor.dooley@microchip.com>
> 
> At present, where repeated sends are intended to be used, the
> i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
> devices must not malfunction in the face of this behaviour, because the
> driver has operated like this for years! Try to keep track of whether or
> not a repeated send is required, and suppress sending a stop in these
> cases.
> 
> Fixes: 64a6f1c4987e ("i2c: add support for microchip fpga i2c controllers")

I don't think the Fixes tag is needed here if everything worked
until now, unless you got some other device that requires this
change and you need to explain it.

If this is more an improvement (because it has worked), then we
shouldn't add the Fixes tag.

In any case, when patches are going to stable, we need to Cc
stable too.

Cc: <stable@vger.kernel.org> # v6.0+

(This is specified in the
Documentation/process/stable-kernel-rules.rst and I'm starting to
enforce it here).

> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

...

> +	/*
> +	 * If there's been an error, the isr needs to return control
> +	 * to the "main" part of the driver, so as not to keep sending
> +	 * messages once it completes and clears the SI bit.
> +	 */
> +	if (idev->msg_err) {
> +		complete(&idev->msg_complete);
> +		return;
> +	}
> +
> +	this_msg = (idev->msg_queue)++;

do we need parenthesis here?

...

> +	/*
> +	 * The isr controls the flow of a transfer, this info needs to be saved
> +	 * to a location that it can access the queue information from.
> +	 */
> +	idev->restart_needed = false;
> +	idev->msg_queue = msgs;
> +	idev->total_num = num;
> +	idev->current_num = 0;
> +
> +	/*
> +	 * But the first entry to the isr is triggered by the start in this
> +	 * function, so the first message needs to be "dequeued".
> +	 */
> +	idev->addr = i2c_8bit_addr_from_msg(this_msg);
> +	idev->msg_len = this_msg->len;
> +	idev->buf = this_msg->buf;
> +	idev->msg_err = 0;
> +
> +	if (idev->total_num > 1) {
> +		struct i2c_msg *next_msg = msgs + 1;
> +
> +		idev->restart_needed = next_msg->flags & I2C_M_RD;
> +	}
> +
> +	idev->current_num++;
> +	idev->msg_queue++;

Can we initialize only once? This part is just adding extra code.

The rest looks good. I just need to know if Wolfram has some more
observations here.

Thanks,
Andi

Conor Dooley Oct. 1, 2024, 1:02 p.m. UTC | #4

On Tue, Oct 01, 2024 at 02:45:20PM +0200, Andi Shyti wrote:
> Hi Conor,
> 
> On Mon, Sep 30, 2024 at 02:38:27PM GMT, Conor Dooley wrote:
> > From: Conor Dooley <conor.dooley@microchip.com>
> > 
> > At present, where repeated sends are intended to be used, the
> > i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
> > devices must not malfunction in the face of this behaviour, because the
> > driver has operated like this for years! Try to keep track of whether or
> > not a repeated send is required, and suppress sending a stop in these
> > cases.
> > 
> > Fixes: 64a6f1c4987e ("i2c: add support for microchip fpga i2c controllers")
> 
> I don't think the Fixes tag is needed here if everything worked
> until now, unless you got some other device that requires this
> change and you need to explain it.

I think the fixes tag is accurate, because it only happened to work on
the limited set of devices I and others tried. This patch came about cos
I got reports of it being broken in 6.6

> If this is more an improvement (because it has worked), then we
> shouldn't add the Fixes tag.
> 
> In any case, when patches are going to stable, we need to Cc
> stable too.
> 
> Cc: <stable@vger.kernel.org> # v6.0+
> 
> (This is specified in the
> Documentation/process/stable-kernel-rules.rst and I'm starting to
> enforce it here).

Yah, some maintainers want to add the tags themselves, so got into a
(bad?) habit of leaving them out. I can add it if there's a v2.

> 
> > Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
> 
> ...
> 
> > +	/*
> > +	 * If there's been an error, the isr needs to return control
> > +	 * to the "main" part of the driver, so as not to keep sending
> > +	 * messages once it completes and clears the SI bit.
> > +	 */
> > +	if (idev->msg_err) {
> > +		complete(&idev->msg_complete);
> > +		return;
> > +	}
> > +
> > +	this_msg = (idev->msg_queue)++;
> 
> do we need parenthesis here?

I suppose not, do you want a v2 if that's the only change?

> 
> ...
> 
> > +	/*
> > +	 * The isr controls the flow of a transfer, this info needs to be saved
> > +	 * to a location that it can access the queue information from.
> > +	 */
> > +	idev->restart_needed = false;
> > +	idev->msg_queue = msgs;
> > +	idev->total_num = num;
> > +	idev->current_num = 0;
> > +
> > +	/*
> > +	 * But the first entry to the isr is triggered by the start in this
> > +	 * function, so the first message needs to be "dequeued".
> > +	 */
> > +	idev->addr = i2c_8bit_addr_from_msg(this_msg);
> > +	idev->msg_len = this_msg->len;
> > +	idev->buf = this_msg->buf;
> > +	idev->msg_err = 0;
> > +
> > +	if (idev->total_num > 1) {
> > +		struct i2c_msg *next_msg = msgs + 1;
> > +
> > +		idev->restart_needed = next_msg->flags & I2C_M_RD;
> > +	}
> > +
> > +	idev->current_num++;
> > +	idev->msg_queue++;
> 
> Can we initialize only once? This part is just adding extra code.

I don't agree that it is extra code, I think it is clearer like this as
I intentionally wrote it this way.

> The rest looks good. I just need to know if Wolfram has some more
> observations here.
> 
> Thanks,
> Andi

Conor Dooley Oct. 24, 2024, 9:36 a.m. UTC | #5

Yo,

On Tue, Oct 01, 2024 at 11:16:24AM +0100, Conor Dooley wrote:
> On Tue, Oct 01, 2024 at 10:50:56AM +0200, Wolfram Sang wrote:
> > > At present, where repeated sends are intended to be used, the
> > > i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
> > 
> > Oh, this is wrong. Was this just overlooked or was maybe older hardware
> > not able to generated correct repeated-starts?
> 
> Overlooked, because the devices that had been used until recently didn't
> care about whether they got a repeated start or stop + start. The bare
> metal driver upon which the Linux one was originally based had a trivial
> time of supporting repeated starts because it only allows specific sorts
> of transfers. I kinda doubt you care, but the bare metal implementation
> is here:
> https://github.com/polarfire-soc/polarfire-soc-bare-metal-library/blob/614a67abb3023ba47ea6d1b8d7b9a9997353e007/src/platform/drivers/mss/mss_i2c/mss_i2c.c#L737
> 
> It just must have been missed that the bare metal method was not replaced.
> 
> > > devices must not malfunction in the face of this behaviour, because the
> > > driver has operated like this for years! Try to keep track of whether or
> > > not a repeated send is required, and suppress sending a stop in these
> > > cases.
> > 
> > ? I don't get that argument. If the driver is expected to do a repeated
> > start, it should do a repeated start. If it didn't, it was a bug and you
> > were lucky that the targets could handle this. Because most controllers
> > can do repeated starts correctly, we can also argue that this works for
> > most targets for years. In the unlikely event that a target fails after
> > converting this driver to proper repeated starts, the target is buggy
> > and needs fixing. It would not work with the majority of other
> > controllers this way.
> > 
> > I didn't look at the code but reading "keeping track whether rep start
> > is required" looks wrong from a high level perspective.
> 
> I think if you had looked at the code, you'd (hopefully) understand what
> I meant w.r.t. tracking that.
> The design of this IP is pretty old, and intended for use with other
> logic implemented in FPGA fabric where each interrupt generated by
> the core would be the stimulus for the state machine controlling it to
> transition state. Cos of that, when controlling it from software, the
> interrupt handler assumes the role of that state machine. When I talk
> about tracking whether or not a repeated send is required, that's
> whether or not a particular message in a transfer requires it, not
> whether or not the target device requires them or not.
> 
> Currently the driver operates by iterating over a list of messages in a
> transfer, and calling send() for each one, and then effectively "looping"
> in the interrupt handler until the message has been sent. By looking at
> the current code, you can see that the completion's "lifecycle" matches
> that. Currently, at the end of each message being sent
> 	static irqreturn_t mchp_corei2c_handle_isr(struct mchp_corei2c_dev *idev)
> 	{
> 	
> 		<snip>
> 	
> 		/* On the last byte to be transmitted, send STOP */
> 		if (last_byte)
> 			mchp_corei2c_stop(idev);
> 	
> 		if (last_byte || finished)
> 			complete(&idev->msg_complete);
> 	
> 		return IRQ_HANDLED;
> 	}
> a stop is put on the bus, unless !last_byte, which is only true in error
> cases. Clearly I don't need to explain why that is a problem to you...
> You'd think that we could do something like moving the stop out of the
> interrupt handler, and to the loop in mchp_corei2c_xfer(), where we have
> access to the transfer's message list and can check if a stop should be
> sent or not - that's not really possible with the hardware we have.
> 
> When the interrupt handler completes, it clears the interrupt bit in the
> IP, as you might expect. The controller IP uses that as the trigger to
> transition state in its state machine, which is detailed in
> https://ww1.microchip.com/downloads/aemDocuments/documents/FPGA/ProductDocuments/UserGuides/ip_cores/directcores/CoreI2C_HB.pdf
> On page 23, row 0x28, you can see the case that (IIRC) is the
> problematic one. It is impossible to leave this state without triggering
> some sort of action.
> The only way that I could see to make this work correctly was to get the
> driver track whether or not the next message required a repeated start or
> not, so as to transition out of state 0x28 correctly.
> 
> Unfortunately, then the clearing of the interrupt bit causing state
> transitions kicked in again - after sending a repeated start, it will
> immediately attempt to act (see state 0x10 on page 23). Without
> reworking the driver to send entire transfers "in one go" (where the
> completion is that of the transfer rather than the message as it
> currently is) the controller will re-send the last target address +
> read/write command it sent, instead of the next one. That's why there's
> so many changes outside of the interrupt handler and so many additional
> members in the controller's private data structure.
> 
> I hope that that at least makes some sense..
> 
> > The driver
> > should do repeated start when it should do repeated start.
> 
> Yup, that's what I'm trying to do here :)

I'd like to get this fix in, and Andi only had some minor comments that
didn't require a respin. I don't want to respin or resend while this
conversation remains unresolved.

Thanks,
Conor.

Conor Dooley Dec. 17, 2024, 6:35 p.m. UTC | #6

Andi, Wolfram,

I'm just going to resend this tomorrow, it's 2.5 months since this
conversation started and I'd really like to get this fix included.

Cheers,
conor.

On Fri, Dec 06, 2024 at 11:48:07AM +0000, Conor Dooley wrote:
> On Thu, Oct 24, 2024 at 10:36:33AM +0100, Conor Dooley wrote:
> > On Tue, Oct 01, 2024 at 11:16:24AM +0100, Conor Dooley wrote:
> > > On Tue, Oct 01, 2024 at 10:50:56AM +0200, Wolfram Sang wrote:
> > > > > At present, where repeated sends are intended to be used, the
> > > > > i2c-microchip-core driver sends a stop followed by a start. Lots of i2c
> > > > 
> > > > Oh, this is wrong. Was this just overlooked or was maybe older hardware
> > > > not able to generated correct repeated-starts?
> > > 
> > > Overlooked, because the devices that had been used until recently didn't
> > > care about whether they got a repeated start or stop + start. The bare
> > > metal driver upon which the Linux one was originally based had a trivial
> > > time of supporting repeated starts because it only allows specific sorts
> > > of transfers. I kinda doubt you care, but the bare metal implementation
> > > is here:
> > > https://github.com/polarfire-soc/polarfire-soc-bare-metal-library/blob/614a67abb3023ba47ea6d1b8d7b9a9997353e007/src/platform/drivers/mss/mss_i2c/mss_i2c.c#L737
> > > 
> > > It just must have been missed that the bare metal method was not replaced.
> > > 
> > > > > devices must not malfunction in the face of this behaviour, because the
> > > > > driver has operated like this for years! Try to keep track of whether or
> > > > > not a repeated send is required, and suppress sending a stop in these
> > > > > cases.
> > > > 
> > > > ? I don't get that argument. If the driver is expected to do a repeated
> > > > start, it should do a repeated start. If it didn't, it was a bug and you
> > > > were lucky that the targets could handle this. Because most controllers
> > > > can do repeated starts correctly, we can also argue that this works for
> > > > most targets for years. In the unlikely event that a target fails after
> > > > converting this driver to proper repeated starts, the target is buggy
> > > > and needs fixing. It would not work with the majority of other
> > > > controllers this way.
> > > > 
> > > > I didn't look at the code but reading "keeping track whether rep start
> > > > is required" looks wrong from a high level perspective.
> > > 
> > > I think if you had looked at the code, you'd (hopefully) understand what
> > > I meant w.r.t. tracking that.
> > > The design of this IP is pretty old, and intended for use with other
> > > logic implemented in FPGA fabric where each interrupt generated by
> > > the core would be the stimulus for the state machine controlling it to
> > > transition state. Cos of that, when controlling it from software, the
> > > interrupt handler assumes the role of that state machine. When I talk
> > > about tracking whether or not a repeated send is required, that's
> > > whether or not a particular message in a transfer requires it, not
> > > whether or not the target device requires them or not.
> > > 
> > > Currently the driver operates by iterating over a list of messages in a
> > > transfer, and calling send() for each one, and then effectively "looping"
> > > in the interrupt handler until the message has been sent. By looking at
> > > the current code, you can see that the completion's "lifecycle" matches
> > > that. Currently, at the end of each message being sent
> > > 	static irqreturn_t mchp_corei2c_handle_isr(struct mchp_corei2c_dev *idev)
> > > 	{
> > > 	
> > > 		<snip>
> > > 	
> > > 		/* On the last byte to be transmitted, send STOP */
> > > 		if (last_byte)
> > > 			mchp_corei2c_stop(idev);
> > > 	
> > > 		if (last_byte || finished)
> > > 			complete(&idev->msg_complete);
> > > 	
> > > 		return IRQ_HANDLED;
> > > 	}
> > > a stop is put on the bus, unless !last_byte, which is only true in error
> > > cases. Clearly I don't need to explain why that is a problem to you...
> > > You'd think that we could do something like moving the stop out of the
> > > interrupt handler, and to the loop in mchp_corei2c_xfer(), where we have
> > > access to the transfer's message list and can check if a stop should be
> > > sent or not - that's not really possible with the hardware we have.
> > > 
> > > When the interrupt handler completes, it clears the interrupt bit in the
> > > IP, as you might expect. The controller IP uses that as the trigger to
> > > transition state in its state machine, which is detailed in
> > > https://ww1.microchip.com/downloads/aemDocuments/documents/FPGA/ProductDocuments/UserGuides/ip_cores/directcores/CoreI2C_HB.pdf
> > > On page 23, row 0x28, you can see the case that (IIRC) is the
> > > problematic one. It is impossible to leave this state without triggering
> > > some sort of action.
> > > The only way that I could see to make this work correctly was to get the
> > > driver track whether or not the next message required a repeated start or
> > > not, so as to transition out of state 0x28 correctly.
> > > 
> > > Unfortunately, then the clearing of the interrupt bit causing state
> > > transitions kicked in again - after sending a repeated start, it will
> > > immediately attempt to act (see state 0x10 on page 23). Without
> > > reworking the driver to send entire transfers "in one go" (where the
> > > completion is that of the transfer rather than the message as it
> > > currently is) the controller will re-send the last target address +
> > > read/write command it sent, instead of the next one. That's why there's
> > > so many changes outside of the interrupt handler and so many additional
> > > members in the controller's private data structure.
> > > 
> > > I hope that that at least makes some sense..
> > > 
> > > > The driver
> > > > should do repeated start when it should do repeated start.
> > > 
> > > Yup, that's what I'm trying to do here :)
> > 
> > I'd like to get this fix in, and Andi only had some minor comments that
> > didn't require a respin. I don't want to respin or resend while this
> > conversation remains unresolved.
> 
> Could you please respond to this thread? I don't want to respin without
> resolving this conversation since I feel like we'd just end up having it
> all over again.
> 
> Thanks,
> Conor.

[v1] i2c: microchip-core: actually use repeated sends

Commit Message

Comments

Patch