diff mbox series

Bluetooth: Work around SCO over USB HCI design defect

Message ID 20221004145351.13066-1-nicolas.cavallari@green-communications.fr
State New
Headers show
Series Bluetooth: Work around SCO over USB HCI design defect | expand

Commit Message

Nicolas Cavallari Oct. 4, 2022, 2:53 p.m. UTC
The USB interface between the host and the bluetooth adapter used for
SCO packets uses an USB isochronous endpoint with a fragmentation scheme
that does not tolerate errors.  Except USB isochronous transfers do
not provide a reliable stream with guaranteed delivery. (There is no
retry on error, see USB spec v2.0 5.6 and 8.5.5.)

To fragment a packet, the bluetooth HCI simply splits it in parts and
transfer them as-is.  The receiver is expected to reconstruct the packet
by assuming the first fragment contains the header and parsing its size
field.  There is no error detection either.

If a fragment is lost, the end result is that the kernel is no longer
synchronized and will pass malformed data to the upper layers, since it
has no way to tell if the first fragment is an actual first fragment or
a continuation fragment.  Resynchronization can only happen by luck and
requires an unbounded amount of time.

The typical symptom for a HSP/HFP bluetooth headset is that the
microphone stops working and dmesg contains piles of rate-limited
"Bluetooth: hci0: SCO packet for unknown connection handle XXXX"
errors for an indeterminate amount of time, until the kernel accidentally
resynchronize.

A workaround is to ask the upper layer to prevalidate the first fragment
header.  This is not possible with user channels so this workaround is
disabled in this case.

This problem is the most severe when using an ath3k adapter on an i.MX 6
board, where packet loss occur regularly, possibly because it is an USB1
device connected on an USB2 hub and this is a special case requiring
split transactions.

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>

---
v2: fix typos in commit description and expand it

 drivers/bluetooth/btusb.c        |  7 +++++--
 include/net/bluetooth/hci_core.h | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

Comments

bluez.test.bot@gmail.com Oct. 4, 2022, 3:24 p.m. UTC | #1
This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=683026

---Test result---

Test Summary:
CheckPatch                    PASS      1.78 seconds
GitLint                       PASS      1.03 seconds
SubjectPrefix                 PASS      0.88 seconds
BuildKernel                   PASS      34.91 seconds
BuildKernel32                 PASS      31.38 seconds
Incremental Build with patchesPASS      43.37 seconds
TestRunner: Setup             PASS      517.31 seconds
TestRunner: l2cap-tester      PASS      17.56 seconds
TestRunner: iso-tester        PASS      16.40 seconds
TestRunner: bnep-tester       PASS      6.68 seconds
TestRunner: mgmt-tester       PASS      105.96 seconds
TestRunner: rfcomm-tester     PASS      10.40 seconds
TestRunner: sco-tester        PASS      9.83 seconds
TestRunner: ioctl-tester      PASS      11.06 seconds
TestRunner: smp-tester        PASS      9.94 seconds
TestRunner: userchan-tester   PASS      6.93 seconds



---
Regards,
Linux Bluetooth
Luiz Augusto von Dentz Oct. 4, 2022, 11:46 p.m. UTC | #2
Hi Nicolas,

On Tue, Oct 4, 2022 at 7:54 AM Nicolas Cavallari
<nicolas.cavallari@green-communications.fr> wrote:
>
> The USB interface between the host and the bluetooth adapter used for
> SCO packets uses an USB isochronous endpoint with a fragmentation scheme
> that does not tolerate errors.  Except USB isochronous transfers do
> not provide a reliable stream with guaranteed delivery. (There is no
> retry on error, see USB spec v2.0 5.6 and 8.5.5.)
>
> To fragment a packet, the bluetooth HCI simply splits it in parts and
> transfer them as-is.  The receiver is expected to reconstruct the packet
> by assuming the first fragment contains the header and parsing its size
> field.  There is no error detection either.
>
> If a fragment is lost, the end result is that the kernel is no longer
> synchronized and will pass malformed data to the upper layers, since it
> has no way to tell if the first fragment is an actual first fragment or
> a continuation fragment.  Resynchronization can only happen by luck and
> requires an unbounded amount of time.
>
> The typical symptom for a HSP/HFP bluetooth headset is that the
> microphone stops working and dmesg contains piles of rate-limited
> "Bluetooth: hci0: SCO packet for unknown connection handle XXXX"
> errors for an indeterminate amount of time, until the kernel accidentally
> resynchronize.
>
> A workaround is to ask the upper layer to prevalidate the first fragment
> header.  This is not possible with user channels so this workaround is
> disabled in this case.
>
> This problem is the most severe when using an ath3k adapter on an i.MX 6
> board, where packet loss occur regularly, possibly because it is an USB1
> device connected on an USB2 hub and this is a special case requiring
> split transactions.

Interesting, but does this actually make it work if with the packet losses?

> Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
>
> ---
> v2: fix typos in commit description and expand it
>
>  drivers/bluetooth/btusb.c        |  7 +++++--
>  include/net/bluetooth/hci_core.h | 20 ++++++++++++++++++++
>  2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> index 15caa6469538..f6256af98233 100644
> --- a/drivers/bluetooth/btusb.c
> +++ b/drivers/bluetooth/btusb.c
> @@ -980,9 +980,12 @@ static int btusb_recv_isoc(struct btusb_data *data, void *buffer, int count)
>
>                 if (skb->len == HCI_SCO_HDR_SIZE) {
>                         /* Complete SCO header */
> -                       hci_skb_expect(skb) = hci_sco_hdr(skb)->dlen;
> +                       struct hci_sco_hdr *hdr = hci_sco_hdr(skb);
>
> -                       if (skb_tailroom(skb) < hci_skb_expect(skb)) {
> +                       hci_skb_expect(skb) = hdr->dlen;
> +
> +                       if (skb_tailroom(skb) < hci_skb_expect(skb) ||
> +                           !hci_conn_prevalidate_sco_hdr(data->hdev, hdr)) {
>                                 kfree_skb(skb);
>                                 skb = NULL;
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index e7862903187d..d589b54e89e6 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -1286,6 +1286,26 @@ static inline struct hci_conn *hci_lookup_le_connect(struct hci_dev *hdev)
>         return NULL;
>  }
>
> +static inline bool hci_conn_prevalidate_sco_hdr(struct hci_dev *hdev,
> +                                               struct hci_sco_hdr *hdr)
> +{
> +       __u16 handle;
> +
> +       if (hci_dev_test_flag(hdev, HCI_USER_CHANNEL))
> +               // Can't validate, userspace controls everything.
> +               return true;
> +
> +       handle = hci_handle(__le16_to_cpu(hdr->handle));
> +
> +       switch (hci_conn_lookup_type(hdev, handle)) {
> +       case SCO_LINK:
> +       case ESCO_LINK:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}

Don't really like to have this in hci_core.h, it is sort of messy
already beside this is probably too specific to USB so I'd go with
something like btusb_validate_sco_handle and add a comment explaining
why this is necessary.

>  int hci_disconnect(struct hci_conn *conn, __u8 reason);
>  bool hci_setup_sync(struct hci_conn *conn, __u16 handle);
>  void hci_sco_setup(struct hci_conn *conn, __u8 status);
> --
> 2.37.2
>
Nicolas Cavallari Oct. 5, 2022, 9:25 a.m. UTC | #3
On 05/10/2022 01:46, Luiz Augusto von Dentz wrote:
> Hi Nicolas,
> 
> On Tue, Oct 4, 2022 at 7:54 AM Nicolas Cavallari
> <nicolas.cavallari@green-communications.fr> wrote:
>>
>> The USB interface between the host and the bluetooth adapter used for
>> SCO packets uses an USB isochronous endpoint with a fragmentation scheme
>> that does not tolerate errors.  Except USB isochronous transfers do
>> not provide a reliable stream with guaranteed delivery. (There is no
>> retry on error, see USB spec v2.0 5.6 and 8.5.5.)
>>
>> To fragment a packet, the bluetooth HCI simply splits it in parts and
>> transfer them as-is.  The receiver is expected to reconstruct the packet
>> by assuming the first fragment contains the header and parsing its size
>> field.  There is no error detection either.
>>
>> If a fragment is lost, the end result is that the kernel is no longer
>> synchronized and will pass malformed data to the upper layers, since it
>> has no way to tell if the first fragment is an actual first fragment or
>> a continuation fragment.  Resynchronization can only happen by luck and
>> requires an unbounded amount of time.
>>
>> The typical symptom for a HSP/HFP bluetooth headset is that the
>> microphone stops working and dmesg contains piles of rate-limited
>> "Bluetooth: hci0: SCO packet for unknown connection handle XXXX"
>> errors for an indeterminate amount of time, until the kernel accidentally
>> resynchronize.
>>
>> A workaround is to ask the upper layer to prevalidate the first fragment
>> header.  This is not possible with user channels so this workaround is
>> disabled in this case.
>>
>> This problem is the most severe when using an ath3k adapter on an i.MX 6
>> board, where packet loss occur regularly, possibly because it is an USB1
>> device connected on an USB2 hub and this is a special case requiring
>> split transactions.
> 
> Interesting, but does this actually make it work if with the packet losses?

All userspace hsp/hfp implementations have packet loss concealment, I 
think, so they can tolerate losing one or two packets there and there. 
In any case it is much more usable than before. Without this patch, the 
number of packet lost is in the 10-1000 range and it is much harder to 
conceal that.

With this patch, given this sequence 01230123, if the first fragment (0) 
is lost, then only one packet is lost. If anything else than the first 
fragment, say (2) is lost, then 0130 is forwarded to the upper layer and 
  the remaining 123 is dropped.

With my ath3k I see that dropped fragments are replaced by zero-length 
fragments. I have a patch that treats zero-length fragments as an error 
and drops the current packet when it occurs, but I can't be certain that 
this won't cause any regression for other adapters, so didn't submit it.

>> [...]
>>
>> +static inline bool hci_conn_prevalidate_sco_hdr(struct hci_dev *hdev,
>> +                                               struct hci_sco_hdr *hdr)
>> +{
>> +       __u16 handle;
>> +
>> +       if (hci_dev_test_flag(hdev, HCI_USER_CHANNEL))
>> +               // Can't validate, userspace controls everything.
>> +               return true;
>> +
>> +       handle = hci_handle(__le16_to_cpu(hdr->handle));
>> +
>> +       switch (hci_conn_lookup_type(hdev, handle)) {
>> +       case SCO_LINK:
>> +       case ESCO_LINK:
>> +               return true;
>> +       default:
>> +               return false;
>> +       }
>> +}
> 
> Don't really like to have this in hci_core.h, it is sort of messy
> already beside this is probably too specific to USB so I'd go with
> something like btusb_validate_sco_handle and add a comment explaining
> why this is necessary.

Will move that into btusb.c then.
diff mbox series

Patch

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 15caa6469538..f6256af98233 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -980,9 +980,12 @@  static int btusb_recv_isoc(struct btusb_data *data, void *buffer, int count)
 
 		if (skb->len == HCI_SCO_HDR_SIZE) {
 			/* Complete SCO header */
-			hci_skb_expect(skb) = hci_sco_hdr(skb)->dlen;
+			struct hci_sco_hdr *hdr = hci_sco_hdr(skb);
 
-			if (skb_tailroom(skb) < hci_skb_expect(skb)) {
+			hci_skb_expect(skb) = hdr->dlen;
+
+			if (skb_tailroom(skb) < hci_skb_expect(skb) ||
+			    !hci_conn_prevalidate_sco_hdr(data->hdev, hdr)) {
 				kfree_skb(skb);
 				skb = NULL;
 
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index e7862903187d..d589b54e89e6 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1286,6 +1286,26 @@  static inline struct hci_conn *hci_lookup_le_connect(struct hci_dev *hdev)
 	return NULL;
 }
 
+static inline bool hci_conn_prevalidate_sco_hdr(struct hci_dev *hdev,
+						struct hci_sco_hdr *hdr)
+{
+	__u16 handle;
+
+	if (hci_dev_test_flag(hdev, HCI_USER_CHANNEL))
+		// Can't validate, userspace controls everything.
+		return true;
+
+	handle = hci_handle(__le16_to_cpu(hdr->handle));
+
+	switch (hci_conn_lookup_type(hdev, handle)) {
+	case SCO_LINK:
+	case ESCO_LINK:
+		return true;
+	default:
+		return false;
+	}
+}
+
 int hci_disconnect(struct hci_conn *conn, __u8 reason);
 bool hci_setup_sync(struct hci_conn *conn, __u16 handle);
 void hci_sco_setup(struct hci_conn *conn, __u8 status);