Message ID | 1410700015-23791-3-git-send-email-lersek@redhat.com |
---|---|
State | New |
Headers | show |
Il 14/09/2014 15:06, Laszlo Ersek ha scritto: > + } else { > + Accumulator -= gMetronome->TickPeriod; > + if (Counter == MAX_UINT64) { > + CoreInternalWaitForTick (Counter); > + CoreInternalWaitForTick (1); > + } else { > + CoreInternalWaitForTick (Counter + 1); > + } Counter is the result of a division, so it is at most 0x1999999999999999ULL. You do not need the "if", I think. Paolo > + } > + } > > - if (Remainder != 0) { > - // > - // If Remainder was not zero, then normally, Counter would be rounded > - // up by 1 tick. In this case, since a loop for 10 counts was used > - // to emulate the multiply by 10 operation, Counter needs to be rounded > - // up by 10 counts. > - // > - CoreInternalWaitForTick (10); > + // > + // Flush any remaining fractional ticks. > + // > + if (Accumulator > 0) { > + CoreInternalWaitForTick (1); > } ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Il 14/09/2014 16:45, Paolo Bonzini ha scritto: > Il 14/09/2014 15:06, Laszlo Ersek ha scritto: >> + } else { >> + Accumulator -= gMetronome->TickPeriod; >> + if (Counter == MAX_UINT64) { >> + CoreInternalWaitForTick (Counter); >> + CoreInternalWaitForTick (1); >> + } else { >> + CoreInternalWaitForTick (Counter + 1); >> + } > > Counter is the result of a division, so it is at most > 0x1999999999999999ULL. You do not need the "if", I think. Oops---sorry, the division is not by 10. Still, if you count the extra number of ticks, and do a final CoreInternalWaitForTick (together with the extra wait for Accumulator > 0), I think the code is cleaner. Paolo > Paolo > >> + } >> + } >> >> - if (Remainder != 0) { >> - // >> - // If Remainder was not zero, then normally, Counter would be rounded >> - // up by 1 tick. In this case, since a loop for 10 counts was used >> - // to emulate the multiply by 10 operation, Counter needs to be rounded >> - // up by 10 counts. >> - // >> - CoreInternalWaitForTick (10); >> + // >> + // Flush any remaining fractional ticks. >> + // >> + if (Accumulator > 0) { >> + CoreInternalWaitForTick (1); >> } > > > ------------------------------------------------------------------------------ > Want excitement? > Manually upgrade your production database. > When you want reliability, choose Perforce > Perforce version control. Predictably reliable. > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk > ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On 09/14/14 16:49, Paolo Bonzini wrote: > Il 14/09/2014 16:45, Paolo Bonzini ha scritto: >> Il 14/09/2014 15:06, Laszlo Ersek ha scritto: >>> + } else { >>> + Accumulator -= gMetronome->TickPeriod; >>> + if (Counter == MAX_UINT64) { >>> + CoreInternalWaitForTick (Counter); >>> + CoreInternalWaitForTick (1); >>> + } else { >>> + CoreInternalWaitForTick (Counter + 1); >>> + } >> >> Counter is the result of a division, so it is at most >> 0x1999999999999999ULL. You do not need the "if", I think. > > Oops---sorry, the division is not by 10. > > Still, if you count the extra number of ticks, and do a final > CoreInternalWaitForTick (together with the extra wait for Accumulator > > 0), I think the code is cleaner. That did cross my mind, but I didn't realize that it would make the code cleaner (ie. eliminate the innermost if). If the idea of accumulating the fractions proves acceptable for the maintainers in general, then I'll spin a v2 the way you suggest. Thanks! Laszlo ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On 09/15/14 09:00, Tian, Feng wrote: > Hi, Laszlo > > Here is our intention for the logic: > > If Microseconds is greater than or equal to > 0x1999999999999999ULL(about 58,494 years), then the divide operation > is performed first, which means we will lose some accuracy, but this > is a very large delay request, so waiting a little bit extra in this > extreme case is an acceptable compromise. > > So I would prefer not to add this patch if there is no real impact. Fair enough. Thanks, Laszlo ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote: > Hi, Laszlo > > Here is our intention for the logic: > > If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise. > > So I would prefer not to add this patch if there is no real impact. > Was this issue caught by some static analysis tool? Why do we need to have code to deal with something that could never happen? Thanks, Andrew Fish > Thanks > Feng > > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: Sunday, September 14, 2014 21:07 > To: edk2-devel@lists.sourceforge.net > Subject: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch > > The iterating branch of CoreStall() adds an extra tick for each iteration if the division returned a nonzero remainder; effectively rounding up to whole ticks separately per iteration. This has the potential to waste up to 9 ticks. > > While the current code is technically correct ("the Stall() function stalls execution on the processor for *at least* the requested number of microseconds", according to the UEFI spec, emphasis mine), we can improve the accuracy by rounding up to the exact whole number of ticks. We accumulate fractional ticks during the iteration, and flush whole ticks as they become available. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Laszlo Ersek <lersek@redhat.com> > --- > MdeModulePkg/Core/Dxe/Misc/Stall.c | 44 +++++++++++++++++++++++++++++--------- > 1 file changed, 34 insertions(+), 10 deletions(-) > > diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c > index aae54fd..4c63624 100644 > --- a/MdeModulePkg/Core/Dxe/Misc/Stall.c > +++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c > @@ -51,51 +51,75 @@ CoreInternalWaitForTick ( **/ EFI_STATUS EFIAPI CoreStall ( > IN UINTN Microseconds > ) > { > UINT64 Counter; > UINT32 Remainder; > UINTN Index; > + UINT64 Accumulator; > > if (gMetronome == NULL) { > return EFI_NOT_AVAILABLE_YET; > } > > // > // Counter = Microseconds * 10 / gMetronome->TickPeriod > // 0x1999999999999999 = (2^64 - 1) / 10 > // > if (Microseconds > 0x1999999999999999ULL) { > // > // Microseconds is too large to multiple by 10 first. Perform the divide > // operation first and loop 10 times to avoid 64-bit math overflow. > // > Counter = DivU64x32Remainder ( > Microseconds, > gMetronome->TickPeriod, > &Remainder > ); > + > + Accumulator = 0; > + // > + // In each iteration we wait for Counter whole ticks, and > + // (Remainder/TickPeriod) fractional ticks (the latter being strictly less > + // than one). Clearly we can only pass whole ticks to > + // CoreInternalWaitForTick(), therefore we keep a running account of the > + // fractional ticks. Whenever enough fractional ticks accumulate, we flush > + // a whole tick from them. > + // > + // Note that Accumulator will never reach or exceed 2*TickPeriod, because > + // Remainder < TickPeriod. This has two consequences: > + // - We never have to flush more than 1 whole tick. > + // - Accumulator is always expressible as UINT64 (TickPeriod being UINT32). > + // > for (Index = 0; Index < 10; Index++) { > - CoreInternalWaitForTick (Counter); > - } > + Accumulator += Remainder; > + if (Accumulator < gMetronome->TickPeriod) { > + CoreInternalWaitForTick (Counter); > + } else { > + Accumulator -= gMetronome->TickPeriod; > + if (Counter == MAX_UINT64) { > + CoreInternalWaitForTick (Counter); > + CoreInternalWaitForTick (1); > + } else { > + CoreInternalWaitForTick (Counter + 1); > + } > + } > + } > > - if (Remainder != 0) { > - // > - // If Remainder was not zero, then normally, Counter would be rounded > - // up by 1 tick. In this case, since a loop for 10 counts was used > - // to emulate the multiply by 10 operation, Counter needs to be rounded > - // up by 10 counts. > - // > - CoreInternalWaitForTick (10); > + // > + // Flush any remaining fractional ticks. > + // > + if (Accumulator > 0) { > + CoreInternalWaitForTick (1); > } > } else { > // > // Calculate the number of ticks by dividing the number of microseconds by > // the TickPeriod. Calculation is based on 100ns unit. > // > Counter = DivU64x32Remainder ( > MultU64x32 (Microseconds, 10), > gMetronome->TickPeriod, > &Remainder > -- > 1.8.3.1 > > > ------------------------------------------------------------------------------ > Want excitement? > Manually upgrade your production database. > When you want reliability, choose Perforce > Perforce version control. Predictably reliable. > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/edk2-devel > > ------------------------------------------------------------------------------ > Want excitement? > Manually upgrade your production database. > When you want reliability, choose Perforce > Perforce version control. Predictably reliable. > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/edk2-devel ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On 09/15/14 18:53, Andrew Fish wrote: > > On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote: > >> Hi, Laszlo >> >> Here is our intention for the logic: >> >> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise. >> >> So I would prefer not to add this patch if there is no real impact. >> > > Was this issue caught by some static analysis tool? Nikolai submitted a patch that simply caused me to look at the function in question. In 2005-2006 I had written a UDP data transfer program that was obsessed with accurate packet rate, time keeping, and preventing long-term drift (I still use it occasionally, if the link quality justifies it); hence my interest in fractional ticks. > Why do we need to have code to deal with something that could never > happen? Well, such function arguments *could* happen. How the boot service should react is a different question. Technically the implementation matches the UEFI spec. (Compare for example the select() specification in POSIX: <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html> Implementations may place limitations on the maximum timeout interval supported. All implementations shall support a maximum timeout interval of at least 31 days. If the timeout argument specifies a timeout interval greater than the implementation- defined maximum value, the maximum value shall be used as the actual timeout value. [...] The next errata / release of the UEFI spec might want to state something like this too.) Thanks, Laszlo ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote: > On 09/15/14 18:53, Andrew Fish wrote: >> >> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote: >> >>> Hi, Laszlo >>> >>> Here is our intention for the logic: >>> >>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise. >>> >>> So I would prefer not to add this patch if there is no real impact. >>> >> >> Was this issue caught by some static analysis tool? > > Nikolai submitted a patch that simply caused me to look at the function > in question. In 2005-2006 I had written a UDP data transfer program that > was obsessed with accurate packet rate, time keeping, and preventing > long-term drift (I still use it occasionally, if the link quality > justifies it); hence my interest in fractional ticks. > Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL? >> Why do we need to have code to deal with something that could never >> happen? > > Well, such function arguments *could* happen. How the boot service > should react is a different question. Technically the implementation > matches the UEFI spec. > > (Compare for example the select() specification in POSIX: > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html> > > Implementations may place limitations on the maximum timeout > interval supported. All implementations shall support a maximum > timeout interval of at least 31 days. If the timeout argument > specifies a timeout interval greater than the implementation- > defined maximum value, the maximum value shall be used as the > actual timeout value. [...] > > The next errata / release of the UEFI spec might want to state something > like this too.) > I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. Thanks, Andrew Fish > Thanks, > Laszlo > > > ------------------------------------------------------------------------------ > Want excitement? > Manually upgrade your production database. > When you want reliability, choose Perforce > Perforce version control. Predictably reliable. > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/edk2-devel ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Hi, Andrew The change was proposed to solve incorrect truncation and overflow issue on old revision. The original code is: Counter = (UINT32) DivU64x32Remainder ( Microseconds * 10, gMetronome->TickPeriod, &Remainder ); In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit, which may also bring issue. Thanks Feng -----Original Message----- From: Andrew Fish [mailto:afish@apple.com] Sent: Tuesday, September 16, 2014 05:10 To: edk2-devel@lists.sourceforge.net Subject: Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote: > On 09/15/14 18:53, Andrew Fish wrote: >> >> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote: >> >>> Hi, Laszlo >>> >>> Here is our intention for the logic: >>> >>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise. >>> >>> So I would prefer not to add this patch if there is no real impact. >>> >> >> Was this issue caught by some static analysis tool? > > Nikolai submitted a patch that simply caused me to look at the > function in question. In 2005-2006 I had written a UDP data transfer > program that was obsessed with accurate packet rate, time keeping, and > preventing long-term drift (I still use it occasionally, if the link > quality justifies it); hence my interest in fractional ticks. > Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL? >> Why do we need to have code to deal with something that could never >> happen? > > Well, such function arguments *could* happen. How the boot service > should react is a different question. Technically the implementation > matches the UEFI spec. > > (Compare for example the select() specification in POSIX: > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html > > > > Implementations may place limitations on the maximum timeout > interval supported. All implementations shall support a maximum > timeout interval of at least 31 days. If the timeout argument > specifies a timeout interval greater than the implementation- > defined maximum value, the maximum value shall be used as the > actual timeout value. [...] > > The next errata / release of the UEFI spec might want to state > something like this too.) > I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. Thanks, Andrew Fish > Thanks, > Laszlo > > > ---------------------------------------------------------------------- > -------- > Want excitement? > Manually upgrade your production database. > When you want reliability, choose Perforce Perforce version control. > Predictably reliable. > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg. > clktrk _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/edk2-devel ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Tian, Feng [mailto:feng.tian@intel.com] wrote: ]Hi, Andrew ] ]The change was proposed to solve incorrect truncation and overflow issue on old revision. ] ]The original code is: ] Counter = (UINT32) DivU64x32Remainder ( ] Microseconds * 10, ] gMetronome->TickPeriod, ] &Remainder ] ); ] ]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit, ]which may also bring issue. It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer: MicroSecondDelay (5153376776577); MicroSecondDelay (1); Answer: the second one. The first malfunctions due to integer overflow. Thanks, Scott ]Thanks ]Feng ] ]-----Original Message----- ]From: Andrew Fish [mailto:afish@apple.com] ]Sent: Tuesday, September 16, 2014 05:10 ]To: edk2-devel@lists.sourceforge.net ]Subject: Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch ] ] ]On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote: ] ]> On 09/15/14 18:53, Andrew Fish wrote: ]>> ]>> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote: ]>> ]>>> Hi, Laszlo ]>>> ]>>> Here is our intention for the logic: ]>>> ]>>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed ]first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this ]extreme case is an acceptable compromise. ]>>> ]>>> So I would prefer not to add this patch if there is no real impact. ]>>> ]>> ]>> Was this issue caught by some static analysis tool? ]> ]> Nikolai submitted a patch that simply caused me to look at the ]> function in question. In 2005-2006 I had written a UDP data transfer ]> program that was obsessed with accurate packet rate, time keeping, and ]> preventing long-term drift (I still use it occasionally, if the link ]> quality justifies it); hence my interest in fractional ticks. ]> ] ]Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL? ] ]>> Why do we need to have code to deal with something that could never ]>> happen? ]> ]> Well, such function arguments *could* happen. How the boot service ]> should react is a different question. Technically the implementation ]> matches the UEFI spec. ]> ]> (Compare for example the select() specification in POSIX: ]> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html ]> > ]> ]> Implementations may place limitations on the maximum timeout ]> interval supported. All implementations shall support a maximum ]> timeout interval of at least 31 days. If the timeout argument ]> specifies a timeout interval greater than the implementation- ]> defined maximum value, the maximum value shall be used as the ]> actual timeout value. [...] ]> ]> The next errata / release of the UEFI spec might want to state ]> something like this too.) ]> ] ]I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. ] ]Thanks, ] ]Andrew Fish ] ]> Thanks, ]> Laszlo ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce. Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
f On Sep 16, 2014, at 10:53 AM, Scott Duplichan <scott@notabs.org> wrote: > Tian, Feng [mailto:feng.tian@intel.com] wrote: > > ]Hi, Andrew > ] > ]The change was proposed to solve incorrect truncation and overflow issue on old revision. > ] > ]The original code is: > ] Counter = (UINT32) DivU64x32Remainder ( > ] Microseconds * 10, > ] gMetronome->TickPeriod, > ] &Remainder > ] ); > ] > ]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit, > ]which may also bring issue. > Well checking for a delay of 58,455 years is not solving a real problem, at least not one we can test. Making sure the 7 minute stall works is probably the problem you want to solve. > It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer: > > MicroSecondDelay (5153376776577); > MicroSecondDelay (1); > That seems like a bug, which TimerLib library instance is this? We should fix it. Scott do you have a system that can sit around for 60 days running the test? Thanks, Andrew Fish > Answer: the second one. The first malfunctions due to integer overflow. ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce. Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Andrew Fish [mailto:afish@apple.com] wrote: On Sep 16, 2014, at 10:53 AM, Scott Duplichan <scott@notabs.org <mailto:scott@notabs.org> > wrote: Tian, Feng [ <mailto:feng.tian@intel.com> mailto:feng.tian@intel.com] wrote: ]Hi, Andrew ] ]The change was proposed to solve incorrect truncation and overflow issue on old revision. ] ]The original code is: ] Counter = (UINT32) DivU64x32Remainder ( ] Microseconds * 10, ] gMetronome->TickPeriod, ] &Remainder ] ); ] ]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit, ]which may also bring issue. Well checking for a delay of 58,455 years is not solving a real problem, at least not one we can test. Making sure the 7 minute stall works is probably the problem you want to solve. It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer: MicroSecondDelay (5153376776577); MicroSecondDelay (1); That seems like a bug, which TimerLib library instance is this? We should fix it. Scott do you have a system that can sit around for 60 days running the test? Hello Andrew, The above overflow is from DuetTimerLib, one that uses the ACPI timer. Probably a similar overflow could be demonstrated for the ones using APIC timer. In many ways it is in the same category as 58,000 year delay problem. It really doesn't matter. Thanks, Scott Thanks, Andrew Fish Answer: the second one. The first malfunctions due to integer overflow. ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce. Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
On Sep 16, 2014, at 3:14 PM, Scott Duplichan <scott@notabs.org> wrote: > Hello Andrew, > > The above overflow is from DuetTimerLib, one that uses the ACPI timer. Probably a similar overflow could be demonstrated for the ones using APIC timer. In many ways it is in the same category as 58,000 year delay problem. It really doesn't matter. > Scott, I think this brings up the point that the TimerLIb delay routines should have an upper bound on how long they will delay for. There is a high probability that a TimerLib delay is much more power inefficient that a call to UEFI gBS->Stall(). So there is probably not a need for a really long stall via the TimerLib. Thanks, Andrew Fish > Thanks, > Scott > ------------------------------------------------------------------------------ Want excitement? Manually upgrade your production database. When you want reliability, choose Perforce. Perforce version control. Predictably reliable. http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c index aae54fd..4c63624 100644 --- a/MdeModulePkg/Core/Dxe/Misc/Stall.c +++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c @@ -51,51 +51,75 @@ CoreInternalWaitForTick ( **/ EFI_STATUS EFIAPI CoreStall ( IN UINTN Microseconds ) { UINT64 Counter; UINT32 Remainder; UINTN Index; + UINT64 Accumulator; if (gMetronome == NULL) { return EFI_NOT_AVAILABLE_YET; } // // Counter = Microseconds * 10 / gMetronome->TickPeriod // 0x1999999999999999 = (2^64 - 1) / 10 // if (Microseconds > 0x1999999999999999ULL) { // // Microseconds is too large to multiple by 10 first. Perform the divide // operation first and loop 10 times to avoid 64-bit math overflow. // Counter = DivU64x32Remainder ( Microseconds, gMetronome->TickPeriod, &Remainder ); + + Accumulator = 0; + // + // In each iteration we wait for Counter whole ticks, and + // (Remainder/TickPeriod) fractional ticks (the latter being strictly less + // than one). Clearly we can only pass whole ticks to + // CoreInternalWaitForTick(), therefore we keep a running account of the + // fractional ticks. Whenever enough fractional ticks accumulate, we flush + // a whole tick from them. + // + // Note that Accumulator will never reach or exceed 2*TickPeriod, because + // Remainder < TickPeriod. This has two consequences: + // - We never have to flush more than 1 whole tick. + // - Accumulator is always expressible as UINT64 (TickPeriod being UINT32). + // for (Index = 0; Index < 10; Index++) { - CoreInternalWaitForTick (Counter); - } + Accumulator += Remainder; + if (Accumulator < gMetronome->TickPeriod) { + CoreInternalWaitForTick (Counter); + } else { + Accumulator -= gMetronome->TickPeriod; + if (Counter == MAX_UINT64) { + CoreInternalWaitForTick (Counter); + CoreInternalWaitForTick (1); + } else { + CoreInternalWaitForTick (Counter + 1); + } + } + } - if (Remainder != 0) { - // - // If Remainder was not zero, then normally, Counter would be rounded - // up by 1 tick. In this case, since a loop for 10 counts was used - // to emulate the multiply by 10 operation, Counter needs to be rounded - // up by 10 counts. - // - CoreInternalWaitForTick (10); + // + // Flush any remaining fractional ticks. + // + if (Accumulator > 0) { + CoreInternalWaitForTick (1); } } else { // // Calculate the number of ticks by dividing the number of microseconds by // the TickPeriod. Calculation is based on 100ns unit. // Counter = DivU64x32Remainder ( MultU64x32 (Microseconds, 10), gMetronome->TickPeriod, &Remainder
The iterating branch of CoreStall() adds an extra tick for each iteration if the division returned a nonzero remainder; effectively rounding up to whole ticks separately per iteration. This has the potential to waste up to 9 ticks. While the current code is technically correct ("the Stall() function stalls execution on the processor for *at least* the requested number of microseconds", according to the UEFI spec, emphasis mine), we can improve the accuracy by rounding up to the exact whole number of ticks. We accumulate fractional ticks during the iteration, and flush whole ticks as they become available. Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- MdeModulePkg/Core/Dxe/Misc/Stall.c | 44 +++++++++++++++++++++++++++++--------- 1 file changed, 34 insertions(+), 10 deletions(-)