comp.lang.ada
* Any leap year issues caused by Ada yesterday?
@ 2012-03-01 13:06 Georg Bauhaus
  2012-03-05 11:07 ` tonyg
  0 siblings, 1 reply; 21+ messages in thread
From: Georg Bauhaus @ 2012-03-01 13:06 UTC (permalink / raw)


News travels the wires of the internet telling us about MS Azure outages
in several places. These were seemingly caused by a piece of software
handling calendar dates involving February 29, 2012.

 Which makes me think, presuming it is as simple as it looks,
is there a software setup that prevents this kind of damage by
definition?

(The bigger companies are too big to fail just because of a software bug,
even if it is of the Ravenous Bugblatter Beast of Traal kind.
Maybe a bug that was fixed is even generating attention.
But it might help the businesses surviving only if such things do not
happen ... again.)


From [1]:
"Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations
became aware of an issue impacting the compute service in a number of
regions.  The issue was quickly triaged and it was determined to be caused by
a software bug.  While final root cause analysis is in progress, this issue
appears to be due to a time calculation that was incorrect for the leap
year. Once we discovered the issue we immediately took steps to protect
customer services that were already up and running, and began creating a fix
for the issue.  The fix was successfully deployed to most of the Windows
Azure sub-regions and we restored Windows Azure service availability to the
majority of our customers and services by 2:57AM PST, Feb 29th."

http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx

via

http://www.forbes.com/sites/ciocentral/2012/02/29/microsoft-windows-azure-cloud-service-suffers-big-outage/




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-01 13:06 Any leap year issues caused by Ada yesterday? Georg Bauhaus
@ 2012-03-05 11:07 ` tonyg
  2012-03-05 15:59   ` Shark8
  0 siblings, 1 reply; 21+ messages in thread
From: tonyg @ 2012-03-05 11:07 UTC (permalink / raw)


On Mar 1, 1:06 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> News travels the wires of the internet telling us about MS Azure outages
> in several places. These were seemingly caused by a piece of software
> handling calendar dates involving February 29, 2012.
>
>  Which makes me think, presuming it is as simple as it looks,
> is there a software setup that prevents this kind of damage by
> definition?
>
> (The bigger companies are too big fail just because of a software bug,
> even if it is of the ravenous bugblatter beast of Traal kind.
> Maybe a bug that was fixed is even generating attention.
> But it might help the businesses surviving only if such things do not
> happen ... again.)
>
> From [1]:
> "Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations
> became aware of an issue impacting the compute service in a number of
> regions.  The issue was quickly triaged and it was determined to be caused by
> a software bug.  While final root cause analysis is in progress, this issue
> appears to be due to a time calculation that was incorrect for the leap
> year. Once we discovered the issue we immediately took steps to protect
> customer services that were already up and running, and began creating a fix
> for the issue.  The fix was successfully deployed to most of the Windows
> Azure sub-regions and we restored Windows Azure service availability to the
> majority of our customers and services by 2:57AM PST, Feb 29th."
>
> http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure...
>
> via
>
> http://www.forbes.com/sites/ciocentral/2012/02/29/microsoft-windows-a...

Personally I deal with it by using the ada.calendar package rather
than writing my own. I hope that answers your question.
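
A minimal sketch of what relying on Ada.Calendar buys you, assuming nothing beyond the standard package: Time_Of is specified to raise Time_Error when its parameters do not form a proper date, so a naive "same day, next year" computation starting from 29 February fails loudly instead of silently producing a bogus date. The helper name Is_Valid_Date is made up for illustration.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;  use Ada.Text_IO;

procedure Leap_Day_Check is

   --  Hypothetical helper: True if Year/Month/Day names a real date.
   function Is_Valid_Date
     (Year  : Year_Number;
      Month : Month_Number;
      Day   : Day_Number) return Boolean
   is
      T : Time;
   begin
      T := Time_Of (Year, Month, Day);  --  raises Time_Error if improper
      return True;
   exception
      when Time_Error =>
         return False;
   end Is_Valid_Date;

begin
   --  29 February exists in 2012 but not one year later.
   Put_Line ("2012-02-29 valid: "
             & Boolean'Image (Is_Valid_Date (2012, 2, 29)));
   Put_Line ("2013-02-29 valid: "
             & Boolean'Image (Is_Valid_Date (2013, 2, 29)));
end Leap_Day_Check;
```

The point is only that the library, not hand-written date arithmetic, decides which dates exist.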




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 11:07 ` tonyg
@ 2012-03-05 15:59   ` Shark8
  2012-03-05 18:03     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 21+ messages in thread
From: Shark8 @ 2012-03-05 15:59 UTC (permalink / raw)


On Monday, March 5, 2012 5:07:29 AM UTC-6, tonyg wrote:
> On Mar 1, 1:06 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
> wrote:
> > News travels the wires of the internet telling us about MS Azure outages
> > in several places. These were seemingly caused by a piece of software
> > handling calendar dates involving February 29, 2012.
> >
> >  Which makes me think, presuming it is as simple as it looks,
> > is there a software setup that prevents this kind of damage by
> > definition?
> >
> > (The bigger companies are too big fail just because of a software bug,
> > even if it is of the ravenous bugblatter beast of Traal kind.
> > Maybe a bug that was fixed is even generating attention.
> > But it might help the businesses surviving only if such things do not
> > happen ... again.)
> >
> > From [1]:
> > "Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations
> > became aware of an issue impacting the compute service in a number of
> > regions.  The issue was quickly triaged and it was determined to be caused by
> > a software bug.  While final root cause analysis is in progress, this issue
> > appears to be due to a time calculation that was incorrect for the leap
> > year. Once we discovered the issue we immediately took steps to protect
> > customer services that were already up and running, and began creating a fix
> > for the issue.  The fix was successfully deployed to most of the Windows
> > Azure sub-regions and we restored Windows Azure service availability to the
> > majority of our customers and services by 2:57AM PST, Feb 29th."
> >
> > http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure...
> >
> > via
> >
> > http://www.forbes.com/sites/ciocentral/2012/02/29/microsoft-windows-a...
> 
> Personally I deal with it by using the ada.calendar package rather
> than writing my own. I hope that answers your question.

Indeed, given the amount of work that had to be put into timing* (for tasks), it would seem VERY odd if the Ada.Calendar package did not deal with leap years; also, given that it deals with leap seconds, not dealing with a leap year's 29 Feb would be inconsistent.

* Granted, for real-time systems the leap-year/leap-second thing is quite undesirable, giving rise to the monotonic time of the Real_Time package.
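
A small sketch of the distinction, using only the standard Ada.Real_Time package: elapsed time is measured as the difference of two monotonic clock readings, so calendar adjustments do not enter into it. The 0.25 s delay merely stands in for whatever work is being timed.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;   use Ada.Text_IO;

procedure Elapsed_Demo is
   Start   : constant Time := Clock;  --  Ada.Real_Time.Clock, monotonic
   Elapsed : Time_Span;
begin
   delay 0.25;                        --  stand-in for the work being timed
   Elapsed := Clock - Start;
   Put_Line ("Elapsed:" & Duration'Image (To_Duration (Elapsed)) & " s");
end Elapsed_Demo;
```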




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 15:59   ` Shark8
@ 2012-03-05 18:03     ` Dmitry A. Kazakov
  2012-03-05 18:30       ` Simon Wright
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-05 18:03 UTC (permalink / raw)


On Mon, 5 Mar 2012 07:59:30 -0800 (PST), Shark8 wrote:

> On Monday, March 5, 2012 5:07:29 AM UTC-6, tonyg wrote:

>> Personally I deal with it by using the ada.calendar package rather
>> than writing my own. I hope that answers your question.
> 
> Indeed, given the needed amount of work put into timing* (for tasks), it
> would seem VERY odd if the Ada.Calendar package did not deal with
> leap-year;

No package can actually. The duration of the day (stellar, solar etc) is
not constant due to various factors. Therefore corrections are necessary,
time to time. These corrections are unknown in advance.

> * Granted, for real-time systems the leap-year/leap-second thing is quite
> undesirable, giving rise to the monotonic time of the Real_Time package.

Leap seconds do not influence durations, which is the only thing relevant
for control systems. Systems that use time stamps are not much influenced
by leap seconds either because they too use durations (from some epoch)
rather than local political time.

BTW, I would not wonder to see Real_Time.Time and Calendar.Time same or
correlated.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 18:03     ` Dmitry A. Kazakov
@ 2012-03-05 18:30       ` Simon Wright
  2012-03-05 20:17         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 21+ messages in thread
From: Simon Wright @ 2012-03-05 18:30 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> BTW, I would not wonder to see Real_Time.Time and Calendar.Time same
> or correlated.

We were surprised and disappointed to find that earlier releases of GNAT
did have them the same. What happens to your precise timing if you get
an NTP update in the middle of it?

OK, this is slightly GNAT-related, since (at that time, anyway) "delay
0.5" translated under the hood into "delay until now + 0.5" on VxWorks,
so you were always related to the actual clock. A different compiler (or
a different OS) might use a different design.

AdaCore accepted the bug report (after we pointed out the "shall" in
ARM95 D.8(32)).

http://www.adaic.org/resources/add_content/standards/95lrm/ARM_HTML/RM-D-8.html
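
For illustration, a sketch of a periodic loop that does not depend on the calendar clock at all: each wake-up time is computed from the previous one using Ada.Real_Time, and the task sleeps with "delay until". The period and cycle count are arbitrary values chosen for the example.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;   use Ada.Text_IO;

procedure Periodic_Demo is
   Period : constant Time_Span := Milliseconds (500);
   Next   : Time := Clock;            --  monotonic starting point
begin
   for Cycle in 1 .. 4 loop
      Next := Next + Period;          --  absolute deadline, no drift
      delay until Next;               --  Ada.Real_Time.Time, not calendar
      Put_Line ("cycle" & Integer'Image (Cycle));
   end loop;
end Periodic_Demo;
```

Because Next is advanced by a fixed Period rather than recomputed from the current time, lateness in one cycle does not accumulate into the following ones.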




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 18:30       ` Simon Wright
@ 2012-03-05 20:17         ` Dmitry A. Kazakov
  2012-03-05 20:56           ` Simon Wright
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-05 20:17 UTC (permalink / raw)


On Mon, 05 Mar 2012 18:30:46 +0000, Simon Wright wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> BTW, I would not wonder to see Real_Time.Time and Calendar.Time same
>> or correlated.
> 
> We were surprised and disappointed to find that earlier releases of GNAT
> did have them the same. What happens to your precise timing if you get
> an NTP update in the middle of it?

Nothing, because I would expect the NTP client to leave the clock readings
and the arithmetic on them alone. It should rather adjust Split and Time_Of.

In our setups we use a different scheme anyway. Instead of adjusting
clocks we adjust the time stamps when sending them from host to host.

> OK, this is slightly GNAT-related, since (at that time, anyway) "delay
> 0.5" translated under the hood into "delay until now + 0.5" on VxWorks,

Yes, but NTP should not mingle in that. [I don't know how the VxWorks NTP
client works.]

> so you were always related to the actual clock.

The VxWorks clock on i7 is garbage. It is driven by the timer interrupts,
i.e. to get 1ms resolution you need 1000 interrupts per second.

We keep on asking Wind River to fix the mess, but without much success so
far. GNAT simply uses that clock. So delay 0.001 may mean absolutely
anything under VxWorks. To fix that one should rewrite the time driver.

> AdaCore accepted the bug report (after we pointed out the "shall" in
> ARM95 D.8(32)).

Well, I would not blame AdaCore for that, it is Wind River's fault, IMO.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 20:17         ` Dmitry A. Kazakov
@ 2012-03-05 20:56           ` Simon Wright
  2012-03-06  8:47             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 21+ messages in thread
From: Simon Wright @ 2012-03-05 20:56 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Mon, 05 Mar 2012 18:30:46 +0000, Simon Wright wrote:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> 
>>> BTW, I would not wonder to see Real_Time.Time and Calendar.Time same
>>> or correlated.
>> 
>> We were surprised and disappointed to find that earlier releases of GNAT
>> did have them the same. What happens to your precise timing if you get
>> an NTP update in the middle of it?
>
> Nothing, because I would expect the NTP client to leave alone the clock
> readings and the arithmetic of. It should rather adjust Split and Time_Of.

Some people would arrange time sync such that if:

a) you read the clock
b) time sync sets the clock back 1 second
c) after 1 second you read the clock again

that the time read at a) and c) would be the same. 

> In our setups we are using a different schema anyway. Instead of
> adjusting clocks we do the time stamps when sending them from host to
> host.

Because of the GNAT bug we couldn't use Ada.Real_Time. So we left
Ada.Calendar untouched and made our own <project>.Calendar with offsets,
as you say. Fun with I/O of time.

>> OK, this is slightly GNAT-related, since (at that time, anyway) "delay
>> 0.5" translated under the hood into "delay until now + 0.5" on VxWorks,
>
> Yes, but NTP should not mingle in that. [I don't know how the VxWorks NTP
> client works.]

I don't understand? What else would it do but change the value of
Ada.Calendar.Clock?

>> so you were always related to the actual clock.
>
> VxWorks clock on i7 is a garbage. It is driven by the timer interrupts,
> i.e. to get 1ms resolution you need 1000 interrupts per second.
>
> We keep on asking Wind River to fix the mess, but without much success so
> far. GNAT simply uses that clock. So delay 0.001 may mean absolutely
> anything under VxWorks. To fix that one should rewrite the time driver.

We were quite happy with 1 ms ticks. So delay 0.001 meant "delay at
least 1 and no more than 2 ms" (like always).

>> AdaCore accepted the bug report (after we pointed out the "shall" in
>> ARM95 D.8(32)).
>
> Well, I would not blame AdaCore for that, it is Wind River's fault,
> IMO.

This was *Ada* running on VxWorks; so it needed fixing. We didn't care
who fixed it (well, as far as we were concerned, who didn't fix it,
actually).

It should have been easy enough on VxWorks: taskDelay(period + 1); (but
there may be all sorts of complicated reasons why that wouldn't be
enough).




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-05 20:56           ` Simon Wright
@ 2012-03-06  8:47             ` Dmitry A. Kazakov
  2012-03-06  9:20               ` Simon Wright
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06  8:47 UTC (permalink / raw)


On Mon, 05 Mar 2012 20:56:09 +0000, Simon Wright wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> On Mon, 05 Mar 2012 18:30:46 +0000, Simon Wright wrote:
>>
>>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> 
>>>> BTW, I would not wonder to see Real_Time.Time and Calendar.Time same
>>>> or correlated.
>>> 
>>> We were surprised and disappointed to find that earlier releases of GNAT
>>> did have them the same. What happens to your precise timing if you get
>>> an NTP update in the middle of it?
>>
>> Nothing, because I would expect the NTP client to leave alone the clock
>> readings and the arithmetic of. It should rather adjust Split and Time_Of.
> 
> Some people would arrange time sync such that if:
> 
> a) you read the clock
> b) time sync sets the clock back 1 second
> c) after 1 second you read the clock again
> 
> that the time read at a) and c) would be the same. 
> 
>> In our setups we are using a different schema anyway. Instead of
>> adjusting clocks we do the time stamps when sending them from host to
>> host.
> 
> Because of the GNAT bug we couldn't use Ada.Real_Time. So we left
> Ada.Calendar untouched and made our own <project>.Calendar with offsets,
> as you say. Fun with I/O of time,
> 
>>> OK, this is slightly GNAT-related, since (at that time, anyway) "delay
>>> 0.5" translated under the hood into "delay until now + 0.5" on VxWorks,
>>
>> Yes, but NTP should not mingle in that. [I don't know how the VxWorks NTP
>> client works.]
> 
> I don't understand? What else would it do but change the value of
> Ada.Calendar.Clock?

It would be difficult to do. The most straightforward and efficient
implementation of Time is a 64-bit number (N) taken directly from the
corresponding machine register, e.g. the performance counter. Why would you
change it (provided any machine support for doing that existed)? You would
rather leave it as is and adjust the epoch instead (and, unlikely, the
multiplier) when calculating the calendar time:

   Epoch time + N * Multiplier

This is needed only in operations dealing with year, month etc, e.g. Split.
Granted, there could be issues with delay until <many-days-ahead>, but these
are negligible compared to the problems that would arise if the performance
counter were indeed adjusted.
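
A rough sketch of that idea, with made-up names and numbers throughout: the raw counter N is never modified, a time sync only changes Epoch, and intervals computed from raw counter values are therefore unaffected. The type Seconds, the 1 ns multiplier and the counter values are assumptions for the example, not anything a real runtime defines.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Epoch_Demo is
   --  All names and numbers below are illustrative assumptions.
   type Seconds is digits 15;

   Multiplier : constant Seconds := 1.0E-9;  --  seconds per counter tick
   Epoch      : Seconds := 1_000_000.0;      --  calendar time at N = 0

   --  Calendar time = Epoch + N * Multiplier
   function Calendar_Time (N : Unsigned_64) return Seconds is
   begin
      return Epoch + Seconds (N) * Multiplier;
   end Calendar_Time;

   N1 : constant Unsigned_64 := 5_000_000_000;  --  first counter reading
   N2 : constant Unsigned_64 := 7_000_000_000;  --  second counter reading
begin
   Put_Line ("before sync:" & Seconds'Image (Calendar_Time (N1)));

   Epoch := Epoch - 1.0;  --  time sync: calendar was 1 s fast

   Put_Line ("after sync: " & Seconds'Image (Calendar_Time (N2)));

   --  The interval uses raw counters only, so the sync cannot disturb it.
   Put_Line ("interval:   "
             & Seconds'Image (Seconds (N2 - N1) * Multiplier));
end Epoch_Demo;
```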

>>> so you were always related to the actual clock.
>>
>> VxWorks clock on i7 is a garbage. It is driven by the timer interrupts,
>> i.e. to get 1ms resolution you need 1000 interrupts per second.
>>
>> We keep on asking Wind River to fix the mess, but without much success so
>> far. GNAT simply uses that clock. So delay 0.001 may mean absolutely
>> anything under VxWorks. To fix that one should rewrite the time driver.
> 
> We were quite happy with 1 ms ticks. So delay 0.001 meant "delay at
> least 1 and no more than 2 ms" (like always).

If you set the timer at 1ms rate, you would have 1ms delays. The problem is
that the time stamps would have only 1ms accuracy! For 10kHz measurements
and control we are doing, this is a bit dire. So we are setting the timer
interrupts at 0.01ms (which is our humble contribution to the "man-made"
climate change (:-))

>>> AdaCore accepted the bug report (after we pointed out the "shall" in
>>> ARM95 D.8(32)).
>>
>> Well, I would not blame AdaCore for that, it is Wind River's fault,
>> IMO.
> 
> This was *Ada* running on VxWorks; so it needed fixing. We didn't care
> who fixed it (well, as far as we were concerned, who didn't fix it,
> actually).

Yes, but AdaCore has neither the resources nor desire to patch crappy OSes.
They will have enough to do fixing the problems introduced with the
implementation of Ada 2012...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06  8:47             ` Dmitry A. Kazakov
@ 2012-03-06  9:20               ` Simon Wright
  2012-03-06 10:07                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 21+ messages in thread
From: Simon Wright @ 2012-03-06  9:20 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> If you set the timer at 1ms rate, you would have 1ms delays. The
> problem is that the time stamps would have only 1ms accuracy! For
> 10kHz measurements and control we are doing, this is a bit dire. So we
> are setting the timer interrupts at 0.01ms (which is our humble
> contribution to the "man-made" climate change (:-))

Our (VxWorks) solution was to use the PowerPC timebase register; at each
clock tick, store the Ada clock and the current tb; in between, use the
tb to interpolate. On the hardware we were using, that gave us 40 ns
resolution (a microsecond would have been fine!)
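
This is not the proprietary code, only a hardware-independent sketch of the interpolation arithmetic under stated assumptions: a snapshot (Tick_Time, Tick_TB) is taken at each clock tick, and a later counter reading is converted to an offset from that snapshot. The 25 MHz counter rate is an assumption chosen to match a 40 ns step; the counter values are invented, and High_Res_Clock is a hypothetical name.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces;   use Interfaces;

procedure Interpolate_Demo is
   --  Assumed counter rate: 25 MHz corresponds to one step every 40 ns.
   TB_Hz : constant := 25_000_000;

   --  Snapshot taken at the last clock tick (filled in by hand here).
   Tick_Time : constant Time        := Clock;
   Tick_TB   : constant Unsigned_64 := 1_000_000;

   --  Hypothetical helper: calendar time for a later counter reading.
   function High_Res_Clock (Now_TB : Unsigned_64) return Time is
      Ticks : constant Unsigned_64 := Now_TB - Tick_TB;
   begin
      return Tick_Time
        + Duration (Long_Float (Ticks) / Long_Float (TB_Hz));
   end High_Res_Clock;

   --  12_500 counter steps at 25 MHz are 0.5 ms after the tick.
   T : constant Time := High_Res_Clock (Tick_TB + 12_500);
begin
   Put_Line ("offset from tick:" & Duration'Image (T - Tick_Time) & " s");
end Interpolate_Demo;
```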

I couldn't show you that code even if I still had access, it's
proprietary, but there's something at [1].

An Intel equivalent used to work, but (a) most OSs give microsecond
resolution anyway, (b) it certainly doesn't work on Mac OS X with dual
cores where the core is slowed right down if it has nothing to do! (my
interpretation of the observed behaviour).

[1] http://booch95.svn.sourceforge.net/viewvc/booch95/trunk/src/bc-support-high_resolution_time-clock.adb-ppc32?revision=1415&view=markup




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06  9:20               ` Simon Wright
@ 2012-03-06 10:07                 ` Dmitry A. Kazakov
  2012-03-06 10:51                   ` Georg Bauhaus
  2012-03-06 16:46                   ` Simon Wright
  0 siblings, 2 replies; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06 10:07 UTC (permalink / raw)


On Tue, 06 Mar 2012 09:20:55 +0000, Simon Wright wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> If you set the timer at 1ms rate, you would have 1ms delays. The
>> problem is that the time stamps would have only 1ms accuracy! For
>> 10kHz measurements and control we are doing, this is a bit dire. So we
>> are setting the timer interrupts at 0.01ms (which is our humble
>> contribution to the "man-made" climate change (:-))
> 
> Our (VxWorks) solution was to use the PowerPC timebase register; at each
> clock tick, store the Ada clock and the current tb; in between, use the
> tb to interpolate. On the hardware we were using, that gave us 40 ns
> resolution (a microsecond would have been fine!)

I know that PowerPC is so much better in that respect than x86.
Unfortunately it showed astonishingly poor performance when we ran our
initial tests about 2 years ago, so we switched to x86.

> I couldn't show you that code even if I still had access, it's
> proprietary, but there's something at [1].
> 
> An Intel equivalent used to work, but (a) most OSs give microsecond
> resolution anyway,

Not VxWorks, not Windows (I didn't check Windows 7, though). Under
Windows we use the performance counter and a background thread which
synchronizes it with the system clock. This works perfectly well, but
requires statistical processing and is too slow for 10kHz cycles we have to
run under VxWorks.

> (b) it certainly doesn't work on Mac OS X with dual
> cores where the core is slowed right down if it has nothing to do! (my
> interpretation of the observed behaviour).
>
> [1] http://booch95.svn.sourceforge.net/viewvc/booch95/trunk/src/bc-support-high_resolution_time-clock.adb-ppc32?revision=1415&view=markup

Under VxWorks you can read the TSC without assembly, there is a library
function for that (pentiumTscGet64).

   type Timestamp is new Unsigned_64;
   procedure pentiumTscGet64 (Clock : out Timestamp);
   pragma Import (C, pentiumTscGet64, "pentiumTscGet64");

should do the work.

The actual problem is to get the multiplier, the BIOS time, and keeping the
TSC synchronized with the system clock. Funnily Wind River did all that for
Pentium IV. But then they were too lazy to support it on more recent x86
processors. The most troublesome thing about VxWorks is that Wind River
adds and removes its parts at will. There is no such thing as backward
compatibility whatsoever.

As for multicore/sleep mode issues, AFAIK Intel fixed that, i.e. the TSC
frequency never changes. I don't know anything about MacOS, but they probably
use the same lousy scheme of getting time from the PIT timer or
something like that, hence the problems.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 10:07                 ` Dmitry A. Kazakov
@ 2012-03-06 10:51                   ` Georg Bauhaus
  2012-03-06 11:16                     ` Dmitry A. Kazakov
  2012-03-06 16:46                   ` Simon Wright
  1 sibling, 1 reply; 21+ messages in thread
From: Georg Bauhaus @ 2012-03-06 10:51 UTC (permalink / raw)


On 06.03.12 11:07, Dmitry A. Kazakov wrote:
> The actual problem is to get the multiplier, the BIOS time, and keeping the
> TSC synchronized with the system clock.

Why isn't there a chip that does just one thing:
signal regularly, at sufficiently high frequency,
and totally independent of its electrical surroundings,
autonomous,
unaffected by software,
and having its own power supply?
Ping - ping - ping - ping - ...
Into some memory location, maybe?

Or is there?




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 10:51                   ` Georg Bauhaus
@ 2012-03-06 11:16                     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06 11:16 UTC (permalink / raw)


On Tue, 06 Mar 2012 11:51:31 +0100, Georg Bauhaus wrote:

> On 06.03.12 11:07, Dmitry A. Kazakov wrote:
>> The actual problem is to get the multiplier, the BIOS time, and keeping the
>> TSC synchronized with the system clock.
> 
> Why isn't there a chip that does just one thing:
> signal regularly, at sufficiently high frequency,
> and totally independent of its electrical surroundings,
> autonomous,
> unaffected by software,
> and having its own power supply?
> Ping - ping - ping - ping - ...
> Into some memory location, maybe?
> 
> Or is there?

The TSC is, except for being off without power.

The problem is to figure out what year is 764223417. A counter is not yet a
clock. A clock ultimately gives something like 764223417ns since 1 January
2011 00:00 UTC, or a method to convert a number into a date and back. Not a
trivial problem, even when ignoring relativity...
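
As a sketch of the "counter plus epoch" step only (it says nothing about how to obtain a trustworthy epoch), the standard Ada.Calendar.Formatting package can turn the example count into a printable UTC time stamp. The procedure and constant names are invented for the example.

```ada
with Ada.Calendar;            use Ada.Calendar;
with Ada.Calendar.Formatting;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Counter_To_Date is
   --  Epoch from the example: 1 January 2011 00:00 UTC.
   Epoch : constant Time :=
     Ada.Calendar.Formatting.Time_Of
       (Year      => 2011,
        Month     => 1,
        Day       => 1,
        Hour      => 0,
        Minute    => 0,
        Second    => 0,
        Time_Zone => 0);

   Count_Ns : constant := 764_223_417.0;     --  counter value, in ns
   Offset_S : constant := Count_Ns / 1.0E9;  --  the same, in seconds

   Stamp : constant Time := Epoch + Duration (Offset_S);
begin
   --  Print the time stamp in UTC, including the sub-second fraction.
   Put_Line
     (Ada.Calendar.Formatting.Image
        (Stamp, Include_Time_Fraction => True, Time_Zone => 0));
end Counter_To_Date;
```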

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 10:07                 ` Dmitry A. Kazakov
  2012-03-06 10:51                   ` Georg Bauhaus
@ 2012-03-06 16:46                   ` Simon Wright
  2012-03-06 17:37                     ` Dmitry A. Kazakov
  1 sibling, 1 reply; 21+ messages in thread
From: Simon Wright @ 2012-03-06 16:46 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> Under VxWorks you can read the TSC without assembly, there is a library
> function for that (pentiumTscGet64).
>
>    type Timestamp is new Unsigned_64;
>    procedure pentiumTscGet64 (Clock : out Timestamp);
>    pragma Import (C, pentiumTscGet64, "pentiumTscGet64");
>
> should do the work.

Not sure if there was an equivalent for PPC.

> The actual problem is to get the multiplier, the BIOS time, and keeping the
> TSC synchronized with the system clock. Funnily Wind River did all that for
> Pentium IV. But then they were too lazy to support it on more recent x86
> processors. The most troublesome thing about VxWorks is that Wind River
> adds and removes its parts at will. There is no such thing as backward
> compatibility whatsoever.

The board manufacturer defined the multiplier for us, so no problems.

> As for multicore/sleep mode issues, AFAIK Intel fixed that, i.e. the TSC
> frequency is never changed. I don't know anything about MacOS, but probably
> they deploy the same lousy schema of getting time from the PIT timer or
> something like that, so the problems.

I think I may have misunderstood the evidence here. Trying it again, the
TSC runs at pretty close to the nominal 2.4 GHz, but it was unreasonable
of me to try to *measure* it by looping for a second and seeing how much
the TSC changed, and then be surprised at errors of the order of a few
milliseconds. But, as you say, you have to calibrate the TSC somehow.




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 16:46                   ` Simon Wright
@ 2012-03-06 17:37                     ` Dmitry A. Kazakov
  2012-03-06 17:59                       ` Simon Wright
                                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06 17:37 UTC (permalink / raw)


On Tue, 06 Mar 2012 16:46:35 +0000, Simon Wright wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> Under VxWorks you can read the TSC without assembly, there is a library
>> function for that (pentiumTscGet64).
>>
>>    type Timestamp is new Unsigned_64;
>>    procedure pentiumTscGet64 (Clock : out Timestamp);
>>    pragma Import (C, pentiumTscGet64, "pentiumTscGet64");
>>
>> should do the work.
> 
> Not sure if there was an equivalent for PPC.

AFAIK, PPC has a high resolution real time counter, which is better
designed than Intel's TSC.

>> The actual problem is to get the multiplier, the BIOS time, and keeping the
>> TSC synchronized with the system clock. Funnily Wind River did all that for
>> Pentium IV. But then they were too lazy to support it on more recent x86
>> processors. The most troublesome thing about VxWorks is that Wind River
>> adds and removes its parts at will. There is no such thing as backward
>> compatibility whatsoever.
> 
> The board manufacturer defined the multiplier for us, so no problems.

But there is still a problem of synchronizing it with the system time if
that uses a different source, e.g. counted timer interrupts etc.

>> As for multicore/sleep mode issues, AFAIK Intel fixed that, i.e. the TSC
>> frequency is never changed. I don't know anything about MacOS, but probably
>> they deploy the same lousy schema of getting time from the PIT timer or
>> something like that, so the problems.
> 
> I think I may have misunderstood the evidence here. Trying it again, the
> TSC runs at pretty close to the nominal 2.4 GHz, but it was unreasonable
> of me to try to *measure* it by looping for a second and seeing how much
> the TSC changed, and then be surprised at errors of the order of a few
> milliseconds.

Right, the multiplier is not a whole number. Then since the accuracy of
system clock is catastrophic under VxWorks, you would not know how long a
time interval actually was. The last time I tried something like that under
VxWorks, it didn't work.

For x86 the multiplier must be taken from the BIOS. It is somehow
determined by the frequencies of the front bus and the processor.

> But, as you say, you have to calibrate the TSC somehow.

If the sources of the TSC and of the system clock are different, then there
is a systematic error. I suppose this is the major reason why it did not
work under VxWorks. The system time must be derived from the same quartz.
Otherwise you have to compensate the deviation, which could well be as big
as 5 microseconds per second.
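
To put that figure in perspective, a back-of-the-envelope sketch of how an uncompensated deviation of 5 microseconds per second accumulates over one day; the constant names are made up for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Drift_Demo is
   Drift_Per_Second : constant := 5.0E-6;    --  5 microseconds per second
   Seconds_Per_Day  : constant := 86_400.0;

   --  Static arithmetic on named numbers: 5.0E-6 * 86_400.0 = 0.432
   Drift_Per_Day : constant := Drift_Per_Second * Seconds_Per_Day;

   Per_Day : constant Duration := Drift_Per_Day;
begin
   Put_Line ("drift per day:" & Duration'Image (Per_Day) & " s");
end Drift_Demo;
```

That is, 5 ppm amounts to roughly 0.43 s per day between the two time sources.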

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 17:37                     ` Dmitry A. Kazakov
@ 2012-03-06 17:59                       ` Simon Wright
  2012-03-06 19:18                         ` Dmitry A. Kazakov
  2012-03-06 19:08                       ` Shark8
  2012-03-06 21:00                       ` tmoran
  2 siblings, 1 reply; 21+ messages in thread
From: Simon Wright @ 2012-03-06 17:59 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 06 Mar 2012 16:46:35 +0000, Simon Wright wrote:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> 
>>> Under VxWorks you can read the TSC without assembly, there is a library
>>> function for that (pentiumTscGet64).
>>>
>>>    type Timestamp is new Unsigned_64;
>>>    procedure pentiumTscGet64 (Clock : out Timestamp);
>>>    pragma Import (C, pentiumTscGet64, "pentiumTscGet64");
>>>
>>> should do the work.
>> 
>> Not sure if there was an equivalent for PPC.
>
> AFAIK, PPC has a high resolution real time counter, which is better
> designed than Intel's TSC.

Yes, the Time Base.

Ours was a 32-bit implementation, so you get to read the lower and upper
halves of the timebase separately, which would cause problems at
rollover. So read the TB using an internal Clock like this

   --  Needs "with" clauses for Ada.Unchecked_Conversion, Interfaces and
   --  System.Machine_Code. Time is assumed to be a 64-bit type.
   function Clock return Time is

      type Half is (High, Low);
      type High_Low is array (Half) of Interfaces.Unsigned_32;

      Upper, Lower, Upper_Again : Interfaces.Unsigned_32;

      function To_Time is new Ada.Unchecked_Conversion (High_Low, Time);

      use type Interfaces.Unsigned_32;

   begin

      --  Read upper, lower, then upper again. Retry if the upper half
      --  changed, i.e. the lower half rolled over during the read.
      loop
         System.Machine_Code.Asm
           ("mftbu %0" & ASCII.LF & ASCII.HT &
              "mftb %1" & ASCII.LF & ASCII.HT &
              "mftbu %2",
            Outputs =>
              (Interfaces.Unsigned_32'Asm_Output ("=r", Upper),
               Interfaces.Unsigned_32'Asm_Output ("=r", Lower),
               Interfaces.Unsigned_32'Asm_Output ("=r", Upper_Again)),
            Volatile => True);
         exit when Upper_Again = Upper;
      end loop;

      --  High word first (assumes a big-endian target).
      return To_Time ((High => Upper, Low => Lower));

   end Clock;
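
To see why the second read of the upper half matters, here is a minimal
sketch that runs without any hardware. It replaces the timebase with a
simulated counter that advances by one on every half-read; Split_Read_Sketch,
Read_Upper, Read_Lower and Combine are made-up names for illustration, not
part of any real API.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Split_Read_Sketch is

   --  Simulated 64-bit counter, started one step before the lower half
   --  wraps. This stands in for the hardware timebase (an assumption).
   Start   : constant Unsigned_64 := 16#0000_0001_FFFF_FFFF#;
   Counter : Unsigned_64 := Start;

   --  Each simulated half-read lets the counter advance by one.
   function Read_Upper return Unsigned_32 is
      Value : constant Unsigned_32 :=
        Unsigned_32 (Shift_Right (Counter, 32));
   begin
      Counter := Counter + 1;
      return Value;
   end Read_Upper;

   function Read_Lower return Unsigned_32 is
      Value : constant Unsigned_32 :=
        Unsigned_32 (Counter and 16#FFFF_FFFF#);
   begin
      Counter := Counter + 1;
      return Value;
   end Read_Lower;

   function Combine (Upper, Lower : Unsigned_32) return Unsigned_64 is
   begin
      return Shift_Left (Unsigned_64 (Upper), 32) or Unsigned_64 (Lower);
   end Combine;

   Upper, Lower, Upper_Again : Unsigned_32;

begin

   --  Naive read: the upper half is taken before the rollover and the
   --  lower half after it, giving 16#0000_0001_0000_0000#, which is
   --  almost 2**32 counts behind the real counter.
   Upper := Read_Upper;
   Lower := Read_Lower;
   Ada.Text_IO.Put_Line
     ("naive:" & Unsigned_64'Image (Combine (Upper, Lower)));

   --  Guarded read: retry until the upper half is the same before and
   --  after the lower half was read. Here the first pass is rejected
   --  and the second yields 16#0000_0002_0000_0003#.
   Counter := Start;
   loop
      Upper       := Read_Upper;
      Lower       := Read_Lower;
      Upper_Again := Read_Upper;
      exit when Upper_Again = Upper;
   end loop;
   Ada.Text_IO.Put_Line
     ("safe: " & Unsigned_64'Image (Combine (Upper, Lower)));

end Split_Read_Sketch;
```

The reads are done in separate statements on purpose: Ada does not fix the
evaluation order of operands within one expression, so the order of the
half-reads has to be spelled out.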

The timebase ran off the same crystal as the decrementer, so all were
internally sync'd. Internally, our synchronised time was
(Ada.Calendar.Clock (at last clock interrupt, of course) + high-res time
since last clock interrupt). We added the external sync offset on
sending a time off-board, and subtracted it on receiving one.

(I've never used VxWorks on x86).




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 17:37                     ` Dmitry A. Kazakov
  2012-03-06 17:59                       ` Simon Wright
@ 2012-03-06 19:08                       ` Shark8
  2012-03-06 19:40                         ` Dmitry A. Kazakov
  2012-03-06 21:00                       ` tmoran
  2 siblings, 1 reply; 21+ messages in thread
From: Shark8 @ 2012-03-06 19:08 UTC (permalink / raw)
  Cc: mailbox

On Tuesday, March 6, 2012 11:37:35 AM UTC-6, Dmitry A. Kazakov wrote:
> 
> If the sources of the TSC and of the system clock are different, then there
> is a systematic error. I suppose this is the major reason why it did not
> work under VxWorks. The system time must be derived from the same quartz.
> Otherwise you have to compensate the deviation, which could well be as big
> as 5 microseconds per second.

Really? I thought the whole point of using quartz crystals was that they vibrate at a precise rate. Or is the problem analogous to the "beats" that you hear when two musical instruments* are slightly out of tune?

(*Guitar comes immediately to mind; especially as it uses that as part of the tuning process.)




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 17:59                       ` Simon Wright
@ 2012-03-06 19:18                         ` Dmitry A. Kazakov
  2012-03-06 20:22                           ` Simon Wright
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06 19:18 UTC (permalink / raw)


On Tue, 06 Mar 2012 17:59:04 +0000, Simon Wright wrote:

> The timebase ran off the same crystal as the decrementer, so all were
> internally sync'd. Internally, our synchronised time was
> (Ada.Calendar.Clock (at last clock interrupt, of course) + high-res time
> since last clock interrupt).

You could improve that a bit by putting Ada.Calendar.Clock into a loop.
When it returns a different value, you exit the loop and take that value
for the time base.
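
A minimal sketch of that loop, assuming a system where Ada.Calendar.Clock
only advances at the clock interrupt. Find_Tick_Edge is a made-up name, and
in real use you would read the high-resolution counter immediately after
leaving the loop.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;

procedure Find_Tick_Edge is
   Start : constant Time := Clock;
   Edge  : Time;
begin
   --  Busy-wait until the clock reports a new value. Where Clock only
   --  moves at the tick interrupt, Edge is then the time of a tick that
   --  has just happened, so it is a good base for the counter.
   loop
      Edge := Clock;
      exit when Edge /= Start;
   end loop;

   Ada.Text_IO.Put_Line
     ("clock stepped by" & Duration'Image (Edge - Start) & " s");
end Find_Tick_Edge;
```

On a system where Clock already has fine resolution the loop exits almost
at once and the step printed is tiny, so this only makes sense for a
tick-driven clock.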

If the quartz is the same, you need not synchronize it anymore. Since the
multiplier is known, you simply take the RTC reading minus its value at the
base, multiply by the factor, convert to Duration (or Time_Span), add that
to the base, and use the result instead of Ada's original Clock.
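
As a sketch only, the arithmetic could look like this. Counter_Hz is an
assumed counter frequency, the counter readings are made-up numbers, and
Derived_Time is a hypothetical name; a real version would read the hardware
counter instead.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Derived_Clock_Sketch is

   --  Assumed counter frequency (the "factor" is 1 / Counter_Hz).
   Counter_Hz : constant Long_Float := 25_000_000.0;

   --  Base time plus the counter ticks elapsed since the base was taken.
   function Derived_Time
     (Base            : Time;
      Counter_At_Base : Unsigned_64;
      Counter_Now     : Unsigned_64) return Time
   is
      --  Modular subtraction, so a counter wrap-around is harmless.
      Ticks : constant Unsigned_64 := Counter_Now - Counter_At_Base;
   begin
      return Base + Duration (Long_Float (Ticks) / Counter_Hz);
   end Derived_Time;

   Base  : constant Time := Clock;
   Later : Time;

begin
   --  Made-up readings: 12_500_000 ticks at 25 MHz is half a second.
   Later := Derived_Time
     (Base            => Base,
      Counter_At_Base => 1_000,
      Counter_Now     => 1_000 + 12_500_000);

   Ada.Text_IO.Put_Line
     ("elapsed:" & Duration'Image (Later - Base) & " s");
end Derived_Clock_Sketch;
```

The conversion goes through Long_Float for brevity. For long uptimes a
fixed-point or integer computation would avoid losing low-order bits.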

Did you check the kernel settings? It has a variable that sets the time
source to the RTC. This might work on your PPC's BSP. The effect is that
the system clock is updated each time it is queried rather than on timer
interrupts.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 19:08                       ` Shark8
@ 2012-03-06 19:40                         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 21+ messages in thread
From: Dmitry A. Kazakov @ 2012-03-06 19:40 UTC (permalink / raw)


On Tue, 6 Mar 2012 11:08:37 -0800 (PST), Shark8 wrote:

> On Tuesday, March 6, 2012 11:37:35 AM UTC-6, Dmitry A. Kazakov wrote:
>> 
>> If the sources of the TSC and of the system clock are different, then there
>> is a systematic error. I suppose this is the major reason why it did not
>> work under VxWorks. The system time must be derived from the same quartz.
>> Otherwise you have to compensate the deviation, which could well be as big
>> as 5 microseconds per second.
> 
> Really? I thought the whole point of using quartz crystals was that
> they vibrate at a precise rate. Or is the problem analogous to the "beats"
> that you hear when two musical instruments* are slightly out of tune?

I am not a physicist, but we actually measured the TSCs of two identical
computers (same motherboard, processor, manufacturer, etc.). The deviation
was on the order of microseconds per second.
 
> (*Guitar comes immediately to mind; especially as it uses that as part of the tuning process.)

In that analogy it is like two strings having different tension or
diameter.

But what I meant was not oscillators of almost the same period, but rather
ones having absolutely unrelated periods, e.g. the BIOS clock and the TSC.
x86 has many clocks spread in abundance all over the motherboard. They are
physically different, unsynchronized, etc.

OS designers carefully select the least accurate and most unreliable of
them for the time source... (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 19:18                         ` Dmitry A. Kazakov
@ 2012-03-06 20:22                           ` Simon Wright
  0 siblings, 0 replies; 21+ messages in thread
From: Simon Wright @ 2012-03-06 20:22 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 06 Mar 2012 17:59:04 +0000, Simon Wright wrote:
>
>> The timebase ran off the same crystal as the decrementer, so all were
>> internally sync'd. Internally, our synchronised time was
>> (Ada.Calendar.Clock (at last clock interrupt, of course) + high-res time
>> since last clock interrupt).
>
> You could improve that a bit by putting Ada.Calendar.Clock into a loop.
> When it returns a different value, you exit the loop and take that value
> for the time base.

Ada.Calendar.Clock was basically the POSIX gettimeofday() (?) from
VxWorks, which was (epoch + ticks/clockrate). So it was only updated at
each clock interrupt.

> If the quartz is the same, you need not synchronize it anymore. Since the
> multiplier is known, you simply take the RTC reading minus its value at the
> base, multiply by the factor, convert to Duration (or Time_Span), add that
> to the base, and use the result instead of Ada's original Clock.

Didn't want to mess with the RTS's use of Ada.Calendar.Time for delays. No
point in using Ada.Real_Time.Time because (then) it was the same as
Ada.Calendar.Time. So we had AC.Time (for internal use only) and
<Project>.Calendar.Time for both internal & external use (but not, of
course, for delays).

> Did you check the kernel settings? It has a variable that sets the time
> source to the RTC. This might work on your PPC's BSP. The effect is that
> the system clock is updated each time it is queried rather than on timer
> interrupts.

Don't remember that one.




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 17:37                     ` Dmitry A. Kazakov
  2012-03-06 17:59                       ` Simon Wright
  2012-03-06 19:08                       ` Shark8
@ 2012-03-06 21:00                       ` tmoran
  2012-03-06 21:37                         ` Simon Wright
  2 siblings, 1 reply; 21+ messages in thread
From: tmoran @ 2012-03-06 21:00 UTC (permalink / raw)


> Otherwise you have to compensate the deviation, which could well be as big
> as 5 microseconds per second.
  Where do you find PCs with such a good clock?  PCs of my acquaintance
typically have a performance counter error of a few seconds/day, which
is more like 5 times as bad.




* Re: Any leap year issues caused by Ada yesterday?
  2012-03-06 21:00                       ` tmoran
@ 2012-03-06 21:37                         ` Simon Wright
  0 siblings, 0 replies; 21+ messages in thread
From: Simon Wright @ 2012-03-06 21:37 UTC (permalink / raw)


tmoran@acm.org writes:

>> Otherwise you have to compensate the deviation, which could well be as big
>> as 5 microseconds per second.

>   Where do you find PCs with such a good clock?  PCs of my acquaintance
> typically have a performance counter error of a few seconds/day, which
> is more like 5 times as bad.

We used this - http://defense.ge-ip.com/products/2113 - the clock
deviation on the datasheet was 50 ppm AFAICR.
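
For comparing the figures in this subthread: 1 microsecond per second is
1 ppm, and a day has 86,400 seconds. A small sketch of the conversion
follows. Drift_Sketch and its helpers are made-up names, and 3 s/day is
only my stand-in for "a few seconds/day".

```ada
with Ada.Text_IO;

procedure Drift_Sketch is

   package Float_IO is new Ada.Text_IO.Float_IO (Long_Float);

   Seconds_Per_Day : constant Long_Float := 86_400.0;

   --  Drift in seconds per day for a given error in parts per million.
   function Drift_Per_Day (PPM : Long_Float) return Long_Float is
   begin
      return PPM * 1.0E-6 * Seconds_Per_Day;
   end Drift_Per_Day;

   --  Error in parts per million for a given drift in seconds per day.
   function To_PPM (Drift : Long_Float) return Long_Float is
   begin
      return Drift / Seconds_Per_Day * 1.0E6;
   end To_PPM;

   procedure Report (Label : String; Value : Long_Float) is
   begin
      Ada.Text_IO.Put (Label);
      Float_IO.Put (Value, Fore => 3, Aft => 2, Exp => 0);
      Ada.Text_IO.New_Line;
   end Report;

begin
   Report ("5 ppm in s/day: ", Drift_Per_Day (5.0));   --  about 0.43
   Report ("50 ppm in s/day:", Drift_Per_Day (50.0));  --  about 4.32
   Report ("3 s/day in ppm: ", To_PPM (3.0));          --  about 34.72
end Drift_Sketch;
```

So 5 microseconds per second is a bit under half a second per day, and a
50 ppm part could be off by a little over 4 seconds per day.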


