comp.lang.ada
* GNAT Modification_Time limitation
@ 2018-11-19 22:56 Lionel Draghi
  2018-11-20  0:47 ` Shark8
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Lionel Draghi @ 2018-11-19 22:56 UTC (permalink / raw)


I am coding a kind of make application that depends on files' time tags (thanks to Ada.Directories.Modification_Time) and on Ada.Calendar.Clock, both returning Ada.Calendar.Time.

Unfortunately, I came across a GNAT limitation in the Modification_Time implementation on Linux: sub-second precision is ignored, and Modification_Time returns
> Time_Of (Year, Month, Day, Hour, Minute, Second, 0.0);

So, at the same time Clock returns 2018-10-29 20:36:01.47
while Modification_Time    returns 2018-10-29 20:36:01.00

This prevents me from knowing whether a file was modified before or after a certain time, and thus undermines my efforts.

My workaround was to also degrade Clock's precision, with an ugly rounding:
> Time := Ada.Calendar.Clock;
> New_Time := Time_Of
>   (Year    => Year (Time),
>    Month   => Month (Time),
>    Day     => Day (Time),
>    Seconds => Day_Duration (Float'Floor (Float (Seconds (Time)))));

But that's not a correct solution either: I have to order lots of file creations, and having all files created during the same second return the same time tag also prevents my algorithm from working properly.

Is there any workaround to get a precise file time tag?
Or to compare a file's time tag with Clock?

Thanks,

--
Lionel  


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-19 22:56 GNAT Modification_Time limitation Lionel Draghi
@ 2018-11-20  0:47 ` Shark8
  2018-11-20  1:33   ` Keith Thompson
  2018-11-20  1:33 ` Keith Thompson
  2018-11-20  8:08 ` briot.emmanuel
  2 siblings, 1 reply; 22+ messages in thread
From: Shark8 @ 2018-11-20  0:47 UTC (permalink / raw)


The problem with using the filesystem timestamp is that its resolution is too coarse compared to the processing-speed of your CPU.

I would recommend either implementing some sort of controlled cache, version control, or 'hacking' the timestamp so that it's really a build number (e.g. build 1 -> 01 Jan 1900, build 2 -> 02 Jan 1900, build 35 -> 04 Feb 1900, etc.).
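
A toy sketch of that mapping in Ada (a sketch only: Ada.Calendar.Time cannot represent year 1900, so 1901 serves as the epoch here, and actually stamping a file with such a time would still need an OS facility like touch or utimensat, which is not shown):

with Ada.Calendar;            use Ada.Calendar;
with Ada.Calendar.Arithmetic; use Ada.Calendar.Arithmetic;

package Build_Stamps is

   --  Build N maps to N days after an arbitrary epoch, so comparing
   --  "timestamps" is the same as comparing build numbers.
   Epoch : constant Time := Time_Of (Year => 1901, Month => 1, Day => 1);

   function To_Stamp (Build : Positive) return Time is
     (Epoch + Day_Count (Build));

   function To_Build (Stamp : Time) return Positive is
     (Positive (Stamp - Epoch));  --  "-" (Time, Time) from Ada.Calendar.Arithmetic

end Build_Stamps;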


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-19 22:56 GNAT Modification_Time limitation Lionel Draghi
  2018-11-20  0:47 ` Shark8
@ 2018-11-20  1:33 ` Keith Thompson
  2018-11-20 23:32   ` Randy Brukardt
  2018-11-20  8:08 ` briot.emmanuel
  2 siblings, 1 reply; 22+ messages in thread
From: Keith Thompson @ 2018-11-20  1:33 UTC (permalink / raw)


Lionel Draghi <lionel.draghi@gmail.com> writes:
> I am coding a kind of make application that depends on files' time
> tags (thanks to Ada.Directories.Modification_Time) and on
> Ada.Calendar.Clock, both returning Ada.Calendar.Time.
>
> Unfortunately, I came across a GNAT limitation in the
> Modification_Time implementation on Linux: sub-second precision is
> ignored, and Modification_Time returns
>> Time_Of (Year, Month, Day, Hour, Minute, Second, 0.0);
>
> So, at the same time Clock returns 2018-10-29 20:36:01.47
> while Modification_Time    returns 2018-10-29 20:36:01.00
>
> This prevents me from knowing whether a file was modified before or
> after a certain time, and thus undermines my efforts.
>
> My workaround was to also degrade Clock's precision, with an ugly rounding:
>> Time := Ada.Calendar.Clock;
>> New_Time := Time_Of
>>   (Year    => Year (Time),
>>    Month   => Month (Time),
>>    Day     => Day (Time),
>>    Seconds => Day_Duration (Float'Floor (Float (Seconds (Time)))));
>
> But that's not a correct solution either: I have to order lots of
> file creations, and having all files created during the same second
> return the same time tag also prevents my algorithm from working
> properly.
>
> Is there any workaround to get a precise file time tag?
> Or to compare a file's time tag with Clock?

It's odd that GNAT's Modification_Time truncates the time to
one-second precision.  A quick experiment on my system (Ubuntu 18.04)
also indicates that it does so, even though the system stores the
timestamp in nanosecond precision.

On Linux 2.6 and later, the underlying stat() system call gives you
a "struct timespec" value for the modification time, as specified
by the current POSIX standard.  (struct timespec represents times
with nanosecond precision.)  A file system isn't required to store
times with that precision, but many do.

If you're on a POSIX system, you should be able to call the stat()
system call and *probably* get a more precise timestamp.

If you're on a non-POSIX system, there might still be a
system-specific way to get a more precise timestamp.  (NTFS also
seems to store timestamps with high precision.)

(And remember that nanosecond precision doesn't necessarily imply
nanosecond accuracy.)
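
For example, something along these lines might work with GNAT on Linux -- a sketch only, assuming GNU coreutils' stat (whose "--format=%.9Y" prints the mtime as seconds.nanoseconds since the epoch) and the GNAT.Expect.Get_Command_Output convenience function; verify both on your system:

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.OS_Lib; use GNAT.OS_Lib;
with GNAT.Expect;

procedure Show_Precise_Mtime is
   Status : aliased Integer;
   Args   : Argument_List :=
     (1 => new String'("--format=%.9Y"),   --  mtime as seconds.nanoseconds (GNU stat)
      2 => new String'("hello.c"));        --  some file of interest
   Output : constant String :=
     GNAT.Expect.Get_Command_Output
       (Command    => "/usr/bin/stat",
        Arguments  => Args,
        Input      => "",
        Status     => Status'Access,
        Err_To_Out => False);
begin
   Put_Line ("mtime = " & Output);         --  e.g. 1540845361.470000000
   for I in Args'Range loop
      Free (Args (I));
   end loop;
end Show_Precise_Mtime;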

-- 
Keith Thompson (The_Other_Keith) kst@mib.org  <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20  0:47 ` Shark8
@ 2018-11-20  1:33   ` Keith Thompson
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Thompson @ 2018-11-20  1:33 UTC (permalink / raw)


Shark8 <onewingedshark@gmail.com> writes:
> The problem with using the filesystem timestamp is that its resolution
> is too coarse compared to the processing-speed of your CPU.

That depends on the filesystem.  See my other followup in this thread.

-- 
Keith Thompson (The_Other_Keith) kst@mib.org  <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-19 22:56 GNAT Modification_Time limitation Lionel Draghi
  2018-11-20  0:47 ` Shark8
  2018-11-20  1:33 ` Keith Thompson
@ 2018-11-20  8:08 ` briot.emmanuel
  2018-11-20 11:57   ` Lionel Draghi
  2018-11-20 23:53   ` Randy Brukardt
  2 siblings, 2 replies; 22+ messages in thread
From: briot.emmanuel @ 2018-11-20  8:08 UTC (permalink / raw)


> I am coding a kind of make application that depends on files' time tags (thanks to Ada.Directories.Modification_Time) and on Ada.Calendar.Clock, both returning Ada.Calendar.Time.


Interesting. I am in the middle of a discussion with AdaCore about gprbuild, which fails to recompile when an alternative body happens to have the same timestamp (to the second). gprbuild sees that the modification time appears to be the same, and thus doesn't recompile.

Two points:
   - AdaCore mentioned they made progress recently on timestamp precision, and it would likely fix this scenario. I think it is similar to what you reported, so your issue has likely been fixed by now.

   - I am arguing with AdaCore that checking timestamps is not enough (it might not even be useful at all), as Shark8 mentioned. The scenario I have is the following:

       Create a project with one scenario variable. Depending on that variable, choose src1 or src2 for the source dirs. In each of these directories, have a file utils.adb with different contents. "touch" these two files so that they have the same timestamp. If you build your application once with one value of the variable, then rebuild with another value, gprbuild does nothing the second time.

I had a similar real case because git created two files with the same timestamp. It then took me days to understand why some of my tests appeared to be linked with both versions of utils.adb, since I could see traces from both src1/utils.adb and src2/utils.adb in the log file.
Very, very confusing.

So I would indeed recommend that you don't bother with timestamps, and only look at file contents (or use timestamp + file path at the very least, or perhaps inodes).

I am interested in hearing more about why you want to code a new 'make-like' tool.

Now trying to persuade AdaCore that gprbuild's behavior is incorrect...

Emmanuel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20  8:08 ` briot.emmanuel
@ 2018-11-20 11:57   ` Lionel Draghi
  2018-11-21  7:40     ` briot.emmanuel
  2018-11-20 23:53   ` Randy Brukardt
  1 sibling, 1 reply; 22+ messages in thread
From: Lionel Draghi @ 2018-11-20 11:57 UTC (permalink / raw)


Thank you guys for your answers:

@Shark8: see the description of my app below; I will try the simple way first :-)

@Keith and Emmanuel: the Time_Of call I quoted in my message comes from the body of Ada.Directories (/opt/GNAT/2018/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/a-direct.adb):

...

Date := File_Time_Stamp (Name);
GM_Split (Date, Year, Month, Day, Hour, Minute, Second);
return Time_Of (Year, Month, Day, Hour, Minute, Second, 0.0);
...

and GM_Split (in the System.OS_Lib package) calls

procedure To_GM_Time
  (P_Time_T : Address;
   P_Year   : Address;
   P_Month  : Address;
   P_Day    : Address;
   P_Hours  : Address;
   P_Mins   : Address;
   P_Secs   : Address);
pragma Import (C, To_GM_Time, "__gnat_to_gm_time");

P_Secs points to an Integer.

So the limitation seems to come from GNAT's C interface to the OS library.


@Keith: my app is (in this first version) using strace, so thanks for the stat idea; I should be able to get the OS time stamp directly from the strace output.

 

@Emmanuel: my make is a POC of a make without a makefile! :-)

It runs a command and observes its file accesses (thanks to the Linux kernel's ptrace interface), and automatically figures out which files the command depends on and which files are outputs.

My first test case is to replace this Makefile:

all: hello

hello.o: hello.c
    gcc -o hello.o -c hello.c

main.o: main.c hello.h
    gcc -o main.o -c main.c

hello: hello.o main.o
    gcc -o hello hello.o main.o

with just :

gcc -o hello.o -c hello.c
gcc -o main.o -c main.c
gcc -o hello hello.o main.o

and to get the same optimized behavior when removing a .o file or touching one of the source files.
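
For the observation side, the core of the POC is little more than spawning each command under strace and parsing the resulting log. A rough sketch (the "-e trace=file" filter and the paths are just what I'd expect a typical Linux strace installation to accept, to be verified):

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.OS_Lib; use GNAT.OS_Lib;

procedure Run_Observed is
   Args : Argument_List :=
     (new String'("-f"),                  --  follow forks (sub-processes)
      new String'("-e"), new String'("trace=file"),
      new String'("-o"), new String'("accesses.log"),
      new String'("gcc"), new String'("-o"), new String'("hello.o"),
      new String'("-c"), new String'("hello.c"));
   OK : Boolean;
begin
   Spawn ("/usr/bin/strace", Args, OK);   --  file accesses end up in accesses.log
   if OK then
      Put_Line ("command ran; parse accesses.log to build the dependency graph");
   else
      Put_Line ("strace or the command failed");
   end if;
   for I in Args'Range loop
      Free (Args (I));
   end loop;
end Run_Observed;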

-- 
-- Lionel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20  1:33 ` Keith Thompson
@ 2018-11-20 23:32   ` Randy Brukardt
  2018-11-21  8:23     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2018-11-20 23:32 UTC (permalink / raw)


"Keith Thompson" <kst-u@mib.org> wrote in message 
news:lnefbgr0rz.fsf@kst-u.example.com...
...
> If you're on a non-POSIX system, there might still be a
> system-specific way to get a more precise timestamp.  (NTFS also
> seems to store timestamps with high precision.)

NTFS has three timestamps (modification, creation, and last access). Only 
the modification time has high precision; the others are only good to full 
seconds (or something like that).

FAT file systems (as you might encounter on a camera or USB stick) only have 
precision to 2 seconds. (Which is why we had to deal with this in the 
Janus/Ada build tools fairly early on.)

Also note that the system clock on Windows systems typically only changes 
every 0.01 sec (Dmitry says this can be changed, although I've never seen 
that done). That extends to the file systems and other OS timers as well. 
Most Ada vendors use an Ada.Calendar.Clock that blends the system clock with 
the high-performance timer to get useful accuracy for Ada.Calendar.Time. 
(A customer/collaborator, Tom Moran, originally wrote that code for the 
Janus/Ada implementation of Calendar to fix some timing problem that he had. 
He eventually submitted similar code to AdaCore, who added it to their 
Calendar as well.)

Moral: Doing "Make" on a modern machine, especially if you want it to be 
portable, is a tricky job.

                                  Randy.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20  8:08 ` briot.emmanuel
  2018-11-20 11:57   ` Lionel Draghi
@ 2018-11-20 23:53   ` Randy Brukardt
  2018-11-21  7:31     ` briot.emmanuel
  1 sibling, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2018-11-20 23:53 UTC (permalink / raw)


<briot.emmanuel@gmail.com> wrote in message 
news:04221674-95d8-4d4a-8743-42877b13eead@googlegroups.com...
...
>I had a similar real case because git created two files with the same
>timestamp. And then it took me days to understand why some of
>my tests appeared to be linked with both versions of utils.adb, since
>I could see in the log file traces from both src1/utils.adb and
>src2/utils.adb. Very very confusing.
>
>So I would indeed recommend that you don't bother with timestamps,
>and only look at file contents (or use timestamp+file path at the very
>least, or perhaps inodes).

I wouldn't claim that the situation is that dire; it seems to be related to 
the particular implementation of a particular GNAT feature (project scenario 
variables). If you're not implementing something where the source code 
location can be changed for a particular build, then timestamps will work 
(but you have to remember that they are quite granular).

It also seems to be related in part to source-based compilation (which 
necessarily keeps less information between builds). In a Janus/Ada project 
(which is very different from a GNAT project -- it's a binary DB-like file 
of compilation information), changing the location of a source file would 
invalidate the entire entry and essentially delete any existing 
compilations. More likely, however, a scenario would be set up using 
separate project files (most likely using Windows batch files/Unix 
shell scripts to automate), so each would have its own set of compilation 
states. And it's completely impossible to bind multiple versions of a unit 
into a single executable; only one or the other could be selected -- and if 
somehow some files were compiled against the wrong one, some or all of the 
compilation timestamps wouldn't match (which would cause a binding failure).

The moral here is that how to implement a Make-like tool depends a lot on 
what capabilities it will have.

                             Randy.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20 23:53   ` Randy Brukardt
@ 2018-11-21  7:31     ` briot.emmanuel
  2018-11-21 14:38       ` Shark8
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: briot.emmanuel @ 2018-11-21  7:31 UTC (permalink / raw)


> I wouldn't claim that the situation is that dire; it seems to be related to 
> the particular implementation of a particular GNAT feature (project scenario 
> variables). If you're not implementing something where the source code 
> location can be changed for a particular build, then timestamps will work 
> (but you have to remember that they are quite granular).

The trick of course is to define what a "build" is in your sentence.
If it is one execution of the builder (gprbuild, make, ...), then I think it is indeed
a reasonable assertion.

If however a build is defined as something that amounts to "in debug mode, in production mode, ...",
then of course it might happen that the sources are changed and the timestamps have a
delta of less than 1s (when we generate code, for instance).

Furthermore, the actual scenario was the following: in the automatic tests, I need to simulate the 
connection to the database, which means I need support for alternate bodies (but I still
compile in debug mode, or production mode, ...). Is that still the same "build"?

I would guess it is, but in the end we would end up with literally dozens of "build" types, each with
its own set of object files, and each taking 20 or 30 minutes to build from scratch. Not realistic
for continuous testing.

I spent some time looking at general builder tools. Most of them seem to
advertise nowadays that they look at file contents, not timestamps. I started from the list
at https://en.wikipedia.org/wiki/List_of_build_automation_software and looked at a few of them.

> It also seems to be related in part to source-based compilation (which
> necessarily keeps less information between builds). In a Janus/Ada project
> (which is very different from a GNAT project -- it's a binary DB-like file
> of compilation information), changing the location of a source file would
> invalidate the entire entry and essentially delete any existing
> compilations. More likely, however, a scenario would be set up using

That's more or less what gprbuild does in practice. It uses a "distributed database"
via the .ALI files, which are found in the object directories, so for best use each
"build" should have a different object directory. And we are again hitting the notion
of "build".

> And it's completely impossible to bind multiple versions of a unit 
> into a single executable; only one or the other could be selected -

That's indeed one of the ways gprbuild could detect the error. To me it is a bug in
gprbuild that it allows linking different files for the same unit into the same executable.

> somehow some files were compiled against the wrong one, some or all of the 
> compilation timestamps wouldn't match (which would cause binding failure).

Timestamps are not reliable enough, especially on modern fast machines. I am pretty
sure you will hit an issue similar to the one I had, one day.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20 11:57   ` Lionel Draghi
@ 2018-11-21  7:40     ` briot.emmanuel
  2018-11-21 11:16       ` briot.emmanuel
  2018-11-21 19:02       ` Lionel Draghi
  0 siblings, 2 replies; 22+ messages in thread
From: briot.emmanuel @ 2018-11-21  7:40 UTC (permalink / raw)


> @Emmanuel: my make is a POC of a make without a makefile! :-)
> It runs a command and observes its file accesses (thanks to the Linux kernel's ptrace interface), and automatically figures out which files the command depends on and which files are outputs.


There was an article earlier this week on reddit about `redo`, which seems to have a similar
idea of top-down compilation: you have a linker script that tells redo it needs a.o, b.o and c.o (then
redo recursively processes those), and finally does the link.
In turn, for a.o you would tell redo it needs a.ads, a.adb and b.ads, and then compile,...

With your idea of using ptrace, that could perhaps be an automatic way to tell redo about the
dependency graph.

I am not sure redo would really be usable on actual projects, though. You have to list the dependencies
for the linker, for instance (I much prefer the gprbuild approach of finding those automatically).
A similar limitation seems to exist in your POC: how do I, as a novice user, know what to compile in
the first place? It seems you would need a combination of what gprbuild does with ptrace:

    - compile (with ptrace) the main unit.
    - gprbuild then uses the ALI file to find the dependencies, and check those recursively.
    - in your case, you would instead look at the ptrace output to find those dependencies.

The ptrace approach would be much more reliable (though Linux-specific), since you would know
for instance:

    - that the compiler searched for and did not find foo.ads in /first/dir
    - that it found and opened /other/dir/foo.ads

so the next time there is a build, you can check first whether foo.ads now exists in /first/dir. If that file
now exists, you need to rebuild.
gprbuild doesn't handle such changes on the system; it only stores what it found.

(This is all an interesting concept I learned this week from `redo`.)

Let us know the result of the experiment !

Emmanuel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-20 23:32   ` Randy Brukardt
@ 2018-11-21  8:23     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry A. Kazakov @ 2018-11-21  8:23 UTC (permalink / raw)


On 2018-11-21 00:32, Randy Brukardt wrote:

> Also note that the system clock on Windows systems typically only changes
> every 0.01 sec (Dmitry says this can be changed, although I've never seen
> that done).

The API call is timeBeginPeriod

 
https://docs.microsoft.com/en-us/windows/desktop/api/timeapi/nf-timeapi-timebeginperiod

The time resolution can be set down to 1 ms (and never call 
timeEndPeriod, despite what the page suggests (:-))
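
From Ada that is just a direct import. A minimal sketch (the Stdcall convention and the -lwinmm linker option are my assumptions for a typical GNAT/Windows setup, not verified here):

with Ada.Text_IO;  use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;

procedure Raise_Timer_Resolution is
   --  MMRESULT timeBeginPeriod(UINT uPeriod), from winmm.dll
   function timeBeginPeriod (uPeriod : unsigned) return unsigned;
   pragma Import (Stdcall, timeBeginPeriod, "timeBeginPeriod");
   pragma Linker_Options ("-lwinmm");

   TIMERR_NOERROR : constant := 0;
begin
   if timeBeginPeriod (1) = TIMERR_NOERROR then   --  request 1 ms resolution
      Put_Line ("timer resolution set to 1 ms (until the process exits)");
   else
      Put_Line ("timeBeginPeriod failed");
   end if;
end Raise_Timer_Resolution;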

> Moral: Doing "Make" on a modern machine, especially if you want it to be
> portable, is a tricky job.

Yes, especially because the OS on a modern machine tends to use the 
worst possible time source available. I guess that some MS-DOS-era code 
still does that job on your i9 ...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21  7:40     ` briot.emmanuel
@ 2018-11-21 11:16       ` briot.emmanuel
  2018-11-21 19:13         ` Lionel Draghi
  2018-11-21 19:02       ` Lionel Draghi
  1 sibling, 1 reply; 22+ messages in thread
From: briot.emmanuel @ 2018-11-21 11:16 UTC (permalink / raw)


> The ptrace approach would be much more reliable (though linux-specific), since you would know
> for instance:
> 
>     - that the compiler searched for and did not find foo.ads in /first/dir
>     - found and opened /other/dir/foo.ads
> 
> so next time there is a build you can check first whether 'foo.ads' now exists in /first/dir. If that file
> now exists, you need to rebuild.
> gprbuild doesn't handle such changes on the system; it only stores what it found.


Slightly off topic (sorry): I found tup (http://gittup.org/tup/index.html), which appears to be doing
exactly what you want to achieve. It monitors file accesses, but it uses a FUSE filesystem for this rather
than ptrace.

I had implemented a FUSE filesystem in Ada at some point, though I no longer have that code. AdaCore was using it to access a database that contains all build+test results for all possible combinations, if I remember right.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21  7:31     ` briot.emmanuel
@ 2018-11-21 14:38       ` Shark8
  2018-11-21 17:32       ` Simon Wright
  2018-11-21 23:34       ` Randy Brukardt
  2 siblings, 0 replies; 22+ messages in thread
From: Shark8 @ 2018-11-21 14:38 UTC (permalink / raw)


On Wednesday, November 21, 2018 at 12:31:13 AM UTC-7, briot.e...@gmail.com wrote:
> 
> I spent some time looking around at general builder tools around. Most of them seem to
> advertise nowadays that they look at file contents, not timestamps. I started from the list
> at https://en.wikipedia.org/wiki/List_of_build_automation_software, and looked at a few of them.

I read that as https://en.wikipedia.org/wiki/List_of_build_abomination_software and had to do a double take.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21  7:31     ` briot.emmanuel
  2018-11-21 14:38       ` Shark8
@ 2018-11-21 17:32       ` Simon Wright
  2018-11-21 17:43         ` briot.emmanuel
  2018-11-21 23:34       ` Randy Brukardt
  2 siblings, 1 reply; 22+ messages in thread
From: Simon Wright @ 2018-11-21 17:32 UTC (permalink / raw)


briot.emmanuel@gmail.com writes:

> That's more or less what gprbuild does in practice. It uses a
> "distributed database" via the .ALI files, which are found in the
> object directories, so for best use each "build" should have a
> different object directory. And we are again hitting the notion of
> "build".

Ideally, each distinct set of scenario variable values should have its
own object directory. That will take a lot of time for the initial
compilations, of course.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21 17:32       ` Simon Wright
@ 2018-11-21 17:43         ` briot.emmanuel
  0 siblings, 0 replies; 22+ messages in thread
From: briot.emmanuel @ 2018-11-21 17:43 UTC (permalink / raw)


> Ideally, each distinct set of scenario variable values should have its
> own object directory. Will take a lot of time for the initial
> compilations, of course.

It's actually more involved than that. We already do the above (and indeed we have 5 or 6 major
scenarios; thankfully we do not compile quite all the possible combinations).

But in the context of tests, we use extending projects to override some of the sources (for instance
so that we do not have to actually have a database running). The test project itself is an extending-all project.

So if you have the simple case:
     a.gpr imports b.gpr imports c.gpr imports d.gpr

and need to substitute a body for a file c.adb in c.gpr, you then extend that project and make a2.gpr an
extending-all project, so we now have:

     a.gpr imports b.gpr imports c.gpr imports d.gpr
      |
     a2.gpr imports b.gpr imports c2.gpr imports d.gpr

The scenario variables have not changed, so b's objects will go in the 'obj-production' directory
as before, for instance. But in fact, some of the object files now depend on that alternate body of
c.adb. If you had some inlined subprograms in c.adb (using -gnatn), then part of their code is
in b.o.

In the common (and optimistic) case where c.adb has a different timestamp from before, b.o
will be recompiled and all is fine.

If c.adb has the same timestamp as the original file (because, hey, git does what it wants),
gprbuild doesn't notice the change in c.adb, so it doesn't recompile b.o, and when we link the
executable we get, in some cases, code from the old c.adb (the inlined code).

This is why just checking the timestamp is not (and cannot be) good enough.

Ideally, we should try to use a different object directory here (though the scenario is the
same), but I don't know how to do that (b.gpr hasn't changed, thanks to the extending-all project).

And if you add, on top of the original 5 scenario variables, another case where you can potentially
mock any number of projects, you end up with way too many combinations of object directories;
my disk would not be big enough, I think.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21  7:40     ` briot.emmanuel
  2018-11-21 11:16       ` briot.emmanuel
@ 2018-11-21 19:02       ` Lionel Draghi
  2018-11-21 19:48         ` Simon Wright
  1 sibling, 1 reply; 22+ messages in thread
From: Lionel Draghi @ 2018-11-21 19:02 UTC (permalink / raw)


On Wednesday, November 21, 2018 at 08:40:08 UTC+1, briot.e...@gmail.com wrote:

...
> With your idea of using ptrace, that could perhaps be an automatic way to tell
> redo about the dependency graph.

Exactly; the idea of the POC is to see how far we can go without any explicit description of the dependency graph, or any build recipes.

...
> A similar limitation seems to exist in your POC: how do I, as a novice user,
> know what to compile in the first place?

That's not in my scope: I am not trying to make compilation easier (I don't pretend to do a better job than gprbuild), just to run a list of commands smartly.
I used a C compilation example because it's the classic make example, but it could be any sequence of commands:
  latex <file>.tex
  dvips <file>.dvi
  ps2pdf <file>.ps
  pdf2eps <pagenumber> <file> 

And gprbuild, or even a complex make, could be one of those commands.


...
> The ptrace approach would be much more reliable (though linux-specific), since you would know
> for instance:
> 
>     - that the compiler searched for and did not find foo.ads in /first/dir
>     - found and opened /other/dir/foo.ads
> 
> so next time there is a build you can check first whether 'foo.ads' now exists in /first/dir. If that file
> now exists, you need to rebuild.

Exactly my intent.
And to build the dependency graph, I need to identify which files are inputs and which are outputs (targets).

To do so, I can either:
1. do a complex analysis of a detailed strace log of each file operation; or
2. just ask strace for the list of involved files, and classify those files by modification time: if a file's modification time > execution time, then it's an output.

The second option seems far less complex, but I need enough precision in the time stamps to tell whether a file is older than the command run time or not.
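
In Ada, option 2 amounts to little more than this (a sketch with hypothetical names; it only works if Modification_Time really carries sub-second precision):

with Ada.Calendar;    use Ada.Calendar;
with Ada.Directories; use Ada.Directories;

--  Classify a file that strace reported as touched by the command:
--  modified at or after the command started => it is an output.
function Is_Output (Path : String; Command_Start : Time) return Boolean is
begin
   return Modification_Time (Path) >= Command_Start;
end Is_Output;
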
Note also that I could store a hash of each file used, to check whether the file is the same without getting into all those time tag problems (I am pretty sure most OSes offer such services).
That would certainly be useful and reliable for deciding whether to re-execute a command, but it wouldn't help classify whether a file was only read or was an output.
So I didn't investigate in that direction.


--
Lionel



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21 11:16       ` briot.emmanuel
@ 2018-11-21 19:13         ` Lionel Draghi
  0 siblings, 0 replies; 22+ messages in thread
From: Lionel Draghi @ 2018-11-21 19:13 UTC (permalink / raw)


On Wednesday, November 21, 2018 at 12:16:12 UTC+1, briot.e...@gmail.com wrote:
...
> Slightly out of topic (sorry): I found tup (http://gittup.org/tup/index.html) which appears to be doing
> exactly what you want to achieve. It monitors file accesses but it uses a fuse filesystem for this, rather
> than ptrace.
> 
Very interesting information, for me at least :-), thank you.

I am not sure the goal is the same.
I see at http://gittup.org/tup/ex_a_first_tupfile.html
a small example of a Tupfile, and it gives both the input and the target with the command:

: hello.c |> gcc hello.c -o hello |> hello

This is what I am trying to avoid! (Not to mention yet another specific format.)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21 19:02       ` Lionel Draghi
@ 2018-11-21 19:48         ` Simon Wright
  2018-11-21 22:14           ` Lionel Draghi
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Wright @ 2018-11-21 19:48 UTC (permalink / raw)


Lionel Draghi <lionel.draghi@gmail.com> writes:

> And to build the dependency graph, I need to identify which file is an
> input file, and which one is an output (a target).
>
> To do so, I can either:
> 1. make a complex analysis of a detailed strace log file on each file
> operation;
> 2. just ask strace for the list of involved files, and classify those
> files by modification time: if a file's modification time >
> execution time, then it's an output.

Can't you tell from strace which files were opened for read and which
for write?

I suppose there are some files that are opened read/write; either,
perhaps most usually, in separate parts of the build, or by being
updated in one.

I have one project (tcladashell) which runs a Tcl script to generate a C
source file, which is compiled, built, and run to generate an Ada package
spec, which is then used in the rest of the build.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21 19:48         ` Simon Wright
@ 2018-11-21 22:14           ` Lionel Draghi
  0 siblings, 0 replies; 22+ messages in thread
From: Lionel Draghi @ 2018-11-21 22:14 UTC (permalink / raw)


On Wednesday, November 21, 2018 at 20:48:40 UTC+1, Simon Wright wrote:
...
> Can't you tell from strace which files were opened for read and which
> for write?

Yes, strace can monitor every call to the system API, with parameters.
I thought the time tag way was easier, but it may be time to change my mind :-)

> I suppose there are some files that are opened read/write; either,
> perhaps most usually, in separate parts of the build, or by being
> updated in one.

strace -f also monitors sub-processes (-f stands for "follow forks").
That's handy.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21  7:31     ` briot.emmanuel
  2018-11-21 14:38       ` Shark8
  2018-11-21 17:32       ` Simon Wright
@ 2018-11-21 23:34       ` Randy Brukardt
  2018-11-22  8:15         ` briot.emmanuel
  2 siblings, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2018-11-21 23:34 UTC (permalink / raw)


<briot.emmanuel@gmail.com> wrote in message 
news:62ffa1fb-6733-4f97-ba87-ae3103bfc877@googlegroups.com...
>> I wouldn't claim that the situation is that dire; it seems to be related 
>> to
>> the particular implementation of a particular GNAT feature (project 
>> scenario
>> variables). If you're not implementing something where the source code
>> location can be changed for a particular build, then timestamps will work
>> (but you have to remember that they are quite granular).
>
> The trick of course is to define what a "build" is in your sentence.
> If it is one execution of the builder (gprbuild, make,...) then I think it 
> is indeed
> a reasonable assertion.
>
> If however a build is defined as something that amounts to "in debug mode,
> in production mode, ..." then of course it might happen that the sources are
> changed and the timestamps have a delta of less than 1s (when we generate
> code, for instance).

I'd argue that these are something else on top of individual builds, and 
that it is a mistake to try to combine basic building with those 
higher-level configuration management things.

I've struggled with those higher level issues almost since the beginning of 
RR Software (we've almost always supported multiple targets for Janus/Ada). 
Neither conventional build tools nor configuration management tools are any 
help whatsoever for managing those situations. I've seen various attempts to 
do so, but none of them address the underlying issues very well.

> Furthermore, the actual scenario was the following: in the automatic 
> tests, I need to simulate the
> connection to the database, so that means I need to have support for 
> alternate bodies (but I still
> compile in debug mode, or production mode,...). Is that still the same 
> "build" ?

No, at least four separate builds.

> I would guess it is, but in the end we would end up with literally dozens 
> of "build" types, each with
> its own set of object files, and each taking 20 or 30 minutes to build 
> from scratch. Not realistic
> for continuous testing.

You need a faster compiler. :-) :-)

Seriously, at least debug vs. production has to be built over from scratch 
(at least the way I typically use those). The debug version uses different 
compiler options than the production version, as debug symbols need to be 
generated and some optimizations need to be turned off, while the production 
version turns off various Ada checking. So to switch from one to the other 
requires a full rebuild anyway.

In recent years, I've avoided that problem by keeping multiple projects for 
debug and production and various targets, and thus (re)building each 
individually as needed. Disk space is plentiful on modern machines -- it's 
my time that's limited.

> I spent some time looking around at general builder tools around. Most of 
> them seem to
> advertise nowadays that they look at file contents, not timestamps. I 
> started from the list
> at https://en.wikipedia.org/wiki/List_of_build_automation_software, and 
> looked at a few of them.

For source code, I tend to agree. The Janus/Ada COrder tool always had an 
option to read the source files instead of depending on timestamps. And 
Janus/Ada puts the timestamps into the compilation results, so that they 
can't be clobbered by file operations. In any case, source code is only part 
of the picture.

>> It also seems to be related in part to source-based compilation (which
>> necessarily keeps less information between builds). In a Janus/Ada project
>> (which is very different from a GNAT project -- it's a binary DB-like file
>> of compilation information), changing the location of a source file would
>> invalidate the entire entry and essentially delete any existing
>> compilations. More likely, however, a scenario would be set up using
>> separate project files (most likely using Windows batch files/Unix
>> shell scripts to automate), so each would have its own set of compilation
>> states.
>
> That's more or less what gprbuild does in practice. It uses a "distributed
> database" via the .ALI files, which are found in the object directories, so
> for best use each "build" should have a different object directory. And we
> are again hitting the notion of "build".

Precisely. Higher-level things than raw builds are best kept separate at the 
compilation artifact level.

>> And it's completely impossible to bind multiple versions of a unit
>> into a single executable; only one or the other could be selected -
>
> That's indeed one of the ways gprbuild could detect the error. To me it is 
> a bug in
> gprbuild that it allows linking different files for the same unit into the 
> same executable.

I believe that is a result of the way GNAT compiles files -- the package 
specifications are never materialized, so it would be hard for it to have 
any compilation result which could tell which one is used. I've seen this 
sort of effect working on ACATS tests, and I've never had any reason to use 
GPRBuild for that.

>> somehow some files were compiled against the wrong one, some or all of 
>> the
>> compilation timestamps wouldn't match (which would cause binding 
>> failure).
>
> Timestamps are not reliable enough, especially on modern fast machines. I am
> pretty sure you will hit an issue similar to the one I had, one day.

I'm sorry, I confused you here. I was talking about the timestamps that 
Janus/Ada records for compilation units when they are compiled. These are 
internal to the SYM files (which are a representation of the Ada symbol table 
for a library unit), and are used to determine which version is "with"ed in 
other files. They're only compared for equality (other than for the purposes 
of error messages). Even on a FAT system, these have 2-second granularity. 
You could only have a problem if the same specification is recompiled twice 
in under 2 seconds.

It's hard to imagine a build taking less than two seconds; certainly not if 
a human is involved, and very unlikely even if automated. (The Janus/Ada 
binder is fairly slow as it removes unreachable subprograms recursively -- 
that's required for Windows programs because of the presence of bindings for 
a variety of Windows versions -- and as such it takes multiple seconds for all 
but the most trivial programs.) On more modern systems, we're talking 
hundredths-of-seconds granularity; it's essentially impossible for multiple 
builds to happen that fast.

COrder (the Janus/Ada compilation-order tool that's at the heart of any 
build) has an old /T option that uses file timestamps, but it has not been 
recommended for a while. It's faster than the /I option, which inspects the 
internal timestamps and the source code ('cause it doesn't have to open 
hundreds of files and read part of them), but it messes up so often that it is 
not recommended anymore. (One nice side effect of /I is that one can simply 
delete all of the SYM files to force a rebuild of everything; /T doesn't 
always rebuild everything in that case.)

In any case, timestamps have their place, but they have to be used 
carefully.

                         Randy.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-21 23:34       ` Randy Brukardt
@ 2018-11-22  8:15         ` briot.emmanuel
  2018-11-26 23:45           ` Randy Brukardt
  0 siblings, 1 reply; 22+ messages in thread
From: briot.emmanuel @ 2018-11-22  8:15 UTC (permalink / raw)


> I'd argue that these are something else on top of individual builds. And 
> that it is a mistake trying to combine basic building with those 
> higher-level configuration management things.

I'm not sure about the concepts you describe. For me, "basic building"
is one run of the compiler on one specific file, which always recompiles,
no questions asked.
What we are talking about in this thread are tools at the level of make and
gprbuild, which decide what should be compiled, and when. For me, this is
the "higher-level build management" part. That includes configuration
management, since this tool is also responsible for deciding where the
build artifacts (object files, executables, ...) should be stored.

> > its own set of object files, and each taking 20 or 30 minutes to build 
> > from scratch. Not realistic for continuous testing.
> 
> You need a faster compiler. :-) :-)

I wish I had one. Still, this is compiling 6000 Ada units + C files in
20 minutes; not too bad (using parallel builds, of course).

> Seriously, at least debug vs. production has to be built over from scratch 
> (at least the way I typically use those). 

Definitely.
And I agree with your conclusion that we need separate builds for every
possible combination of environment/switches (debug, production,...)
and source files (alternate bodies,...).
Disk space is cheaper than our time, though fast SSDs are still not quite
as cheap as we would all like.

> It's hard to imagine a build taking less than two seconds; certainly not if 
> a human is involved, and very unlikely even if automated.

Don't forget that Lionel's make-like tool and gprbuild are both meant to be
language-neutral. Compiling an Ada file in less than 2s is rare nowadays (but
possible with very simple files). But compiling a Python file takes a few ms,
so just looking at timestamps cannot be enough (though when compilation
is that fast, it doesn't matter much to redo it more often...).

> (The Janus/Ada 
> binder is fairly slow as it removes unreachable subprograms recursively --  

Nice feature. With gcc we use link-time optimization to achieve the same
effect (and more), and that's slow indeed.

> COrder (the Janus/Ada compilation order tool that's at the heart of any 
> builds) has an old /T option that uses file timestamps, but it has not been 
> recommended for a while. It's faster than the /I option that inspects the 
> internal timestamps and the source code ('cause it doesn't have to open 
> hundreds of files and read part of them), but it messes up so often it is 
> not recommended anymore. (One nice side-effect of /I is that one can simply 
> delete all of the SYM files to force a rebuild of everything; /T doesn't 
> always rebuild everything in that case.)

One interesting feature of the tup builder I mentioned yesterday is that it comes
with an optional daemon program that monitors changes on the
file system (inotify on Linux), so that when you start the build it already
knows which files have been modified and can start building right away.
Saving 10s or more every time I launch gprbuild would be nice!

> In any case, timestamps have their place, but they have to be used 
> carefully.

Seconded.
Timestamps can be used as a shortcut: the builder can have a mode that 
says "assume the file was modified if the timestamp has changed,
but if the timestamp is the same, check the contents".
And then a "minimal recompilation" switch that says "only look at
file contents to detect whether a file has changed".
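
A sketch of that policy (assuming GNAT.SHA1 for the content check; the previous stamp and hash would come from the builder's own database, here they are just parameters):

with Ada.Calendar;    use Ada.Calendar;
with Ada.Directories; use Ada.Directories;
with Ada.Direct_IO;
with GNAT.SHA1;       use GNAT.SHA1;

function Needs_Rebuild
  (Path      : String;
   Old_Stamp : Time;
   Old_Hash  : Message_Digest) return Boolean
is
   --  Hash the whole file in one read; fine for source-sized files.
   function Hash_Of (Name : String) return Message_Digest is
      Size : constant Natural := Natural (Ada.Directories.Size (Name));
      subtype Contents is String (1 .. Size);
      package Whole_File_IO is new Ada.Direct_IO (Contents);
      use Whole_File_IO;
      F    : File_Type;
      Data : Contents;
   begin
      Open (F, In_File, Name);
      Read (F, Data);
      Close (F);
      return Digest (Data);
   end Hash_Of;
begin
   if Modification_Time (Path) /= Old_Stamp then
      return True;                         --  timestamp changed: assume modified
   end if;
   return Hash_Of (Path) /= Old_Hash;      --  same timestamp: compare contents
end Needs_Rebuild;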

Emmanuel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: GNAT Modification_Time limitation
  2018-11-22  8:15         ` briot.emmanuel
@ 2018-11-26 23:45           ` Randy Brukardt
  0 siblings, 0 replies; 22+ messages in thread
From: Randy Brukardt @ 2018-11-26 23:45 UTC (permalink / raw)


<briot.emmanuel@gmail.com> wrote in message 
news:0d4679d2-30d7-493f-b9fd-688d044e1a4e@googlegroups.com...
>> I'd argue that these are something else on top of individual builds. And
>> that it is a mistake trying to combine basic building with those
>> higher-level configuration management things.
>
> I'm not sure about the concepts you describe. For me, "basic building"
> is one run of the compiler on one specific file, which always recompiles,
> no questions asked.
> What we are talking about in this thread are tools at the level of make and
> gprbuild, which decide what should be compiled, and when. For me, this is
> the "higher-level build management" part. That includes configuration
> management, since this tool is also responsible for deciding where the
> build artifacts (object files, executables, ...) should be stored.

I think of a single compilation as a "compilation", while a "build" to me is 
something that results in one or more executable files, generally based on 
the source code found in a single directory. Going further (and we have some 
such features, particularly for sharing source/object between multiple 
distinct builds) is part of the higher-level management. (I don't have a 
simple name for that, which shows yet again how hard it is to describe.)

Ada of course allows a "build" to be completely automated without any outside 
intervention at all. In theory, it's only necessary to point the build tool 
at the pile of source code.

...
>> It's hard to imagine a build taking less than two seconds; certainly not 
>> if
>> a human is involved, and very unlikely even if automated.
>
> Don't forget that Lionel's make-like tool and gprbuild are both meant to be
> language-neutral. Compiling an Ada file in less than 2s is rare nowadays
> (but possible with very simple files). But compiling a Python file takes a
> few ms, so just looking at timestamps cannot be enough (though when
> compilation is that fast, it doesn't matter much to redo it more often...).

Again, a "build" in my view is compiling the set of files needed to create 
an executable. (Again, I'll ignore the management of shared libraries.) That 
generally requires the compilation of multiple files, and a linking phase as 
well. Moreover, unless you are running multiple builds from some 
higher-level tool, there's also human reaction time involved. The likelihood 
of that happening faster than 2 seconds isn't high. The issues I've seen 
almost always come from someone terminating a compilation in the middle 
without letting the compiler clean up any half-created artifacts.

Of course, most other languages need a lot of help to determine dependencies 
(information that is directly part of the Ada source code). That need for 
help has confused the issues a lot, because however you give it, it can't be 
automatic or bullet-proof. Thus, this gets mixed up with the higher-level 
issues. Ada only needs that help at a higher level than basic building; 
basic building should be automatic.

I've even had a customer (with large, complex systems) tell me that they 
didn't want the Ada compiler to even try to manage such things. They wanted 
to grab some set of sources from version control and essentially have the 
compiler build it from that source (all found in one large glob in a single 
directory). They thought that build times were short enough that it wasn't 
worth the intermediate steps to avoid recompilations. I've rather thought 
that this is the future of such tools: some higher-level management (probably 
from the configuration management system) where whatever the compiler does 
would just get in the way.

                                        Randy.



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-11-26 23:45 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19 22:56 GNAT Modification_Time limitation Lionel Draghi
2018-11-20  0:47 ` Shark8
2018-11-20  1:33   ` Keith Thompson
2018-11-20  1:33 ` Keith Thompson
2018-11-20 23:32   ` Randy Brukardt
2018-11-21  8:23     ` Dmitry A. Kazakov
2018-11-20  8:08 ` briot.emmanuel
2018-11-20 11:57   ` Lionel Draghi
2018-11-21  7:40     ` briot.emmanuel
2018-11-21 11:16       ` briot.emmanuel
2018-11-21 19:13         ` Lionel Draghi
2018-11-21 19:02       ` Lionel Draghi
2018-11-21 19:48         ` Simon Wright
2018-11-21 22:14           ` Lionel Draghi
2018-11-20 23:53   ` Randy Brukardt
2018-11-21  7:31     ` briot.emmanuel
2018-11-21 14:38       ` Shark8
2018-11-21 17:32       ` Simon Wright
2018-11-21 17:43         ` briot.emmanuel
2018-11-21 23:34       ` Randy Brukardt
2018-11-22  8:15         ` briot.emmanuel
2018-11-26 23:45           ` Randy Brukardt
