Re: Thick bindings to a C library and gnattest: suggestions?

comp.lang.ada

From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Thick bindings to a C library and gnattest: suggestions?
Date: Mon, 1 Jul 2013 14:32:27 +0200
Message-ID: <5m9o5ouj1e2i.1h3w3i0aa3938$.dlg@40tude.net>
In-Reply-To: 40bf5a31-b09a-4106-a57a-7ac3dd5f951e@googlegroups.com

On Mon, 1 Jul 2013 04:11:55 -0700 (PDT), Maurizio Tomasi wrote:

>>> First question: the vectors used by the CFITSIO library are sometimes
>>> huge (millions of elements), sometimes very small (~ 10 elements).
>>> I decided to always allocate them on the heap, using declarations like
>>> these:
>> 
>> Why should bindings care about that?
> 
> Shouldn't they care? Perhaps I am missing something regarding the Ada
> language, in fact C++ does not care about whether an array is allocated on
> the stack or on the heap. But if I declare a function like the following:
> 
> function Read_Double_Array_From_FITS (File_Name : String) return Double_Array;
> 
> and I implement it in the following way:
> 
> --  Call the C functions that returns the number of rows in the file
> Num_Of_Elements := Get_Number_Of_Elements(FITS_File);
> declare
>   Vector_To_Return : Double_Array (1 .. Num_Of_Elements);
> begin
>   --  Call the C function that fills Vector_To_Return
>   ...
>   return Vector_To_Return;
> end;
> 
> then every time I read some huge array I will get a STORAGE_ERROR near the
> declaration of "Vector_To_Return". This is the reason why I declared the
> return type of "Read_Double_Array_From_FITS" as a Double_Array_Ptr instead of a
> Double_Array, and implemented it as:
> 
> Num_Of_Elements := Get_Number_Of_Elements(FITS_File);
> declare
>   Vector_To_Return : Double_Array_Ptr;
> begin
>   Vector_To_Return := new Double_Array (1 .. Num_Of_Elements);
>   --  Call the C function that fills Vector_To_Return
>   ...
>   return Vector_To_Return;
> end;
> 
> Your answer makes me think there is a smarter way of avoiding the
> distinction between Double_Array_Ptr and Double_Array. (This would in fact
> make Ada much more like C++, where this distinction does not exist.) Is
> this the reason why in your example you declared Double_Array an array of
> *aliased* doubles? Or was your point just to use the address of its first member?

The first point is that it is not the objective of bindings to manage
memory. Of course, there could be bindings which do that, in which case you
would allocate objects transparently to the caller and have some garbage
collection schema behind opaque handles to the objects. This is a possible
design but it is not what you probably wanted. So let us take for granted
that it is the client's responsibility to allocate objects. In this case
the bindings shall work for any kind of objects allocated in any possible
memory pool, stack included.

Now, how would you do that? There are many ways.

1. Prior to Ada 2005, the usual method was the one you find in
Ada.Text_IO.Get_Line. You use a procedure and a parameter telling how many
elements were written:

   procedure Read_Double_Array_From_FITS
       (A : in out Double_Array; Last : out Natural);

Here Last indicates the last element of A containing data. (It is Natural
rather than Positive so that A'First - 1 can be returned when nothing was
read.) The implementation would raise End_Error when there are more than
A'Length elements to read. Alternatively, as Get_Line does, it can return
Last = A'Last to signal that there may be more to read. The caller uses
A (A'First..Last) in further calls.

In my libraries I am using a slightly more universal approach:

   procedure Read_Double_Array_From_FITS
       (A : in out Double_Array; Pointer : in out Positive);

Here A (Pointer..A'Last) is where the result is stored and then Pointer is
advanced to the first element following the input. So the result is between
old pointer and new pointer - 1.
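
A minimal sketch of the Pointer variant, with a stub in place of the real
CFITSIO call. Read_Doubles, Count and Scratch_Demo are made-up names for
illustration, and the stub only writes dummy values; the point is that the
caller owns the buffer and can reuse it across calls:

   with Ada.IO_Exceptions;
   with Ada.Text_IO;
   with Interfaces.C;

   procedure Scratch_Demo is
      type Double_Array is array (Positive range <>)
         of aliased Interfaces.C.double;

      --  Hypothetical stand-in for the reader: stores Count values into
      --  A (Pointer .. Pointer + Count - 1) and advances Pointer.
      procedure Read_Doubles
         (A       : in out Double_Array;
          Pointer : in out Positive;
          Count   : Natural) is
      begin
         if Count > A'Last - Pointer + 1 then
            raise Ada.IO_Exceptions.End_Error;  --  no room left in A
         end if;
         for I in Pointer .. Pointer + Count - 1 loop
            A (I) := Interfaces.C.double (I);   --  dummy data
         end loop;
         Pointer := Pointer + Count;
      end Read_Doubles;

      Buffer  : Double_Array (1 .. 8);  --  on the stack here, caller's choice
      Pointer : Positive := Buffer'First;
   begin
      Read_Doubles (Buffer, Pointer, 3);
      Read_Doubles (Buffer, Pointer, 2);
      --  The data read so far is Buffer (Buffer'First .. Pointer - 1)
      Ada.Text_IO.Put_Line
         ("Elements read:" & Integer'Image (Pointer - Buffer'First));
   end Scratch_Demo;

The same procedure works unchanged if Buffer is allocated with new or
taken from a custom pool.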

2. With Ada 2005 you can use an extended return statement:

   function Read_Double_Array_From_FITS return Double_Array is
   begin
       return Result : Double_Array (1..Get_Number_Of_Elements) do
           -- Fill Result here
       end return;
   end Read_Double_Array_From_FITS;

The caller is free to use this function with the allocator new:

   A : access Double_Array :=
           new Double_Array'(Read_Double_Array_From_FITS);

Theoretically the compiler could optimize the temporary object away. (You
should check whether GNAT really does this.)
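
A minimal sketch of how the caller picks the storage when the function
returns an unconstrained array. Read_Doubles, Count and Return_Demo are
made-up names and the body fills in dummy values instead of calling
CFITSIO. Whether the heap case avoids an intermediate copy is up to the
compiler, so treat it as something to measure rather than assume:

   with Ada.Text_IO;
   with Ada.Unchecked_Deallocation;
   with Interfaces.C;

   procedure Return_Demo is
      type Double_Array is array (Positive range <>)
         of aliased Interfaces.C.double;
      type Double_Array_Ptr is access Double_Array;

      procedure Free is
         new Ada.Unchecked_Deallocation (Double_Array, Double_Array_Ptr);

      --  Hypothetical stand-in for the FITS reader
      function Read_Doubles (Count : Natural) return Double_Array is
      begin
         return Result : Double_Array (1 .. Count) do
            for I in Result'Range loop
               Result (I) := Interfaces.C.double (I);  --  dummy data
            end loop;
         end return;
      end Read_Doubles;

      --  Small result: an ordinary stack object
      Small : constant Double_Array := Read_Doubles (10);

      --  Large result: the caller, not the function, decides to use the heap
      Large : Double_Array_Ptr := new Double_Array'(Read_Doubles (100_000));
   begin
      Ada.Text_IO.Put_Line
         ("Total length:" & Integer'Image (Small'Length + Large'Length));
      Free (Large);
   end Return_Demo;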

When dealing with huge arrays I would prefer approach #1. I would probably
allocate some scratch buffer and reuse it over and over again.

Another approach is using #1 or #2 with some custom storage pool organized
as a stack or arena in order to minimize memory management overhead.

In any case, it is not the bindings' business.

> The fact that Ada arrays can have arbitrary bounds, which they carry
> with them, is one of the things that got me interested in Ada at the
> beginning. Why did you say this might be "troublesome"?

Because C arrays have none. When you want to pass an Ada array to C you
must flatten it. One way is to declare a subtype:

   procedure Bar (A : Double_Array) is
      subtype Flat is Double_Array (A'Range);
      B : Flat := ...;
   begin
      --  B does not have bounds and can be passed around as-is

Some pass a pointer to the first element. After all, C's arrays are a
fiction.

Some use addresses. E.g. the GtkAda bindings pass System.Address for any C
object, sparing themselves the headache of proper types. Purists would
consider this approach rather sloppy.

> Given this context, is your suggestion of using Interfaces.C.Pointers still valid?

Yes.

>> Why don't you simply pass the array down to the C subprogram? You can do
>> something like:
>> 
>>    type Double_Array is array (Positive range <>)
>>       of aliased Interfaces.C.double;
>>    pragma Convention (C, Double_Array);
>>    procedure Foo (A : in out Double_Array);
>> 
>> Implementation:
>> 
>>    type Double_Ptr is access all Interfaces.C.double;
>>    pragma Convention (C, Double_Ptr);
>> 
>>    procedure Foo (A : in out Double_Array) is
>>    --
>>    -- Assuming foo's signature in C:
>>    --
>>    --    foo (double * a, unsigned n);
>>    --
>>       procedure Internal (A : Double_Ptr; N : Interfaces.C.unsigned);
>>       pragma Import (C, Internal, "foo");
>>    begin
>>       --  A is "in out" so that its components are variables, and
>>       --  Unchecked_Access is needed because Double_Ptr is declared
>>       --  at an outer level and would fail the accessibility check
>>       Internal (A (A'First)'Unchecked_Access, A'Length);
>>    end Foo;
> 
> But what if A'Length is so large that the array does not fit into the
> stack?

It is a client's problem. 

>>> I am sure there is some clever way to solve these two minor points,
>>> but so far I have not been able to find it. I tried e.g. to put
>>> "-lcfitsio" in the project file of the AdaFITS library, but with no
>>> success.
>> 
>> Make a library project file for cfitsio instead. "with" it from your
>> project. GNAT knows how to handle it and will add appropriate linker
>> switches to any project using it directly or indirectly. A library project
>> file could look like:
>> 
>> project cfitsio is
>>    for Externally_Built use "true"; -- Do not bother to compile me
>>    for Source_Files use (); -- No sources
>>    for Library_Dir use ".";   -- Where .lib, .a, .dll, .so are
>>    for Library_Name use "cfitsio"; -- Without "lib" prefix!
>>    for Library_Kind use "dynamic"; -- A DLL
>> end cfitsio;
> 
> This is really a good idea, I did not think about this! There are only two problems with this approach:
> 
> 1. The CFITSIO library is often compiled using ad-hoc flags in
> supercomputing facilities, in order to make it work better with the
> storage systems. I need to use the library provided by the system, not my
> own.

This is why Externally_Built is set true.

> 2. ...the system library is not always available as a dynamic .so file: in
> some cases I must statically link CFITSIO (the libcfitsio.so library is
> not available on every node of the cluster: when I discovered this, the
> system admin told me to link statically).

Then set Library_Kind to "static". You can even have a scenario variable to
select whether to link it statically or dynamically.
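
A sketch of what such a project file could look like. The external names
CFITSIO_KIND and CFITSIO_LIB_DIR and the default directory are assumptions
chosen for illustration; adjust them to the installation at hand:

   project Cfitsio is
      --  Scenario variable: how the system-provided library is linked
      type Kind_Type is ("static", "dynamic");
      Kind : Kind_Type := external ("CFITSIO_KIND", "static");

      for Externally_Built use "true";  -- Built by the system, not by us
      for Source_Files use ();          -- No sources
      for Library_Dir use external ("CFITSIO_LIB_DIR", "/usr/lib");
      for Library_Name use "cfitsio";   -- Without "lib" prefix
      for Library_Kind use Kind;
   end Cfitsio;

The variable is then set on the command line, e.g. with a hypothetical
client project adafits.gpr:

   gprbuild -P adafits.gpr -XCFITSIO_KIND=dynamic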

> Is there no other option?

There are other options, but they are incredibly intrusive, especially when
your bindings are themselves a library. Linker options are ignored for
library projects. They are not "transitive". I.e. each client project will
need to specify ever changing linker switches.

There is also a pragma for specifying linker switches in the source code
(pragma Linker_Options), which is obviously the worst maintenance nightmare
one could ever imagine...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
