comp.lang.ada
* Representation clauses for base-64 encoding
@ 2011-12-22  9:41 Natasha Kerensikova
  2011-12-22 11:20 ` Niklas Holsti
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Natasha Kerensikova @ 2011-12-22  9:41 UTC (permalink / raw)


Hello,

the recent discussion about representation clauses vs explicit shifting
made me wonder about what is the Right Way of performing base-64
encoding (rfc 1421).

My first thoughts were along the following lines:

   type Octet is mod 256;
      --  or Character or Storage_Element or Stream_Element
      --  or whatever 8-bit type relevant for the application

   for Octet'Size use 8;
   for Octet'Component_Size use 8;
   for Octet'Bit_Order use System.Low_Order_First;

   type Base_64_Digit is mod 64;

   for Base_64_Digit'Size use 6;
   for Base_64_Digit'Component_Size use 6;
   for Base_64_Digit'Bit_Order use System.Low_Order_First;

   type Octet_Block is array (1 .. 3) of Octet;
   pragma Pack (Octet_Block);

   type Base_64_Block is array (1 .. 4) of Base_64_Digit;
   pragma Pack (Base_64_Block);

   function Split_Base_64 is new Ada.Unchecked_Conversion
     (Source => Octet_Block, Target => Base_64_Block);

   function Merge_Base_64 is new Ada.Unchecked_Conversion
     (Source => Base_64_Block, Target => Octet_Block);


However, if I understand 13.3(73) correctly, conforming compilers don't
have to support such arrays (unless 6 and 8 are both factors or multiples
of the word size, but I guess there are not many 2-bit or 24-bit platforms
around).

It seems a more portable but uglier way of doing it is using records
instead of arrays:

   type Octet_Block is record
      P, Q, R : Octet;
   end record;

   for Octet_Block use record
      P at 0 range 0 .. 7;
      Q at 0 range 8 .. 15;
      R at 0 range 16 .. 23;
   end record;

   type Base_64_Block is record
      A, B, C, D : Base_64_Digit;
   end record;

   for Base_64_Block use record
      A at 0 range 0 .. 5;
      B at 0 range 6 .. 11;
      C at 0 range 12 .. 17;
      D at 0 range 18 .. 23;
   end record;

Though I guess it might not work so well on 16-bit platforms.

So is there a better way of doing it? Is it acceptable to handle
portability with different bodies for a spec that only contains the
Split_Base_64 and Merge_Base_64 functions?

Or are there some things I'm missing that make even that non-portable or
even incorrect?


Thanks in advance for sharing your wisdom,
Natasha




* Re: Representation clauses for base-64 encoding
  2011-12-22  9:41 Representation clauses for base-64 encoding Natasha Kerensikova
@ 2011-12-22 11:20 ` Niklas Holsti
  2011-12-23  1:30   ` Randy Brukardt
  2011-12-22 11:37 ` Georg Bauhaus
  2011-12-23  1:33 ` Randy Brukardt
  2 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-22 11:20 UTC (permalink / raw)


On 11-12-22 11:41 , Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).
>
> My first thoughts were along the following lines:
>
>     type Octet is mod 256;
>        --  or Character or Storage_Element or Stream_Element
>        --  or whatever 8-bit type relevant for the application
>
>     for Octet'Size use 8;
>     for Octet'Component_Size use 8;
>     for Octet'Bit_Order use System.Low_Order_First;

The compiler should reject that Bit_Order clause, because Octet is not a 
record type (RM 13.5.3(4)).

What did you want to achieve with that clause?

>
>     type Base_64_Digit is mod 64;
>
>     for Base_64_Digit'Size use 6;
>     for Base_64_Digit'Component_Size use 6;
>     for Base_64_Digit'Bit_Order use System.Low_Order_First;

Same comment and question as above, for Octet.

>
>     type Octet_Block is array (1 .. 3) of Octet;
>     pragma Pack (Octet_Block);

I would add the following, to check that packing is effective:

       for Octet_Block'Size use 24;

>
>     type Base_64_Block is array (1 .. 4) of Base_64_Digit;
>     pragma Pack (Base_64_Block);

Same comment as for Octet_Block.

>     function Split_Base_64 is new Ada.Unchecked_Conversion
>       (Source =>  Octet_Block, Target =>  Base_64_Block);
>
>     function Merge_Base_64 is new Ada.Unchecked_Conversion
>       (Source =>  Base_64_Block, Target =>  Octet_Block);
>
>
> However, if I understand 13.3(73) correctly, conforming compilers don't
> have to support such arrays (unless 6 and 8 are both factors or multiples
> of the word size, but I guess there are not many 2-bit or 24-bit platforms
> around).

Right (I assume you meant *12*-bit or 24-bit).

>
> It seems a more portable but uglier way of doing it is using records
> instead of arrays:
>
>     type Octet_Block is record
>        P, Q, R : Octet;
>     end record;

Here you might want to specify Octet_Block'Bit_Order.

>
>     for Octet_Block use record
>        P at 0 range 0 .. 7;
>        Q at 0 range 8 .. 15;
>        R at 0 range 16 .. 23;
>     end record;
>
>     type Base_64_Block is record
>        A, B, C, D : Base_64_Digit;
>     end record;

Ditto Base_64_Block'Bit_Order.

>
>     for Base_64_Block use record
>        A at 0 range 0 .. 5;
>        B at 0 range 6 .. 11;
>        C at 0 range 12 .. 17;
>        D at 0 range 18 .. 23;
>     end record;
>
> Though I guess it might not work so well on 16-bit platforms.

Maybe. It depends on the default bit-ordering and on the size of the 
"largest machine scalar", whatever that is -- that depends on what the 
compiler writer considers "convenient and efficient" (RM 13.3(8.1/2)).

>
> So is there a better way of doing it?

Do you expect that all the octet-strings to be encoded have a number of 
octets that is a multiple of 3, and conversely that all the base-64 
strings to be decoded have a length that is a multiple of 4? If not, I 
think that using 24-bit encoding/decoding buffers as in your example can 
be cumbersome, in addition to the portability problems.

An alternative is to make the array types Octet_Block and Base_64_Block 
long enough to hold the longest possible input/output strings (but still 
be definite), specify their Component_Sizes as 8 and 6 bits (hoping that 
the compiler accepts this), and apply Unchecked_Conversions on the 
entire arrays. But I would be afraid of problems at the ends of strings 
that only partially fill the last word.

For these reasons, I would definitely choose a shifting method.

I would use an Interfaces.Unsigned_16 or _32 as a buffer that contains 
some number of bits in its least-significant end. Initially the buffer 
is empty (zero bits) and cleared to all bits zero.

To encode a string of 8-bit groups into a string of 6-bit groups 
(omitting the "padding" digits that base-64 sometimes requires):

    while (there are 8-bit groups left) loop
       -- Invariant: buffer contains less than 6 bits.

       insert the next 8-bit group into the buffer by
       left-shifting the buffer for 8 positions and
       or'ing in the next 8-bit group;

       while (the buffer contains at least 6 bits) loop
          output the most significant 6 bits of the buffer
          and remove them from the buffer;
       end loop;

    end loop;

    output any bits (at most 5) left over in the buffer;

Similar code can be used to decode a string of 6-bit groups into a 
string of 8-bit groups.
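
As a rough illustration, the encoding loop above could be written in Ada along the following lines. This is only a sketch under assumptions: the names Shift_Encode_Demo, Octet_Array, Alphabet, Encode and Emit are invented for the example, the buffer is an Interfaces.Unsigned_32, and leftover bits are zero-filled on the right, which is my reading of how RFC 1421 treats a partial final group. The "=" padding characters are left out, as in the outline.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Shift_Encode_Demo is

   --  All names in this unit are illustrative, not from an existing library.
   type Octet_Array is array (Positive range <>) of Unsigned_8;

   Alphabet : constant String (1 .. 64) :=
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

   function Encode (Input : Octet_Array) return String is
      Buffer : Unsigned_32 := 0;  --  pending bits, least-significant end
      Count  : Natural     := 0;  --  number of pending bits in Buffer
      Result : String (1 .. (Input'Length * 8 + 5) / 6);
      Last   : Natural     := 0;

      procedure Emit (Digit : Unsigned_32) is
      begin
         Last := Last + 1;
         Result (Last) := Alphabet (Natural (Digit) + 1);
      end Emit;
   begin
      for I in Input'Range loop
         --  Invariant: Buffer holds fewer than 6 pending bits here.
         Buffer := Shift_Left (Buffer, 8) or Unsigned_32 (Input (I));
         Count  := Count + 8;

         while Count >= 6 loop
            Count := Count - 6;
            Emit (Shift_Right (Buffer, Count) and 16#3F#);
         end loop;

         --  Keep only the bits that have not been output yet.
         Buffer := Buffer and (Shift_Left (Unsigned_32'(1), Count) - 1);
      end loop;

      if Count > 0 then
         --  Assumption: leftover bits are zero-filled on the right.
         Emit (Shift_Left (Buffer, 6 - Count) and 16#3F#);
      end if;

      return Result (1 .. Last);
   end Encode;

begin
   --  The three octets 16#4D#, 16#61#, 16#6E# are the Latin-1 text "Man".
   Ada.Text_IO.Put_Line (Encode ((16#4D#, 16#61#, 16#6E#)));
end Shift_Encode_Demo;
```

For the three octets of "Man" this should print TWFu. The buffer never holds more than 13 bits (5 pending plus 8 new), so an Unsigned_16 would also do.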

 > Is it acceptable to handle
 > portability with different bodies for a spec that only contains the
 > Split_Base_64 and Merge_Base_64 functions?

I would accept it, but I still consider the shifting method better and 
safer.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-22  9:41 Representation clauses for base-64 encoding Natasha Kerensikova
  2011-12-22 11:20 ` Niklas Holsti
@ 2011-12-22 11:37 ` Georg Bauhaus
  2011-12-22 12:24   ` Niklas Holsti
  2011-12-23  1:33 ` Randy Brukardt
  2 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 11:37 UTC (permalink / raw)


On 22.12.11 10:41, Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).


> My first thoughts were along the following lines:
>
>     type Octet is mod 256;
>        --  or Character or Storage_Element or Stream_Element
>        --  or whatever 8-bit type relevant for the application
>
>     for Octet'Size use 8;
>     for Octet'Component_Size use 8;

Here I would stop.

The RFC says that a value from the range 0 .. 63 is associated with
a character from a specific set of characters, for encoding it:
'A' .. 'Z', 'a' .. 'z', '+', '/'.
And there is a "pad", '='.
Since the characters shall stand for 0 .. 25, 26 .. 51, 52 .. 63,
this specifies a range, actually.
  In Ada, the 1:1 translation into a type can be:

type Repertoire is (
  'A','B','C','D','E','F','G','H','I','J','K','L','M',
  'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
  'a','b','c','d','e','f','g','h','i','j','k','l','m',
  'n','o','p','q','r','s','t','u','v','w','x','y','z',
  '0','1','2','3','4','5','6','7','8','9',
  '+','/','=');
subtype Base_64_Character is Repertoire range 'A' .. '/';
subtype Padding is Repertoire range '=' .. '=';

Note that you could have string literals of these:

   type Base_64_String is
     array (Positive range <>) of Repertoire;
   
   S : Base_64_String := "ABC="; -- but not "ABC!"

The language guarantees that each of the literals is associated
with just the positional number that Base 64 encoding requires.

Let the compiler choose the best representation for Repertoire
subtypes when encoding.

Or simply use subtypes of Character.

Only if you need some representation in memory or other storage
that has 'Size /= Character'Size, or 'Size /= Repertoire'Size
etc, derive new types as needed, and add representation clauses:

http://www.adacore.com/2008/03/03/gem-27/
http://www.adacore.com/2008/03/17/gem-28/

For streaming encoded text over the wire, a subtype of String should
serve the job just fine, convert as necessary. Or use Base_64_String.

Packing and unpacking can be quite expensive.

I remember at least two publicly available Base 64 encoding
packages, one by Tom Moran IIRC, and one in AWS. There are
probably more in the PAL.


-- Georg




* Re: Representation clauses for base-64 encoding
  2011-12-22 11:37 ` Georg Bauhaus
@ 2011-12-22 12:24   ` Niklas Holsti
  2011-12-22 15:09     ` Georg Bauhaus
  2011-12-23  1:42     ` Randy Brukardt
  0 siblings, 2 replies; 23+ messages in thread
From: Niklas Holsti @ 2011-12-22 12:24 UTC (permalink / raw)


On 11-12-22 13:37 , Georg Bauhaus wrote:
> On 22.12.11 10:41, Natasha Kerensikova wrote:
>> Hello,
>>
>> the recent discussion about representation clauses vs explicit shifting
>> made me wonder about what is the Right Way of performing base-64
>> encoding (rfc 1421).
>
>
>> My first thoughts were along the following lines:
>>
>> type Octet is mod 256;
>> -- or Character or Storage_Element or Stream_Element
>> -- or whatever 8-bit type relevant for the application
>>
>> for Octet'Size use 8;
>> for Octet'Component_Size use 8;
>
> Here I would stop.
>
> The RFC says that a value from the range 0 .. 63 is associated with
> a character from a specific set of characters, for encoding it:
> 'A' .. 'Z', 'a' .. 'z', '+', '/'.
> And there is a "pad", '='.
> Since the characters shall stand for 0 .. 25, 26 .. 51, 52 .. 63,
> this specifies a range, actually.
> In Ada, the 1:1 translation into a type can be:
>
> type Repertoire is (
> 'A','B','C','D','E','F','G','H','I','J','K','L','M',
> 'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
> 'a','b','c','d','e','f','g','h','i','j','k','l','m',
> 'n','o','p','q','r','s','t','u','v','w','x','y','z',
> '0','1','2','3','4','5','6','7','8','9',
> '+','/','=');
> subtype Base_64_Character is Repertoire range 'A' .. '/';
> subtype Padding is Repertoire range '=' .. '=';

This defines a nice Ada representation of the six-bit codes. But this 
was not Natasha's question; the question was about the slicing of a 
sequence of bits, composed from 8-bit groups, into a sequence of 6-bit 
groups.

> Let the compiler choose the best representation for Repertoire
> subtypes when encoding.

The point of base-64 encoding is to emit the six-bit groups as ordinary 
Characters (using whatever character encoding is standard, for example 
Latin-1). The compiler's internal representation of Repertoire elements 
is not suitable; the Repertoire literal 'A' should be emitted as the 
Character 'A', not as six zero bits. Natasha did not mention that, of 
course, since the focus was on the mapping between 8-bit and 6-bit 
slices of the bit-string.

> Only if you need some representation in memory or other storage
> that has 'Size /= Character'Size, or 'Size /= Repertoire'Size
> etc, derive new types as needed, and add representation clauses:

That was the point, but the problem is the difficulty of making 
representation clauses portable.

>
> http://www.adacore.com/2008/03/03/gem-27/

I am surprised and disappointed that there is *no* mention of 
portability problems in that "gem". This is marketing hype for GNAT, not 
sound programming advice.

> For streaming encoded text over the wire, a subtype of String should
> serve the job just fine, convert as necessary.

Huh? It is not possible to convert between Repertoire and Character, 
except through Unchecked_Conversion, which does not work as desired in 
this case. The best method would be an array indexed by Repertoire and 
containing Character elements, mapping the Repertoire literal 'A' to the 
Character 'A', etc.
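
A minimal sketch of such a mapping array, assuming the 65-literal Repertoire type quoted above; the names Repertoire_Map_Demo, Repertoire_String, Character_Map, To_Character and To_String are made up for the illustration. It relies on the fact that an array of Character indexed by Repertoire is itself a string type, so the table can be written as a string literal:

```ada
with Ada.Text_IO;

procedure Repertoire_Map_Demo is

   type Repertoire is (
     'A','B','C','D','E','F','G','H','I','J','K','L','M',
     'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
     'a','b','c','d','e','f','g','h','i','j','k','l','m',
     'n','o','p','q','r','s','t','u','v','w','x','y','z',
     '0','1','2','3','4','5','6','7','8','9',
     '+','/','=');

   --  Illustrative names, not taken from any existing package.
   type Repertoire_String is array (Positive range <>) of Repertoire;
   type Character_Map is array (Repertoire) of Character;

   --  Entry number N of the table is the Character to emit for the
   --  Repertoire literal at position N; the last entry is the pad.
   To_Character : constant Character_Map :=
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

   function To_String (Item : Repertoire_String) return String is
      Result : String (1 .. Item'Length);
   begin
      for I in Item'Range loop
         Result (I - Item'First + 1) := To_Character (Item (I));
      end loop;
      return Result;
   end To_String;

   Encoded : constant Repertoire_String := "TWFu";

begin
   Ada.Text_IO.Put_Line (To_String (Encoded));
end Repertoire_Map_Demo;
```

The table lookup does not depend on how the compiler represents Repertoire values internally, only on their position numbers.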

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-22 12:24   ` Niklas Holsti
@ 2011-12-22 15:09     ` Georg Bauhaus
  2011-12-22 16:00       ` Natasha Kerensikova
  2011-12-23  1:42     ` Randy Brukardt
  1 sibling, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 15:09 UTC (permalink / raw)


On 22.12.11 13:24, Niklas Holsti wrote:

> This defines a nice Ada representation of the six-bit codes. But this was not Natasha's question;

You are quite right, sorry.

Stubbornly, I'd like to mumble, though, that the very
notion of representation is at odds with portability.





* Re: Representation clauses for base-64 encoding
  2011-12-22 15:09     ` Georg Bauhaus
@ 2011-12-22 16:00       ` Natasha Kerensikova
  2011-12-22 22:18         ` Georg Bauhaus
  0 siblings, 1 reply; 23+ messages in thread
From: Natasha Kerensikova @ 2011-12-22 16:00 UTC (permalink / raw)


Hello,

On 2011-12-22, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> On 22.12.11 13:24, Niklas Holsti wrote:
>
>> This defines a nice Ada representation of the six-bit codes. But this
>> was not Natasha's question;
>
> You are quite right, sorry.
>
> Stubbornly, I'd like to mumble, though, that the very
> notion of representation is at odds with portability.

However here representation is not used as a notion, only as a tool:
using explicit shifts and masks, it is possible to write portable Ada
that performs the correct split of 3 octets on any platform.

The previous argument was that representation clauses allow more
readable code, which I'm inclined to believe. But is it really necessary
to give up portability for the sake of readability?

(I would answer "no" in that particular case, since base-64 splitting is
such a very simple and well-known operation)
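
For what it is worth, here is a sketch of what the shift-and-mask bodies of Split_Base_64 and Merge_Base_64 might look like. The type names follow the first post; the unit name Split_Merge_Demo and the local names Word, Input and Sextets are invented here, and the choice of putting the first octet in the most significant position is an assumption taken from the usual base-64 bit order.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Split_Merge_Demo is

   type Octet is mod 2**8;
   type Base_64_Digit is mod 2**6;

   type Octet_Block is array (1 .. 3) of Octet;
   type Base_64_Block is array (1 .. 4) of Base_64_Digit;

   function Split_Base_64 (Source : Octet_Block) return Base_64_Block is
      --  Assumption: the first octet supplies the most significant bits.
      Word : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Source (1)), 16)
        or Shift_Left (Unsigned_32 (Source (2)), 8)
        or Unsigned_32 (Source (3));
   begin
      return (1 => Base_64_Digit (Shift_Right (Word, 18) and 16#3F#),
              2 => Base_64_Digit (Shift_Right (Word, 12) and 16#3F#),
              3 => Base_64_Digit (Shift_Right (Word, 6) and 16#3F#),
              4 => Base_64_Digit (Word and 16#3F#));
   end Split_Base_64;

   function Merge_Base_64 (Source : Base_64_Block) return Octet_Block is
      Word : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Source (1)), 18)
        or Shift_Left (Unsigned_32 (Source (2)), 12)
        or Shift_Left (Unsigned_32 (Source (3)), 6)
        or Unsigned_32 (Source (4));
   begin
      return (1 => Octet (Shift_Right (Word, 16) and 16#FF#),
              2 => Octet (Shift_Right (Word, 8) and 16#FF#),
              3 => Octet (Word and 16#FF#));
   end Merge_Base_64;

   Input   : constant Octet_Block   := (16#4D#, 16#61#, 16#6E#);
   Sextets : constant Base_64_Block := Split_Base_64 (Input);

begin
   for I in Sextets'Range loop
      Ada.Text_IO.Put (Base_64_Digit'Image (Sextets (I)));
   end loop;
   Ada.Text_IO.New_Line;

   --  Merging must give back the original octets.
   if Merge_Base_64 (Sextets) /= Input then
      raise Program_Error;
   end if;
end Split_Merge_Demo;
```

With the input shown, the four digit values should come out as 19, 22, 5 and 46. Nothing here depends on the storage layout of the two block types, so no representation clauses are needed.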


Natasha




* Re: Representation clauses for base-64 encoding
  2011-12-22 16:00       ` Natasha Kerensikova
@ 2011-12-22 22:18         ` Georg Bauhaus
  2011-12-25 10:17           ` Niklas Holsti
  0 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 22:18 UTC (permalink / raw)


On 22.12.11 17:00, Natasha Kerensikova wrote:

> However here representation is not used as a notion, only as a tool:
> using explicit shifts and masks, it is possible to write portable Ada
> that performs the correct split of 3 octets on any platform.
>
> The previous argument was that representation clauses allow more
> readable code, which I'm inclined to believe. But is it really necessary
> to give up portability for the sake of readability?

There is middle ground, I think, insofar as it is possible
to extract bits in Ada without thinking about shifts and masks,
or logical operations.

Given a stream chopped into octets, the goal is to extract slices
of 6 consecutive bits and represent these using the characters
from a Base 64 encoding table. Leave the how-to of extraction to
the compiler, just say which bits. This does not use representation
clauses for extraction, and not shifts or masks either.  Without claiming
originality, completeness, enough subtypes, portability of bit indexing
of bits in a single octet, or sufficient code quality, the following might
illustrate the convenience of arrays of packed Booleans (guaranteed by
the LRM to have desirable properties):


package B64 is
    --
    --  prints characters representing sequences of octets in
    --  base 64 encoding. The octets come in via `Add`.
    --
    pragma Elaborate_Body(B64);

    type Repertoire is (    'A',   'B',   'C',   'D',   'E',   'F',   'G',
       'H',   'I',   'J',   'K',   'L',   'M',   'N',   'O',   'P',   'Q',
       'R',   'S',   'T',   'U',   'V',   'W',   'X',   'Y',   'Z',   'a',
       'b',   'c',   'd',   'e',   'f',   'g',   'h',   'i',   'j',   'k',
       'l',   'm',   'n',   'o',   'p',   'q',   'r',   's',   't',   'u',
       'v',   'w',   'x',   'y',   'z',   '0',   '1',   '2',   '3',   '4',
       '5',   '6',   '7',   '8',   '9',   '+',   '/' );
    for Repertoire'Size use 6;

    Pad : constant Character := '=';

    subtype Bit_Index is Natural range 0..23;

    type Bit_String is array(Bit_Index range <>) of Boolean;
    pragma Pack(Bit_String);

    subtype Octet is Bit_String(Bit_Index range 0..7);

    procedure Add(Bits : Octet);
                    -- Take 6 bits and write a corresponding Base 64 character.
                    -- Uses bits from this `Bits`, and bits from last time `Add`
                    -- was called; saves bits for later use.


    procedure Finish;
                                  -- handle any left over bits and finish output


end B64;

with Ada.Text_IO;
with Ada.Unchecked_Conversion;

package body B64 is
    
    subtype Base_64_Digit is Bit_String(Bit_Index range 0..5);
    subtype Word is Bit_String(Bit_Index range 0..15);
    function To_B64 is new Ada.Unchecked_Conversion(
          Base_64_Digit, Repertoire);

    procedure Write_64(C : Repertoire) is
       Position : Natural;
    begin
       case C is
          when 'A'..'Z' =>
             Position := Character'Pos('A') + Repertoire'Pos(C);
          when 'a'..'z' =>
             Position := Character'Pos('a')
             + (Repertoire'Pos(C) - Repertoire'Pos('a'));
          when '0'..'9' =>
             Position := Character'Pos('0')
             + (Repertoire'Pos(C) - Repertoire'Pos('0'));
          when '+' =>
             Position := Character'Pos('+');
          when '/' =>
             Position := Character'Pos('/');
       end case;

       Ada.Text_IO.Put(Character'Val(Position));
    end Write_64;


    --
    --  state information
    --


    type Selection is mod 4;
                        -- one of the four groups of 6 bits in a full bit string


    Scratch           : Word := (others => False);
                                 -- buffer storing left over bits for future use

    Position_In_Group : Selection := Selection'First;

    procedure Add(Bits : Octet) is
       Six_Pack : Base_64_Digit;
                                        -- six bits ready to be processed
    begin
       case Position_In_Group is
          when 0 =>
             Six_Pack := Base_64_Digit(Bits(2..7));
             Write_64(To_B64(Six_Pack));
             Scratch(8..15) := Bits;
             Position_In_Group := Selection'Succ(Position_In_Group);

          when 1 =>
             Scratch(4..7) := Bits(4..7);
             Six_Pack := Base_64_Digit(Scratch(4..9));
             Write_64(To_B64(Six_Pack));
             Scratch(8..15) := Bits;
             Position_In_Group := Selection'Succ(Position_In_Group);

          when 2 =>
                                  -- 4 bits left in `Scratch` plus 8 from `Bits`
                                  -- is worth two output characters

             Scratch(6..7) := Bits(6..7);
             Six_Pack := Base_64_Digit(Scratch(6..11));
             Write_64(To_B64(Six_Pack));
             Position_In_Group := Selection'Succ(Position_In_Group);
             Six_Pack := Base_64_Digit(Bits(0..5));
             Write_64(To_B64(Six_Pack));
             Scratch(8..15) := (others => False);
             Position_In_Group := Selection'Succ(Position_In_Group);

          when 3 =>
                                   -- this won't happen, see comment on case `2`
             raise Program_Error;

       end case;
    end Add;

    procedure Finish is
    begin
       Ada.Text_IO.Put("NOT DONE");
       Ada.Text_IO.Flush;
    end Finish;


end B64;




* Re: Representation clauses for base-64 encoding
  2011-12-22 11:20 ` Niklas Holsti
@ 2011-12-23  1:30   ` Randy Brukardt
  2011-12-26  8:33     ` Niklas Holsti
  0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23  1:30 UTC (permalink / raw)


"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
news:9lgi3jFhaU1@mid.individual.net...
> On 11-12-22 11:41 , Natasha Kerensikova wrote:
...
>>     type Octet_Block is array (1 .. 3) of Octet;
>>     pragma Pack (Octet_Block);
>
> I would add the following, to check that packing is effective:
>
>       for Octet_Block'Size use 24;

I think it would probably be better to forget Pack and actually say what you 
mean here:

    type Octet_Block is array (1 .. 3) of Octet;
    for Octet_Block'Component_Size use 8;

"Pack" should be reserved for cases where you don't care about the exact 
layout, you just want to save space. Here you do care about the exact 
layout, so use a Component_Size.

                                  Randy.







* Re: Representation clauses for base-64 encoding
  2011-12-22  9:41 Representation clauses for base-64 encoding Natasha Kerensikova
  2011-12-22 11:20 ` Niklas Holsti
  2011-12-22 11:37 ` Georg Bauhaus
@ 2011-12-23  1:33 ` Randy Brukardt
  2 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23  1:33 UTC (permalink / raw)


"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message 
news:slrnjf5uol.1lme.lithiumcat@sigil.instinctive.eu...
>   type Octet is mod 256;
>      --  or Character or Storage_Element or Stream_Element
>      --  or whatever 8-bit type relevant for the application
>
>   for Octet'Size use 8;
>   for Octet'Component_Size use 8;

Component_Size is only for array types, put it on the array type.

>   for Octet'Bit_Order use System.Low_Order_First;

Bit_Order is only for record types, put that on the record type.
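
To make the placement concrete, the declarations might be rearranged roughly like this. It is a sketch only: the package name Base_64_Layout and the type names Octet_Array_Block and Octet_Record_Block are invented for the example, and whether a particular compiler accepts every clause still depends on its level of support.

```ada
with System;

package Base_64_Layout is

   --  Hypothetical names throughout; only the placement of the
   --  clauses is the point of this sketch.

   type Octet is mod 2**8;
   for Octet'Size use 8;

   type Base_64_Digit is mod 2**6;
   for Base_64_Digit'Size use 6;

   --  Component_Size belongs on the array type, not on the component.
   type Octet_Array_Block is array (1 .. 3) of Octet;
   for Octet_Array_Block'Component_Size use 8;

   --  Bit_Order belongs on the record type.
   type Octet_Record_Block is record
      P, Q, R : Octet;
   end record;

   for Octet_Record_Block'Bit_Order use System.Low_Order_First;
   for Octet_Record_Block use record
      P at 0 range  0 ..  7;
      Q at 0 range  8 .. 15;
      R at 0 range 16 .. 23;
   end record;

end Base_64_Layout;
```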

                             Randy.






* Re: Representation clauses for base-64 encoding
  2011-12-22 12:24   ` Niklas Holsti
  2011-12-22 15:09     ` Georg Bauhaus
@ 2011-12-23  1:42     ` Randy Brukardt
  2011-12-28  8:59       ` Niklas Holsti
  1 sibling, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23  1:42 UTC (permalink / raw)


"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
news:9lgls6FticU1@mid.individual.net...
...
>> http://www.adacore.com/2008/03/03/gem-27/
>
> I am surprised and disappointed that there is *no* mention of portability 
> problems in that "gem". This is marketing hype for GNAT, not sound 
> programming advice.

There is no portability problem in common use. For instance, Natasha's code 
(once the rep. clauses are right) will work on both a big-endian and 
little-endian machine without change. (And yes, I'd use the record rep. 
clauses to do this conversion; I think I have some code that looks very much 
like his sample.)

The problem occurs when you need to process big-endian data on a 
little-endian machine (or vice versa). That is not that common of a need --  
it surely happens (as in your application), but you are overstating it if 
you think most programs would run into it.

Most code does not care which bits are which bits in the word -- like this 
encoding code -- they just need a consistent representation. That is, it 
doesn't matter where bit 0 is, so long as it is the same in every item that 
is processed.

                                       Randy.







* Re: Representation clauses for base-64 encoding
  2011-12-22 22:18         ` Georg Bauhaus
@ 2011-12-25 10:17           ` Niklas Holsti
  2011-12-27 11:23             ` Georg Bauhaus
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-25 10:17 UTC (permalink / raw)


On 11-12-23 00:18 , Georg Bauhaus wrote:
> On 22.12.11 17:00, Natasha Kerensikova wrote:
>
>> However here representation is not used as a notion, only as a tool:
>> using explicit shifts and masks, it is possible to write portable Ada
>> that performs the correct split of 3 octets on any platform.
>>
>> The previous argument was that representation clauses allow more
>> readable code, which I'm inclined to believe. But is it really necessary
>> to give up portability for the sake of readability?
>
> There is middle ground, I think, insofar as it is possible
> to extract bits in Ada without thinking about shifts and masks,
> or logical operations.
>
> Given a stream chopped into octets, the goal is to extract slices
> of 6 consecutive bits and represent these using the characters
> from a Base 64 encoding table. Leave the how-to of extraction to
> the compiler, just say which bits. This does not use representation
> clauses for extraction, and not shifts or masks either. Without claiming
> originality, completeness, enough subtypes, portability of bit indexing
> of bits in a single octet, or sufficient code quality, the following might
> illustrate the convenience of arrays of packed Booleans (guaranteed by
> the LRM to have desirable properties):

What are these "desirable properties" on which you rely, and where in 
the LRM are they guaranteed?

> package B64 is

  [snip]

>    subtype Bit_Index is Natural range 0..23;
>
>    type Bit_String is array(Bit_Index range <>) of Boolean;
>    pragma Pack(Bit_String);
>
>    subtype Octet is Bit_String(Bit_Index range 0..7);

  [snip]

 >    subtype Base_64_Digit is Bit_String(Bit_Index range 0..5);

  [snip]

>    procedure Add(Bits : Octet) is
>       Six_Pack : Base_64_Digit;
>       -- six bits ready to be processed
>    begin
>       case Position_In_Group is
>          when 0 =>
>             Six_Pack := Base_64_Digit(Bits(2..7));

This assumes that the bits in an Octet are indexed in order of 
increasing significance, so that the slice Bits(2..7) gives the six most 
significant bits. Since you defined the Octet type, you are of course 
free to assume this.

But your code does not show how the Octet values are derived from the 
original data, e.g. from a stream of Unsigned_8, or whatever binary data 
is to be encoded as Base-64. That is an essential part of the problem, 
and without it we cannot know if your approach works, or what operations 
it really needs. There are certainly ways to convert an Unsigned_8, for 
example, into an Octet value, such that the bits are indexed in 
increasing significance order, but it requires either shifting and 
masking, or the equivalent division or mod operations, or 
Unchecked_Conversion to a record type with eight Boolean components and 
a representation clause and a Bit_Order clause.

If you use an Unchecked_Conversion directly from Unsigned_8 to Octet, 
you are assuming that the compiler implements the indexing of Octet in 
significance order. Where does the LRM guarantee that? I don't think it 
does.

Your Unchecked_Conversion (not quoted) from Base_64_Digit to Repertoire 
has the same problem, so at least that part of your code is not portable.
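
As an illustration of the division-and-mod route mentioned above, a conversion that fixes the index order explicitly could look like the sketch below. The names Octet_Bits_Demo and To_Octet are invented; Bit_String and Octet follow the quoted declarations, except that the index subtype is plain Natural here to keep the example short.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Octet_Bits_Demo is

   type Bit_String is array (Natural range <>) of Boolean;
   pragma Pack (Bit_String);

   subtype Octet is Bit_String (0 .. 7);

   --  Index I of the result holds the bit of weight 2**I, whatever
   --  layout the compiler picks for the packed array.
   function To_Octet (Value : Unsigned_8) return Octet is
      Result : Octet;
   begin
      for I in Octet'Range loop
         --  Same effect as shifting right by I and masking with 1.
         Result (I) := (Value / 2 ** I) mod 2 = 1;
      end loop;
      return Result;
   end To_Octet;

   Bits : constant Octet := To_Octet (16#4D#);

begin
   --  Print the most significant bit first.
   for I in reverse Octet'Range loop
      if Bits (I) then
         Ada.Text_IO.Put ('1');
      else
         Ada.Text_IO.Put ('0');
      end if;
   end loop;
   Ada.Text_IO.New_Line;
end Octet_Bits_Demo;
```

For 16#4D# this should print 01001101, and Bits (2 .. 7) is then the six most significant bits by construction rather than by assumption about the compiler.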

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-23  1:30   ` Randy Brukardt
@ 2011-12-26  8:33     ` Niklas Holsti
  2011-12-28  0:09       ` Randy Brukardt
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-26  8:33 UTC (permalink / raw)


On 11-12-23 03:30 , Randy Brukardt wrote:
> "Niklas Holsti"<niklas.holsti@tidorum.invalid>  wrote in message
> news:9lgi3jFhaU1@mid.individual.net...
>> On 11-12-22 11:41 , Natasha Kerensikova wrote:
> ...
>>>      type Octet_Block is array (1 .. 3) of Octet;
>>>      pragma Pack (Octet_Block);
>>
>> I would add the following, to check that packing is effective:
>>
>>        for Octet_Block'Size use 24;
>
> I think it would probably be better to forget Pack and actually say what you
> mean here:
>
>      type Octet_Block is array (1 .. 3) of Octet;
>      for Octet_Block'Component_Size use 8;
>
> "Pack" should be reserved for cases where you don't care about the exact
> layout, you just want to save space. Here you do care about the exact
> layout, so use a Component_Size.

Fine adjustment: when Component_Size is not a factor or multiple of the 
word size, RM 13.3(73) says that the array may also have to be packed in 
order to eliminate gaps between components.

Of course, a Component_Size of 8 bits is usually a factor of the word 
size. But perhaps not always.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-25 10:17           ` Niklas Holsti
@ 2011-12-27 11:23             ` Georg Bauhaus
  2011-12-27 19:37               ` Niklas Holsti
  0 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-27 11:23 UTC (permalink / raw)


On 25.12.11 11:17, Niklas Holsti wrote:

>> There is middle ground, I think, insofar as it is possible
>> to extract bits in Ada without thinking about shifts and masks,
>> or logical operations.

Just saying that using Ada, one may write operations involving
single bits or slices of single bits simply by using arrays
of Booleans that are packed.

I think I didn't say that one may read portably from I/O ports?

>> Without claiming [...] portability of bit indexing
>> of bits in a single octet [...] 

Thanks for illustrating my words in my code. ;-)

> What are these "desirable properties" on which you rely, and where in the LRM
> are they guaranteed?

Everything (the "desirable properties") that an interested party
might conclude after reading LRM 13.2. Yes, on some planet there
might be an Ada compiler that does not handle packed bits
reasonably well. ;-)

13.2 drops alignment requirements, in particular, so that
I do not need to think about boundaries of storage element size
groups of bits.

The BIT_VECTOR in the Rationale of Ada 83 indicates awareness of
Pascal's SET type, too, I should think. (Don't know if Wirth's paper
on representation of sets is an influence, or just a summary of
existing practice.)

The Ada 95 Quality and Style Guide has this to say, in Chapter 10,
http://www.adaic.org/resources/add_content/docs/95style/html/sec_10/10-6-3.html

"Use modular types rather than packed Boolean arrays
 when measured performance indicates."

"Measured performance indicates", then, whether or not to
resort to explicit shifting and logical operations.

I looked at how compilers will translate operations on slices of
Booleans (-Os is an interesting option with GNAT). They will emit
instructions for shifting and logical operations; no surprise.


So, would there be a portable Base64 algorithm that reads from
a stream of storage elements, perhaps from a typical controller's
8-bit I/O port, that

(*) shifts and "reasons" more reliably, readably, and portably,

(*) performs shifts and logical operations much faster
    than the shifts and logical operations generated by
    typical compilers for bit slices or representations,

(*) runs on heterogeneous hardware with word size <= 16 bits?






* Re: Representation clauses for base-64 encoding
  2011-12-27 11:23             ` Georg Bauhaus
@ 2011-12-27 19:37               ` Niklas Holsti
  2011-12-27 20:49                 ` Robert A Duff
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-27 19:37 UTC (permalink / raw)


On 11-12-27 13:23 , Georg Bauhaus wrote:
> On 25.12.11 11:17, Niklas Holsti wrote:

Actually, Georg Bauhaus wrote:

>>> There is middle ground, I think, insofar as it is possible
>>> to extract bits in Ada without thinking about shifts and masks,
>>> or logical operations.
>
> Just saying that using Ada, one may write operations involving
> single bits or slices of single bits simply by using arrays
> of Booleans that are packed.

Whether a Boolean array is packed to Component_Size = 1, with no gaps, 
depends on the Ada compiler. It is not a hard requirement in the Ada RM, 
as I understand it.

> I think I didn't say that one may read portably from I/O ports?

That's right, you didn't. But the "Base 64 encoding" problem by 
definition starts from a sequence of 8-bit groups, with a defined bit 
order: the first 6-bit group must be the high 6 bits of the first 8-bit 
group, and so on. This is a core part of the problem.

>>> Without claiming [...] portability of bit indexing
>>> of bits in a single octet [...]
>
> Thanks for illustrating my words in my code. ;-)

I didn't understand what you meant by that "without claiming", which is 
why I asked you to be clearer about the "desirable properties" that you 
assumed.

>> What are these "desirable properties" on which you rely, and where in the LRM
>> are they guaranteed?
>
> Everything (the "desirable properties") that an interested party
> might conclude after reading LRM 13.2.

All of RM 13.2 is non-binding recommendation; the text defines the 
recommended level of support, using "should". The only "shall" in 13.2 
is in a legality rule. No properties of packed Boolean arrays are 
"guaranteed" by 13.2.

> Yes, on some planet there
> might be an Ada compiler that does not handle packed bits
> reasonably well. ;-)

This is the kind of argument I am used to seeing in comp.lang.c: "it 
works for me, so never mind that the C standard says it is undefined 
behaviour".

Why didn't the RM authors write strict "shall" requirements in RM 13.2, 
and the rest of chapter 13? Apparently, they felt that some Ada 
compilers would not be able to implement all the recommendations with 
reasonable effort and performance.

> 13.2 drops alignment requirements, in particular, so that
> I do not need to think about boundaries of storage element size
> groups of bits.

But there is not even a recommendation on the order in which bits in a 
packed Boolean array are indexed, as far as I can see.

> The Ada 95 Quality and Style Guide has this to say, in Chapter 10,
> http://www.adaic.org/resources/add_content/docs/95style/html/sec_10/10-6-3.html
>
> "Use modular types rather than packed Boolean arrays
>   when measured performance indicates."
>
> "Measured performance indicates", then, whether or not to
> resort to explicit shifting and logical operations.

Sure, if your program is too slow (but works) using packed Boolean 
arrays, you can try to speed it up by using modular types instead (but 
it might not become faster). That says nothing about portability; if you 
were happy with the limited (IMO) portability of packed Boolean arrays, 
you will not lose portability by moving to modular types (but you may 
have to find out in which order your Ada compiler indexes packed Boolean 
arrays, in order to transform your program correctly).

>
> I looked at how compilers will translate operations on slices of
> Booleans (-Os is an interesting option with GNAT). They will emit
> instructions for shifting and logical operations; no surprise.

No surprise, I agree. But the problem is the undefined indexing order.

Packed Boolean arrays are good and useful for bit-vector operations. But 
Unchecked_Conversion to and from other types, such as modular types, is 
not portable.
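
To make the indexing-order point concrete, here is a minimal probe, 
offered as a sketch only. The names Index_Order_Probe, Bit_Vector and 
To_Bits are invented for this illustration, and the sketch assumes a 
compiler that accepts the Component_Size and Size clauses and provides 
Interfaces.Unsigned_8. It prints which index of the packed array 
receives the least significant bit of an octet; the answer is whatever 
the compiler chooses, which is exactly why the conversion is not 
portable.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;

procedure Index_Order_Probe is

   --  Eight Booleans, one bit each, with no gaps (if the compiler agrees).
   type Bit_Vector is array (0 .. 7) of Boolean;
   for Bit_Vector'Component_Size use 1;
   for Bit_Vector'Size use 8;

   function To_Bits is new Ada.Unchecked_Conversion
     (Source => Interfaces.Unsigned_8, Target => Bit_Vector);

   --  Only the least significant bit of the octet is set.
   Bits : constant Bit_Vector := To_Bits (1);

begin
   for I in Bit_Vector'Range loop
      if Bits (I) then
         --  The index reported here is implementation-dependent.
         Ada.Text_IO.Put_Line
           ("Least significant bit is at index" & Integer'Image (I));
      end if;
   end loop;
end Index_Order_Probe;
```

Any code that slices the array after such a conversion silently 
depends on the index reported here.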

>
> So, would there be a portable Base64 algorithm that reads from
> a stream of storage elements, perhaps from a typical controller's
> 8-bit I/O port, that
>
> (*) shifts and "reasons" more reliably, readably, and portably,
>

Sure, use some Interfaces.Unsigned_N as a bit-buffer, as I sketched in 
an earlier message.

> (*) performs shifts and logical operations much faster
>      than the shifts and logical operations generated by
>      typical compilers for bit slices or representations,
>

You want a portable method that is *much faster* than less portable 
methods? That is a lot to ask...

The Unsigned_N-bit-buffer method is probably no slower than the 
unportable bit-slice method. The Unsigned_N-bit-buffer method may be 
slower than the method that uses records with representation clauses to 
convert three 8-bit groups into four 6-bit groups at a time, but the 
latter needs a machine with at least 24-bit "machine scalars".

> (*) runs on heterogeneous hardware with word size <= 16 bits?

The bit-buffer method works easily with any Unsigned_N with N >= 8+5 = 
13, so Unsigned_16 is ok. It can be made to work with just Unsigned_8, 
but with more difficulty.
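
For reference, the bit-buffer idea can be sketched as follows. This is 
an illustration under stated assumptions, not the code from the earlier 
message: the names Bit_Buffer_Demo, Octet_Array, Alphabet, Encode and 
Emit are made up here, the alphabet is the usual base-64 one, and line 
wrapping of the output is left out. The buffer never holds more than 
8+5 = 13 valid bits, so Unsigned_16 suffices.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Bit_Buffer_Demo is

   type Octet_Array is array (Positive range <>) of Unsigned_8;

   Alphabet : constant String (1 .. 64) :=
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

   --  Encode Input as base 64, padding the result with '=' to a
   --  multiple of four characters.
   function Encode (Input : Octet_Array) return String is
      Result : String (1 .. 4 * ((Input'Length + 2) / 3));
      Last   : Natural     := 0;
      Buffer : Unsigned_16 := 0;  --  never holds more than 13 valid bits
      Bits   : Natural     := 0;  --  number of valid bits in Buffer

      procedure Emit (Digit : Unsigned_16) is
      begin
         Last := Last + 1;
         Result (Last) := Alphabet (Natural (Digit) + 1);
      end Emit;
   begin
      for I in Input'Range loop
         --  Append the next octet below the (at most 5) pending bits.
         Buffer := Shift_Left (Buffer, 8) or Unsigned_16 (Input (I));
         Bits   := Bits + 8;
         while Bits >= 6 loop
            Bits := Bits - 6;
            Emit (Shift_Right (Buffer, Bits) and 63);
            --  Keep only the bits that have not been emitted yet.
            Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Bits) - 1);
         end loop;
      end loop;

      if Bits > 0 then
         --  Left-align the remaining 2 or 4 bits in a final digit.
         Emit (Shift_Left (Buffer, 6 - Bits) and 63);
      end if;
      while Last < Result'Last loop
         Last := Last + 1;
         Result (Last) := '=';
      end loop;
      return Result;
   end Encode;

begin
   if Encode ((77, 97, 110)) /= "TWFu"       --  "Man"
     or else Encode ((77, 97)) /= "TWE="     --  "Ma"
     or else Encode ((1 => 77)) /= "TQ=="    --  "M"
   then
      raise Program_Error with "unexpected encoding";
   end if;
   Ada.Text_IO.Put_Line (Encode ((77, 97, 110)));
end Bit_Buffer_Demo;
```

Nothing in this sketch uses a representation clause; it relies only on 
package Interfaces and its shift functions.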

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-27 19:37               ` Niklas Holsti
@ 2011-12-27 20:49                 ` Robert A Duff
  2011-12-27 23:47                   ` Niklas Holsti
  0 siblings, 1 reply; 23+ messages in thread
From: Robert A Duff @ 2011-12-27 20:49 UTC (permalink / raw)


Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> Whether a Boolean array is packed to Component_Size = 1, with no gaps,
> depends on the Ada compiler. It is not a hard requirement in the Ada RM,
> as I understand it.

For any compiler that supports the SP annex (which pretty-much all do),
it is a hard requirement.  But if you're depending on Component_Size = 1
for the logical correctness of your program, it's better to say
"for T'Component_Size use 1;" (which also must be supported if the SP annex
applies).
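
A small sketch of what that looks like in practice, assuming a compiler 
that accepts the clause. The names Component_Size_Demo, Flags, A, B and 
Count are invented for this illustration. The bit-vector operations 
themselves (logical operators and slices) are ordinary array 
operations and do not depend on the clause; only the layout does.

```ada
with Ada.Text_IO;

procedure Component_Size_Demo is

   type Flags is array (1 .. 16) of Boolean;
   --  State the layout that the program depends on, instead of
   --  hoping that pragma Pack happens to produce it.
   for Flags'Component_Size use 1;

   A     : Flags          := (1 .. 8 => True, others => False);
   B     : constant Flags := (5 .. 12 => True, others => False);
   Count : Natural        := 0;

begin
   A := A and not B;                  --  clear the bits of A set in B
   A (15 .. 16) := (others => True);  --  set a slice of two bits

   for I in A'Range loop
      if A (I) then
         Count := Count + 1;
      end if;
   end loop;

   --  Bits 1 .. 4 and 15 .. 16 remain set.
   if Count /= 6 then
      raise Program_Error with "unexpected bit count";
   end if;

   Ada.Text_IO.Put_Line
     ("Component_Size =" & Integer'Image (Flags'Component_Size));
end Component_Size_Demo;
```

A compiler that cannot honour the clause has to reject it, which is 
easier to diagnose than a silently different layout.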

> All of RM 13.2 is non-binding recommendation; the text defines the
> recommended level of support, using "should". The only "shall" in 13.2
> is in a legality rule. No properties of packed Boolean arrays are
> "guaranteed" by 13.2.

Right, but C.2(2) turns all those "should"s into "shall"s.

> But there is not even a recommendation on the order in which bits in a
> packed Boolean array are indexed, as far as I can see.

True, but in practice, it will follow the endianness of the machine.
Likewise, the RM doesn't say which bits of an integer represent
what, but in practice, the implementation is unlikely to make
bit 17 be the high-order bit of a 32-bit integer.  ;-)

I'd prefer these rules to be nailed down better.  But they're not, so
you really have no choice but to rely to some extent on compilers doing
sensible things.

> Sure, if your program is too slow (but works) using packed Boolean
> arrays, you can try to speed it up by using modular types instead (but
> it might not become faster). That says nothing about portability; if you
> were happy with the limited (IMO) portability of packed Boolean arrays,
> you will not lose portability by moving to modular types (but you may
> have to find out in which order your Ada compiler indexes packed Boolean
> arrays, in order to transform your program correctly).

GNAT turns small packed arrays into modular integers internally, so you
should expect to get the same efficiency.

- Bob




* Re: Representation clauses for base-64 encoding
  2011-12-27 20:49                 ` Robert A Duff
@ 2011-12-27 23:47                   ` Niklas Holsti
  2011-12-29  0:50                     ` Robert A Duff
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-27 23:47 UTC (permalink / raw)


On 11-12-27 22:49 , Robert A Duff wrote:
> Niklas Holsti<niklas.holsti@tidorum.invalid>  writes:
>
>> Whether a Boolean array is packed to Component_Size = 1, with no gaps,
>> depends on the Ada compiler. It is not a hard requirement in the Ada RM,
>> as I understand it.
>
> For any compiler that supports the SP annex (which pretty-much all do),
> it is a hard requirement.

Yes, a compiler cannot claim to support annex C (Systems Programming) 
unless it implements chapter 13 as recommended, so that all the 
"shoulds" are implemented. But this is only an argument for "probable" 
portability, since supporting annex C is optional.

For this discussion, I have again studied the chapter 13 rules about 
Bit_Order and record representation clauses. I think I understand now 
how it is meant to work, at least to some extent. As I understand it, if 
you specify Bit_Order and want your record representation clauses to be 
portable, the form of your clauses is strongly limited by the size of 
the "largest machine scalar" in the compilers you use. In particular, if 
you want to follow the style that always writes "at 0 range first_bit .. 
last_bit", you cannot specify the layout of a record that is larger than 
the largest machine scalar.

For example, the 24-bit record that Natasha suggested for the Base-64 
encoding does not work if the largest machine scalar is less than 24 
bits in size.

Out of curiosity, what is the case for JGNAT? Does it support annex C, 
and if not, how much of chapter 13 does it implement?

>> But there is not even a recommendation on the order in which bits in a
>> packed Boolean array are indexed, as far as I can see.
>
> True, but in practice, it will follow the endianness of the machine.

In that case, the meaning of slices of packed Boolean arrays, such as 
Georg Bauhaus presented, is different for little-endian and big-endian 
machines, and this cannot be corrected with a Bit_Order specification.

> Likewise, the RM doesn't say which bits of an integer represent
> what,

Well, combining the definition of Bit_Order in RM 13.5.3 with the 
definition of the First_Bit and Last_Bit attributes in RM 13.5.2 
suggests rather strongly that bit numbers defined by "offsets in bits" 
correspond either to increasing or decreasing significance in unsigned 
integer values.
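
The attributes mentioned above can be inspected directly. The 
following is only a sketch: Layout_Probe, Octet_Block and X are names 
chosen for this illustration, and the component clauses assume that 
System.Storage_Unit is 8, so that "at 1" and "at 2" address the second 
and third octet. It uses the default bit order and prints what the 
compiler reports for one component.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with System;

procedure Layout_Probe is

   type Octet is mod 2 ** 8;
   for Octet'Size use 8;

   type Octet_Block is record
      P, Q, R : Octet;
   end record;

   --  Assumes 8-bit storage elements: one component per element, so
   --  no component crosses a storage-element boundary.
   for Octet_Block use record
      P at 0 range 0 .. 7;
      Q at 1 range 0 .. 7;
      R at 2 range 0 .. 7;
   end record;
   for Octet_Block'Size use 24;

   X : constant Octet_Block := (1, 2, 3);

begin
   Put_Line ("Bit_Order   = "
             & System.Bit_Order'Image (Octet_Block'Bit_Order));
   Put_Line ("Q'Position  =" & Integer'Image (X.Q'Position));
   Put_Line ("Q'First_Bit =" & Integer'Image (X.Q'First_Bit));
   Put_Line ("Q'Last_Bit  =" & Integer'Image (X.Q'Last_Bit));
end Layout_Probe;
```

Writing the clauses per storage element, rather than always "at 0", 
keeps each component within a single storage element, but it pins down 
the octet order and not the meaning of the bit numbers inside each 
octet.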

> but in practice, the implementation is unlikely to make
> bit 17 be the high-order bit of a 32-bit integer.  ;-)

Agreed.

> I'd prefer these rules to be nailed down better.  But they're not, so
> you really have no choice but to rely to some extent on compilers doing
> sensible things.

You can avoid chapter 13 when portability is required, and use other 
means, such as Interfaces.Unsigned_N, where the only uncertainty is 
which sizes of Unsigned are implemented. But it is usually easy to use a 
larger Unsigned, if the exact size you would like to have is not 
provided. For example, the natural size of the bit-buffer for the 
Base-64 encoding is 8+(6-1) = 13 bits, but an Unsigned_N for any N >= 13 
can be used as well.

In contrast, the 24-bit-record-method falls apart if the compiler 
rejects its representation clause for some reason (for example because 
the largest machine scalar is less than 24 bits).

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-26  8:33     ` Niklas Holsti
@ 2011-12-28  0:09       ` Randy Brukardt
  0 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2011-12-28  0:09 UTC (permalink / raw)


"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
news:9lqpq5Ft05U1@mid.individual.net...
> On 11-12-23 03:30 , Randy Brukardt wrote:
>> "Niklas Holsti"<niklas.holsti@tidorum.invalid>  wrote in message
>> news:9lgi3jFhaU1@mid.individual.net...
>>> On 11-12-22 11:41 , Natasha Kerensikova wrote:
>> ...
>>>>      type Octet_Block is array (1 .. 3) of Octet;
>>>>      pragma Pack (Octet_Block);
>>>
>>> I would add the following, to check that packing is effective:
>>>
>>>        for Octet_Block'Size use 24;
>>
>> I think it would probably be better to forget Pack and actually say what 
>> you
>> mean here:
>>
>>      type Octet_Block is array (1 .. 3) of Octet;
>>      for Octet_Block'Component_Size use 8;
>>
>> "Pack" should be reserved for cases where you don't care about the exact
>> layout, you just want to save space. Here you do care about the exact
>> layout, so use a Component_Size.
>
> Fine adjustment: when Component_Size is not a factor or multiple of the 
> word size, RM 13.3(73) says that the array may also have to be packed in 
> order to eliminate gaps between components.
>
> Of course, a Component_Size of 8 bits is usually a factor of the word 
> size. But perhaps not always.

You're right of course, although I would hope that no compiler was silly 
enough to ignore a direct component-size command by putting gaps into it. 
I'd much rather the compiler rejected such a clause (surely Janus/Ada 
would). If you want gaps, you should say so. For instance, on the U2200 (a 
36-bit machine with 9-bit divisions), you'd have to say

     type Octet_Block is array (1 .. 3) of Octet;
     for Octet_Block'Component_Size use 9;

if you wanted gaps. (And 8 might have been rejected, I don't remember 
anymore whether the Unisys code generator could handle that.) That's why 
Janus/Ada uses a constant Host.Stor_Unit_Size in all of its representation 
clauses. (As you might guess, package Host contains a bunch of constants and 
subprograms defining the host environment for the compiler. You'd probably 
also guess that there is a similar package Target, and if you did, you'd be 
right.)
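
With only the standard library, a similar effect can be sketched with 
System.Storage_Unit. This is an assumption-laden illustration, not 
Janus/Ada's package Host: the names Storage_Unit_Demo, Octet and 
Octet_Block are chosen here. On a machine with 8-bit storage elements 
the components are adjacent; on a 9-bit machine the same source would 
ask for one gap bit per component.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with System;

procedure Storage_Unit_Demo is

   type Octet is mod 256;

   type Octet_Block is array (1 .. 3) of Octet;
   --  One component per storage element, whatever its size is on
   --  the target.  System.Storage_Unit is a static named number.
   for Octet_Block'Component_Size use System.Storage_Unit;

begin
   Put_Line ("Storage_Unit   =" & Integer'Image (System.Storage_Unit));
   Put_Line ("Component_Size ="
             & Integer'Image (Octet_Block'Component_Size));
   Put_Line ("Block size     =" & Integer'Image (Octet_Block'Size));
end Storage_Unit_Demo;
```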

                                                     Randy.







* Re: Representation clauses for base-64 encoding
  2011-12-23  1:42     ` Randy Brukardt
@ 2011-12-28  8:59       ` Niklas Holsti
  2011-12-29  5:41         ` Randy Brukardt
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-28  8:59 UTC (permalink / raw)


On 11-12-23 03:42 , Randy Brukardt wrote:
> "Niklas Holsti"<niklas.holsti@tidorum.invalid>  wrote in message
> news:9lgls6FticU1@mid.individual.net...
> ...
>>> http://www.adacore.com/2008/03/03/gem-27/
>>
>> I am surprised and disappointed that there is *no* mention of portability
>> problems in that "gem". This is marketing hype for GNAT, not sound
>> programming advice.
>
> There is no portability problem in common use.

Depends on what you consider common. The gem even claims that its 
representation clause makes "Ada" use a biased 7-bit representation for 
an Integer component with range 100..227. RM chapter 13 only recommends 
that compilers should support unbiased representations.

Which Ada compilers, other than GNAT, can use biased representations?

> For instance, Natasha's code
> (once the rep. clauses are right) will work on both a big-endian and
> little-endian machine without change.

Once the representation clauses are corrected, as you say, Natasha's 
code is the best that can be achieved with chapter 13. But...

Natasha's 24-bit-record code works only if the Ada implementation (a) 
supports chapter 13 as recommended and (b) has a machine scalar of at 
least 24 bits.

We know that some Ada programs run on the 16-bit TI MSP430. Will the 
24-bit record code work there? Ok, MSP430 Ada programs are uncommon, but 
perhaps we want to make them more common.

Natasha's code that uses (packed) arrays with Component_Sizes of 6 and 8 
bits has more complex portability questions. Of course it also requires 
support for chapter 13, but I think it needs more than that. It depends 
on the indexing order and the word size. I don't find anything in 
chapter 13 or elsewhere in the RM that even requires the indexing order 
to be the same for different array types, although this is of course 
very likely to be the case.

> The problem occurs when you need to process big-endian data on a
> little-endian machine (or vice versa). That is not that common of a need --

The problem occurs when an application inputs or outputs binary data, 
and the application should be coded in a portable way. I can't believe 
that this is uncommon.

> Most code does not care which bits are which bits in the word -- like this
> encoding code -- they just need a consistent representation. That is, it
> doesn't matter where bit 0 is, so long as it is the same in every item that
> is processed.

True for internal data, false for inputs and outputs.

If all you want to do is to compress internal data, use pragma Pack, as 
you suggested.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




* Re: Representation clauses for base-64 encoding
  2011-12-27 23:47                   ` Niklas Holsti
@ 2011-12-29  0:50                     ` Robert A Duff
  2011-12-30 20:54                       ` anon
  2011-12-30 20:56                       ` Niklas Holsti
  0 siblings, 2 replies; 23+ messages in thread
From: Robert A Duff @ 2011-12-29  0:50 UTC (permalink / raw)


Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> Yes, a compiler cannot claim to support annex C (Systems Programming)
> unless it implements chapter 13 as recommended, so that all the
> "shoulds" are implemented. But this is only an argument for "probable"
> portability, since supporting annex C is optional.

Right.

But of course supporting the Ada standard is optional, too.  ;-)
It's easy to forget that standards don't actually _require_ anybody
to do anything.  So, unfortunately, the best you can be sure of is
"probable" portability.

> Out of curiosity, what is the case for JGNAT?

I don't know much about JGNAT.  I think it doesn't support some
things that are "impossible or impractical" (see AARM-1.1.3(6)),
given the limitations of the JVM.

- Bob




* Re: Representation clauses for base-64 encoding
  2011-12-28  8:59       ` Niklas Holsti
@ 2011-12-29  5:41         ` Randy Brukardt
  2011-12-29 10:10           ` Dmitry A. Kazakov
  0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-29  5:41 UTC (permalink / raw)


"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
news:9m0433FjokU1@mid.individual.net...
> On 11-12-23 03:42 , Randy Brukardt wrote:
>> "Niklas Holsti"<niklas.holsti@tidorum.invalid>  wrote in message
>> news:9lgls6FticU1@mid.individual.net...
>> ...
>>>> http://www.adacore.com/2008/03/03/gem-27/
>>>
>>> I am surprised and disappointed that there is *no* mention of 
>>> portability
>>> problems in that "gem". This is marketing hype for GNAT, not sound
>>> programming advice.
>>
>> There is no portability problem in common use.
>
> Depends on what you consider common. The gem even claims that its 
> representation clause makes "Ada" use a biased 7-bit representation for an 
> Integer component with range 100..227. RM chapter 13 only recommends that 
> compilers should support unbiased representations.
>
> Which Ada compilers, other than GNAT, can use biased representations?

Ah, I missed that. I don't know of any other Ada compilers that support 
biased representations.

...
> Natasha's 24-bit-record code works only if the Ada implementation (a) 
> supports chapter 13 as recommended and (b) has a machine scalar of at 
> least 24 bits.

No, not true. Machine scalars only come into play if you need to use the 
non-native bit order, and that isn't needed here.

Once you have to work with the non-native bit order, things get messy. But 
that is rarely needed.

> We know that some Ada programs run on the 16-bit TI MSP430. Will the 
> 24-bit record code work there? Ok, MSP430 Ada programs are uncommon, but 
> perhaps we want to make them more common.

Sure, it will work there, so long as you don't try to mess with the 
non-native bit order.

> Natasha's code that uses (packed) arrays with Component_Sizes of 6 and 8 
> bits has more complex portability questions. Of course it also requires 
> support for chapter 13, but I think it needs more than that. It depends on 
> the indexing order and the word size. I don't find anything in chapter 13 
> or elsewhere in the RM that even requires the indexing order to be the 
> same for different array types, although this is of course very likely to 
> be the case.

Worrying about indexing order is just plain silly. As Bob points out, 
*nothing* is really required by a standard, you have to trust implementers 
to do reasonable things.

>> The problem occurs when you need to process big-endian data on a
>> little-endian machine (or vice versa). That is not that common of a 
>> need --
>
> The problem occurs when an application inputs or outputs binary data, and 
> the application should be coded in a portable way. I can't believe that 
> this is uncommon.

Only if the binary data is in the non-native bit order. If the binary data 
is in the native bit order, there is nothing that special that needs to be 
done. And it doesn't matter which bit order that is.

>> Most code does not care which bits are which bits in the word -- like 
>> this
>> encoding code -- they just need a consistent representation. That is, it
>> doesn't matter where bit 0 is, so long as it is the same in every item 
>> that
>> is processed.
>
> True for internal data, false for inputs and outputs.

Not really. The vast majority of inputs and outputs are in text form (as in 
the encoding example). Most of the rest are just a pure stream of bytes. And 
the interesting thing with a stream of bytes is that you get identical 
behavior for both bit orders -- because the bit order and byte order are 
tied together. The problem comes when you try to use the non-native bit and 
byte order on a machine.

For instance, in the encoding example, all of the messy manipulation occurs 
completely inside the application: the input is a stream of bytes and the 
output is a stream of characters -- neither are subject to any byte order 
issues themselves. That being the case, which bytes go where (internally) 
turns out to be irrelevant, because the native order will have the right 
effect either way.

It took me a long time to grasp this; I only figured it out after struggling 
for a long time with our old 68000 code generator (which is big-endian, of 
course). Once I got the code selection right, nothing much needed to be 
changed. I understand when others don't see this; it's easy to think too 
much in terms of the exact bits involved, but those matter less than it may 
seem.

                                                                 Randy.






* Re: Representation clauses for base-64 encoding
  2011-12-29  5:41         ` Randy Brukardt
@ 2011-12-29 10:10           ` Dmitry A. Kazakov
  0 siblings, 0 replies; 23+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-29 10:10 UTC (permalink / raw)


On Wed, 28 Dec 2011 23:41:57 -0600, Randy Brukardt wrote:

> "Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
> news:9m0433FjokU1@mid.individual.net...

>> True for internal data, false for inputs and outputs.
> 
> Not really. The vast majority of inputs and outputs are in text form (as in 
> the encoding example). Most of the rest are just a pure stream of bytes. And 
> the interesting thing with a stream of bytes is that you get identical 
> behavior for both bit orders -- because the bit order and byte order are 
> tied together. The problem comes when you try to use the non-native bit and 
> byte order on a machine.

No. It is true that the bit order within the octets is properly handled by
the hardware; what is wrong is that you could ignore it once the octets have
been read. Many octet-based protocols use bit fields that span several
octets, starting and ending anywhere within an octet. You could probably
ignore this by reordering the octets of the stream into the machine-native
order, if you knew how to, which the nice middle-endian representations
make doubtful.

Formally speaking a stream cannot be reordered at all, because it is
unbounded. You could reorder only octets of a packet, a word etc. But not
before calculating the check sums. The real-life protocols are difficult to
crack.
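
As an illustration of such a field, here is a minimal sketch that 
extracts it with explicit shifts, so that the result does not depend on 
the native bit or byte order. The frame layout is hypothetical (a 4-bit 
tag in the high bits of the first octet, followed by a 12-bit length 
that spans the rest of the first octet and all of the second), and the 
names Spanning_Field_Demo, Octet_Array and Length_Field are invented 
for the example.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Spanning_Field_Demo is

   type Octet_Array is array (Positive range <>) of Unsigned_8;

   --  Hypothetical layout, most significant bit first:
   --    octet 1 : tttt llll   (t = tag, l = high 4 bits of length)
   --    octet 2 : llll llll   (low 8 bits of length)
   function Length_Field (Frame : Octet_Array) return Unsigned_16 is
      High : constant Unsigned_16 :=
        Unsigned_16 (Frame (Frame'First) and 16#0F#);
      Low  : constant Unsigned_16 :=
        Unsigned_16 (Frame (Frame'First + 1));
   begin
      return Shift_Left (High, 8) or Low;
   end Length_Field;

   Frame : constant Octet_Array := (16#AB#, 16#CD#);

begin
   --  Tag is 16#A#; the length field is 16#BCD#.
   if Length_Field (Frame) /= 16#BCD# then
      raise Program_Error with "unexpected field value";
   end if;
   Ada.Text_IO.Put_Line (Unsigned_16'Image (Length_Field (Frame)));
end Spanning_Field_Demo;
```

The octets are consumed in stream order, so nothing has to be 
reordered before a checksum is computed over them.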

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Representation clauses for base-64 encoding
  2011-12-29  0:50                     ` Robert A Duff
@ 2011-12-30 20:54                       ` anon
  2011-12-30 20:56                       ` Niklas Holsti
  1 sibling, 0 replies; 23+ messages in thread
From: anon @ 2011-12-30 20:54 UTC (permalink / raw)


The current Ada Standard has too many options, and that is what is called 
"The Killing and the Death" of any language. There should be only one 
option for the Ada Standard: either you support all features or no features, 
not even the language itself.  

The Ada Standard might, though, include a sub-class for those compilers 
that do not follow the complete standard; such a vendor must use the word 
"Sub-Ada" when referring to its compiler and libraries. All impractical 
sections must then be documented in detail and approved by a non-ARG 
sub-committee appointed by the ARG. But for most Ada systems that is too 
much work.

An example of a "Sub-Ada" could be an Ada implemented on a JVM. 
Some might say the "Machine_Code" package is impractical there, because 
on these systems there are two assembly languages: the first is the 
"J-Code" used by the JVM and the second is the hardware CPU assembly, 
which in most cases is unknown to the JVM. Even though Ada does not know 
the hardware, the Ada system should know the "J-Code" for the Java version 
it was written for. 

But the "System.RPC" package is another story: because no Java machine 
supports the RPC sub-system, the JVM version would have to be approved 
by the Ada sub-committee or no JVM Ada would exist, unless the RPC 
sub-system is fully emulated for a JVM in Ada software.


In <wcclipwcian.fsf@shell01.TheWorld.com>, Robert A Duff <bobduff@shell01.TheWorld.com> writes:
>Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>
>> Yes, a compiler cannot claim to support annex C (Systems Programming)
>> unless it implements chapter 13 as recommended, so that all the
>> "shoulds" are implemented. But this is only an argument for "probable"
>> portability, since supporting annex C is optional.
>
>Right.
>
>But of course supporting the Ada standard is optional, too.  ;-)
>It's easy to forget that standards don't actually _require_ anybody
>to do anything.  So, unfortunately, the best you can be sure of is
>"probable" portability.
>
>> Out of curiosity, what is the case for JGNAT?
>
>I don't know much about JGNAT.  I think it doesn't support some
>things that are "impossible or impractical" (see AARM-1.1.3(6)),
>given the limitations of the JVM.
>
>- Bob





* Re: Representation clauses for base-64 encoding
  2011-12-29  0:50                     ` Robert A Duff
  2011-12-30 20:54                       ` anon
@ 2011-12-30 20:56                       ` Niklas Holsti
  1 sibling, 0 replies; 23+ messages in thread
From: Niklas Holsti @ 2011-12-30 20:56 UTC (permalink / raw)


On 11-12-29 02:50 , Robert A Duff wrote:
> Niklas Holsti<niklas.holsti@tidorum.invalid>  writes:
>
>> Yes, a compiler cannot claim to support annex C (Systems Programming)
>> unless it implements chapter 13 as recommended, so that all the
>> "shoulds" are implemented. But this is only an argument for "probable"
>> portability, since supporting annex C is optional.
>
> Right.
>
> But of course supporting the Ada standard is optional, too.  ;-)

Yes, but we are talking about Ada programming, which to me means using 
an Ada compiler that follows the standard. For me, the issue is what 
level of portability the standard provides; the actual current 
implementations are secondary.

> It's easy to forget that standards don't actually _require_ anybody
> to do anything.  So, unfortunately, the best you can be sure of is
> "probable" portability.

I hope you would agree that Standard.Integer "certainly" has at least 16 
bits in a conforming Ada implementation, so that is one point of 
"certain" portability. The portability of representation clauses is less 
certain, since conformance is optional. In contrast, package Interfaces 
and its shift operations are in the core of the language (RM 1.1.2(5)), 
to which all implementations shall conform (RM 1.1.2(17)).

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .




end of thread, other threads:[~2011-12-30 20:56 UTC | newest]

Thread overview: 23+ messages
2011-12-22  9:41 Representation clauses for base-64 encoding Natasha Kerensikova
2011-12-22 11:20 ` Niklas Holsti
2011-12-23  1:30   ` Randy Brukardt
2011-12-26  8:33     ` Niklas Holsti
2011-12-28  0:09       ` Randy Brukardt
2011-12-22 11:37 ` Georg Bauhaus
2011-12-22 12:24   ` Niklas Holsti
2011-12-22 15:09     ` Georg Bauhaus
2011-12-22 16:00       ` Natasha Kerensikova
2011-12-22 22:18         ` Georg Bauhaus
2011-12-25 10:17           ` Niklas Holsti
2011-12-27 11:23             ` Georg Bauhaus
2011-12-27 19:37               ` Niklas Holsti
2011-12-27 20:49                 ` Robert A Duff
2011-12-27 23:47                   ` Niklas Holsti
2011-12-29  0:50                     ` Robert A Duff
2011-12-30 20:54                       ` anon
2011-12-30 20:56                       ` Niklas Holsti
2011-12-23  1:42     ` Randy Brukardt
2011-12-28  8:59       ` Niklas Holsti
2011-12-29  5:41         ` Randy Brukardt
2011-12-29 10:10           ` Dmitry A. Kazakov
2011-12-23  1:33 ` Randy Brukardt
