Re: Representation clauses for base-64 encoding

comp.lang.ada

From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Subject: Re: Representation clauses for base-64 encoding
Date: Thu, 22 Dec 2011 13:20:19 +0200
Message-ID: <9lgi3jFhaU1@mid.individual.net>
In-Reply-To: <slrnjf5uol.1lme.lithiumcat@sigil.instinctive.eu>

On 11-12-22 11:41 , Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).
>
> My first thoughts were along the following lines:
>
>     type Octet is mod 256;
>        --  or Character or Storage_Element or Stream_Element
>        --  or whatever 8-bit type is relevant for the application
>
>     for Octet'Size use 8;
>     for Octet'Component_Size use 8;
>     for Octet'Bit_Order use System.Low_Order_First;

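The compiler should reject that Bit_Order clause, because Octet is not a 
record type (RM 13.5.3(4)). The Component_Size clause should be rejected 
as well, since Component_Size can be specified only for array types.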

What did you want to achieve with that clause?

>
>     type Base_64_Digit is mod 64;
>
>     for Base_64_Digit'Size use 6;
>     for Base_64_Digit'Component_Size use 6;
>     for Base_64_Digit'Bit_Order use System.Low_Order_First;

Same comment and question as above, for Octet.

>
>     type Octet_Block is array (1 .. 3) of Octet;
>     pragma Pack (Octet_Block);

I would add the following, to check that packing is effective:

       for Octet_Block'Size use 24;

>
>     type Base_64_Block is array (1 .. 4) of Base_64_Digit;
>     pragma Pack (Base_64_Block);

Same comment as for Octet_Block.

>     function Split_Base_64 is new Ada.Unchecked_Conversion
>       (Source =>  Octet_Block, Target =>  Base_64_Block);
>
>     function Merge_Base_64 is new Ada.Unchecked_Conversion
>       (Source =>  Base_64_Block, Target =>  Octet_Block);
>
>
> However, if I understand 13.3(73) correctly, conforming compilers don't
> have to support such arrays (unless 6 and 8 are both factors or multiples
> of the word size, but I guess there are not many 2-bit or 24-bit platforms
> around).

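Right. Each of 6 and 8 must be a factor or a multiple of the word size, 
which holds only for 1- or 2-bit words or for words whose size is a 
multiple of 24 bits.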
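For concreteness, here is a sketch of the array-based declarations with the Bit_Order clauses (and the Component_Size clauses on the scalar types, which are not legal there either) dropped, and the Size clauses suggested above added. It is an untested assumption that a given compiler accepts the 6-bit packing at all; RM 13.3(73) does not require it, and a compiler that cannot pack tightly is expected to reject the Size clause rather than silently leave gaps. The package name Base_64_Types is hypothetical.

```ada
with Ada.Unchecked_Conversion;

--  Sketch only: Base_64_Types is a hypothetical name; the types follow the
--  quoted post, minus the Bit_Order and scalar Component_Size clauses.
package Base_64_Types is

   type Octet is mod 256;
   for Octet'Size use 8;

   type Base_64_Digit is mod 64;
   for Base_64_Digit'Size use 6;

   type Octet_Block is array (1 .. 3) of Octet;
   pragma Pack (Octet_Block);
   for Octet_Block'Size use 24;     --  rejected if packing is not effective

   type Base_64_Block is array (1 .. 4) of Base_64_Digit;
   pragma Pack (Base_64_Block);
   for Base_64_Block'Size use 24;   --  likewise; support is not required

   --  Which digit receives which bits still depends on the compiler's
   --  layout and is not guaranteed to match RFC 1421's bit order.
   function Split_Base_64 is new Ada.Unchecked_Conversion
     (Source => Octet_Block, Target => Base_64_Block);

   function Merge_Base_64 is new Ada.Unchecked_Conversion
     (Source => Base_64_Block, Target => Octet_Block);

end Base_64_Types;
```

The Size clauses add nothing at run time; their only job is to make the compiler complain if the packing is not what the conversions assume.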

>
> It seems a more portable but uglier way of doing it is using records
> instead of arrays:
>
>     type Octet_Block is record
>        P, Q, R : Octet;
>     end record;

Here you might want to specify Octet_Block'Bit_Order.

>
>     for Octet_Block use record
>        P at 0 range 0 .. 7;
>        Q at 0 range 8 .. 15;
>        R at 0 range 16 .. 23;
>     end record;
>
>     type Base_64_Block is record
>        A, B, C, D : Base_64_Digit;
>     end record;

Ditto Base_64_Block'Bit_Order.

>
>     for Base_64_Block use record
>        A at 0 range 0 .. 5;
>        B at 0 range 6 .. 11;
>        C at 0 range 12 .. 17;
>        D at 0 range 18 .. 23;
>     end record;
>
> Though I guess it might not work so well on 16-bit platforms.

Maybe. It depends on the default bit-ordering and on the size of the 
"largest machine scalar", whatever that is -- that depends on what the 
compiler writer considers "convenient and efficient" (RM 13.3(8.1/2)).

>
> So is there a better way of doing it?

Do you expect that all the octet-strings to be encoded have a number of 
octets that is a multiple of 3, and conversely that all the base-64 
strings to be decoded have a length that is a multiple of 4? If not, I 
think that using 24-bit encoding/decoding buffers as in your example can 
be cumbersome, in addition to the portability problems.

An alternative is to make the array types Octet_Block and Base_64_Block 
long enough to hold the longest possible input/output strings (but still 
be definite), specify their Component_Sizes as 8 and 6 bits (hoping that 
the compiler accepts this), and apply Unchecked_Conversions on the 
entire arrays. But I would be afraid of problems at the ends of strings 
that only partially fill the last word.

For these reasons, I would definitely choose a shifting method.

I would use an Interfaces.Unsigned_16 or _32 as a buffer that contains 
some number of bits in its least-significant end. Initially the buffer 
is empty (zero bits) and cleared to all bits zero.

To encode a string of 8-bit groups into a string of 6-bit groups 
(omitting the "padding" digits that base-64 sometimes requires):

    while (there are 8-bit groups left) loop
       -- Invariant: buffer contains fewer than 6 bits.

       insert the next 8-bit group into the buffer by
       left-shifting the buffer by 8 positions and
       or'ing in the next 8-bit group;

       while (the buffer contains at least 6 bits) loop
          output the 6 most significant of the buffered bits
          and remove them from the buffer;
       end loop;

    end loop;

    output any bits (0, 2 or 4) left over in the buffer, left-shifted
    and padded with zero bits to form one last 6-bit group;
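The loop above can be sketched in Ada 2012 roughly as follows. This is a minimal illustration, not a tested library: the function name Encode_Base_64 is hypothetical, each Character of the input is assumed to be one octet, and, unlike the pseudocode, it also appends the '=' padding that RFC 1421 requires, so that the output length is a multiple of 4. An Unsigned_16 buffer is enough because it never holds more than 5 + 8 = 13 bits.

```ada
with Interfaces; use Interfaces;

--  Hypothetical encoder: each Character of Data is taken as one octet.
function Encode_Base_64 (Data : String) return String is
   Alphabet : constant String (1 .. 64) :=
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

   Result : String (1 .. 4 * ((Data'Length + 2) / 3));
   Last   : Natural := 0;        --  index of the last digit written
   Buffer : Unsigned_16 := 0;    --  pending bits, in the least-significant end
   Count  : Natural := 0;        --  number of pending bits in Buffer

   --  Appends the base-64 character for a 6-bit value.
   procedure Put (Digit : Unsigned_16) is
   begin
      Last := Last + 1;
      Result (Last) := Alphabet (Integer (Digit) + 1);
   end Put;

begin
   for C of Data loop
      --  Invariant: Count < 6, so Buffer never needs more than 13 bits.
      Buffer := Shift_Left (Buffer, 8) or Unsigned_16 (Character'Pos (C));
      Count  := Count + 8;

      while Count >= 6 loop
         Count := Count - 6;
         Put (Shift_Right (Buffer, Count) and 2#11_1111#);
      end loop;

      --  Remove the emitted bits, keeping only the Count pending ones.
      Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Count) - 1);
   end loop;

   --  0, 2 or 4 bits may be left over; pad them with zeros on the right.
   if Count > 0 then
      Put (Shift_Left (Buffer, 6 - Count) and 2#11_1111#);
   end if;

   --  Fill the rest of Result with '=' padding.
   Result (Last + 1 .. Result'Last) := (others => '=');
   return Result;
end Encode_Base_64;
```

For example, Encode_Base_64 ("Man") should return "TWFu", and Encode_Base_64 ("Ma") should return "TWE=". Masking Buffer after each octet is what keeps the 16-bit buffer from overflowing, independently of the word size of the target.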

Similar code can be used to decode a string of 6-bit groups into a 
string of 8-bit groups.
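A matching decoder can be sketched the same way. As before, the name Decode_Base_64 is hypothetical and each output Character stands for one octet. It also assumes well-formed input: only base-64 digits (no line breaks), optionally followed by '=' padding. The 2 or 4 bits that may remain in the buffer at the end are the encoder's zero padding and are simply discarded.

```ada
with Interfaces; use Interfaces;

--  Hypothetical decoder: assumes Text contains only base-64 digits,
--  optionally followed by '=' padding.
function Decode_Base_64 (Text : String) return String is
   Result : String (1 .. 3 * ((Text'Length + 3) / 4));
   Last   : Natural := 0;        --  index of the last octet written
   Buffer : Unsigned_16 := 0;    --  pending bits, in the least-significant end
   Count  : Natural := 0;        --  number of pending bits in Buffer

   --  Maps a base-64 digit character to its 6-bit value.
   function Value (C : Character) return Unsigned_16 is
      Pos : constant Natural := Character'Pos (C);
   begin
      case C is
         when 'A' .. 'Z' => return Unsigned_16 (Pos - Character'Pos ('A'));
         when 'a' .. 'z' => return Unsigned_16 (Pos - Character'Pos ('a') + 26);
         when '0' .. '9' => return Unsigned_16 (Pos - Character'Pos ('0') + 52);
         when '+'        => return 62;
         when '/'        => return 63;
         when others     => raise Constraint_Error with "not a base-64 digit";
      end case;
   end Value;

begin
   for C of Text loop
      exit when C = '=';            --  padding marks the end of the data
      --  Invariant: Count < 8, so Buffer never needs more than 13 bits.
      Buffer := Shift_Left (Buffer, 6) or Value (C);
      Count  := Count + 6;

      if Count >= 8 then
         Count := Count - 8;
         Last  := Last + 1;
         Result (Last) :=
           Character'Val (Integer (Shift_Right (Buffer, Count) and 16#FF#));
         Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Count) - 1);
      end if;
   end loop;

   --  Any 2 or 4 bits still pending are the encoder's zero padding.
   return Result (1 .. Last);
end Decode_Base_64;
```

Result is sized for the worst case and then sliced, so unpadded input such as "TWE" also decodes (to "Ma").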

> Is it acceptable to handle
> portability with different bodies for a spec that only contains the
> Split_Base_64 and Merge_Base_64 functions?

I would accept it, but I still consider the shifting method better and 
safer.
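If the block-oriented interface is kept, the two functions can themselves be written with shifts, so that one portable body serves every platform and no representation clauses are needed. A sketch follows, with the hypothetical package name Base_64_Blocks and the type names from the quoted post. The 24 bits are assembled most-significant octet first, which is the grouping RFC 1421 uses.

```ada
with Interfaces; use Interfaces;

--  Hypothetical package: same types as the quoted post, but with no
--  representation clauses; the bodies use shifts instead.
package Base_64_Blocks is
   type Octet is mod 256;
   type Base_64_Digit is mod 64;
   type Octet_Block is array (1 .. 3) of Octet;
   type Base_64_Block is array (1 .. 4) of Base_64_Digit;

   function Split_Base_64 (Block : Octet_Block) return Base_64_Block;
   function Merge_Base_64 (Block : Base_64_Block) return Octet_Block;
end Base_64_Blocks;

package body Base_64_Blocks is

   function Split_Base_64 (Block : Octet_Block) return Base_64_Block is
      --  The 24 bits, first octet in the most significant position.
      Bits : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Block (1)), 16)
        or Shift_Left (Unsigned_32 (Block (2)), 8)
        or Unsigned_32 (Block (3));
   begin
      return (Base_64_Digit (Shift_Right (Bits, 18) and 63),
              Base_64_Digit (Shift_Right (Bits, 12) and 63),
              Base_64_Digit (Shift_Right (Bits, 6) and 63),
              Base_64_Digit (Bits and 63));
   end Split_Base_64;

   function Merge_Base_64 (Block : Base_64_Block) return Octet_Block is
      --  The 24 bits, first digit in the most significant position.
      Bits : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Block (1)), 18)
        or Shift_Left (Unsigned_32 (Block (2)), 12)
        or Shift_Left (Unsigned_32 (Block (3)), 6)
        or Unsigned_32 (Block (4));
   begin
      return (Octet (Shift_Right (Bits, 16) and 16#FF#),
              Octet (Shift_Right (Bits, 8) and 16#FF#),
              Octet (Bits and 16#FF#));
   end Merge_Base_64;

end Base_64_Blocks;
```

With this body, Split_Base_64 ((77, 97, 110)), the octets of "Man", should give the digits (19, 22, 5, 46), that is "TWFu", and Merge_Base_64 reverses it. The cost is that every block goes through an integer, but the result no longer depends on bit ordering or machine scalars.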

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Thread overview: 23+ messages
2011-12-22  9:41 Representation clauses for base-64 encoding Natasha Kerensikova
2011-12-22 11:20 ` Niklas Holsti [this message]
2011-12-23  1:30   ` Randy Brukardt
2011-12-26  8:33     ` Niklas Holsti
2011-12-28  0:09       ` Randy Brukardt
2011-12-22 11:37 ` Georg Bauhaus
2011-12-22 12:24   ` Niklas Holsti
2011-12-22 15:09     ` Georg Bauhaus
2011-12-22 16:00       ` Natasha Kerensikova
2011-12-22 22:18         ` Georg Bauhaus
2011-12-25 10:17           ` Niklas Holsti
2011-12-27 11:23             ` Georg Bauhaus
2011-12-27 19:37               ` Niklas Holsti
2011-12-27 20:49                 ` Robert A Duff
2011-12-27 23:47                   ` Niklas Holsti
2011-12-29  0:50                     ` Robert A Duff
2011-12-30 20:54                       ` anon
2011-12-30 20:56                       ` Niklas Holsti
2011-12-23  1:42     ` Randy Brukardt
2011-12-28  8:59       ` Niklas Holsti
2011-12-29  5:41         ` Randy Brukardt
2011-12-29 10:10           ` Dmitry A. Kazakov
2011-12-23  1:33 ` Randy Brukardt
