From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Subject: Re: Representation clauses for base-64 encoding
Date: Thu, 22 Dec 2011 13:20:19 +0200
Message-ID: <9lgi3jFhaU1@mid.individual.net>
In-Reply-To: <slrnjf5uol.1lme.lithiumcat@sigil.instinctive.eu>
On 11-12-22 11:41 , Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).
>
> My first thoughts were along the following lines:
>
> type Octet is mod 256;
> -- or Character or Storage_Element or Stream_Element
> -- or whatever 8-bit type relevant for the application
>
> for Octet'Size use 8;
> for Octet'Component_Size use 8;
> for Octet'Bit_Order use System.Low_Order_First;
The compiler should reject that Bit_Order clause, because Octet is not a
record type (RM 13.5.3(4)).
What did you want to achieve with that clause?
>
> type Base_64_Digit is mod 64;
>
> for Base_64_Digit'Size use 6;
> for Base_64_Digit'Component_Size use 6;
> for Base_64_Digit'Bit_Order use System.Low_Order_First;
Same comment and question as above, for Octet.
>
> type Octet_Block is array (1 .. 3) of Octet;
> pragma Pack (Octet_Block);
I would add the following, to check that packing is effective:
for Octet_Block'Size use 24;
>
> type Base_64_Block is array (1 .. 4) of Base_64_Digit;
> pragma Pack (Base_64_Block);
Same comment as for Octet_Block.
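
As a hedged illustration of the Size check suggested above, here is a minimal sketch with both array types. The procedure name Packing_Check is hypothetical, and the misplaced Component_Size and Bit_Order clauses on the scalar types are left out. A compiler that cannot pack the 6-bit components tightly would be expected to reject the Size clause at compile time, rather than silently choosing a larger layout.

```ada
with Ada.Text_IO;

procedure Packing_Check is

   type Octet is mod 2**8;
   for Octet'Size use 8;

   type Base_64_Digit is mod 2**6;
   for Base_64_Digit'Size use 6;

   type Octet_Block is array (1 .. 3) of Octet;
   pragma Pack (Octet_Block);
   --  Rejected at compile time if the array is not packed into 24 bits.
   for Octet_Block'Size use 24;

   type Base_64_Block is array (1 .. 4) of Base_64_Digit;
   pragma Pack (Base_64_Block);
   --  A compiler is not required to support 6-bit components, in which
   --  case this clause should make the compilation fail.
   for Base_64_Block'Size use 24;

begin
   --  Print the sizes that the compiler actually chose.
   Ada.Text_IO.Put_Line (Integer'Image (Octet_Block'Size));
   Ada.Text_IO.Put_Line (Integer'Image (Base_64_Block'Size));
end Packing_Check;
```

If this compiles, both types occupy 24 bits, but that says nothing yet about which bits of the octets end up in which digit.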
> function Split_Base_64 is new Ada.Unchecked_Conversion
> (Source => Octet_Block, Target => Base_64_Block);
>
> function Merge_Base_64 is new Ada.Unchecked_Conversion
> (Source => Base_64_Block, Target => Octet_Block);
>
>
> However, if I understand 13.3(73) correctly, conforming compilers don't
> have to support such arrays (unless 6 and 8 are both factor or multiple
> of word size, but I guess there are not many 2-bit or 24-bit platforms
> around).
Right (I assume you meant *12*-bit or 24-bit).
>
> It seems a more portable but uglier way of doing it is using records
> instead of arrays:
>
> type Octet_Block is record
> P, Q, R : Octet;
> end record;
Here you might want to specify Octet_Block'Bit_Order.
>
> for Octet_Block use record
> P at 0 range 0 .. 7;
> Q at 0 range 8 .. 15;
> R at 0 range 16 .. 23;
> end record;
>
> type Base_64_Block is record
> A, B, C, D : Base_64_Digit;
> end record;
Ditto Base_64_Block'Bit_Order.
>
> for Base_64_Block use record
> A at 0 range 0 .. 5;
> B at 0 range 6 .. 11;
> C at 0 range 12 .. 17;
> D at 0 range 18 .. 23;
> end record;
>
> Though I guess it might not work so well in 16-bit platforms.
Maybe. It depends on the default bit-ordering and on the size of the
"largest machine scalar", whatever that is -- that depends on what the
compiler writer considers "convenient and efficient" (RM 13.3(8.1/2)).
>
> So is there a better way of doing it?
Do you expect that all the octet-strings to be encoded have a number of
octets that is a multiple of 3, and conversely that all the base-64
strings to be decoded have a length that is a multiple of 4? If not, I
think that using 24-bit encoding/decoding buffers as in your example can
be cumbersome, in addition to the portability problems.

An alternative is to make the array types Octet_Block and Base_64_Block
long enough to hold the longest possible input/output strings (but still
be definite), specify their Component_Sizes as 8 and 6 bits (hoping that
the compiler accepts this), and apply Unchecked_Conversions on the
entire arrays. But I would be afraid of problems at the ends of strings
that only partially fill the last word.

For these reasons, I would definitely choose a shifting method.

I would use an Interfaces.Unsigned_16 or _32 as a buffer that contains
some number of bits in its least-significant end. Initially the buffer
is empty (zero bits) and cleared to all bits zero.

To encode a string of 8-bit groups into a string of 6-bit groups
(omitting the "padding" digits that base-64 sometimes requires):
   while (there are 8-bit groups left) loop
      -- Invariant: buffer contains fewer than 6 bits.
      insert the next 8-bit group into the buffer by
         left-shifting the buffer by 8 positions and
         or'ing in the next 8-bit group;
      while (the buffer contains at least 6 bits) loop
         output the most significant 6 bits of the buffer
            and remove them from the buffer;
      end loop;
   end loop;
   output any bits (at most 5) left over in the buffer;
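
The loop above could be sketched in Ada roughly as follows. This is only an illustration under some assumptions: the names Encode_Demo, Octet_Array, Digit_Array and Encode are made up for the example, the output is the sequence of 6-bit values (no mapping to the base-64 alphabet and no "=" padding), and a final partial group is left-aligned and zero-filled.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Encode_Demo is

   type Octet is mod 2**8;
   type Base_64_Digit is mod 2**6;

   type Octet_Array is array (Positive range <>) of Octet;
   type Digit_Array is array (Positive range <>) of Base_64_Digit;

   --  Split Input into 6-bit groups, most significant bits first.
   function Encode (Input : Octet_Array) return Digit_Array is
      Buffer : Unsigned_16 := 0;  --  pending bits, in the low-order end
      Count  : Natural := 0;      --  number of valid bits in Buffer
      Result : Digit_Array (1 .. (Input'Length * 8 + 5) / 6);
      Last   : Natural := 0;      --  index of the last digit written
   begin
      for I in Input'Range loop
         --  Invariant: Count < 6, so Count + 8 <= 13 bits fit in 16.
         Buffer := Shift_Left (Buffer, 8) or Unsigned_16 (Input (I));
         Count  := Count + 8;
         while Count >= 6 loop
            --  Output the 6 most significant valid bits ...
            Count := Count - 6;
            Last  := Last + 1;
            Result (Last) :=
              Base_64_Digit (Shift_Right (Buffer, Count) and 63);
            --  ... and remove them from the buffer.
            Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Count) - 1);
         end loop;
      end loop;
      if Count > 0 then
         --  Left-align the leftover bits in a final digit.
         Last := Last + 1;
         Result (Last) :=
           Base_64_Digit (Shift_Left (Buffer, 6 - Count) and 63);
      end if;
      return Result;
   end Encode;

   --  The octets of "Man"; the expected digits are 19, 22, 5, 46 ("TWFu").
   Encoded : constant Digit_Array := Encode ((77, 97, 110));

begin
   for I in Encoded'Range loop
      Ada.Text_IO.Put (Base_64_Digit'Image (Encoded (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Encode_Demo;
```

A 16-bit buffer is enough here because at most 5 bits are pending when the next octet is shifted in.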
Similar code can be used to decode a string of 6-bit groups into a
string of 8-bit groups.
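
For the decoding direction, a similarly hedged sketch follows. The names Decode_Demo, Octet_Array, Digit_Array and Decode are again hypothetical. The sketch assumes that the input has already been mapped from characters to 6-bit values with any padding removed, and it simply discards leftover bits that do not fill a whole octet.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Decode_Demo is

   type Octet is mod 2**8;
   type Base_64_Digit is mod 2**6;

   type Octet_Array is array (Positive range <>) of Octet;
   type Digit_Array is array (Positive range <>) of Base_64_Digit;

   --  Merge 6-bit groups into octets, most significant bits first.
   function Decode (Input : Digit_Array) return Octet_Array is
      Buffer : Unsigned_16 := 0;  --  pending bits, in the low-order end
      Count  : Natural := 0;      --  number of valid bits in Buffer
      Result : Octet_Array (1 .. Input'Length * 6 / 8);
      Last   : Natural := 0;      --  index of the last octet written
   begin
      for I in Input'Range loop
         --  Invariant: Count < 8, so Count + 6 <= 13 bits fit in 16.
         Buffer := Shift_Left (Buffer, 6) or Unsigned_16 (Input (I));
         Count  := Count + 6;
         if Count >= 8 then
            --  Output the 8 most significant valid bits ...
            Count := Count - 8;
            Last  := Last + 1;
            Result (Last) := Octet (Shift_Right (Buffer, Count) and 255);
            --  ... and remove them from the buffer.
            Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Count) - 1);
         end if;
      end loop;
      --  Any bits still in Buffer are filler and are dropped.
      return Result;
   end Decode;

   --  The digits of "TWFu"; the expected octets are 77, 97, 110 ("Man").
   Decoded : constant Octet_Array := Decode ((19, 22, 5, 46));

begin
   for I in Decoded'Range loop
      Ada.Text_IO.Put (Octet'Image (Decoded (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Decode_Demo;
```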
> Is it acceptable to handle
> portability with different bodies for a spec that only contains the
> Split_Base_64 and Merge_Base_64 functions?
I would accept it, but I still consider the shifting method better and
safer.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .