* Representation clauses for base-64 encoding
@ 2011-12-22 9:41 Natasha Kerensikova
2011-12-22 11:20 ` Niklas Holsti
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Natasha Kerensikova @ 2011-12-22 9:41 UTC (permalink / raw)
Hello,
the recent discussion about representation clauses vs explicit shifting
made me wonder about what is the Right Way of performing base-64
encoding (rfc 1421).
My first thoughts were along the following lines:
type Octet is mod 256;
-- or Character or Storage_Element or Stream_Element
-- or whatever 8-bit type relevant for the application
for Octet'Size use 8;
for Octet'Component_Size use 8;
for Octet'Bit_Order use System.Low_Order_First;
type Base_64_Digit is mod 64;
for Base_64_Digit'Size use 6;
for Base_64_Digit'Component_Size use 6;
for Base_64_Digit'Bit_Order use System.Low_Order_First;
type Octet_Block is array (1 .. 3) of Octet;
pragma Pack (Octet_Block);
type Base_64_Block is array (1 .. 4) of Base_64_Digit;
pragma Pack (Base_64_Block);
function Split_Base_64 is new Ada.Unchecked_Conversion
(Source => Octet_Block, Target => Base_64_Block);
function Merge_Base_64 is new Ada.Unchecked_Conversion
(Source => Base_64_Block, Target => Octet_Block);
However, if I understand 13.3(73) correctly, conforming compilers don't
have to support such arrays (unless 6 and 8 are both factors or multiples
of the word size, but I guess there are not many 2-bit or 24-bit platforms
around).
It seems a more portable but uglier way of doing it is to use records
instead of arrays:
type Octet_Block is record
P, Q, R : Octet;
end record;
for Octet_Block use record
P at 0 range 0 .. 7;
Q at 0 range 8 .. 15;
R at 0 range 16 .. 23;
end record;
type Base_64_Block is record
A, B, C, D : Base_64_Digit;
end record;
for Base_64_Block use record
A at 0 range 0 .. 5;
B at 0 range 6 .. 11;
C at 0 range 12 .. 17;
D at 0 range 18 .. 23;
end record;
Though I guess it might not work so well on 16-bit platforms.
So is there a better way of doing it? Is it acceptable to handle
portability with different bodies for a spec that only contains the
Split_Base_64 and Merge_Base_64 functions?
Or are there some things I'm missing that make even that non-portable or
even incorrect?
Thanks in advance for sharing your wisdom,
Natasha
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Representation clauses for base-64 encoding
2011-12-22 9:41 Representation clauses for base-64 encoding Natasha Kerensikova
@ 2011-12-22 11:20 ` Niklas Holsti
2011-12-23 1:30 ` Randy Brukardt
2011-12-22 11:37 ` Georg Bauhaus
2011-12-23 1:33 ` Randy Brukardt
2 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-22 11:20 UTC (permalink / raw)
On 11-12-22 11:41 , Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).
>
> My first thoughts were along the following lines:
>
> type Octet is mod 256;
> -- or Character or Storage_Element or Stream_Element
> -- or whatever 8-bit type relevant for the application
>
> for Octet'Size use 8;
> for Octet'Component_Size use 8;
> for Octet'Bit_Order use System.Low_Order_First;
The compiler should reject that Bit_Order clause, because Octet is not a
record type (RM 13.5.3(4)).
What did you want to achieve with that clause?
>
> type Base_64_Digit is mod 64;
>
> for Base_64_Digit'Size use 6;
> for Base_64_Digit'Component_Size use 6;
> for Base_64_Digit'Bit_Order use System.Low_Order_First;
Same comment and question as above, for Octet.
>
> type Octet_Block is array (1 .. 3) of Octet;
> pragma Pack (Octet_Block);
I would add the following, to check that packing is effective:
for Octet_Block'Size use 24;
>
> type Base_64_Block is array (1 .. 4) of Base_64_Digit;
> pragma Pack (Base_64_Block);
Same comment as for Octet_Block.
> function Split_Base_64 is new Ada.Unchecked_Conversion
> (Source => Octet_Block, Target => Base_64_Block);
>
> function Merge_Base_64 is new Ada.Unchecked_Conversion
> (Source => Base_64_Block, Target => Octet_Block);
>
>
> However, if I understand 13.3(73) correctly, conforming compilers don't
> have to support such arrays (unless 6 and 8 are both factors or multiples
> of the word size, but I guess there are not many 2-bit or 24-bit platforms
> around).
Right (I assume you meant *12*-bit or 24-bit).
>
> It seems a more portable but uglier way of doing it is to use records
> instead of arrays:
>
> type Octet_Block is record
> P, Q, R : Octet;
> end record;
Here you might want to specify Octet_Block'Bit_Order.
>
> for Octet_Block use record
> P at 0 range 0 .. 7;
> Q at 0 range 8 .. 15;
> R at 0 range 16 .. 23;
> end record;
>
> type Base_64_Block is record
> A, B, C, D : Base_64_Digit;
> end record;
Ditto Base_64_Block'Bit_Order.
>
> for Base_64_Block use record
> A at 0 range 0 .. 5;
> B at 0 range 6 .. 11;
> C at 0 range 12 .. 17;
> D at 0 range 18 .. 23;
> end record;
>
> Though I guess it might not work so well in 16-bit platforms.
Maybe. It depends on the default bit-ordering and on the size of the
"largest machine scalar", whatever that is -- that depends on what the
compiler writer considers "convenient and efficient" (RM 13.3(8.1/2)).
>
> So is there a better way of doing it?
Do you expect that all the octet-strings to be encoded have a number of
octets that is a multiple of 3, and conversely that all the base-64
strings to be decoded have a length that is a multiple of 4? If not, I
think that using 24-bit encoding/decoding buffers as in your example can
be cumbersome, in addition to the portability problems.
An alternative is to make the array types Octet_Block and Base_64_Block
long enough to hold the longest possible input/output strings (but still
be definite), specify their Component_Sizes as 8 and 6 bits (hoping that
the compiler accepts this), and apply Unchecked_Conversions on the
entire arrays. But I would be afraid of problems at the ends of strings
that only partially fill the last word.
For these reasons, I would definitely choose a shifting method.
I would use an Interfaces.Unsigned_16 or _32 as a buffer that contains
some number of bits in its least-significant end. Initially the buffer
is empty (zero bits) and cleared to all bits zero.
To encode a string of 8-bit groups into a string of 6-bit groups
(omitting the "padding" digits that base-64 sometimes requires):
while (there are 8-bit groups left) loop
   -- Invariant: buffer contains fewer than 6 bits.
   insert the next 8-bit group into the buffer by
   left-shifting the buffer by 8 positions and
   or'ing in the next 8-bit group;
   while (the buffer contains at least 6 bits) loop
      output the most significant 6 bits of the buffer
      and remove them from the buffer;
   end loop;
end loop;
output any bits (at most 5) left over in the buffer;
Similar code can be used to decode a string of 6-bit groups into a
string of 8-bit groups.
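For illustration only, the loop above might be sketched in Ada roughly as follows. The package name Base_64_Stream, the array types and the function Encode are invented for this sketch. Zero-filling the final partial group on the right is an assumption taken from the usual base-64 convention; the '=' padding and the mapping from digits to characters are left out.

```ada
package Base_64_Stream is
   --  Hypothetical names, chosen for this sketch only.
   type Octet is mod 2**8;
   type Base_64_Digit is mod 2**6;
   type Octet_Array is array (Positive range <>) of Octet;
   type Digit_Array is array (Positive range <>) of Base_64_Digit;

   --  Slices a sequence of 8-bit groups into 6-bit groups,
   --  most significant bits first.
   function Encode (Input : Octet_Array) return Digit_Array;
end Base_64_Stream;

with Interfaces; use Interfaces;

package body Base_64_Stream is

   function Encode (Input : Octet_Array) return Digit_Array is
      --  ceiling (8 * n / 6) digits are produced for n octets
      Result : Digit_Array (1 .. (Input'Length * 8 + 5) / 6);
      Last   : Natural     := 0;
      Buffer : Unsigned_16 := 0;  --  pending bits, least-significant end
      Count  : Natural     := 0;  --  number of valid bits in Buffer
   begin
      for I in Input'Range loop
         --  Invariant: Count < 6 here, so 16 bits are enough.
         Buffer := Shift_Left (Buffer, 8) or Unsigned_16 (Input (I));
         Count  := Count + 8;

         while Count >= 6 loop
            Count := Count - 6;
            Last  := Last + 1;
            --  Output the 6 most significant pending bits ...
            Result (Last) :=
              Base_64_Digit (Shift_Right (Buffer, Count) and 63);
            --  ... and remove them from the buffer.
            Buffer := Buffer and (Shift_Left (Unsigned_16'(1), Count) - 1);
         end loop;
      end loop;

      if Count > 0 then
         --  Assumption: left-align the leftover bits, zero-filled.
         Last := Last + 1;
         Result (Last) :=
           Base_64_Digit (Shift_Left (Buffer, 6 - Count) and 63);
      end if;

      return Result;
   end Encode;

end Base_64_Stream;
```

With this sketch, the three octets 77, 97, 110 ("Man") give the digits 19, 22, 5, 46, which correspond to "TWFu" in the base-64 alphabet.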
> Is it acceptable to handle
> portability with different bodies for a spec that only contains the
> Split_Base_64 and Merge_Base_64 functions?
I would accept it, but I still consider the shifting method better and
safer.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-22 9:41 Representation clauses for base-64 encoding Natasha Kerensikova
2011-12-22 11:20 ` Niklas Holsti
@ 2011-12-22 11:37 ` Georg Bauhaus
2011-12-22 12:24 ` Niklas Holsti
2011-12-23 1:33 ` Randy Brukardt
2 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 11:37 UTC (permalink / raw)
On 22.12.11 10:41, Natasha Kerensikova wrote:
> Hello,
>
> the recent discussion about representation clauses vs explicit shifting
> made me wonder about what is the Right Way of performing base-64
> encoding (rfc 1421).
> My first thoughts were along the following lines:
>
> type Octet is mod 256;
> -- or Character or Storage_Element or Stream_Element
> -- or whatever 8-bit type relevant for the application
>
> for Octet'Size use 8;
> for Octet'Component_Size use 8;
Here I would stop.
The RFC says that a value from the range 0 .. 63 is associated with
a character from a specific set of characters, for encoding it:
'A' .. 'Z', 'a' .. 'z', '0' .. '9', '+', '/'.
And there is a "pad", '='.
Since the characters shall stand for 0 .. 25, 26 .. 51, 52 .. 63,
this specifies a range, actually.
In Ada, the 1:1 translation into a type can be:
type Repertoire is (
'A','B','C','D','E','F','G','H','I','J','K','L','M',
'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
'a','b','c','d','e','f','g','h','i','j','k','l','m',
'n','o','p','q','r','s','t','u','v','w','x','y','z',
'0','1','2','3','4','5','6','7','8','9',
'+','/','=');
subtype Base_64_Character is Repertoire range 'A' .. '/';
subtype Padding is Repertoire range '=' .. '=';
Note that you could have string literals of these:
type Base_64_String is
array (Positive range <>) of Repertoire;
S : Base_64_String := "ABC="; -- but not "ABC!"
The language guarantees that each of the literals is associated
with just the positional number that Base 64 encoding requires.
Let the compiler choose the best representation for Repertoire
subtypes when encoding.
Or simply use subtypes of Character.
Only if you need some representation in memory or other storage
that has 'Size /= Character'Size, or 'Size /= Repertoire'Size
etc, derive new types as needed, and add representation clauses:
http://www.adacore.com/2008/03/03/gem-27/
http://www.adacore.com/2008/03/17/gem-28/
For streaming encoded text over the wire, a subtype of String should
serve the job just fine, convert as necessary. Or use Base_64_String.
Packing and unpacking can be quite expensive.
I remember at least two publicly available Base 64 encoding
packages, one by Tom Moran IIRC, and one in AWS. There are
probably more in the PAL.
-- Georg
* Re: Representation clauses for base-64 encoding
2011-12-22 11:37 ` Georg Bauhaus
@ 2011-12-22 12:24 ` Niklas Holsti
2011-12-22 15:09 ` Georg Bauhaus
2011-12-23 1:42 ` Randy Brukardt
0 siblings, 2 replies; 23+ messages in thread
From: Niklas Holsti @ 2011-12-22 12:24 UTC (permalink / raw)
On 11-12-22 13:37 , Georg Bauhaus wrote:
> On 22.12.11 10:41, Natasha Kerensikova wrote:
>> Hello,
>>
>> the recent discussion about representation clauses vs explicit shifting
>> made me wonder about what is the Right Way of performing base-64
>> encoding (rfc 1421).
>
>
>> My first thoughts were along the following lines:
>>
>> type Octet is mod 256;
>> -- or Character or Storage_Element or Stream_Element
>> -- or whatever 8-bit type relevant for the application
>>
>> for Octet'Size use 8;
>> for Octet'Component_Size use 8;
>
> Here I would stop.
>
> The RFC says that a value from the range 0 .. 63 is associated with
> a character from a specific set of characters, for encoding it:
> 'A' .. 'Z', 'a' .. 'z', '0' .. '9', '+', '/'.
> And there is a "pad", '='.
> Since the characters shall stand for 0 .. 25, 26 .. 51, 52 .. 63,
> this specifies a range, actually.
> In Ada, the 1:1 translation into a type can be:
>
> type Repertoire is (
> 'A','B','C','D','E','F','G','H','I','J','K','L','M',
> 'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
> 'a','b','c','d','e','f','g','h','i','j','k','l','m',
> 'n','o','p','q','r','s','t','u','v','w','x','y','z',
> '0','1','2','3','4','5','6','7','8','9',
> '+','/','=');
> subtype Base_64_Character is Repertoire range 'A' .. '/';
> subtype Padding is Repertoire range '=' .. '=';
This defines a nice Ada representation of the six-bit codes. But this
was not Natasha's question; the question was about the slicing of a
sequence of bits, composed from 8-bit groups, into a sequence of 6-bit
groups.
> Let the compiler choose the best representation for Repertoire
> subtypes when encoding.
The point of base-64 encoding is to emit the six-bit groups as ordinary
Characters (using whatever character encoding is standard, for example
Latin-1). The compiler's internal representation of Repertoire elements
is not suitable; the Repertoire literal 'A' should be emitted as the
Character 'A', not as six zero bits. Natasha did not mention that, of
course, since the focus was on the mapping between 8-bit and 6-bit
slices of the bit-string.
> Only if you need some representation in memory or other storage
> that has 'Size /= Character'Size, or 'Size /= Repertoire'Size
> etc, derive new types as needed, and add representation clauses:
That was the point, but the problem is the difficulty of making
representation clauses portable.
>
> http://www.adacore.com/2008/03/03/gem-27/
I am surprised and disappointed that there is *no* mention of
portability problems in that "gem". This is marketing hype for GNAT, not
sound programming advice.
> For streaming encoded text over the wire, a subtype of String should
> serve the job just fine, convert as necessary.
Huh? It is not possible to convert between Repertoire and Character,
except through Unchecked_Conversion, which does not work as desired in
this case. The best method would be an array indexed by Repertoire and
containing Character elements, mapping the Repertoire literal 'A' to the
Character 'A', etc.
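As a rough sketch of such a mapping (the package name Base_64_Alphabet, the function To_Character and the helper Build_Map are invented here; the Repertoire type is the 64-literal enumeration without the pad character), the table can be filled from a String holding the alphabet in positional order:

```ada
package Base_64_Alphabet is
   --  The 64 base-64 symbols as an enumeration; 'Pos gives the 6-bit value.
   type Repertoire is (
     'A','B','C','D','E','F','G','H','I','J','K','L','M',
     'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
     'a','b','c','d','e','f','g','h','i','j','k','l','m',
     'n','o','p','q','r','s','t','u','v','w','x','y','z',
     '0','1','2','3','4','5','6','7','8','9',
     '+','/');

   --  Maps the Repertoire literal 'A' to the Character 'A', and so on.
   function To_Character (Item : Repertoire) return Character;
end Base_64_Alphabet;

package body Base_64_Alphabet is

   Alphabet : constant String (1 .. 64) :=
     "ABCDEFGHIJKLMNOPQRSTUVWXYZ" &
     "abcdefghijklmnopqrstuvwxyz" &
     "0123456789+/";

   type Character_Map is array (Repertoire) of Character;

   --  Fills the table once, at elaboration of the package body.
   function Build_Map return Character_Map is
      Map : Character_Map;
   begin
      for R in Repertoire loop
         Map (R) := Alphabet (Repertoire'Pos (R) + 1);
      end loop;
      return Map;
   end Build_Map;

   Table : constant Character_Map := Build_Map;

   function To_Character (Item : Repertoire) return Character is
   begin
      return Table (Item);
   end To_Character;

end Base_64_Alphabet;
```

The lookup stays independent of the compiler's internal representation of Repertoire; only the position numbers of the literals matter.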
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-22 12:24 ` Niklas Holsti
@ 2011-12-22 15:09 ` Georg Bauhaus
2011-12-22 16:00 ` Natasha Kerensikova
2011-12-23 1:42 ` Randy Brukardt
1 sibling, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 15:09 UTC (permalink / raw)
On 22.12.11 13:24, Niklas Holsti wrote:
> This defines a nice Ada representation of the six-bit codes. But this was not Natasha's question;
You are quite right, sorry.
Stubbornly, I'd like to mumble, though, that the very
notion of representation is at odds with portability.
* Re: Representation clauses for base-64 encoding
2011-12-22 15:09 ` Georg Bauhaus
@ 2011-12-22 16:00 ` Natasha Kerensikova
2011-12-22 22:18 ` Georg Bauhaus
0 siblings, 1 reply; 23+ messages in thread
From: Natasha Kerensikova @ 2011-12-22 16:00 UTC (permalink / raw)
Hello,
On 2011-12-22, Georg Bauhaus <rm.dash-bauhaus@futureapps.de> wrote:
> On 22.12.11 13:24, Niklas Holsti wrote:
>
>> This defines a nice Ada representation of the six-bit codes. But this
>> was not Natasha's question;
>
> You are quite right, sorry.
>
> Stubbornly, I'd like to mumble, though, that the very
> notion of representation is at odds with portability.
However here representation is not used as a notion, only as a tool:
using explicit shifts and masks, it is possible to write portable Ada
that performs the correct split of 3 octets on any platform.
The previous argument was that representation clauses allow more
readable code, which I'm inclined to believe. But is it really necessary
to give up portability for the sake of readability?
(I would answer "no" in that particular case, since base-64 splitting is
such a very simple and well-known operation)
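A minimal sketch of what such a shift-and-mask version could look like, for illustration only: the package name Base_64_Shift is invented, the type and function names follow the first post, and the bit order assumed is the usual base-64 one (the first digit is the high six bits of the first octet).

```ada
package Base_64_Shift is
   type Octet is mod 2**8;
   type Base_64_Digit is mod 2**6;
   type Octet_Block is array (1 .. 3) of Octet;
   type Base_64_Block is array (1 .. 4) of Base_64_Digit;

   function Split_Base_64 (Source : Octet_Block) return Base_64_Block;
   function Merge_Base_64 (Source : Base_64_Block) return Octet_Block;
end Base_64_Shift;

with Interfaces; use Interfaces;

package body Base_64_Shift is

   function Split_Base_64 (Source : Octet_Block) return Base_64_Block is
      --  Gather the three octets into one 24-bit value, first octet highest.
      Word : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Source (1)), 16)
        or Shift_Left (Unsigned_32 (Source (2)), 8)
        or Unsigned_32 (Source (3));
   begin
      --  Cut the 24 bits into four 6-bit groups, highest group first.
      return (1 => Base_64_Digit (Shift_Right (Word, 18) and 63),
              2 => Base_64_Digit (Shift_Right (Word, 12) and 63),
              3 => Base_64_Digit (Shift_Right (Word, 6) and 63),
              4 => Base_64_Digit (Word and 63));
   end Split_Base_64;

   function Merge_Base_64 (Source : Base_64_Block) return Octet_Block is
      --  Inverse operation: four 6-bit groups back into 24 bits.
      Word : constant Unsigned_32 :=
        Shift_Left (Unsigned_32 (Source (1)), 18)
        or Shift_Left (Unsigned_32 (Source (2)), 12)
        or Shift_Left (Unsigned_32 (Source (3)), 6)
        or Unsigned_32 (Source (4));
   begin
      return (1 => Octet (Shift_Right (Word, 16) and 255),
              2 => Octet (Shift_Right (Word, 8) and 255),
              3 => Octet (Word and 255));
   end Merge_Base_64;

end Base_64_Shift;
```

Nothing here depends on storage layout or bit numbering, since only arithmetic on values of Interfaces.Unsigned_32 is involved.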
Natasha
* Re: Representation clauses for base-64 encoding
2011-12-22 16:00 ` Natasha Kerensikova
@ 2011-12-22 22:18 ` Georg Bauhaus
2011-12-25 10:17 ` Niklas Holsti
0 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-22 22:18 UTC (permalink / raw)
On 22.12.11 17:00, Natasha Kerensikova wrote:
> However here representation is not used as a notion, only as a tool:
> using explicit shifts and masks, it is possible to write portable Ada
> that performs the correct split of 3 octets on any platform.
>
> The previous argument was that representation clauses allow more
> readable code, which I'm inclined to believe. But is it really necessary
> to give up portability for the sake of readability?
There is middle ground, I think, insofar as it is possible
to extract bits in Ada without thinking about shifts and masks,
or logical operations.
Given a stream chopped into octets, the goal is to extract slices
of 6 consecutive bits and represent these using the characters
from a Base 64 encoding table. Leave the how-to of extraction to
the compiler, just say which bits. This uses neither representation
clauses for extraction nor shifts or masks. Without claiming
originality, completeness, enough subtypes, portability of bit indexing
of bits in a single octet, or sufficient code quality, the following might
illustrate the convenience of arrays of packed Booleans (guaranteed by
the LRM to have desirable properties):
package B64 is
   --
   -- prints characters representing sequences of octets in
   -- base 64 encoding. The octets come in via `Add`.
   --
   pragma Elaborate_Body(B64);

   type Repertoire is ( 'A', 'B', 'C', 'D', 'E', 'F', 'G',
     'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
     'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a',
     'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
     'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
     'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4',
     '5', '6', '7', '8', '9', '+', '/' );
   for Repertoire'Size use 6;

   Pad : constant Character := '=';

   subtype Bit_Index is Natural range 0..23;

   type Bit_String is array(Bit_Index range <>) of Boolean;
   pragma Pack(Bit_String);

   subtype Octet is Bit_String(Bit_Index range 0..7);

   procedure Add(Bits : Octet);
   -- Take 6 bits and write a corresponding Base 64 character.
   -- Uses bits from this `Bits`, and bits from last time `Add`
   -- was called; saves bits for later use.

   procedure Finish;
   -- handle any left over bits and finish output
end B64;
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
package body B64 is

   subtype Base_64_Digit is Bit_String(Bit_Index range 0..5);
   subtype Word is Bit_String(Bit_Index range 0..15);

   function To_B64 is new Ada.Unchecked_Conversion(
     Base_64_Digit, Repertoire);

   procedure Write_64(C : Repertoire) is
      Position : Natural;
   begin
      case C is
         when 'A'..'Z' =>
            Position := Character'Pos('A') + Repertoire'Pos(C);
         when 'a'..'z' =>
            Position := Character'Pos('a')
              + (Repertoire'Pos(C) - Repertoire'Pos('a'));
         when '0'..'9' =>
            Position := Character'Pos('0')
              + (Repertoire'Pos(C) - Repertoire'Pos('0'));
         when '+' =>
            Position := Character'Pos('+');
         when '/' =>
            Position := Character'Pos('/');
      end case;
      Ada.Text_IO.Put(Character'Val(Position));
   end Write_64;

   --
   -- state information
   --
   type Selection is mod 4;
   -- one of the four groups of 6 bits in a full bit string
   Scratch : Word := (others => False);
   -- buffer storing left over bits for future use
   Position_In_Group : Selection := Selection'First;

   procedure Add(Bits : Octet) is
      Six_Pack : Base_64_Digit;
      -- six bits ready to be processed
   begin
      case Position_In_Group is
         when 0 =>
            Six_Pack := Base_64_Digit(Bits(2..7));
            Write_64(To_B64(Six_Pack));
            Scratch(8..15) := Bits;
            Position_In_Group := Selection'Succ(Position_In_Group);
         when 1 =>
            Scratch(4..7) := Bits(4..7);
            Six_Pack := Base_64_Digit(Scratch(4..9));
            Write_64(To_B64(Six_Pack));
            Scratch(8..15) := Bits;
            Position_In_Group := Selection'Succ(Position_In_Group);
         when 2 =>
            -- 4 bits left in `Scratch` plus 8 from `Bits`
            -- is worth two output characters
            Scratch(6..7) := Bits(6..7);
            Six_Pack := Base_64_Digit(Scratch(6..11));
            Write_64(To_B64(Six_Pack));
            Position_In_Group := Selection'Succ(Position_In_Group);
            Six_Pack := Base_64_Digit(Bits(0..5));
            Write_64(To_B64(Six_Pack));
            Scratch(8..15) := (others => False);
            Position_In_Group := Selection'Succ(Position_In_Group);
         when 3 =>
            -- this won't happen, see comment on case `2`
            raise Program_Error;
      end case;
   end Add;

   procedure Finish is
   begin
      Ada.Text_IO.Put("NOT DONE");
      Ada.Text_IO.Flush;
   end Finish;

end B64;
* Re: Representation clauses for base-64 encoding
2011-12-22 11:20 ` Niklas Holsti
@ 2011-12-23 1:30 ` Randy Brukardt
2011-12-26 8:33 ` Niklas Holsti
0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23 1:30 UTC (permalink / raw)
"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
news:9lgi3jFhaU1@mid.individual.net...
> On 11-12-22 11:41 , Natasha Kerensikova wrote:
...
>> type Octet_Block is array (1 .. 3) of Octet;
>> pragma Pack (Octet_Block);
>
> I would add the following, to check that packing is effective:
>
> for Octet_Block'Size use 24;
I think it would probably be better to forget Pack and actually say what you
mean here:
type Octet_Block is array (1 .. 3) of Octet;
for Octet_Block'Component_Size use 8;
"Pack" should be reserved for cases where you don't care about the exact
layout, you just want to save space. Here you do care about the exact
layout, so use a Component_Size.
Randy.
* Re: Representation clauses for base-64 encoding
2011-12-22 9:41 Representation clauses for base-64 encoding Natasha Kerensikova
2011-12-22 11:20 ` Niklas Holsti
2011-12-22 11:37 ` Georg Bauhaus
@ 2011-12-23 1:33 ` Randy Brukardt
2 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23 1:33 UTC (permalink / raw)
"Natasha Kerensikova" <lithiumcat@gmail.com> wrote in message
news:slrnjf5uol.1lme.lithiumcat@sigil.instinctive.eu...
> type Octet is mod 256;
> -- or Character or Storage_Element or Stream_Element
> -- or whatever 8-bit type relevant for the application
>
> for Octet'Size use 8;
> for Octet'Component_Size use 8;
Component_Size is only for array types, put it on the array type.
> for Octet'Bit_Order use System.Low_Order_First;
Bit_Order is only for record types, put that on the record type.
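To make the placement concrete, here is a small sketch of where each kind of clause would go. The package name Base_64_Layout and the array type name Octet_Array are invented for the example, and whether a given compiler accepts the 6-bit component clauses is implementation-dependent, so treat it as an illustration of legality rules rather than as portable code.

```ada
with System;

package Base_64_Layout is

   --  Size may be specified for the scalar types themselves.
   type Octet is mod 2**8;
   for Octet'Size use 8;

   type Base_64_Digit is mod 2**6;
   for Base_64_Digit'Size use 6;

   --  Component_Size belongs on the array type, not on Octet.
   type Octet_Array is array (1 .. 3) of Octet;
   for Octet_Array'Component_Size use 8;

   --  Bit_Order belongs on the record type, next to its
   --  record representation clause.
   type Base_64_Block is record
      A, B, C, D : Base_64_Digit;
   end record;
   for Base_64_Block'Bit_Order use System.Low_Order_First;
   for Base_64_Block use record
      A at 0 range  0 ..  5;
      B at 0 range  6 .. 11;
      C at 0 range 12 .. 17;
      D at 0 range 18 .. 23;
   end record;

end Base_64_Layout;
```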
Randy.
* Re: Representation clauses for base-64 encoding
2011-12-22 12:24 ` Niklas Holsti
2011-12-22 15:09 ` Georg Bauhaus
@ 2011-12-23 1:42 ` Randy Brukardt
2011-12-28 8:59 ` Niklas Holsti
1 sibling, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-23 1:42 UTC (permalink / raw)
"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
news:9lgls6FticU1@mid.individual.net...
...
>> http://www.adacore.com/2008/03/03/gem-27/
>
> I am surprised and disappointed that there is *no* mention of portability
> problems in that "gem". This is marketing hype for GNAT, not sound
> programming advice.
There is no portability problem in common use. For instance, Natasha's code
(once the rep. clauses are right) will work on both a big-endian and
little-endian machine without change. (And yes, I'd use the record rep.
clauses to do this conversion; I think I have some code that looks very much
like his sample.)
The problem occurs when you need to process big-endian data on a
little-endian machine (or vice versa). That is not that common of a need --
it surely happens (as in your application), but you are overstating it if
you think most programs would run into it.
Most code does not care which bits are which bits in the word -- like this
encoding code -- they just need a consistent representation. That is, it
doesn't matter where bit 0 is, so long as it is the same in every item that
is processed.
Randy.
* Re: Representation clauses for base-64 encoding
2011-12-22 22:18 ` Georg Bauhaus
@ 2011-12-25 10:17 ` Niklas Holsti
2011-12-27 11:23 ` Georg Bauhaus
0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-25 10:17 UTC (permalink / raw)
On 11-12-23 00:18 , Georg Bauhaus wrote:
> On 22.12.11 17:00, Natasha Kerensikova wrote:
>
>> However here representation is not used as a notion, only as a tool:
>> using explicit shifts and masks, it is possible to write portable Ada
>> that performs the correct split of 3 octets on any platform.
>>
>> The previous argument was that representation clauses allow more
>> readable code, which I'm inclined to believe. But is it really necessary
>> to give up portability for the sake of readability?
>
> There is middle ground, I think, insofar as it is possible
> to extract bits in Ada without thinking about shifts and masks,
> or logical operations.
>
> Given a stream chopped into octets, the goal is to extract slices
> of 6 consecutive bits and represent these using the characters
> from a Base 64 encoding table. Leave the how-to of extraction to
> the compiler, just say which bits. This uses neither representation
> clauses for extraction nor shifts or masks. Without claiming
> originality, completeness, enough subtypes, portability of bit indexing
> of bits in a single octet, or sufficient code quality, the following might
> illustrate the convenience of arrays of packed Booleans (guaranteed by
> the LRM to have desirable properties):
What are these "desirable properties" on which you rely, and where in
the LRM are they guaranteed?
> package B64 is
[snip]
> subtype Bit_Index is Natural range 0..23;
>
> type Bit_String is array(Bit_Index range <>) of Boolean;
> pragma Pack(Bit_String);
>
> subtype Octet is Bit_String(Bit_Index range 0..7);
[snip]
> subtype Base_64_Digit is Bit_String(Bit_Index range 0..5);
[snip]
> procedure Add(Bits : Octet) is
> Six_Pack : Base_64_Digit;
> -- six bits ready to be processed
> begin
> case Position_In_Group is
> when 0 =>
> Six_Pack := Base_64_Digit(Bits(2..7));
This assumes that the bits in an Octet are indexed in order of
increasing significance, so that the slice Bits(2..7) gives the six most
significant bits. Since you defined the Octet type, you are of course
free to assume this.
But your code does not show how the Octet values are derived from the
original data, e.g. from a stream of Unsigned_8, or whatever binary data
is to be encoded as Base-64. That is an essential part of the problem,
and without it we cannot know if your approach works, or what operations
it really needs. There are certainly ways to convert an Unsigned_8, for
example, into an Octet value, such that the bits are indexed in
increasing significance order, but it requires either shifting and
masking, or the equivalent division or mod operations, or
Unchecked_Conversion to a record type with eight Boolean components and
a representation clause and a Bit_Order clause.
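As a sketch of the shifting-and-masking variant (the package name Octet_Bits and the functions To_Bits and To_Value are invented for the example; Bit_String and Octet mirror the quoted declarations, but with a plain Natural index), the conversion can fix the index order explicitly, so that index I always denotes the bit of weight 2**I:

```ada
with Interfaces;

package Octet_Bits is
   type Bit_String is array (Natural range <>) of Boolean;
   pragma Pack (Bit_String);
   subtype Octet is Bit_String (0 .. 7);

   --  Result (I) is True exactly when the bit of weight 2**I is set.
   function To_Bits (Value : Interfaces.Unsigned_8) return Octet;

   --  Inverse of To_Bits.
   function To_Value (Bits : Octet) return Interfaces.Unsigned_8;
end Octet_Bits;

package body Octet_Bits is
   use Interfaces;

   function To_Bits (Value : Unsigned_8) return Octet is
      Result : Octet;
   begin
      for I in Result'Range loop
         --  Move bit I down to position 0 and test it.
         Result (I) := (Shift_Right (Value, I) and 1) = 1;
      end loop;
      return Result;
   end To_Bits;

   function To_Value (Bits : Octet) return Unsigned_8 is
      Result : Unsigned_8 := 0;
   begin
      --  Rebuild the value starting from the most significant bit.
      for I in reverse Bits'Range loop
         Result := Shift_Left (Result, 1);
         if Bits (I) then
            Result := Result or 1;
         end if;
      end loop;
      return Result;
   end To_Value;

end Octet_Bits;
```

The index order is then a property of the conversion functions, not of how the compiler happens to lay out the packed array.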
If you use an Unchecked_Conversion directly from Unsigned_8 to Octet,
you are assuming that the compiler implements the indexing of Octet in
significance order. Where does the LRM guarantee that? I don't think it
does.
Your Unchecked_Conversion (not quoted) from Base_64_Digit to Repertoire
has the same problem, so at least that part of your code is not portable.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-23 1:30 ` Randy Brukardt
@ 2011-12-26 8:33 ` Niklas Holsti
2011-12-28 0:09 ` Randy Brukardt
0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-26 8:33 UTC (permalink / raw)
On 11-12-23 03:30 , Randy Brukardt wrote:
> "Niklas Holsti"<niklas.holsti@tidorum.invalid> wrote in message
> news:9lgi3jFhaU1@mid.individual.net...
>> On 11-12-22 11:41 , Natasha Kerensikova wrote:
> ...
>>> type Octet_Block is array (1 .. 3) of Octet;
>>> pragma Pack (Octet_Block);
>>
>> I would add the following, to check that packing is effective:
>>
>> for Octet_Block'Size use 24;
>
> I think it would probably be better to forget Pack and actually say what you
> mean here:
>
> type Octet_Block is array (1 .. 3) of Octet;
> for Octet_Block'Component_Size use 8;
>
> "Pack" should be reserved for cases where you don't care about the exact
> layout, you just want to save space. Here you do care about the exact
> layout, so use a Component_Size.
Fine adjustment: when Component_Size is not a factor or multiple of the
word size, RM 13.3(73) says that the array may also have to be packed in
order to eliminate gaps between components.
Of course, a Component_Size of 8 bits is usually a factor of the word
size. But perhaps not always.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-25 10:17 ` Niklas Holsti
@ 2011-12-27 11:23 ` Georg Bauhaus
2011-12-27 19:37 ` Niklas Holsti
0 siblings, 1 reply; 23+ messages in thread
From: Georg Bauhaus @ 2011-12-27 11:23 UTC (permalink / raw)
On 25.12.11 11:17, Niklas Holsti wrote:
>> There is middle ground, I think, insofar as it is possible
>> to extract bits in Ada without thinking about shifts and masks,
>> or logical operations.
Just saying that using Ada, one may write operations involving
single bits or slices of single bits simply by using arrays
of Booleans that are packed.
I think I didn't say that one may read portably from I/O ports?
>> Without claiming [...] portability of bit indexing
>> of bits in a single octet [...]
Thanks for illustrating my words in my code. ;-)
> What are these "desirable properties" on which you rely, and where in the LRM
> are they guaranteed?
Everything (the "desirable properties") that an interested party
might conclude after reading LRM 13.2. Yes, on some planet there
might be an Ada compiler that does not handle packed bits
reasonably well. ;-)
13.2 drops alignment requirements, in particular, so that
I do not need to think about boundaries of storage element size
groups of bits.
The BIT_VECTOR in the Rationale of Ada 83 indicates awareness of
Pascal's SET type, too, I should think. (Don't know if Wirth's paper
on representation of sets is an influence, or just a summary of
existing practice.)
The Ada 95 Quality and Style Guide has this to say, in Chapter 10,
http://www.adaic.org/resources/add_content/docs/95style/html/sec_10/10-6-3.html
"Use modular types rather than packed Boolean arrays
when measured performance indicates."
"Measured performance indicates", then, whether or not to
resort to explicit shifting and logical operations.
I looked at how compilers will translate operations on slices of
Booleans (-Os is an interesting option with GNAT). They will emit
instructions for shifting and logical operations; no surprise.
So, would there be a portable Base64 algorithm that reads from
a stream of storage elements, perhaps from a typical controller's
8-bit I/O port, that
(*) shifts and "reasons" more reliably, readably, and portably,
(*) performs shifts and logical operations much faster
than the shifts and logical operations generated by
typical compilers for bit slices or representations,
(*) runs on heterogeneous hardware with word size <= 16 bits?
* Re: Representation clauses for base-64 encoding
2011-12-27 11:23 ` Georg Bauhaus
@ 2011-12-27 19:37 ` Niklas Holsti
2011-12-27 20:49 ` Robert A Duff
0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-27 19:37 UTC (permalink / raw)
On 11-12-27 13:23 , Georg Bauhaus wrote:
> On 25.12.11 11:17, Niklas Holsti wrote:
Actually, Georg Bauhaus wrote:
>>> There is middle ground, I think, insofar as it is possible
>>> to extract bits in Ada without thinking about shifts and masks,
>>> or logical operations.
>
> Just saying that using Ada, one may write operations involving
> single bits or slices of single bits simply by using arrays
> of Booleans that are packed.
Whether a Boolean array is packed to Component_Size = 1, with no gaps,
depends on the Ada compiler. It is not a hard requirement in the Ada RM,
as I understand it.
> I think I didn't say that one may read portably from I/O ports?
That's right, you didn't. But the "Base 64 encoding" problem by
definition starts from a sequence of 8-bit groups, with a defined bit
order: the first 6-bit group must be the high 6 bits of the first 8-bit
group, and so on. This is a core part of the problem.
>>> Without claiming [...] portability of bit indexing
>>> of bits in a single octet [...]
>
> Thanks for illustrating my words in my code. ;-)
I didn't understand what you meant by that "without claiming", which is
why I asked you to be clearer about the "desirable properties" that you
assumed.
>> What are these "desirable properties" on which you rely, and where in the LRM
>> are they guaranteed?
>
> Everything (the "desirable properties") that an interested party
> might conclude after reading LRM 13.2.
All of RM 13.2 is non-binding recommendation; the text defines the
recommended level of support, using "should". The only "shall" in 13.2
is in a legality rule. No properties of packed Boolean arrays are
"guaranteed" by 13.2.
> Yes, on some planet there
> might be an Ada compiler that does not handle packed bits
> reasonably well. ;-)
This is the kind of argument I am used to seeing in comp.lang.c: "it
works for me, so never mind that the C standard says it is undefined
behaviour".
Why didn't the RM authors write strict "shall" requirements in RM 13.2,
and the rest of chapter 13? Apparently, they felt that some Ada
compilers would not be able to implement all the recommendations with
reasonable effort and performance.
> 13.2 drops alignment requirements, in particular, so that
> I do not need to think about boundaries of storage element size
> groups of bits.
But there is not even a recommendation on the order in which bits in a
packed Boolean array are indexed, as far as I can see.
> The Ada 95 Quality and Style Guide has this to say, in Chapter 10,
> http://www.adaic.org/resources/add_content/docs/95style/html/sec_10/10-6-3.html
>
> "Use modular types rather than packed Boolean arrays
> when measured performance indicates."
>
> "Measured performance indicates", then, whether or not to
> resort to explicit shifting and logical operations.
Sure, if your program is too slow (but works) using packed Boolean
arrays, you can try to speed it up by using modular types instead (but
it might not become faster). That says nothing about portability; if you
were happy with the limited (IMO) portability of packed Boolean arrays,
you will not lose portability by moving to modular types (but you may
have to find out in which order your Ada compiler indexes packed Boolean
arrays, in order to transform your program correctly).
>
> I looked at how compilers will translate operations on slices of
> Booleans (-Os is an interesting option with GNAT). They will emit
> instructions for shifting and logical operations; no surprise.
No surprise, I agree. But the problem is the undefined indexing order.
Packed Boolean arrays are good and useful for bit-vector operations. But
Unchecked_Conversion to and from other types, such as modular types, is
not portable.
>
> So, would there be a portable Base64 algorithm that reads from
> a stream of storage elements, perhaps from a typical controller's
> 8-bit I/O port, that
>
> (*) shifts and "reasons" more reliably, readably, and portably,
>
Sure, use some Interfaces.Unsigned_N as a bit-buffer, as I sketched in
an earlier message.
> (*) performs shifts and logical operations much faster
> than the shifts and logical operations generated by
> typical compilers for bit slices or representations,
>
You want a portable method that is *much faster* than less portable
methods? That is a lot to ask...
The Unsigned_N-bit-buffer method is probably no slower than the
unportable bit-slice method. The Unsigned_N-bit-buffer method may be
slower than the method that uses records with representation clauses to
convert three 8-bit groups into four 6-bit groups at a time, but the
latter needs a machine with at least 24-bit "machine scalars".
> (*) runs on heterogeneous hardware with word size <= 16 bits?
The bit-buffer method works easily with any Unsigned_N with N >= 8+5 =
13, so Unsigned_16 is ok. It can be made to work with just Unsigned_8,
but with more difficulty.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-27 19:37 ` Niklas Holsti
@ 2011-12-27 20:49 ` Robert A Duff
2011-12-27 23:47 ` Niklas Holsti
0 siblings, 1 reply; 23+ messages in thread
From: Robert A Duff @ 2011-12-27 20:49 UTC (permalink / raw)
Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
> Whether a Boolean array is packed to Component_Size = 1, with no gaps,
> depends on the Ada compiler. It is not a hard requirement in the Ada RM,
> as I understand it.
For any compiler that supports the SP annex (which pretty-much all do),
it is a hard requirement. But if you're depending on Component_Size = 1
for the logical correctness of your program, it's better to say
"for T'Component_Size use 1;" (which also must be supported if the SP annex
applies).
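As a small illustration of that advice (a sketch only; the names
Bit_Vector_Sketch and Bit_Vector are hypothetical), the layout can be
stated explicitly instead of relying on pragma Pack. A compiler that
cannot honour these clauses should reject them at compile time rather than
silently choose another layout:

```ada
with Ada.Text_IO;

procedure Bit_Vector_Sketch is
   type Bit_Vector is array (0 .. 7) of Boolean;
   --  Say what is meant: one bit per component, no gaps, 8 bits in total.
   for Bit_Vector'Component_Size use 1;
   for Bit_Vector'Size use 8;

   V : constant Bit_Vector := (0 | 7 => True, others => False);
begin
   Ada.Text_IO.Put_Line ("Size =" & Integer'Image (Bit_Vector'Size));
   Ada.Text_IO.Put_Line ("V (7) = " & Boolean'Image (V (7)));
end Bit_Vector_Sketch;
```

Note that this fixes only the component size; which physical bit holds
V (0) is still left to the implementation.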
> All of RM 13.2 is non-binding recommendation; the text defines the
> recommended level of support, using "should". The only "shall" in 13.2
> is in a legality rule. No properties of packed Boolean arrays are
> "guaranteed" by 13.2.
Right, but C.2(2) turns all those "should"s into "shall"s.
> But there is not even a recommendation on the order in which bits in a
> packed Boolean array are indexed, as far as I can see.
True, but in practice, it will follow the endianness of the machine.
Likewise, the RM doesn't say which bits of an integer represent
what, but in practice, the implementation is unlikely to make
bit 17 be the high-order bit of a 32-bit integer. ;-)
I'd prefer these rules to be nailed down better. But they're not, so
you really have no choice but to rely to some extent on compilers doing
sensible things.
> Sure, if your program is too slow (but works) using packed Boolean
> arrays, you can try to speed it up by using modular types instead (but
> it might not become faster). That says nothing about portability; if you
> were happy with the limited (IMO) portability of packed Boolean arrays,
> you will not lose portability by moving to modular types (but you may
> have to find out in which order your Ada compiler indexes packed Boolean
> arrays, in order to transform your program correctly).
GNAT turns small packed arrays into modular integers internally, so you
should expect to get the same efficiency.
- Bob
* Re: Representation clauses for base-64 encoding
2011-12-27 20:49 ` Robert A Duff
@ 2011-12-27 23:47 ` Niklas Holsti
2011-12-29 0:50 ` Robert A Duff
0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-27 23:47 UTC (permalink / raw)
On 11-12-27 22:49 , Robert A Duff wrote:
> Niklas Holsti<niklas.holsti@tidorum.invalid> writes:
>
>> Whether a Boolean array is packed to Component_Size = 1, with no gaps,
>> depends on the Ada compiler. It is not a hard requirement in the Ada RM,
>> as I understand it.
>
> For any compiler that supports the SP annex (which pretty-much all do),
> it is a hard requirement.
Yes, a compiler cannot claim to support annex C (Systems Programming)
unless it implements chapter 13 as recommended, so that all the
"shoulds" are implemented. But this is only an argument for "probable"
portability, since supporting annex C is optional.
For this discussion, I have again studied the chapter 13 rules about
Bit_Order and record representation clauses. I think I understand now
how it is meant to work, at least to some extent. As I understand it, if
you specify Bit_Order and want your record representation clauses to be
portable, the form of your clauses is strongly limited by the size of
the "largest machine scalar" in the compilers you use. In particular, if
you want to follow the style that always writes "at 0 range first_bit ..
last_bit", you cannot specify the layout of a record that is larger than
the largest machine scalar.
For example, the 24-bit record that Natasha suggested for the Base-64
encoding does not work if the largest machine scalar is less than 24
bits in size.
Out of curiosity, what is the case for JGNAT? Does it support annex C,
and if not, how much of chapter 13 does it implement?
>> But there is not even a recommendation on the order in which bits in a
>> packed Boolean array are indexed, as far as I can see.
>
> True, but in practice, it will follow the endianness of the machine.
In that case, the meaning of slices of packed Boolean arrays, such as
Georg Bauhaus presented, is different for little-endian and big-endian
machines, and this cannot be corrected with a Bit_Order specification.
> Likewise, the RM doesn't say which bits of an integer represent
> what,
Well, combining the definition of Bit_Order in RM 13.5.3 with the
definition of the First_Bit and Last_Bit attributes in RM 13.5.2
suggests rather strongly that bit numbers defined by "offsets in bits"
correspond either to increasing or decreasing significance in unsigned
integer values.
> but in practice, the implementation is unlikely to make
> bit 17 be the high-order bit of a 32-bit integer. ;-)
Agreed.
> I'd prefer these rules to be nailed down better. But they're not, so
> you really have no choice but to rely to some extent on compilers doing
> sensible things.
You can avoid chapter 13 when portability is required, and use other
means, such as Interfaces.Unsigned_N, where the only uncertainty is
which sizes of Unsigned are implemented. But it is usually easy to use a
larger Unsigned, if the exact size you would like to have is not
provided. For example, the natural size of the bit-buffer for the
Base-64 encoding is 8+(6-1) = 13 bits, but an Unsigned_N for any N >= 13
can be used as well.
In contrast, the 24-bit-record-method falls apart if the compiler
rejects its representation clause for some reason (for example because
the largest machine scalar is less than 24 bits).
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-26 8:33 ` Niklas Holsti
@ 2011-12-28 0:09 ` Randy Brukardt
0 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2011-12-28 0:09 UTC (permalink / raw)
"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
news:9lqpq5Ft05U1@mid.individual.net...
> On 11-12-23 03:30 , Randy Brukardt wrote:
>> "Niklas Holsti"<niklas.holsti@tidorum.invalid> wrote in message
>> news:9lgi3jFhaU1@mid.individual.net...
>>> On 11-12-22 11:41 , Natasha Kerensikova wrote:
>> ...
>>>> type Octet_Block is array (1 .. 3) of Octet;
>>>> pragma Pack (Octet_Block);
>>>
>>> I would add the following, to check that packing is effective:
>>>
>>> for Octet_Block'Size use 24;
>>
>> I think it would probably be better to forget Pack and actually say what
>> you
>> mean here:
>>
>> type Octet_Block is array (1 .. 3) of Octet;
>> for Octet_Block'Component_Size use 8;
>>
>> "Pack" should be reserved for cases where you don't care about the exact
>> layout, you just want to save space. Here you do care about the exact
>> layout, so use a Component_Size.
>
> Fine adjustment: when Component_Size is not a factor or multiple of the
> word size, RM 13.3(73) says that the array may also have to be packed in
> order to eliminate gaps between components.
>
> Of course, a Component_Size of 8 bits is usually a factor of the word
> size. But perhaps not always.
You're right of course, although I would hope that no compiler was silly
enough to ignore a direct component-size command by putting gaps into it.
I'd much rather the compiler rejected such a clause (surely Janus/Ada
would). If you want gaps, you should say so. For instance, on the U2200 (a
36-bit machine with 9-bit divisions), you'd have to say
type Octet_Block is array (1 .. 3) of Octet;
for Octet_Block'Component_Size use 9;
if you wanted gaps. (And 8 might have been rejected, I don't remember
anymore whether the Unisys code generator could handle that.) That's why
Janus/Ada uses a constant Host.Stor_Unit_Size in all of its representation
clauses. (As you might guess, package Host contains a bunch of constants and
subprograms defining the host environment for the compiler. You'd probably
also guess that there is a similar package Target, and if you did, you'd be
right.)
Randy.
* Re: Representation clauses for base-64 encoding
2011-12-23 1:42 ` Randy Brukardt
@ 2011-12-28 8:59 ` Niklas Holsti
2011-12-29 5:41 ` Randy Brukardt
0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2011-12-28 8:59 UTC (permalink / raw)
On 11-12-23 03:42 , Randy Brukardt wrote:
> "Niklas Holsti"<niklas.holsti@tidorum.invalid> wrote in message
> news:9lgls6FticU1@mid.individual.net...
> ...
>>> http://www.adacore.com/2008/03/03/gem-27/
>>
>> I am surprised and disappointed that there is *no* mention of portability
>> problems in that "gem". This is marketing hype for GNAT, not sound
>> programming advice.
>
> There is no portability problem in common use.
Depends on what you consider common. The gem even claims that its
representation clause makes "Ada" use a biased 7-bit representation for
an Integer component with range 100..227. RM chapter 13 only recommends
that compilers should support unbiased representations.
Which Ada compilers, other than GNAT, can use biased representations?
> For instance, Natasha's code
> (once the rep. clauses are right) will work on both a big-endian and
> little-endian machine without change.
Once the representation clauses are corrected, as you say, Natasha's
code is the best that can be achieved with chapter 13. But...
Natasha's 24-bit-record code works only if the Ada implementation (a)
supports chapter 13 as recommended and (b) has a machine scalar of at
least 24 bits.
We know that some Ada programs run on the 16-bit TI MSP430. Will the
24-bit record code work there? Ok, MSP430 Ada programs are uncommon, but
perhaps we want to make them more common.
Natasha's code that uses (packed) arrays with Component_Sizes of 6 and 8
bits has more complex portability questions. Of course it also requires
support for chapter 13, but I think it needs more than that. It depends
on the indexing order and the word size. I don't find anything in
chapter 13 or elsewhere in the RM that even requires the indexing order
to be the same for different array types, although this is of course
very likely to be the case.
> The problem occurs when you need to process big-endian data on a
> little-endian machine (or vice versa). That is not that common of a need --
The problem occurs when an application inputs or outputs binary data,
and the application should be coded in a portable way. I can't believe
that this is uncommon.
> Most code does not care which bits are which bits in the word -- like this
> encoding code -- they just need a consistent representation. That is, it
> doesn't matter where bit 0 is, so long as it is the same in every item that
> is processed.
True for internal data, false for inputs and outputs.
If all you want to do is to compress internal data, use pragma Pack, as
you suggested.
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
* Re: Representation clauses for base-64 encoding
2011-12-27 23:47 ` Niklas Holsti
@ 2011-12-29 0:50 ` Robert A Duff
2011-12-30 20:54 ` anon
2011-12-30 20:56 ` Niklas Holsti
0 siblings, 2 replies; 23+ messages in thread
From: Robert A Duff @ 2011-12-29 0:50 UTC (permalink / raw)
Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
> Yes, a compiler cannot claim to support annex C (Systems Programming)
> unless it implements chapter 13 as recommended, so that all the
> "shoulds" are implemented. But this is only an argument for "probable"
> portability, since supporting annex C is optional.
Right.
But of course supporting the Ada standard is optional, too. ;-)
It's easy to forget that standards don't actually _require_ anybody
to do anything. So, unfortunately, the best you can be sure of is
"probable" portability.
> Out of curiosity, what is the case for JGNAT?
I don't know much about JGNAT. I think it doesn't support some
things that are "impossible or impractical" (see AARM-1.1.3(6)),
given the limitations of the JVM.
- Bob
* Re: Representation clauses for base-64 encoding
2011-12-28 8:59 ` Niklas Holsti
@ 2011-12-29 5:41 ` Randy Brukardt
2011-12-29 10:10 ` Dmitry A. Kazakov
0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2011-12-29 5:41 UTC (permalink / raw)
"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
news:9m0433FjokU1@mid.individual.net...
> On 11-12-23 03:42 , Randy Brukardt wrote:
>> "Niklas Holsti"<niklas.holsti@tidorum.invalid> wrote in message
>> news:9lgls6FticU1@mid.individual.net...
>> ...
>>>> http://www.adacore.com/2008/03/03/gem-27/
>>>
>>> I am surprised and disappointed that there is *no* mention of
>>> portability
>>> problems in that "gem". This is marketing hype for GNAT, not sound
>>> programming advice.
>>
>> There is no portability problem in common use.
>
> Depends on what you consider common. The gem even claims that its
> representation clause makes "Ada" use a biased 7-bit representation for an
> Integer component with range 100..227. RM chapter 13 only recommends that
> compilers should support unbiased representations.
>
> Which Ada compilers, other than GNAT, can use biased representations?
Ah, I missed that. I don't know of any other Ada compilers that support
biased representations.
...
> Natasha's 24-bit-record code works only if the Ada implementation (a)
> supports chapter 13 as recommended and (b) has a machine scalar of at
> least 24 bits.
No, not true. Machine scalars only come into play if you need to use the
non-native bit order, and that isn't needed here.
Once you have to work with the non-native bit order, things get messy. But
that is rarely needed.
> We know that some Ada programs run on the 16-bit TI MSP430. Will the
> 24-bit record code work there? Ok, MSP430 Ada programs are uncommon, but
> perhaps we want to make them more common.
Sure, it will work there, so long as you don't try to mess with the
non-native bit order.
> Natasha's code that uses (packed) arrays with Component_Sizes of 6 and 8
> bits has more complex portability questions. Of course it also requires
> support for chapter 13, but I think it needs more than that. It depends on
> the indexing order and the word size. I don't find anything in chapter 13
> or elsewhere in the RM that even requires the indexing order to be the
> same for different array types, although this is of course very likely to
> be the case.
Worrying about indexing order is just plain silly. As Bob points out,
*nothing* is really required by a standard, you have to trust implementers
to do reasonable things.
>> The problem occurs when you need to process big-endian data on a
>> little-endian machine (or vice versa). That is not that common of a
>> need --
>
> The problem occurs when an application inputs or outputs binary data, and
> the application should be coded in a portable way. I can't believe that
> this is uncommon.
Only if the binary data is in the non-native bit order. If the binary data
is in the native bit order, there is nothing that special that needs to be
done. And it doesn't matter which bit order that is.
>> Most code does not care which bits are which bits in the word -- like
>> this
>> encoding code -- they just need a consistent representation. That is, it
>> doesn't matter where bit 0 is, so long as it is the same in every item
>> that
>> is processed.
>
> True for internal data, false for inputs and outputs.
Not really. The vast majority of inputs and outputs are in text form (as in
the encoding example). Most of the rest are just a pure stream of bytes. And
the interesting thing with a stream of bytes is that you get identical
behavior for both bit orders -- because the bit order and byte order are
tied together. The problem comes when you try to use the non-native bit and
byte order on a machine.
For instance, in the encoding example, all of the messy manipulation occurs
completely inside the application: the input is a stream of bytes and the
output is a stream of characters -- neither are subject to any byte order
issues themselves. That being the case, which bytes go where (internally)
turns out to be irrelevant, because the native order will have the right
effect either way.
It took me a long time to grasp this; I only figured it out after struggling
for a long time with our old 68000 code generator (which is big-endian, of
course). Once I got the code selection right, nothing much needed to be
changed. I understand when others don't see this; it's easy to think too
much in terms of the exact bits involved, but those matter less than it may
seem.
Randy.
* Re: Representation clauses for base-64 encoding
2011-12-29 5:41 ` Randy Brukardt
@ 2011-12-29 10:10 ` Dmitry A. Kazakov
0 siblings, 0 replies; 23+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-29 10:10 UTC (permalink / raw)
On Wed, 28 Dec 2011 23:41:57 -0600, Randy Brukardt wrote:
> "Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
> news:9m0433FjokU1@mid.individual.net...
>> True for internal data, false for inputs and outputs.
>
> Not really. The vast majority of inputs and outputs are in text form (as in
> the encoding example). Most of the rest are just a pure stream of bytes. And
> the interesting thing with a stream of bytes is that you get identical
> behavior for both bit orders -- because the bit order and byte order are
> tied together. The problem comes when you try to use the non-native bit and
> byte order on a machine.
No. It is true that bit order within the octets is properly handled by the
hardware; what is wrong is that you could ignore it once you have read the
octets. Many octet-based protocols use bit fields spanning several octets,
starting and ending anywhere in an octet. Probably you could ignore this
by reordering the octets of the stream into the machine-native order. If you
knew how to, that is, because of the nice middle-endian representations.
Formally speaking a stream cannot be reordered at all, because it is
unbounded. You could reorder only octets of a packet, a word etc. But not
before calculating the check sums. The real-life protocols are difficult to
crack.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Representation clauses for base-64 encoding
2011-12-29 0:50 ` Robert A Duff
@ 2011-12-30 20:54 ` anon
2011-12-30 20:56 ` Niklas Holsti
1 sibling, 0 replies; 23+ messages in thread
From: anon @ 2011-12-30 20:54 UTC (permalink / raw)
With the current Ada Standard there are too many options, and that is called
"The Killing and the Death" of any language. There should be only one
option in the Ada Standard: either you support all features, or no features,
not even the language itself.
Though the Ada Standard might include a sub-class for those compilers that do
not follow the complete standard; such a compiler must use the word "Sub-Ada"
when referring to itself and its libraries. And all impractical sections
must be documented in detail and approved by a non-ARG sub-committee
appointed by the ARG. But for most Ada systems that is too much work.
An example of a "Sub-Ada" could be where Ada is implemented on a JVM.
Some might say the "Machine_Code" package is impractical, because
on these systems there are two assembly languages: the first being the
"J-Code" used by the JVM and the second being the hardware CPU assembly,
which in most cases is unknown to the JVM. Even though Ada does not know
the hardware, the Ada system should know the "J-Code" for the Java version
it was written for.
But the "System.RPC" package is another story, because no Java machine
supports the RPC sub-system, so the JVM version would have to be approved
by the Ada sub-committee or no JVM Ada would exist, unless the RPC
sub-system is fully emulated for a JVM in Ada software.
In <wcclipwcian.fsf@shell01.TheWorld.com>, Robert A Duff <bobduff@shell01.TheWorld.com> writes:
>Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>
>> Yes, a compiler cannot claim to support annex C (Systems Programming)
>> unless it implements chapter 13 as recommended, so that all the
>> "shoulds" are implemented. But this is only an argument for "probable"
>> portability, since supporting annex C is optional.
>
>Right.
>
>But of course supporting the Ada standard is optional, too. ;-)
>It's easy to forget that standards don't actually _require_ anybody
>to do anything. So, unfortunately, the best you can be sure of is
>"probable" portability.
>
>> Out of curiosity, what is the case for JGNAT?
>
>I don't know much about JGNAT. I think it doesn't support some
>things that are "impossible or impractical" (see AARM-1.1.3(6)),
>given the limitations of the JVM.
>
>- Bob
* Re: Representation clauses for base-64 encoding
2011-12-29 0:50 ` Robert A Duff
2011-12-30 20:54 ` anon
@ 2011-12-30 20:56 ` Niklas Holsti
1 sibling, 0 replies; 23+ messages in thread
From: Niklas Holsti @ 2011-12-30 20:56 UTC (permalink / raw)
On 11-12-29 02:50 , Robert A Duff wrote:
> Niklas Holsti<niklas.holsti@tidorum.invalid> writes:
>
>> Yes, a compiler cannot claim to support annex C (Systems Programming)
>> unless it implements chapter 13 as recommended, so that all the
>> "shoulds" are implemented. But this is only an argument for "probable"
>> portability, since supporting annex C is optional.
>
> Right.
>
> But of course supporting the Ada standard is optional, too. ;-)
Yes, but we are talking about Ada programming, which to me means using
an Ada compiler that follows the standard. For me, the issue is what
level of portability the standard provides; the actual current
implementations are secondary.
> It's easy to forget that standards don't actually _require_ anybody
> to do anything. So, unfortunately, the best you can be sure of is
> "probable" portability.
I hope you would agree that Standard.Integer "certainly" has at least 16
bits in a conforming Ada implementation, so that is one point of
"certain" portability. The portability of representation clauses is less
certain, since conformance is optional. In contrast, package Interfaces
and its shift operations are in the core of the language (RM 1.1.2(5)),
to which all implementations shall conform (RM 1.1.2(17)).
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .