Big-endian vs little-endian

comp.lang.ada

* Big-endian vs little-endian
@ 1999-01-29  0:00 Mike Werner
  1999-02-02  0:00 ` Nick Roberts
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Werner @ 1999-01-29  0:00 UTC (permalink / raw)


Now, I know that this problem has been around for a long time.  I also
am aware of the fact that it is basically a hardware implementation
based problem.  Or at least that's what I've always been told.

I recently had a project for school that involved a binary data file
that needed to be read using sequential IO.  As the data file was created on
the department's server and I was doing the project on my PC at home, I
ran into the endian problem.  The text string and the two enumeration
types read in just fine - it was the two numeric fields that were
hosed.  I did manage to get around the problem by creating my own data
file to test with - fortunately the instructor told us what was in the
file.

But I got to wondering - shouldn't things like that be standardized by
now?  Or at least a way for the compiler to deal with such things?  If
such does exist, then I apologize for dragging this up and humbly
request a pointer to where to find such info.  If not ... well once I
figure out what I'm doing maybe I'll tackle that as a project some day.
-- 
Mike Werner  KA8YSD           |  "Where do you want to go today?"
ICQ# 12934898                 |  "As far from Redmond as possible!"
'91 GS500E                    |
Morgantown WV                 |

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GU d-@ s:+ a- C++>$ UL++ P+ L+++ E W++ N++ !o w--- O- !M V-- PS+ PE+
 Y+ R+ !tv b+++(++++) DI+ D--- G e*>++ h! r++ y++++
------END GEEK CODE BLOCK------






* Re: Big-endian vs little-endian
  1999-01-29  0:00 Big-endian vs little-endian Mike Werner
@ 1999-02-02  0:00 ` Nick Roberts
  1999-02-03  0:00   ` Mark A Biggar
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nick Roberts @ 1999-02-02  0:00 UTC (permalink / raw)


I can think of two possible solutions:

(a) declare a type derived from Interfaces.Integer_8/16/32 etc. (RM95 B.2),
and then apply a Bit_Order representation clause (RM95 13.5.3) to this type;

(b) use Text_IO instead of Sequential_IO, and input and output the data in
the form of text.

The advantage of (b) is that text is the most universal data format: non-Ada
programs will (almost always) be able to use the data (if that's what you
might ever require). The disadvantage is that the text uses up more storage
than its equivalent binary form. How much data do you have?

The problem with (a) is that it isn't applicable to real types. Nor will it
work if your compiler is an Ada 83 (rather than 95) compiler.

Power to your ulna.

-------------------------------------------
Nick Roberts
-------------------------------------------








* Re: Big-endian vs little-endian
  1999-02-02  0:00 ` Nick Roberts
@ 1999-02-03  0:00   ` Mark A Biggar
  1999-02-06  0:00     ` Samuel T. Harris
  1999-02-04  0:00   ` Richard D Riehle
  1999-02-06  0:00   ` Mike Werner
  2 siblings, 1 reply; 11+ messages in thread
From: Mark A Biggar @ 1999-02-03  0:00 UTC (permalink / raw)

Nick Roberts wrote:

> (b) use Text_IO instead of Sequential_IO, and input and output the data in
> the form of text.
> 
> The advantage of (b) is that text is the most universal data format: non-Ada
> programs will (almost always) be able to use the data (if that's what you
> might ever require). The disadvantage is that the text uses up more storage
> than its equivalent binary form. How much data do you have?

umm..  How many times have you actually coded this up both ways and compared?
Almost every time I have tried this the text version of the data was smaller
than the binary version, especially if you have variable sized data.  The
only cases where the text was bigger involved data that consisted of large
amounts of high precision floats and even then the text was only about twice
as big.  Even then, usually the advantages of portability and human readability
of the text format outweigh the small space savings of binary data formats.

--
Mark Biggar
mark.a.biggar@lmco.com


* Re: Big-endian vs little-endian
  1999-02-02  0:00 ` Nick Roberts
  1999-02-03  0:00   ` Mark A Biggar
@ 1999-02-04  0:00   ` Richard D Riehle
  1999-02-06  0:00   ` Mike Werner
  2 siblings, 0 replies; 11+ messages in thread
From: Richard D Riehle @ 1999-02-04  0:00 UTC (permalink / raw)

In article <7982p9$nll$3@plug.news.pipex.net>,
	"Nick Roberts" <Nick.Roberts@dial.pipex.com> wrote:

>I can think of two possible solutions:
>
>(a) [ snipped ]

>(b) use Text_IO instead of Sequential_IO, and input and output the data in
>the form of text.
>
>The advantage of (b) is that text is the most universal data format: 

 The Text_IO solution is especially useful when converting floating point
 from one machine to floating point on another.  For example, where is the
 sign bit on a VAX 32 floating point number?  You'd be surprised!  We were
 converting VAX floating point to IBM mainframe floating point.  People 
 came up with all sorts of algorithmic solutions.  The best solution was
 to write the VAX numbers to a text file and read the text file back
 to the IBM.  No fuss. No muss. No algorithmic gymnastics.  
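
 The text round trip described here can be sketched roughly as follows.
 This is only an illustration, not the code used in that conversion: the
 file name and the three sample values are made up, and the reader is
 assumed to be a separate run (possibly on a different machine) that is
 folded into the same procedure here for brevity.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;
with Ada.Float_Text_IO;

procedure Text_Round_Trip is
   use Ada.Text_IO;

   File_Name : constant String := "durations.txt";  --  hypothetical file name

   F : File_Type;

   --  Values as the writing machine holds them (sample data only).
   Minutes_Out : constant Integer := 12;
   Seconds_Out : constant Integer := 34;
   Reading_Out : constant Float   := 98.6;

   --  Values as the reading machine recovers them.
   Minutes_In : Integer;
   Seconds_In : Integer;
   Reading_In : Float;
begin
   --  Writer side: every number is stored as decimal text, so neither the
   --  byte order nor the floating point format of the writer matters.
   Create (F, Out_File, File_Name);
   Ada.Integer_Text_IO.Put (F, Minutes_Out);
   Ada.Integer_Text_IO.Put (F, Seconds_Out);
   Ada.Float_Text_IO.Put (F, Reading_Out);
   New_Line (F);
   Close (F);

   --  Reader side: Get skips leading blanks and parses the literals into
   --  whatever native representation the reading machine uses.
   Open (F, In_File, File_Name);
   Ada.Integer_Text_IO.Get (F, Minutes_In);
   Ada.Integer_Text_IO.Get (F, Seconds_In);
   Ada.Float_Text_IO.Get (F, Reading_In);
   Close (F);

   Put_Line ("Minutes:" & Integer'Image (Minutes_In));
   Put_Line ("Seconds:" & Integer'Image (Seconds_In));
   Put_Line ("Reading:" & Float'Image (Reading_In));
end Text_Round_Trip;
```

 One caveat: a floating point value survives the trip only to the number
 of digits that were written, so the Aft parameter of Put may need to be
 raised if full precision matters.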

 Richard Riehle
 richard@adaworks.com
 http://www.adaworks.com


* Re: Big-endian vs little-endian
  1999-02-02  0:00 ` Nick Roberts
  1999-02-03  0:00   ` Mark A Biggar
  1999-02-04  0:00   ` Richard D Riehle
@ 1999-02-06  0:00   ` Mike Werner
  1999-02-07  0:00     ` Matthew Heaney
                       ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: Mike Werner @ 1999-02-06  0:00 UTC (permalink / raw)


Nick Roberts wrote:
> 
> I can think of two possible solutions:
> 
> (a) declare a type derived from Interfaces.Integer_8/16/32 etc. (RM95 B.2),
> and then apply a Bit_Order representation clause (RM95 13.5.3) to this type;
> 
> (b) use Text_IO instead of Sequential_IO, and input and output the data in
> the form of text.

I wish I could use (b) - unfortunately this program was for class and we
had to use the data file provided by the instructor.  I looked at RM
13.5.3 as pointed out in (a) but really did not understand it.  I'm very
new at Ada, and that LRM is quite a ways over my head.  But I'll take a
stab and see if I've got the general idea.

Here's the relevant data structure:

   type Sys_type is (Zarya, Unity, PMA1, PMA2);
   type Subsys_type is (CDH, CT, ECLSS, EPS, GNC, SM);
   subtype Desc_type is String(1..256);
   subtype Dur_Min_Type is Integer;
   subtype Dur_Sec_type is Integer;
   type Apm_Rec is 
      record
	 Description : Desc_Type;      
	 System : Sys_Type;
	 Subsystem : Subsys_Type;
	 Dur_Min : Dur_Min_Type;
	 Dur_Sec : Dur_Sec_Type;  
      end record;

The problematic parts were the Apm_Rec.Dur_Min and the Apm_Rec.Dur_Sec -
all the others read in just fine.  If I'm understanding all this, should
I have changed

   subtype Dur_Min_Type is Integer;
   subtype Dur_Sec_type is Integer;

to

   subtype Dur_Min_Type is Integer(S'Bit_Order=>Low_Order_First);
   subtype Dur_Sec_type is Integer(S'Bit_Order=>Low_Order_First);

or perhaps High_Order_First - haven't got everything handy at the
moment.  But the main question is do I have the right syntax and usage? 
Or am I completely off here?

Really the main reason I'm concerned with this is I much prefer to do my
assignments on my own computer as opposed to telnetting into the
school's server - that telnet connection lags badly enough that most any
task requires much patience.  If it weren't for that, I probably
wouldn't worry about it.  I can work around this for future projects
(hopefully) the same way I did for this project - I'm just looking for
an easier way.  And I do appreciate all the pointers so far - I'm
reading the LRM as I have the opportunity.  Just the slight problem of
most of it being well beyond my knowledge level.  But I'm working on it.
-- 
Mike Werner  KA8YSD           |  "Where do you want to go today?"
ICQ# 12934898                 |  "As far from Redmond as possible!"
'91 GS500E                    |
Morgantown WV                 |


* Re: Big-endian vs little-endian
  1999-02-03  0:00   ` Mark A Biggar
@ 1999-02-06  0:00     ` Samuel T. Harris
  1999-02-08  0:00       ` dennison
  0 siblings, 1 reply; 11+ messages in thread
From: Samuel T. Harris @ 1999-02-06  0:00 UTC (permalink / raw)

Mark A Biggar wrote:
> 
> Nick Roberts wrote:
> 
> > (b) use Text_IO instead of Sequential_IO, and input and output the data in
> > the form of text.
> >
> > The advantage of (b) is that text is the most universal data format: non-Ada
> > programs will (almost always) be able to use the data (if that's what you
> > might ever require). The disadvantage is that the text uses up more storage
> > than its equivalent binary form. How much data do you have?
> 
> umm..  How many times have you actually coded this up both ways and compared?
> Almost every time I have tried this the text version of the data was smaller
> than the binary version, especially if you have variable sized data.  The
> only cases where the text was bigger involved data that consisted of large
> amounts of high precision floats and even then the text was only about twice
> as big.  Even then, usually the advantages of portability and human readability
> of the text format outweigh the small space savings of binary data formats.
> 
> --
> Mark Biggar
> mark.a.biggar@lmco.com

As Technical Lead on an Air Force major command and control system,
our initial implementation used textual representations for all
the messaging between the distributed workstations and the central
server. This got us a working product much faster than dealing
with binary representations since the workstation and the central
server hardware were so contrary to each other. This also provided
easy network debugging with a simple sniffer/snooper (which was
also a security concern). Since then, I have always advocated
producing width, image, and value functions for all important
data types. In fact, I have generics which produce these functions
for arrays (trivial) and records (almost trivial) so the overhead
for developing these functions is insignificant. And they do come
in handy when a little text_io based debugging instrumentation
is needed. A simple put_line(image(whatever)); is always available.

-- 
Samuel T. Harris, Principal Engineer
Raytheon, Scientific and Technical Systems
"If you can make it, We can fake it!"


* Re: Big-endian vs little-endian
  1999-02-06  0:00   ` Mike Werner
@ 1999-02-07  0:00     ` Matthew Heaney
  1999-02-09  0:00     ` Stephen Leake
  1999-02-10  0:00     ` Mike Werner
  2 siblings, 0 replies; 11+ messages in thread
From: Matthew Heaney @ 1999-02-07  0:00 UTC (permalink / raw)


Mike Werner <mwerner@wvu.edu> writes:

> Here's the relevant data structure:
> 
>    type Sys_type is (Zarya, Unity, PMA1, PMA2);
>    type Subsys_type is (CDH, CT, ECLSS, EPS, GNC, SM);
>    subtype Desc_type is String(1..256);
>    subtype Dur_Min_Type is Integer;
>    subtype Dur_Sec_type is Integer;
>    type Apm_Rec is 
>       record
> 	 Description : Desc_Type;      
> 	 System : Sys_Type;
> 	 Subsystem : Subsys_Type;
> 	 Dur_Min : Dur_Min_Type;
> 	 Dur_Sec : Dur_Sec_Type;  
>       end record;
> 
> The problematic parts were the Apm_Rec.Dur_Min and the Apm_Rec.Dur_Sec -
> all the others read in just fine.  If I'm understanding all this, should
> I have changed
> 
>    subtype Dur_Min_Type is Integer;
>    subtype Dur_Sec_type is Integer;
> 
> to
> 
>    subtype Dur_Min_Type is Integer(S'Bit_Order=>Low_Order_First);
>    subtype Dur_Sec_type is Integer(S'Bit_Order=>Low_Order_First);
> 
> or perhaps High_Order_First - haven't got everything handy at the
> moment.  But the main question is do I have the right syntax and usage? 
> Or am I completely off here?


Yes, you are completely off.

Don't bother with the Bit_Order attribute.  No compiler vendors support
it.

That leads us to this:

    type Sys_type is (Zarya, Unity, PMA1, PMA2);

    type Subsys_type is (CDH, CT, ECLSS, EPS, GNC, SM);

    type Apm_Rec is 
       record
 	 Description : String (1 .. 256);
 	 System      : Sys_Type;
 	 Subsystem   : Subsys_Type;
 	 Dur_Min     : Integer range 0 .. 59;
 	 Dur_Sec     : Integer range 0 .. 59;  
       end record;

There are two advantages to this:

1) we can pack the last four fields in one longword

2) the integer types fit in 1 byte, so we don't have to worry about
   byte-swapping

I think you should now be able to write a standard rep clause for this
record ("for Apm_Rec use ..."), that will work for both big- and
little-endian machines.  (Because the latter fields are 1 byte, and the
representation of one-byte data on either machine is the same.)

If you do decide to use 32-bit integers for Dur_Min and Dur_Sec, you
still don't have a rep clause problem.  But you do have a byte-swapping
problem.
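
A rep clause along those lines might look like the sketch below. It is an
assumption-laden illustration rather than a tested layout: it assumes
System.Storage_Unit is 8, the package name Apm_Layout and the type
Duration_Part are invented here, and the resulting 260-byte layout would
only match a file that was written with the same declarations (not the
instructor's file, which stored full Integers).

```ada
package Apm_Layout is

   type Sys_Type is (Zarya, Unity, PMA1, PMA2);
   for Sys_Type'Size use 8;

   type Subsys_Type is (CDH, CT, ECLSS, EPS, GNC, SM);
   for Subsys_Type'Size use 8;

   --  Hypothetical one-byte type for both minutes and seconds.
   type Duration_Part is range 0 .. 59;
   for Duration_Part'Size use 8;

   type Apm_Rec is record
      Description : String (1 .. 256);
      System      : Sys_Type;
      Subsystem   : Subsys_Type;
      Dur_Min     : Duration_Part;
      Dur_Sec     : Duration_Part;
   end record;

   --  Every component occupies whole bytes and none is wider than one
   --  byte except the string, so the layout does not depend on the byte
   --  order of the machine.
   for Apm_Rec use record
      Description at   0 range 0 .. 256 * 8 - 1;
      System      at 256 range 0 .. 7;
      Subsystem   at 257 range 0 .. 7;
      Dur_Min     at 258 range 0 .. 7;
      Dur_Sec     at 259 range 0 .. 7;
   end record;

   for Apm_Rec'Size use 260 * 8;

end Apm_Layout;
```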












* Re: Big-endian vs little-endian
  1999-02-08  0:00       ` dennison
@ 1999-02-08  0:00         ` Samuel T. Harris
  0 siblings, 0 replies; 11+ messages in thread
From: Samuel T. Harris @ 1999-02-08  0:00 UTC (permalink / raw)

dennison@telepath.com wrote:
> 
> In article <36BD02DB.737849EE@hso.link.com>,
>   "Samuel T. Harris" <sam_harris@hso.link.com> wrote:
> 
> > also a security concern). Since then, I have always advocated
> > producing width, image, and value functions for all important
> > data types. In fact, I have generics which produce these functions
> > for arrays (trivial) and records (almost trivial) so the overhead
> > for developing these functions is insignificant. And they do come
> > in handy when a little text_io based debugging instrumentation
> > is needed. A simple put_line(image(whatever)); is always available.
> 
> Cool idea! But what method do you use to make generation of records "almost
> trivial"? And how do you handle pointers?
> 

Glad you asked.

Given the requirements that all images look like Ada aggregates,
supporting format options to specify the optional "decorations"
such as qualification and positional vs named notation (which I'll
justify later on in this message), then you have the following
needs for the generics.

A generic for producing width, image, and value functions for
arrays needs to know the type name as a string for qualification,
the width, image, and value functions for the index type
(usually readily available from the appropriate attributes),
the width, image, and value functions for the component type
(possibly made available from a previous instantiation of
one of these generics). I believe we all can see the trivial
nature of the width and image functions. The value function
is not so trivial, having to support an optional qualification
and having to deal with named and positional notation. But
it's not too difficult once a little effort is put into it.
Of course, an initial version of these generics can be
limited to support positional notation only since this
greatly simplifies the value function.
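
To make the array case concrete, here is a minimal sketch of just the
image half, in positional notation only. It is not the generic described
above: the name Array_Image and its formal parameters are invented for
illustration, and the width and value functions, qualification, and
named notation are all left out.

```ada
generic
   type Index_Type is (<>);
   type Component_Type is private;
   type Array_Type is array (Index_Type range <>) of Component_Type;
   --  Image function for the components, for example one produced by a
   --  previous instantiation or a wrapper around an 'Image attribute.
   with function Image (Item : Component_Type) return String;
function Array_Image (Item : Array_Type) return String;

function Array_Image (Item : Array_Type) return String is

   --  Image of the components From .. Item'Last, separated by commas.
   function Join (From : Index_Type) return String is
   begin
      if From = Item'Last then
         return Image (Item (From));
      else
         return Image (Item (From)) & ", " & Join (Index_Type'Succ (From));
      end if;
   end Join;

begin
   if Item'Length = 0 then
      --  Simplification: Ada has no positional notation for an empty
      --  array, so this text is not a legal aggregate.
      return "()";
   end if;

   --  Simplification: a one-component result such as "(5)" is not a
   --  legal positional aggregate either; named notation would be needed.
   return "(" & Join (Item'First) & ")";
end Array_Image;
```

An instantiation would supply the array type together with an image
function for its component type; the two edge cases flagged in the
comments are where named notation starts to earn its keep.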

A generic for producing width, image, and value functions for
records is a little more troublesome. The array generic can
directly use the index to get the component. The record generic
cannot. So, you have to provide extra "helper" functions in the
form of field-level width, image, and value functions. The record
generic needs the type name as a string for qualification,
an enumerated type for all the discriminants and record fields,
the width, image, and value functions for this field-naming type,
and a width and image function which takes a record object
and field_name and produces the appropriate result (similar
to the component functions of the array generic). It is the
value function which is different. It has to be a procedure taking
an in out record object and a field_name with the objective
of filling the appropriate field with the given string.
Each of the helper subprograms uses a simple case statement
to call the appropriate width, image, value for the field
identified. Variant records are no problem since the generic
functions run through all the field names. It is up to the
field-level helper subprogram to either perform an action
or do nothing for fields not in the variant in use.

As far as pointers are concerned, they are output as
an allocation by the component (for arrays) or field-level
(for records) subprogram. One may envision a parallel
to the image function called debug which outputs the
pointer itself in some appropriate format so the reader
can track the actual pointers themselves. This is useful
not only for debugging, but also useful when the array
of pointers reuses the same pointer in several slots.
The usage of allocator notation in the image does not
reflect that property. OTOH, outputting the pointer itself
with its dereference does not satisfy keeping the
image compliant with Ada aggregate notation, so I usually
use a separate subprogram for this or provide options to
image to control the output format.

Keeping image compliant with Ada aggregate notation is
important when you consider code which has large aggregates
initializing complex data structures. With a conforming image/value
function pair in place, you can copy the aggregate text
to a file and use the enclosing package elaboration to do
text_io on the file to initialize the data structure with
the image of the file contents. While this will slow down
elaboration, this does allow you to change the data structure
without recompiling and relinking the program. I find this
very powerful during development. Once the initial data is
tested and locked down, you can paste it back into the
declaration of the object and comment out the text_io code
in the package elaboration.

-- 
Samuel T. Harris, Principal Engineer
Raytheon, Scientific and Technical Systems
"If you can make it, We can fake it!"


* Re: Big-endian vs little-endian
  1999-02-06  0:00     ` Samuel T. Harris
@ 1999-02-08  0:00       ` dennison
  1999-02-08  0:00         ` Samuel T. Harris
  0 siblings, 1 reply; 11+ messages in thread
From: dennison @ 1999-02-08  0:00 UTC (permalink / raw)


In article <36BD02DB.737849EE@hso.link.com>,
  "Samuel T. Harris" <sam_harris@hso.link.com> wrote:

> also a security concern). Since then, I have always advocated
> producing width, image, and value functions for all important
> data types. In fact, I have generics which produce these functions
> for arrays (trivial) and records (almost trivial) so the overhead
> for developing these functions is insignificant. And they do come
> in handy when a little text_io based debugging instrumentation
> is needed. A simple put_line(image(whatever)); is always available.

Cool idea! But what method do you use to make generation of records "almost
trivial"? And how do you handle pointers?

T.E.D.


* Re: Big-endian vs little-endian
  1999-02-06  0:00   ` Mike Werner
  1999-02-07  0:00     ` Matthew Heaney
@ 1999-02-09  0:00     ` Stephen Leake
  1999-02-10  0:00     ` Mike Werner
  2 siblings, 0 replies; 11+ messages in thread
From: Stephen Leake @ 1999-02-09  0:00 UTC (permalink / raw)

Mike Werner <mwerner@wvu.edu> writes:

> Here's the relevant data structure:
> 
>    type Sys_type is (Zarya, Unity, PMA1, PMA2);
>    type Subsys_type is (CDH, CT, ECLSS, EPS, GNC, SM);
>    subtype Desc_type is String(1..256);
>    subtype Dur_Min_Type is Integer;
>    subtype Dur_Sec_type is Integer;
>    type Apm_Rec is 
>       record
> 	 Description : Desc_Type;      
> 	 System : Sys_Type;
> 	 Subsystem : Subsys_Type;
> 	 Dur_Min : Dur_Min_Type;
> 	 Dur_Sec : Dur_Sec_Type;  
>       end record;
> 
> The problematic parts were the Apm_Rec.Dur_Min and the Apm_Rec.Dur_Sec -
> all the others read in just fine.

You have a byte-endianness problem. System.Bit_Order addresses a
bit-endianness problem. They are similar, but different.
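
For reference, here is a small sketch of what Bit_Order does control. In
Ada 95 the attribute is specified for a record type (RM95 13.5.3), not
for an integer subtype, and it only fixes how the bit positions in that
record's representation clause are numbered. The package and type names
below are made up, and 8-bit storage elements are assumed.

```ada
with System;

package Bit_Order_Sketch is

   type Flags is record
      Ready : Boolean;
      Code  : Integer range 0 .. 127;
   end record;

   --  Bit 0 in the clause below now means the least significant bit of
   --  the storage element, whatever the machine's default numbering is.
   for Flags'Bit_Order use System.Low_Order_First;

   for Flags use record
      Ready at 0 range 0 .. 0;
      Code  at 0 range 1 .. 7;
   end record;

   for Flags'Size use 8;

end Bit_Order_Sketch;
```

Nothing in this affects the order of the bytes inside a multi-byte
component such as a 32-bit Integer, which is why it does not help with
the data file in question.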

The best Ada solution is to use streams to read the binary file. You
have to define your own Integer type (you should do this anyway, to
make sure it is the same size as the school's server Integer type!).
Then you can define the stream read and write functions to do byte
swapping.
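
A rough sketch of that stream approach follows, with several assumptions
stated up front: the file stores the values as 32-bit big-endian
integers, stream elements are 8 bits wide, and the names Big_Endian_Ints
and BE_Integer_32 are invented here. The spec and body are shown
together and would normally live in separate files.

```ada
with Ada.Streams;
with Interfaces;

package Big_Endian_Ints is

   --  Hypothetical 32-bit integer that is always big-endian in a stream.
   type BE_Integer_32 is new Interfaces.Integer_32;

   procedure Read
     (Stream : access Ada.Streams.Root_Stream_Type'Class;
      Item   : out BE_Integer_32);

   procedure Write
     (Stream : access Ada.Streams.Root_Stream_Type'Class;
      Item   : in BE_Integer_32);

   for BE_Integer_32'Read use Read;
   for BE_Integer_32'Write use Write;

end Big_Endian_Ints;

with Ada.IO_Exceptions;
with Ada.Unchecked_Conversion;

package body Big_Endian_Ints is
   use Ada.Streams;
   use type Interfaces.Unsigned_32;

   function To_Unsigned is new Ada.Unchecked_Conversion
     (BE_Integer_32, Interfaces.Unsigned_32);
   function To_Signed is new Ada.Unchecked_Conversion
     (Interfaces.Unsigned_32, BE_Integer_32);

   procedure Read
     (Stream : access Root_Stream_Type'Class;
      Item   : out BE_Integer_32)
   is
      Buffer : Stream_Element_Array (1 .. 4);
      Last   : Stream_Element_Offset;
      Value  : Interfaces.Unsigned_32 := 0;
   begin
      Ada.Streams.Read (Stream.all, Buffer, Last);
      if Last /= Buffer'Last then
         raise Ada.IO_Exceptions.End_Error;  --  fewer than 4 bytes left
      end if;
      --  Most significant byte comes first in the file.
      for I in Buffer'Range loop
         Value := Interfaces.Shift_Left (Value, 8)
                    or Interfaces.Unsigned_32 (Buffer (I));
      end loop;
      Item := To_Signed (Value);
   end Read;

   procedure Write
     (Stream : access Root_Stream_Type'Class;
      Item   : in BE_Integer_32)
   is
      Value  : Interfaces.Unsigned_32 := To_Unsigned (Item);
      Buffer : Stream_Element_Array (1 .. 4);
   begin
      --  Fill the buffer from the last byte backwards so that the most
      --  significant byte ends up first.
      for I in reverse Buffer'Range loop
         Buffer (I) := Stream_Element (Value and 16#FF#);
         Value := Interfaces.Shift_Right (Value, 8);
      end loop;
      Ada.Streams.Write (Stream.all, Buffer);
   end Write;

end Big_Endian_Ints;
```

Because the value is assembled with shifts rather than by overlaying
memory, the same code should behave identically on big- and
little-endian hosts. Note that the layout of a file written by
Sequential_IO is implementation-defined, so the real file would need to
be inspected before reading it this way through Ada.Streams.Stream_IO.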

If you're not up to streams (quite understandable :), you can just use
Unchecked_Conversion. Assuming 32 bit integers, do something like:

type Network_4_Bytes is record
    Hi_Byte : Interfaces.Unsigned_8;
    Byte_3 : Interfaces.Unsigned_8;
    Byte_2 : Interfaces.Unsigned_8;
    Low_Byte : Interfaces.Unsigned_8;
end record;
pragma Pack (Network_4_Bytes);
for Network_4_Bytes'size use 32; -- confirm size

function To_Network is new Unchecked_conversion
    (Source => Interfaces.Integer_32,
     Target => Network_4_Bytes);

Of course, to make your code portable to your school computer, you'll
have to hide this in a body, and set a compile-time flag to decide
whether to swap bytes or not. I define a package Endianness to handle
the compile-time flag.
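
One way the swap itself might be filled in is sketched below. It is not
the Endianness package mentioned above: the names Byte_Swap, Swap, and
From_Big_Endian are hypothetical, an array is used in place of the
four-field record, and the host byte order is probed once at elaboration
instead of being set by a compile-time flag.

```ada
with Interfaces;

package Byte_Swap is

   --  Reverse the four bytes of Value unconditionally.
   function Swap
     (Value : Interfaces.Integer_32) return Interfaces.Integer_32;

   --  Interpret Value as raw data read from a big-endian file.
   function From_Big_Endian
     (Value : Interfaces.Integer_32) return Interfaces.Integer_32;

end Byte_Swap;

with Ada.Unchecked_Conversion;

package body Byte_Swap is
   use type Interfaces.Unsigned_8;

   type Four_Bytes is array (1 .. 4) of Interfaces.Unsigned_8;
   pragma Pack (Four_Bytes);
   for Four_Bytes'Size use 32;  --  confirm size

   function To_Bytes is new Ada.Unchecked_Conversion
     (Source => Interfaces.Integer_32, Target => Four_Bytes);

   function To_Integer is new Ada.Unchecked_Conversion
     (Source => Four_Bytes, Target => Interfaces.Integer_32);

   --  On a little-endian host the low-order byte of 1 is stored first.
   Probe                 : constant Four_Bytes := To_Bytes (1);
   Host_Is_Little_Endian : constant Boolean    := Probe (1) = 1;

   function Swap
     (Value : Interfaces.Integer_32) return Interfaces.Integer_32
   is
      Input : constant Four_Bytes := To_Bytes (Value);
   begin
      return To_Integer ((Input (4), Input (3), Input (2), Input (1)));
   end Swap;

   function From_Big_Endian
     (Value : Interfaces.Integer_32) return Interfaces.Integer_32 is
   begin
      if Host_Is_Little_Endian then
         return Swap (Value);
      else
         return Value;
      end if;
   end From_Big_Endian;

end Byte_Swap;
```

With something like this hidden in a body, the same source could be
compiled unchanged on the PC and on the school's server.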

Good luck!

-- Stephe


* Re: Big-endian vs little-endian
  1999-02-06  0:00   ` Mike Werner
  1999-02-07  0:00     ` Matthew Heaney
  1999-02-09  0:00     ` Stephen Leake
@ 1999-02-10  0:00     ` Mike Werner
  2 siblings, 0 replies; 11+ messages in thread
From: Mike Werner @ 1999-02-10  0:00 UTC (permalink / raw)


Thanks for all the tips and pointers so far.  I spoke with my instructor today 
and it appears we may not be using binary data files for a while, so I've got 
some time to try and figure out what some of the tips I've received mean. ;)

If I haven't figured it out by then, well I guess I'll try the kludge I used 
last time.  It wasn't pretty, but it worked.  My questions here were more out 
of puzzlement than anything else.
-- 
Mike Werner  KA8YSD           |  "Where do you want to go today?"
ICQ# 12934898                 |  "As far from Redmond as possible!"
'91 GS500E                    |
Morgantown WV                 |


Thread overview: 11+ messages
1999-01-29  0:00 Big-endian vs little-endian Mike Werner
1999-02-02  0:00 ` Nick Roberts
1999-02-03  0:00   ` Mark A Biggar
1999-02-06  0:00     ` Samuel T. Harris
1999-02-08  0:00       ` dennison
1999-02-08  0:00         ` Samuel T. Harris
1999-02-04  0:00   ` Richard D Riehle
1999-02-06  0:00   ` Mike Werner
1999-02-07  0:00     ` Matthew Heaney
1999-02-09  0:00     ` Stephen Leake
1999-02-10  0:00     ` Mike Werner
