What is a byte?

comp.lang.ada

* What is a byte?
From: Victor Porton @ 2014-07-28 19:09 UTC (permalink / raw)


When I need to pass a byte to a C function, which Ada type should I use?

-- 
Victor Porton - http://portonvictor.org


* Re: What is a byte?
From: Dan'l Miller @ 2014-07-28 19:48 UTC (permalink / raw)


On Monday, July 28, 2014 2:09:17 PM UTC-5, Victor Porton wrote:
> When I need to pass a byte to a C function, which Ada type should I use?

That depends on the hardware processor for which the C compiler is generating code.  The vast majority of modern processors have 8-bit bytes, but FPGAs are typically configurable to have 9-bit bytes with all 9 bits usable to represent 0 to 511 or -256 to 255 (or to have 8-bit bytes plus one parity bit, to represent 0 to 255 or -128 to 127) and many DSPs have 32-bit bytes.  Ultimately, a byte is the number of bits that have been traversed when a char* (or void*) is incremented by one.  Also, you must consider whether the processor uses two's complement for negative numbers (e.g., -128 to 127) or sign-magnitude (e.g., -127 to -0 and +0 to +127, where typically C is oblivious to -0 on sign-magnitude processors).
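
A minimal C sketch of the "bits traversed per char* increment" definition, assuming a hosted C11 compiler. The function names are invented for illustration. ISO C does not define arithmetic on void* (GCC accepts it as an extension), so the sketch steps a pointer to unsigned char instead:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* By definition, every character type occupies exactly one C byte. */
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");

/* Count how many single-byte pointer increments it takes to walk
   from the start of an int object to one past its end. */
size_t char_steps_across_int(void)
{
    int object = 0;
    const unsigned char *p   = (const unsigned char *)&object;
    const unsigned char *end = (const unsigned char *)(&object + 1);
    size_t steps = 0;

    while (p != end) {
        ++p;        /* advances by exactly one C byte, i.e. CHAR_BIT bits */
        ++steps;
    }
    return steps;
}

/* Total bits of storage in an int: steps times bits per step. */
size_t bits_in_int_storage(void)
{
    return char_steps_across_int() * (size_t)CHAR_BIT;
}
```

On an 8-bit-byte machine each step covers 8 bits; on a DSP with 32-bit bytes the same code would report fewer, wider steps.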

Of notable historical interest:
a) The MIT/GE/Honeywell Multics machines had 9-bit bytes.
b) The Control Data mainframes had 6-bit bytes.
c) The Prime 50 Series natively had 16-bit bytes (which Prime strictly called words, not bytes), using C's definition of how many bits are traversed when incrementing a void*.  But the Prime 50 Series was one of the rare processors that also had bit-pointers in addition to word-pointers, so conceivably it could have had any number of bits per increment of char* or void* (e.g., the language designer's choice of 1-bit bytes through 16-bit bytes).  Prime 50 Series C compilers therefore had a mode that used a particular configuration of bit-pointers to supplement word-pointers to emulate 8-bit bytes, but this caused C pointers to be bigger than a C int, which violated the de facto idiom back then.  Hence, the C community on the Prime 50 Series was split into 2 camps:
c.1) those shops that demanded pointers be the same size as integers (and forgo the use of slower-executing bit-pointers to emulate 8-bit bytes in C)
versus
c.2) those shops that demanded 8-bit bytes (and forgo the use of int as interchangeable with pointer in C).


* Re: What is a byte?
From: Dan'l Miller @ 2014-07-28 20:05 UTC (permalink / raw)

On Monday, July 28, 2014 2:48:16 PM UTC-5, Dan'l Miller wrote:
> On Monday, July 28, 2014 2:09:17 PM UTC-5, Victor Porton wrote:
> 
> > When I need to pass a byte to a C function, which Ada type should I use?

What you would *really* want in Ada is a multistage Ada202X compiler that (at stage N) generates some Convention C code, executes it to learn about the characteristics of the target C compiler, then generates the appropriate subtypes in corresponding stage-N+1 Ada source code, and finally compiles that stage N+1, which is where the app-domain Ada source code lives (i.e., where today's era of Ada source code is).

Short of that state-of-the-art reflection (in e.g. multi-stage OCaml) to reflect the execution environment back into source code, the best that you can do is:
0) forgo extreme portability entirely:  choose to support 8-bit bytes only with no Plan B and no apologies;
1) use a preprocessor (e.g., gnatprep; m4) to choose different Ada subtypes via conditional compilation;
2) use a child package for each different bit-size of byte:  one child package for 8-bit bytes, one for FPGAs' 9-bit bytes, and one for DSPs' 32-bit bytes;
3) use variant records or generics to force app-domain programmers to specify 8 or 9 or 32 in app-domain code;
4) by far the worst of all:  evolve Ada202X to duplicate C++'s template metaprogramming, a poor man's functional-programming language, to perform arcane contortions that achieve a limited amount of reflection with extremely verbose syntax.

* Re: What is a byte?
From: Simon Wright @ 2014-07-28 21:15 UTC (permalink / raw)


Victor Porton <porton@narod.ru> writes:

> When I need to pass a byte to a C function, which Ada type should I
> use?

I'd have thought you'd use signed_char or unsigned_char, from
Interfaces.C, depending on what the called function expects.

* Re: What is a byte?
From: Randy Brukardt @ 2014-07-28 22:38 UTC (permalink / raw)


"Dan'l Miller" <optikos@verizon.net> wrote in message 
news:90305214-9a02-4684-8521-847e8ba38f79@googlegroups.com...
>of notable historical interest:
>a) The MIT/GE/Honeywell Multics machines had 9-bit bytes.
>b) The Control Data mainframes had 6-bit bytes.
>c) Prime 50...

You missed the Unisys (nee Univac, Sperry, etc.) mainframes, which I've 
worked on at various points in my career. (The very first version of what 
became Janus/Ada ran on a Unisys mainframe at the University of Wisconsin. 
Later, we created a version of Janus/Ada 95 for those mainframes as a 
project with Unisys.)

They originally had 6-bit bytes; later versions also had 9-bit bytes (so 
ASCII characters would fit) -- that's the versions that we built compilers 
for. The C-compilers assumed 9-bit byte addressing, but the machine only 
addressed words directly -- "C pointers" had a byte offset to go with the 
machine address, while "Ada pointers" were just raw machine addresses (to 
36-bit words). The latter was more efficient, of course.

Interestingly, the possibility of access types with different 
representations had never been adequately supported by the Ada standard; 
conversions between different access types designating the same type were 
always assumed to work (something that was never true in any Janus/Ada 95 
compiler; 80x86 versions supported both segmented and offset-only access 
types). That was just fixed at the last ARG meeting (AI12-0074-1).

                                        Randy.



* Re: What is a byte?
From: Jeffrey Carter @ 2014-07-29 10:53 UTC (permalink / raw)


On 07/28/2014 12:09 PM, Victor Porton wrote:
> When I need to pass a byte to a C function, which Ada type should I use?

"Byte" isn't a C concept. Generally what others call a byte is called "char" in 
C. So you should use whatever type in Interfaces.C(.*) corresponds to the C type 
used by the C function.

-- 
Jeff Carter
"My dear Mrs. Hemoglobin, when I first saw you, I
was so enamored with your beauty I ran to the basket,
jumped in, went down to the city, and bought myself a
wedding outfit."
Never Give a Sucker an Even Break
111


* Re: What is a byte?
From: Dan'l Miller @ 2014-07-29 12:26 UTC (permalink / raw)

On Tuesday, July 29, 2014 5:53:54 AM UTC-5, Jeffrey Carter wrote:
> On 07/28/2014 12:09 PM, Victor Porton wrote:
> > When I need to pass a byte to a C function, which Ada type should I use?
>
> "Byte" isn't a C concept. Generally what others call a byte is called "char" in 
> C. So you should use whatever type in Interfaces.C(.*) corresponds to the C type 
> used by the C function.

Jeff, you are factually incorrect there (which is why ITU uses the term octet for what you call a byte).  Both C & C++ overload the term byte to mean something quite altered from the historical/customary definition, which is how they end up with 16-bit bytes on Prime 50 Series and 32-bit bytes on many modern DSPs.  The main difference is:  "It shall be possible to express the address of each individual byte of an object uniquely."  In effect, this portion of the definition forces a byte to be the quantity of bits traversed by incrementing a void* by one.  (It is dominant over "basic character set" in this era in which UTF-16, UCS-2, UTF-32, or UCS-4 Unicode might be taken as the base character set.)  That is how bytes become 32-bit on many DSPs regardless of whether that DSP uses, say, an 8-bit character set.  In such a DSP, the length of an ASCII or UTF-8 string is 32-bit aligned with 0, 8, 16, or 24 bits of padding (typically 1, 2 or 3 ASCII nulls, which might be 1 or 2 more than needed for C's idiom of null-terminated strings) when packing an 8-bit-character string into the DSP's 32-bit bytes, such as when interfacing the DSP to the outside world.

Victor, if you transliterate your specification to say "octet" wherever it says "byte" (following ITU's convention to end this perennial/chronic silly debate over how big a byte is in C), then the answer to your question becomes quite clear:  use an integer of range {0, ..., 255} or {-128, ..., 127} and mask off the upper powers of 2 if any.  Unsigned char will always be at least 8 bits on modern processors, so on some processors a byte might be 16- or 32-bit, but those extra powers of 2 to the left are simply ignored, just as they are in the underlying protocol.  (No specification of any modern interoperable protocol uses the 2^8 and higher bits in a byte if they exist on some arcade hardware.)  Likewise for signed char, respecting two's complement's bias of sign-extension versus sign-magnitude's sign bit.
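
A minimal C sketch of the masking advice, assuming a C11 compiler. The helper names to_octet and octet_to_signed are hypothetical and not part of any library. It keeps only the low 8 bits of an incoming value and, for the signed case, decodes the octet as two's complement arithmetically, so the result does not depend on how the host represents negative numbers:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* UCHAR_MAX */

/* ISO C guarantees that unsigned char can hold any octet value. */
static_assert(UCHAR_MAX >= 255, "unsigned char holds at least 8 bits");

/* Keep the low 8 bits and discard any higher bits, such as the
   extra bits of a 16- or 32-bit byte. Result is in 0..255. */
unsigned to_octet(unsigned value)
{
    return value & 0xFFu;
}

/* Decode an octet as a two's-complement number in -128..127,
   using plain arithmetic rather than the host's sign handling. */
int octet_to_signed(unsigned value)
{
    int v = (int)to_octet(value);
    return v >= 128 ? v - 256 : v;
}
```

For example, to_octet(0x1FF) yields 0xFF and octet_to_signed(0x80) yields -128.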

* Re: What is a byte?
From: Simon Wright @ 2014-07-29 18:40 UTC (permalink / raw)


"Dan'l Miller" <optikos@verizon.net> writes:

> Victor, if you transliterate your specification to say "octet"
> wherever it says "byte" (following ITU's convention to end this
> perennial/chronic silly debate over how big a byte is in C), then the
> answer to your question becomes quite clear: use integer of range {0,
> ..., 255} or {-128, ..., 127} and mask off the upper powers of 2 if
> any, because unsigned char will always be at least 8 bits on modern
> processors, which means that on some processors byte might be 16- or
> 32-bit, but those extra powers of 2 to the left are simply ignored,
> just as they are in the underlying protocol.  (No specification of any
> modern interoperable protocol uses the 2^8 and higher bits in a byte
> if they exist on some arcade hardware.)  Likewise for signed char,
> respecting twos-complement's bias of sign-extension versus
> sign-magnitude's sign bit.

I don't think this is right (not in terms of how the ABI expects a char
to be passed to a function, but in terms of what Victor ought to do).

If the C function takes a char, you should specify
Interfaces.C.char. It's up to the compiler to decide whether that gets
passed in 8 bits, 32 bits, or come to that 9/36 bits on hardware that no
one is remotely likely to encounter (even in an arcade).

* Re: What is a byte?
From: Dan'l Miller @ 2014-07-29 21:15 UTC (permalink / raw)


On Tuesday, July 29, 2014 1:40:56 PM UTC-5, Simon Wright wrote:
> "Dan'l Miller" writes:
> > ...  (No specification of any
> > modern interoperable protocol uses the 2^8 and higher bits in a byte
> > if they exist on some arcade hardware.)
> 
> I don't think this is right (not in terms of how the ABI expects a char
> to be passed to a function, but in terms of what Victor ought to do).
> 
> If the C function takes a char, you should specify
> Interfaces.C.char. It's up to the compiler to decide whether that gets
> passed in 8 bits, 32 bits, or come to that 9/36 bits on hardware that no
> one is remotely likely to encounter (even in an arcade).

That dang spell check has become my worst enema:  arcane, not arcade.

* Re: What is a byte?
From: Simon Clubley @ 2014-07-29 23:08 UTC (permalink / raw)


On 2014-07-29, Dan'l Miller <optikos@verizon.net> wrote:
> On Tuesday, July 29, 2014 1:40:56 PM UTC-5, Simon Wright wrote:
>> "Dan'l Miller" writes:
>> > ...  (No specification of any
>> > modern interoperable protocol uses the 2^8 and higher bits in a byte
>> > if they exist on some arcade hardware.)
>> 
>> I don't think this is right (not in terms of how the ABI expects a char
>> to be passed to a function, but in terms of what Victor ought to do).
>> 
>> If the C function takes a char, you should specify
>> Interfaces.C.char. It's up to the compiler to decide whether that gets
>> passed in 8 bits, 32 bits, or come to that 9/36 bits on hardware that no
>> one is remotely likely to encounter (even in an arcade).
>
> That dang spell check has become my worst enema:  arcane, not arcade.

FYI, someone (a long time ago) actually ported gcc to the PDP-10. :-)

See: http://pdp10.nocrew.org/gcc/

(No Ada port however.)

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world


* Re: What is a byte?
From: Dan'l Miller @ 2014-07-30  4:11 UTC (permalink / raw)

On Tuesday, July 29, 2014 1:40:56 PM UTC-5, Simon Wright wrote:
> "Dan'l Miller" writes:
> If the C function takes a char, you should specify
> Interfaces.C.char.

  As far as the actual invocation of a C library function by Victor's Ada code is concerned, I agree.  But that is only a portion of Victor's situation:
1) A C library may pass char back to Victor's Ada code via return values, out-bound char* parameters, and as in-bound parameters on a call-back function.
2) Victor's Ada code needs to store that char somehow.  He could have a thin binding and store these in-coming (typically-8-bit) values always as Interfaces.C.char or aliases thereof.  Or he could narrow to Ada ranged subtypes if handling the beyond-8-bit cases throughout his code becomes unwieldy.  It is this simplification of his Ada code to the octet meaning to which I was referring.

> It's up to the compiler to decide whether that gets
> passed in 8 bits, 32 bits, or come to that 9/36 bits on hardware that no
> one is remotely likely to encounter (even in an arcade).

Focusing solely on out-going values to char, I agree.  But it is up to Victor as a human-being to decide how much pain to incur at Ada-code design-time to handle the beyond-8-bit varieties of byte on arcane ISAs for in-coming chars.  Most programmers have little patience to pursue oddball definitions of byte for the non-octet cases.  Lack of patience typically implies a greater quantity of bugs for nonoctet bytes.  It was this latter effort-reduction to which I was referring.

* Re: What is a byte?
From: Simon Wright @ 2014-07-30  7:47 UTC (permalink / raw)


Simon Wright <simon@pushface.org> writes:

> If the C function takes a char, you should specify Interfaces.C.char.

Perhaps this should be [un]signed_char, if the numeric values are
important.

The ARM says

   type char is <implementation-defined character type>;

and GNAT has

   type char is new Character;


* Re: What is a byte?
From: Keith Thompson @ 2014-08-02 21:01 UTC (permalink / raw)

Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
> On 07/28/2014 12:09 PM, Victor Porton wrote:
>> When I need to pass a byte to a C function, which Ada type should I use?
>
> "Byte" isn't a C concept. Generally what others call a byte is called
> "char" in C. So you should use whatever type in Interfaces.C(.*)
> corresponds to the C type used by the C function.

Yes, it is.  C has no type called "byte" (unless you define it
yourself), but the C standard defines a "byte" as an "addressable unit
of data storage large enough to hold any member of the basic character
set of the execution environment".

Other wording in the C standard says that the types char, unsigned char,
and signed char are exactly 1 byte in size.  A byte is exactly CHAR_BIT
bits, where CHAR_BIT is a macro defined in <limits.h>, and is required
to be at least 8.  (POSIX further requires CHAR_BIT==8, but of course
not all implementations conform to POSIX.)
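
As a hedged illustration of those guarantees, here is a small C11 sketch. The function names are made up for the example. The static assertions restate what the standard already requires, so they should never fire on a conforming implementation; the functions report what this particular implementation chose:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */

/* Requirements from the C standard, checked at compile time. */
static_assert(CHAR_BIT >= 8, "ISO C requires CHAR_BIT to be at least 8");
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
              sizeof(unsigned char) == 1,
              "each character type is exactly one byte");

/* Returns 1 where a C byte is exactly an octet (as POSIX requires). */
int byte_is_octet(void)
{
    return CHAR_BIT == 8;
}

/* Number of distinct values one C byte can hold: UCHAR_MAX + 1. */
unsigned long byte_value_count(void)
{
    return (unsigned long)UCHAR_MAX + 1ul;
}
```

On a POSIX system byte_is_octet() returns 1 and byte_value_count() returns 256.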

Interfaces.C.CHAR_BIT should have the same value as C's CHAR_BIT,
and Interfaces.C.unsigned_char is probably the best type to use to
hold a single byte (in the sense that C uses the term).

Looking at Interfaces.C in the RM, it's odd that it defines CHAR_BIT,
SCHAR_MIN, SCHAR_MAX, and UCHAR_MAX (all defined in C's <limits.h>),
but doesn't define any of the other constants from that header.  For
example, it has

    type signed_char is range SCHAR_MIN .. SCHAR_MAX;

but

    type int is *implementation-defined*;

rather than

    INT_MIN : constant := *implementation-defined*;
    INT_MAX : constant := *implementation-defined*;
    type int is range INT_MIN .. INT_MAX;

and so on.  (Of course you can just use Interfaces.C.int'Last rather
than INT_MAX, but a bit more consistency would be good.)

Also, in C char, signed char, and unsigned char are all distinct types
(with char having the same range and representation as one of the
others), but Interfaces.C.plain_char is an implementation-defined
subtype.
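
The C side of that claim can be checked with a short C11 sketch. The macro CHAR_KIND and the enum are hypothetical names introduced only for this example. A generic selection may not list two compatible types, so the macro compiles only because char, signed char, and unsigned char are three distinct types:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_MIN */

enum char_kind { KIND_PLAIN, KIND_SIGNED, KIND_UNSIGNED, KIND_OTHER };

/* Select a tag according to the exact type of the argument. */
#define CHAR_KIND(x) _Generic((x),  \
    char:          KIND_PLAIN,      \
    signed char:   KIND_SIGNED,     \
    unsigned char: KIND_UNSIGNED,   \
    default:       KIND_OTHER)

/* Plain char is its own type, whichever range it shares. */
static_assert(CHAR_KIND((char)0) == KIND_PLAIN, "char is a distinct type");

enum char_kind kind_of_plain(void)    { char c = 'a';          return CHAR_KIND(c); }
enum char_kind kind_of_signed(void)   { signed char c = 'a';   return CHAR_KIND(c); }
enum char_kind kind_of_unsigned(void) { unsigned char c = 'a'; return CHAR_KIND(c); }

/* Implementation-defined: does plain char share signed char's range? */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Whether plain_char_is_signed() returns 1 or 0 varies by target and compiler options, which is the implementation-defined choice mentioned above.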

It's also odd that Interfaces.C doesn't mention long long (an integer
type at least 64 bits wide, added to the C standard in 1999), even
though the RM refers to the 1999 ISO C standard as a normative
reference.  (I'm not sure I'm looking at the most current Ada standard;
a newer C standard was published in 2011.)

The most current draft of the 1999 ISO C standard, with the three
Technical Corrigenda merged into it, is:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

The most current public draft of the 2011 ISO C standard (from shortly
before the publication of the final standard) is:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

The actual C standard is available from, for example, ansi.org, but it
costs money.  The free drafts are nearly as good (or, in the case of
n1256, probably better).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
