comp.lang.ada
From: Robert Dewar <dewar@gnat.com>
Subject: Re: Wide_String, Chinese & Japanese text files
Date: 1999/08/21
Message-ID: <7pmcir$l62$1@nnrp1.deja.com>
In-Reply-To: 7pka2j$lnn$1@front2.grolier.fr

In article <7pka2j$lnn$1@front2.grolier.fr>,
  "Thierry Lelegard" <lelegard@club-internet.fr> wrote:
> We are going to need to process, in Ada 95, text files
> containing Chinese and Japanese messages (for i18n purpose).
> I have absolutely no experience in handling this kind of
> files.
> I do not even know the usual format of that kind of text
> files (8 or 16 bits/char).

There is no "usual format"; there are many possible formats.
GNAT supports:

    Upper half coding (sometimes used for Chinese; never, as
    far as I know, for Japanese)

    Shift JIS coding (a common Japanese convention)

    EUC coding (another common Japanese convention)

    UTF-8 coding (an ISO standard; I have never seen it used
    in practice, but it probably is, and will be more so over
    time)

    Brackets coding (a portable ASCII coding, primarily useful
    for standard texts, e.g. the ACVC tests, not used for real
    data information interchange).

    ESC coding (another very simple portable ASCII coding, using
    an ESC character instead of brackets; again, not used for
    real data information interchange).
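
To make the point concrete, here is a minimal sketch (in Python, purely
for illustration; it is not GNAT code) showing that the same three
Japanese characters come out as different byte sequences under Shift JIS,
EUC and UTF-8. The helper to_brackets is hypothetical: it only imitates
the ["XXXX"] notation with four hex digits for 16-bit codes and is not
the GNAT implementation.

```python
def to_brackets(text):
    """Hypothetical helper: write each non-ASCII 16-bit code as ["XXXX"]."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif code <= 0xFFFF:
            out.append('["%04X"]' % code)   # four uppercase hex digits
        else:
            raise ValueError("sketch handles 16-bit codes only")
    return "".join(out)


# Three Japanese characters, written as escapes to keep this text ASCII.
sample = "\u65e5\u672c\u8a9e"

# Python codec names, assumed here to stand in for the codings listed above.
encoded = {name: sample.encode(name)
           for name in ("shift_jis", "euc_jp", "utf-8")}

for name, data in encoded.items():
    print(name, len(data), "bytes:", data.hex())
print("brackets:", to_brackets(sample))
```

The byte counts and byte values differ from one coding to the next, which
is why a file cannot be read correctly without knowing which coding
produced it.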

> Due to the amount of possible combinations, I assume
> that at least some of these languages require 16 bits per
> character.

You really need to familiarize yourself with the relevant
ISO standard (ISO/IEC 10646) and with Unicode. Yes, of course
16 bits are required (in fact, for full Chinese support, 32 bits
are required; Unicode supports only a subset of Chinese).
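
As a rough illustration of the 16-bit versus 32-bit distinction, the
following Python sketch checks whether every character of a string fits
in one 16-bit unit. The function name fits_in_16_bits is made up for this
example, and the code point U+20000 used below was assigned in a Unicode
version later than the one current when this message was written, so
treat it only as a stand-in for "a Chinese character outside the 16-bit
range".

```python
def fits_in_16_bits(text):
    """Return True if every character's code is at most 16#FFFF#."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Two common Chinese characters, both inside the 16-bit range.
common = "\u4e2d\u6587"

# One ideograph outside the 16-bit range (assumption: U+20000 as example).
rare = "\U00020000"

print(fits_in_16_bits(common))
print(fits_in_16_bits(rare))

# In UTF-16 the rare character needs two 16-bit units (four bytes);
# in a 32-bit coding it is a single unit.
print(len(rare.encode("utf-16-be")), len(rare.encode("utf-32-be")))
```

A data type limited to 16 bits per character can hold the first string
directly but not the second.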

> So, before appointing people to write the messages, I
> have one request and one question.
>
> 1) Could anyone e-mail me a text file containing a
> typical example of a 16-bit-character Chinese or
> Japanese text file, preferably from both the UNIX and
> Windows worlds if there are some incompatibilities
> such as the traditional LF vs CR/LF?

There is no such thing; you really must find out the source
encoding, since there are multiple possibilities.
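
If the supplier of the file cannot say which encoding was used, one crude
way to narrow it down is to try each candidate and see which ones decode
without error. The Python sketch below does that; the function name
plausible_encodings and the candidate list are assumptions made for this
example. Decoding without error is a necessary condition only: it does
not prove that the guess is right.

```python
def plausible_encodings(data, candidates=("shift_jis", "euc_jp", "utf-8")):
    """Return the candidate encodings under which data decodes cleanly."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue            # these bytes are not valid in this coding
        ok.append(name)
    return ok


# Bytes produced by encoding three Japanese characters as Shift JIS.
sjis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")
print(plausible_encodings(sjis_bytes))

# Pure ASCII text is valid in all three, so the test cannot tell them apart.
print(plausible_encodings(b"hello"))
```

The second call shows the limit of the approach: short or mostly ASCII
files remain ambiguous, so asking for the encoding is still the reliable
route.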

> 2) How could I handle this in Ada? I naively thought
> that Ada.Wide_Text_IO would read 16 bits per character.
> However (at least with GNAT), it writes and reads 8-bit
> characters with "brackets coding" (as in Wide_String
> literals). Of course, Sequential_IO on Wide_Character
> or some kind of Stream_IO could do the trick, but I
> wonder if there is some "standard" or at least "usual"
> way to deal with this.

You really need to know much more than you do to be successful
here. I strongly suggest you contact your vendor for assistance.
In the case of GNAT, especially for a Japanese environment we
can give a lot of help to GNAT Professional users, since our
Japanese Distributor (Jun Shimura) has extensive technical
experience in this area (in fact we worked closely with him
to ensure that our implementations of Shift-JIS and EUC were
correct).

If your compiler supports only the brackets encoding standard,
it is probably useless for your purposes, and you should look
around for a compiler that supports the format in which your
messages will be processed.

> I should make clear that we do not need to handle 16-bit
> Ada source files, simply text files containing messages.
> We already have the combined String/Wide_String support
> in our applications, we simply need to choose the best
> way to get the data from a file to a Wide_String. Then,
> the Wide_Strings will be sent in SNMP messages.

Robert Dewar
Ada Core Technologies
