* Wide_String, Chinese & Japanese text files
From: Thierry Lelegard @ 1999-08-20 0:00 UTC

Hello,

We are going to need to process, in Ada 95, text files containing
Chinese and Japanese messages (for i18n purposes). I have absolutely
no experience in handling this kind of file. I do not even know the
usual format of that kind of text file (8 or 16 bits/char).

Due to the number of possible combinations, I assume that at least
some of these languages require 16 bits per character.

So, before appointing people to write the messages, I have one
request and one question.

1) Could anyone e-mail me a typical example of a Chinese or Japanese
   text file with 16-bit characters, preferably from both the UNIX
   and Windows worlds in case there are incompatibilities such as
   the traditional LF vs CR/LF?

2) How could I handle this in Ada? I naively thought that
   Ada.Wide_Text_IO would read 16 bits per character. However (at
   least with GNAT), it writes and reads 8-bit characters with
   "bracket coding" (as in Wide_String literals). Of course,
   Sequential_IO on Wide_Character or some kind of Stream_IO could
   do the trick, but I wonder if there is some "standard" or at
   least "usual" way to deal with this.

I should point out that we do not need to handle 16-bit Ada source
files, simply text files containing messages. We already have
combined String/Wide_String support in our applications; we simply
need to choose the best way to get the data from a file into a
Wide_String. The Wide_Strings will then be sent in SNMP messages.

Thank you all in advance.

-Thierry
________________________________________________________
Thierry Lelegard, Paris, France
E-mail: lelegard@club-internet.fr
* Re: Wide_String, Chinese & Japanese text files
From: Robert Dewar @ 1999-08-21 0:00 UTC

In article <7pka2j$lnn$1@front2.grolier.fr>,
"Thierry Lelegard" <lelegard@club-internet.fr> wrote:

> We are going to need to process, in Ada 95, text files
> containing Chinese and Japanese messages (for i18n purpose).
> I have absolutely no experience in handling this kind of files.
> I do not even know the usual format of that kind of text
> files (8 or 16 bits/char).

There is no "usual format"; there are many possible formats.
GNAT supports:

  Upper half coding (sometimes used for Chinese, never as far
  as I know for Japanese)

  Shift JIS coding (a common Japanese convention)

  EUC coding (another common Japanese convention)

  UTF-8 coding (an ISO standard; I have never seen it used in
  practice, but it probably is, and will be more over time?)

  Brackets coding (a portable ASCII coding, primarily useful
  for standard texts, e.g. the ACVC tests, not used for real
  data information interchange)

  ESC coding (another very simple portable ASCII coding, using
  an ESC character instead of brackets; again, not used for
  real data information interchange)

> Due to the amount of possible combinations, I assume
> that at least some of these languages require 16 bits per
> character.

You really need to familiarize yourself with the relevant
ISO standard, and with Unicode. Yes, of course 16 bits
are required (in fact, for full Chinese support, 32 bits
are required; Unicode supports only a subset of Chinese).

> So, before appointing people to write the messages, I
> have one request and one question.
>
> 1) Could anyone e-mail me a text file containing a
> typical example of 16 bits characters Chinese or
> Japanese text file, preferably from both UNIX and
> Windows worlds if there are some incompatibilities
> such as the traditional LF vs CR/LF ?

There is no such thing. You really must find out the source
encoding, since there are multiple possibilities.

> 2) How could I handle this in Ada? I naively thought
> that Ada.Wide_Text_IO would read 16 bits per character.
> However (at least with gnat), it writes and reads 8 bits
> characters with "bracket coding" (as in Wide_String
> literals). Of course, Sequential_IO on Wide_Character
> or some kind of Stream_IO could do the trick but I
> wonder if there is some "standard" or at least "usual"
> way to deal with this.

You really need to know much more than you do to be
successful here. I strongly suggest you contact your
vendor for assistance.

In the case of GNAT, especially for a Japanese environment, we
can give a lot of help to GNAT Professional users, since our
Japanese distributor (Jun Shimura) has extensive technical
experience in this area (in fact, we worked closely with him
to ensure that our implementations of Shift-JIS and EUC were
correct).

If your compiler supports only the brackets encoding, it is
probably useless for your purposes, and you should look around
for a compiler that supports the format in which your messages
will be processed.

> I must precise that we do not need to handle 16 bits
> Ada source files, simply text files containing messages.
> We already have the combined String/Wide_String support
> in our applications, we simply need to choose the best
> way to get the data from a file to a Wide_String. Then,
> the Wide_Strings will be sent in SNMP messages.

Robert Dewar
Ada Core Technologies

Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
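The point that the source encoding must be known in advance can be illustrated with a small sketch. It is written in Python rather than Ada purely for brevity, and it is not from the thread: the sample text (the two characters U+65E5 U+672C) and the helper name `try_decode` are assumptions chosen for illustration. The same bytes, written as Shift JIS, are decoded under several candidate encodings; only the right one recovers the text.

```python
# Sample data (an assumption for illustration): two Japanese characters,
# serialized to bytes using Shift JIS.
original = "\u65e5\u672c"
data = original.encode("shift_jis")


def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning a marker string instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<cannot decode: {exc.reason}>"


# Decode the same bytes under several guesses about the encoding.
results = {
    enc: try_decode(data, enc)
    for enc in ("shift_jis", "euc_jp", "utf-8", "latin-1")
}

for enc, text in results.items():
    status = "ok" if text == original else "WRONG"
    print(f"{enc:10} {status:6} {text!r}")
```

Note that a wrong guess does not always fail loudly: Latin-1 accepts any byte sequence, so it silently yields garbage instead of an error. This is why a sample file alone cannot tell you the format.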
* Re: Wide_String, Chinese & Japanese text files
From: Thierry Lelegard @ 1999-08-21 0:00 UTC

Hello Mr Dewar,

> You really need to familiarize yourself with the relevant
> ISO standard, and with Unicode, yes, of course 16-bits
> are required (in fact for full Chinese support, 32-bits
> are required, Unicode supports only a subset of Chinese).

Yes, I know I need to familiarize myself with this. That is
precisely why I posted this note: to get some information,
or pointers to this information.

Does anyone have some pointers to these standards and to some
simple free text utilities which can create a few sample
text files with a US or European keyboard on UNIX (for
test purposes, not production of course)?

> In the case of GNAT, especially for a Japanese environment we
> can give a lot of help to GNAT Professional users, since our

Concerning the GNAT library, I will continue this topic through
the commercial channel with ACT.

-Thierry
* Re: Wide_String, Chinese & Japanese text files
From: Florian Weimer @ 1999-08-22 0:00 UTC

"Thierry Lelegard" <lelegard@club-internet.fr> writes:

> Does anyone have some pointers to these standards and to some
> simple free text utilities which can create a few sample
> text files with a US or European keyboard on UNIX (for
> test purpose, not production of course).

Emacs 20.4 plus the intlfonts-1.1 package. It is completely free
(well, GPL), and you can (at least in theory) edit quite a few of
those strange languages with it: Japanese, Chinese (both simplified
and traditional), Hindi, and even French or German, for example.
Major drawbacks: no Unicode, no right-to-left writing. (If you
intend to use XEmacs instead: I'd recommend against it. MULE support
in recent versions seems to be a bit limited due to an obvious lack
of testing.)

Another possibility is yudit (GPL, too -- sorry, I don't know where
I got it from, but I can look it up if you are interested). It
supports Unicode (and several encodings of it) and quite a few
languages as well. Major drawbacks: it works best with Bitstream's
CyberBit TrueType Unicode font, which was once freely available from
Bitstream, but this offer doesn't seem to exist anymore; and there
is no visual feedback during the composition of characters (feedback
which is especially helpful to beginners). In addition, the choice
of input methods seems to be rather limited in comparison to Emacs
(the X input method extension might cure that, but I didn't test it
at all).

Of course, I can't confirm that either of these tools is suitable
for production use. (I'm already glad if someone understands my
clumsy English. ;) In fact, I doubt it. Nevertheless, you should be
able to create suitable sample text files using both programs
together.

(And you can always hope for spam from Asia -- I'm getting a lot of
it these days. :-/)
* Re: Wide_String, Chinese & Japanese text files
From: Georg Bauhaus @ 1999-08-25 0:00 UTC

Florian Weimer (fw@s.netic.de) wrote:
: "Thierry Lelegard" <lelegard@club-internet.fr> writes:

: > Does anyone have some pointers to these standards and to some
: > simple free text utilities which can create a few sample
: > text files with a US or European keyboard on UNIX (for
: > test purpose, not production of course).

Rob Pike's editor sam writes UTF-8 text files; there is also a
Windows version that comes with a sample document showing three
or four languages from around the world (look for sam.exe).

The utility tcs converts from one encoding to another:
ftp://plan9.att.com/plan9/unixsrc/tcs.shar.Z

At least on the Debian GNU sites, you can find the UNIX versions.

-# Georg
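For readers without tcs at hand, the kind of conversion described above can be sketched in a few lines of Python. This is not tcs and makes no claim about how tcs works; the function name `transcode`, the file names, and the choice of EUC-JP as the source encoding are all assumptions for illustration. The sketch also preserves line endings untouched, which matters for the LF vs CR/LF concern raised at the start of the thread.

```python
import os
import tempfile


def transcode(src_path, dst_path, src_encoding, dst_encoding):
    """Rewrite a text file from src_encoding to dst_encoding.

    newline="" disables newline translation on both sides, so LF and
    CR/LF line endings are copied exactly as found in the source file.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


# Usage sketch with hypothetical file names, run in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "messages.euc")
    dst = os.path.join(tmp, "messages.utf8")

    # Create a one-line EUC-JP sample file with a CR/LF line ending.
    with open(src, "wb") as f:
        f.write("\u65e5\u672c\r\n".encode("euc_jp"))

    transcode(src, dst, "euc_jp", "utf-8")

    with open(dst, "rb") as f:
        converted = f.read()

print(converted.hex(" "))
```

Converting everything to a single known encoding up front is one way to keep the reading side of an application simple, since it then has only one format to support.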
* Re: Wide_String, Chinese & Japanese text files
From: Robert Dewar @ 1999-08-22 0:00 UTC

In article <7pmitf$r71$1@front3.grolier.fr>,
"Thierry Lelegard" <lelegard@club-internet.fr> wrote:

> Does anyone have some pointers to these standards and to some
> simple free text utilities which can create a few sample
> text files with a US or European keyboard on UNIX (for
> test purpose, not production of course).

One thing to realize here is that, unlike the typical
situation with 8-bit codes, there are two separate things
to worry about:

1. The encoding of each character into its 16-bit value

2. The manner in which 16-bit values are encoded, typically
   into a stream of 8-bit bytes.

Ada has a lot to say about 1, but nothing at all to say about 2.
In particular, you cannot look at an encoding standard that gives
the 16-bit codes and then ask for a sample text file, because
a sample text file is about 2 rather than 1.
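The two layers described above can be made concrete with a short sketch, again in Python rather than Ada for brevity. It assumes Unicode as the layer-1 character set and uses two sample characters chosen for illustration; neither choice comes from the thread. Layer 1 yields the same 16-bit values every time, while layer 2 produces a different byte stream for each serialization scheme.

```python
# Sample text (an assumption for illustration): two characters whose
# Unicode code points both fit in 16 bits.
text = "\u65e5\u672c"

# Layer 1: each character maps to a 16-bit value (its code point).
codes = [ord(ch) for ch in text]

# Layer 2: the same 16-bit values serialized into 8-bit bytes in
# three different ways.
serialized = {
    "utf-16-be": text.encode("utf-16-be"),  # high byte first
    "utf-16-le": text.encode("utf-16-le"),  # low byte first
    "utf-8": text.encode("utf-8"),          # variable-length, 3 bytes each here
}

print("codes:", [f"{c:04X}" for c in codes])
for name, data in serialized.items():
    print(f"{name:10} {data.hex(' ')}")
```

A file on disk only ever contains the layer-2 bytes, so a reader has to be told (or has to guess) which serialization was used before it can recover the layer-1 values that end up in something like a Wide_String.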