comp.lang.ada
* Avatox 1.0: Trouble with encoding in Windows
@ 2006-09-11  8:24 Manuel Collado
  2006-09-11 10:35 ` Georg Bauhaus
  2006-09-12  9:52 ` Stephen Leake
  0 siblings, 2 replies; 50+ messages in thread
From: Manuel Collado @ 2006-09-11  8:24 UTC (permalink / raw)


The XML generated by Avatox 1.0 on my Windows XP machine declares:

<?xml version="1.0" encoding="UTF-8" ?>

But it contains text fragments taken from the Ada source code, in the 
native CP-1252 encoding, without any translation. The result is that for 
Ada source with non-ASCII characters (like accented letters) the generated 
XML is not well-formed, and is rejected by all XML utilities.
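
The failure described above can be reproduced outside Avatox. A minimal Python sketch, assuming a made-up `<id>` element as a stand-in for the generated markup (the element name is hypothetical, not Avatox's actual schema): the identifier `Año` is written as CP-1252 bytes under a UTF-8 declaration, and the parser rejects it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the generated XML: the declaration says UTF-8,
# but the identifier text is written as raw CP-1252 bytes.
declaration = b'<?xml version="1.0" encoding="UTF-8" ?>'
mislabelled = declaration + "<id>Año</id>".encode("cp1252")  # 'ñ' -> byte 0xF1
consistent = declaration + "<id>Año</id>".encode("utf-8")    # 'ñ' -> 0xC3 0xB1


def is_well_formed(data: bytes) -> bool:
    """Return True if the XML parser accepts the byte string."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(mislabelled))  # declared UTF-8, actually CP-1252
print(is_well_formed(consistent))   # declared UTF-8, actually UTF-8
```

The lone byte 0xF1 is not a valid UTF-8 sequence on its own, which is why only the second document parses.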


1. The ASIS API should provide a way to know the character encoding of the 
source file (I think it doesn't).

2. The non-ASCII characters could be converted to XML character references 
(&#nnn;) by Avatox.
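
Suggestion 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not Avatox code; the helper name `to_ascii_xml_text` is hypothetical. It presumes the source text has already been decoded correctly, which is exactly what suggestion 1 is needed for.

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> bytes:
    """Escape markup characters, then emit pure ASCII.

    Every character outside ASCII is replaced by a decimal
    character reference of the form &#nnn;.
    """
    return escape(text).encode("ascii", errors="xmlcharrefreplace")


print(to_ascii_xml_text("¿Año?"))
print(to_ascii_xml_text("a<b"))
```

Output produced this way is valid under any ASCII-compatible declared encoding, including UTF-8.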

-- 
Manuel Collado




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-11  8:24 Avatox 1.0: Trouble with encoding in Windows Manuel Collado
@ 2006-09-11 10:35 ` Georg Bauhaus
  2006-09-11 13:49   ` Avatox 1.1: " Manuel Collado
  2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
  2006-09-12  9:52 ` Stephen Leake
  1 sibling, 2 replies; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-11 10:35 UTC (permalink / raw)


Manuel Collado wrote:

> 1. The ASIS API should provide a way to know the character encoding of
> the source file (I think it doesn't).

Yes! This will help a lot in avoiding character set issues.
And it might help prevent dodgy arguments like the ones presented
by implementers against the clever requirement to write the
identifier π in the Ada 2005 library. :-)


> 2. The non-ASCII characters could be converted to XML character
> references (&#nnn;) by Avatox.

This is beyond my comprehension, in particular when XML does
have standardized character set support. Numeric character
entities will force me to look at geographical names
like España (Spain) or Łódź (in Poland)
written  Espa&#xF1;a  and  &#x141;&#xF3;d&#x17A;
respectively.

Will any Ada programmer find it pleasing to even write them as
character strings in this equivalent way?

   Country: Wide_String := "Espa" & Wide_Character'Val(241) & "a";
   Town: Wide_String := (
      Wide_Character'Val(16#141#),
      Wide_Character'Val(16#F3#),
      'd',
      Wide_Character'Val(16#17A#));

You could then go on and recommend writing Ada in the following style,
just because some text editing tool that you remember might not properly
handle white space:

   Town: String := "New" & Character'Val(32) & "York";

<steam>
If programmers don't start accepting that there are more
characters than can be expressed in 7-bit ASCII and start
making less buggy text tools (including Ada tools)
then everyone will continue to have difficulties in international
communications.

I hope that ASIS is a chance to get this done.
</>

-- Georg




* Re: Avatox 1.1: Trouble with encoding in Windows
  2006-09-11 10:35 ` Georg Bauhaus
@ 2006-09-11 13:49   ` Manuel Collado
  2006-09-11 16:43     ` Georg Bauhaus
  2006-09-11 17:50     ` Björn Persson
  2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
  1 sibling, 2 replies; 50+ messages in thread
From: Manuel Collado @ 2006-09-11 13:49 UTC (permalink / raw)


Georg Bauhaus wrote:
> Manuel Collado wrote:
> 
>> 1. The ASIS API should provide a way to know the character encoding of
>> the source file (I think it doesn't).
> 
> Yes! This will help a lot in avoiding character set issues.
> And it might help prevent dodgy arguments like the ones presented
> by implementers against the clever requirement to write the
> identifier π in the Ada 2005 library. :-)

Spanish identifiers like 'tamaño' (size) or 'año' (year) are currently 
accepted by GNAT.

> 
> 
>> 2. The non-ASCII characters could be converted to XML character
>> references (&#nnn;) by Avatox.
> 
> This is beyond my comprehension, in particular when XML does
> have standardized character set support. Numeric character
> entities will force me to look at geographical names
> like España (Spain) or Łódź (in Poland)
> written  Espa&#xF1;a  and  &#x141;&#xF3;d&#x17A;
> respectively.

XML markup is meant to be written and read mostly by tools, not by humans. 
So it doesn't matter if a text fragment is coded as 'España' or as 
'Espa&#xF1;a'. In fact, after parsing, an XML processing agent cannot know 
how it was coded.

My suggestion is that the Avatox encoding issue can be solved by simply 
writing non-ASCII characters as XML character references just when the 
final XML output is generated.
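
The point that a processing agent cannot tell the two spellings apart after parsing can be checked with a short Python sketch (the `<name>` element is made up for the illustration):

```python
import xml.etree.ElementTree as ET

# The same text, once as a literal character and once as a character reference.
literal = ET.fromstring("<name>España</name>")
reference = ET.fromstring("<name>Espa&#xF1;a</name>")

# After parsing, both elements carry the identical string.
print(literal.text == reference.text)
```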

> 
> Will any Ada programmer find it pleasing to even write them as
> character strings in this equivalent way?
> 
>    Country: Wide_String := "Espa" & Wide_Character'Val(241) & "a";
>    Town: Wide_String := (
>       Wide_Character'Val(16#141#),
>       Wide_Character'Val(16#F3#),
>       'd',
>       Wide_Character'Val(16#17A#));
> 
> You could then go on and recommend writing Ada in the following style,
> just because some text editing tool that you remember might not properly
> handle white space:
> 
>    Town: String := "New" & Character'Val(32) & "York";
> 

This is outside of scope. I've not spoken about adequate character 
representation in Ada sources, just in XML documents.


> <steam>
> If programmers don't start accepting that there are more
> characters than can be expressed in 7-bit ASCII and start
> making less buggy text tools (including Ada tools)
> then everyone will continue to have difficulties in international
> communications.
> 
> I hope that ASIS is a chance to get this done.
> </>

Amen to that.

> 
> -- Georg

Regards.
-- 
Manuel Collado




* Re: Avatox 1.1: Trouble with encoding in Windows
  2006-09-11 13:49   ` Avatox 1.1: " Manuel Collado
@ 2006-09-11 16:43     ` Georg Bauhaus
  2006-09-11 17:50     ` Björn Persson
  1 sibling, 0 replies; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-11 16:43 UTC (permalink / raw)


Manuel Collado wrote:

>> And it might help prevent dodgy arguments like the ones presented
>> by implementers against the clever requirement to write the
>> identifier π in the Ada 2005 library. :-)
> 
> Spanish identifiers like 'tamaño' (size) or 'año' (year) are currently
> accepted by GNAT.

Which makes the argument against π in the library even more bogus
in my book ;-)

> XML markup is meant to be written and read mostly by tools, not by
> humans. So it doesn't matter if a text fragment is coded as 'España' or
> as 'Espa&#xF1;a'. In fact, after parsing, an XML processing agent cannot
> know how it was coded.

Oh, there is nothing stopping an XML processor from keeping track of
input properties, even when the character representation is not an
issue after parsing.
Just like an ASIS tool could (should?) know the character encoding
of the Ada sources it has read.

> it doesn't matter if a text fragment is coded as 'España' or as
> 'Espa&#xF1;a'.

>>    Country: Wide_String := "Espa" & Wide_Character'Val(241) & "a";
...
>>    Town: String := "New" & Character'Val(32) & "York";
>>
> 
> This is outside of scope. I've not spoken about adequate character
> representation in Ada sources, just in XML documents.

Right, this was meant as an analogy: When I have to look at the
text, not process it, I'll be glad if identifiers and literals
are easy to read.

I think there is still a tradeoff between a 7-bit external
representation of ASIS in XML and its usability[1].
For example, when you look at ASIS streams in order to find out
why one of them is broken, XML processors can't do much, because
their input is broken as a consequence.
Or when I am developing an XSL transformation for
"refactoring" some of the identifiers in a program,
then I will have to look hard at "tama&#xF1;o" in order
to see that it just is "tamaño". That's not productive in my view.

 [1]  7-bit might seem simple bitwise, but it isn't necessarily
easier to process because character entities must be handled, too.

-- Georg




* Re: Avatox 1.1: Trouble with encoding in Windows
  2006-09-11 13:49   ` Avatox 1.1: " Manuel Collado
  2006-09-11 16:43     ` Georg Bauhaus
@ 2006-09-11 17:50     ` Björn Persson
  2006-09-12  0:06       ` Marc A. Criley
  1 sibling, 1 reply; 50+ messages in thread
From: Björn Persson @ 2006-09-11 17:50 UTC (permalink / raw)


Manuel Collado wrote:
> My suggestion is that the Avatox encoding issue can be solved by simply 
> writing non-ASCII characters as XML character references just when the 
> final XML output is generated.

It would be much easier to just put the right encoding in the XML 
declaration. In your case:
<?xml version="1.0" encoding="windows-1252" ?>

Then the rest of the file could be exactly as it is now, and Avatox 
wouldn't have to scan each and every identifier for non-ASCII characters.
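
A minimal Python sketch of this alternative, assuming a made-up `<id>` element rather than real Avatox output: the bytes stay in CP-1252 and only the declaration changes.

```python
import xml.etree.ElementTree as ET

# The text is left in CP-1252; the declaration now names that encoding.
document = (b'<?xml version="1.0" encoding="windows-1252" ?>'
            + "<id>tamaño</id>".encode("cp1252"))

element = ET.fromstring(document)
print(element.text)
```

This relies on the consuming parser knowing the named encoding. XML processors are only required to support UTF-8 and UTF-16, so a tool may still reject other declarations.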

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.1: Trouble with encoding in Windows
  2006-09-11 17:50     ` Björn Persson
@ 2006-09-12  0:06       ` Marc A. Criley
  2006-09-12  8:35         ` Manuel Collado
  0 siblings, 1 reply; 50+ messages in thread
From: Marc A. Criley @ 2006-09-12  0:06 UTC (permalink / raw)


Björn Persson wrote:
> Manuel Collado wrote:
> 
>> My suggestion is that the Avatox encoding issue can be solved by 
>> simply writing non-ASCII characters as XML character references just 
>> when the final XML output is generated.
> 
> 
> It would be much easier to just put the right encoding in the XML 
> declaration. In your case:
> <?xml version="1.0" encoding="windows-1252" ?>
> 
> Then the rest of the file could be exactly as it is now, and Avatox 
> wouldn't have to scan each and every identifier for non-ASCII characters.

Alright!  Alright!  Y'all SHAMED me into it!!    :-)   :-)  :-)

I've been putting off dealing with character encoding for as long as I 
could, but it's time I finally got into it and did things right.

Gimme a little time and I'll get an encoding-friendly version of Avatox out 
as soon as I can--day job and other responsibilities permitting.

Thanks for the beating, I will now become a better person because of it :-)

(Manuel, if you'd be so kind as to send me a standalone compilable example 
of Ada code that uses an alternate character set that I could use as an 
Avatox test case, I would be most grateful.)


-- Marc A. Criley
-- McKae Technologies
-- www.mckae.com
-- DTraq - Avatox - XIA - XML EZ Out





* Re: Avatox 1.1: Trouble with encoding in Windows
  2006-09-12  0:06       ` Marc A. Criley
@ 2006-09-12  8:35         ` Manuel Collado
  0 siblings, 0 replies; 50+ messages in thread
From: Manuel Collado @ 2006-09-12  8:35 UTC (permalink / raw)


Marc A. Criley wrote:
> ...
> (Manuel, if you'd be so kind as to send me a standalone compilable 
> example of Ada code that uses an alternate character set that I could 
> use as an Avatox test case, I would be most grateful.)

Here you are:

----------------------------------------------------------------
-- Detección de años bisiestos

-- NOTE: This program contains non-ASCII characters in comments,
-- string literals, and even identifiers. It compiles OK with
-- GNAT 3.15p and GNAT GPL 2005 on Windows XP.
-- (Encoding = WINDOWS-1252 or ISO-8859-1 or LATIN1)

with Ada.Text_Io; use Ada.Text_Io;
with Ada.Integer_Text_Io; use Ada.Integer_Text_Io;

procedure Bisiesto is
    Año: Integer;
begin
    loop
       Put_Line( "¿Año? (0 para terminar)" );
       Get( Año );
       exit when Año <= 0;
       Put( Año );
       if Año mod 4 /= 0
          or else (Año mod 100 = 0 and then Año mod 400 /= 0)
       then
          Put( " no" );
       end if;
       Put_Line( " es bisiesto" );
    end loop;
end Bisiesto;
----------------------------------------------------------------

Regards.
-- 
Manuel Collado




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-11  8:24 Avatox 1.0: Trouble with encoding in Windows Manuel Collado
  2006-09-11 10:35 ` Georg Bauhaus
@ 2006-09-12  9:52 ` Stephen Leake
  2006-09-19  1:16   ` Marc A. Criley
  1 sibling, 1 reply; 50+ messages in thread
From: Stephen Leake @ 2006-09-12  9:52 UTC (permalink / raw)


Manuel Collado <m.collado@fi.upm.es> writes:

> 1. The ASIS API should provide a way to know the character encoding of
> the source file (I think it doesn't).

Please send this suggestion to the ARG, at ada-comment@ada-auth.org;
they are currently revising the ASIS standard for Ada 2005.

-- 
-- Stephe




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-11 10:35 ` Georg Bauhaus
  2006-09-11 13:49   ` Avatox 1.1: " Manuel Collado
@ 2006-09-13  0:01   ` Randy Brukardt
  2006-09-13  9:01     ` Georg Bauhaus
                       ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Randy Brukardt @ 2006-09-13  0:01 UTC (permalink / raw)


"Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
news:45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net...
> Manuel Collado wrote:
>
> > 1. The ASIS API should provide a way to know the character encoding of
> > the source file (I think it doesn't).
>
> Yes! This will help a lot in avoiding character set issues.
> And it might help prevent dodgy arguments like the ones presented
> by implementers against the clever requirement to write the
> identifier π in the Ada 2005 library. :-)

ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
handle all possible Ada 95 programs. I suspect that the problem is in the
XML conversion tool not handling Wide_Characters properly and not with ASIS.
(Or just as likely, the XML processing tools not handling UTF-8 properly.)

I suspect that the new version of ASIS will provide an option to get
identifiers in Wide_Wide_Strings.

In any case, one of the big advantages of using ASIS over writing your own
parser is that the resulting program is independent of the character set
used. So it works with anything supported by your compiler vendor (and still
does if you change vendors). ASIS code that depends on the input source
representation (which is not defined by Ada anyway) is probably broken. And
there is no chance of any sort of agreement on source representations for
ASIS (or even the naming of them) if there isn't any for Ada.

                           Randy.






* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
@ 2006-09-13  9:01     ` Georg Bauhaus
  2006-09-13 19:28       ` Björn Persson
  2006-09-13 10:32     ` Manuel Collado
  2006-09-13 11:04     ` vgodunko
  2 siblings, 1 reply; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-13  9:01 UTC (permalink / raw)


Randy Brukardt wrote:

> And
> there is no chance of any sort of agreement on source representations for
> ASIS (or even the naming of them) if there isn't any for Ada.


Maybe a standard configuration pragma can be devised that informs
Ada source processors of the encoding used in files/compilation
units/...?


-- Georg




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
  2006-09-13  9:01     ` Georg Bauhaus
@ 2006-09-13 10:32     ` Manuel Collado
  2006-09-13 18:28       ` Björn Persson
  2006-09-13 23:05       ` Randy Brukardt
  2006-09-13 11:04     ` vgodunko
  2 siblings, 2 replies; 50+ messages in thread
From: Manuel Collado @ 2006-09-13 10:32 UTC (permalink / raw)


Randy Brukardt wrote:
> "Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
> news:45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net...
> 
>>Manuel Collado wrote:
>>
>>>1. The ASIS API should provide a way to know the character encoding of
>>>the source file (I think it doesn't).
>>
>>Yes! This will help a lot in avoiding character set issues.
>>And it might help prevent dodgy arguments like the ones presented
>>by implementers against the clever requirement to write the
>>identifier π in the Ada 2005 library. :-)
> 
> ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
> handle all possible Ada 95 programs. I suspect that the problem is in the
> XML conversion tool not handling Wide_Characters properly and not with ASIS.
> (Or just as likely, the XML processing tools not handling UTF-8 properly.)
> 
> I suspect that the new version of ASIS will provide an option to get
> identifiers in Wide_Wide_Strings.

Sorry, the use of [Wide_]Wide_Strings doesn't imply anything about 
encoding. The Avatox problem appears just with characters with 
codepoints < 256. Example: the character with codepoint 0xC1 is

    0xC1	0x00C1	#	LATIN CAPITAL LETTER A WITH ACUTE

if encoded as ISO-8859-1 (western countries), but it is

    0xC1	0x0391	#	GREEK CAPITAL LETTER ALPHA

if encoded as ISO-8859-7 (Greece). The use of wide_chars just extends 
the codepoint range.

To solve the problem a translation is required from the original source 
file encoding to a specific standard encoding (Unicode?) for strings 
reported via the ASIS API. Or else, don't make a translation, and report 
also the original source code encoding. This way the ASIS application 
can interpret (or simply report) strings in a meaningful way.
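
The ambiguity of byte 0xC1 can be shown with a short Python sketch that decodes the same byte under both encodings named above:

```python
import unicodedata

raw = bytes([0xC1])  # one byte, two possible meanings

for encoding in ("iso-8859-1", "iso-8859-7"):
    char = raw.decode(encoding)
    # Print the encoding, the resulting code point, and its Unicode name.
    print(encoding, f"U+{ord(char):04X}", unicodedata.name(char))
```

Without the encoding name, the byte alone does not determine the character.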

> 
> In any case, one of the big advantages of using ASIS over writing your own
> parser is that the resulting program is independent of the character set
> used. So it works with anything supported by your compiler vendor (and still
> does if you change vendors). ASIS code that depends on the input source
> representation (which is not defined by Ada anyway) is probably broken. And
> there is no chance of any sort of agreement on source representations for
> ASIS (or even the naming of them) if there isn't any for Ada.

I'm not sure I understand you. Some style checks depend on source code 
representation, such as checks for non-uniform casing of identifiers 
(mixing alpha and Alpha in the same source).

Am I missing anything?

> 
>                            Randy.

Regards.
-- 
To reply by e-mail, please remove the extra dot
in the given address:  m.collado -> mcollado





* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
  2006-09-13  9:01     ` Georg Bauhaus
  2006-09-13 10:32     ` Manuel Collado
@ 2006-09-13 11:04     ` vgodunko
  2006-09-14  8:56       ` Martin Krischik
  2 siblings, 1 reply; 50+ messages in thread
From: vgodunko @ 2006-09-13 11:04 UTC (permalink / raw)


Randy Brukardt wrote:
>
> ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
> handle all possible Ada 95 programs. I suspect that the problem is in the
> XML conversion tool not handling Wide_Characters properly and not with ASIS.
> (Or just as likely, the XML processing tools not handling UTF-8 properly.)
>
This is a known ASIS-for-GNAT problem. AdaCore's GNAT/ASIS doesn't support
source code recoding. It works correctly only for ASCII and UTF-8
encodings (the latter requires compiling with the -gnatW8 switch).





* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 10:32     ` Manuel Collado
@ 2006-09-13 18:28       ` Björn Persson
  2006-09-14  8:11         ` Manuel Collado
  2006-09-13 23:05       ` Randy Brukardt
  1 sibling, 1 reply; 50+ messages in thread
From: Björn Persson @ 2006-09-13 18:28 UTC (permalink / raw)


Manuel Collado wrote:
> Sorry, the use of [Wide_]Wide_Strings doesn't imply anything about 
> encoding.

ARM95 3.5.2(3) says:
"The predefined type Wide_Character is a character type whose values 
correspond to the 65536 code positions of the ISO 10646 Basic 
Multilingual Plane (BMP)."

This is essentially unchanged in the draft Ada 2005 standard. Paragraph 
3.5.2(3/2) says:
"The predefined type Wide_Character is a character type whose values 
correspond to the 65536 code positions of the ISO/IEC 10646:2003 Basic 
Multilingual Plane (BMP)."

And the next paragraph, 3.5.2(3.1/2), adds:
"The predefined type Wide_Wide_Character is a character type whose 
values correspond to the 2147483648 code positions of the ISO/IEC 
10646:2003 character set."

This means that a Wide_String is UCS-2LE on a little-endian machine and 
UCS-2BE on a big-endian machine, and a Wide_Wide_String is UCS-4LE or 
UCS-4BE.
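
To illustrate the byte-order point, here is a small Python sketch. It is not an Ada memory dump: it uses Python's `utf-16-le` and `utf-16-be` codecs, which coincide with UCS-2 for characters in the BMP, as an assumed stand-in for how a Wide_String would be laid out on each kind of machine.

```python
town = "Łódź"  # code points U+0141, U+00F3, U+0064, U+017A

little = town.encode("utf-16-le")  # two bytes per character, low byte first
big = town.encode("utf-16-be")     # two bytes per character, high byte first

print(little.hex(" "))
print(big.hex(" "))

# The Wide_Wide_Character range quoted above is 2**31 code positions.
print(2**31)
```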

> To solve the problem a translation is required from the original source 
> file encoding to a specific standard encoding (Unicode?)

Unicode is not a character encoding. Unicode defines several encodings. 
Anyway, the standard encoding you're asking for is Wide_String, 
according to Randy. (I won't be surprised if Gnat does the translation 
wrong though.)

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13  9:01     ` Georg Bauhaus
@ 2006-09-13 19:28       ` Björn Persson
  2006-09-14  6:34         ` Georg Bauhaus
                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-13 19:28 UTC (permalink / raw)


Georg Bauhaus wrote:
> Randy Brukardt wrote:
> 
>> And
>> there is no chance of any sort of agreement on source representations for
>> ASIS (or even the naming of them) if there isn't any for Ada.
> 
> Maybe a standard configuration pragma can be devised that informs
> Ada source processors of the encoding used in files/compilation
> units/...?

That would be great, seeing that filesystems in Unix and Windows (and 
probably most other OSes) typically can't keep track of what character 
encoding is used in each file.

Theoretically it's better to store the encoding outside the file so that 
you can know what encoding to use *before* you start reading the file. 
In practice this is usually impossible. The Unix approach is to have a 
system-wide locale setting that specifies a character encoding, and 
assume that all text files on the system use that encoding. This 
assumption collapses when you connect your system to the Internet and 
start exchanging data with others. Thus it's pretty much necessary to 
specify the encoding inside each file, like XML does.

Python has this feature. It allows an encoding declaration in a comment 
at the beginning of the file, and tries to be compatible with text editors:
http://docs.python.org/ref/encodings.html
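
A minimal sketch of how such an in-file declaration is read, using the standard `tokenize.detect_encoding` function. The two-line source text is made up for the example; the declaration line itself is pure ASCII, so it can be read before the encoding is known.

```python
import io
import tokenize

# A tiny Python source file stored as Latin-1 bytes, with the encoding
# declared in a comment on its first line.
source = (b"# -*- coding: latin-1 -*-\n"
          + 'print("¿Año?")\n'.encode("latin-1"))

# detect_encoding takes a readline callable and returns the normalized
# encoding name plus the lines it had to read to find it.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
text = source.decode(encoding)

print(encoding)
print(text)
```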

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 10:32     ` Manuel Collado
  2006-09-13 18:28       ` Björn Persson
@ 2006-09-13 23:05       ` Randy Brukardt
  1 sibling, 0 replies; 50+ messages in thread
From: Randy Brukardt @ 2006-09-13 23:05 UTC (permalink / raw)



"Manuel Collado" <m.collado@lml.ls.fi.upm.es> wrote in message
news:4507de42@news.upm.es...
> Randy Brukardt wrote:
...
> > In any case, one of the big advantages of using ASIS over writing your own
> > parser is that the resulting program is independent of the character set
> > used. So it works with anything supported by your compiler vendor (and still
> > does if you change vendors). ASIS code that depends on the input source
> > representation (which is not defined by Ada anyway) is probably broken. And
> > there is no chance of any sort of agreement on source representations for
> > ASIS (or even the naming of them) if there isn't any for Ada.
>
> I'm not sure to understand you. Some style checks depend on source code
> representation. Like non-uniform casing for identifiers (mixing alpha
> and Alpha in the same source).
>
> Am I missing anything?

Apparently. The source of an Ada 2005 program is described in terms of
Unicode characters. (Ada 95 is similar). Similarly, Wide_String  and
Wide_Wide_String are defined in terms of Unicode. The actual source
representation is implementation-defined, but it is logically converted into
Unicode characters when it is processed. (Not all compilers actually do this
for efficiency reasons, but that's what the Standard says.)

So, an ASIS routine that returns an identifier in a Wide_String should be
returning it in a particular Unicode encoding. If it doesn't do that, it's
wrong.

Indeed, Ada 2005 defines identifier equivalence in terms of the Unicode
casing rules; if you are using a non-Unicode encoding, that will require
some translation somewhere.

Because of this, an ASIS program to check style only needs to be written in
terms of Wide_String and/or Wide_Wide_String encoding -- you shouldn't see
anything else. (Another message here says that GNAT gets this wrong, which
doesn't surprise me at all given past ARG discussions on this topic.)
Encodings (other than that defined for Wide_String) have nothing to do with
it (unless you want to write a modified version of the program in a
different encoding - but I suggest just sticking to UTF-8).

                         Randy.






* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 19:28       ` Björn Persson
@ 2006-09-14  6:34         ` Georg Bauhaus
  2006-09-14 23:09           ` Björn Persson
  2006-09-14 22:13         ` Björn Persson
  2006-09-16  7:40         ` Martin Krischik
  2 siblings, 1 reply; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-14  6:34 UTC (permalink / raw)


On Wed, 2006-09-13 at 19:28 +0000, Björn Persson wrote:


> Theoretically it's better to store the encoding outside the file so that 
> you can know what encoding to use *before* you start reading the file. 
> In practice this is usually impossible. [...] Thus it's pretty much necessary to 
> specify the encoding inside each file, like XML does.

Yes, though Ada offers a standard notion for connecting a compilation
unit with pragmas. You needn't necessarily place them right next to
the source. So perhaps there is yet another option for
interchanging sources between compilers, operating environments,
etc. Somewhat like SGML, which I believe has ESIS for data interchange
(can be used with ASN.1 IIRC), we could adopt an extended
"SRC archive" format. It simply traces encodings, has some checksums,
carries PGP signatures, and so on.

Or use Strings that carry their encoding with them ;-)







* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 18:28       ` Björn Persson
@ 2006-09-14  8:11         ` Manuel Collado
  0 siblings, 0 replies; 50+ messages in thread
From: Manuel Collado @ 2006-09-14  8:11 UTC (permalink / raw)


Björn Persson wrote:
> Manuel Collado wrote:
>> Sorry, the use of [Wide_]Wide_Strings doesn't imply anything about 
>> encoding.
> 
> ARM95 3.5.2(3) says:
> "The predefined type Wide_Character is a character type whose values 
> correspond to the 65536 code positions of the ISO 10646 Basic 
> Multilingual Plane (BMP)."
> 
> This is essentially unchanged in the draft Ada 2005 standard. Paragraph 
> 3.5.2(3/2) says:
> "The predefined type Wide_Character is a character type whose values 
> correspond to the 65536 code positions of the ISO/IEC 10646:2003 Basic 
> Multilingual Plane (BMP)."
> 
> And the next paragraph, 3.1/2, adds:
> "The predefined type Wide_Wide_Character is a character type whose 
> values correspond to the 2147483648 code positions of the ISO/IEC 
> 10646:2003 character set."
> 
> This means that a Wide_String is UCS-2LE on a little-endian machine and 
> UCS-2BE on a big-endian machine, and a Wide_Wide_String is UCS-4LE or 
> UCS-4BE.

Thanks for the pointer. I've certainly missed that.

-- 
Manuel Collado




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 11:04     ` vgodunko
@ 2006-09-14  8:56       ` Martin Krischik
  2006-09-14 21:16         ` Jeffrey R. Carter
  2006-09-15  5:34         ` Simon Wright
  0 siblings, 2 replies; 50+ messages in thread
From: Martin Krischik @ 2006-09-14  8:56 UTC (permalink / raw)



vgodunko@rostel.ru wrote:

> Randy Brukardt wrote:
> >
> > ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
> > handle all possible Ada 95 programs. I suspect that the problem is in the
> > XML conversion tool not handling Wide_Characters properly and not with ASIS.
> > (Or just as likely, the XML processing tools not handling UTF-8 properly.)
> >
> This is a known ASIS-for-GNAT problem. AdaCore's GNAT/ASIS doesn't support
> source code recoding. It works correctly only for ASCII and UTF-8
> encodings (the latter requires compiling with the -gnatW8 switch).

Well, that effectively means we all need to compile with:

         "-gnatiw",                          -- Wide-character codes allowed in identifiers
         "-gnatW8",                          -- UTF-8 encoding

in order to be fully ASIS compatible. Good to know that both VIM and
new GPS 4.0 support UTF-8.

Martin





* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14  8:56       ` Martin Krischik
@ 2006-09-14 21:16         ` Jeffrey R. Carter
  2006-09-14 22:55           ` Björn Persson
                             ` (3 more replies)
  2006-09-15  5:34         ` Simon Wright
  1 sibling, 4 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-14 21:16 UTC (permalink / raw)


Martin Krischik wrote:
> 
> Well, that effectively means we all need to compile with:
> 
>          "-gnatiw",                          -- Wide-character codes allowed in identifiers
>          "-gnatW8",                          -- UTF-8 encoding
> 
> in order to be fully ASIS compatible. Good to know that both VIM and
> new GPS 4.0 support UTF-8.

I'm sure no one will agree with me, but I don't see the value of 
allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers. 
Ada is designed to read like English, so in most cases identifiers 
should be in English.

if Something then

makes sense, but

if Kelko_Koza then

doesn't (and it doesn't even use characters not generally found in English).

Of course, this argues that each language should have its own set of 
reserved words. In many cases that could be achieved by a preliminary 
translation phase that converts the reserved words into Ada's English 
reserved words. For a truly international language, instead of reserved 
words, maybe we should have symbols:

? Kelko_Koza =>
    ...
?? Koza_2 =>
    ...
?=>
    ...
?<;

That smacks too much of C for my tastes.

-- 
Jeff Carter
"Nobody expects the Spanish Inquisition!"
Monty Python's Flying Circus
22




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 19:28       ` Björn Persson
  2006-09-14  6:34         ` Georg Bauhaus
@ 2006-09-14 22:13         ` Björn Persson
  2006-09-16  7:40         ` Martin Krischik
  2 siblings, 0 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-14 22:13 UTC (permalink / raw)


I wrote:
> Georg Bauhaus wrote:
>> Maybe a standard configuration pragma can be devised that informs
>> Ada source processors of the encoding used in files/compilation
>> units/...?
> 
> That would be great,

I may need to revise that. I just noticed the definition of 
"configuration pragma":
"Certain pragmas are defined to be /configuration pragmas/; they shall 
appear before the first compilation_unit of a compilation. [They are 
generally used to select a partition-wide or system-wide option.] The 
pragma applies to all compilation_units appearing in the compilation, 
unless there are none, in which case it applies to all future 
compilation_units compiled into the same environment." (ARM95 10.1.5(8))

I can't find a definition of "compilation", other than that it consists 
of any number of compilation units, but I guess that, in the case of 
Gnat, each invocation of Gnatmake is a single compilation. If so, then 
pragma Character_Encoding must not be a configuration pragma, because it 
would apply to any library units that Gnatmake decides to recompile, and 
those may be written by someone else in some other encoding.

We could let pragma Character_Encoding apply to a single file, a single 
compilation unit, an entire hierarchy of child packages or whatever, but 
we can't have it apply to everything that happens to be compiled at the 
same time. It must be tied to a specific piece of code.

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 21:16         ` Jeffrey R. Carter
@ 2006-09-14 22:55           ` Björn Persson
  2006-09-15 23:15             ` Jeffrey R. Carter
  2006-09-16  7:38             ` Martin Krischik
  2006-09-15  5:47           ` Martin Krischik
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-14 22:55 UTC (permalink / raw)


Jeffrey R. Carter wrote:
> I'm sure no one will agree with me, but I don't see the value of 
> allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers. 
> Ada is designed to read like English, so in most cases identifiers 
> should be in English.

Mixing languages in prose disturbs me. I can get quite upset when people 
use English words in Swedish texts even though there are equivalent 
Swedish words. Yet I often program with Swedish identifiers. Program 
code - even Ada code - is so different from English that I don't read it 
as English text anyway.

(Actually, I think I didn't know any English at all when I first started 
learning programming. This may have something to do with it.)

> Of course, this argues that each language should have its own set of 
> reserved words. In many cases that could be achieved by a preliminary 
> translation phase that converts the reserved words into Ada's English 
> reserved words.

That'd impose an English-like grammar on other languages, which would 
often not work at all. There may well be languages where "if" isn't a 
separate word.

> For a truly international language, instead of reserved 
> words, maybe we should have symbols:
> 
> ? Kelko_Koza =>
>    ...
> ?? Koza_2 =>
>    ...
> ?=>
>    ...
> ?<;

That's still not international. A Greek question mark looks like a 
semicolon for starters.

Don't try. That way lies madness.

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14  6:34         ` Georg Bauhaus
@ 2006-09-14 23:09           ` Björn Persson
  0 siblings, 0 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-14 23:09 UTC (permalink / raw)


Georg Bauhaus wrote:
> Or use Strings that carry their encoding with them ;-)

Wow, that sounds like a great idea! ;-)

Seriously though, there's not much EAstrings can do if you don't know 
the character encoding of your source code. You have to input the right 
encoding, in one way or another, when you first construct an EAstring.

It *would* be possible to work around a defective compiler that 
consistently produces mis-encoded Strings, but you still have to know 
the true encoding (and you'd be in for a surprise if the compiler 
suddenly gets fixed).

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14  8:56       ` Martin Krischik
  2006-09-14 21:16         ` Jeffrey R. Carter
@ 2006-09-15  5:34         ` Simon Wright
  1 sibling, 0 replies; 50+ messages in thread
From: Simon Wright @ 2006-09-15  5:34 UTC (permalink / raw)


"Martin Krischik" <krischik@users.sourceforge.net> writes:

> vgodunko@rostel.ru schrieb:
>
>> Randy Brukardt wrote:
>> >
>> > ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
>> > handle all possible Ada 95 programs. I suspect that the problem is in the
>> > XML conversion tool not handling Wide_Characters properly and not with ASIS.
>> > (Or just as likely, the XML processing tools not handling UTF-8 properly.)
>> >
>> This is a known ASIS-for-GNAT problem. AdaCore's GNAT/ASIS doesn't support
>> source code recoding. It works correctly only for ASCII and UTF-8
>> encodings (this requires compiling with the -gnatW8 switch).
>
> Well, that effectively means we all need to compile with:
>
>          "-gnatiw",  -- Wide-character codes allowed in identifiers
>          "-gnatW8",  -- UTF-8 encoding
>
> in order to be fully ASIS compatible. Good to know that both VIM and
> new GPS 4.0 support UTF-8.

Manuel's bisiesto example makes GNAT 4.0.0, GPL 2005 and GPL 2006 hang
if compiled with -gnatW8. On this Powerbook, anyway.

If compiled without, asis2xml fails with the exception
UNICODE.CES.INVALID_ENCODING but since I haven't actually thought
about unicode this will probably be my fault!
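
For what it's worth, an exception of this kind is what one would expect whenever Latin-1 or CP-1252 bytes are handed to a UTF-8 decoder. A minimal sketch, assuming a compiler that provides Ada 2012's Ada.Strings.UTF_Encoding (this is not the package that raised the exception above, and the procedure name Decode_Demo is made up for illustration):

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Decode_Demo is
   --  'a', e-acute, 'b' as Latin-1 / CP-1252 bytes; 16#E9# is e-acute
   --  in both of those encodings.
   Latin_1_Bytes : constant String := "a" & Character'Val (16#E9#) & "b";
begin
   declare
      --  In UTF-8, 16#E9# announces a three-byte sequence, but the next
      --  byte ('b') is not a continuation byte, so decoding should fail.
      Decoded : constant Wide_String :=
        Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Latin_1_Bytes);
   begin
      Ada.Text_IO.Put_Line
        ("decoded" & Natural'Image (Decoded'Length) & " characters");
   end;
exception
   when Ada.Strings.UTF_Encoding.Encoding_Error =>
      Ada.Text_IO.Put_Line ("input is not valid UTF-8");
end Decode_Demo;
```

The same bytes decode without complaint if they are first treated as Latin-1 and then re-encoded, which is why the tool has to know the true source encoding.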




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 21:16         ` Jeffrey R. Carter
  2006-09-14 22:55           ` Björn Persson
@ 2006-09-15  5:47           ` Martin Krischik
  2006-09-15 23:16             ` Jeffrey R. Carter
  2006-09-15  9:41           ` Georg Bauhaus
  2006-09-15 18:11           ` Pascal Obry
  3 siblings, 1 reply; 50+ messages in thread
From: Martin Krischik @ 2006-09-15  5:47 UTC (permalink / raw)


Jeffrey R. Carter schrieb:

> Martin Krischik wrote:
> >
> > Well, that effectively means we all need to compile with:
> >
> >          "-gnatiw",  -- Wide-character codes allowed in identifiers
> >          "-gnatW8",  -- UTF-8 encoding
> >
> > in order to be fully ASIS compatible. Good to know that both VIM and
> > new GPS 4.0 support UTF-8.
>
> I'm sure no one will agree with me, but I don't see the value of
> allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers.
> Ada is designed to read like English, so in most cases identifiers
> should be in English.

I worked on several projects where identifiers from system requirements
were indeed English but identifiers resulting from business
requirements were German. The business speaks German and we never
saw a reason to translate them.

> if Something then
>
> makes sense, but
>
> if Kelko_Koza then
>
> doesn't (and it doesn't even use characters not generally found in English).

> Of course, this argues that each language should have its own set of
> reserved words. In many cases that could be achieved by a preliminary
> translation phase that converts the reserved words into Ada's English
> reserved words. For a truly international language, instead of reserved
> words, maybe we should have symbols:

Well, language keywords are a system requirement.

Martin





* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 21:16         ` Jeffrey R. Carter
  2006-09-14 22:55           ` Björn Persson
  2006-09-15  5:47           ` Martin Krischik
@ 2006-09-15  9:41           ` Georg Bauhaus
  2006-09-15 23:28             ` Jeffrey R. Carter
  2006-09-16  5:10             ` Simon Wright
  2006-09-15 18:11           ` Pascal Obry
  3 siblings, 2 replies; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-15  9:41 UTC (permalink / raw)


Jeffrey R. Carter wrote:

> I'm sure no one will agree with me, but I don't see the value of 
> allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers. 

One value of characters outside the "crippled" range for English
is communication which relates to the problem domain, and specifically
addresses those who have to solve it.
The problem might have its own language, as Martin explains.
In particular, not every program is written by an international
team. Even if it is written in a common language, the common
language is only _like_ English, it is rather some computese. At
least I get this impression when I look at some programs and texts
including my own...

Mixing languages in every day talk has been very common for a long
time before English took over as the lingua franca of the Western
World. Martin Luther and friends used a mix of Latin and German
in speech but not when writing.



> Ada is designed to read like English, so in most cases identifiers 
> should be in English.

But words of grammar are as formal as the symbols in the grammar.
Other formal languages in the Ada camp even use artificial words
like "fi" and "od". I think the argument that Ada's grammar implies
English does not apply even though some of its reserved words
are English (like "then"). "Procedure" and "function" are not
specifically English, I'd say. If Ada is designed to read *like* English,
then we have to consider that the European languages are very much
*like* each other (communication barriers notwithstanding).
For example, "when" reads "wenn", "then" reads "dann" (or even "denn"),
and so on, in German. I'm sure people from other countries west of
the slavic borders can add similar comparisons.

So using your native language or problem domain language might add
value to the local mode of expression.

The word "resent" is an example of the effects of people trying
to write English when they probably shouldn't. "Resent" is to be
understood as a passive form of the word "resend". This word doesn't
exist in my fairly recent edition of an Oxford dictionary.  But it has
been added to a popular online dictionary (dict.leo.org).
Nevertheless, I bet few people know that "resent" means something
very different when English isn't their native language.
(But it reads like English...)

I hope I didn't make too many language related mistakes in this
post.



-- Georg 




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 21:16         ` Jeffrey R. Carter
                             ` (2 preceding siblings ...)
  2006-09-15  9:41           ` Georg Bauhaus
@ 2006-09-15 18:11           ` Pascal Obry
  2006-09-15 18:53             ` Dmitry A. Kazakov
  2006-09-15 23:35             ` Jeffrey R. Carter
  3 siblings, 2 replies; 50+ messages in thread
From: Pascal Obry @ 2006-09-15 18:11 UTC (permalink / raw)
  To: Jeffrey R. Carter

Jeffrey R. Carter a écrit :

> I'm sure no one will agree with me, but I don't see the value of
> allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers.
> Ada is designed to read like English, so in most cases identifiers
> should be in English.

In most cases probably. But for example it is nice to be able to support
Unicode characters as the π (greek pi) in Ada.Numerics. I bet
mathematicians will love to be able to use all sort of greek letters in
their Ada programs :)

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 18:11           ` Pascal Obry
@ 2006-09-15 18:53             ` Dmitry A. Kazakov
  2006-09-15 22:29               ` Georg Bauhaus
  2006-09-15 23:35             ` Jeffrey R. Carter
  1 sibling, 1 reply; 50+ messages in thread
From: Dmitry A. Kazakov @ 2006-09-15 18:53 UTC (permalink / raw)


On Fri, 15 Sep 2006 20:11:00 +0200, Pascal Obry wrote:

> Jeffrey R. Carter a écrit :
> 
>> I'm sure no one will agree with me, but I don't see the value of
>> allowing characters outside 'a' .. 'z' & 'A' .. 'Z' in Ada identifiers.
>> Ada is designed to read like English, so in most cases identifiers
>> should be in English.
> 
> In most cases probably. But for example it is nice to be able to support
> Unicode characters as the π (greek pi) in Ada.Numerics. I bet
> mathematicians will love to be able to use all sort of greek letters in
> their Ada programs :)

I doubt they would enjoy the idea that Ω/=Ω (code positions 16#3A9# and
16#2126#).
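
To make the two code positions concrete, here is a minimal sketch using only standard Wide_Character attributes (the procedure and constant names are made up for illustration). The two values usually render as the same glyph but compare as different characters:

```ada
with Ada.Text_IO;

procedure Omega_Demo is
   --  U+03A9 GREEK CAPITAL LETTER OMEGA
   Greek_Omega : constant Wide_Character := Wide_Character'Val (16#03A9#);
   --  U+2126 OHM SIGN
   Ohm_Sign    : constant Wide_Character := Wide_Character'Val (16#2126#);
begin
   --  The comparison is on code positions, not on glyphs.
   if Greek_Omega = Ohm_Sign then
      Ada.Text_IO.Put_Line ("same character");
   else
      Ada.Text_IO.Put_Line ("different characters");
   end if;
end Omega_Demo;
```

This prints "different characters"; a compiler may also warn that the condition is always false.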

IMO, the idea to use Unicode for program sources is wrong. The language (be
it formal or natural) should have a finite and reasonably small alphabet.
Unicode is practically an open-end set of symbols most of them you wouldn't
be able to either recognize or remember again.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 18:53             ` Dmitry A. Kazakov
@ 2006-09-15 22:29               ` Georg Bauhaus
  2006-09-16  7:46                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-15 22:29 UTC (permalink / raw)


On Fri, 2006-09-15 at 20:53 +0200, Dmitry A. Kazakov wrote:

> IMO, the idea to use Unicode for program sources is wrong. The language (be
> it formal or natural) should have a finite and reasonably small alphabet.
> Unicode is practically an open-end set of symbols most of them you wouldn't
> be able to either recognize or remember again.

Unicode is quite flexible and allows a project to choose a reasonable
subset of characters. A portable subset is fairly easy to describe
because both Ada and UCS define a common character set from which you
can choose. No lengthy discussions of how to interpret 8 bits,
no issues with conforming compilers.
Greek.Ω /= Electric.Ω is an issue in Ada 95, too, when you
use local character sets for two different files.
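
As a rough sketch of what "choosing a subset" could look like in practice, the check below accepts printable ASCII plus the basic Greek letter range and rejects everything else. The ranges, the procedure name and the function names are all assumptions made up for this illustration, not a recommendation:

```ada
with Ada.Text_IO;

procedure Subset_Demo is

   --  Hypothetical project rule: printable ASCII plus the code
   --  positions U+0391 .. U+03C9 (basic Greek letters).
   function In_Project_Subset (C : Wide_Character) return Boolean is
      Code : constant Natural := Wide_Character'Pos (C);
   begin
      return Code in 16#0020# .. 16#007E#
        or else Code in 16#0391# .. 16#03C9#;
   end In_Project_Subset;

   --  True if every character of Name belongs to the project subset.
   function All_In_Subset (Name : Wide_String) return Boolean is
   begin
      for I in Name'Range loop
         if not In_Project_Subset (Name (I)) then
            return False;
         end if;
      end loop;
      return True;
   end All_In_Subset;

   Greek_Name : constant Wide_String := (1 => Wide_Character'Val (16#03A9#));
   Ohm_Name   : constant Wide_String := (1 => Wide_Character'Val (16#2126#));

begin
   Ada.Text_IO.Put_Line (Boolean'Image (All_In_Subset (Greek_Name)));
   Ada.Text_IO.Put_Line (Boolean'Image (All_In_Subset (Ohm_Name)));
end Subset_Demo;
```

With these ranges the Greek omega is accepted (TRUE) and the ohm sign is rejected (FALSE), so the two look-alikes cannot both end up in project identifiers.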

Shou1d the number l, sorry, 1, not occur in source text, because it
is too easy to miss the difference, so please, remove it from the
Ada grammar? ;-)

You can extend the Unicode subset chosen for the project later, without
introducing ambiguity or a configuration issue. Using Unicode for
program source text lets you write identifiers that just cannot coexist
in Latin_1, or any 8bit character set.



-- Georg 






* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 22:55           ` Björn Persson
@ 2006-09-15 23:15             ` Jeffrey R. Carter
  2006-09-16  7:38             ` Martin Krischik
  1 sibling, 0 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-15 23:15 UTC (permalink / raw)


Björn Persson wrote:
> 
> That'd impose an English-like grammar on other languages, which would 
> often not work at all. There may well be languages where "if" isn't a 
> separate word.

That's why I said "In many cases". For other languages, something more 
complex would be needed.

>> For a truly international language, instead of reserved words, maybe 
>> we should have symbols:
>>
>> ? Kelko_Koza =>
>>    ...
>> ?? Koza_2 =>
>>    ...
>> ?=>
>>    ...
>> ?<;
> 
> That's still not international. A Greek question mark looks like a 
> semicolon for starters.

'?' has no real meaning in such a language except as the language 
defines it. I could have used '@' just as well. All that matters is that 
the Greek developer has the character on his keyboard.

> Don't try. That way lies madness.

Much too C-like for me to try.

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15  5:47           ` Martin Krischik
@ 2006-09-15 23:16             ` Jeffrey R. Carter
  2006-09-16  7:31               ` Martin Krischik
  0 siblings, 1 reply; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-15 23:16 UTC (permalink / raw)


Martin Krischik wrote:
> 
> I worked on several projects where identifiers from system requirements
> were indeed English but identifiers resulting from business
> requirements were German. The business speaks German and we never
> saw a reason to translate them.

That sounds even worse than having them all be in German.

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15  9:41           ` Georg Bauhaus
@ 2006-09-15 23:28             ` Jeffrey R. Carter
  2006-09-16  9:52               ` Georg Bauhaus
  2006-09-16 10:31               ` Björn Persson
  2006-09-16  5:10             ` Simon Wright
  1 sibling, 2 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-15 23:28 UTC (permalink / raw)


Georg Bauhaus wrote:
> Jeffrey R. Carter wrote:
> 
> One value of characters outside the "crippled" range for English
> is communication which relates to the problem domain, and specifically
> addresses those who have to solve it.
> The problem might have its own language, as Martin explains.
> In particular, not every program is written by an international
> team. Even if it is written in a common language, the common
> language is only _like_ English, it is rather some computese. At
> least I get this impression when I look at some programs and texts
> including my own...

If the problem domain is not described in English, and the developers 
don't use English, why should they use a language that looks like English?

> But words of grammar are as formal as the symbols in the grammar.
> Other formal languages in the Ada camp even use artificial words
> like "fi" and "od". I think the argument that Ada's grammar implies
> English does not apply even though some of its reserved words
> are English (like "then"). "Procedure" and "function" are not
> specifically English, I'd say. If Ada is designed to read *like* English,
> then we have to consider that the European languages are very much
> *like* each other (communication barriers notwithstanding).
> For example, "when" reads "wenn", "then" reads "dann" (or even "denn"),
> and so on, in German. I'm sure people from other countries west of
> the slavic borders can add similar comparisons.

The origin of "fi" and "od" is fairly obvious, at least to an English 
speaker. They're English words written backwards, serving the purpose of 
Modula's "end", and the words they "end" are themselves reserved words.

With the exception of "elsif", all of Ada's reserved words are English 
words. Other western European languages may have similar words, but that 
may not be a good thing. I recall the misuse of "eventual" in English by 
Netherlands and French speakers in Belgium.

> So using your native language or problem domain language might add
> value to the local mode of expression.

Sure. So would using a programming language with reserved words in that 
language.

> The word "resent" is an example of the effects of people trying
> to write English when they probably shouldn't. "Resent" is to be
> understood as a passive form of the word "resend". This word doesn't
> exist in my fairly recent edition of an Oxford dictionary.  But it has
> been added to a popular online dictionary (dict.leo.org).
> Nevertheless, I bet few people know that "resent" means something
> very different when English isn't their native language.
> (But it reads like English...)

I resent the implications :) Actually, that would be a past tense and 
past participle. The past participle, of course, is used in forming the 
passive voice.

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 18:11           ` Pascal Obry
  2006-09-15 18:53             ` Dmitry A. Kazakov
@ 2006-09-15 23:35             ` Jeffrey R. Carter
  1 sibling, 0 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-15 23:35 UTC (permalink / raw)


Pascal Obry wrote:
> 
> In most cases probably. But for example it is nice to be able to support
> Unicode characters as the π (greek pi) in Ada.Numerics. I bet
> mathematicians will love to be able to use all sort of greek letters in
> their Ada programs :)

Perhaps. I recall one project that involved implementing a lot of matrix 
and vector operations, hand written using Latin and Greek characters. I 
had a variable named W_Hat. It was actually a small omega with a 
circumflex, but luckily there were no Ws in the math.

It will probably improve, but frequently such characters do not display 
as intended. For the present, I'd rather not see code with meaningless 
or non-graphic characters that display as something else for the originator.

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15  9:41           ` Georg Bauhaus
  2006-09-15 23:28             ` Jeffrey R. Carter
@ 2006-09-16  5:10             ` Simon Wright
  1 sibling, 0 replies; 50+ messages in thread
From: Simon Wright @ 2006-09-16  5:10 UTC (permalink / raw)


Georg Bauhaus <bauhaus@futureapps.de> writes:

> The word "resent" is an example of the effects of people trying to
> write English when they probably shouldn't. "Resent" is to be
> understood as a passive form of the word "resend". This word doesn't
> exist in my fairly recent edition of an Oxford dictionary.  But it
> has been added to a popular online dictionary (dict.leo.org).
> Nevertheless, I bet few people know that "resent" means something
> very different when English isn't their native language.  (But it
> reads like English...)

A lot of the people "trying to write English who probably shouldn't"
are native English writers :-)

In the computer context I would use 'resent' and think nothing of it,
native English speakers would be very very likely to understand, it's
a regular (if Latin) construction. But outside that context I would
probably write 'sent again'.

Speech is a different matter; 'rezzent' with stress on second syllable
for 'be annoyed by' vs. 'ree-sent' with stress on first syllable for
'sent again'. But we native speakers get that sort of thing wrong too;
for example, far too many (BBC reporters even) think that 'diffuse'
and 'defuse' (or 'defuze' I suppose) are pronounced the same.




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 23:16             ` Jeffrey R. Carter
@ 2006-09-16  7:31               ` Martin Krischik
  2006-09-17 19:43                 ` Jeffrey R. Carter
  0 siblings, 1 reply; 50+ messages in thread
From: Martin Krischik @ 2006-09-16  7:31 UTC (permalink / raw)


Jeffrey R. Carter wrote:

> Martin Krischik wrote:
>> 
>> I worked on several projects where identifiers from system requirements
>> were indeed English but identifiers resulting from business
>> requirements were German. The business speaks German and we never
>> saw a reason to translate them.
> 
> That sounds even worse than having them all be in German.

It is just as bad or good an idea as "File_Type" or "INotifier" or any other
naming convention for identifiers. And at least it saves me from looking up
what "Abschluss_Datum" is in English.

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-14 22:55           ` Björn Persson
  2006-09-15 23:15             ` Jeffrey R. Carter
@ 2006-09-16  7:38             ` Martin Krischik
  2006-09-17 19:41               ` Jeffrey R. Carter
  1 sibling, 1 reply; 50+ messages in thread
From: Martin Krischik @ 2006-09-16  7:38 UTC (permalink / raw)


Björn Persson wrote:

> Mixing languages in prose disturbs me. I can get quite upset when people
> use English words in Swedish texts even though there are equivalent
> Swedish words. Yet I often program with Swedish identifiers. Program
> code - even Ada code - is so different from English that I don't read it
> as English text anyway.

But you can't translate "Ada.Text_IO", so you will always have English
identifiers in your program code. You use Swedish for the identifiers under
your control.

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-13 19:28       ` Björn Persson
  2006-09-14  6:34         ` Georg Bauhaus
  2006-09-14 22:13         ` Björn Persson
@ 2006-09-16  7:40         ` Martin Krischik
  2006-09-16  9:43           ` Björn Persson
  2 siblings, 1 reply; 50+ messages in thread
From: Martin Krischik @ 2006-09-16  7:40 UTC (permalink / raw)


Björn Persson wrote:

> Theoretically it's better to store the encoding outside the file so that
> you can know what encoding to use before you start reading the file.
> In practice this is usually impossible.

Actually it is possible. The technique employed is called "extended
attributes" and is available for all modern operating systems.

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 22:29               ` Georg Bauhaus
@ 2006-09-16  7:46                 ` Dmitry A. Kazakov
  0 siblings, 0 replies; 50+ messages in thread
From: Dmitry A. Kazakov @ 2006-09-16  7:46 UTC (permalink / raw)


On Sat, 16 Sep 2006 00:29:24 +0200, Georg Bauhaus wrote:

> On Fri, 2006-09-15 at 20:53 +0200, Dmitry A. Kazakov wrote:
> 
>> IMO, the idea to use Unicode for program sources is wrong. The language (be
>> it formal or natural) should have a finite and reasonably small alphabet.
>> Unicode is practically an open-end set of symbols most of them you wouldn't
>> be able to either recognize or remember again.
> 
> Unicode is quite flexible and allows a project to choose a reasonable
> subset of characters. A portable subset is fairly easy to describe
> because both Ada and UCS define a common character set from which you
> can choose. No lengthy discussions of how to interpret 8 bits,
> no issues with conforming compilers.

Do you disagree with the point? How can a language be based on multiple
alphabets? [you are talking about subsets] Would it be still one language?
In the history there are examples of written natural languages changing
alphabets.

> Greek.Ω /= Electric.Ω is an issue in Ada 95, too, when you
> use local character sets for two different files.
> 
> Shou1d the number l, sorry, 1, not occur in source text, because it
> is too easy to miss the difference, so please, remove it from the
> Ada grammar? ;-)

That is an issue of choosing a proper typeface. But the Omega glyph is the same.
Code positions (semantic meaning of the symbol, Ohm vs. Greek Omega) are
different. Exactly this is wrong. Because the semantics of a symbol is to
be defined solely by the language, by Ada in our case. Unicode is not a
language, so far, however, nothing would prevent us to define a Unicode
position for any possible Ada program... (:-))

> You can extend the Unicode subset chosen for the project later, without
> introducing ambiguity or a configuration issue. Using Unicode for
> program source text lets you write identifiers that just cannot coexists
> in Latin_1, or any 8bit character set.

There are many ways to make code unmaintainable, like writing identifiers
in linear B syllabary...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  7:40         ` Martin Krischik
@ 2006-09-16  9:43           ` Björn Persson
  2006-09-16  9:59             ` Georg Bauhaus
  2006-09-17  9:30             ` Martin Krischik
  0 siblings, 2 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-16  9:43 UTC (permalink / raw)


Martin Krischik wrote:
> Björn Persson wrote:
> 
>> Theoretically it's better to store the encoding outside the file so that
>> you can know what encoding to use before you start reading the file.
>> In practice this is usually impossible.
> 
> Actually it is possible. The technique employed is called "extended
> attributes" and is available for all modern operating systems.

Will editors, compilers, preprocessors et cetera find the encoding 
setting in the extended attributes and obey it? Will version handlers, 
web browsers and other file transfer programs find the setting and send 
it when uploading? Will they set it correctly when downloading? Will all 
these programs use the same attribute for character encoding or will 
each one do it its own way? Will file managers copy and move the 
extended attributes along with the file?

In short, is it really possible *in practice*?

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 23:28             ` Jeffrey R. Carter
@ 2006-09-16  9:52               ` Georg Bauhaus
  2006-09-16 10:31               ` Björn Persson
  1 sibling, 0 replies; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-16  9:52 UTC (permalink / raw)


On Fri, 2006-09-15 at 23:28 +0000, Jeffrey R. Carter wrote:

> > So using your native language or problem domain language might add
> > value to the local mode of expression.
> 
> Sure. So would using a programming language with reserved words in that 
> language.

I have seen this in teaching, and in the translations of the Logo
and Basic languages. In normal programming contexts I think it just
adds confusion, because there is no clear choice of the proper
translation. Libraries are frequently written using the "English
edition" of the grammar, so you'd have to switch recognition patterns
anyway.
For example, when we have to write Perl code we don't even use Perl's
translations from English to English so to speak, like writing "unless
(expr)" instead of "if (! expr)" (when positive logic would tear things
apart). We just stick with the language neutral if/then/elsif etc..
Keeps it simple.

When our program text is mixing formal words from the language grammar
and from the problem domain, the result will be formal anyway.
E.g. I have an idea what Martin's identifier "Abschluss_Datum"
might mean. Translating such words can add misunderstandings even
when they are correct in US English, because of different British
traditions. Or worse:
When you fill in a deposit slip (pay-in slip, paying-in slip),
then I guess you wouldn't be surprised to see a field
labeled "DATE".
The German label of this field is probably "Wert". The uninitiated
might translate this as "worth", "value", "asset", etc., but
not as "date". And even "date" might refer to different things,
implying unintended consequences of the translation.
Choosing an unambiguous word from the "profession languages" spoken
by local trades people will reduce a number of risks I think,
including cost. So my preferred rule will be:
Use Ada's formal English words, because you have learned what
they mean, and use words from the problem domain even if they
aren't English words, again because you know what they mean,
as do users and developers of the software. If there
is time and money for a proper translation, delegate.


> Actually, that would be a past tense and 
> past participle. The past participle, of course, is used in
> forming the passive voice.

Thanks for the corrections.


-- Georg 






* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  9:43           ` Björn Persson
@ 2006-09-16  9:59             ` Georg Bauhaus
  2006-09-16 11:15               ` Björn Persson
  2006-09-17  9:30             ` Martin Krischik
  1 sibling, 1 reply; 50+ messages in thread
From: Georg Bauhaus @ 2006-09-16  9:59 UTC (permalink / raw)


On Sat, 2006-09-16 at 09:43 +0000, Björn Persson wrote:

> > The technique employed is called "extended
> > attributes" and is available for all modern operating systems.
> 
> Will editors, compilers, preprocessors et cetera find the encoding 
> setting in the extended attributes and obey it?
> [...]
> 
> In short, is it really possible *in practice*?

As long as programmers choose deliberate obstruction,
they construct the truth "not possible in practice" :/
Meanwhile HTTP, XML, and ASN.1 are demonstrating that
it is quite possible to inform the other end of the
data type and encoding used in data transfers.







* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-15 23:28             ` Jeffrey R. Carter
  2006-09-16  9:52               ` Georg Bauhaus
@ 2006-09-16 10:31               ` Björn Persson
  2006-09-17 19:57                 ` Jeffrey R. Carter
  1 sibling, 1 reply; 50+ messages in thread
From: Björn Persson @ 2006-09-16 10:31 UTC (permalink / raw)


Jeffrey R. Carter wrote:
> If the problem domain is not described in English, and the developers 
> don't use English, why should they use a language that looks like English?

Because it's a bad idea to invent new programming languages all the 
time. It's not only a waste of time to start your project by designing a 
language and writing a compiler; the language would also most likely be 
much worse than some of the existing languages you could have used. 
Having one well-designed language that can be used in all projects is 
much better. (Those were the reasons why Ada was created, as I've 
understood it.) English is, unfortunately, the /lingua franca/ of our 
time, so using English words for keywords is a better choice than using 
German words.

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  9:59             ` Georg Bauhaus
@ 2006-09-16 11:15               ` Björn Persson
  0 siblings, 0 replies; 50+ messages in thread
From: Björn Persson @ 2006-09-16 11:15 UTC (permalink / raw)


Georg Bauhaus wrote:
> Meanwhile HTTP, XML, and ASN.1 are demonstrating that
> it is quite possible to inform the other end of the
> data type and encoding used in data transfers.

And MIME too. But when I save a document that I've received with HTTP or 
as an email attachment, that information is lost. If it is indeed stored 
somewhere, then I have no idea where, or how to retrieve it. That's why 
the XML declaration is needed.
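
To illustrate why the in-file declaration matters once the transport metadata is gone, here is a minimal Python sketch. It is only an illustration: the element name `name` and the sample bytes are invented, and ISO-8859-1 stands in for CP-1252 because both map byte 0xE9 to "é".

```python
import xml.etree.ElementTree as ET

# One accented letter stored as a single Latin-1 byte (0xE9 = "é").
# The element name and content are made up for this example.
BODY = b"<name>Jos\xe9</name>"


def parse_with_declaration(encoding: str) -> str:
    """Parse BODY behind an XML declaration that names the given encoding."""
    declaration = (
        b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>'
    )
    return ET.fromstring(declaration + BODY).text


def is_well_formed(encoding: str) -> bool:
    """Report whether the parser accepts BODY under the declared encoding."""
    try:
        parse_with_declaration(encoding)
        return True
    except ET.ParseError:
        return False
```

With a declaration that matches the bytes, the parser can recover the text from the file alone. With a declaration that says UTF-8, the same bytes are rejected, which is the symptom reported at the start of this thread.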

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  9:43           ` Björn Persson
  2006-09-16  9:59             ` Georg Bauhaus
@ 2006-09-17  9:30             ` Martin Krischik
  1 sibling, 0 replies; 50+ messages in thread
From: Martin Krischik @ 2006-09-17  9:30 UTC (permalink / raw)


Björn Persson wrote:

> Martin Krischik wrote:
>> Björn Persson wrote:
>> 
>>> Theoretically it's better to store the encoding outside the file so that
>>> you can know what encoding to use before you start reading the file.
>>> In practice this is usually impossible.
>> 
>> Actually it is possible. The technique employed is called "extended
>> attributes" and is available for all modern operating systems.
> 
> Will editors, compilers, preprocessors et cetera find the encoding
> setting in the extended attributes and obey it? Will version handlers,
> web browsers and other file transfer programs find the setting and send
> it when uploading? Will they set it correctly when downloading? Will all
> these programs use the same attribute for character encoding or will
> each one do it its own way? Will file managers copy and move the
> extended attributes along with the file?
> 
> In short, is it really possible *in practice*?

Only if the programmers are disciplined enough to consistently use the
features available. So probably: No.

The closest I have ever seen was OS/2. And even there most programmers were
too lazy to look up or set ".TYPE" and used the file extension instead.
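
As a rough sketch of what "using the features available" could look like, the Python below tags a file with its encoding and reads the tag back. The attribute name `user.charset` and both function names are assumptions made for this example; the thread does not name an agreed attribute, and `os.setxattr`/`os.getxattr` are only available on Linux.

```python
import os

# Assumed attribute name; every cooperating program would have to agree on it.
ENCODING_ATTR = "user.charset"


def tag_encoding(path: str, encoding: str) -> bool:
    """Try to record the encoding as an extended attribute; report success."""
    try:
        os.setxattr(path, ENCODING_ATTR, encoding.encode("ascii"))
        return True
    except (AttributeError, OSError):
        # No xattr API on this platform, or the file system refuses the tag.
        return False


def declared_encoding(path: str, default: str = "utf-8") -> str:
    """Return the tagged encoding, or the default if no tag can be read."""
    try:
        return os.getxattr(path, ENCODING_ATTR).decode("ascii")
    except (AttributeError, OSError):
        return default
```

The fallback to a default in `declared_encoding` is the weak point described above: any tool in the chain that drops or ignores the attribute silently puts the reader back to guessing.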

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  7:38             ` Martin Krischik
@ 2006-09-17 19:41               ` Jeffrey R. Carter
  0 siblings, 0 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-17 19:41 UTC (permalink / raw)


Martin Krischik wrote:
> 
> But you can't translate "Ada.Text_IO" so you will always have English
> identifiers in your program code. You use Swedish for the identifiers under
> your control.

Assuming you use Ada instead of a language with Swedish reserved words.

-- 
Jeff Carter
"Run away! Run away!"
Monty Python and the Holy Grail
58




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16  7:31               ` Martin Krischik
@ 2006-09-17 19:43                 ` Jeffrey R. Carter
  0 siblings, 0 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-17 19:43 UTC (permalink / raw)


Martin Krischik wrote:
> 
> It is just as bad or good an idea as "File_Type" or "INotifier" or any other
> naming convention for identifiers. And at least it saves me from looking up
> what "Abschluss_Datum" is in English.

2 more bad naming conventions.

But you wouldn't have to think about looking up the English for your 
German concepts if your language had German reserved words.

-- 
Jeff Carter
"Run away! Run away!"
Monty Python and the Holy Grail
58




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-16 10:31               ` Björn Persson
@ 2006-09-17 19:57                 ` Jeffrey R. Carter
  2006-09-18  0:06                   ` Björn Persson
  0 siblings, 1 reply; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-17 19:57 UTC (permalink / raw)


Björn Persson wrote:
> 
> Because it's a bad idea to invent new programming languages all the 
> time. It's not only a waste of time to start your project by designing a 
> language and writing a compiler; the language would also most likely be 
> much worse than some of the existing languages you could have used. 
> Having one well-designed language that can be used in all projects is 
> much better. (Those were the reasons why Ada was created, as I've 
> understood it.) English is, unfortunately, the /lingua franca/ of our 
> time, so using English words for keywords is a better choice than using 
> German words.

Designing a language for a project is sometimes a good idea. I recall a 
report about a class project where there was actual hardware to control 
(I think it was a clothes washing machine). When the students tried to 
control the hardware directly from the general-purpose language, there 
were lots of failures. When they created a problem-specific language, 
with statements such as Fill_Tank (Load_Size), Drain_Tank, 
Start_Agitation, and so on, success was much more common.

Of course, these statements were really calls to a library, and the 
program was really in the general-purpose language.
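
A small sketch of that idea, in Python rather than the language used in the class project (which is not named here). The class `WashingMachine` and the function `wash` are hypothetical; the operation names follow the statements quoted above, and the "hardware" only records what it was asked to do.

```python
class WashingMachine:
    """Hypothetical hardware facade; it only records the requested actions."""

    def __init__(self) -> None:
        self.log = []

    def fill_tank(self, load_size: str) -> None:
        self.log.append(f"fill tank for {load_size} load")

    def start_agitation(self) -> None:
        self.log.append("start agitation")

    def drain_tank(self) -> None:
        self.log.append("drain tank")


def wash(machine: WashingMachine, load_size: str) -> list:
    """A 'program' in the problem-specific vocabulary: plain library calls."""
    machine.fill_tank(load_size)
    machine.start_agitation()
    machine.drain_tank()
    return machine.log
```

The body of `wash` reads like a problem-specific language, yet nothing beyond ordinary subprogram calls in the host language is involved.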

But I agree that one doesn't want to design and implement a language for 
each project. Just for each language. Ada was designed for the US DOD, 
an English-speaking organization. Do you think a similar project in 
China should have used English reserved words? Or that it would have 
been fine for the Green team to have used French reserved words?

-- 
Jeff Carter
"Run away! Run away!"
Monty Python and the Holy Grail
58




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-17 19:57                 ` Jeffrey R. Carter
@ 2006-09-18  0:06                   ` Björn Persson
  2006-09-18 20:14                     ` Jeffrey R. Carter
  0 siblings, 1 reply; 50+ messages in thread
From: Björn Persson @ 2006-09-18  0:06 UTC (permalink / raw)


Jeffrey R. Carter wrote:
> Designing a language for a project is sometimes a good idea. I recall a 
> report about a class project where there was actual hardware to control 
> (I think it was a clothes washing machine). When the students tried to 
> control the hardware directly from the general-purpose language, there 
> were lots of failures. When they created a problem-specific language, 
> with statements such as Fill_Tank (Load_Size), Drain_Tank, 
> Start_Agitation, and so on, success was much more common.
> 
> Of course, these statements were really calls to a library, and the 
> program was really in the general-purpose language.

Yes, libraries are usually good.

> But I agree that one doesn't want to design and implement a language for 
> each project. Just for each language.

So you want to design one programming language for each human language? 
How many would there be? This Wikipedia article says the draft ISO 639-3 
has 7602 language codes:
http://en.wikipedia.org/wiki/ISO_639-3

This page lists 7618 codes:
http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes

That's going to keep you occupied for a while. But maybe you were 
planning to discriminate against all the small languages, and only 
provide programming languages to those with more than some arbitrarily 
chosen number of speakers? Here's a list of some 250 languages with more 
than a million native speakers:
http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

You realize of course that you'll then have to reimplement a whole lot 
of libraries in each of these new languages. And debug them. Seven 
thousand times over (or 250 or whatever). Then, whenever a bug is found 
or an improvement is made in a library, the same change will have to be 
made in all the other incarnations too.

If you think that sounds like a good way to spend your time, go ahead. 
Me, I'm going to stick to code reuse.

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-18  0:06                   ` Björn Persson
@ 2006-09-18 20:14                     ` Jeffrey R. Carter
  0 siblings, 0 replies; 50+ messages in thread
From: Jeffrey R. Carter @ 2006-09-18 20:14 UTC (permalink / raw)


Björn Persson wrote:
> 
> So you want to design one programming language for each human language? 
> How many would there be? This Wikipedia article says the draft ISO 639-3 
> has 7602 language codes:
> http://en.wikipedia.org/wiki/ISO_639-3

I don't. I'm a native English speaker, so a single, well designed, 
English-oriented language is all I need. I want programming languages to 
decide what language they're in, and stick to that language. Then people 
who want to use identifiers not supported by an English-oriented 
language, such as Ada, will have a reason to create their own language.

-- 
Jeff Carter
"If you think you got a nasty taunting this time,
you ain't heard nothing yet!"
Monty Python and the Holy Grail
23




* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-12  9:52 ` Stephen Leake
@ 2006-09-19  1:16   ` Marc A. Criley
  2006-09-19  9:20     ` Stephen Leake
  0 siblings, 1 reply; 50+ messages in thread
From: Marc A. Criley @ 2006-09-19  1:16 UTC (permalink / raw)


Stephen Leake wrote:
> Manuel Collado <m.collado@fi.upm.es> writes:
> 
> 
>>1. The ASIS API should provide a way to know the character encoding of
>>the source file (I think it doesn't).
> 
> Please send this suggestion to the ARG, at ada-comment@ada-auth.org;
> they are currently revising the ASIS standard for Ada 2005.

Just to try to drag this thread back to its original topic...

I've found out that the ASIS-for-GNAT that accompanies GNAT GPL 2006 does 
not permit on-the-fly compilation of Ada units using a character set other 
than ASCII.  Largely because there's no way to pass the "-gnatW8" switch 
via the ASIS interface to the compiler.

ASIS can handle such character sets, it's simply that it has to interact 
with the sources via their corresponding tree, ".adt", files.  These are 
generated by specifying the -gnatc and -gnatt options to gnatmake.

For example:

$ gnatmake -c -gnatc -gnatt -gnatW8 bisiesto.adb
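
For a tool that has to do this step itself before handing the tree file to ASIS, the same invocation could be scripted. The following is a minimal Python sketch; the function names are made up for illustration, and only the switches shown in the command above are used.

```python
import subprocess


def tree_file_command(source: str, utf8: bool = True) -> list:
    """Build the gnatmake invocation that produces the ".adt" tree file."""
    command = ["gnatmake", "-c", "-gnatc", "-gnatt"]
    if utf8:
        # Treat the source text as UTF-8 encoded.
        command.append("-gnatW8")
    command.append(source)
    return command


def generate_tree(source: str) -> None:
    """Run gnatmake; raises CalledProcessError if the compilation fails."""
    subprocess.run(tree_file_command(source), check=True)
```

Calling `generate_tree("bisiesto.adb")` would run the same command as above, assuming gnatmake is on the PATH.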

AdaCore has already remedied this situation in their current version of 
ASIS-for-GNAT, which is not yet publicly available.

And Avatox 1.2 will be out shortly to handle non-ASCII character sets.

-- Marc A. Criley
-- McKae Technologies
-- www.mckae.com
-- DTraq - Avatox - XIA - XML EZ Out





* Re: Avatox 1.0: Trouble with encoding in Windows
  2006-09-19  1:16   ` Marc A. Criley
@ 2006-09-19  9:20     ` Stephen Leake
  0 siblings, 0 replies; 50+ messages in thread
From: Stephen Leake @ 2006-09-19  9:20 UTC (permalink / raw)


"Marc A. Criley" <mcNOSPAM@mckae.com> writes:

> I've found out that the ASIS-for-GNAT that accompanies GNAT GPL 2006
> does not permit on-the-fly compilation of Ada units using a character
> set other than ASCII.  Largely because there's no way to pass the
> "-gnatW8" switch via the ASIS interface to the compiler.

It is very easy to patch the driver code to allow passing that
parameter. Perhaps you could do that, and post the patch somewhere.

-- 
-- Stephe




end of thread, other threads:[~2006-09-19  9:20 UTC | newest]

Thread overview: 50+ messages
2006-09-11  8:24 Avatox 1.0: Trouble with encoding in Windows Manuel Collado
2006-09-11 10:35 ` Georg Bauhaus
2006-09-11 13:49   ` Avatox 1.1: " Manuel Collado
2006-09-11 16:43     ` Georg Bauhaus
2006-09-11 17:50     ` Björn Persson
2006-09-12  0:06       ` Marc A. Criley
2006-09-12  8:35         ` Manuel Collado
2006-09-13  0:01   ` Avatox 1.0: " Randy Brukardt
2006-09-13  9:01     ` Georg Bauhaus
2006-09-13 19:28       ` Björn Persson
2006-09-14  6:34         ` Georg Bauhaus
2006-09-14 23:09           ` Björn Persson
2006-09-14 22:13         ` Björn Persson
2006-09-16  7:40         ` Martin Krischik
2006-09-16  9:43           ` Björn Persson
2006-09-16  9:59             ` Georg Bauhaus
2006-09-16 11:15               ` Björn Persson
2006-09-17  9:30             ` Martin Krischik
2006-09-13 10:32     ` Manuel Collado
2006-09-13 18:28       ` Björn Persson
2006-09-14  8:11         ` Manuel Collado
2006-09-13 23:05       ` Randy Brukardt
2006-09-13 11:04     ` vgodunko
2006-09-14  8:56       ` Martin Krischik
2006-09-14 21:16         ` Jeffrey R. Carter
2006-09-14 22:55           ` Björn Persson
2006-09-15 23:15             ` Jeffrey R. Carter
2006-09-16  7:38             ` Martin Krischik
2006-09-17 19:41               ` Jeffrey R. Carter
2006-09-15  5:47           ` Martin Krischik
2006-09-15 23:16             ` Jeffrey R. Carter
2006-09-16  7:31               ` Martin Krischik
2006-09-17 19:43                 ` Jeffrey R. Carter
2006-09-15  9:41           ` Georg Bauhaus
2006-09-15 23:28             ` Jeffrey R. Carter
2006-09-16  9:52               ` Georg Bauhaus
2006-09-16 10:31               ` Björn Persson
2006-09-17 19:57                 ` Jeffrey R. Carter
2006-09-18  0:06                   ` Björn Persson
2006-09-18 20:14                     ` Jeffrey R. Carter
2006-09-16  5:10             ` Simon Wright
2006-09-15 18:11           ` Pascal Obry
2006-09-15 18:53             ` Dmitry A. Kazakov
2006-09-15 22:29               ` Georg Bauhaus
2006-09-16  7:46                 ` Dmitry A. Kazakov
2006-09-15 23:35             ` Jeffrey R. Carter
2006-09-15  5:34         ` Simon Wright
2006-09-12  9:52 ` Stephen Leake
2006-09-19  1:16   ` Marc A. Criley
2006-09-19  9:20     ` Stephen Leake
