comp.lang.ada
From: Simon Wright <simon@pushface.org>
Subject: Re: GNAT vs UTF-8 source file names
Date: Sat, 17 Jun 2017 18:20:28 +0100
Message-ID: <lyefuia5ur.fsf@pushface.org>
In-Reply-To: lytw55kei5.fsf@pushface.org

Simon Wright <simon@pushface.org> writes:

> ACATS 4.1 test C250002 involves unit names with UTF-8 characters (the
> source has the correct UTF-8 BOM, the relevant unit is named C250002_Z
> where Z is actually UTF-8 C381, latin capital letter a with acute;
> gnatchop correctly generates a source file with the BOM and name
> c250002_z where z is actually UTF-8 C3A1, latin small letter a with
> acute).
>
> On compiling, the compiler (GNAT GPL 2016, FSF GCC 7.0.1) fails to find
> the file; it says e.g.
>
>    GNATMAKE GPL 2016 (20160515-49)
>    Copyright (C) 1992-2016, Free Software Foundation, Inc.
>    gcc -c -I../../../support -gnatW8 c250002.adb
>    gcc -c -I../../../support -gnatW8 c250002_0.ads
>    End of compilation
>    gnatmake: "c250002_?.adb" not found

See PR ada/81114 [1].

It turns out that this failure occurs on Windows and macOS. The problem
is that GNAT smashes the file name to lower case if it knows that the
file system is case-insensitive (using an ASCII to-lower, so of course
'smash' is the right word if there are UTF-8 characters in there).
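
To illustrate why a byte-wise to-lower can't work on UTF-8 names, here is
a minimal sketch in Python. It is not GNAT's code: both fold functions
are hypothetical, and the Latin-1-style variant is only my guess at the
kind of mapping that would produce a mangled name like the one in the
error message.

```python
def ascii_lower(data: bytes) -> bytes:
    """Hypothetical fold: lower-case only the bytes for 'A'..'Z'."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def latin1_lower(data: bytes) -> bytes:
    """Hypothetical fold: also treat bytes 0xC0..0xDE (except 0xD7)
    as Latin-1 capitals and add 0x20, one byte at a time."""
    return bytes(
        b + 0x20
        if (0x41 <= b <= 0x5A) or (0xC0 <= b <= 0xDE and b != 0xD7)
        else b
        for b in data
    )


upper_name = "C250002_Á.adb".encode("utf-8")  # Á is C3 81
on_disk = "c250002_á.adb".encode("utf-8")     # á is C3 A1

# A pure ASCII fold never touches the two-byte sequence, so Á is not
# lower-cased and the result differs from the name gnatchop wrote.
ascii_folded = ascii_lower(upper_name)

# A byte-wise Latin-1 fold turns the lead byte C3 into E3, which is
# no longer valid UTF-8 for this name.
latin1_folded = latin1_lower(on_disk)

# A Unicode-aware fold gives the expected file name.
unicode_folded = "C250002_Á.adb".lower().encode("utf-8")

print(ascii_folded == on_disk)
print(latin1_folded)
print(unicode_folded == on_disk)
```

Either way the folded bytes no longer match the file that gnatchop
created; only the Unicode-aware fold does.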

There is an undocumented environment variable that affects this:

   $ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake c250002
   gcc -c c250002.adb
   gcc -c c250002_á.adb
   gnatbind -x c250002.ali
   gnatlink c250002.ali
   $ ./c250002

   ,.,. C250002 ACATS 4.1 17-06-17 18:05:55
   ---- C250002 Check that characters above ASCII.Del can be used in
                   identifiers, character literals and strings.
      - C250002 C250002_0.TAGGED_à_ID.
   ==== C250002 PASSED ============================.

I wonder why, if the FS is case-insensitive, GNAT bothers at all? (There
was, I think, some remark about detecting whether two filenames
represented different files.)

What do people who actually need to use international character sets do
about this? Do you just avoid using international characters in Ada unit
names? Or have I just missed the relevant part of the manual?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114

