From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4
Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Simon Wright
Newsgroups: comp.lang.ada
Subject: Re: GNAT vs UTF-8 source file names
Date: Sat, 17 Jun 2017 18:20:28 +0100
Organization: A noiseless patient Spider
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: mx02.eternal-september.org; posting-host="be3d47e6e775a7190e8813a0115a0279"; logging-data="932"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18oLfcO7ssVo9fMtN3KvFSkc6HWJBNdWWQ="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (darwin)
Cancel-Lock: sha1:zqU2mq+hCBCZJZb+Mj+1YsqYThY= sha1:wlSx2xTOHHA0TAmqIVchI2VwR2s=
Xref: news.eternal-september.org comp.lang.ada:46969
Date: 2017-06-17T18:20:28+01:00
List-Id:

Simon Wright writes:

> ACATS 4.1 test C250002 involves unit names with UTF-8 characters (the
> source has the correct UTF-8 BOM, the relevant unit is named C250002_Z
> where Z is actually UTF-8 C381, latin capital letter a with acute;
> gnatchop correctly generates a source file with the BOM and name
> c250002_z where z is actually UTF-8 C3A1, latin small letter a with
> acute).
>
> On compiling, the compiler (GNAT GPL 2016, FSF GCC 7.0.1) fails to find
> the file; it says e.g.
>
>    GNATMAKE GPL 2016 (20160515-49)
>    Copyright (C) 1992-2016, Free Software Foundation, Inc.
>    gcc -c -I../../../support -gnatW8 c250002.adb
>    gcc -c -I../../../support -gnatW8 c250002_0.ads
>    End of compilation
>    gnatmake: "c250002_?.adb" not found

PR ada/81114 refers[1].

It turns out that this failure occurs on Windows and macOS.
The problem is that GNAT smashes the file name to lower case if it knows
that the file system is case-insensitive (using an ASCII to-lower, so of
course 'smash' is the right word if there are UTF-8 characters in
there).

There is an undocumented environment variable that affects this:

   $ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake c250002
   gcc -c c250002.adb
   gcc -c c250002_á.adb
   gnatbind -x c250002.ali
   gnatlink c250002.ali
   $ ./c250002

   ,.,. C250002 ACATS 4.1 17-06-17 18:05:55
   ---- C250002 Check that characters above ASCII.Del can be used in identifiers, character literals and strings.
      - C250002 C250002_0.TAGGED_à_ID.
   ==== C250002 PASSED ============================.

I wonder why, if the FS is case-insensitive, GNAT bothers at all? (there
was, I think, some remark about detecting whether two filenames
represented different files).

What do people who actually need to use international character sets do
about this? Do you just avoid using international characters in Ada unit
names? Or have I just missed the relevant part of the manual?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114
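
To illustrate the lower-casing point made above, here is a small Python sketch (not GNAT's code; the function names are made up for the illustration). It assumes the name is held as UTF-8 bytes and compares a byte-wise ASCII to-lower with a Unicode-aware one. The ASCII version leaves the two bytes of the capital letter (C3 81) alone, so the result never matches the name gnatchop wrote to disk (C3 A1).

```python
# Illustrative sketch only: this is NOT how GNAT is implemented.
# It shows why an ASCII-only to-lower cannot turn UTF-8 "Á" into "á".

def ascii_lower_bytes(name: str) -> bytes:
    """Encode as UTF-8, then lower-case ASCII letters only (bytes.lower)."""
    return name.encode("utf-8").lower()

def unicode_lower_bytes(name: str) -> bytes:
    """Lower-case using Unicode case mapping, then encode as UTF-8."""
    return name.lower().encode("utf-8")

# Hypothetical file name derived from the unit name, with U+00C1
# (latin capital letter a with acute) in place of "Z".
unit_file = "C250002_\u00c1.adb"

smashed = ascii_lower_bytes(unit_file)     # capital A-acute bytes survive
expected = unicode_lower_bytes(unit_file)  # what gnatchop's name looks like

print(smashed.hex(" "))
print(expected.hex(" "))
print("names match:", smashed == expected)
```

The two byte strings differ only in the second byte of the accented letter, which is enough for a lookup by exact name to fail.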