From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Simon Wright <simon@pushface.org>
Newsgroups: comp.lang.ada
Subject: GNAT vs UTF-8 source file names
Date: Sun, 30 Apr 2017 18:10:42 +0100
Organization: A noiseless patient Spider
Message-ID: <lytw55kei5.fsf@pushface.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: mx02.eternal-september.org;
 posting-host="3f4ccb4531bb202b170725dfbbdde5c1";
	logging-data="7459"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX19NSCnLGakiBUsfrW24wdxcgIER6k6Ys8k="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (darwin)
Cancel-Lock: sha1:i7QHT4DN+Is2+9Y8jEEIraCJrZY=
	sha1:/nqLpfjc/pDOWkW3O8iH08oFueA=
Xref: news.eternal-september.org comp.lang.ada:46647
Date: 2017-04-30T18:10:42+01:00
List-Id: <comp.lang.ada>

ACATS 4.1 test C250002 involves unit names with UTF-8 characters. The
source has the correct UTF-8 BOM, and the relevant unit is named
C250002_Z, where Z is actually UTF-8 C381 (latin capital letter a with
acute). gnatchop correctly generates a source file with the BOM and the
name c250002_z, where z is actually UTF-8 C3A1 (latin small letter a
with acute).
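
For reference, the byte values quoted above can be checked with a few
lines of Python. This is only an illustrative sketch and has nothing to
do with how gnatchop or GNAT derive file names; the helper name
hex_bytes is made up for the example.

```python
# The two letters named above, given by Unicode code point.
upper = "\u00c1"  # LATIN CAPITAL LETTER A WITH ACUTE
lower = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE


def hex_bytes(text, encoding):
    """Return the encoded bytes of text as upper-case hex (hypothetical helper)."""
    return text.encode(encoding).hex().upper()


print(hex_bytes(upper, "utf-8"))    # C381
print(hex_bytes(lower, "utf-8"))    # C3A1
print(hex_bytes(lower, "latin-1"))  # E1, the single-byte Latin-1 form
print(upper.lower() == lower)       # True: lower-casing the unit name gives the file name letter
```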

On compiling, the compiler (GNAT GPL 2016, FSF GCC 7.0.1) fails to find
the file; it says e.g.

   GNATMAKE GPL 2016 (20160515-49)
   Copyright (C) 1992-2016, Free Software Foundation, Inc.
   gcc -c -I../../../support -gnatW8 c250002.adb
   gcc -c -I../../../support -gnatW8 c250002_0.ads
   End of compilation
   gnatmake: "c250002_?.adb" not found

I _suspect_ that the problem is down to the .ali file. macOS says

   $ file -I *
   c250002.adb:   text/plain; charset=utf-8
   c250002.ali:   text/plain; charset=unknown-8bit
   c250002.lst:   text/plain; charset=us-ascii
   c250002.o:     application/x-mach-binary; charset=binary
   c250002_0.ads: text/plain; charset=utf-8
   c250002_á.adb: text/plain; charset=utf-8
   c250002_á.ads: text/plain; charset=utf-8

(The last two were actually a-acute on the terminal.) The .ali file,
however, is inconsistent about whether the a-acute is represented as
C3A1 (good, assuming it gets interpreted as UTF-8 without a BOM) or
E3A1 (bad), particularly in the corresponding .ali file name.
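
One way to inspect a file for this kind of inconsistency is to list
every run of non-ASCII bytes and test whether it decodes as UTF-8. The
following is a rough sketch, not a GNAT tool: the function name
non_ascii_runs is hypothetical, and the sample bytes are made up to
mirror the two byte pairs mentioned above, not copied from the real
.ali file.

```python
import re


def non_ascii_runs(data: bytes):
    """Yield (offset, hex, is_valid_utf8) for each run of bytes >= 0x80."""
    for match in re.finditer(rb"[\x80-\xff]+", data):
        run = match.group()
        try:
            run.decode("utf-8")
            valid = True
        except UnicodeDecodeError:
            valid = False
        yield match.start(), run.hex().upper(), valid


# Made-up sample (an assumption for illustration only): one name with
# C3A1 and one with E3A1.
sample = b"c250002_\xc3\xa1.adb\nc250002_\xe3\xa1.ali\n"

for offset, hex_run, valid in non_ascii_runs(sample):
    print(offset, hex_run, "valid UTF-8" if valid else "NOT valid UTF-8")
```

To run it against a real file, read the file in binary mode, e.g.
open("c250002.ali", "rb").read(), and pass the result in place of
sample.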

Any thoughts? Is this a known issue?

(C250001, which has BOMs and UTF-8 identifiers but no UTF-8 file names,
works fine without any -gnatW8 messing.)