comp.lang.ada
From: Simon Wright <simon@pushface.org>
Subject: Re: GNAT vs UTF-8 source file names
Date: Wed, 05 Jul 2017 10:47:39 +0100
Message-ID: <ly60f72p1g.fsf@pushface.org>
In-Reply-To: ojhspu$sb2$1@dont-email.me

"J-P. Rosen" <rosen@adalog.fr> writes:

> On 04/07/2017 at 15:57, Simon Wright wrote:
>> The reason for this apparently-bizarre message is[3] that macOS takes
>> the composed form (lowercase a acute) and converts it under the hood
>> to what HFS+ insists on, the fully decomposed form (lowercase a,
>> combining acute); thus the names are actually different even though
>> they _look_ the same.
> Apparently, they use NFD (Normalization Form D). Normalization forms
> are necessary to avoid a whole lot of problems, although Ada requires
> normalization form C (ARM 2.1 (4.1/3)), or more precisely, it is
> implementation defined if the text is not in NFC.

That reference actually specifies NFKC, which I suppose is close! GNAT
uses this if you either compile with -gnatW8 or the file begins with a
UTF-8 BOM.

The problems I've noted in this thread in the GNAT implementation are
two:

(1) On Windows and macOS (and possibly on VMS, not sure if that's
relevant any more) the file name corresponding to a unit name is
converted to lower case on the assumption that it's Latin-1, using
System.Case_Util.To_Lower:

   function To_Lower (A : Character) return Character is
      A_Val : constant Natural := Character'Pos (A);

   begin
      if A in 'A' .. 'Z'
        or else A_Val in 16#C0# .. 16#D6#
        or else A_Val in 16#D8# .. 16#DE#
      then
         return Character'Val (A_Val + 16#20#);
      else
         return A;
      end if;
   end To_Lower;

This is the problem that prevents use of extended characters in unit
names.
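
To make (1) concrete, here is a minimal sketch of what a byte-wise
Latin-1 To_Lower does to a UTF-8 encoded name. It copies the function
quoted above and feeds it the two bytes that encode lowercase a acute
(U+00E1) in UTF-8. The assumption that the name reaches To_Lower as raw
UTF-8 bytes is mine, and Lower_Demo, A_Acute and Hex are names invented
for the illustration; none of this is taken from the GNAT sources.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Lower_Demo is

   --  Same logic as the To_Lower quoted above (Latin-1 assumption).
   function To_Lower (A : Character) return Character is
      A_Val : constant Natural := Character'Pos (A);
   begin
      if A in 'A' .. 'Z'
        or else A_Val in 16#C0# .. 16#D6#
        or else A_Val in 16#D8# .. 16#DE#
      then
         return Character'Val (A_Val + 16#20#);
      else
         return A;
      end if;
   end To_Lower;

   --  UTF-8 encoding of U+00E1 (lowercase a acute): bytes C3 A1.
   --  Assumption: the name is handed to To_Lower one byte at a time.
   A_Acute : constant String :=
     Character'Val (16#C3#) & Character'Val (16#A1#);

   Hex : constant String := "0123456789ABCDEF";

begin
   --  Print each converted byte as two hex digits.
   for C of A_Acute loop
      declare
         L : constant Natural := Character'Pos (To_Lower (C));
      begin
         Put (Hex (L / 16 + 1) & Hex (L mod 16 + 1) & ' ');
      end;
   end loop;
   New_Line;
end Lower_Demo;
```

This prints "E3 A1": the lead byte C3 falls in the 16#C0# .. 16#D6#
range and is shifted to E3, so a character that was already lower case
no longer has its original encoding, and the resulting byte sequence
would not match the name on disk.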

(2) On macOS, the expected file name appears to be stored in NFC, but is
retrieved from the file system in NFD.

It seems this will only cause a problem if you compile the file with
-gnatwe on its own, not as part of the closure of another file. That's
weird; possibly the wildcard picks up the NFD representation, while
compiling as part of the closure uses the NFC representation in the ALI?
For example:

$ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake -c -f p*.ads -gnatwe
gcc -c -gnatwe páck3.ads
páck3.ads:1:10: warning: file name does not match unit name, should be "páck3.ads"
gnatmake: "páck3.ads" compilation error

(this message was copied from Terminal and pasted into Emacs, which
makes clear the difference between the two representations; previously
I've copied from Terminal and pasted into Safari/Bugzilla, which
produced identical glyphs).
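
The difference between the two look-alike names in that message can be
made visible at the byte level. The sketch below builds both spellings
of the file name by hand from their UTF-8 bytes, composed (NFC) and
decomposed (NFD), and compares them. It does no normalization itself;
Forms_Demo, NFC_Name and NFD_Name are names made up for the
illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Forms_Demo is

   --  NFC: a acute as the single code point U+00E1, UTF-8 bytes C3 A1.
   NFC_Name : constant String :=
     "p" & Character'Val (16#C3#) & Character'Val (16#A1#) & "ck3.ads";

   --  NFD: plain 'a' followed by U+0301 (combining acute accent),
   --  UTF-8 bytes 61 CC 81.
   NFD_Name : constant String :=
     "pa" & Character'Val (16#CC#) & Character'Val (16#81#) & "ck3.ads";

begin
   Put_Line ("NFC length:" & Natural'Image (NFC_Name'Length));
   Put_Line ("NFD length:" & Natural'Image (NFD_Name'Length));
   Put_Line ("Equal: " & Boolean'Image (NFC_Name = NFD_Name));
end Forms_Demo;
```

The two strings are 10 and 11 bytes long and compare unequal, so a
byte-wise comparison of the expected name against the name returned by
the file system fails even though both render as the same glyphs.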
