comp.lang.ada
From: Simon Wright <simon@pushface.org>
Subject: Re: gnat_regpat and unexpected handling of alnum and unicode needed
Date: Sun, 17 Feb 2019 12:50:20 +0000
Message-ID: <lytvh2o9yr.fsf@pushface.org>
In-Reply-To: 69abfba5-0dae-493a-b39c-91fcf7be8c75@googlegroups.com

19.krause.70@googlemail.com writes:

> The expression [[:alnum:]] matches the underscore in gnat_regpat but
> not in egrep. It feels much more natural to don't match the underscore
> like egrep does. And I think it is more posix compliant.
>
> Question is why?

Because, at s-regpat.adb:2325, we find

   function Is_Alnum (C : Character) return Boolean is
   begin
      return Is_Alphanumeric (C) or else C = '_';
   end Is_Alnum;

(Is_Alphanumeric is in Ada.Characters.Handling), presumably because the
author liked using underscores in identifiers.
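
If the egrep behaviour is wanted, one workaround is to spell the class
out as a bracket expression instead of using [[:alnum:]]. The following
is only a minimal sketch: the procedure name Alnum_Demo and the sample
string are made up for illustration, and it assumes the Boolean-returning
GNAT.Regpat.Match (Expression, Data) overload.

```ada
with Ada.Text_IO;
with GNAT.Regpat;

procedure Alnum_Demo is
   --  Hypothetical sample input, chosen because it contains an underscore.
   Sample : constant String := "foo_bar";

   --  Print whether Pattern matches Sample.
   procedure Report (Pattern : String) is
      Matched : constant Boolean := GNAT.Regpat.Match (Pattern, Sample);
   begin
      Ada.Text_IO.Put_Line
        (Pattern & " on " & Sample & ": " & Boolean'Image (Matched));
   end Report;
begin
   --  GNAT's [[:alnum:]] goes through Is_Alnum, which accepts '_'.
   Report ("^[[:alnum:]]+$");

   --  An explicit ASCII range leaves the underscore out.
   Report ("^[A-Za-z0-9]+$");
end Alnum_Demo;
```

Note that the explicit range covers ASCII letters and digits only, so it
is narrower than Is_Alphanumeric for Latin-1 letters.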

> How do I handle unicode strings with gnat_regpat, because [[:alpha:]]
> seems to match only ascii a-zA-Z.

What GNAT does with -gnatW8 is to read UTF-8 from the source file and,
in the case of characters, convert them to the internal Latin-1
(approximately) character. So your 'ö' is converted to the single
character with value 246, LC_O_Diaeresis.
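
As a quick check of that single-character view, here is a small sketch
(the procedure name Latin1_Demo is made up) that looks at
LC_O_Diaeresis through Ada.Characters.Handling. It uses the Latin_1
constant rather than a literal 'ö', so the source file stays pure ASCII
and the result does not depend on -gnatW8.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Characters.Latin_1;

procedure Latin1_Demo is
   --  The Latin-1 character that 'ö' is mapped to.
   C : constant Character := Ada.Characters.Latin_1.LC_O_Diaeresis;
begin
   --  Code position of the character within type Character.
   Ada.Text_IO.Put_Line
     ("Position:" & Integer'Image (Character'Pos (C)));

   --  Is_Letter covers the Latin-1 letters, not just a-z and A-Z.
   Ada.Text_IO.Put_Line
     ("Is_Letter: "
      & Boolean'Image (Ada.Characters.Handling.Is_Letter (C)));
end Latin1_Demo;
```

This only shows what the standard library says about the character; it
does not by itself say how GNAT.Regpat classifies it.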

I tried just letters, and got

   fööbär Matched regexp3 ^[[:alpha:]]+$!

No idea what's going on here!

