From: Simon Wright
Newsgroups: comp.lang.ada
Subject: Re: gnat_regpat and unexpected handling of alnum and unicode needed
Date: Sun, 17 Feb 2019 12:50:20 +0000

19.krause.70@googlemail.com writes:

> The expression [[:alnum:]] matches the underscore in gnat_regpat but
> not in egrep. It feels much more natural to don't match the underscore
> like egrep does. And I think it is more posix compliant.
>
> Question is why?

Because, at s-regpat.adb:2325, we find

   function Is_Alnum (C : Character) return Boolean is
   begin
      return Is_Alphanumeric (C) or else C = '_';
   end Is_Alnum;

(Is_Alphanumeric is in Ada.Characters.Handling), presumably because the
author liked using underscores in identifiers.

> How do I handle unicode strings with gnat_regpat, because [[:alpha:]]
> seems to match only ascii a-zA-Z.

What GNAT does with -gnatW8 is to read UTF-8 from the source file and,
in the case of characters, convert them to the internal Latin-1
(approximately) character.
So your 'ö' is converted to the single character with value 246,
LC_O_Diaeresis.

I tried just letters, and got

   fööbär
   Matched regexp3 ^[[:alpha:]]+$!

No idea what's going on here!
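
A small sketch that may help narrow this down, using only the predicates in Ada.Characters.Handling. It checks how the underscore and character 246 are classified. Posix_Is_Alnum is a hypothetical helper, not part of GNAT, showing the predicate without the underscore. Whether [[:alpha:]] in s-regpat.adb is built on Is_Letter in the way Is_Alnum is built on Is_Alphanumeric is an assumption, not something verified here.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Characters.Latin_1;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Latin1_Classes is

   --  Hypothetical helper (not in GNAT): alnum as egrep treats it,
   --  i.e. Is_Alphanumeric alone, with no special case for '_'.
   function Posix_Is_Alnum (C : Character) return Boolean is
   begin
      return Is_Alphanumeric (C);
   end Posix_Is_Alnum;

   --  The Latin-1 character that 'ö' becomes after -gnatW8 conversion.
   O_Uml : constant Character := Ada.Characters.Latin_1.LC_O_Diaeresis;

begin
   --  The underscore is not alphanumeric as far as the standard
   --  library is concerned; only the "or else" in Is_Alnum admits it.
   Put_Line ("Posix_Is_Alnum ('_') : "
             & Boolean'Image (Posix_Is_Alnum ('_')));

   --  Position of the converted character in Latin-1.
   Put_Line ("Character'Pos (ö)    :"
             & Integer'Image (Character'Pos (O_Uml)));

   --  Is_Letter is defined over all of Latin-1, not just a-z and A-Z.
   Put_Line ("Is_Letter (ö)        : "
             & Boolean'Image (Is_Letter (O_Uml)));
end Latin1_Classes;
```

Built with gnatmake, this should report FALSE for the underscore, 246 for the position, and TRUE for Is_Letter, since the Reference Manual defines Is_Letter over the Latin-1 ranges as well as a-z and A-Z. If regpat's alpha class does rest on Is_Letter, a Latin-1 'ö' matching ^[[:alpha:]]+$ would be expected, whereas a string still holding raw UTF-8 bytes would not match.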