From: 19.krause.70@googlemail.com
Newsgroups: comp.lang.ada
Subject: gnat_regpat and unexpected handling of alnum and unicode needed
Date: Sun, 17 Feb 2019 03:24:06 -0800 (PST)
Message-ID: <69abfba5-0dae-493a-b39c-91fcf7be8c75@googlegroups.com>

Hello All,

I was struggling with a behavior in gnat_regpat that differs from the behavior of egrep, for example, and from what I expect. The expression [[:alnum:]] matches the underscore in gnat_regpat but not in egrep. It feels much more natural not to match the underscore, as egrep does, and I think that is more POSIX compliant. The question is: why?
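To isolate the first observation, here is a minimal sketch that tests a lone underscore against [[:alnum:]]. The procedure name Alnum_Underscore_Check is only an illustrative assumption; the Match call is the same Boolean function used in the sample code of this post.

```ada
with Ada.Text_IO;
with GNAT.Regpat;

--  Hypothetical stand-alone check (name chosen for illustration only):
--  does the POSIX class [[:alnum:]] accept a single underscore?
procedure Alnum_Underscore_Check is
   Pattern : constant String := "^[[:alnum:]]$";
   Sample  : constant String := "_";
begin
   --  The Boolean form of Match compiles the expression and reports
   --  whether Sample matches it.
   if GNAT.Regpat.Match (Expression => Pattern, Data => Sample) then
      Ada.Text_IO.Put_Line ("[[:alnum:]] matches the underscore");
   else
      Ada.Text_IO.Put_Line ("[[:alnum:]] does not match the underscore");
   end if;
end Alnum_Underscore_Check;
```

Running the same pattern against the same one-character input in egrep gives a direct comparison of the two engines.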
Now I could simply use [[:alpha:][0-9]]+ instead, but that brings me to my second question: how do I handle Unicode strings with gnat_regpat? [[:alpha:]] seems to match only ASCII a-zA-Z.

Some sample code (saved as UTF-8 text, compiled with -gnatW8):

    with Ada.Text_IO;
    with Gnat.Regpat;

    procedure gnat_regpat_test is
       test1   : constant String := "foo_bar";
       test2   : constant String := "fööbär";
       regexp1 : constant String := "^[[:alnum:]]+$";
       regexp2 : constant String := "^[[:alpha:][0-9]]+$";
    begin
       if Gnat.Regpat.Match(Expression => regexp1, Data => test1) then
          Ada.Text_IO.Put_Line(test1 & " Matched regexp1 " & regexp1 & "!");
       else
          Ada.Text_IO.Put_Line(test1 & " doesn't Match regexp1 " & regexp1);
       end if;

       if Gnat.Regpat.Match(Expression => regexp2, Data => test1) then
          Ada.Text_IO.Put_Line(test1 & " Matched regexp2 " & regexp2 & "!");
       else
          Ada.Text_IO.Put_Line(test1 & " doesn't Match regexp2 " & regexp2);
       end if;

       if Gnat.Regpat.Match(Expression => regexp1, Data => test2) then
          Ada.Text_IO.Put_Line(test2 & " Matched regexp1 " & regexp1 & "!");
       else
          Ada.Text_IO.Put_Line(test2 & " doesn't Match regexp1 " & regexp1);
       end if;

       if Gnat.Regpat.Match(Expression => regexp2, Data => test2) then
          Ada.Text_IO.Put_Line(test2 & " Matched regexp2 " & regexp2 & "!");
       else
          Ada.Text_IO.Put_Line(test2 & " doesn't Match regexp2 " & regexp2);
       end if;
    end gnat_regpat_test;

Best Regards,
Hubert