From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: What about a glob standard method in Ada.Command_Line ?
Date: Wed, 25 Aug 2010 10:57:45 +0200
Message-ID: <4c74db09$0$6890$9b4e6d93@newsspool2.arcor-online.net>

On 8/25/10 9:55 AM, Dmitry A. Kazakov wrote:

>>> Does the wildcard pattern "R*"
>>
>> In what RE syntax?
>
> It is a wildcard pattern.
> Wildcards is the most frequently used pattern
> language.

Does "wildcard" include both the Latin-xyz character ü and the UTF-8 ü?
Yes. Many wildcard matchers do, and it can be handled.
See whether or not the encoding matters in the following program.

with GNAT.SPITBOL.Patterns; use GNAT.SPITBOL.Patterns;
with Ada.Characters.Latin_1; use Ada.Characters.Latin_1;

procedure Find_Ruecken (Text : String; Result : VString_Var) is
   --  ü encoded in UTF-8: two octets
   In_UTF_8 : constant String :=
     (Character'Val (16#c3#), Character'Val (16#bc#));
   Ue : Pattern;
begin
   --  "R" or "r", then ü in either UTF-8 or Latin-1, then "cken";
   --  "**" (immediate assignment) stores the matched substring in Result
   Ue := (Any ("Rr") & (In_UTF_8 or LC_U_Diaeresis) & "cken") ** Result;

   --  Unanchored match: succeeds if the pattern occurs anywhere in Text
   if not Match (Text, Ue) then
      raise Constraint_Error;
   end if;
end Find_Ruecken;

with GNAT.SPITBOL; use GNAT.SPITBOL;
with Ada.Text_IO;
with Find_Ruecken;

procedure Test_Find_Ruecken is
   Found : VString;
begin
   --  The octets of ü in this literal depend on how the source file is encoded
   Find_Ruecken (Text   => "Recken, die Rücken ohne Rückgrat drücken",
                 Result => Found);
   Ada.Text_IO.Put_Line ("Found """ & S (Found) & '"');
end Test_Find_Ruecken;

>>> match "readme"? Does it match "Rücken", when
>>> ü is (16#c3#, 16#bc#) (UTF-8)?
>>
>> When the Pattern_Type is properly defined, there are no questions.
>
> How do define it properly? Does it match Latin-1's ü, UTF-8's ü, UTF-16's
> ü, UTF-32's ü? Don't you get that it cannot be done without abstracting
> *encoding* away?

When the Pattern_Type is properly defined, there are no questions.

Since I have to process a lot of text files and text streams of unknown
encoding, I'm used to REs that just find "Rücken" in whatever encoding.
That's called programming. Think of Google or Yahoo or Bing searching
the WWW and tons of email ...

There is no such thing as clean external data. That includes file names.

Georg
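A minimal sketch of the same "one alternative per encoding" idea without GNAT.SPITBOL, extended to UTF-16. It assumes an Ada 2012 compiler (Ada.Strings.UTF_Encoding was not part of the language in 2010) and covers only Latin-1, UTF-8, UTF-16BE and UTF-16LE; UTF-32 is not handled by that package. The names Find_In_Any_Encoding, To_Latin_1 and Contains_Word are hypothetical. The search runs over raw octets, so a UTF-16 needle could in principle match at an odd byte offset.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.UTF_Encoding;              use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings; use Ada.Strings.UTF_Encoding.Wide_Strings;

--  Hypothetical example program, not from the original post.
procedure Find_In_Any_Encoding is

   --  The word as abstract characters; U+00FC is LATIN SMALL LETTER U WITH DIAERESIS
   Word : constant Wide_String := "R" & Wide_Character'Val (16#FC#) & "cken";

   --  Test inputs: the same word stored as raw octets in different encodings
   Latin_1_Text : constant String := "die R" & Character'Val (16#FC#) & "cken";
   UTF_8_Text   : constant String :=
     "die R" & Character'Val (16#C3#) & Character'Val (16#BC#) & "cken";
   UTF_16_Text  : constant String := Encode ("die " & Word, UTF_16BE);
   ASCII_Text   : constant String := "die Recken";

   --  Latin-1 needle: every character of Word is below 256, so one octet each
   function To_Latin_1 (W : Wide_String) return String is
      Result : String (W'Range);
   begin
      for I in W'Range loop
         Result (I) := Character'Val (Wide_Character'Pos (W (I)));
      end loop;
      return Result;
   end To_Latin_1;

   --  True if Bytes contains Word in any of the encodings tried below
   function Contains_Word (Bytes : String) return Boolean is
      use Ada.Strings.Fixed;
   begin
      return Index (Bytes, To_Latin_1 (Word)) > 0
        or else Index (Bytes, Encode (Word, UTF_8)) > 0
        or else Index (Bytes, Encode (Word, UTF_16BE)) > 0
        or else Index (Bytes, Encode (Word, UTF_16LE)) > 0;
   end Contains_Word;

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Contains_Word (Latin_1_Text)));
   Ada.Text_IO.Put_Line (Boolean'Image (Contains_Word (UTF_8_Text)));
   Ada.Text_IO.Put_Line (Boolean'Image (Contains_Word (UTF_16_Text)));
   Ada.Text_IO.Put_Line (Boolean'Image (Contains_Word (ASCII_Text)));
end Find_In_Any_Encoding;
```

Built as a main program, it should print TRUE three times and then FALSE: the plain "Recken" matches none of the encoded needles. Encoding the needle once per scheme keeps the search itself a plain octet search, in the spirit of the SPITBOL alternation above.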