From: Christos Chryssochoidis
Newsgroups: comp.lang.ada
Subject: Re: Range types
Date: Mon, 22 Oct 2007 14:14:44 +0300

Jacob Sparre Andersen wrote:
> Christos Chryssochoidis wrote:
>
>> I would like to define a subtype of Wide_Character for a program
>> that processes (unicode) text. This type would represent the Greek
>> letters.
>
> This sounds like what enumerated types are for.
> You could do it like this:
>
>    type Faroese_Letter is ('a', 'A', 'b', 'B', 'd', 'D', 'ð', 'Ð',
>                            'e', 'E', [...],
>                            'y', 'Y', 'ý', 'Ý', 'æ', 'Æ', 'ø', 'Ø');
>    -- optional representation clause
>
>    function To_Wide_Wide_Character (Item : in Faroese_Letter)
>      return Wide_Wide_Character;
>
>    function To_Faroese_Letter (Item : in Wide_Wide_Character)
>      return Faroese_Letter;
>
> The conversion functions could make use of representation clauses,
> "Image" and "Value" functions, or tables.
>
>> Greek letters in Unicode, with all their diacritics, are
>> located in two separate ranges: 0370 - 03D7 and 1F00 - 1FFF. That's
>> 360 characters to write in an enumeration... Since gaps are not
>> allowed in ranges, I'm thinking, instead of defining such a type, of
>> defining a function that would accept a Wide_Character as argument and
>> return a boolean value indicating whether the given Wide_Character
>> falls in the ranges of the Greek characters.
>
> This could be done very simply using Ada.Strings.Maps.
>
> How you should do it depends strongly on what you actually need your
> Greek_Letter type for.
>
> Greetings,
>
> Jacob

Thanks! Ada.Strings.Wide_Maps seems very helpful for what I want to do.

Basically, I would like to write a program that loads a UTF-8 text file containing ancient Greek text (written with the full set of diacritic marks) into memory, strips every diacritic except those still used in today's "modern" Greek, and writes the result to a new file of the user's choosing. For this it would be nice if there were a regular-expression package for Ada.

If I succeed in that, I'd then like to do some natural language processing (NLP, that is, linguistic processing) with my program, but I don't know whether Ada is an appropriate language for NLP.
I've seen references on the web to NLP applications written in functional or logic programming languages, but not many written in imperative languages... (Sorry for getting off topic...)

Thanks very much,

Christos
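A minimal sketch of the Wide_Maps approach discussed above, assuming an Ada 95 (or later) compiler and only the standard packages Ada.Strings.Wide_Maps and Ada.Strings.Wide_Fixed. It shows both the range test (a Wide_Character_Set built from the two ranges given in the post, 104 + 256 = 360 code points) and diacritic stripping as a character-to-character Wide_Character_Mapping applied with Translate. The names Greek_Demo, W, Is_Greek, Strip, Strip_Map, Ancient and Modern are hypothetical, and the mapping table is deliberately tiny: it covers only a few alpha and epsilon forms. A real polytonic-to-monotonic converter would need the full table, plus word-level rules (such as dropping the accent on most monosyllables) that a one-to-one character mapping cannot express. Note also that the two ranges, as given, include Greek punctuation and some unassigned code points, so the set means "in the Greek blocks" rather than strictly "is a Greek letter".

```ada
with Ada.Strings.Wide_Maps;   use Ada.Strings.Wide_Maps;
with Ada.Strings.Wide_Fixed;
with Ada.Wide_Text_IO;

procedure Greek_Demo is

   --  Shorthand: Unicode code point -> Wide_Character.
   function W (Code : Natural) return Wide_Character is
   begin
      return Wide_Character'Val (Code);
   end W;

   --  The two ranges from the post: 16#0370#..16#03D7# (104 code points)
   --  and 16#1F00#..16#1FFF# (256 code points).
   --  The qualification is needed because To_Set is overloaded.
   Greek_Set : constant Wide_Character_Set :=
     To_Set (Wide_Character_Ranges'
               ((Low => W (16#0370#), High => W (16#03D7#)),
                (Low => W (16#1F00#), High => W (16#1FFF#))));

   function Is_Greek (C : Wide_Character) return Boolean is
   begin
      return Is_In (C, Greek_Set);
   end Is_Greek;

   --  Hypothetical, deliberately incomplete table: alpha and epsilon with
   --  a breathing (psili or dasia) lose the breathing, and an oxia on the
   --  same letter becomes a monotonic tonos.
   Strip_Map : constant Wide_Character_Mapping :=
     To_Mapping
       (From => (W (16#1F00#), W (16#1F01#), W (16#1F04#), W (16#1F05#),
                 W (16#1F10#), W (16#1F11#), W (16#1F14#), W (16#1F15#)),
        To   => (W (16#03B1#), W (16#03B1#), W (16#03AC#), W (16#03AC#),
                 W (16#03B5#), W (16#03B5#), W (16#03AD#), W (16#03AD#)));

   --  Characters not listed in Strip_Map are passed through unchanged.
   function Strip (S : Wide_String) return Wide_String is
   begin
      return Ada.Strings.Wide_Fixed.Translate (S, Strip_Map);
   end Strip;

   --  Polytonic spelling: alpha with psili and oxia, lambda, phi, alpha.
   Ancient : constant Wide_String :=
     (W (16#1F04#), W (16#03BB#), W (16#03C6#), W (16#03B1#));
   --  Monotonic spelling: alpha with tonos, lambda, phi, alpha.
   Modern  : constant Wide_String :=
     (W (16#03AC#), W (16#03BB#), W (16#03C6#), W (16#03B1#));

   use Ada.Wide_Text_IO;
begin
   --  Print only Booleans, so the output does not depend on the
   --  terminal's character encoding.
   Put_Line (Boolean'Wide_Image (Is_Greek (W (16#03B1#))));   --  TRUE
   Put_Line (Boolean'Wide_Image (Is_Greek ('A')));            --  FALSE
   Put_Line (Boolean'Wide_Image (Strip (Ancient) = Modern));  --  TRUE
end Greek_Demo;
```

For the file-handling part, GNAT (specifically, not the Ada standard) lets Ada.Wide_Text_IO read and write UTF-8 when a file is opened with Form => "WCEM=8". GNAT also ships GNAT.Regpat, a GNAT-specific regular-expression package, though it operates on String rather than Wide_String; for per-character replacement like this, a mapping is the simpler tool.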