From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 09:55:07 +0200
Organization: cbb software GmbH
Message-ID: <1tggwi1yicf5z.1q3xra9r00oyb$.dlg@40tude.net>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de

On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote:

> I have a feeling you're fundamentally confused about what UTF-8 is, as
> compared to "Latin-1". Latin-1 is a character mapping. It defines,
> for all integers in the range 0..255, what character that integer
> represents (e.g. 77 represents 'M', etc.). Unicode is a character
> mapping that defines characters for a much larger integer range.

No, Unicode is a standard that describes character mappings. Both UTF-8
and Latin-1 are encodings.
Latin-1 as an encoding has the property that octets correspond 1-1 to
code points, at the cost that most code points cannot be represented by
the encoding. UTF-8 lacks this property, but is capable of representing
all code points.

> Because of this, it is not feasible to work with strings or characters
> in UTF-8 encoding. Suppose you declare a string
>
>    S : String (1 .. 100);
>
> but you want it to be a UTF-8 string. How would that work? If you
> want to look at S(50), the computer would have to start at the
> beginning of the string and figure out whether each character is
> represented as 1 or 2 bytes. Nobody wants that.

Nobody actually cares, because strings are not processed that way.
String indices are obtained in the course of operations which keep them
at the beginnings of properly encoded code points. It is a language
problem to distinguish index (some index type) and position (cardinal
number). Ada does this BTW.

When you write S(50), what is 50 here? The 50th character (code point)
counting from the beginning of the string, or the index 50 of a
character whose position is unknown without looking into the string?
Considering the declaration of String, it is not clear whether Positive
is a position or a proper index. In the latter case S(50) just does not
read as "50th character". Furthermore, it is not guaranteed that if 50
is a valid index then 51 is valid too.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
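
P.S. A minimal sketch of the index-versus-position distinction above,
assuming the String holds well-formed UTF-8 octets. The names
UTF8_Index_Demo and Next are hypothetical, made up for this
illustration; they are not taken from any library. The loop never
computes "the Nth character" from scratch: each index is obtained from
the previous one, so it always lands on the first octet of a code point.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure UTF8_Index_Demo is
   --  "naive" with i-diaeresis (U+00EF), which UTF-8 encodes as the
   --  two octets C3 AF.  The string has 5 code points but 6 octets.
   S : constant String :=
     "na" & Character'Val (16#C3#) & Character'Val (16#AF#) & "ve";

   --  Hypothetical helper: index of the code point that follows the
   --  one starting at I.  Assumes S is well-formed UTF-8 and that I
   --  is the index of a lead octet (no validation is done here).
   function Next (S : String; I : Positive) return Positive is
      Lead : constant Natural := Character'Pos (S (I));
   begin
      if Lead < 16#80# then
         return I + 1;   --  1-octet sequence
      elsif Lead < 16#E0# then
         return I + 2;   --  2-octet sequence
      elsif Lead < 16#F0# then
         return I + 3;   --  3-octet sequence
      else
         return I + 4;   --  4-octet sequence
      end if;
   end Next;

   I        : Positive := S'First;  --  index into the octet array
   Position : Positive := 1;        --  ordinal number of the code point
begin
   while I <= S'Last loop
      Put_Line ("position" & Positive'Image (Position)
                & " starts at index" & Positive'Image (I));
      I := Next (S, I);
      Position := Position + 1;
   end loop;
end UTF8_Index_Demo;
```

Here positions 1 to 5 start at indices 1, 2, 3, 5 and 6. Index 3 is
valid but index 4 is not, since it falls in the middle of an encoded
code point.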