Newsgroups: comp.lang.ada
Subject: Re: When to use Bounded_String?
From: vincent.diemunsch@gmail.com
Date: Thu, 28 Dec 2017 06:28:28 -0800 (PST)
Message-ID: <37c30172-9386-45fb-86d0-a10998fcade8@googlegroups.com>
References: <0cc30dc8-4528-4e5c-91dd-24dfbe3cbcb2@googlegroups.com> <96764e4c-48df-4042-845e-12341149bc87@googlegroups.com>

On Thursday, 28 December 2017 at 13:00:46 UTC+1, Dmitry A. Kazakov wrote:
> > Yes, they are really a great improvement. But they would be perfect if :
> > 1. they handled UTF-8 as the de-facto standard encoding, for strings.
>
> You can ignore encoding and use them as if they were UTF-8
>

Sure. That's what is done, at least on Unixes (Linux and OSX).

> > 2. they could see strings as sequences of 32-bits Unicode Code Points (Wide_Wide_Characters).
>
> 23 / 4 = 5 characters

No. You get at least 5 characters, if they are all very complicated ones (4 bytes each in UTF-8), but up to 23 ASCII characters.
The idea here is to decode the UTF-8 string to extract one character at a time and return it as a Unicode code point in the most common integer format: 32 bits.

The only limitation is that you would have sequential access to the string, not random access as with the usual array of characters. But I really don't see the point of having random access to the characters in a string!

> P.S. Just never copy strings if you have performance concerns (even if
> you have none). Nothing to optimize then. Use string slices, pass string
> + an index to start at, do everything in a single pass, there is no
> reason to waste CPU time, memory and brain cells on "tokenizing".

True. Except for storing the identifiers in a symbol table...

Kind regards,

Vincent
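
P.S. A minimal sketch of the sequential decoding described above, written in Python rather than Ada purely for brevity. The name `code_points` is hypothetical and is not taken from any library discussed in this thread. It walks the bytes once, front to back, and yields each character as an integer code point. It assumes well-formed input and does not reject overlong encodings or surrogates, so treat it as an illustration rather than a validating decoder.

```python
def code_points(data: bytes):
    """Yield Unicode code points from UTF-8 bytes, one at a time, front to back."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: ASCII, 1 byte
            length, value = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for b in data[i + 1 : i + length]:
            if b >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte after offset {i}")
            value = (value << 6) | (b & 0x3F)
        yield value
        i += length


# The same decoder handles both extremes: 1 byte or 4 bytes per character.
ascii_text = b"A" * 23                           # 23 bytes, all ASCII
emoji_text = "\U0001F600".encode("utf-8") * 5    # 20 bytes, 4 bytes per character
print(len(list(code_points(ascii_text))), len(list(code_points(emoji_text))))
```

Note that the caller never indexes by character position: it only asks for the next code point, which is the sequential access mentioned above.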