From: "J-P. Rosen"
Newsgroups: comp.lang.ada
Subject: Re: Strange crash on custom iterator
Date: Wed, 4 Jul 2018 11:55:06 +0200
Organization: Adalog

On 04/07/2018 at 09:53, Dmitry A. Kazakov wrote:
> On 2018-07-04 09:33, J-P. Rosen wrote:
>> On 03/07/2018 at 18:36, Dmitry A. Kazakov wrote:
>>> E.g. UTF8_String and String must share interfaces but have
>>> different representations.
>> No. UTF_8 is useful only for IOs, as soon as you want to use a UTF
>> string, you need to convert it to a Wide_String.
>
> I cannot. Wide_String is UCS-2 which is not full Unicode.

For most purposes, Wide_String is sufficient, unless you really need to
support emojis or ancient Chinese. In those cases, decode to
Wide_Wide_String, no problem.
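
The distinction drawn above between Wide_String (16-bit elements) and Wide_Wide_String (32-bit elements) can be illustrated with a small sketch. It is written in Python rather than Ada, purely to show the encoding arithmetic; the helper names `utf8_length`, `utf16_units` and `fits_in_16_bits` are hypothetical, and the input is assumed to be valid UTF-8. It also shows why counting characters in UTF-8 octets means scanning the whole string.

```python
# Illustrative sketch only: compares octet, code-unit and code-point counts.

def utf8_length(octets: bytes) -> int:
    """Count code points in valid UTF-8 by scanning every octet (O(N))."""
    # Continuation octets look like 10xxxxxx; any other octet starts
    # a new code point.
    return sum(1 for b in octets if (b & 0xC0) != 0x80)

def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text needs in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def fits_in_16_bits(text: str) -> bool:
    """True if every code point is at most U+FFFF (one 16-bit element)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

samples = {"latin": "caf\u00e9", "emoji": "ok \U0001F600"}

for name, text in samples.items():
    octets = text.encode("utf-8")
    print(name,
          "octets:", len(octets),
          "code points:", utf8_length(octets),
          "16-bit units:", utf16_units(text),
          "fits in 16 bits:", fits_in_16_bits(text))
```

For the Latin sample, one 16-bit element per character is enough; the emoji lies above U+FFFF, so it needs either a surrogate pair in UTF-16 or a single 32-bit element, which is the case where decoding to Wide_Wide_String is the natural choice.
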
> Anyway, whatever conversion of representations needed it must be
> transparent to the user.
>
>> Why? Because even the simplest operation (Length, Indexing) are
>> O(N) and are mostly equivalent to decoding the whole string.
>
> Premature optimization, huh? And you still need UTF-8 string type
> even if you are going to convert it to something else. Back to the
> square one, how to design an UTF-8 string type?
>

Choosing a representation that allows a more efficient algorithm is
proper design, not premature optimization.

And the point is that when you receive a string, you don't know until
you look at the BOM (or apply some other recognition technique) whether
the octets you received are pure Latin-1 or UTF_8 encoded. So you need
to store it in a plain String.

We discussed that point, and the agreement was that making a different
type would force many conversions on the user that would bring nothing
but trouble, and make Ada once again look impractical out of excessive
purism.

-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr