From: Jacob Sparre Andersen
Newsgroups: comp.lang.ada
Subject: Re: Community Input for the Maintenance and Revision of the Ada Programming Language
Date: Thu, 31 Aug 2017 16:09:04 +0200
Organization: JSA Research & Innovation
Message-ID: <87pobbakxr.fsf@jacob-sparre.dk>

Dmitry A. Kazakov wrote:

> Not really. E.g. parsing is done in octets for obvious reasons. That
> was the reason why UTF-8 was designed this way.

What obvious reasons?  Performance?

As I see it, there is nothing wrong with reading a sequence of octets
containing a UTF-8 encoded string, mapping it to the internal encoding,
and *then* parsing the text.  It may be slightly slower, but it allows
for modularity, so you can easily swap in a decoder for a different
encoding.  (A small sketch of what I mean follows below.)

> What is the use of a string type without literals?

The point is that you shouldn't ever treat an encoded string as a
string.  If you need to treat it as a string, you map it to
Standard.String, and do what you have to do.

> It is all about having an ability to choose a representation
> (encoding) rather than getting it enforced upon you by the
> language.

The whole point is that enforcing a single internal representation
simplifies things.  Encoding of characters is purely an
interfacing/serialization issue.  It isn't something the programmer
should have to worry about when not interfacing.

> It is no solution if you simply create yet another type with
> the required representation losing the original type's interface and
> forced to convert forth and back between two types all over the
> place.

Not all over the place.  Only where you need to (de)serialize the
strings.

> Many, if not most, applications never care about code points.

They usually do.  They just tend to call them "characters".

Greetings,

Jacob
-- 
"Friends don't let friends program in C++." -- Ludovic Brenta.
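P.S.  A minimal sketch of the decode-then-parse approach described
above, using the standard Ada 2012 packages in the
Ada.Strings.UTF_Encoding family.  The procedure name and the sample
octets are just illustrative; a real program would get its octets from
a file or a socket.

   with Ada.Strings.UTF_Encoding;
   with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
   with Ada.Text_IO;

   procedure Decode_Then_Parse is
      use Ada.Strings.UTF_Encoding;

      --  Hypothetical input: the octets as they might arrive from a
      --  file or a socket.  Five octets, four code points: the first
      --  two octets encode the single letter U+00E6 (Danish "ae").
      Octets : constant UTF_8_String :=
        Character'Val (16#C3#) & Character'Val (16#A6#) & "ble";

      --  Map to the internal representation once, at the boundary:
      --  one Wide_Wide_Character per code point.
      Text : constant Wide_Wide_String :=
        Wide_Wide_Strings.Decode (Octets);
   begin
      --  Any "parsing" now works on code points, not on octets.
      Ada.Text_IO.Put_Line
        ("Octets:"       & Natural'Image (Octets'Length) &
         ", characters:" & Natural'Image (Text'Length) &
         ", starts with U+00E6: " &
         Boolean'Image
           (Text (Text'First) = Wide_Wide_Character'Val (16#E6#)));
   end Decode_Then_Parse;

Swapping in a decoder for a different serialization only touches the
Decode call (the package also has a Decode variant taking a UTF_String
plus an Input_Scheme parameter); the parsing code stays the same.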