Newsgroups: comp.lang.ada
Subject: Re: Strange crash on custom iterator
From: Shark8
Date: Wed, 4 Jul 2018 12:53:19 -0700 (PDT)

On Wednesday, July 4, 2018 at 12:57:40 PM UTC-6, Dmitry A.
Kazakov wrote:
> On 2018-07-04 20:01, Shark8 wrote:
> > On Wednesday, July 4, 2018 at 10:55:08 AM UTC-6, Dan'l Miller wrote:
>
> > As an example, the sentence "The Hebrew word for 'man' is 'אדם' (Adam)." is *NOT* merely a sequence of graphemes, codepoints, and/or bytes. It is a semantically meaningful text consisting of multiple languages... and *this* is what Unicode discards.
>
> And rightly so. Like 91093835.6 is just a number instead "meaningful":
> the mass of a stationary electron.
>
> One fundamental principle of software design is abstraction in the sense
> of throwing away unnecessary information. A printer may know nothing
> about Hebrew.

Interesting how you're ready, willing and able to conflate all portions of data-storage/-management into a single operation: printing.

But let's take a step backward; what about displaying the text? One certainly could argue that Unicode is a good solution in this arena; after all, having the ability to encode all of human language is its stated design goal, so surely it must be well-suited to that, right? Not really.

The mere fact of "combining characters" makes Unicode no more suited to textual display than a sort of hypothetical Forth/PostScript where each word/token/character is processed by the display driver and rendered appropriately. (The aforementioned Lisp-like structure being executed is the procedure: "paint/display the English text, switch-to-Hebrew, print/display Hebrew text, switch-to-English, print/display English text", which of course can be further decomposed to "print 'T' [horizontal stroke, vertical stroke] print 'h' [vertical stroke, curved stroke, vertical stroke] print 'e' [horizontal stroke, curved stroke] ....")

This is the essential idea behind PostScript printers, and it works well.
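To make the two points above concrete, here is a minimal Python sketch. The first half shows that the same visible character can be one code point or two, so a bare code point sequence still needs processing before display. The second half keeps the example sentence as language-tagged runs; the `runs` layout and the helpers `flatten` and `languages` are assumptions made up for illustration, not anything from Unicode or an existing library.

```python
import unicodedata

# Combining characters: one visible character, two possible encodings.
precomposed = "\u00e9"   # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings differ as code point sequences until they are normalized.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# Hypothetical structured form: each run carries its language tag.
runs = [
    ("en", "The Hebrew word for 'man' is '"),
    ("he", "\u05d0\u05d3\u05dd"),
    ("en", "' (Adam)."),
]

def flatten(tagged_runs):
    """Discard the language tags, leaving only the flat code point sequence."""
    return "".join(text for _lang, text in tagged_runs)

def languages(tagged_runs):
    """Recover the language switches, which the flat string cannot provide."""
    return [lang for lang, _text in tagged_runs]
```

Going from `runs` to the flat string is trivial; going back requires guessing, which is the information loss being argued about.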
(The same/analogous procedure must be executed in SW and transmitted to the printer in non-PostScript printers, usually using some proprietary printer-control-language, which is essentially what printer-drivers *ARE*.)

So, even working backward from your example of printing, your claim that "knowledge of Hebrew is unneeded" is... well, dubious. It's certainly needed somewhere along the line for this example. My contention is that "sequence of codepoints + font" is flatly stupid for a multi-language system.

Arguably it's stupid for a single-language system, too. As an example we could use paths: "root\projects\x\source" is flatly moronic*, and we can see this by how it pops up in multi-platform development: should there be a terminal '\'? Do those '\' characters need to be escaped? Do they need to be replaced with '/'? What we have is a sequence (root, projects, x, source) which corresponds to a path down a tree, but the common "industry practice" is to think of this as a string of characters and apply "parse/reparse/regex/reparse/whatever" textual manipulations to read what the structure is, rather than sensibly saving the structural information.

* forced upon us by stupid, thin APIs to the OS.
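As a rough illustration of the path argument, the sketch below keeps the path as a sequence of components and treats the textual form as a rendering step. It uses Python's standard `pathlib`; the variable names are chosen here for illustration only.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The path as structure: an ordered sequence of components down a tree.
components = ("root", "projects", "x", "source")

# Rendering to text is a separate, platform-specific step.
posix_text = str(PurePosixPath(*components))      # root/projects/x/source
windows_text = str(PureWindowsPath(*components))  # root\projects\x\source

# Textual variants that differ as strings but name the same structure.
with_trailing_separator = PureWindowsPath(windows_text + "\\")
with_forward_slashes = PureWindowsPath("root/projects/x/source")
```

All of these expose the same `.parts` tuple, so the questions about a terminal separator, escaping, and '/' versus '\' only arise once the structure has been flattened into a string.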