From mboxrd@z Thu Jan  1 00:00:00 1970
X-Received: by 2002:a6b:2306:: with SMTP id
 j6-v6mr1325647ioj.51.1530734000220;
        Wed, 04 Jul 2018 12:53:20 -0700 (PDT)
X-Received: by 2002:a54:4e94:: with SMTP id c20-v6mr658760oiy.5.1530734000026;
 Wed, 04 Jul 2018 12:53:20 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.linkpendium.com!news.linkpendium.com!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!u78-v6no1978141itb.0!news-out.google.com!l67-v6ni1905itl.0!nntp.google.com!u78-v6no1978138itb.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Wed, 4 Jul 2018 12:53:19 -0700 (PDT)
In-Reply-To: <phj5b0$b0n$1@gioia.aioe.org>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=76.113.16.86;
 posting-account=lJ3JNwoAAAAQfH3VV9vttJLkThaxtTfC
NNTP-Posting-Host: 76.113.16.86
References: <70c11a71-3832-4f57-8127-f3f1c48a052f@googlegroups.com>
 <887212304.552080112.848502.laguest-archeia.com@nntp.aioe.org>
 <87muvan83x.fsf@adaheads.home> <ly4lhiafs8.fsf@pushface.org>
 <1449870001.552246132.581310.laguest-archeia.com@nntp.aioe.org>
 <lyzhz98lvh.fsf@pushface.org>
 <b0d7482d-3c02-4e0b-8720-58ee5b65af03@googlegroups.com>
 <phg0h7$10dd$1@gioia.aioe.org>
 <c980d621-6d5d-4a23-8005-733bb024285d@googlegroups.com>
 <phg5nk$1a46$1@gioia.aioe.org> <phg6cg$1ba2$1@gioia.aioe.org>
 <bd52280b-662a-49b3-891d-e39044e2bf32@googlegroups.com>
 <phg8lo$1fnq$1@gioia.aioe.org>
 <phht7f$1vj2$1@gioia.aioe.org> <phhuei$1v6$1@gioia.aioe.org>
 <phi5hp$fbv$1@gioia.aioe.org> <phi5t5$g0q$1@gioia.aioe.org>
 <phib5i$pt9$1@gioia.aioe.org> <phii0m$1698$1@gioia.aioe.org>
 <d35454dc-f982-49d7-b727-45a9cc69822b@googlegroups.com>
 <5de5f768-40bf-4518-a647-22788658de74@googlegroups.com>
 <64454862-b293-4ed7-9c3e-c8a1252344db@googlegroups.com>
 <0ebf920a-61fa-47e8-a34f-54da2e143bb6@googlegroups.com>
 <phj5b0$b0n$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c95a071c-3555-4ac0-abb3-75a4904676ea@googlegroups.com>
Subject: Re: Strange crash on custom iterator
From: Shark8 <onewingedshark@gmail.com>
Injection-Date: Wed, 04 Jul 2018 19:53:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: reader02.eternal-september.org comp.lang.ada:53612
Date: 2018-07-04T12:53:19-07:00
List-Id: <comp.lang.ada>

On Wednesday, July 4, 2018 at 12:57:40 PM UTC-6, Dmitry A. Kazakov wrote:
> On 2018-07-04 20:01, Shark8 wrote:
> > On Wednesday, July 4, 2018 at 10:55:08 AM UTC-6, Dan'l Miller wrote:
>
> > As an example, the sentence "The Hebrew word for 'man' is 'אדם' (Adam)." is *NOT* merely a sequence of graphemes, codepoints, and/or bytes. It is a semantically meaningful text consisting of multiple languages... and *this* is what Unicode discards.
>
> And rightly so. Like 91093835.6 is just a number instead "meaningful":
> the mass of a stationary electron.
>
> One fundamental principle of software design is abstraction in the sense
> of throwing away unnecessary information. A printer may know nothing
> about Hebrew.

Interesting how you're ready, willing and able to conflate all portions of data-storage/-management into a single operation: printing.

But let's take a step backward; what about displaying the text? One certainly could argue that Unicode is a good solution in this arena; after all, having the ability to encode all of human language is its stated design goal, so surely it must be well-suited to that, right?

Not really.
The mere fact of "combining characters" makes Unicode no more suited to textual display than a sort of hypothetical Forth/PostScript where each word/token/character is processed by the display driver and rendered appropriately. (The aforementioned Lisp-like structure being executed is the procedure: "paint/display the English text, switch-to-Hebrew, print/display Hebrew text, switch-to-English, print/display English text", which of course can be further decomposed to "print 'T' [horizontal stroke, vertical stroke] print 'h' [vertical stroke, curved stroke, vertical stroke] print 'e' [horizontal stroke, curved stroke] ....")

This is the essential idea behind PostScript printers; and it works well. (The same/analogous procedure must be executed in SW and transmitted to the printer in non-PostScript printers; usually using some proprietary printer-control-language, which is essentially what printer-drivers *ARE*.)
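
To make the "switch-to-Hebrew, display, switch-to-English" procedure above concrete, here is a rough sketch in Python (chosen purely for illustration). The names Run and display_ops are made up for this post and come from no library; a real display pipeline would carry far more than a language tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A stretch of text that keeps its language instead of discarding it."""
    language: str  # hypothetical tag, e.g. "english" or "hebrew"
    text: str


def display_ops(runs):
    """Flatten language-tagged runs into a list of display instructions.

    A ("switch-to", language) instruction is emitted only when the
    language actually changes from one run to the next.
    """
    ops = []
    current = None
    for run in runs:
        if run.language != current:
            ops.append(("switch-to", run.language))
            current = run.language
        ops.append(("display", run.text))
    return ops


# The example sentence from earlier in the thread, stored as structure.
sentence = [
    Run("english", "The Hebrew word for 'man' is '"),
    Run("hebrew", "אדם"),
    Run("english", "' (Adam)."),
]

for op in display_ops(sentence):
    print(op)
```

The point of the sketch is only that the language switches are part of the stored data, so nothing downstream has to guess them back out of a flat sequence of codepoints.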

So, even working backward from your example of printing, your claim that "knowledge of Hebrew is unneeded" is... well, dubious. It's certainly needed somewhere along the line for this example. My contention is that "sequence of codepoints + font" is flatly stupid for a multi-language system.

Arguably it's stupid for a single-language system, too. As an example we could use paths: "root\projects\x\source" is flatly moronic*, and we can see this by how it pops up in multi-platform development: should there be a terminal '\'? Do those '\' characters need to be escaped? Do they need to be replaced with '/'? What we have is a sequence (root, projects, x, source) which corresponds to a path down a tree, but the common "industry practice" is to think of this as a string of characters and apply "parse/reparse/regex/reparse/whatever" textual manipulations to read what the structure is, rather than sensibly save the structural information.

* forced upon us by stupid, thin APIs to the OS.
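
For what it's worth, the path point can be sketched with Python's standard pathlib module (Python again only for illustration; the tuple name `components` is my own). The sequence is the data, and the separator question only arises at the moment a string is produced for a particular platform.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The structure itself: a path down a tree, stored as a sequence.
components = ("root", "projects", "x", "source")

# Rendering to text is a last step, done per platform.
as_windows = str(PureWindowsPath(*components))
as_posix = str(PurePosixPath(*components))
print(as_windows)  # root\projects\x\source
print(as_posix)    # root/projects/x/source

# Going the other way, several textual spellings (with or without a
# terminal '\', with '/' instead of '\') parse back to the same sequence.
spellings = [
    "root\\projects\\x\\source",
    "root\\projects\\x\\source\\",
    "root/projects/x/source",
]
for spelling in spellings:
    assert PureWindowsPath(spelling).parts == components
```

The pure path classes never touch the filesystem, so this runs the same way on any OS.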