From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 2002:a24:52ca:: with SMTP id
 d193-v6mr1204912itb.48.1530811199414;
        Thu, 05 Jul 2018 10:19:59 -0700 (PDT)
X-Received: by 2002:aca:c744:: with SMTP id
 x65-v6mr1501552oif.2.1530811199243;
 Thu, 05 Jul 2018 10:19:59 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.uzoreto.com!newsreader5.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.am4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!u78-v6no2796439itb.0!news-out.google.com!l67-v6ni2881itl.0!nntp.google.com!u78-v6no2796436itb.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Thu, 5 Jul 2018 10:19:58 -0700 (PDT)
In-Reply-To: <1064fcce-1b1c-4672-bd4e-9eb93c3a0240@googlegroups.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=47.185.195.62;
 posting-account=zwxLlwoAAAChLBU7oraRzNDnqQYkYbpo
NNTP-Posting-Host: 47.185.195.62
References: <c980d621-6d5d-4a23-8005-733bb024285d@googlegroups.com>
 <phg5nk$1a46$1@gioia.aioe.org> <phg6cg$1ba2$1@gioia.aioe.org>
 <bd52280b-662a-49b3-891d-e39044e2bf32@googlegroups.com>
 <phg8lo$1fnq$1@gioia.aioe.org>
 <phht7f$1vj2$1@gioia.aioe.org> <phhuei$1v6$1@gioia.aioe.org>
 <phi5hp$fbv$1@gioia.aioe.org> <phi5t5$g0q$1@gioia.aioe.org>
 <phib5i$pt9$1@gioia.aioe.org> <phii0m$1698$1@gioia.aioe.org>
 <d35454dc-f982-49d7-b727-45a9cc69822b@googlegroups.com>
 <5de5f768-40bf-4518-a647-22788658de74@googlegroups.com>
 <64454862-b293-4ed7-9c3e-c8a1252344db@googlegroups.com>
 <0ebf920a-61fa-47e8-a34f-54da2e143bb6@googlegroups.com>
 <phj5b0$b0n$1@gioia.aioe.org>
 <c95a071c-3555-4ac0-abb3-75a4904676ea@googlegroups.com>
 <6af9d974-b2b4-4ab9-82e6-690ffaee2901@googlegroups.com>
 <ecee1aa4-330a-4b3c-9566-030c1c460b8b@googlegroups.com>
 <795161eb-b58c-4146-9721-9b553039868a@googlegroups.com>
 <f8872879-623f-4712-8600-9f6ce8360c24@googlegroups.com>
 <176034645.552448963.078419.laguest-archeia.com@nntp.aioe.org>
 <1064fcce-1b1c-4672-bd4e-9eb93c3a0240@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dea073e8-d8ad-4772-b649-04499ada9b76@googlegroups.com>
Subject: Re: Strange crash on custom iterator
From: "Dan'l Miller" <optikos@verizon.net>
Injection-Date: Thu, 05 Jul 2018 17:19:59 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 6142
X-Received-Body-CRC: 1771245760
Xref: reader02.eternal-september.org comp.lang.ada:53656
Date: 2018-07-05T10:19:58-07:00
List-Id: <comp.lang.ada>

On Thursday, July 5, 2018 at 11:47:33 AM UTC-5, Shark8 wrote:
> On Wednesday, July 4, 2018 at 8:07:56 PM UTC-6, Luke A. Guest wrote:
> > Shark8 wrote:
> >
> > >> Shark8, what would be the better solution for character-encoding itself?
> > >>  (not whole words)
> > >
> > > Whole-word isn't a terrible idea, per se. But the thrust I was getting at
> > > is the delineation between languages: with Unicode it's a sequence of
> > > codepoints, independent of the actual item (word, sentence, etc) other
> > > than [perhaps] graphic-presented. That the example is (Eng,Eng,Eng...Eng,
> > > Heb,Heb,Heb,Heb, Eng,Eng,Eng...) codepoints is not the problem, though
> > > related, because it discards all information in favor of (num, num, num,
> > > num, ...) rather than actually considering alternate languages: IMO,
> > > ("The Hebrew word for man" (quote ADAM) (quote "Adam") ".") is much
> > > better as 'text' because we're preserving structure: [ENGLISH [THIS
> > > SECTION HEBREW] ENGLISH].
> > >
> >
> > I don’t understand why you think Unicode should carry linguistic
> > information when all it has ever been designed to do is encode symbols
> > across all languages and their direction.
>
> I'm not saying that "Unicode should" do *anything* -- I'm saying Unicode solves *the wrong problem*.
>
> "Encoding symbols" ties everything to a stupidly primitive level, forcing everything to such lowest
> common denominator so as to apply "the unix way" processing to text: discard all structural information,
> all semantic information, and have "some tool" regenerate it later... just like "the unix way" discards
> type-information in favor of forcing ad-hoc parsing on unstructured-text at every step between its
> "small tools" connected together with 'pipes'.

At some level I could conceivably agree with you in principle that a strictly-linear sequence of unadorned symbols is too low-level in some designs to be useful.  For example, there was a time in the 1970s through early 1980s when Texas Instruments microprocessors excessively modeled a Turing machine's tapes (dual-tape model).  No one nowadays would think that a processor should be strictly & intentionally designed to overtly model a Turing machine directly right down to the linear streams/tapes of symbols.

Unicode/ISO10646 is asinine in its insistence on a sequence of •multiple• codepoints being the ••shortest possible•• representation of some individual letter in some natural language.  Programmers want one-letter-one-codepoint representation in all languages, not some Turing-machine tape to process sequentially and statefully, as Unicode demands even in its 32-bit UCS4 or UTF-32 representations.  Programmers don't want any “well, yeah but …” situations at all when they just finished executing the fully-normalize-all-the-codepoints-in-this-string subprogram (but that “well yeah but …” is the world we suffer in with Unicode/ISO10646 as currently defined).
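
To make that “well, yeah but …” concrete, here is a small sketch in Python (chosen only because its standard library ships the Unicode character database; the helper name nfc_codepoints is made up for this post).  It normalizes a string to NFC, the composed form, and lists the codepoints that remain.  An e plus combining acute collapses to one codepoint, but the Guarani letter g-with-tilde has no precomposed codepoint, so it stays two codepoints even after normalization.

```python
import unicodedata
from typing import List


def nfc_codepoints(text: str) -> List[str]:
    """Return the codepoints left after NFC normalization, as U+XXXX labels.

    Hypothetical helper for illustration; only unicodedata.normalize
    comes from the standard library.
    """
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# 'e' + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields one codepoint.
e_acute = nfc_codepoints("e\u0301")

# 'g' + COMBINING TILDE has no precomposed form, so NFC still yields two codepoints.
g_tilde = nfc_codepoints("g\u0303")

print(e_acute)
print(g_tilde)
```

So a loop that assumes one codepoint per letter after normalization is right for the first string and wrong for the second.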

But, Shark8, you seem to be criticizing something a little different than that.  In some alternate universe where Unicode or ISO10646 transpired entirely differently, what would Unicode-done-right* look like, especially w.r.t. Ada strings?  It seems that you are alluding to some sort of multiple-strand string or something like that (not merely allocating the billion nonBMP codepoints better so that we would have a one-letter-one-codepoint axiom).
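
For the sake of having something to shoot at, here is one possible reading of the [ENGLISH [THIS SECTION HEBREW] ENGLISH] structure quoted above, sketched in Python rather than Ada.  Everything in it (the Run type, the "en"/"he" tags, the flatten and runs_in helpers) is my own assumption for illustration, not something Shark8 proposed: text is a list of language-tagged runs instead of one flat sequence of codepoints.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """One stretch of text in a single language (hypothetical type)."""
    lang: str   # language tag such as "en" or "he" (assumed convention)
    text: str


def flatten(runs: List[Run]) -> str:
    """Discard the structure and return the flat codepoint sequence."""
    return "".join(run.text for run in runs)


def runs_in(runs: List[Run], lang: str) -> List[str]:
    """Return the text of every run tagged with the given language."""
    return [run.text for run in runs if run.lang == lang]


# The example sentence from the quoted text, with the Hebrew word as its own run.
sentence = [
    Run("en", "The Hebrew word for man, "),
    Run("he", "\u05d0\u05d3\u05dd"),
    Run("en", ' ("Adam").'),
]

print(flatten(sentence))
print(runs_in(sentence, "he"))
```

Flattening is always possible and loses the tags; going the other way requires guessing the language boundaries back from the codepoints, which seems to be the complaint.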

* Yeah, I know, in Unicode done right, there wouldn't be any Unicode or ISO10646 at all, but what would there be instead and what would the strawman look like at all in Ada?