From: "Dan'l Miller"
Newsgroups: comp.lang.ada
Subject: Re: Strange crash on custom iterator
Date: Thu, 5 Jul 2018 10:19:58 -0700 (PDT)
In-Reply-To: <1064fcce-1b1c-4672-bd4e-9eb93c3a0240@googlegroups.com>

On Thursday, July 5, 2018 at 11:47:33 AM UTC-5, Shark8 wrote:
> On Wednesday, July 4, 2018 at 8:07:56 PM UTC-6, Luke A. Guest wrote:
> > Shark8 wrote:
> >
> > >> Shark8, what would be the better solution for character-encoding itself?
> > >> (not whole words)
> > >
> > > Whole-word isn't a terrible idea, per se. But the thrust I was getting at
> > > is the delination between languages: with Unicode it's a sequence of
> > > codepoints, independent of the actual item (word, sentence, etc) other
> > > than [perhaps] graphic-presented. That the example is (Eng,Eng,Eng...Eng,
> > > Heb,Heb,Heb,Heb, Eng,Eng,Eng...) codepoints is not the problem, though
> > > related, because it discards all information in favor of (num, num, num,
> > > num, ...) rather than actually considering alternate languages: IMO,
> > > ("The Hebrew word for man" (quote ADAM) (quote "Adam") ".") is much
> > > better as 'text' because we're preserving structure: [ENGLISH [THIS
> > > SECTION HEBREW] ENGLISH].
> > >
> >
> > I don’t understand why you think Unicode should carry linguistic
> > information when all it has ever been designed to do is encode symbols
> > across all languages and their direction.
>
> I'm not saying that "Unicode should" do *anything* -- I'm saying Unicode solves *the wrong problem*.
>
> "Encoding symbols" ties everything to a stupidly primitive level, forcing everything to such lowest
> common denominator so as to apply "the unix way" processing to text: discard all structural information,
> all semantic information, and have "some tool" regenerate it later... just like "the unix way" discards
> type-information in favor of forcing ad-hoc parsing on unstructured-text at every step between it's
> "small tools" connected together with 'pipes'.

At some level I could conceivably agree with you in principle that a strictly-linear sequence of unadorned symbols is too low-level in some designs to be useful. For example, there was a time in the 1970s through early 1980s when Texas Instruments microprocessors excessively modeled a Turing machine's tapes (dual-tape model). No one nowadays would think that a processor should be strictly & intentionally designed to overtly model a Turing machine directly, right down to the linear streams/tapes of symbols.

Unicode/ISO10646 is asinine in its insistence on a sequence of •multiple• codepoints being the ••shortest possible•• representation of some individual letter in some natural language. Programmers want a one-letter-one-codepoint representation in all languages, not some Turing-machine tape to process sequentially and statefully, as Unicode demands even in its 32-bit UCS4 or UTF-32 representations. Programmers don't want any “well, yeah but …” situations at all when they have just finished executing the fully-normalize-all-the-codepoints-in-this-string subprogram (but that “well, yeah but …” is the world we suffer in with Unicode/ISO10646 as currently defined).

But, Shark8, you seem to be criticizing something a little different than that. In some alternate universe where Unicode or ISO10646 transpired entirely differently, what would Unicode-done-right* look like, especially w.r.t. Ada strings? It seems that you are alluding to some sort of multiple-strand string or something like that (not merely allocating the billion non-BMP codepoints better so that we would have a one-letter-one-codepoint axiom).

* Yeah, I know, in Unicode done right, there wouldn't be any Unicode or ISO10646 at all, but what would there be instead, and what would the strawman look like at all in Ada?
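
To make the “well, yeah but …” complaint above concrete, here is a minimal sketch in Python rather than Ada, chosen only because its standard unicodedata module makes this easy to try. The helper name codepoints_after_nfc and the two sample letters are my own picks for illustration, not anything from the thread. It shows NFC normalization collapsing one letter to a single codepoint while another letter stays at two codepoints because no precomposed form is assigned to it.

```python
import unicodedata


def codepoints_after_nfc(text):
    """Return the code points of `text` after NFC normalization, as U+XXXX strings.

    Hypothetical helper for illustration only.
    """
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# 'e' + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC composes it into a single code point.
e_acute = "e\u0301"

# 'q' + COMBINING TILDE has no precomposed form,
# so NFC has to leave it as two code points.
q_tilde = "q\u0303"

print(codepoints_after_nfc(e_acute))  # ['U+00E9']
print(codepoints_after_nfc(q_tilde))  # ['U+0071', 'U+0303']

# Even in UTF-32 the second "letter" occupies two 32-bit units (8 bytes).
print(len(unicodedata.normalize("NFC", q_tilde).encode("utf-32-le")))  # 8
```

So even after full normalization and in a fixed-width 32-bit encoding, code that wants to treat one user-perceived letter as one array element still has to scan for combining marks.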