From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Date: Fri, 06 Dec 2013 03:17:02 +0100
Message-ID: <52a1339e$0$9505$9b4e6d93@newsspool1.arcor-online.net>

On 23.11.13 05:14, Randy Brukardt wrote:
> "Shark8" wrote
>> Not a lot of demand for UTF-8, or not a lot of demand for Ada-2012 [from
>> the customers]?
>
> Not a lot of demand for UTF-8 or wide characters in general. As far as Ada
> 2012 goes, if I want to use a feature, it somehow gets in the compiler.
> :-) Customer demand not required (but it always helps).

Actually, programmers seem to suppress existing demand. Equating
"customers" with "consumers" of software for the moment (who pays?),
customers suffer from ASCII-fied communication in ways that would
not be accepted if written on paper.

I received a terribly malformed computer-generated message from no
less a company than DHL (inspiring this follow-up). The "??" in the
mail text quoted below has obviously been put in place of what was
perfectly good UTF-8 encoded character data. (In the mail's source
text, to be sure.) The non-ASCII character is 'ü' (16#FC#) in both
cases (L.8, L.10). Its UTF-8 encoding takes two octets,
16#C3# 16#BC#, which would account for one '?' per octet:

+======================================================================+
Subject: Ihre Sendung wurde in eine FILIALE umgeleitet
MIME-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: 7bit

Guten Tag Herr Georg Bauhaus,

leider konnte Ihre Sendung NICHT in die gew??nschte PACKSTATION eingestellt werden.

Die Sendung liegt f??r Sie in der FILIALE (...)
+======================================================================+

[Roughly, in English: subject "Your shipment was redirected to a
BRANCH"; body "Unfortunately, your shipment could NOT be placed in
the requested PACKSTATION. The shipment is waiting for you at the
BRANCH (...)".]

Ironically, the messages are produced using an industry-standard Java
framework, even though Java's char data are 16-bit UTF-16 code units,
not 7-bit ASCII:

Message-ID: <...48667.JavaMail.ypqbson@HANPQ021>

These messages used to be fine in the past. Judging by the count of
excess spaces and of long and empty lines in the message, I guess
they are having some competitive programming shop streamline their
software.

Character set support can be a real issue when the use of ASCII leads
to misprints of addresses, or to ambiguity in legal documents.
Consider the families of Joseph Müller ('ü', 16#FC#) and Joseph
Möller ('ö', 16#F6#), each owning a flat in the same house. If
rendered

   Fam. Joseph M??ller
   X Str. 15
   ...

and

   Fam. Joseph M??ller
   X Str. 15
   ...

respectively, what is the postman to do?

Proper support for encoding all characters is a necessity!
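
For illustration, here is a minimal sketch of one way such "??" can
arise. It assumes (a guess; the DHL message itself does not prove it)
that the UTF-8 octets were forced through a US-ASCII conversion that
replaces every octet it cannot represent. Python is used only because
it is short; the function name asciify is hypothetical and is not part
of any mail framework.

```python
def asciify(text: str) -> str:
    """Simulate a lossy pipeline: UTF-8 octets forced through US-ASCII.

    Assumption: every octet outside 7-bit ASCII ends up as '?'.
    """
    octets = text.encode("utf-8")  # U+00FC becomes the two octets C3 BC
    # Decoding as ASCII with replacement yields one U+FFFD per bad octet.
    decoded = octets.decode("ascii", errors="replace")
    # Encoding back to ASCII turns each U+FFFD into '?'.
    return decoded.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    # Escapes keep the example independent of this file's own encoding.
    mueller = "M\u00fcller"  # 'ü', 16#FC#
    moeller = "M\u00f6ller"  # 'ö', 16#F6#
    print(asciify(mueller))  # M??ller
    print(asciify(moeller))  # M??ller
    # Two different names have collapsed into the same string.
    print(asciify(mueller) == asciify(moeller))  # True
```

The replacement is not reversible: once both names have become
"M??ller", nothing downstream can tell the two families apart, which
is exactly the postman's problem.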