comp.lang.ada
From: David Starner <dvdeug@email.ro>
Subject: Re: Supporting full Unicode
Date: Wed, 12 May 2004 18:55:54 GMT
Message-ID: <pan.2004.05.12.18.40.52.452322@email.ro>
In-Reply-To: DTqoc.92307$dP1.289702@newsc.telia.net

On Wed, 12 May 2004 14:53:23 +0000, Björn Persson wrote:

> Looks troublesome, eh? For UTF-8 I don't think it's even possible to 
> define such a type. I'd rather just define UTF-16 and UTF-8 strings as 
> byte sequences and represent even single characters as strings.

Why would you encode UTF-16 as a byte sequence when you could encode it as
a series of words? You can't use UTF-16 internally as byte sequences
without worrying about byte-order marks, because UTF-16 is inherently
ambiguous as to whether it's big-endian or little-endian. Anything you
defined would be either UTF-16BE or UTF-16LE, and would spend a lot of
time reassembling character pieces on an architecture of the opposite
endianness. UTF-16 should usually be encoded as words.
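
To make the byte-order point concrete, here is a minimal sketch in Python (used only for illustration; it is not Ada, and the sample text is an arbitrary choice). It serialises the same two characters as UTF-16BE and UTF-16LE bytes, then reads each back as 16-bit words:

```python
import struct

# 'A' plus U+1D11E (MUSICAL SYMBOL G CLEF), which needs a surrogate pair.
text = "A\U0001D11E"

# The same text serialised as bytes in the two possible byte orders.
be_bytes = text.encode("utf-16-be")
le_bytes = text.encode("utf-16-le")

# Reassemble each byte string into 16-bit code units (words),
# telling struct which byte order the bytes were written in.
be_units = list(struct.unpack(">%dH" % (len(be_bytes) // 2), be_bytes))
le_units = list(struct.unpack("<%dH" % (len(le_bytes) // 2), le_bytes))

print(be_bytes.hex())               # byte view, big-endian
print(le_bytes.hex())               # byte view, little-endian
print([hex(u) for u in be_units])   # word view
```

The two byte strings differ, but the word sequences are identical: three code units, with the last two forming the surrogate pair. Held as words, the byte-order question only comes up at the I/O boundary.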

As for characters, they're not much use with Unicode. Even with Latin-1,
you can't uppercase character to character, and any system that does
it is wrong, including Ada. The German eszett (ß) uppercases to SS. You
can't even hold a whole "character" in a character in Unicode, because
you can't fit any attached combining characters in.
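
Both claims can be checked with a short Python sketch (again just an illustration, not Ada; "q" with an acute accent is an arbitrary example of a letter that has no precomposed form in Unicode):

```python
import unicodedata

# Full case mapping: one character in, two characters out.
upper = "ß".upper()

# A user-perceived character made of a base letter plus a combining mark.
# NFC normalisation composes where it can, but no single code point
# exists for q-with-acute, so it stays as two code points.
q_acute = unicodedata.normalize("NFC", "q\u0301")

print(upper, len(upper))
print(len(q_acute))
```

A string-to-string uppercase operation handles the first case naturally. The second shows why a single fixed-width character type cannot hold everything a reader would call one character.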




Thread overview: 25+ messages
2004-05-11 17:45 Supporting full Unicode Brian Catlin
2004-05-12  7:44 ` Ludovic Brenta
2004-05-12  8:23   ` Marius Amado Alves
2004-05-12 10:43     ` Martin Krischik
2004-05-12 14:56       ` Björn Persson
2004-05-12 19:09       ` David Starner
2004-05-12 19:25     ` David Starner
2004-05-12  9:41   ` David Starner
2004-05-12 10:16     ` Björn Persson
2004-05-12 10:57       ` Ludovic Brenta
2004-05-12 14:53         ` Björn Persson
2004-05-12 18:55           ` David Starner [this message]
2004-05-12  9:30 ` Martin Krischik
2004-05-13  1:15 ` Randy Brukardt
2004-05-13 17:58   ` Brian Catlin
2004-05-13 19:42     ` Randy Brukardt
2004-05-14  8:40       ` Andersen Jacob Sparre
2004-05-14 20:20         ` Randy Brukardt
2004-05-14  4:00 ` Vadim Godunko
2004-05-14 17:51   ` Brian Catlin
  -- strict thread matches above, loose matches on Subject: below --
2004-05-12 12:40 amado.alves
2004-05-12 14:34 ` Martin Krischik
2004-05-12 18:24   ` David Starner
2004-05-12 20:04   ` Florian Weimer
2004-05-12 14:12 amado.alves