comp.lang.ada
* Re: Checking to see if a string is a letter
From: Dmitry A. Kazakov @ 2012-04-03 13:46 UTC


On Tue, 03 Apr 2012 09:26:40 +0100, Simon Wright wrote:

> * use the standard library, Ada.Characters.Handling.Is_Letter (probably
>   the easiest for you!)

Ada.Wide_Wide_Characters.Handling.Is_Letter for Unicode.
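
A minimal sketch of both suggestions, assuming an Ada 2012 compiler. The
names Letter_Check, Is_Latin_1_Letter and Is_Unicode_Letter are
hypothetical and chosen only for illustration; the two Is_Letter functions
are the standard ones named above.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Letter_Check is

   --  True when S holds exactly one character and it is a Latin-1 letter.
   function Is_Latin_1_Letter (S : String) return Boolean is
   begin
      return S'Length = 1
        and then Ada.Characters.Handling.Is_Letter (S (S'First));
   end Is_Latin_1_Letter;

   --  Same test for a Wide_Wide_String holding one Unicode code point.
   function Is_Unicode_Letter (S : Wide_Wide_String) return Boolean is
   begin
      return S'Length = 1
        and then Ada.Wide_Wide_Characters.Handling.Is_Letter (S (S'First));
   end Is_Unicode_Letter;

   --  U+044F CYRILLIC SMALL LETTER YA, built from its code point so that
   --  this source file stays plain ASCII.
   Ya : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#044F#));

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Latin_1_Letter ("a")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Latin_1_Letter ("ab")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Latin_1_Letter ("7")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Unicode_Letter (Ya)));
end Letter_Check;
```

The two functions have different names on purpose: a string literal such
as "a" would be ambiguous between overloads taking String and
Wide_Wide_String.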

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Why no Ada.Wide_Directories?
From: ytomino @ 2011-10-18 22:54 UTC


On Oct 19, 12:02 am, Adam Beneschan <a...@irvine.com> wrote:
> I think we have a terminology problem.
OK, sorry that my argument was not well organized.
Let me confirm each point.

> Latin-1 is a set of characters (a subset of the full Unicode character set).
Yes.
And it is also used as the name of an encoding (ISO 8859-1, as Yannick
calls it).

> So I get
> confused when people talk about Latin-1 versus UTF-8 strings as if
> they were mutually exclusive.  They're not, the way I understand the
> terms.  You can have a string composed of Latin-1 characters that's
> represented using UTF-8 encoding; and the bits in that string would be
> different from a string of the same Latin-1 characters using the
> "regular" encoding, if any character in the string is in the 16#80#..
> 16#FF# range.

Yes.
"Latin-1 as character set" is not exclusive with Unicode (UCS-2 or
UCS-4).
"Latin-1 as encoding" is exclusive with UTF-8.
And I (we?) were talking about "Latin-1 as encoding".
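
To make the two encodings concrete, here is a small sketch, assuming an
Ada 2012 compiler. It stores one Latin-1 character in a String and
re-encodes it with the standard Ada.Strings.UTF_Encoding.Strings.Encode.
The names Latin_1_Vs_UTF_8, As_Latin_1, As_UTF_8 and Put_Bytes are
hypothetical.

```ada
with Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

procedure Latin_1_Vs_UTF_8 is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  One Latin-1 character (e with acute accent), stored as the single
   --  byte 16#E9#.
   As_Latin_1 : constant String := (1 => Character'Val (16#E9#));

   --  The same character re-encoded as UTF-8.
   As_UTF_8 : constant UTF.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (As_Latin_1);

   --  Print each byte of S in base 16.
   procedure Put_Bytes (Label : String; S : String) is
   begin
      Ada.Text_IO.Put (Label);
      for C of S loop
         Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 8, Base => 16);
      end loop;
      Ada.Text_IO.New_Line;
   end Put_Bytes;

begin
   Put_Bytes ("Latin-1:", As_Latin_1);
   Put_Bytes ("UTF-8:  ", As_UTF_8);
end Latin_1_Vs_UTF_8;
```

Both values have the type String, so the compiler cannot tell them apart.
Only the program's own convention says which encoding a given String
holds.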

> On the other hand, I was confused by your statement
> "Ada.Character.Handling.To_Upper breaks UTF-8".  I don't even see a
> way for this to make sense.  Ada.Characters.Handling works on
> character types, and a character type is an enumeration type; but a
> UTF-8 "character" can't be an enumeration type at all, since it's a
> variable-length sequence of 8-bit bytes.  I'm not quite sure what you
> meant here.

Ada.Characters and Ada.Strings are defined to work with "Latin-1 as
encoding" in the String type.
Some subprograms in them (like To_Upper) will replace upper-half
characters (16#80#..) with meaningless values if we call them on a
String holding UTF-8. (Equal_Case_Insensitive does not replace
characters, but it returns a meaningless result if its parameters
contain upper-half characters encoded as UTF-8.)

Of course, Ada.Wide_Wide_Characters.Handling.To_Upper
(Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (any UTF-8 encoded
string)) works fine.
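
The difference can be sketched as follows, assuming an Ada 2012 compiler.
The procedure name Upper_UTF_8 and the local names are hypothetical. The
sample text is "ab" followed by U+20AC (euro sign), whose UTF-8 form
starts with the byte 16#E2#. Read as Latin-1, that byte is a lower-case
letter, so the byte-wise To_Upper changes it.

```ada
with Ada.Characters.Handling;
with Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Upper_UTF_8 is
   package UTF renames Ada.Strings.UTF_Encoding;
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  "ab" followed by the euro sign U+20AC (three bytes in UTF-8).
   Text : constant Wide_Wide_String :=
     "ab" & Wide_Wide_Character'Val (16#20AC#);

   Encoded : constant UTF.UTF_8_String := WWS.Encode (Text);

   --  Byte-wise Latin-1 upper-casing applied to UTF-8 data.
   Naive : constant String := Ada.Characters.Handling.To_Upper (Encoded);

   --  Decode to code points, upper-case them, then encode again.
   Correct : constant UTF.UTF_8_String :=
     WWS.Encode
       (Ada.Wide_Wide_Characters.Handling.To_Upper (WWS.Decode (Encoded)));

   --  Print each byte of S in base 16.
   procedure Put_Bytes (Label : String; S : String) is
   begin
      Ada.Text_IO.Put (Label);
      for C of S loop
         Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 8, Base => 16);
      end loop;
      Ada.Text_IO.New_Line;
   end Put_Bytes;

begin
   Put_Bytes ("encoded:", Encoded);
   Put_Bytes ("naive:  ", Naive);
   Put_Bytes ("correct:", Correct);

   --  The naive result is no longer well-formed UTF-8, so decoding it is
   --  expected to fail.
   begin
      declare
         Decoded : constant Wide_Wide_String := WWS.Decode (Naive);
      begin
         Ada.Text_IO.Put_Line
           ("naive result decodes to"
            & Integer'Image (Decoded'Length) & " code points");
      end;
   exception
      when UTF.Encoding_Error =>
         Ada.Text_IO.Put_Line ("naive result is not valid UTF-8");
   end;
end Upper_UTF_8;
```

Note that two-byte sequences (lead bytes 16#C2#..16#DF#) pass through the
Latin-1 To_Upper unchanged, so for those the call is merely useless; the
corruption shows up with lead bytes from 16#E0# upward.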

> As to having utilities such as versions of Ada.Strings.Unbounded or
> Ada.Strings.Fixed that work directly on UTF-8-encoded strings (and
> versions of Ada.Characters that operate on single UTF-8-encoded
> characters): it's certainly possible to write a package like that, and
> anyone is free to do so, but I just don't think they'd be widely used
> enough to add to the Standard.  I could be wrong.

I thought the standard library was going to separate UTF-8 from
Latin-1 when I read about the UTF-8 mode of the Form parameter that
Randy mentioned.
I do not usually work with Latin-1, so I wanted UTF-8 versions of
Ada.Characters. Sorry that my personal wish got mixed in.
But it is certain that the standard library lacks some support for
handling non-ASCII file names.

By the way...

I will probably confuse you more :-)
Do you know that a single code point is NOT necessarily a single
displayed letter?
Unicode has "composed characters": there are cases where several
code points represent a single real letter.
(see http://www.unicode.org/reports/tr15/tr15-33.html)
In addition, Unicode has "variation selectors". A variation selector
is a decorator for the preceding letter (and can be mixed with composed
characters).
(see http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html)

Therefore, handling Wide_Wide_String is in fact about as difficult as
handling an encoded (UTF-8 or other format) string.
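
A short sketch of the composed-character point, assuming only the
predefined Wide_Wide_String type. The names Composed_Demo, Precomposed and
Decomposed are hypothetical. Both strings would normally display as the
same letter, yet they differ in length and compare unequal.

```ada
with Ada.Text_IO;

procedure Composed_Demo is

   --  Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
   Precomposed : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#00E9#));

   --  Decomposed form: U+0065 'e' followed by U+0301 COMBINING ACUTE
   --  ACCENT.
   Decomposed : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#0065#),
      Wide_Wide_Character'Val (16#0301#));

begin
   Ada.Text_IO.Put_Line
     ("precomposed length:" & Integer'Image (Precomposed'Length));
   Ada.Text_IO.Put_Line
     ("decomposed length: " & Integer'Image (Decomposed'Length));
   Ada.Text_IO.Put_Line
     ("equal: " & Boolean'Image (Precomposed = Decomposed));
end Composed_Demo;
```

Treating the two as the same letter requires normalization as described
in the tr15 report linked above, which this sketch does not attempt.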




2011-10-14  6:58     Why no Ada.Wide_Directories? Michael Rohan
2011-10-15  1:06     ` ytomino
2011-10-17 21:33       ` Randy Brukardt
2011-10-17 23:47         ` ytomino
2011-10-18  1:10           ` Adam Beneschan
2011-10-18  2:32             ` ytomino
2011-10-18 15:02               ` Adam Beneschan
2011-10-18 22:54  6%             ` ytomino
2012-04-03  2:11     Checking to see is a string is a letter deuteros
2012-04-03  4:18     ` Leo Brewin
2012-04-03  4:52       ` Checking to see if " deuteros
2012-04-03  5:15         ` Jeffrey Carter
2012-04-03  6:07           ` deuteros
2012-04-03  8:26             ` Simon Wright
2012-04-03 13:46  7%           ` Dmitry A. Kazakov
