* Why no Ada.Wide_Directories?
From: Michael Rohan @ 2011-10-14 6:58 UTC

Hi,

I've been working a little on accessing files and directories using
Ada.Directories and have been using a thin wrapper layer to convert from
Wide_String to UTF-8 and back.

It does, however, seem strange that there is no Wide_Directories version
in the standard library. Was there a technical reason it wasn't included?

Take care,
Michael
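A minimal sketch of the kind of thin wrapper described above. It assumes an Ada 2012 compiler (for Ada.Strings.UTF_Encoding) and an implementation that, as reported later in the thread for GNAT on Linux, passes the bytes of a String file name to the OS unchanged. The standard does not promise this. The wrapper and its subprogram names are hypothetical; they only mirror Ada.Directories.

```ada
with Ada.Directories;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure Wide_Directories_Demo is

   package UTF renames Ada.Strings.UTF_Encoding.Wide_Strings;

   --  Hypothetical wrapper layer: callers see Wide_String, while
   --  Ada.Directories is handed UTF-8 bytes in an ordinary String.

   --  Encode is overloaded on its result type; the String result
   --  selects the UTF-8 version.
   function To_OS (Name : Wide_String) return String is
     (UTF.Encode (Name));

   --  Decode raises Ada.Strings.UTF_Encoding.Encoding_Error if Name is
   --  not valid UTF-8 or holds a character outside Wide_Character.
   function From_OS (Name : String) return Wide_String is
     (UTF.Decode (Name));

   function Exists (Name : Wide_String) return Boolean is
     (Ada.Directories.Exists (To_OS (Name)));

   procedure Create_Directory (Name : Wide_String) is
   begin
      Ada.Directories.Create_Directory (To_OS (Name));
   end Create_Directory;

   function Simple_Name (Name : Wide_String) return Wide_String is
     (From_OS (Ada.Directories.Simple_Name (To_OS (Name))));

   --  U+0416 (CYRILLIC CAPITAL LETTER ZHE) followed by "_dir", built
   --  from code points so the source file encoding does not matter.
   Dir_Name : constant Wide_String := Wide_Character'Val (16#0416#) & "_dir";

begin
   if not Exists (Dir_Name) then
      Create_Directory (Dir_Name);
   end if;
   Ada.Text_IO.Put_Line
     ("exists: " & Boolean'Image (Exists (Dir_Name))
      & ", round trip ok: "
      & Boolean'Image (Simple_Name (Dir_Name) = Dir_Name));
end Wide_Directories_Demo;
```

Running it creates a directory in the current working directory. Only a few operations are wrapped here. A complete layer would repeat the same Encode/Decode pattern for the rest of Ada.Directories, including directory searches.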
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-14 7:39 UTC

On Fri, 14 Oct 2011 08:58:45 +0200, Michael Rohan
<michael.k.rohan@gmail.com> wrote:

> Hi,
>
> I've been working a little on accessing files and directories using
> Ada.Directories and have been using a thin wrapper layer to convert from
> Wide_String to UTF-8 and back.

Does it mean you pass UTF-8 encoded strings to Ada directory operations?

-- 
“Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University]
“Structured Programming supports the law of the excluded muddle.” [Idem]
Java: Write once, Never revisit
* Re: Why no Ada.Wide_Directories?
From: Dmitry A. Kazakov @ 2011-10-14 9:07 UTC

On Fri, 14 Oct 2011 09:39:32 +0200, Yannick Duchêne (Hibou57) wrote:

> On Fri, 14 Oct 2011 08:58:45 +0200, Michael Rohan
> <michael.k.rohan@gmail.com> wrote:
>
>> I've been working a little on accessing files and directories using
>> Ada.Directories and have been using a thin wrapper layer to convert from
>> Wide_String to UTF-8 and back.
>
> Does it mean you pass UTF-8 encoded strings to Ada directory operations?

In most cases this is how it works under Linux. Under Windows that would
depend on what kind of operations (xA, xW, etc.) the implementation uses.

I would strongly recommend not using Ada.Directories until it gets fixed,
e.g. *at least* all its calls made Wide_Wide_String or explicitly mandated
as UTF-8 encoded. Until then, as an alternative, use the GIO binding of
Gtk or an equivalent in Qt.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-14 12:48 UTC

On Fri, 14 Oct 2011 11:07:20 +0200, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:

> On Fri, 14 Oct 2011 09:39:32 +0200, Yannick Duchêne (Hibou57) wrote:
>
>> On Fri, 14 Oct 2011 08:58:45 +0200, Michael Rohan
>> <michael.k.rohan@gmail.com> wrote:
>>
>>> I've been working a little on accessing files and directories using
>>> Ada.Directories and have been using a thin wrapper layer to convert
>>> from Wide_String to UTF-8 and back.
>> Does it mean you pass UTF-8 encoded strings to Ada directory operations?
>
> In most cases this is how it works under Linux. Under Windows that would
> depend on what kind of operations (xA, xW, etc.) the implementation uses.

This is indeed not safe to use. I got it raising exceptions when facing
file names (perfectly valid file names for the OS) which it did not like
(containing characters outside of ISO-8859-1). As a platform-independent
package, it should not impose its own conventions on a platform, so better
to avoid it, indeed. It's not safe for a program to fail to scan the
content of a directory sanely when it should be able to (unless it's OK
for the program to randomly miss some files).
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-14 12:54 UTC

On Fri, 14 Oct 2011 11:07:20 +0200, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:

> In most cases this is how it works under Linux. Under Windows that would
> depend on what kind of operations (xA, xW, etc.) the implementation uses.

It will not work at all if the implementation uses xxxW, and if the
implementation uses xxxA, it may randomly work or fail depending on
individual file names. So that's not safe (I guess GNAT on Windows uses
xxxA). I guess this may work better on Linux, as it uses UTF-8 internally.
* Re: Why no Ada.Wide_Directories?
From: ytomino @ 2011-10-15 1:06 UTC

Hello.

In RM 3.5.2, Ada's Character/String types are not UTF-8 but Latin-1
(except in Ada.Strings.UTF_Encoding).
I'm afraid that is a violation of the standard even if the
implementation accepts UTF-8.

Of course, I think that the standard is impractical, too.
If we must keep to the standard, there is no way at all to access a file
(or other environment features) named with non-ASCII characters.
I can hardly bear that... But that's another problem.

I do not know why the standard does not have Wide_Directories and
Text_IO.Wide_Open and Wide_Command_Line and Wide_Environment_Variables
and... Still, I hope for these too (or that the standard allows
Character/String to represent UTF-8).
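A small illustration of the Latin-1 point above, assuming an Ada 2012 compiler. A single Cyrillic letter, encoded as UTF-8 and stored in a String, is, read literally as Latin-1 per RM 3.5.2, two different characters. The procedure name is illustrative only.

```ada
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure Latin_1_View is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Strings;

   --  One code point: U+0416 CYRILLIC CAPITAL LETTER ZHE.
   Name : constant Wide_String := (1 => Wide_Character'Val (16#0416#));

   --  Its UTF-8 encoding (16#D0# 16#96#), stored in an ordinary String.
   Bytes : constant String := UTF.Encode (Name);
begin
   --  Taken literally as Latin-1, Bytes holds two Characters:
   --  16#D0# (LATIN CAPITAL LETTER ETH) and 16#96# (a C1 control code),
   --  not one Cyrillic letter. Prints 208, then 150.
   for C of Bytes loop
      Ada.Text_IO.Put_Line (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.Put_Line
     ("Wide_String length:" & Integer'Image (Name'Length)
      & ", String length:" & Integer'Image (Bytes'Length));
end Latin_1_View;
```

The two readings disagree on both the length and the characters of the same name. Whether an implementation treats such a String as UTF-8 is outside what the standard states.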
* Re: Why no Ada.Wide_Directories?
From: Vadim Godunko @ 2011-10-15 6:55 UTC

On Oct 15, 5:06 am, ytomino <aghi...@gmail.com> wrote:
>
> Of course, I think that the standard is impractical, too.
> If we must keep to the standard, there is no way at all to access a file
> (or other environment features) named with non-ASCII characters.
> I can hardly bear that... But that's another problem.
>

It is always possible to use a non-standard library. For example, you can
look at Matreshka, http://forge.ada-ru.org/matreshka; it has its own
string type, which is equivalent to Wide_Wide_String but more space- and
performance-efficient. It provides access to command line switches and
environment variables in a platform- and encoding-independent way using
this string. Unfortunately, directory operations are not implemented yet,
but we will implement them at some point.
* Re: Why no Ada.Wide_Directories?
From: ytomino @ 2011-10-15 12:34 UTC

On Oct 15, 3:55 pm, Vadim Godunko <vgodu...@gmail.com> wrote:
> It is always possible to use a non-standard library. For example, you can
> look at Matreshka, http://forge.ada-ru.org/matreshka; it has its own
> string type, which is equivalent to Wide_Wide_String but more space- and
> performance-efficient. It provides access to command line switches and
> environment variables in a platform- and encoding-independent way using
> this string. Unfortunately, directory operations are not implemented yet,
> but we will implement them at some point.

Matreshka seems well designed!
Anyway, we can do anything if we use a non-standard library. However,
that is not really a reasonable approach.

(By the way, I've been making another runtime,
https://github.com/ytomino/drake, like you. It intentionally violates
the standard: Character/String are just UTF-8, and Ada.Strings works on
code points in my runtime. This is the result of avoiding the
inefficiency of linking both the standard library and a non-standard
library that do the same things. Of course, it's illegal. I do not
recommend using my runtime.)
* Re: Why no Ada.Wide_Directories?
From: Dmitry A. Kazakov @ 2011-10-15 8:38 UTC

On Fri, 14 Oct 2011 18:06:05 -0700 (PDT), ytomino wrote:

> In RM 3.5.2, Ada's Character/String types are not UTF-8 but Latin-1
> (except in Ada.Strings.UTF_Encoding).
> I'm afraid that is a violation of the standard even if the
> implementation accepts UTF-8.

The same applies to Wide_String, which is UCS-2, not UTF-16.
Implementations pretending otherwise are wrong. For that matter, Windows
xW calls are UTF-16, so passing Wide_String there is wrong.

> Of course, I think that the standard is impractical, too.

There are two problems with the standard:

1. It does not define strings and characters in terms of a code point
type to be consistent with Unicode;

2. It does not provide automatic conversions between character/string
types, because of problem #1, and because the Ada type system is too
weak for that.

Clearly, file operations, directory operations and character maps should
be defined using code points rather than characters. There should be
only one instance of each operation/package, independent of the encoding
and of the combinations of encodings.
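A sketch of the UCS-2 versus UTF-16 distinction drawn above, assuming Ada 2012's Ada.Strings.UTF_Encoding.Wide_Wide_Strings. One code point outside the Basic Multilingual Plane becomes a surrogate pair when encoded as UTF-16 in a Wide_String, so that Wide_String no longer holds one character per element. The procedure name is illustrative.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure UCS_2_Versus_UTF_16 is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  One code point outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF.
   Clef : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#1D11E#));

   --  Encode is overloaded on its result type: a Wide_String target
   --  selects UTF-16, a String target selects UTF-8.
   As_UTF_16 : constant Wide_String := WWS.Encode (Clef);
   As_UTF_8  : constant String      := WWS.Encode (Clef);
begin
   --  As_UTF_16 holds the surrogate pair 16#D834# 16#DD1E#. Per RM 3.5.2
   --  each Wide_Character is one BMP code point, but here two elements
   --  (surrogates, not characters) together stand for one character.
   Ada.Text_IO.Put_Line
     ("code points:" & Integer'Image (Clef'Length)
      & ", UTF-16 units:" & Integer'Image (As_UTF_16'Length)
      & ", UTF-8 bytes:" & Integer'Image (As_UTF_8'Length));
   --  Prints "code points: 1, UTF-16 units: 2, UTF-8 bytes: 4".
end UCS_2_Versus_UTF_16;
```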
* Re: Why no Ada.Wide_Directories?
From: Peter C. Chapin @ 2011-10-15 13:12 UTC

On 2011-10-15 04:38, Dmitry A. Kazakov wrote:

> There are two problems with the standard:
>
> 1. It does not define strings and characters in terms of a code point
> type to be consistent with Unicode;
>
> 2. It does not provide automatic conversions between character/string
> types, because of problem #1, and because the Ada type system is too
> weak for that.
>
> Clearly, file operations, directory operations and character maps should
> be defined using code points rather than characters. There should be
> only one instance of each operation/package, independent of the encoding
> and of the combinations of encodings.

Disclaimer: I haven't thought about this very much.

It seems like you are expecting too much from the standard. If a
standard program writes files with names that the standard understands,
then a standard program can read those files back and manipulate them
via Ada.Directories. Yes?

The problem arises when you try to ask a standard program to delve into
system-specific details such as reading arbitrary ("exotic") file names
supported by the system. That doesn't work, but I wouldn't expect it to
work, so what's the problem?

C avoids this complexity by just not including directory manipulation in
the standard at all. Ada at least allows a standard program to manipulate
directories containing files written by another standard program.

I can understand that it might be nice to extend the standard to include
proper support for Unicode file names and such. But I don't think the
lack of that support can be interpreted as some kind of failure of the
standard.

Peter
* Re: Why no Ada.Wide_Directories?
From: Ludovic Brenta @ 2011-10-15 13:22 UTC

"Peter C. Chapin" <PChapin@vtc.vsc.edu> writes on comp.lang.ada:

> It seems like you are expecting too much from the standard. If a
> standard program writes files with names that the standard understands,
> then a standard program can read those files back and manipulate them
> via Ada.Directories. Yes?
>
> The problem arises when you try to ask a standard program to delve into
> system-specific details such as reading arbitrary ("exotic") file names
> supported by the system. That doesn't work, but I wouldn't expect it to
> work, so what's the problem?
>
> C avoids this complexity by just not including directory manipulation in
> the standard at all. Ada at least allows a standard program to manipulate
> directories containing files written by another standard program.
>
> I can understand that it might be nice to extend the standard to include
> proper support for Unicode file names and such. But I don't think the
> lack of that support can be interpreted as some kind of failure of the
> standard.

+1

-- 
Ludovic Brenta.
* Re: Why no Ada.Wide_Directories?
From: Dmitry A. Kazakov @ 2011-10-15 14:47 UTC

On Sat, 15 Oct 2011 09:12:39 -0400, Peter C. Chapin wrote:

> On 2011-10-15 04:38, Dmitry A. Kazakov wrote:
>
>> There are two problems with the standard:
>>
>> 1. It does not define strings and characters in terms of a code point
>> type to be consistent with Unicode;
>>
>> 2. It does not provide automatic conversions between character/string
>> types, because of problem #1, and because the Ada type system is too
>> weak for that.
>>
>> Clearly, file operations, directory operations and character maps should
>> be defined using code points rather than characters. There should be
>> only one instance of each operation/package, independent of the encoding
>> and of the combinations of encodings.
>
> Disclaimer: I haven't thought about this very much.
>
> It seems like you are expecting too much from the standard. If a
> standard program writes files with names that the standard understands,
> then a standard program can read those files back and manipulate them
> via Ada.Directories. Yes?

Maybe, it is difficult to guess. Anyway, what you say is a much bigger
expectation than mine. I wished for mere consistency of Ada string types
with Unicode (after all, Ada adopted Unicode), which is entirely an
internal language matter. What you are expecting is a certain behavior of
the language environment, which Ada cannot control at all.

> The problem arises when you try to ask a standard program to delve into
> system-specific details such as reading arbitrary ("exotic") file names
> supported by the system.

There is no such thing. Unicode was introduced in order to support any
conceivable name.

> C avoids this complexity by just not including directory manipulation in
> the standard at all. Ada at least allows a standard program to manipulate
> directories containing files written by another standard program.

As a matter of fact, it does not. Consider an Ada program creating a file
in its current directory. Let the directory path contain Unicode
characters outside Latin-1. Then another Ada program running in a
different directory won't be able to find this file. In fact, you cannot
even walk the file system tree/forest using Ada.Directories. You can do
this neither portably nor even system-dependently.

And it is just silly to make the point that Ada programs should read/write
only files created by other Ada programs [compiled by the same compiler,
I guess]. However, even this is not guaranteed.

> I can understand that it might be nice to extend the standard to include
> proper support for Unicode file names and such.

Ada is Unicode.

> But I don't think the lack of that support can be interpreted as some
> kind of failure of the standard.

Maybe it was a success, but some people really wished Ada.Directories
were usable for developing portable GUI programs. Presently I am using
GIO instead of Ada.Directories, and it does not make me happy.
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-16 5:48 UTC

On Sat, 15 Oct 2011 16:47:55 +0200, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:

> And it is just silly to make the point that Ada programs should read/write
> only files created by other Ada programs

+1

> Ada is Unicode.

+1
* Re: Why no Ada.Wide_Directories?
From: Peter C. Chapin @ 2011-10-17 0:15 UTC

On 2011-10-15 10:47, Dmitry A. Kazakov wrote:

> And it is just silly to make the point that Ada programs should read/write
> only files created by other Ada programs [compiled by the same compiler,
> I guess].

It's not that silly.

In order to talk sensibly about files, the standard needs to define a
model of "file" and, in this case, even "file system." This needs to be a
model that will be applicable to the widest range of platforms possible.
Such is the nature of a standard. Thus the standard model of "file" and
"file system" will be a simplified abstraction of the real thing on any
particular system. A portable program can only make use of that
simplified abstraction if it expects to remain portable.

If other files on the system also conform to that simplified model, that
is good. A portable program will be able to manipulate them. However, if
a program wishes to manipulate all files on a particular system, with
their full generality, system-specific techniques are going to be
necessary.

For example, I don't believe the Ada standard allows one to access
information about a file's owner. Yet every file on my Linux system has
an owner. If I want to write a portable Ada program I have to live
without that information. If the Ada standard goes on to say that I
can't access files with names containing "exotic" characters, how is
that any different in principle?

I can appreciate that accessing files with Unicode names might be a
useful thing to do in a standard program. What happens when such a
program tries to create files with such names on a system that doesn't
support them? I suppose a solution could be found, but I can also see
how it would get ugly.

Peter
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-17 3:23 UTC

On Mon, 17 Oct 2011 02:15:11 +0200, Peter C. Chapin <PChapin@vtc.vsc.edu>
wrote:

> It's not that silly.
>
> In order to talk sensibly about files, the standard needs to define a
> model of "file" and, in this case, even "file system." This needs to be a
> model that will be applicable to the widest range of platforms possible.
> Such is the nature of a standard. Thus the standard model of "file" and
> "file system" will be a simplified abstraction of the real thing on any
> particular system. A portable program can only make use of that
> simplified abstraction if it expects to remain portable.
> [etc]

You are going too far, turning the matter into something it was not.
After that, your conclusions can be just wrong, or can apply to other
matters, but not to the actual one. It was just about the character set,
and that model exists in Ada (in a non-homogeneous way, which seems a
failure).

> What happens when such a program tries to create files with such names
> on a system that doesn't support them?

What about Text I/O then? “Surprisingly”, nobody complained.
* Re: Why no Ada.Wide_Directories?
From: Simon Wright @ 2011-10-17 7:12 UTC

"Peter C. Chapin" <PChapin@vtc.vsc.edu> writes:

> I can appreciate that accessing files with Unicode names might be a
> useful thing to do in a standard program. What happens when such a
> program tries to create files with such names on a system that doesn't
> support them? I suppose a solution could be found, but I can also see
> how it would get ugly.

Exception Name_Error, I'd think!
* Re: Why no Ada.Wide_Directories?
From: Dmitry A. Kazakov @ 2011-10-17 7:59 UTC

On Sun, 16 Oct 2011 20:15:11 -0400, Peter C. Chapin wrote:

> In order to talk sensibly about files, the standard needs to define a
> model of "file" and, in this case, even "file system." This needs to be a
> model that will be applicable to the widest range of platforms possible.

Right.

> Such is the nature of a standard.

Thank you for making my point. The standard disregards the above principle
by using Latin-1 encoding for file names.

> Thus the standard model of "file" and "file system" will be a simplified
> abstraction of the real thing on any particular system.

"Simplified" contradicts "widest range". In order to support the widest
range, it must be generalized, abstracted, rather than simplified,
degraded. Ada adopted Unicode. Unicode is a generalized model capable of
handling any encoding the target platform might use. The programmer need
not know the actual encoding; it becomes irrelevant.

> If other files on the system also conform to that simplified model, that
> is good. A portable program will be able to manipulate them. However, if
> a program wishes to manipulate all files on a particular system, with
> their full generality, system-specific techniques are going to be
> necessary.

Wrong. All file systems share common features, which can and must be
properly abstracted. The implementations are system-specific, not the
package specifications.

> For example, I don't believe the Ada standard allows one to access
> information about a file's owner.

This has nothing to do with file names, but if the standard wished to
address access rights, it could do that as well.

> Yet every file on my Linux system has an owner. If I want to write a
> portable Ada program I have to live without that information. If the Ada
> standard goes on to say that I can't access files with names containing
> "exotic" characters, how is that any different in principle?

Because the inability to spell the file name is not the same as lacking
access rights. Access rights are external to the program code. The file
name, coded as a string literal, is a part of the program. Failure of the
former is not a bug. The latter is a bug, because the file exists, is
accessible and has a proper name. A program bug which cannot be fixed is a
language design bug.
* Re: Why no Ada.Wide_Directories?
From: Peter C. Chapin @ 2011-10-18 10:55 UTC

On 2011-10-17 03:59, Dmitry A. Kazakov wrote:

> Wrong. All file systems share common features, which can and must be
> properly abstracted. The implementations are system-specific, not the
> package specifications.

Not all possible file system features, even common ones, are abstracted
by the standard. So the standard must pick and choose which ones to
expose.

> Because the inability to spell the file name is not the same as lacking
> access rights. Access rights are external to the program code. The file
> name, coded as a string literal, is a part of the program. Failure of the
> former is not a bug. The latter is a bug, because the file exists, is
> accessible and has a proper name. A program bug which cannot be fixed is a
> language design bug.

I don't see it the same way. Extended attributes also exist, are
accessible (to the system), and have names. Yet the standard doesn't
allow you to access them.

Anyway, it seems like this is drifting off the main topic, as it sounds
like the standard does have a mechanism for accessing general Unicode
file names... or at least that's what I'm gathering from the discussion.

The issue of character set handling is slippery business, as you know.
Perhaps the fundamental problem is that Unicode text is essentially
binary data. For example, when reading a Unicode file one needs to treat
it as a binary file and then decode the contents (into String,
Wide_String or Wide_Wide_String as desired) as it is read.

Personally, I think holding on to encoded data in memory is a bad idea. I
know some programming languages store strings internally in "UTF-8
format", but that never made sense to me. UTF-8 encoded data is binary
data. It should be put into an array of bytes or have a new type for it.
I definitely don't want to accidentally mix "normal" strings of (decoded)
characters with UTF-8 encoded strings. I have a feeling, Dmitry, this is
what you are also saying.

Peter
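One way to get the separation described above, sketched under the assumption of an Ada 2012 compiler. A hypothetical UTF_8_Bytes type derived from String holds the encoded data, and encoding and decoding are confined to two boundary functions. Because it is a distinct type, a UTF_8_Bytes value cannot be mixed with a String variable without an explicit conversion.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Encoded_Is_Not_Text is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical type for UTF-8 encoded data. Deriving from String keeps
   --  the representation but makes a distinct type: passing a UTF_8_Bytes
   --  where a String is expected, or concatenating it with a String
   --  variable, is rejected at compile time without an explicit conversion.
   type UTF_8_Bytes is new String;

   --  Encoding and decoding happen only at these two boundary functions.
   --  String'(...) selects the UTF-8 overload of Encode.
   function To_Bytes (Text : Wide_Wide_String) return UTF_8_Bytes is
     (UTF_8_Bytes (String'(WWS.Encode (Text))));

   function To_Text (Data : UTF_8_Bytes) return Wide_Wide_String is
     (WWS.Decode (String (Data)));

   --  "ab" followed by U+00E9 LATIN SMALL LETTER E WITH ACUTE.
   Data    : constant UTF_8_Bytes :=
     To_Bytes ("ab" & Wide_Wide_Character'Val (16#E9#));
   Decoded : constant Wide_Wide_String := To_Text (Data);
begin
   Ada.Text_IO.Put_Line
     ("bytes:" & Integer'Image (Data'Length)
      & ", characters:" & Integer'Image (Decoded'Length));
   --  Prints "bytes: 4, characters: 3".
end Encoded_Is_Not_Text;
```

Deriving from String rather than using an array of a modular byte type is only a convenience here. It lets the standard UTF_Encoding functions be reused through plain type conversions.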
* Re: Why no Ada.Wide_Directories?
From: Dmitry A. Kazakov @ 2011-10-18 12:27 UTC

On Tue, 18 Oct 2011 06:55:54 -0400, Peter C. Chapin wrote:

> On 2011-10-17 03:59, Dmitry A. Kazakov wrote:
>
>> Wrong. All file systems share common features, which can and must be
>> properly abstracted. The implementations are system-specific, not the
>> package specifications.
>
> Not all possible file system features, even common ones, are abstracted
> by the standard.

Maybe, but the code points of a file name are not that kind of feature.
Every file system in the end operates on Unicode code points, even if it
does not support Unicode.

>> Because the inability to spell the file name is not the same as lacking
>> access rights. Access rights are external to the program code. The file
>> name, coded as a string literal, is a part of the program. Failure of the
>> former is not a bug. The latter is a bug, because the file exists, is
>> accessible and has a proper name. A program bug which cannot be fixed is
>> a language design bug.
>
> I don't see it the same way. Extended attributes also exist, are
> accessible (to the system), and have names. Yet the standard doesn't
> allow you to access them.

It would be the same if the standard did not allow access to file names
at all. But it does allow that, though inconsistently. Not doing
something is not a bug. A bug is when something is done wrong.

> The issue of character set handling is slippery business, as you know.
> Perhaps the fundamental problem is that Unicode text is essentially
> binary data.

No, Unicode text is a sequence of code points, which can be represented
using various encodings. That particular representation is binary data.

> For example, when reading a Unicode file one needs to treat it as a
> binary file and then decode the contents (into String, Wide_String or
> Wide_Wide_String as desired) as it is read.

Well, that depends on the semantics of these types. If we consider them
character strings, then you are wrong. Character strings are not
representations; they are just chains of Unicode code points constrained
to some set of code points, as Wide_String is [*]. Reading lines of a
*text* file as Wide_String or as Wide_Wide_String assumes an appropriate
decoding rather than mindless shuffling of chunks of memory.

Ideally, from an *Ada* implementation I would expect that when a UTF-8
encoded text file is read as Wide_String, I would get exactly the same
sequences of code points as in the UTF-8, or Data_Error for those which
cannot be represented. I see no problem in implementing it this way and
requiring such implementations in the standard. For raw binary I/O there
are streams and direct I/O of Unsigned_8 or whatever octet/memory unit
type.

> Personally, I think holding on to encoded data in memory is a bad idea. I
> know some programming languages store strings internally in "UTF-8
> format", but that never made sense to me. UTF-8 encoded data is binary
> data. It should be put into an array of bytes or have a new type for it.
> I definitely don't want to accidentally mix "normal" strings of (decoded)
> characters with UTF-8 encoded strings. I have a feeling, Dmitry, this is
> what you are also saying.

Yes, I too wished to have separate string types for UTF-8 and UTF-16.
It is IMO bad to mandate UTF-8 for Ada.Directories. Rather, it should be
extended with Wide_Wide_String versions, as should Ada.Text_IO and all
other packages where file names appear.

I would also have file paths, file names, file extensions, etc. properly
typed, i.e. not as raw strings, but that is another story for another
day.

-----------------------
[*] An alternative interpretation could be that Wide_String is the UCS-2
(+ endianness specification) encoding. But that would be a bad idea for a
higher-level language such as Ada.
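A sketch of the behaviour asked for above when UTF-8 text is read as Wide_String: every code point is kept as-is, and one that Wide_Character cannot hold raises Data_Error. This assumes Ada 2012. To_Wide is a hypothetical helper applied to a line already read as raw bytes. It is not how Ada.Wide_Text_IO is specified to behave.

```ada
with Ada.IO_Exceptions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Narrowing_Decode is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical helper: decode a UTF-8 line into Wide_String, keeping
   --  every code point unchanged and raising Data_Error for any code point
   --  above 16#FFFF#. Malformed UTF-8 already raises
   --  Ada.Strings.UTF_Encoding.Encoding_Error inside WWS.Decode.
   function To_Wide (Line : String) return Wide_String is
      Full   : constant Wide_Wide_String := WWS.Decode (Line);
      Result : Wide_String (1 .. Full'Length);
      Code   : Natural;
   begin
      for I in Full'Range loop
         Code := Wide_Wide_Character'Pos (Full (I));
         if Code > Wide_Character'Pos (Wide_Character'Last) then
            raise Ada.IO_Exceptions.Data_Error
              with "code point above 16#FFFF#";
         end if;
         Result (I - Full'First + 1) := Wide_Character'Val (Code);
      end loop;
      return Result;
   end To_Wide;

   procedure Try (Line : String) is
   begin
      Ada.Text_IO.Put_Line
        ("decoded" & Natural'Image (To_Wide (Line)'Length)
         & " character(s)");
   exception
      when Ada.IO_Exceptions.Data_Error =>
         Ada.Text_IO.Put_Line ("Data_Error: not representable as Wide_String");
   end Try;

   --  U+00E9 fits in Wide_Character; U+1D11E (outside the BMP) does not.
   Fits     : constant String :=
     WWS.Encode (Wide_Wide_String'(1 => Wide_Wide_Character'Val (16#E9#)));
   Too_Wide : constant String :=
     WWS.Encode (Wide_Wide_String'(1 => Wide_Wide_Character'Val (16#1D11E#)));
begin
   Try (Fits);      --  prints "decoded 1 character(s)"
   Try (Too_Wide);  --  prints "Data_Error: not representable as Wide_String"
end Narrowing_Decode;
```

Decoding to Wide_Wide_String first and narrowing afterwards keeps the check explicit. Ada.Strings.UTF_Encoding.Wide_Strings.Decode could be used directly instead, but it signals the same condition with Encoding_Error rather than Data_Error.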
* Re: Why no Ada.Wide_Directories?
From: Yannick Duchêne (Hibou57) @ 2011-10-16 5:51 UTC

On Sat, 15 Oct 2011 15:12:39 +0200, Peter C. Chapin <PChapin@vtc.vsc.edu>
wrote:

> It seems like you are expecting too much from the standard.

I only expect safe execution. The actual behavior is unsafe.
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-17 21:41 UTC "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr> wrote in message news:op.v3fjvf1vule2fv@index.ici... >On Sat, 15 Oct 2011 15:12:39 +0200, Peter C. Chapin <PChapin@vtc.vsc.edu> >wrote: >> It seems like you are expecting too much from the standard. >I feel to expect safe execution. The actual behavior is unsafe. That's clearly an implementation problem rather than a language one. Ada.Directories was designed with the intent that UTF-8 encoding could be used throughout (as an option) and it would work. To the extent that that is not true, there would be a bug, but I know of no such problems. Now, if an implementation on Windows doesn't have a way to use UTF-8 encoding, that is an implementation problem, but not one that the Standard can do much about. Randy.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-18 7:29 UTC On Mon, 17 Oct 2011 16:41:12 -0500, Randy Brukardt wrote: > "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr> wrote in message > news:op.v3fjvf1vule2fv@index.ici... >>On Sat, 15 Oct 2011 15:12:39 +0200, Peter C. Chapin <PChapin@vtc.vsc.edu> >>wrote: >>> It seems like you are expecting too much from the standard. >>I feel to expect safe execution. The actual behavior is unsafe. > > That's clearly an implementation problem rather than a language one. > Ada.Directories was designed with the intent that UTF-8 encoding could be > used throughout (as an option) and it would work. How could it be an option? String is either Latin-1 or UTF-8. The standard must explicitly require UTF-8 (breaking some existing programs). > Now, if an implementation on Windows doesn't have a way to use UTF-8 > encoding, that is an implementation problem, but not one that the Standard > can do much about. It is a problem of the standard as long as such Windows implementations conform to the standard. Implementations which recode String from UTF-8 to UTF-16 and pass that to an xW Windows call look illegal to me, because String is proclaimed Latin-1. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
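The recoding step mentioned here can be sketched with the Ada 2012 package Ada.Strings.UTF_Encoding.Conversions. That package converts a UTF_8_String (a subtype of String) into a UTF_16_Wide_String. The function name To_Windows_Name is hypothetical, and the sketch calls no Windows API. It only shows what such an implementation would have to do to the octets first.

```ada
with Ada.Strings.UTF_Encoding.Conversions;

--  Hypothetical helper: recode a file name held in a String as UTF-8
--  into the UTF-16 form a Windows "W" API would expect.  The result is a
--  Wide_String of UTF-16 code units, not of characters: a code point
--  outside the Basic Multilingual Plane occupies two of them.
function To_Windows_Name (UTF_8_Name : String) return Wide_String is
begin
   return Ada.Strings.UTF_Encoding.Conversions.Convert
     (Item => UTF_8_Name, Output_BOM => False);
end To_Windows_Name;
```

For example, the four UTF-8 octets F0 9F 98 80 (U+1F600) become the two Wide_Characters 16#D83D# and 16#DE00#.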
* Re: Why no Ada.Wide_Directories? From: Pascal Obry @ 2011-10-18 14:06 UTC To: Randy Brukardt Randy, > Now, if an implementation on Windows doesn't have a way to use UTF-8 > encoding, that is an implementation problem, but not one that the Standard > can do much about. But I can tell you that supporting UTF-8 on Windows is not trivial at all as there is encoding/decoding needed in many places. Doing that is not trivial and we had the need to invent the "encoding=[UTF8|8BITS]" mode for Text_IO.Open for example. As you say, implementation details, but can be easily defeated: If in my file I have: Filename : constant String := "été"; And this file is saved using UTF-8 encoding, then: Text_IO.Open (Filename, ..., Mode => "encoding=8bits"); Will just fail. A programmer error? Ok... Now: Text_IO.Get (Filename, Last); Text_IO.Open (Filename, ..., Mode => "encoding=8bits"); What if the console is UTF-8? Pascal. -- --|------------------------------------------------------ --| Pascal Obry Team-Ada Member --| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE --|------------------------------------------------------ --| http://www.obry.net - http://v2p.fr.eu.org --| "The best way to travel is by means of imagination" --| --| gpg --keyserver keys.gnupg.net --recv-key F949BD3B
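The effect Pascal describes can be shown without depending on how this source file is saved. The sketch below spells out the three letters of the literal with Character'Val. It then uses Ada.Strings.UTF_Encoding.Strings.Encode (Ada 2012) to produce what the same literal looks like when a file saved as UTF-8 is compiled as Latin-1. The package name Ete_Demo is hypothetical.

```ada
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical illustration: the literal from the post, written with
--  Character'Val so it does not depend on this file's own encoding.
package Ete_Demo is

   --  "ete" with acute accents, one Character per letter (Latin-1 view).
   Latin_1_Name : constant String :=
     Character'Val (16#E9#) & 't' & Character'Val (16#E9#);

   --  The same three letters as UTF-8 octets: each accented letter
   --  becomes the two Characters 16#C3# 16#A9#.  This is what a compiler
   --  reading the source as Latin-1 sees when the file was saved as UTF-8.
   UTF_8_Name : constant String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1_Name);

end Ete_Demo;
```

Latin_1_Name has three Characters and UTF_8_Name has five (16#C3# 16#A9# 't' 16#C3# 16#A9#). A run-time library told that the name is 8-bit will therefore look for a different file from the one the programmer meant.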
* Re: Why no Ada.Wide_Directories? From: Pascal Obry @ 2011-10-18 14:08 UTC Cc: Randy Brukardt On 18/10/2011 16:06, Pascal Obry wrote: > But I can tell you that supporting UTF-8 on Windows is not trivial at > all as there is encoding/decoding needed in many places. Doing that is > not trivial and we had the need to invent the "encoding=[UTF8|8BITS]" > mode for Text_IO.Open for example. As you say, implementation details, ^^^^ form > but can be easily defeated: > > If in my file I have: > > Filename : constant String := "été"; > > And this file is saved using UTF-8 encoding, then: > > Text_IO.Open (Filename, ..., Mode => "encoding=8bits"); ^^^^ Form Pascal.
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-19 21:32 UTC "Pascal Obry" <pascal@obry.net> wrote in message news:4E9D87EB.6040203@obry.net... ... >> Now, if an implementation on Windows doesn't have a way to use UTF-8 >> encoding, that is an implementation problem, but not one that the >> Standard >> can do much about. > > But I can tell you that supporting UTF-8 on Windows is not trivial at all > as there is encoding/decoding needed in many places. Doing that is not > trivial and we had the need to invent the "encoding=[UTF8|8BITS]" mode for > Text_IO.Open for example. I would not claim that it is easy. I haven't done anything about it for Janus/Ada, for example. (This falls into the "no one has complained, so other things have priority" category.) > As you say, implementation details, but can be easily defeated: > > If in my file I have: > > Filename : constant String := "été"; > > And this file is saved using UTF-8 encoding, then: > > Text_IO.Open (Filename, ..., Mode => "encoding=8bits"); > > Will just fail. A programmer error? Ok... Right. That's the problem with the weak typing that we've adopted for UTF-8 and other encodings. It really has nothing to do with Open; it's a general problem with Ada. The obvious solution (if this is a real problem in practice) would be to layer a strongly-typed layer on top of the existing facilities. Easy enough to do, but probably not something that will be in the Standard. > Now: > > Text_IO.Get (Filename, Last); > Text_IO.Open (Filename, ..., Mode => "encoding=8bits"); > > What if the console is UTF-8? If you're expecting to get Wide_Wide_Characters, you really ought to read a Wide_Wide_Character string.
But I'm well aware that this solution is sub-optimal (especially in that it wastes huge amounts of space). Short of completely abandoning the existing I/O system (not the worst idea, IMHO, but unlikely), I don't think there is any practical way to "fix" Ada to deal easily with the *rare* possibility of non-Latin-1 characters. If I were doing this from scratch, I would simply decree that all I/O strings are represented in UTF-8, and use a dedicated type for them so that they can't be mixed with "String" or "Wide_String". Randy.
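One possible shape for the strongly typed layer suggested above is sketched below, assuming Ada 2012. UTF-8 text gets its own private type built on Ada.Strings.UTF_Encoding.Wide_Wide_Strings, so it cannot be mixed with String or Wide_String without an explicit conversion. The package UTF_8_Text and its three functions are hypothetical names and are not part of any standard library.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical strongly typed layer: UTF-8 octets live in a private
--  type, so they cannot be passed where a Latin-1 String is expected
--  (or vice versa) without going through one of these functions.
package UTF_8_Text is

   type Text (<>) is private;

   --  Encode decoded characters into UTF-8.
   function From_Characters (Item : Wide_Wide_String) return Text;

   --  Decode back to characters; propagates
   --  Ada.Strings.UTF_Encoding.Encoding_Error on malformed data.
   function To_Characters (Item : Text) return Wide_Wide_String;

   --  Explicit escape hatch to the raw octets, e.g. for Open.
   function Octets (Item : Text) return String;

private

   type Text is new String;

   function From_Characters (Item : Wide_Wide_String) return Text is
     (Text (Ada.Strings.UTF_Encoding.UTF_8_String'
              (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Item))));

   function To_Characters (Item : Text) return Wide_Wide_String is
     (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (String (Item)));

   function Octets (Item : Text) return String is (String (Item));

end UTF_8_Text;
```

With this package, the compiler rejects passing a Text to Ada.Text_IO.Put_Line or passing a plain String to To_Characters. The only route to the octets is the explicit Octets call.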
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-17 21:33 UTC "ytomino" <aghia05@gmail.com> wrote in message news:418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com... > Hello. > In RM 3.5.2, Ada's Character/String types are not UTF-8 but Latin-1 > (except Ada.Strings.UTF_Encoding). > I'm afraid that is violation of the standard even if the > implementation accepts UTF-8. Say what? Ada.Strings.UTF_Encoding (new in Ada 2012) uses a subtype of String to store UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so was "a violation of the standard". The intent has always been that Open, Ada.Directories, etc. take UTF-8 strings as an option. Presumably the implementation would use a Form to specify that the file names are in UTF-8 form rather than Latin-1. (I wasn't able to find a reference for this in a quick search, but I know it has been talked about on several occasions.) One of the primary reasons that Ada.Strings.UTF_Encoding uses a subtype of String rather than a separate type is so that it can be passed to Open and the like. It's probably true that we should standardize on the Form needed to use UTF-8 strings in these contexts, or at least come up with Implementation Advice on that point. Randy.
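Because UTF_8_String is only a subtype of String, the encoded form of a name can be passed directly to Ada.Directories, which is the point made above. Below is a minimal sketch, assuming Ada 2012. The function name Exists_UTF_8 is hypothetical. Ada.Directories.Exists has no Form parameter, so whether the run-time library interprets the octets as UTF-8 rather than as Latin-1 is left entirely to the implementation's default.

```ada
with Ada.Directories;
with Ada.Strings.UTF_Encoding.Wide_Strings;

--  Hypothetical helper: encode a Wide_String name as UTF-8 and hand the
--  resulting String straight to Ada.Directories.  How those octets are
--  interpreted is implementation-defined; on a POSIX-style file system
--  we would expect them to be passed through unchanged.
function Exists_UTF_8 (Name : Wide_String) return Boolean is
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Name);
begin
   return Ada.Directories.Exists (Encoded);
end Exists_UTF_8;
```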
* Re: Why no Ada.Wide_Directories? From: ytomino @ 2011-10-17 23:47 UTC On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote: > > Say what? > > Ada.Strings.UTF_Encoding (new in Ada 2012) uses a subtype of String to store > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so > was "a violation of the standard". > > The intent has always been that Open, Ada.Directories, etc. take UTF-8 > strings as an option. Presumably the implementation would use a Form to > specify that the file names are in UTF-8 form rather than Latin-1. (I wasn't > able to find a reference for this in a quick search, but I know it has been > talked about on several occasions.) > > One of the primary reasons that Ada.Strings.UTF_Encoding uses a subtype of > String rather than a separate type is so that it can be passed to Open and > the like. > > It's probably true that we should standardize on the Form needed to use > UTF-8 strings in these contexts, or at least come up with Implementation > Advice on that point. > > Randy. Good news. Thanks for letting me know. My worry is decreased a little. However, even if that is right, Form parameters are missing for many subprograms. Probably all subprograms in Ada.Directories, Ada.Directories.Hierarchical_File_Names, Ada.Command_Line, Ada.Environment_Variables and other subprograms having a Name parameter or returning a file name should have a Form parameter. (For example, if I do Open (X, Form => "UTF-8"), does Name (X) return UTF-8 or Latin-1?) Moreover, in the future, we will always use I/O subprograms in UTF-8 mode if what you say is realized. But other libraries in the standard are explicitly defined as Latin-1. It's certain that Ada.Characters.Handling.To_Upper breaks UTF-8.
So we cannot use most subprograms in Ada.Characters and Ada.Strings for handling file names. (For example, if Ada.Directories.Name_Case_Equivalence returns Case_Insensitive, we cannot use Ada.Strings.Equal_Case_Insensitive to compare two file names.) It means the standard library is split between UTF-8 and Latin-1. That's not reasonable. I wish it were solved.
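The comparison problem can be made concrete. Apply Ada.Strings.Equal_Case_Insensitive directly to the octets 16#C3# 16#A9# (e acute in UTF-8) and 16#C3# 16#89# (E acute), and it returns False, because 16#A9# and 16#89# are not letters in Latin-1. A minimal workaround, assuming every name is representable in Latin-1, is to decode first with Ada.Strings.UTF_Encoding.Strings.Decode (Ada 2012). The function name Same_Name_Ignoring_Case is hypothetical. It does not reproduce the case-folding rules of any particular file system, such as NTFS.

```ada
with Ada.Strings.Equal_Case_Insensitive;
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical helper: compare two UTF-8 encoded names case-insensitively
--  by decoding them to Latin-1 Strings first, so that letters rather than
--  octets of the encoding are compared.  A name containing characters
--  outside Latin-1 makes Decode raise Encoding_Error.
function Same_Name_Ignoring_Case (Left, Right : String) return Boolean is
begin
   return Ada.Strings.Equal_Case_Insensitive
     (Ada.Strings.UTF_Encoding.Strings.Decode (Left),
      Ada.Strings.UTF_Encoding.Strings.Decode (Right));
end Same_Name_Ignoring_Case;
```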
* Re: Why no Ada.Wide_Directories? From: Adam Beneschan @ 2011-10-18 1:10 UTC On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote: > On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote: > > > Say what? > > > Ada.Strings.UTF_Encoding (new in Ada 2012) uses a subtype of String to store > > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so > > was "a violation of the standard". > > > The intent has always been that Open, Ada.Directories, etc. take UTF-8 > > strings as an option. Presumably the implementation would use a Form to > > specify that the file names are in UTF-8 form rather than Latin-1. (I wasn't > > able to find a reference for this in a quick search, but I know it has been > > talked about on several occasions.) > > > One of the primary reasons that Ada.Strings.UTF_Encoding uses a subtype of > > String rather than a separate type is so that it can be passed to Open and > > the like. > > > It's probably true that we should standardize on the Form needed to use > > UTF-8 strings in these contexts, or at least come up with Implementation > > Advice on that point. > > > Randy. > > Good news. Thanks for letting me know. > My worry is decreased a little. > > However, even if that is right, Form parameters are missing for many > subprograms. > Probably all subprograms in Ada.Directories, > Ada.Directories.Hierarchical_File_Names, Ada.Command_Line, > Ada.Environment_Variables and other subprograms having a Name parameter > or returning a file name should have a Form parameter. > (For example, if I do Open (X, Form => "UTF-8"), does Name (X) > return UTF-8 or Latin-1?) > > Moreover, in the future, we will always use I/O subprograms in UTF-8 > mode if what you say is realized.
> But other libraries in the standard are explicitly defined as Latin-1. > It's certain that Ada.Characters.Handling.To_Upper breaks UTF-8. I have a feeling you're fundamentally confused about what UTF-8 is, as compared to "Latin-1". Latin-1 is a character mapping. It defines, for all integers in the range 0..255, what character that integer represents (e.g. 77 represents 'M', etc.). Unicode is a character mapping that defines characters for a much larger integer range. For integers in the range 0..255, the character represented in Unicode is the same as that in Latin-1; higher integers represent characters in other alphabets, other symbols, etc. Those mappings just tell you what symbols go with what numbers, and they don't say anything about how the numbers are supposed to be stored. UTF-8 is an encoding (representation). It defines, for each non-negative integer up to a certain point, what bits are used to represent that integer. The number of bits is not fixed. So even if you're working with characters all in the 0..255 range, some of those characters will be represented in 8 bits (one byte) and some will take 16 bits (two bytes). Because of this, it is not feasible to work with strings or characters in UTF-8 encoding. Suppose you declare a string S : String (1 .. 100); but you want it to be a UTF-8 string. How would that work? If you want to look at S(50), the computer would have to start at the beginning of the string and figure out whether each character is represented as 1 or 2 bytes. Nobody wants that. The only sane way to work with strings in memory is to use a format where every character is the same size (String if all your characters are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String for 0..2**31-1). Then, if you have a string of bytes in UTF-8 format, you convert it to a regular (Wide_)(Wide_)String with routines in Ada.Strings.UTF_Encoding; and it also has routines for converting regular strings to UTF-8 format.
But you don't want to *keep* strings in memory and work with them in UTF-8 format. That's why it doesn't make sense to have string routines (like Ada.Strings.Equal_Case_Insensitive or Ada.Characters.Handling.To_Upper) that work with UTF-8. Hope this solves your problem. -- Adam
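To illustrate the cost described above: finding the N-th character of a String that holds UTF-8 means scanning from the start and skipping continuation octets (those in 16#80#..16#BF#). The function UTF_8_Index below is a hypothetical sketch. It assumes its input is already valid UTF-8 and performs no validation.

```ada
--  Hypothetical illustration: locate the first octet of the N-th
--  character in a UTF-8 encoded String.  Because characters occupy a
--  variable number of octets, this is a linear scan from the start.
--  Assumes S is valid UTF-8; no validation is performed.
function UTF_8_Index (S : String; N : Positive) return Positive is
   Seen : Natural := 0;
begin
   for I in S'Range loop
      --  Continuation octets have the bit pattern 10xxxxxx
      --  (16#80# .. 16#BF#); every other octet starts a character.
      if Character'Pos (S (I)) not in 16#80# .. 16#BF# then
         Seen := Seen + 1;
         if Seen = N then
            return I;
         end if;
      end if;
   end loop;
   raise Constraint_Error with "fewer than N characters in S";
end UTF_8_Index;
```

The constant-time alternative the post recommends is to decode once (for example with Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode) and index the resulting Wide_Wide_String.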
* Re: Why no Ada.Wide_Directories? From: ytomino @ 2011-10-18 2:32 UTC On Oct 18, 10:10 am, Adam Beneschan <a...@irvine.com> wrote: > On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote: > > On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote: > > > > Say what? > > > > Ada.Strings.UTF_Encoding (new in Ada 2012) uses a subtype of String to store > > > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so > > > was "a violation of the standard". > > > > The intent has always been that Open, Ada.Directories, etc. take UTF-8 > > > strings as an option. Presumably the implementation would use a Form to > > > specify that the file names are in UTF-8 form rather than Latin-1. (I wasn't > > > able to find a reference for this in a quick search, but I know it has been > > > talked about on several occasions.) > > > > One of the primary reasons that Ada.Strings.UTF_Encoding uses a subtype of > > > String rather than a separate type is so that it can be passed to Open and > > > the like. > > > > It's probably true that we should standardize on the Form needed to use > > > UTF-8 strings in these contexts, or at least come up with Implementation > > > Advice on that point. > > > > Randy. > > > Good news. Thanks for letting me know. > > My worry is decreased a little. > > > However, even if that is right, Form parameters are missing for many > > subprograms. > > Probably all subprograms in Ada.Directories, > > Ada.Directories.Hierarchical_File_Names, Ada.Command_Line, > > Ada.Environment_Variables and other subprograms having a Name parameter > > or returning a file name should have a Form parameter. > > (For example, if I do Open (X, Form => "UTF-8"), does Name (X) > > return UTF-8 or Latin-1?)
> > > Moreover, in the future, we will always use I/O subprograms in UTF-8 > > mode if what you say is realized. > > But other libraries in the standard are explicitly defined as Latin-1. > > It's certain that Ada.Characters.Handling.To_Upper breaks UTF-8. > > I have a feeling you're fundamentally confused about what UTF-8 is, as > compared to "Latin-1". Latin-1 is a character mapping. It defines, > for all integers in the range 0..255, what character that integer > represents (e.g. 77 represents 'M', etc.). Unicode is a character > mapping that defines characters for a much larger integer range. For > integers in the range 0..255, the character represented in Unicode is > the same as that in Latin-1; higher integers represent characters in > other alphabets, other symbols, etc. Those mappings just tell you > what symbols go with what numbers, and they don't say anything about > how the numbers are supposed to be stored. > > UTF-8 is an encoding (representation). It defines, for each > non-negative integer up to a certain point, what bits are used to > represent that integer. The number of bits is not fixed. So even if > you're working with characters all in the 0..255 range, some of those > characters will be represented in 8 bits (one byte) and some will take > 16 bits (two bytes). > > Because of this, it is not feasible to work with strings or characters > in UTF-8 encoding. Suppose you declare a string > > S : String (1 .. 100); > > but you want it to be a UTF-8 string. How would that work? If you > want to look at S(50), the computer would have to start at the > beginning of the string and figure out whether each character is > represented as 1 or 2 bytes. Nobody wants that. > > The only sane way to work with strings in memory is to use a format > where every character is the same size (String if all your characters > are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String > for 0..2**31-1).
Then, if you have a string of bytes in UTF-8 format, > you convert it to a regular (Wide_)(Wide_)String with routines in > Ada.Strings.UTF_Encoding; and it also has routines for converting > regular strings to UTF-8 format. But you don't want to *keep* strings > in memory and work with them in UTF-8 format. That's why it doesn't > make sense to have string routines (like > Ada.Strings.Equal_Case_Insensitive or Ada.Characters.Handling.To_Upper) > that work with UTF-8. > > Hope this solves your problem. > > -- Adam I'm not confused. You are misreading. Of course, if applications always hold file names as Wide_Wide_String and encode them to UTF-8 only when calling I/O subprograms, as you say, then it's very simple, and it is perhaps the intended method. I understand that. But where do these file names come from? They usually come from the command line or a configuration file (written by the user). They are probably encoded in UTF-8 if the OS locale setting is UTF-8. So Form parameters for the subprograms in Ada.Command_Line are necessary, and it's natural to keep the names in UTF-8. (Some file systems, such as those on Linux, accept invalid byte sequences as correct file names. Applications must not (cannot?) decode/encode file names in this case. An invalid file name may be the right file name if the user sets the LANG variable. The same thing happens with NTFS/HFS+: these file systems can accept broken UTF-16. Strictly speaking, an application should never encode/decode file names. But Ada decides that file names are stored in String (according to Randy), so we have to give up on UTF-16 file systems.) And in many other libraries and languages, it is common for text-processing functions to keep strings encoded. I do not necessarily want to deny the way of Ada, but I feel your opinion is biased. In fact, it is not as difficult as you say.
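One way to respect the point about undecodable names is to keep the raw octets for all file-system calls and to decode only for display. The function Display_Name below is a hypothetical sketch that uses Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Ada 2012). When decoding fails, it substitutes U+FFFD (REPLACEMENT CHARACTER) for each non-ASCII octet. That substitution is our own choice; the standard does not mandate it.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical helper: decode a raw file name for display only.  The
--  raw octets are still what should be passed back to the file system,
--  since on some systems a name need not be valid UTF-8 at all.
function Display_Name (Raw : String) return Wide_Wide_String is
begin
   return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Raw);
exception
   when Ada.Strings.UTF_Encoding.Encoding_Error =>
      --  Not valid UTF-8: keep ASCII octets, replace the others.
      declare
         Result : Wide_Wide_String (Raw'Range);
      begin
         for I in Raw'Range loop
            if Character'Pos (Raw (I)) < 16#80# then
               Result (I) :=
                 Wide_Wide_Character'Val (Character'Pos (Raw (I)));
            else
               Result (I) := Wide_Wide_Character'Val (16#FFFD#);
            end if;
         end loop;
         return Result;
      end;
end Display_Name;
```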
* Re: Why no Ada.Wide_Directories? From: ytomino @ 2011-10-18 4:46 UTC Well... if I may add something: in my honest opinion, ignoring the existing way of Ada, a "File_Name_String" type would be better. (In addition, it would be welcome if UTF_8_String and UTF_16_String were new types, as Yannick says.) File_Name_String would be UTF-8 on OS X (if only using the POSIX API), UTF-16 on Windows and a localized string on BSD (the I18N of BSD is unique!). And Equal_File_Names/Less_File_Names are a necessity. Ada.Directories.Name_Case_Equivalence is useless because the case-insensitivity rules differ between NTFS and HFS+. (I have implemented the case-insensitivity rules of HFS+. It's difficult and requires a table different from the UCD. NTFS is easier because the CompareString API can be used for this purpose. CoreFoundation possibly has a usable function too, but I do not know. Anyway, these are not portable. I want a standard library wrapping them.) It's just a hypothetical story that ignores the existing way. I may be satisfied if a Form parameter is usable.
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-18 9:32 UTC On Tue, 18 Oct 2011 06:46:13 +0200, ytomino <aghia05@gmail.com> wrote: > Well... if I may add something: in my honest opinion, ignoring the > existing way of Ada, a "File_Name_String" type would be better. > (In addition, it would be welcome if UTF_8_String and UTF_16_String were new > types, as Yannick says.) For personal and specific use cases, yes; however, for a standard, I would be more in favor of a Unicode_String type. To be honest, my dream would be to replace the Ada String type with that Unicode_String type (a dream… I said). I once attempted to create packages where the String type was redefined, but failed due to some scoping trouble (I could never make up my mind about whether or not this was a GNAT bug). This is important, because UTF-8 vs UTF-16LE, UTF-16BE and even possibly UTF-32BE and UTF-32LE is only a matter of implementation, and is not a good candidate for an interface unless a specific use case requires it. The Unicode_String implementation could be encoded or not, at the sole discretion of the implementation. The implementation could use UTF-32 if it wishes to be simple, or favor the same encoding as the target platform. This Unicode_String type would have methods to return a conversion into one of UTF-8, UTF-16 and UTF-32, and optionally (possibly raising a run-time error) to ISO-8859-1. For efficiency, it could also provide primitives for common iterated operations, such as concatenation, slicing and comparison (which can be implemented far more efficiently at the implementation level than by getting and setting characters, which involves encoding and decoding each time).
I would also suggest a Change_To_Upper_Case (Unicode_String, Index), and the same with Change_To_Lower_Case, along with Remove_Slice and Insert_Slice primitives. These primitives would cover most use cases and help preserve efficiency. This could also solve a glitch. Currently, if you want to store a UTF-8 string in Ada source, you have to cheat the compiler: edit the file as UTF-8, and compile as if it were ISO-8859-1 (*). Unfortunately, this is not clean. If there were a real Unicode_String type (or the String type changed into a Unicode one… in my dreams), this would no longer be a problem. On the other hand, if this would cause trouble for Ada, I prefer no change, and to go on with my personal methods. (*) You can do the same for UTF-16, with some variation: use Wide_Character for your strings, edit the sources in UTF-16, and cheat the compiler by telling it the sources are UCS-2 encoded (note: UCS-2 is another fixed-width Unicode subset, in the same way ISO-8859-1 is, except two bytes wide instead of one).
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-18 10:00 UTC On Tue, 18 Oct 2011 11:32:07 +0200, Yannick Duchêne (Hibou57) wrote: > On Tue, 18 Oct 2011 06:46:13 +0200, ytomino <aghia05@gmail.com> wrote: > >> Well... if I may add something: in my honest opinion, ignoring the >> existing way of Ada, a "File_Name_String" type would be better. >> (In addition, it would be welcome if UTF_8_String and UTF_16_String were new >> types, as Yannick says.) > For personal and specific use cases, yes; however, for a standard, I would > be more in favor of a Unicode_String type. To be honest, my dream would > be to replace the Ada String type with that Unicode_String type (a dream… No need to replace anything, just fix the type system. It should be capable of having String as a subtype of Wide_Wide_String, which is already Unicode. UTF8_String should also be a subtype of Wide_Wide_String, being just an alternative implementation of it. All differences between string and character types are differences in their implementations, not in their semantics. Semantically, any string is a sequence of code points (with various constraints applied to the set of code points). -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-18 10:06 UTC On Tue, 18 Oct 2011 12:00:00 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > No need to replace anything, just fix the type system. It should be > capable of having String as a subtype of Wide_Wide_String, which is already Unicode. So you are dreaming of a universal_string type? ;)
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-18 12:01 UTC On Tue, 18 Oct 2011 12:06:17 +0200, Yannick Duchêne (Hibou57) wrote: > On Tue, 18 Oct 2011 12:00:00 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >> No need to replace anything, just fix the type system. It should be capable >> of having String as a subtype of Wide_Wide_String, which is already Unicode. > So you are dreaming of a universal_string type? ;) No, rather a cloud of string types with different implementations and the same interface. The problem is that types which are semantically the same: String Wide_String Wide_Wide_String Unbounded_String ... are not the same in the language. Adding UTF-8, UTF-16 etc. would multiply that already grotesque mess. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Adam Beneschan @ 2011-10-18 15:02 UTC On Oct 17, 7:32 pm, ytomino <aghi...@gmail.com> wrote: > > I'm not confused. You are misreading. I think we have a terminology problem. To me, Latin-1 is a set of characters (a subset of the full Unicode character set). So I get confused when people talk about Latin-1 versus UTF-8 strings as if they were mutually exclusive. They're not, the way I understand the terms. You can have a string composed of Latin-1 characters that's represented using UTF-8 encoding; and the bits in that string would be different from a string of the same Latin-1 characters using the "regular" encoding, if any character in the string is in the 16#80#..16#FF# range. However, everyone else seems to be using "Latin-1" to talk about the *representation* in addition to the subset of characters that's being represented---in particular, the representation in which each symbol is represented as one 8-bit byte. And I guess we don't really have a good term to describe that representation. I think UCS-1 is best, but it doesn't seem to be commonly used. So I guess I'll have to learn to live with the misuse of the term "Latin-1" to refer to a representation (encoding)---just as we older programmers have learned to live with the terms "Julian Date" and "Gregorian Date" to mean dates in year/day-of-year form and in year/month/day form, despite the fact that this has nothing to do with the Julian or Gregorian calendar. OK, then. I apologize for assuming that this was a sign of your misunderstanding. On the other hand, I was confused by your statement "Ada.Characters.Handling.To_Upper breaks UTF-8". I don't even see a way for this to make sense.
Ada.Characters.Handling works on character types, and a character type is an enumeration type; but a UTF-8 "character" can't be an enumeration type at all, since it's a variable-length sequence of 8-bit bytes. I'm not quite sure what you meant here. As to having utilities such as versions of Ada.Strings.Unbounded or Ada.Strings.Fixed that work directly on UTF-8-encoded strings (and versions of Ada.Characters that operate on single UTF-8-encoded characters): it's certainly possible to write a package like that, and anyone is free to do so, but I just don't think they'd be widely used enough to add to the Standard. I could be wrong. -- Adam
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-18 15:16 UTC On Tue, 18 Oct 2011 08:02:31 -0700 (PDT), Adam Beneschan wrote: > On the other hand, I was confused by your statement > "Ada.Character.Handling.To_Upper breaks UTF-8". When String X contains UTF-8 encoded text (means: Character'Pos = octet value), then To_Upper (X) would yield garbage for some texts. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
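Dmitry's claim can be made concrete with a small sketch (the procedure name is illustrative; the `for ... of` loop is Ada 2012 syntax). It applies Ada.Characters.Handling.To_Upper to the three UTF-8 octets of U+20AC EURO SIGN (E2 82 AC) stored in a String. To_Upper reads octet 16#E2# as the Latin-1 letter a-circumflex and maps it to A-circumflex (16#C2#), so the result is no longer the euro sign and is not valid UTF-8: C2 82 decodes as U+0082, leaving AC as a stray continuation octet.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure UTF8_To_Upper_Garbage is
   --  U+20AC EURO SIGN encoded as UTF-8 octets E2 82 AC, one octet per
   --  Character (Character'Pos = octet value).
   Euro : constant String :=
     Character'Val (16#E2#) & Character'Val (16#82#) & Character'Val (16#AC#);

   --  To_Upper interprets every octet as a Latin-1 character.
   Upper : constant String := Ada.Characters.Handling.To_Upper (Euro);
begin
   --  Print the octet values of the "upper-cased" text.
   for C of Upper loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end UTF8_To_Upper_Garbage;
```

Expected output: ` 194 130 172`, i.e. C2 82 AC instead of the original E2 82 AC.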
* Re: Why no Ada.Wide_Directories? From: Adam Beneschan @ 2011-10-18 23:42 UTC On Oct 18, 8:16 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote: > On Tue, 18 Oct 2011 08:02:31 -0700 (PDT), Adam Beneschan wrote: > > On the other hand, I was confused by your statement > > "Ada.Character.Handling.To_Upper breaks UTF-8". > > When String X contains UTF-8 encoded text (means: Character'Pos = octet > value), then To_Upper (X) would yield garbage for some texts. Oh, I see. I thought he was actually talking about UTF-8 encoded characters, not "characters" in a UTF-8 encoded string. My impression (apparently wrong) was that when String X contained UTF-8 encoded text, the programmer would understand that the characters in it weren't really *characters* and thus wouldn't dream of calling To_Upper. But I suppose that somebody working on a part of a program that takes a String parameter and doesn't realize that the String parameter could be an array of not-really-characters could get it wrong. Which I think is more evidence of why it was wrong to have the String type, an array of characters, do double duty as an array-of-encoded-bytes type. -- Adam
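Adam's conclusion, that an array of encoded octets should not masquerade as String, can be approximated in Ada as it stands with a derived array type. This is an illustration, not something proposed in the thread, and the type name UTF_8_Octets is hypothetical. The derived type keeps indexing, slicing, 'Length and "&", but it is a distinct type, so handing it to a String operation such as To_Upper is rejected at compile time unless the programmer writes an explicit conversion.

```ada
with Ada.Text_IO;

procedure Distinct_Octets_Demo is
   --  Hypothetical type: UTF-8 octets as a type distinct from String.
   --  It is still an array of Character indexed by Positive, so indexing,
   --  slicing, 'Length and "&" all work on the octet view.
   type UTF_8_Octets is new String;

   --  U+20AC EURO SIGN as UTF-8: E2 82 AC.
   Raw : constant UTF_8_Octets :=
     Character'Val (16#E2#) & Character'Val (16#82#) & Character'Val (16#AC#);

   --  Slicing out the lead octet.
   Lead : constant UTF_8_Octets := Raw (Raw'First .. Raw'First);

   --  Ada.Characters.Handling.To_Upper (Raw) would not compile: Raw is not
   --  a String. Reaching String-based code requires a visible conversion.
   As_String : constant String := String (Raw);
begin
   Ada.Text_IO.Put_Line ("octets:" & Integer'Image (Raw'Length));
   Ada.Text_IO.Put_Line
     ("lead octet:" & Integer'Image (Character'Pos (Lead (Lead'First))));
   Ada.Text_IO.Put_Line ("as String:" & Integer'Image (As_String'Length));
end Distinct_Octets_Demo;
```

This only prevents accidental misuse; it gives no character-level view, which is the part Dmitry and Randy argue the language cannot yet express well.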
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-19 8:12 UTC On Tue, 18 Oct 2011 16:42:51 -0700 (PDT), Adam Beneschan wrote: > Which I think is more evidence of why it was wrong to have the > String type, an array of characters, do double duty as an array-of-encoded-bytes type. There is nothing wrong with a string having an array interface. What is wrong is the language design, which requires the implementation of that interface in a certain way that is inconsistent with the type semantics. It should have been:

   type String_Index is range ...;
   type Octet_Index is range ...;
   type UTF8_String is
      private array (String_Index range <>) of Wide_Wide_Character
      and private array (Octet_Index range <>) of Unsigned_8;
private
   type UTF8_String is array (Octet_Index range <>) of Unsigned_8;

-- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-19 21:43 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:a3j4wzrhrj65$.bkkht9t97w84.dlg@40tude.net... > On Tue, 18 Oct 2011 08:02:31 -0700 (PDT), Adam Beneschan wrote: > >> On the other hand, I was confused by your statement >> "Ada.Character.Handling.To_Upper breaks UTF-8". > > When String X contains UTF-8 encoded text (means: Character'Pos = octet > value), then To_Upper (X) would yield garbage for some texts. You should have just said: When String X contains UTF-8 encoded text (means: Character'Pos = octet value), then virtually all existing string operations will yield garbage for some texts. The only way to safely use a UTF-8 string is opaquely, which means you can store it whole, but any operation on it is performed after decoding it. That's of course the best argument for having it be a separate type. The problem is that Ada doesn't have any reasonable way to define conversions for that type (and long-winded conversion functions with long-winded names like "Ada.Strings.Unbounded.To_Unbounded_String" don't count in my view). And there is just enough need to treat these things as arrays-of-bytes (slicing is needed for storage of variable length UTF-8 strings in "plain Ada", for one example) that treating them as "opaque" isn't ideal. Randy.
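Randy's "store it whole, decode before operating" discipline can be sketched with the Ada 2012 packages Ada.Strings.UTF_Encoding.Wide_Wide_Strings and Ada.Wide_Wide_Characters.Handling. Neither was standard when this was written, and the procedure and function names are illustrative. The UTF-8 text is never upper-cased octet by octet: it is decoded to Wide_Wide_String, converted, and encoded again. The explicit Decode/Encode calls are exactly the kind of long-winded conversion plumbing Randy complains about.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Characters.Handling;
with Ada.Text_IO;

procedure UTF8_Upper_Decoded is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
   package WWH renames Ada.Wide_Wide_Characters.Handling;

   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  Decode to code points, operate on characters, encode back to UTF-8.
   function UTF8_To_Upper (S : UTF_8_String) return UTF_8_String is
   begin
      return UTF.Encode (WWH.To_Upper (UTF.Decode (S)));
   end UTF8_To_Upper;

   --  U+00E9 LATIN SMALL LETTER E WITH ACUTE as UTF-8: C3 A9.
   E_Acute : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   Result : constant UTF_8_String := UTF8_To_Upper (E_Acute);
begin
   --  Print the octets of the result: U+00C9 as UTF-8, i.e. C3 89.
   for C of Result loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end UTF8_Upper_Decoded;
```

Expected output: ` 195 137`. Compare with applying To_Upper to the raw octets, which treats each byte as a separate Latin-1 character.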
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-20 7:37 UTC On Wed, 19 Oct 2011 16:43:08 -0500, Randy Brukardt wrote: > The only way to safely use a UTF-8 string is opaquely, which means you can > store it whole, but any operation on it is performed after decoding it. > That's of course the best argument for having it be a separate type. Yes. It is worth remembering that Ada was once considered a strongly typed language... > The > problem is that Ada doesn't have any reasonable way to define conversions > for that type (and having long-winded conversion functions with long winded > names like "Ada.Strings.Unbounded.To_Unbounded_String" don't count in my > view). This is a language type system problem, which must be fixed first. > And there is just enough need to treat these things as > arrays-of-bytes (slicing is needed for storage of variable length UTF-8 > strings in "plain Ada", for one example) that treating them as "opaque" > isn't ideal. That is an unrelated issue. Once the type system gets fixed, there would be no problem having an array view (or a fancy "aspect", if you want) of encoded strings. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-20 11:04 UTC On Thu, 20 Oct 2011 09:37:57 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > On Wed, 19 Oct 2011 16:43:08 -0500, Randy Brukardt wrote: > >> The only way to safely use a UTF-8 string is opaquely, which means you >> can >> store it whole, but any operation on it is performed after decoding it. >> That's of course the best argument for having it be a separate type. > > Yes. It is worth to remember that Ada once was considered a strongly > typed > language... It still is! The trouble is at the library level, not the language level.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-20 12:21 UTC On Thu, 20 Oct 2011 13:04:43 +0200, Yannick Duchêne (Hibou57) wrote: > On Thu, 20 Oct 2011 09:37:57 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: > >> On Wed, 19 Oct 2011 16:43:08 -0500, Randy Brukardt wrote: >> >>> The only way to safely use a UTF-8 string is opaquely, which means you can >>> store it whole, but any operation on it is performed after decoding it. >>> That's of course the best argument for having it be a separate type. >> >> Yes. It is worth to remember that Ada once was considered a strongly >> typed language... > It still is!, the trouble is at library level, not language level. No, the troubles at the library level are reflections of language problems. The language ceased to evolve within its paradigm of a strongly typed language. Instead of addressing new issues from the standpoint of a typed approach, it tries solutions from languages alien to its spirit. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-20 12:38 UTC On Thu, 20 Oct 2011 14:21:16 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: >> It still is!, the trouble is at library level, not language level. > > No, the troubles at the library level are reflections of language > problems. > > The language ceased to evolve within its paradigm of a strongly typed > language. Instead of addressing new issues from the stand point of typed > approach, it tries solutions from the languages alien to its spirit. Can you sketch a short (or less short) formal model? Do you have clear ideas? Are they inspired by known formalisms (maybe like those in specific or research languages)? I am interested in typing and typing models.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-20 14:31 UTC On Thu, 20 Oct 2011 14:38:27 +0200, Yannick Duchêne (Hibou57) wrote: > On Thu, 20 Oct 2011 14:21:16 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >>> It still is!, the trouble is at library level, not language level. >> >> No, the troubles at the library level are reflections of language >> problems. >> >> The language ceased to evolve within its paradigm of a strongly typed >> language. Instead of addressing new issues from the stand point of typed >> approach, it tries solutions from the languages alien to its spirit. > Can you draw a short (or less short) formal model? Do you have clear > ideas? Are they ideas inspired from known formalisms (maybe like in > specific or research languages)? I am interested in typing and typing > models. I am not a language designer. I have problems rather than solutions. What I know is that the decomposition shall go along the types. Design entities must be described as types. Their relationships should be expressed as type relationships. Substitutability should be decided on the basis of manifested declarations, not the type structure. Interface must be clearly separated from implementation. Implementation must be absolutely free to choose. There shall be no procedures but operations on types. All types shall have classes. Any syntax sugar (prefix notation, infix operations, assignments, indexing, member extraction, aggregates, entries, attributes) shall be operations. Construction model must be type safe (in particular, each type must have constructors, including class-wide types). The type system shall support both specialization and generalization.
The programmer should be able to enforce static type and constraint checks, in particular, to convert any potentially dynamic checks into compile-time errors. All exceptions must be typed, contracted and statically checked. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-20 15:54 UTC On Thu, 20 Oct 2011 16:31:59 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > I am not a language designer. I have problems rather than solutions. Like many of us here ;) > What I know is that the decomposition shall go along the types. You often say you don't feel good about FP, but I swear, I am sure you would enjoy some parts of it ;) > Implementation must be absolutely free to > choose. There shall be no procedures but operations on types. All types > shall have classes. What's missing from the interface types introduced with Ada 2005? Don't they fulfill the above expectations? (Also keep in mind that efficiency is sometimes required, and if you want to place formalism over efficiency, then you have to sacrifice efficiency, consciously.) > Any syntax sugar (prefix notation, infix operations, > assignments, indexing, member extraction, aggregates, entries, > attributes) shall be operations. Are you sure you are not confusing concrete syntax with abstract syntax? Otherwise, if I may reword you, perhaps you are complaining that there are not enough user-redefinable operations. Otherwise, I don't see what's relevant in turning syntactic sugar into operations; these play two different roles and belong to orthogonal domains. > Construction model must be type safe (in particular, > each type must have constructors, including class-wide types). The type > system shall support both specialization and generalization. Could you provide an example of the generalization you have in mind? > The programmer > should be able to enforce static type and constraint checks, in > particular, > to convert any potentially dynamic checks into compile-time errors.
All > exceptions must be typed, contracted and statically checked. This is not a language topic but a technology-level topic. I feel a runtime check is a reasonable fall-back for what cannot be statically checked in the current state of the technology. If you really require static checks, then you must restrict yourself to what can be statically checked. If Ada 2012 defines some Design by Contract checks as runtime checks, this is not a language flaw but a pragmatic choice. Along with that, if a compiler is able to statically check what Ada 2012 designates as a runtime check, then nothing in the language definition prevents the compiler from applying all the static checks it is able to.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-20 17:35 UTC On Thu, 20 Oct 2011 17:54:28 +0200, Yannick Duchêne (Hibou57) wrote: > On Thu, 20 Oct 2011 16:31:59 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >> What I know is that the decomposition shall go along the types. > You use to say you don't feel FP good, but I swear, I am sure you would > enjoy some part of it ;) No, FP is just too low level: procedural decomposition. Type systems correspond to the categories - a better and more capable mathematics => safer design. Another fundamental problem of FP is a wrong premise about being stateless. Computing is solely about states. You run a program to have its side effects, there is no other reason for doing that. >> Implementation must be absolutely free to >> choose. There shall be no procedures but operations on types. All types >> shall have classes. > What's missing from Interface type introduced with Ada 2005? 1. Most Ada types do not have interfaces 2. Ada interface cannot be inherited from a concrete type 3. Ada interface cannot have implementation 4. Ada interface does not support ad-hoc supertypes > Doesn't it > fulfill the above expectations? (also keep in mind sometime efficiency is > required, and if you want place formalism over efficiency, then you have > to sacrifice efficiency, conscientiously). Not an issue. Scalar types may have interfaces at zero time/space cost. You don't need to embed a tag into by-value types. >> Any syntax sugar (prefix notation, infix operations, >> assignments, indexing, member extraction, aggregates, entries, >> attributes) shall be operations. > Are you sure you are not confused between concrete syntax and abstract > syntax? I don't understand this.
The problem is that, for example, for the record type T and its member A, the ".A" is not an operation of T, because a record is not an interface. A'First is not an operation of the array. ":=" is not an operation (doubly dispatching) of its left and right sides, etc. >> Construction model must be type safe (in particular, >> each type must have constructors, including class-wide types). The type >> system shall support both specialization and generalization. > Could you provide an example case of generalization you have in mind? Examples are: 1. Type extension (e.g. upon derivation, present in Ada) 2. Expansion of enumeration types 3. Cartesian product of types, e.g. Real x Real -> Complex 4. Lifting constraints, e.g. Float -> IEEE Float (number + NaN + +Inf ...) 5. Ad-hoc supertypes, e.g. String U Unbounded_String -> General_String, creating new classes from existing ones by union. >> The programmer should be able to enforce static type and constraint checks, in >> particular, to convert any potentially dynamic checks into compile-time errors. All >> exceptions must be typed, contracted and statically checked. > This is not a language topic, instead, a technology level topic. I feel > runtime check is a reasonable fall-back for what cannot be statically > checked in the actual state of the technology. No, it is inconsistent and unreasonable. Static checks are meant to detect bugs. A bug is either there or not, independently of whether the program is running, not running, or will ever run. It is just not a function of the execution state. A bug is a property of the program and all its possible states as a whole. A program cannot be both correct and incorrect. A program checking itself as wrong is a Cretan Liar. > If you really require static > check, then you must restrict yourself to what can be statically checked. Yes, and I want a firewall between static and dynamic checks.
If some proposition is declared statically true or false while the compiler is unable to prove it, that should make the program illegal. The programmer must be forced to choose, and if he decides on a static check, he must be sure that the compiler indeed verified his assumption, or else he has to change the program. > If Ada 2012 defines some Design by Contract checks as runtime check, this > is not a language flaw, a pragmatic choice. Yet another generator of arbitrary exceptions. Lessons from accessibility checks not learned... > Along with that, if a compiler > is able to statically check what Ada 2012 designate as runtime check, then > nothing in the language definition disallows the compiler to apply all > static checks it is able to. See above: it is the difference between an illegal program and a program raising exceptions; they have nothing in common. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Yannick Duchêne (Hibou57) @ 2011-10-21 12:53 UTC On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > No, FP is just too low level: procedural decomposition. Type systems > correspond to the categories - a better and more capable mathematics => > safer design. Another fundamental problem of FP is a wrong premise about > being stateless. Computing is solely about states. You run a program to > have its side effects, there is no other reason for doing that. You should write down everything you know (your thoughts about Ada, FP, and so on). It would be useful to you and others. >> What's missing from Interface type introduced with Ada 2005? > > 1. Most Ada types do not have interfaces Eiffel has this, and it is 1) not perfect (may lead to performance issues) 2) rarely used in practice. > 2. Ada interface cannot be inherited from a concrete type You can have a concrete implementation; why is that not enough? > 3. Ada interface cannot have implementation Derived types can. Why is it a problem if one inheritance level is purely abstract? > 4. Ada interface does not support ad-hoc supertypes Can you tell more with an example? (I don't know what supertypes are.) It feels like you need an even higher-level language than Ada. There are some, however most are interpreted languages, and they do not target safety (in the wide sense) as much as Ada does. > Not an issue. Scalar types may have interfaces at zero time/space cost. > You > don't need to embed tag into by-value types. This is possible indeed, but at the cost of separate compilation. SmallEiffel did this, but it relied on whole-program analysis, compiling the program as a whole. Other Eiffel implementations using separate compilation could not optimize this way.
If you make it part of the language standard, you are imposing implementation requirements beyond the reasonable. Very big applications need separate compilation. Although attempted and suggested by Bertrand Meyer, Eiffel applications never scaled well to large sizes (except with global analysis, but recompiling a whole application whenever something changes, although there may be some tricks to avoid really recompiling everything, is not an acceptable option in Ada's niches). >>> Any syntax sugar (prefix notation, infix operations, >>> assignments, indexing, member extraction, aggregates, entries, >>> attributes) shall be operations. >> Are you sure you are not confused between concrete syntax and abstract >> syntax? > > I don't understand this. The problem is that, for example, for the record > type T and its member A, the ".A" is not the operation of T, because > record > is not an interface. A'First is not an operation of array. ":=" is not an > operation (doubly dispatching) of its left and right sides etc. Same feeling as above. It seems you are looking for something higher level than Ada. There are some pleasant languages in this area, but they end up being just cool toys (although still cool to play with ;) ). It may be worth recalling that Ada is not a modeling language, but an implementation language with features to enforce safety as much as possible. >> This is not a language topic, instead, a technology level topic. I feel >> runtime check is a reasonable fall-back for what cannot be statically >> checked in the actual state of the technology. > > No, it is inconsistent and unreasonable. Static checks are meant to > detect bugs. Bug is either there or not, independently on whether the > program is > running, not running, will ever run. Easier said than done. You did not demonstrate that this is unrelated to the current state of technology; you just complained that it is not as you wish. Sorry if I have not replied to each point; I wanted to keep it short.
I often agree with many points you have raised about Ada. Here, I feel you are going too far for what Ada is intended for. You are not pointing out inconsistencies in existing features; you are asking for new features.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-21 13:41 UTC On Fri, 21 Oct 2011 14:53:11 +0200, Yannick Duchêne (Hibou57) wrote: > On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >>> What's missing from Interface type introduced with Ada 2005? >> >> 1. Most Ada types do not have interfaces > Eiffel has this, and this is 1) not perfect (may lead to performance > issue) 2) rarely used in practice There is no performance loss. >> 2. Ada interface cannot be inherited from a concrete type > You can have a concrete implementation, why is that not enough? Because it is not what is required: you have a concrete type and want to name its interface to inherit from, only the interface or a part of it. >> 3. Ada interface cannot have implementation > Derived types can. Why is that a trouble is one inheritance level is > purely abstract? Why am I forced to have it? If you have a reason, the implication is that *each* type must have two declarations: the interface and the type itself. Note that this does not solve the problem, because it would not give partial interfaces. The problem is fragile design: you don't know in advance all the interfaces the users of the package might use later on. It is very bad for large system design. >> 4. Ada interface does not support ad-hoc supertypes > Can you tell more with an example? (I don't know what supertypes are) If A is a subtype of B, then B is a supertype of A. A subtype imports operations, a supertype exports them. Ad-hoc means that you can hang supertypes onto existing types, e.g. ones coming from a library, which cannot be changed. Doing so, you could bring such unrelated types under one roof, e.g. to be able to put them into a container, etc. >> Not an issue.
Scalar types may have interfaces at zero time/space cost. You >> don't need to embed tag into by-value types. > This is possible indeed, but at the cost of separate compilation. It is possible without that. >>>> Any syntax sugar (prefix notation, infix operations, >>>> assignments, indexing, member extraction, aggregates, entries, >>>> attributes) shall be operations. >>> Are you sure you are not confused between concrete syntax and abstract >>> syntax? >> >> I don't understand this. The problem is that, for example, for the record >> type T and its member A, the ".A" is not the operation of T, because record >> is not an interface. A'First is not an operation of array. ":=" is not an >> operation (doubly dispatching) of its left and right sides etc. > > Same feeling as above. Seems you are looking for something which is higher > level than Ada is. It is not higher level, it is just a regular language. Ada 83 was designed at a time when type systems were pretty fresh stuff. It bears marks of older languages which had only built-in types. > May > be worth to recall Ada is not a modeling language, but an implementation > language with features to enforce safety as much as possible. You mean that lacking constructors, user-defined assignment, and safe finalization adds something to safety? That must be a very strange kind of safety then... >>> This is not a language topic, instead, a technology level topic. I feel >>> runtime check is a reasonable fall-back for what cannot be statically >>> checked in the actual state of the technology. >> >> No, it is inconsistent and unreasonable. Static checks are meant to >> detect bugs. Bug is either there or not, independently on whether the >> program is running, not running, will ever run. > Easy to say, less to do. You did not demonstrate this is not related to > actual technology, you just complained it is not as you wish. No, I complained that a self-correctness check is inconsistent.
As for raising exceptions from run-time checks, that plague is well known to anybody who has ever used access types. The ARG keeps on struggling to repair the damage done in Ada 95, while breaching another, bigger hole in the language... -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-25 19:22 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:5279agttaub8.1pl7pt496l1am$.dlg@40tude.net... > On Fri, 21 Oct 2011 14:53:11 +0200, Yannick Duchêne (Hibou57) wrote: > >> On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov >> <mailbox@dmitry-kazakov.de> wrote: > >>>> What's missing from Interface type introduced with Ada 2005? >>> >>> 1. Most Ada types do not have interfaces >> Eiffel has this, and this is 1) not perfect (may lead to performance >> issue) 2) rarely used in practice > > There is no performance loss. Anytime you have a construct that allows multiple inheritance, there is a large performance loss (whether or not you use the multiple inheritance). You can move the performance loss from one construct to another (e.g. dispatching calls, access types, etc.) but you can't get rid of it. Keep in mind that "performance loss" means not just run-time but also space efficiency (which is important in a language used mainly in embedded systems). Randy.
* Re: Why no Ada.Wide_Directories? From: Dmitry A. Kazakov @ 2011-10-25 19:35 UTC On Tue, 25 Oct 2011 14:22:27 -0500, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:5279agttaub8.1pl7pt496l1am$.dlg@40tude.net... >> On Fri, 21 Oct 2011 14:53:11 +0200, Yannick Duchêne (Hibou57) wrote: >> >>> On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov >>> <mailbox@dmitry-kazakov.de> wrote: >> >>>>> What's missing from Interface type introduced with Ada 2005? >>>> >>>> 1. Most Ada types do not have interfaces >>> Eiffel has this, and this is 1) not perfect (may lead to performance >>> issue) 2) rarely used in practice >> >> There is no performance loss. > > Anytime you have a construct that allows multiple inheritance, there is a > large performance loss (whether or not you use the multiple inheritance). > You can move the performance loss from one construct to another (i.e. > dispatching calls, access types, etc.) but you can't get rid of it. There is no time/memory loss at all. For the types in question, any legal Ada 2005 program would generate exactly the same code after the change as before it. The performance argument is bogus, because it considers programs that are presently impossible to write. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Why no Ada.Wide_Directories? From: Randy Brukardt @ 2011-10-26 22:41 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:ci96gr5yzmpp$.1mwky141c6e78$.dlg@40tude.net... > On Tue, 25 Oct 2011 14:22:27 -0500, Randy Brukardt wrote: > >> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message >> news:5279agttaub8.1pl7pt496l1am$.dlg@40tude.net... >>> On Fri, 21 Oct 2011 14:53:11 +0200, Yannick Duchêne (Hibou57) wrote: >>> >>>> On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov >>>> <mailbox@dmitry-kazakov.de> wrote: >>> >>>>>> What's missing from Interface type introduced with Ada 2005? >>>>> >>>>> 1. Most Ada types do not have interfaces >>>> Eiffel has this, and this is 1) not perfect (may lead to performance >>>> issue) 2) rarely used in practice >>> >>> There is no performance loss. >> >> Anytime you have a construct that allows multiple inheritance, there is a >> large performance loss (whether or not you use the multiple inheritance). >> You can move the performance loss from one construct to another (i.e. >> dispatching calls, access types, etc.) but you can't get rid of it. > > There is no time/memory loss, at all. For the types in question any legal > Ada 2005 program would generate exactly same code as it would be the > change. First of all, I was including Ada 2005 interfaces in this complaint -- so "Ada 2005" is irrelevant (you've already gone over the edge at that point). You *might* be right about Ada 95 programs, but it would require a substantial increase in compiler complexity in order to support that.
But Ada compilers are already very complex - fairly close to the point where the complexity would overwhelm the ability to get them correct. It's much more likely that a much simpler design would be used for a pervasively multiple inheriting language where everything is much more expensive. You might think that such a compiler's output could be optimized to a more efficient version. Indeed, that was the original premise behind Janus/Ada (optimization could eliminate the cost of generic sharing, pervasive heap allocation of objects, etc.) But it didn't work, the optimizations were too complex to be very practical other than in the simplest of circumstances. Ultimately, we bit the bullet and supported multiple representations for arrays, records, and the like, because that got rid of a lot of the expense at the source. But it also added a whole lot of complexity to the compiler. It's possible that a from-scratch compiler design could do better, but I doubt it. And it seems unlikely that anyone will be doing one of those for Ada anytime soon. Randy. ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-26 22:41 ` Randy Brukardt @ 2011-10-27 7:43 ` Dmitry A. Kazakov 2011-10-27 15:13 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-27 7:43 UTC On Wed, 26 Oct 2011 17:41:30 -0500, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:ci96gr5yzmpp$.1mwky141c6e78$.dlg@40tude.net... >> On Tue, 25 Oct 2011 14:22:27 -0500, Randy Brukardt wrote: >> >>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message >>> news:5279agttaub8.1pl7pt496l1am$.dlg@40tude.net... >>>> On Fri, 21 Oct 2011 14:53:11 +0200, Yannick Duchêne (Hibou57) wrote: >>>> >>>>> On Thu, 20 Oct 2011 19:35:21 +0200, Dmitry A. Kazakov >>>>> <mailbox@dmitry-kazakov.de> wrote: >>>> >>>>>>> What's missing from Interface type introduced with Ada 2005 ? >>>>>> >>>>>> 1. Most Ada types do not have interfaces >>>>> Eiffel has this, and this is 1) not perfect (may lead to performance >>>>> issue) 2) rarely used in practice >>>> >>>> There is no performance loss. >>> >>> Anytime you have a construct that allows multiple inheritance, there is a >>> large performance loss (whether or not you use the multiple inheritance). >>> You can move the performance loss from one construct to another (i.e. >>> dispatching calls, access types, etc.) but you can't get rid of it. >> >> There is no time/memory loss, at all. For the types in question any legal >> Ada 2005 program would generate exactly the same code as it would after the >> change. > > First of all, I was including Ada 2005 interfaces in this complaint -- so > "Ada 2005" is irrelevant (you've already gone over the edge at that point). > You *might* be right about Ada 95 programs, but it would require a > substantial increase in compiler complexity in order to support that.
But > Ada compilers are already very complex - fairly close to the point where the > complexity would overwhelm the ability to get them correct. Because the language is in a mess. That surely makes compilers complex. Without an overhaul it will collapse in the not-so-distant future anyway, under the weight of arbitrary language patches. You wanted it complex, here you are! > It's much more > likely that a much simpler design would be used for a pervasively multiple > inheriting language where everything is much more expensive. Note that it was not about multiple inheritance. Yannick suggested that making types like Boolean, String, Integer etc. have classes and primitive operations would mean a performance loss. That is wrong. Introducing classes and primitive operations will cost strictly zero in *all* use cases that are legal now. Other use cases (e.g. using class-wide objects and dispatching) are presently illegal, so the whole argument is bogus. As for MI, I very much doubt that MI for *tagged* types would imply any overhead in *comparable* cases. But this is another discussion. Again, any such comparison should be correct. I don't care what cost MI inflicts on record members inherited through it, because that is not legal now, and thus irrelevant. Would inheritance from interfaces become more expensive? (a comparable case) I don't believe it.
* Re: Why no Ada.Wide_Directories? 2011-10-27 7:43 ` Dmitry A. Kazakov @ 2011-10-27 15:13 ` Yannick Duchêne (Hibou57) 2011-10-27 19:39 ` Robert A Duff 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-27 15:13 UTC On Thu, 27 Oct 2011 09:43:22 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > Note that it was not about multiple inheritance. Yannick suggested that > making types like Boolean, String, Integer etc. have classes and > primitive operations would mean a performance loss. That is wrong. What I said exactly is that this would require whole-program analysis, at the cost of separate compilation, and thus also at the cost of dropping any kind of library, either shared or static. If any type is potentially the root of a class, then you have to avoid dynamic dispatching everywhere possible, and to do so, you need global analysis. If you don't, you get the performance issues typical of interpreted languages. But I may be wrong if I am not replying to what you had in mind (I am no longer sure I understand the topic).
* Re: Why no Ada.Wide_Directories? 2011-10-27 15:13 ` Yannick Duchêne (Hibou57) @ 2011-10-27 19:39 ` Robert A Duff 2011-10-27 21:09 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: Robert A Duff @ 2011-10-27 19:39 UTC "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr> writes: > On Thu, 27 Oct 2011 09:43:22 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: > >> Note that it was not about multiple inheritance. Yannick suggested that >> making types like Boolean, String, Integer etc. have classes and >> primitive operations would mean a performance loss. That is wrong. > What I said exactly is that this would require whole-program analysis, at the > cost of separate compilation, and thus also at the cost of dropping any > kind of library, either shared or static. If any type is potentially > the root of a class, then you have to avoid dynamic dispatching > everywhere possible, and to do so, you need global analysis. If you don't, > you get the performance issues typical of interpreted languages. I'm not sure what whole-program analysis you're thinking of. In Ada, you can tell whether a procedure is dispatching at compile time of the declaration of that procedure. And you can tell whether a given call is dispatching at compile time of that call. No whole-program analysis needed. There would be some overhead when converting a Boolean to Boolean'Class. A Boolean should fit in 1 byte (or 1 bit if packed). So you don't want to store a Tag with every Boolean. Instead, you want to gin up the Tag on conversion to class-wide. But this overhead is not DISTRIBUTED overhead, so it doesn't matter. - Bob
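Bob's point that dispatching is decided per call, at compile time, can be illustrated with today's Ada. The sketch below is hypothetical (the names Shapes, Shape, Square, Area and Area_Of_Any are made up for illustration) and uses a tagged type, since current Ada only provides T'Class for tagged types; Boolean'Class itself remains hypothetical. A call whose controlling operand has a specific type is bound statically; a call whose operand is class-wide dispatches on the run-time tag.

```ada
package Shapes is
   --  Hypothetical root type with one primitive operation.
   type Shape is tagged record
      Size : Natural := 0;
   end record;
   function Area (S : Shape) return Natural;

   --  Derived type that overrides the primitive.
   type Square is new Shape with null record;
   overriding function Area (S : Square) return Natural;

   --  Not primitive (class-wide parameter); the call inside dispatches.
   function Area_Of_Any (S : Shape'Class) return Natural;
end Shapes;

package body Shapes is
   function Area (S : Shape) return Natural is
   begin
      return 0;  --  a plain Shape has no defined area in this sketch
   end Area;

   overriding function Area (S : Square) return Natural is
   begin
      return S.Size * S.Size;
   end Area;

   function Area_Of_Any (S : Shape'Class) return Natural is
   begin
      return Area (S);  --  dispatching call: the operand is class-wide
   end Area_Of_Any;
end Shapes;
```

With Q : Square of Size 3, Area (Q) and Area_Of_Any (Q) both return 9, while Area (Shape (Q)) is a static call to the parent's Area and returns 0. Each of these decisions is visible at the call site alone.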
* Re: Why no Ada.Wide_Directories? 2011-10-27 19:39 ` Robert A Duff @ 2011-10-27 21:09 ` Yannick Duchêne (Hibou57) 2011-10-28 7:50 ` Dmitry A. Kazakov 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-27 21:09 UTC On Thu, 27 Oct 2011 21:39:31 +0200, Robert A Duff <bobduff@shell01.theworld.com> wrote: > In Ada, you can tell whether a procedure is dispatching at compile time > of the declaration of that procedure. And you can tell whether a given > call is dispatching at compile time of that call. No whole-program > analysis needed. If a call is not dispatching, it can be whatever you want; it will be the same: a deterministic call. If you want classes, it is for dispatching calls, I suppose (*); if you do not expect to use dispatching calls, you may not need classes. Finally, I suppose that if one wants classes, one wants dispatching calls. If someone wants dispatching calls on some high-level custom types, that's OK; if someone wants dispatching calls on general-purpose, basic types like Boolean, that's not OK: it costs too much, unless optimized. Knowing when a call is dispatching or static does not make dispatching calls less costly. If you want dispatching calls not to cost too much, you have to optimize these calls, which requires the global analysis mentioned. If you do not think about dispatching calls, then you may not need classes. Or else, what kind of classes was this all about? I may just have misunderstood the topic; I keep this in mind too. (*) Not necessarily, but I feel this is really how things typically go.
* Re: Why no Ada.Wide_Directories? 2011-10-27 21:09 ` Yannick Duchêne (Hibou57) @ 2011-10-28 7:50 ` Dmitry A. Kazakov 2011-10-28 8:45 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-28 7:50 UTC On Thu, 27 Oct 2011 23:09:42 +0200, Yannick Duchêne (Hibou57) wrote: > if you do not expect to use dispatching > calls, you may not need classes. No. If you don't expect to use the number 123, that does not imply that integer numbers shall not have that value. > Finally, I suppose that if one wants classes, > one wants dispatching calls. If someone wants dispatching calls > on some high-level custom types, that's OK; if someone wants dispatching > calls on general-purpose, basic types like Boolean, that's not OK: 1. You should explain the difference. Why are some types more types than others? 2. Dispatching never happens on *a* type, it does on a *set* of types. In Ada you simply cannot have a dispatching call on a specific type. Ada is a typed language. > it costs too much, unless optimized. Nope, it does not cost anything. You are comparing the costs of using something with the costs of not writing (and thus not executing) the program at all. Non-existing programs consume no resources. Again, in order to make the comparison meaningful you have to consider comparable cases. For example, having a class, you can put class-wide instances into a container. Without the class you have to write some variant record wrapper type with alternatives of different types and the discriminant playing the role of a tag. Now you could compare the performance of this poor man's class implementation with that of the proper class. > Knowing when a call is dispatching or static does not make dispatching > calls less costly. If you want dispatching calls not to cost too much, you > have to optimize these calls, Not in Ada, where specific and class-wide types are distinct.
If you know the type statically, you declare the object of that type. If you don't know the type, then presently, for types which are not tagged, you cannot write the program at all. > Or else, what kind of classes was this all about? Class = set of types closed under inheritance.
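A rough sketch of the "poor man's class" Dmitry describes, under assumed names (Poor_Mans_Class, Kind, Value, Value_Array and Image are hypothetical): a variant record whose discriminant plays the role of a tag, with "dispatching" written by hand as a case statement. A default discriminant makes the type definite, so values of different kinds can share one array, which is what a class-wide container would otherwise provide.

```ada
package Poor_Mans_Class is
   --  The discriminant plays the role of a tag.
   type Kind is (Boolean_Value, Integer_Value);

   --  The default discriminant makes Value definite, so it can be
   --  used as an array component.
   type Value (K : Kind := Boolean_Value) is record
      case K is
         when Boolean_Value =>
            B : Boolean := False;
         when Integer_Value =>
            I : Integer := 0;
      end case;
   end record;

   --  A heterogeneous "container" of wrapped values.
   type Value_Array is array (Positive range <>) of Value;

   --  Hand-written "dispatching": a case statement on the discriminant.
   function Image (V : Value) return String;
end Poor_Mans_Class;

package body Poor_Mans_Class is
   function Image (V : Value) return String is
   begin
      case V.K is
         when Boolean_Value => return Boolean'Image (V.B);
         when Integer_Value => return Integer'Image (V.I);
      end case;
   end Image;
end Poor_Mans_Class;
```

Adding a new alternative means editing Kind, Value and every case statement, which is exactly the maintenance cost a real class avoids; that is the version a performance comparison should be made against.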
* Re: Why no Ada.Wide_Directories? 2011-10-28 7:50 ` Dmitry A. Kazakov @ 2011-10-28 8:45 ` Yannick Duchêne (Hibou57) 2011-10-28 14:59 ` Dmitry A. Kazakov 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-28 8:45 UTC On Fri, 28 Oct 2011 09:50:03 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > 2. Dispatching never happens on *a* type, it does on a *set* of types. I meant on the class-wide view of a type (ok, sorry for the dirty wording).
* Re: Why no Ada.Wide_Directories? 2011-10-28 8:45 ` Yannick Duchêne (Hibou57) @ 2011-10-28 14:59 ` Dmitry A. Kazakov 0 siblings, 0 replies; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-28 14:59 UTC On Fri, 28 Oct 2011 10:45:07 +0200, Yannick Duchêne (Hibou57) wrote: > On Fri, 28 Oct 2011 09:50:03 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >> 2. Dispatching never happens on *a* type, it does on a *set* of types. > I meant on the class-wide view of a type (ok, sorry for the dirty wording). For non-tagged types there will be no class-wide view at all. Conversion to class-wide will create a new object. This is also the scheme for ad-hoc supertypes and for subtypes that do not inherit the representation. For them, type conversions must be "physical".
* Re: Why no Ada.Wide_Directories? 2011-10-20 7:37 ` Dmitry A. Kazakov 2011-10-20 11:04 ` Yannick Duchêne (Hibou57) @ 2011-10-20 17:40 ` J-P. Rosen 2011-10-20 18:43 ` Dmitry A. Kazakov 2011-10-21 10:07 ` Vadim Godunko 1 sibling, 2 replies; 100+ messages in thread From: J-P. Rosen @ 2011-10-20 17:40 UTC On 20/10/2011 09:37, Dmitry A. Kazakov wrote: > On Wed, 19 Oct 2011 16:43:08 -0500, Randy Brukardt wrote: > >> > The only way to safely use a UTF-8 string is opaquely, which means you can >> > store it whole, but any operation on it is performed after decoding it. >> > That's of course the best argument for having it be a separate type. > Yes. It is worth remembering that Ada once was considered a strongly typed > language... > Different types represent things that are of a different nature. It is not obvious that a difference in /encoding/ is sufficient to say that two things are of a different nature. Consider also the problem with files. Is a UTF-8 file a text file? Do you want a UTF8_IO package? Normally, a UTF-8 file starts with a BOM in the first line, telling that the whole file is UTF8. How would you read that? Excerpt from AI137: --- When reading a file, a BOM can be expected as starting the first line of the file, but not subsequent lines. The proposed handling of BOM assumes the following pattern: 1) Read the first line. Call function Encoding on that line with an appropriate default to use if the line does not start with a BOM. Initialize the encoding scheme to the value returned by the function. 2) Decode all lines (including the first one) with the chosen encoding scheme. Since the BOM is ignored by Decode functions, it is not necessary to slice the first line specially. --- A possible alternative solution could be to make UTF_8_String a type derived from String (rather than a subtype). With conversions allowed, you would not lose Text_IO.
I don't know if we'll have time to discuss this in Denver, but if you are serious about it, by all means get in touch with your standardization body and let them make a comment. There is no point in saying "that's how it should have been" and taking no action to that effect. -- --------------------------------------------------------- J-P. Rosen (rosen@adalog.fr) Adalog has moved: 2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
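The two-step pattern quoted from AI137 might look as follows, assuming an Ada 2012 compiler, where that AI became the standard package Ada.Strings.UTF_Encoding and its child Wide_Wide_Strings. The wrapper package BOM_Demo and its functions Scheme_Of and Decode_Line are hypothetical; Encoding and Decode are the standard functions. Because Decode skips a leading BOM, the first line is decoded like any other.

```ada
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;

package BOM_Demo is
   --  Step 1: choose the encoding scheme from the first line,
   --  falling back to Default when the line does not start with a BOM.
   function Scheme_Of (First_Line : String;
                       Default    : Encoding_Scheme := UTF_8)
     return Encoding_Scheme;

   --  Step 2: decode every line (the first one included) with that scheme.
   function Decode_Line (Line : String; Scheme : Encoding_Scheme)
     return Wide_Wide_String;
end BOM_Demo;

with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

package body BOM_Demo is
   function Scheme_Of (First_Line : String;
                       Default    : Encoding_Scheme := UTF_8)
     return Encoding_Scheme is
   begin
      return Encoding (First_Line, Default);
   end Scheme_Of;

   function Decode_Line (Line : String; Scheme : Encoding_Scheme)
     return Wide_Wide_String is
   begin
      --  A leading BOM is skipped by Decode, so no slicing is needed.
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Line, Scheme);
   end Decode_Line;
end BOM_Demo;
```

A caller would compute Scheme_Of once on the first line read with Ada.Text_IO.Get_Line, then pass every line, including the first, through Decode_Line.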
* Re: Why no Ada.Wide_Directories? 2011-10-20 17:40 ` J-P. Rosen @ 2011-10-20 18:43 ` Dmitry A. Kazakov 2011-10-21 10:07 ` Vadim Godunko 1 sibling, 0 replies; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-20 18:43 UTC On Thu, 20 Oct 2011 19:40:40 +0200, J-P. Rosen wrote: > On 20/10/2011 09:37, Dmitry A. Kazakov wrote: >> On Wed, 19 Oct 2011 16:43:08 -0500, Randy Brukardt wrote: >> >>> > The only way to safely use a UTF-8 string is opaquely, which means you can >>> > store it whole, but any operation on it is performed after decoding it. >>> > That's of course the best argument for having it be a separate type. >> Yes. It is worth remembering that Ada once was considered a strongly typed >> language... >> > Different types represent things that are of a different nature. Depends on the meaning of "different": 1. Differently implemented types representing the same entities from the problem domain; 2. Incompatible types representing semantically different entities. > It is not > obvious that a difference in /encoding/ is sufficient to say that two > things are of a different nature. #1 if encoding is not the problem domain, but an implementation detail, which should be the case for most application programming; #2 otherwise, e.g. in systems programming. > Consider also the problem with files. Is a UTF-8 file a text file? Do > you want a UTF8_IO package? Not likely. For text files I would prefer a single Text_IO package consistently applying an appropriate recoding from the file encoding to the representation of the string type used in the operation. Of course, targets that do not support identification of the file encoding would use the Form parameter to specify it explicitly. > A possible alternative solution could be to make UTF_8_String a type > derived from String (rather than a subtype). With conversions allowed, > you would not lose Text_IO.
I don't know if we'll have time to discuss > this in Denver, but if you are serious about it, by all means get in > touch with your standardization body and let them make a comment. There > is no point in saying "that's how it should have been" and taking no > action to that effect. Yes, String types must be kept different in sense #1 and the same in sense #2. That means that the type system should support classes (e.g. Wide_Wide_String'Class) comprising types of *different* implementation, which don't inherit representations from each other. This is not an issue specific to strings. It is a general problem, which must be approached generally. So far Ada has classes of shared representations, for which upcasting and downcasting are view conversions. Classes of different representation should have physical conversions for T<->T'Class, T->S etc., creating new objects. Yes, it is inefficient, but when efficiency is an issue the type-specific operations could always be overridden rather than inherited through conversion.
* Re: Why no Ada.Wide_Directories? 2011-10-20 17:40 ` J-P. Rosen 2011-10-20 18:43 ` Dmitry A. Kazakov @ 2011-10-21 10:07 ` Vadim Godunko 2011-10-21 11:25 ` J-P. Rosen 1 sibling, 1 reply; 100+ messages in thread From: Vadim Godunko @ 2011-10-21 10:07 UTC On Oct 20, 9:40 pm, "J-P. Rosen" <ro...@adalog.fr> wrote: > > A possible alternative solution could be to make UTF_8_String a type > derived from String (rather than a subtype). Why does everyone stick with a concrete representation of textual information? Let's define text as a logical sequence of Unicode code points, regardless of external representation (i.e. encoding); let's define a new kind of "string" as a private type, provide useful 'syntax sugar' to use it in the 'usual' way, and let String/Wide_String/Wide_Wide_String die. I believe it is the true Ada way to separate the high-level concept from the low-level representation.
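A minimal sketch of what Vadim proposes, assuming Ada 2012: a private type that is logically a sequence of code points, with an encoding appearing only at the conversion boundary. The package Texts and its operations From_UTF_8, To_UTF_8 and Length are hypothetical; the internal Unbounded_Wide_Wide_String is one arbitrary choice of representation and could be replaced without touching clients.

```ada
with Ada.Strings.UTF_Encoding;
with Ada.Strings.Wide_Wide_Unbounded;

package Texts is
   --  Hypothetical private text type: a logical sequence of code points.
   type Text is private;

   --  Encodings appear only at the boundary.
   function From_UTF_8 (S : Ada.Strings.UTF_Encoding.UTF_8_String) return Text;
   function To_UTF_8 (T : Text) return Ada.Strings.UTF_Encoding.UTF_8_String;

   --  Length counts code points, not storage units of any encoding.
   function Length (T : Text) return Natural;

private
   type Text is record
      Code_Points : Ada.Strings.Wide_Wide_Unbounded.Unbounded_Wide_Wide_String;
   end record;
end Texts;

with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

package body Texts is
   package WWU renames Ada.Strings.Wide_Wide_Unbounded;
   package Conv renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   function From_UTF_8 (S : Ada.Strings.UTF_Encoding.UTF_8_String) return Text is
   begin
      return (Code_Points => WWU.To_Unbounded_Wide_Wide_String (Conv.Decode (S)));
   end From_UTF_8;

   function To_UTF_8 (T : Text) return Ada.Strings.UTF_Encoding.UTF_8_String is
   begin
      return Conv.Encode (WWU.To_Wide_Wide_String (T.Code_Points));
   end To_UTF_8;

   function Length (T : Text) return Natural is
   begin
      return WWU.Length (T.Code_Points);
   end Length;
end Texts;
```

Because Length counts code points, the five-byte UTF-8 form of "café" yields a Length of 4. Whether the "syntax sugar" Vadim mentions (indexing, literals) can be added is a separate question this sketch does not address.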
* Re: Why no Ada.Wide_Directories? 2011-10-21 10:07 ` Vadim Godunko @ 2011-10-21 11:25 ` J-P. Rosen 2011-10-21 12:25 ` Yannick Duchêne (Hibou57) ` (2 more replies) 0 siblings, 3 replies; 100+ messages in thread From: J-P. Rosen @ 2011-10-21 11:25 UTC On 21/10/2011 12:07, Vadim Godunko wrote: > Why does everyone stick with a concrete representation of textual > information? Let's define text as a logical sequence of Unicode code > points, regardless of external representation (i.e. encoding); let's > define a new kind of "string" as a private type, provide useful 'syntax > sugar' to use it in the 'usual' way, and let String/Wide_String/ > Wide_Wide_String die. I believe it is the true Ada way to separate the > high-level concept from the low-level representation. But that is exactly what Wide_Wide_String is! So you are proposing to drop Wide_Wide_String on the grounds that it is visibly an array, and then provide a private type with a lot of (costly) machinery to allow it to be manipulated just as if it were an array? Come on! That's ultra-purism that brings zero improvement in practice.
* Re: Why no Ada.Wide_Directories? 2011-10-21 11:25 ` J-P. Rosen @ 2011-10-21 12:25 ` Yannick Duchêne (Hibou57) 2011-10-21 13:13 ` Dmitry A. Kazakov 2011-10-21 18:55 ` Vadim Godunko 2 siblings, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-21 12:25 UTC On Fri, 21 Oct 2011 13:25:39 +0200, J-P. Rosen <rosen@adalog.fr> wrote: > On 21/10/2011 12:07, Vadim Godunko wrote: > >> Why does everyone stick with a concrete representation of textual >> information? Let's define text as a logical sequence of Unicode code >> points, regardless of external representation (i.e. encoding); let's >> define a new kind of "string" as a private type, provide useful 'syntax >> sugar' to use it in the 'usual' way, and let String/Wide_String/ >> Wide_Wide_String die. I believe it is the true Ada way to separate the >> high-level concept from the low-level representation. > But that is exactly what Wide_Wide_String is! > > So you are proposing to drop Wide_Wide_String on the grounds that it is > visibly an array, and then provide a private type with a lot of (costly) > machinery to allow it to be manipulated just as if it were an array? > > Come on! That's ultra-purism that brings zero improvement in practice. I have to agree with that pragmatic point of view. We should stick to it. Come on, boys and girls: if something is not good for you, design your own stuff. Wide_Wide_String just has the same status as Text_IO: not meant to be universally suited to everything, but meant to be a basic implementation sufficient to quickly sketch an application (either for pedagogical purposes or quick model delivery). Specific needs require specific designs, and it's up to you to do the artwork ;) Ada will never provide everything for every purpose. P.S. I still see a problem with file names, by the way. That should still be fixed, because it does not even meet the basic expectations above.
* Re: Why no Ada.Wide_Directories? 2011-10-21 11:25 ` J-P. Rosen 2011-10-21 12:25 ` Yannick Duchêne (Hibou57) @ 2011-10-21 13:13 ` Dmitry A. Kazakov 2011-10-21 16:03 ` Yannick Duchêne (Hibou57) 2011-10-21 18:55 ` Vadim Godunko 2 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-21 13:13 UTC On Fri, 21 Oct 2011 13:25:39 +0200, J-P. Rosen wrote: > But that is exactly what Wide_Wide_String is! Not really. Wide_Wide_String is one possible implementation of logical Unicode string. There can be other implementations, e.g. String, Wide_String, UTF8_String, UTF16_String, EBCDIC_String, ASCII_String... All these implementations must be interchangeable and implement the same logical string interface. The same applies to unbounded and fixed-length strings. > Come on! That's ultra-purism that brings zero improvement in practice. On the contrary: 1. It would reduce the number of packages by a factor of 10; 2. It would statically ensure that the encoding is handled correctly. (I would bet that almost any Ada program is broken in that regard); 3. It would free the programmer from the burden of premature optimization; 4. It would make the design of Ada bindings much simpler and safer. E.g. C_String could be an implementation of logical Unicode string compatible with null-terminated C strings.
* Re: Why no Ada.Wide_Directories? 2011-10-21 13:13 ` Dmitry A. Kazakov @ 2011-10-21 16:03 ` Yannick Duchêne (Hibou57) 2011-10-21 18:34 ` Dmitry A. Kazakov 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-21 16:03 UTC On Fri, 21 Oct 2011 15:13:54 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > Not really. Wide_Wide_String is one possible implementation of logical > Unicode string. And precisely, that implementation is sufficient (*). You can't expect Ada to provide an implementation so abstract that it covers all possible implementations. By the way, nothing prevents a compiler from using something other than an array of 32-bit items to implement an array of Wide_Wide_Character. As long as the interface is preserved, it would be legal for a compiler to use any implementation it likes to provide a Wide_Wide_String. As the purpose of Ada is to be a programming language, it would be more relevant to focus on whether it is possible to design an implementation in Ada, rather than on whether a given implementation is embedded in the language. It's not a set of libraries, it's a programming language (a common pitfall, I feel, is when people start confusing the libraries provided with languages and the languages on their own). (*) And that implementation is a clean view, unlike the one of String holding UTF-8 data.
* Re: Why no Ada.Wide_Directories? 2011-10-21 16:03 ` Yannick Duchêne (Hibou57) @ 2011-10-21 18:34 ` Dmitry A. Kazakov 2011-10-21 19:30 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-21 18:34 UTC On Fri, 21 Oct 2011 18:03:03 +0200, Yannick Duchêne (Hibou57) wrote: > On Fri, 21 Oct 2011 15:13:54 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: >> Not really. Wide_Wide_String is one possible implementation of logical >> Unicode string. > And precisely, that implementation is sufficient (*). Nope. Under Windows I rather need UTF-16 and ASCII. Under Linux it would be UTF-8 and RADIX-50 for RSX-11. > You can't expect Ada > to provide an implementation so abstract that it covers all > possible implementations. Why not? Why should not a language provide abstractions for character encoding? > (*) And that implementation is a clean view, unlike the one of String > holding UTF-8 data. You are confusing interface and implementation. This is one of Ada's problems: they are not clearly separated. Ada 83 pioneered the idea of such separation for user-defined private types, but did not consistently support it for other types, especially for arrays and records.
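To make the point about platform encodings concrete: the same sequence of code points occupies a different number of storage units in UTF-8 (typical on Linux) and in UTF-16 (typical on Windows). This sketch assumes Ada 2012's Ada.Strings.UTF_Encoding.Wide_Wide_Strings; the package Encoded_Sizes and its two functions are hypothetical.

```ada
with Ada.Strings.UTF_Encoding;

package Encoded_Sizes is
   use Ada.Strings.UTF_Encoding;

   --  Number of 8-bit units needed to hold Item in UTF-8.
   function UTF_8_Units (Item : Wide_Wide_String) return Natural;

   --  Number of 16-bit units needed to hold Item in UTF-16.
   function UTF_16_Units (Item : Wide_Wide_String) return Natural;
end Encoded_Sizes;

with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

package body Encoded_Sizes is
   package Conv renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   function UTF_8_Units (Item : Wide_Wide_String) return Natural is
      --  The expected type selects the Encode overload returning UTF-8.
      Encoded : constant UTF_8_String := Conv.Encode (Item);
   begin
      return Encoded'Length;
   end UTF_8_Units;

   function UTF_16_Units (Item : Wide_Wide_String) return Natural is
      --  The expected type selects the Encode overload returning UTF-16.
      Encoded : constant UTF_16_Wide_String := Conv.Encode (Item);
   begin
      return Encoded'Length;
   end UTF_16_Units;
end Encoded_Sizes;
```

For "A", U+00E9 and U+1F600 (three code points), the UTF-8 form takes 7 bytes and the UTF-16 form 4 units, the last code point needing a surrogate pair. The logical content is identical; only the representation differs.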
* Re: Why no Ada.Wide_Directories? 2011-10-21 18:34 ` Dmitry A. Kazakov @ 2011-10-21 19:30 ` Yannick Duchêne (Hibou57) 2011-10-21 20:02 ` Dmitry A. Kazakov 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-21 19:30 UTC On Fri, 21 Oct 2011 20:34:55 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: >>> Not really. Wide_Wide_String is one possible implementation of logical >>> Unicode string. >> And precisely, that implementation is sufficient (*). > > Nope. Under Windows I rather need UTF-16 and ASCII. Under Linux it would > be > UTF-8 and RADIX-50 for RSX-11. This is implementation. The model on either a UTF-8 or a UTF-16 system would still be that of Wide_Wide_Character. Linux may be UTF-8 internally, but I use Unicode in Linux, not UTF-8. Windows may be UTF-16 internally, but I use Unicode in Windows, not UTF-16. From within neither does one access UTF-8 or UTF-16 low-level storage units; instead, one accesses Unicode characters at a high level. Whether a given compiler implements Wide_Wide_Character using one encoding or another is another story. The error of using String in some areas of the standard packages does not invalidate Wide_Wide_String. >> You can't expect Ada >> to provide an implementation so abstract that it covers all >> possible implementations. > > Why not? Why should not a language provide abstractions for character > encoding? A language is not a library; most importantly, it provides elementary semantics with which you design more complex things, plus more or less optional built-in models (which you can drop if you wish) for the most important things, or things identified as such (which is a subjective topic; you can only expect an average opinion), not for everything in the world. Providing a model for Unicode is reasonably enough. >> (*) And that implementation is a clean view, unlike the one of String >> holding UTF-8 data.
> > You are confusing interface and implementation. This is one of Ada's > problems: they are not clearly separated. Ada 83 pioneered the idea of > such separation for user-defined private types, but did not consistently > support it for other types, especially for arrays and records. Arrays and records are typically not meant to be publicly exposed. Most of the time, when you define a record type, the record view appears only in the private part of the package; the same goes for arrays. The Ada standard library doesn't expose records (or else I can't recall one), but it does expose some arrays, which should be hidden in a clean design. However, this may be justified as a naive but still valid implementation, as well as a simple and efficient enough one. Arrays have an interface, even if it cannot be tweaked from the programmer's point of view. Arrays and records are basic bricks for implementing types, not the core of the type models. This does not preclude purely abstract data types. There are imperfect things in the library, but as long as Ada as a language allows you to define what you need, opinions should be measured.
* Re: Why no Ada.Wide_Directories? 2011-10-21 19:30 ` Yannick Duchêne (Hibou57) @ 2011-10-21 20:02 ` Dmitry A. Kazakov 2011-10-21 20:36 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-21 20:02 UTC (permalink / raw) On Fri, 21 Oct 2011 21:30:45 +0200, Yannick Duch�ne (Hibou57) wrote: > Le Fri, 21 Oct 2011 20:34:55 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> a �crit: >>>> Not really. Wide_Wide_String is one possible implementation of logical >>>> Unicode string. >>> And precisely, that implementation is sufficient (*). >> >> Nope. Under Windows I rather need UTF-16 and ASCII. Under Linux it would >> be UTF-8 and RADIX-50 for RSX-11. > This is implementation. You wrote about the implementation being sufficient, which is evidently wrong. The interface = an array of code points indexed by some cardinal number is sufficient. The implementation = Wide_Wide_String is not. Ada does not allow you multiple implementations for this interface forming one class of types. Ada does not allow you constrained subtypes of the interface, e.g. narrower sets of code points (String), narrower ranges of the index (small embedded targets). Ada does not allow you alternative implementations like unbounded strings in the same class. This problem is a *fundamental* problem of the Ada type system. It must be addressed if the Ada wishes to stay a strongly typed language. >>> You can't expect Ada >>> will provide a so much abstract implementation that it will cover all >>> possible implementations. >> >> Why not? Why should not a language provide abstractions for character >> encoding? 
> A language is not a library, it provides most importantly, elementary > semantic with which you design more complex things, more or less > optionally built-ins models (which you can drop if you wish) for most > important things or things identified as such (which is a subjective > topic, you can just expect an average opinion), not for everything in the > world. That is exactly what I want from Ada. >>> (*) And that implementation is a clean view, unlike the one of String >>> holding UTF-8 data. >> >> You are confusing interface and implementation. This is one of Ada's >> problems that they are not clearly separated. Ada 83 pioneered the idea of >> such separation for user-defined private types, but was not consequent to >> support it for other types, especially, for arrays and records. > Array and records are typically not to be publicly exposed. How so? They are two most used public interfaces of composite types in Ada! BTW the same applies to the numeric types, would you claim them used only privately too?
* Re: Why no Ada.Wide_Directories? 2011-10-21 20:02 ` Dmitry A. Kazakov @ 2011-10-21 20:36 ` Yannick Duchêne (Hibou57) 2011-10-22 7:54 ` Dmitry A. Kazakov 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-21 20:36 UTC On Fri, 21 Oct 2011 22:02:55 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > You wrote about the implementation being sufficient, which is evidently > wrong. > > The interface = an array of code points indexed by some cardinal number > is > sufficient. The implementation = Wide_Wide_String is not. That's what I meant: an array interface with its most naive implementation. The interface is good and is on the Ada side; the implementation may vary and is on the compiler side. > Ada does not allow you multiple implementations for this interface > forming > one class of types. Ada does not allow you constrained subtypes of the > interface, e.g. narrower sets of code points (String), narrower ranges of > the index (small embedded targets). Ada does not allow you alternative > implementations like unbounded strings in the same class. It's not that easy. If you want to restrict the set of code points allowed in a container, you must take care to preserve class properties. A subtype T1 of a type T0, is supposed to be a valid element where a type T0 is expected. However, if the actual is of type T1, the expected type is T0, and the object is the target of some operation, then, for example, appending a code point outside the restricted range, while valid with an actual of type T0, would be illegal with an actual of type T1. Conversely, as a source, a T1 will always be valid where a T0 is expected. Conclusion: the interface could not remain the same, and different interface, means different type. Unsolvable (a common pitfall known of ancient Eiffel users).
That's by the way one of the reason why assertions introduced with Ada 2012, are expected to be checked at run time: guaranteeing statically that they hold would be a real nightmare for the language maintainers (I keep in mind that you don't enjoy run-time checks, which is OK, if you accept all of the consequences). > This problem is a *fundamental* problem of the Ada type system. It must > be > addressed if the Ada wishes to stay a strongly typed language. If address nicely enough (except with some area like String and some part of access types) what it provide. Consistency in a narrow range is better than a wide range with inconsistencies, and to most people, a narrow range which can be reasonably implemented, is better than a perfect thing which cannot be implemented. That was one of the error Bertrand Meyer did, when he asserted language designers should not bother about whether of not a given language property is certain to be implementable. In real life, language designers have to care it is, and have to care it is reasonably. As I said in a prior message, such languages already exist, but as far as I know, all the ones I played with were either interpreted or inefficient (and I cannot remember their names, sorry), which is not OK for Ada (for me it's OK if Ada lacks some purity, as long as it is safe and efficient enough). After all, maybe what you need is not Ada! (That would be no shame.) > How so? They are two most used public interfaces of composite types in > Ada! The language does not enforce it, this only occur in the standard library, and you remain free to not follow this design and choose your own if you wish. Just like the naming convention: I don't like the one used in the standard packages, but that does not prevent me from using my own; the language does not enforce anything there. > BTW the same applies to the numeric types, would you claim them used only > privately too? Arguable in theory, not in practice.
If it ever is, just use a language better suited for your very specific area.
* Re: Why no Ada.Wide_Directories? 2011-10-21 20:36 ` Yannick Duchêne (Hibou57) @ 2011-10-22 7:54 ` Dmitry A. Kazakov 2011-10-22 20:28 ` Yannick Duchêne (Hibou57) 2011-10-22 22:23 ` Yannick Duchêne (Hibou57) 0 siblings, 2 replies; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-22 7:54 UTC On Fri, 21 Oct 2011 22:36:03 +0200, Yannick Duchêne (Hibou57) wrote: > On Fri, 21 Oct 2011 22:02:55 +0200, Dmitry A. Kazakov > <mailbox@dmitry-kazakov.de> wrote: > A > subtype T1 of a type T0, is supposed to be a valid element where a type T0 > is expected. This is handled in Ada by contracting Constraint_Error in the interfaces. > Conclusion: the interface could not remain the same, and different > interface, means different type. Here you confirm that LSP does work. But Ada does not base its type system on LSP. There cannot be any usable LSP-conform type system. > Unsolvable (a common pitfall known of ancient Eiffel users). They had better understand LSP and its implications. > That's by the way one of the reason why assertions > introduced with Ada 2012, Something non-substitutable remains non-substitutable regardless of any assertions. The solution is trivial and was already known in Ada 83: add exception propagation *to* the postcondition. >> This problem is a *fundamental* problem of the Ada type system. It must be >> addressed if the Ada wishes to stay a strongly typed language. > If address nicely enough (except with some area like String and some part > of access types) what it provide. Consistency in a narrow range is better > than a wide range with inconsistencies, and to most people, a narrow range > which can be reasonably implemented, is better than a perfect thing which > cannot be implemented. That was one of the error Bertrand Meyer did, when > he asserted language designers should not bother about whether of not a > given language property is certain to be implementable.
In real life, > language designers have to care it is, and have to care it is reasonably. Sorry, I don't understand the above, it reads like C advocacy, but I am not sure. What is your point? Strong typing is not necessary because inefficient? >> How so? They are two most used public interfaces of composite types in >> Ada! > The language does not enforce it, this only occur in the standard library, > and you remain free to not follow this design and choose your own if you > wish. Just like the naming convention, I am not forced to use strings either. After all, there exist successful languages without strings and arrays, e.g. C... >> BTW the same applies to the numeric types, would you claim them used only >> privately too? > Arguable in theory, not in practice. If it ever is, just use a language > better suited for your very specific area. To summarize your point: for practical reasons, Ada had better become C.
* Re: Why no Ada.Wide_Directories? 2011-10-22 7:54 ` Dmitry A. Kazakov @ 2011-10-22 20:28 ` Yannick Duchêne (Hibou57) 2011-10-22 22:23 ` Yannick Duchêne (Hibou57) 1 sibling, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-22 20:28 UTC On Sat, 22 Oct 2011 09:54:07 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > Sorry, I don't understand the above, it reads like C advocacy, but I am > not > sure. What is your point? Strong typing is not necessary because > inefficient? No, not efficiency through weakness (which works against efficiency anyway, as Python, JavaScript and others show well), but efficiency and safety through a “world” narrowed to what we are able to handle automatically, as SPARK does.
* Re: Why no Ada.Wide_Directories? 2011-10-22 7:54 ` Dmitry A. Kazakov 2011-10-22 20:28 ` Yannick Duchêne (Hibou57) @ 2011-10-22 22:23 ` Yannick Duchêne (Hibou57) 2011-10-23 7:53 ` Dmitry A. Kazakov 1 sibling, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-22 22:23 UTC On Sat, 22 Oct 2011 09:54:07 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: >> Conclusion: the interface could not remain the same, and different >> interface, means different type. > > Here you confirm that LSP does work. But Ada does not base its type > system > on LSP. There cannot be any usable LSP-conform type system. It actually does. But it never will with a broken design, obviously. With the above example of subtyping a container for a subtype of its element type, the trouble does not come from Ada, but from merging two interfaces: the input interface and the output interface. It's a rule of thumb for me to separate the two, because experience has taught me that sooner or later you run into trouble if you do not distinguish them (*). If you have two interfaces, one for input and one for output, there is no more trouble: you can subtype the input interface to follow an element subtype. If you don't want to separate them, then you just can't subtype this way. You have to make a choice, and that's not Ada's fault, that's the domain's “fault”. Although Jean-Pierre pointed out that, with Ada, some concerns take precedence over others in practice (which is true and fair to note), nothing in Ada prevents you from using the right design, even if that design does not match Ada's niches and typical use cases (you may just have to not tell anyone, cheese). (*) I sometimes like to redesign with a read-only abstract T1 and a concrete derived read/write T2. If you really believe Ada subtypes, *as Ada allows to use it*, does not conform to the substitution principle, can you provide an example ?
Personally, I don't see a problem if the language does not allow you to do something it would not be able to handle. You know… better not to run anything at all than to run something erroneous. Please Dmitry, could write down once a whole, all you comments about Ada. You could even give it a funny title, something like “Ada criticisms (and _proposals_)”, if you feel like it. At least this would help in following the story, because it's easy to forget what was already said and what was not, along with rationales and examples… I sometimes have a strange feeling of repeating myself when I reply to your complaints. This would also be an opportunity to formalize your comments better. I don't enjoy discussions that seem too much about taste when the subject is a rather formal thing (the language), and also something into which many people have invested a lot. Your paper could also be open for comments (just as Ada does with its definition). Better, longer and clearer clarifications in a single reference place would be nice.
* Re: Why no Ada.Wide_Directories? 2011-10-22 22:23 ` Yannick Duchêne (Hibou57) @ 2011-10-23 7:53 ` Dmitry A. Kazakov 2011-10-25 19:16 ` Randy Brukardt 0 siblings, 1 reply; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-23 7:53 UTC On Sun, 23 Oct 2011 00:23:14 +0200, Yannick Duchêne (Hibou57) wrote: > If you really believe Ada subtypes, *as Ada allows to use it*, does not > conform to the substitution principle, can you provide an example ? Specialization (an Ada subtype is a specialization) breaks LSP in out-operations (operations with out parameters and/or a result of the subtype). Generalization breaks in-operations. This does not mean that there is something wrong with specialization or generalization, only that subtyping cannot be based on LSP. That is the reason why programming languages use so-called "subclassing" instead, read: non-LSP subtyping. Ada 83 missed Newspeak and called subtyping "subtyping". > Please Dmitry, could write down once a whole, all you comments about Ada. Not necessary, you can skip my moans and get right to the response: "The change .......(fill as appropriate)........ could break existing Ada programs, which is unacceptable, unless the cases when it would make Ada look more like Java, LISP, Perl, ....(put a disgusting language here)....., but it does not." (:-))
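The in/out asymmetry Dmitry describes can be seen with plain Ada subtypes. What follows is a minimal sketch. Code_Point, Latin_1_Point, Is_ASCII and To_Latin_1 are hypothetical names chosen for illustration. Passing a value of the subtype *into* an operation of the wider type needs no check. Producing a value *of* the subtype is range-checked, and that check is where the "contracted" Constraint_Error mentioned earlier in the thread comes from.

```ada
--  Hypothetical types, for illustration only.
package Narrowing_Demo is
   type Code_Point is range 0 .. 16#10FFFF#;

   --  A specialization: same type, narrower range.
   subtype Latin_1_Point is Code_Point range 0 .. 16#FF#;

   --  In-operation: any Latin_1_Point is acceptable as a Code_Point,
   --  so passing one here cannot fail.
   function Is_ASCII (C : Code_Point) return Boolean is (C <= 16#7F#);

   --  Out-operation: the result is converted to Latin_1_Point on return,
   --  which is range-checked and raises Constraint_Error above 16#FF#.
   function To_Latin_1 (C : Code_Point) return Latin_1_Point is (C);
end Narrowing_Demo;
```

Calling To_Latin_1 (16#20AC#) (the euro sign) fails the range check on return and raises Constraint_Error. Is_ASCII, by contrast, accepts any Latin_1_Point without a check.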
* Re: Why no Ada.Wide_Directories? 2011-10-23 7:53 ` Dmitry A. Kazakov @ 2011-10-25 19:16 ` Randy Brukardt 0 siblings, 0 replies; 100+ messages in thread From: Randy Brukardt @ 2011-10-25 19:16 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:1l7zxjcrre04c.1taw8dtwqpkkh.dlg@40tude.net... ... > Not necessary, you can skip my moans and get right to the response: > > "The change .......(fill as appropriate)........ could break existing Ada > programs, which is unacceptable, unless the cases when it would make Ada > look more like Java, LISP, Perl, ....(put a disgusting language > here)....., > but it does not." > > (:-)) This is correct :-), with the exception of the "unless". All of the changes that make Ada look more like some "disgusting language" don't break any existing programs. We wouldn't have made the change otherwise. The few changes that could break existing programs are all about doing what we believe Ada was meant to do (such as properly composing "="); none of them have anything to do with looking like some other language. (I'm presuming that you are talking about things like prefix calls and conditional expressions here.) In addition, the new "indexing" sugar is intended to get us closer to your ideal of a fully abstract interface for arrays. It should make it possible to define a strongly typed Unicode_String that could have alternate implementations for different representations. (We don't yet have a good way to get literals for private types, a problem that we've never been able to solve although we haven't tried as hard as we should have.) Randy.
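Here is a rough sketch of the kind of type Randy describes, assuming Ada 2012. It declares a private Unicode_String whose representation is hidden (here UTF-32 held in a Wide_Wide_String, an arbitrary choice) but which still supports S (I) through the Constant_Indexing aspect. The package and its operations are hypothetical. As Randy notes, there is no literal syntax for such a type, so values are built with a conversion function.

```ada
with Ada.Characters.Conversions;

--  Hypothetical package: clients see an indexable sequence of code points,
--  not the array that happens to implement it.
package Unicode_Strings is
   type Code_Point is range 0 .. 16#10FFFF#;

   type Unicode_String (Length : Natural) is tagged private
     with Constant_Indexing => Element;

   --  Code point at position I (1-based); S (I) is sugar for this call.
   function Element (S : Unicode_String; I : Positive) return Code_Point;

   --  No literals for private types, so build values from Latin-1 text.
   function From_Latin_1 (S : String) return Unicode_String;

private
   --  One possible representation; it could be replaced without
   --  changing client code.
   type Unicode_String (Length : Natural) is tagged record
      Data : Wide_Wide_String (1 .. Length);
   end record;

   function Element (S : Unicode_String; I : Positive) return Code_Point is
     (Code_Point (Wide_Wide_Character'Pos (S.Data (I))));

   function From_Latin_1 (S : String) return Unicode_String is
     ((Length => S'Length,
       Data   => Ada.Characters.Conversions.To_Wide_Wide_String (S)));
end Unicode_Strings;
```

With S declared as a Unicode_String, the expression S (4) is a call of Element (S, 4). A UTF-8 or UTF-16 private part could be substituted, although Element would then need a scan or an index table rather than a direct array access.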
* Re: Why no Ada.Wide_Directories? 2011-10-21 11:25 ` J-P. Rosen 2011-10-21 12:25 ` Yannick Duchêne (Hibou57) 2011-10-21 13:13 ` Dmitry A. Kazakov @ 2011-10-21 18:55 ` Vadim Godunko 2011-10-21 19:18 ` J-P. Rosen 2011-10-21 19:41 ` Yannick Duchêne (Hibou57) 2 siblings, 2 replies; 100+ messages in thread From: Vadim Godunko @ 2011-10-21 18:55 UTC On Oct 21, 3:25 pm, "J-P. Rosen" <ro...@adalog.fr> wrote: > > But that is exactly what Wide_Wide_String is! > Wide_Wide_String is just another kind of representation - UCS-4/ UTF-32. > So you are proposing to drop Wide_Wide_String on the ground that it is > visibly an array, and then provide a private type with a lot of (costly) > machinery to allow it to be manipulated just as if it were an array? > All kinds of strings are still useful in my model (String for ISO-8859-1, Wide_String for UCS-2 and Wide_Wide_String for UCS-4), and they are required to represent string literals. The internal representation of data in such a private type can be optimized for use in a concrete domain, while the source code that uses it remains portable. Actually, near to nobody use Wide_Wide_String in real applications. Why? > Come on! That's ultra-purism that brings zero improvement in practice. > It's done already. ;-)
* Re: Why no Ada.Wide_Directories? 2011-10-21 18:55 ` Vadim Godunko @ 2011-10-21 19:18 ` J-P. Rosen 2011-10-21 19:41 ` Yannick Duchêne (Hibou57) 1 sibling, 0 replies; 100+ messages in thread From: J-P. Rosen @ 2011-10-21 19:18 UTC On 21/10/2011 20:55, Vadim Godunko wrote: > Actually, near to nobody use Wide_Wide_String in real applications. > Why? > Because there is close to zero need, especially considering the kind of domains where Ada is used. Wide_Wide_String was added only because it was a requirement from JTC1. And frankly, I prefer that implementers spend their precious time on improving the parts of the compiler that most users need, rather than on satisfying aesthetic views of abstract strings. -- --------------------------------------------------------- J-P. Rosen (rosen@adalog.fr) Adalog a déménagé / Adalog has moved: 2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
* Re: Why no Ada.Wide_Directories? 2011-10-21 18:55 ` Vadim Godunko 2011-10-21 19:18 ` J-P. Rosen @ 2011-10-21 19:41 ` Yannick Duchêne (Hibou57) 1 sibling, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-21 19:41 UTC On Fri, 21 Oct 2011 20:55:41 +0200, Vadim Godunko <vgodunko@gmail.com> wrote: > Actually, near to nobody use Wide_Wide_String in real applications. > Why? Lack of habit (*): too many people are used to US-ASCII, or Latin-1 at best, depending also on the application area. Also, as Jean-Pierre said, most Ada niches don't have to deal with it. Some other areas will have to, like UIs, web applications, authoring applications, … (*) That's not Ada-specific; the same goes for C/C++ and some other common languages, even Python. Most application designers only care about their own native language and don't bother about foreign languages… trouble will come later ;)
* Re: Why no Ada.Wide_Directories? 2011-10-18 15:02 ` Adam Beneschan 2011-10-18 15:16 ` Dmitry A. Kazakov @ 2011-10-18 22:54 ` ytomino 1 sibling, 0 replies; 100+ messages in thread From: ytomino @ 2011-10-18 22:54 UTC On Oct 19, 12:02 am, Adam Beneschan <a...@irvine.com> wrote: > I think we have a terminology problem. OK, sorry that I did not lay out my argument well. Let me confirm. > Latin-1 is a set of characters (a subset of the full Unicode character set). Yes. And it's also used as the name of an encoding (ISO 8859-1, as Yannick calls it). > So I get > confused when people talk about Latin-1 versus UTF-8 strings as if > they were mutually exclusive. They're not, the way I understand the > terms. You can have a string composed of Latin-1 characters that's > represented using UTF-8 encoding; and the bits in that string would be > different from a string of the same Latin-1 characters using the > "regular" encoding, if any character in the string is in the 16#80#.. > 16#FF# range. Yes. "Latin-1 as a character set" is not mutually exclusive with Unicode (UCS-2 or UCS-4). "Latin-1 as an encoding" is mutually exclusive with UTF-8. And what I (we?) talked about was "Latin-1 as an encoding". > On the other hand, I was confused by your statement > "Ada.Character.Handling.To_Upper breaks UTF-8". I don't even see a > way for this to make sense. Ada.Characters.Handling works on > character types, and a character type is an enumeration type; but a > UTF-8 "character" can't be an enumeration type at all, since it's a > variable-length sequence of 8-bit bytes. I'm not quite sure what you > meant here. Ada.Characters and Ada.Strings are defined to work on String values holding "Latin-1 as an encoding". Some subprograms in them (like To_Upper) will replace upper-half characters (16#80#..) with meaningless values if we invoke them on a String holding UTF-8.
(Equal_Case_Insensitive does not replace characters, but it returns a meaningless result if the parameters contain upper-half characters encoded as UTF-8.) Of course, Ada.Wide_Wide_Characters.Handling.To_Upper (UTF_Encoding.Wide_Wide_Strings.Decode (any UTF-8 encoded string)) works fine. > As to having utilities such as versions of Ada.Strings.Unbounded or > Ada.Strings.Fixed that work directly on UTF-8-encoded strings (and > versions of Ada.Characters that operate on single UTF-8-encoded > characters): it's certainly possible to write a package like that, and > anyone is free to do so, but I just don't think they'd be widely used > enough to add to the Standard. I could be wrong. I thought the standard library was going to separate UTF-8 from Latin-1 when I read about the UTF-8 mode of the Form parameter that Randy mentioned. Latin-1 is not something I usually work with, so I have wanted UTF-8 versions of Ada.Characters. Sorry that my personal wish got mixed in. But it is certain that the standard library lacks some things for handling non-ASCII file names. By the way... I will probably confuse you more :-) Did you know that a single code point is NOT necessarily a single letter on display? Unicode has "composed characters": there are cases where several code points represent a single real letter. (See http://www.unicode.org/reports/tr15/tr15-33.html) In addition, Unicode has "variation selectors", which decorate the preceding letter (and can be mixed with composed characters). (See http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) Therefore, handling Wide_Wide_String is in fact about as difficult as handling an encoded (UTF-8 or other format) string.
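ytomino's point about To_Upper can be checked with a small sketch, assuming Ada 2012. The package name UTF8_Case and both function names are hypothetical. Naive_Upper hands the UTF-8 octets straight to Ada.Characters.Handling.To_Upper, which treats each octet as a Latin-1 character. Unicode_Upper takes the decode, map and re-encode route described above.

```ada
with Ada.Characters.Handling;
with Ada.Wide_Wide_Characters.Handling;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical helper package contrasting two ways of upper-casing
--  UTF-8 text held in a String.
package UTF8_Case is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Wrong for UTF-8: each octet is treated as a Latin-1 character, so
   --  lead octets 16#E0# .. 16#F4# (lower-case letters in Latin-1) are
   --  remapped and multi-octet sequences are corrupted.
   function Naive_Upper (S : String) return String is
     (Ada.Characters.Handling.To_Upper (S));

   --  Decode to code points, map the case, then re-encode as UTF-8.
   function Unicode_Upper (S : String) return String is
     (WWS.Encode
        (Ada.Wide_Wide_Characters.Handling.To_Upper (WWS.Decode (S))));
end UTF8_Case;
```

Take "caf" followed by the octets 16#C3# 16#A9# (UTF-8 for "é"). Naive_Upper leaves the two "é" octets untouched, while Unicode_Upper yields 16#C3# 16#89# ("É"). Three-octet sequences fare worse: the euro sign (16#E2# 16#82# 16#AC#) comes back from Naive_Upper as 16#C2# 16#82# 16#AC#, which is no longer valid UTF-8.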
* Re: Why no Ada.Wide_Directories? 2011-10-18 1:10 ` Adam Beneschan 2011-10-18 2:32 ` ytomino @ 2011-10-18 3:15 ` Yannick Duchêne (Hibou57) 2011-10-18 7:55 ` Dmitry A. Kazakov 2 siblings, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 3:15 UTC On Tue, 18 Oct 2011 03:10:35 +0200, Adam Beneschan <adam@irvine.com> wrote: > That's why it doesn't > make sense to have string routines (like > Ada.Strings.Equal_Case_Insensitive or Ada.Character_Handling.To_Upper) > that work with UTF-8. That would make sense if String were an array container for _one_ Unicode character subset and nothing else. What a mess if someone passes a UTF-8 string to such a case-mapping function (*)… don't expect to decode it after that. (*) Except if the string is restricted to US-ASCII, in which case you will not get anything wrong, just a pure US-ASCII string, which is always valid UTF-8 by definition. Not the same story for ISO-8859-1 strings.
* Re: Why no Ada.Wide_Directories? 2011-10-18 1:10 ` Adam Beneschan 2011-10-18 2:32 ` ytomino 2011-10-18 3:15 ` Yannick Duchêne (Hibou57) @ 2011-10-18 7:55 ` Dmitry A. Kazakov 2011-10-18 9:41 ` Yannick Duchêne (Hibou57) ` (2 more replies) 2 siblings, 3 replies; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-18 7:55 UTC On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote: > I have a feeling you're fundamentally confused about what UTF-8 is, as > compared to "Latin-1". Latin-1 is a character mapping. It defines, > for all integers in the range 0..255, what character that integer > represents (e.g. 77 represents 'M', etc.). Unicode is a character > mapping that defines characters for a much larger integer range. No, Unicode is a standard describes character mappings. Both UTF-8 and Latin-1 are encodings. Latin-1 as an encoding has a property that there is 1-1 octet to code point correspondence, at the cost that some (most) of code points cannot be represented by the encoding. UTF-8 lacks this property, but is capable to represent all code points. > Because of this, it is not feasible to work with strings or characters > in UTF-8 encoding. Suppose you declare a string > > S : String (1 .. 100); > > but you want it to be a UTF-8 string. How would that work? If you > want to look at S(50), the computer would have to start at the > beginning of the string and figure out whether each character is > represented as 1 or 2 bytes. Nobody wants that. Nobody actually cares, because strings are not processed that way. String indices are obtained in the course of operations that keep them at the beginnings of properly encoded code points. It is a language problem to distinguish an index (of some index type) from a position (a cardinal number). Ada does this, BTW. When you write S(50), what is 50 here?
The 50th character (code point) counting from the beginning of the string, or the index 50 of a character whose position is unknown without looking into the string? Considering the declaration of String, it is not clear whether Positive is a position or a proper index. In the latter case, S(50) simply does not read as "50th character". Furthermore, it is not guaranteed that if 50 is a valid index, then 51 is valid too.
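Dmitry's index/position distinction, and Adam's S(50) question, can be made concrete with a sketch. It uses a hypothetical package UTF8_Walk, Ada 2012 expression functions, and assumes well-formed UTF-8. The sketch finds the N-th code point by stepping from one lead octet to the next. Turning a position into an index needs a scan from the start, as Adam says. An index obtained that way can then be used directly, as Dmitry says.

```ada
--  Hypothetical helpers; assumes well-formed UTF-8 and does no validation.
package UTF8_Walk is
   --  Number of octets in the sequence whose first octet is Lead.
   function Sequence_Length (Lead : Character) return Positive is
     (case Character'Pos (Lead) is
         when 16#00# .. 16#7F# => 1,
         when 16#C0# .. 16#DF# => 2,
         when 16#E0# .. 16#EF# => 3,
         when 16#F0# .. 16#F7# => 4,
         --  Continuation or invalid octet: step over it (a simplification).
         when others => 1);

   --  Number of code points in S from index From onward.
   function Count_From (S : String; From : Positive) return Natural is
     (if From > S'Last then 0
      else 1 + Count_From (S, From + Sequence_Length (S (From))));

   function Code_Point_Count (S : String) return Natural is
     (Count_From (S, S'First));

   --  Index of the N-th code point, scanning forward from index From.
   --  Assumes S has at least N code points from From onward.
   function Index_From
     (S : String; N : Positive; From : Positive) return Positive is
     (if N = 1 then From
      else Index_From (S, N - 1, From + Sequence_Length (S (From))));

   --  Index of the N-th code point of S: a linear scan, not S'First + N - 1.
   function Index_Of (S : String; N : Positive) return Positive is
     (Index_From (S, N, S'First));
end UTF8_Walk;
```

For "caf", then the two octets of "é", then "!", Code_Point_Count returns 5 and Index_Of (S, 5) returns 6. Index 5 holds the continuation octet of "é" and does not start any character. The recursion keeps the sketch short; a loop in a package body would be the usual choice for long strings.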
* Re: Why no Ada.Wide_Directories? 2011-10-18 7:55 ` Dmitry A. Kazakov @ 2011-10-18 9:41 ` Yannick Duchêne (Hibou57) 2011-10-18 10:25 ` J-P. Rosen 2011-10-18 15:34 ` Adam Beneschan 2 siblings, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 9:41 UTC On Tue, 18 Oct 2011 09:55:07 +0200, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > No, Unicode is a standard describes character mappings. Both UTF-8 and > Latin-1 are encodings. Latin-1 as an encoding has a property that there > is 1-1 octet to code point correspondence, at the cost that some (most) > of > code points cannot be represented by the encoding. UTF-8 lacks this > property, So as not to mislead people, don't forget that UTF-8 also has this property with regard to US-ASCII, which is also a Unicode subset.
* Re: Why no Ada.Wide_Directories? 2011-10-18 7:55 ` Dmitry A. Kazakov 2011-10-18 9:41 ` Yannick Duchêne (Hibou57) @ 2011-10-18 10:25 ` J-P. Rosen 2011-10-18 10:56 ` Yannick Duchêne (Hibou57) 2011-10-18 15:34 ` Adam Beneschan 2 siblings, 1 reply; 100+ messages in thread From: J-P. Rosen @ 2011-10-18 10:25 UTC On 18/10/2011 09:55, Dmitry A. Kazakov wrote: > No, Unicode is a standard describes character mappings. True > Both UTF-8 and > Latin-1 are encodings. Wrong. Latin-1 is the name of the lower left corner of the BMP (Basic Multilingual Plane, or Plane 0 of ISO-10646)
* Re: Why no Ada.Wide_Directories? 2011-10-18 10:25 ` J-P. Rosen @ 2011-10-18 10:56 ` Yannick Duchêne (Hibou57) 0 siblings, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 10:56 UTC On Tue, 18 Oct 2011 12:25:17 +0200, J-P. Rosen <rosen@adalog.fr> wrote: > On 18/10/2011 09:55, Dmitry A. Kazakov wrote: > >> No, Unicode is a standard describes character mappings. > True > >> Both UTF-8 and >> Latin-1 are encodings. > Wrong. Latin-1 is the name of the lower left corner of the BMP (Basic > Multilingual Plane, or Plane 0 of ISO-10646) May I add, to avoid confusion in readers' minds, that what I called ISO 8859-1 in this thread is the formal name of Latin-1. Both refer to the same thing; Latin-1 is a kind of friendly name for it.
* Re: Why no Ada.Wide_Directories? 2011-10-18 7:55 ` Dmitry A. Kazakov 2011-10-18 9:41 ` Yannick Duchêne (Hibou57) 2011-10-18 10:25 ` J-P. Rosen @ 2011-10-18 15:34 ` Adam Beneschan 2011-10-18 17:27 ` J-P. Rosen 2 siblings, 1 reply; 100+ messages in thread From: Adam Beneschan @ 2011-10-18 15:34 UTC On Oct 18, 12:55 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote: > On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote: > > I have a feeling you're fundamentally confused about what UTF-8 is, as > > compared to "Latin-1". Latin-1 is a character mapping. It defines, > > for all integers in the range 0..255, what character that integer > > represents (e.g. 77 represents 'M', etc.). Unicode is a character > > mapping that defines characters for a much larger integer range. > > No, Unicode is a standard describes character mappings. Both UTF-8 and > Latin-1 are encodings. Latin-1 as an encoding has a property that there is > 1-1 octet to code point correspondence, at the cost that some (most) of > code points cannot be represented by the encoding. UTF-8 lacks this > property, but is capable to represent all code points. Sigh... I guess you're right about the term "Latin-1". It appears to be *both* a character mapping *and* an encoding, based on a bit of Wikipedia research. The problem for me is this: what does that make Latin-2, Latin-3, KOI8-R, etc.? Those seem to describe the same encoding mechanism as Latin-1 (each code represented as one 8-bit byte), but with different meanings for the codes in the 16#A0#..16#FF# range. So the same encoding scheme seems to have multiple different names. That's very confusing to me.
I've tended to look at character-set issues as having two independent parts: part 1 is how do we define the correspondence between integers and the character symbols [or other "characters" with special meanings like control characters]; and part 2 is, once we have a sequence of integers that correspond to those characters, how do we represent that sequence in memory, in a file, when sending bits over a wire, etc. The two parts appear completely independent to me, which is why I get confused when a term like "Latin-1" is used that straddles both parts. (Unless we decree that Unicode is the only mapping in existence, and things like Latin-2 or KOI8-R are encodings in which bytes in the 16#A0#..16#FF# range represent integers which are totally different and which are defined by the Unicode standard?) I guess I'll have to learn what people mean by their terms. I had some misimpressions. And I think we could solve a lot by making String a more abstract type defined by its operations rather than by its representation (array of character). For a new language, as opposed to one in which we're trying to maintain backward compatibility with a language designed in the 1980s, that would be a great idea. (I *don't* think it was a good idea to define UTF8_String as a subtype of String, and to decide that a String could be used as a sequence of bytes that had no direct correspondence to any characters from a character set. That seems like a big compromise. On the other hand, doing it "right" would have been a lot of work which I wouldn't have had to do, most of it unpaid. So I'm hesitant to complain too much.) -- Adam
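Adam's two-part view (a character mapping, then a representation of the resulting integers) can be sketched directly. The sketch takes one code point, U+00E9, and encodes it two ways. The Latin-1 encoding gives it one octet equal to the code point, and UTF-8 gives it two octets. Two_Layers and its functions are hypothetical illustration names. The UTF-8 side uses Ada 2012's Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical package separating the two parts: a code point (part 1)
--  and the octets an encoding uses to represent it (part 2).
package Two_Layers is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Part 1: U+00E9 LATIN SMALL LETTER E WITH ACUTE, as a code point.
   E_Acute : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#E9#);

   --  Part 2, Latin-1 encoding: one octet whose value is the code point.
   --  Character'Val raises Constraint_Error for code points above 16#FF#.
   function Latin_1_Octets (C : Wide_Wide_Character) return String is
     ((1 => Character'Val (Wide_Wide_Character'Pos (C))));

   --  Part 2, UTF-8 encoding: one to four octets per code point.
   function UTF_8_Octets (C : Wide_Wide_Character) return String is
     (WWS.Encode (Wide_Wide_String'(1 => C)));
end Two_Layers;
```

Latin_1_Octets (E_Acute) is the single octet 16#E9#, while UTF_8_Octets (E_Acute) is 16#C3# 16#A9#. For a US-ASCII character such as 'A', both functions return the same single octet, which is Yannick's point about UTF-8 and US-ASCII.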
* Re: Why no Ada.Wide_Directories? 2011-10-18 15:34 ` Adam Beneschan @ 2011-10-18 17:27 ` J-P. Rosen 2011-10-18 18:33 ` Adam Beneschan 2011-10-18 19:54 ` Yannick Duchêne (Hibou57) 0 siblings, 2 replies; 100+ messages in thread From: J-P. Rosen @ 2011-10-18 17:27 UTC (permalink / raw) Le 18/10/2011 17:34, Adam Beneschan a écrit : > On Oct 18, 12:55 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> > wrote: >> On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote: >>> I have a feeling you're fundamentally confused about what UTF-8 is, as >>> compared to "Latin-1". Latin-1 is a character mapping. It defines, >>> for all integers in the range 0..255, what character that integer >>> represents (e.g. 77 represents 'M', etc.). Unicode is a character >>> mapping that defines characters for a much larger integer range. >> >> No, Unicode is a standard describes character mappings. Both UTF-8 and >> Latin-1 are encodings. Latin-1 as an encoding has a property that there is >> 1-1 octet to code point correspondence, at the cost that some (most) of >> code points cannot be represented by the encoding. UTF-8 lacks this >> property, but is capable to represent all code points. > > Sigh... I guess you're right about the term "Latin-1". It appears to > be *both* a character mapping *and* an encoding, based on a bit of > Wikipedia research. The problem for me is this: what does that make > Latin-2, Latin-3, KOI8-R, etc.? Those seem to describe the same > encoding mechanism as Latin-1 (each code represented as one 8-bit > byte), but with different meanings for the codes in the 16#A0#..16#FF# > range. So the same encoding scheme seems to have multiple different > names. That's very confusing to me. > Not 100% sure, but I think here is the picture. 1) Code points are always 31 bits (or maybe 30).
2) Below is the lower left corner of BMP (use fixed fonts!):

|
|____________________
|         |         |
| Latin 1 | Latin 2 |
|_________|_________|_______

The lower halves of Latin-1 and Latin-2 are identical, i.e. the same characters have two different code-points, differing by 256. When you use Latin-1 with 8 bit bytes, you can view this as an encoding with the 24 upper bits being 16#00_00_00#. When you use Latin-2 with 8 bit bytes, you can view this as an encoding with the 24 upper bits being 16#00_00_01#. So in a sense, Latin-1 and Latin-2 are both character sets, and when represented on only 8 bits, an encoding. Does this make sense? -- --------------------------------------------------------- J-P. Rosen (rosen@adalog.fr) Adalog a déménagé / Adalog has moved: 2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00 ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-18 17:27 ` J-P. Rosen @ 2011-10-18 18:33 ` Adam Beneschan 2011-10-18 19:54 ` Yannick Duchêne (Hibou57) 1 sibling, 0 replies; 100+ messages in thread From: Adam Beneschan @ 2011-10-18 18:33 UTC (permalink / raw) On Oct 18, 10:27 am, "J-P. Rosen" <ro...@adalog.fr> wrote: > Le 18/10/2011 17:34, Adam Beneschan a écrit : > > > > > On Oct 18, 12:55 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> > > wrote: > >> On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote: > >>> I have a feeling you're fundamentally confused about what UTF-8 is, as > >>> compared to "Latin-1". Latin-1 is a character mapping. It defines, > >>> for all integers in the range 0..255, what character that integer > >>> represents (e.g. 77 represents 'M', etc.). Unicode is a character > >>> mapping that defines characters for a much larger integer range. > > >> No, Unicode is a standard describes character mappings. Both UTF-8 and > >> Latin-1 are encodings. Latin-1 as an encoding has a property that there is > >> 1-1 octet to code point correspondence, at the cost that some (most) of > >> code points cannot be represented by the encoding. UTF-8 lacks this > >> property, but is capable to represent all code points. > > > Sigh... I guess you're right about the term "Latin-1". It appears to > > be *both* a character mapping *and* an encoding, based on a bit of > > Wikipedia research. The problem for me is this: what does that make > > Latin-2, Latin-3, KOI8-R, etc.? Those seem to describe the same > > encoding mechanism as Latin-1 (each code represented as one 8-bit > > byte), but with different meanings for the codes in the 16#A0#..16#FF# > > range. So the same encoding scheme seems to have multiple different > > names. That's very confusing to me. > > Not 100% sure, but I think here is the picture. > 1) Code points are always 31 bits (or maybe 30).
> 2) Below is the lower left corner of BMP (use fixed fonts!):
>
> |
> |____________________
> |         |         |
> | Latin 1 | Latin 2 |
> |_________|_________|_______
>
> The lower halves of Latin-1 and Latin-2 are identical, i.e. the same > characters have two different code-points, differing by 256. > > When you use Latin-1 with 8 bit bytes, you can view this as an encoding > with the 24 upper bits being 16#00_00_00#. When you use Latin-2 with 8 > bit bytes, you can view this as an encoding with the 24 upper bits being > 16#00_00_01#. > > So in a sense, Latin-1 and Latin-2 are both character sets, and when > represented on only 8 bits, an encoding. > > Does this make sense? No, I don't think so. In Latin-2 (ISO/IEC-8859-2), the code points 16#00#..16#A0# have the same meanings as in Latin-1 and Unicode. Past that, though, the correspondence is all over the place. Thus, 16#A1# in Latin-2 corresponds to 16#0104# in the Unicode BMP; 16#A2# -> 16#02D8#, 16#A3# -> 16#0141#, 16#A5# -> 16#013D#, etc. -- Adam ^ permalink raw reply [flat|nested] 100+ messages in thread
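Adam's counterexample to the constant-offset picture can be checked mechanically. This Python sketch uses the standard `iso8859_2` codec as the Latin-2 table; the helper name is hypothetical. It confirms that bytes 16#00#..16#A0# mean the same in Latin-1 and Latin-2. It also shows that the bytes Adam lists land at unrelated positions in the BMP rather than 256 apart:

```python
# Compare ISO/IEC 8859-1 (Latin-1) and ISO/IEC 8859-2 (Latin-2) byte by byte.
def latin2_to_unicode(byte: int) -> int:
    """Code point a single Latin-2 byte stands for (via Python's iso8859_2 codec)."""
    return ord(bytes([byte]).decode("iso8859_2"))

# In Latin-1 every byte value equals its code point, so this checks that
# bytes 16#00#..16#A0# mean the same thing in both character sets.
same_below = all(latin2_to_unicode(b) == b for b in range(0x00, 0xA1))

# Above that, Latin-2 scatters across the BMP; there is no fixed offset.
for b in (0xA1, 0xA2, 0xA3, 0xA5):
    print(f"16#{b:02X}# -> 16#{latin2_to_unicode(b):04X}#")
print("identical up to 16#A0#:", same_below)
```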
* Re: Why no Ada.Wide_Directories? 2011-10-18 17:27 ` J-P. Rosen 2011-10-18 18:33 ` Adam Beneschan @ 2011-10-18 19:54 ` Yannick Duchêne (Hibou57) 1 sibling, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 19:54 UTC (permalink / raw) Le Tue, 18 Oct 2011 19:27:37 +0200, J-P. Rosen <rosen@adalog.fr> a écrit: > 1) Code points are always 31 bits (or maybe 30). Less than that ;) The last valid code point is actually 16#10FFFF#, which is 21 bits wide. That is only the last valid code point, though: it is not even assigned to anything (it belongs to the private use area, plane 16). The last code point with assigned semantics but without a glyph is 16#E0FFF#, and the last assigned code point with both a glyph and semantics is 16#2FFFF#. Beside these details, the last code point will very, very probably never go beyond 16#10FFFF#, and if an application does not need to define private code points for internal use, then the last valid code point can be taken as 16#E0FFF#, which is 20 bits wide. Counted in bytes, this turns out to be 3 bytes in all cases, not 4. -- “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University] “Structured Programming supports the law of the excluded muddle.” [Idem] Java: Write once, Never revisit ^ permalink raw reply [flat|nested] 100+ messages in thread
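The widths Yannick quotes are easy to confirm, and the same check shows where the "3 bytes, not 4" figure applies. It holds for the bare integer, but UTF-8 needs 4 bytes for every code point above 16#FFFF#, and UTF-32 always uses 4. A Python sketch, with the values taken from the message above:

```python
# Bit width of each code point, and its size as a bare integer,
# in UTF-8 (variable length) and in UTF-32 (fixed length, no BOM).
for cp in (0xFFFF, 0x2FFFF, 0xE0FFF, 0x10FFFF):
    raw_bytes = (cp.bit_length() + 7) // 8
    utf8_bytes = len(chr(cp).encode("utf-8"))
    utf32_bytes = len(chr(cp).encode("utf-32-be"))
    print(f"16#{cp:X}#: {cp.bit_length()} bits, integer {raw_bytes} bytes, "
          f"UTF-8 {utf8_bytes} bytes, UTF-32 {utf32_bytes} bytes")
```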
* Re: Why no Ada.Wide_Directories? 2011-10-17 23:47 ` ytomino 2011-10-18 1:10 ` Adam Beneschan @ 2011-10-18 8:01 ` Dmitry A. Kazakov 1 sibling, 0 replies; 100+ messages in thread From: Dmitry A. Kazakov @ 2011-10-18 8:01 UTC (permalink / raw) On Mon, 17 Oct 2011 16:47:49 -0700 (PDT), ytomino wrote: > But other libraries in the standard are explicitly defined as Latin-1. > It's certain that Ada.Characters.Handling.To_Upper breaks UTF-8. > So we cannot use most subprograms in Ada.Characters and Ada.Strings > for handling file names. Right, it is a lot more than just Ada.Directories. I have implemented UTF-8 versions of Ada.Characters.Handling and Ada.Strings.Maps: sets and maps of characters, case conversions, character characterization, superscript and subscript integer I/O. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 100+ messages in thread
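To see why a Latin-1 case conversion breaks UTF-8, here is a Python sketch of byte-wise upper-casing. The function is hypothetical and only modelled on Ada.Characters.Handling.To_Upper. It assumes that the Latin-1 lower-case letters with upper-case forms are a..z and 16#E0#..16#FE#, except 16#F7#. The UTF-8 lead byte 16#E2# of the euro sign happens to be "a circumflex" in Latin-1, so it gets upper-cased to 16#C2# and the sequence stops being valid UTF-8:

```python
# Byte-wise upper-casing that treats every byte as a Latin-1 Character.
# Hypothetical model of Ada.Characters.Handling.To_Upper: only a..z and
# 16#E0#..16#FE# (except 16#F7#, the division sign) have upper-case forms.
def latin1_to_upper(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A or (0xE0 <= b <= 0xFE and b != 0xF7):
            out.append(b - 0x20)  # Latin-1 lower-case letter -> upper-case
        else:
            out.append(b)         # anything else is left untouched
    return bytes(out)

utf8_name = "\u20acuro".encode("utf-8")  # euro sign = 16#E2# 16#82# 16#AC#, then "uro"
upper = latin1_to_upper(utf8_name)        # 16#E2# (a circumflex) becomes 16#C2#

try:
    upper.decode("utf-8")
    still_valid = True
except UnicodeDecodeError as err:
    still_valid = False
    print("no longer valid UTF-8:", err.reason)
```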
* Re: Why no Ada.Wide_Directories? 2011-10-17 21:33 ` Randy Brukardt 2011-10-17 23:47 ` ytomino @ 2011-10-18 2:59 ` Yannick Duchêne (Hibou57) 2011-10-18 4:07 ` Michael Rohan ` (2 more replies) 1 sibling, 3 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 2:59 UTC (permalink / raw) Le Mon, 17 Oct 2011 23:33:28 +0200, Randy Brukardt <randy@rrsoftware.com> a écrit: > Say what? > > Ada.Strings.Encoding (new in Ada 2012) uses a subtype of String to store > UTF-8 encoded strings. *Please note that the following is just my personal opinion* (I just want to say what I feel; I don't intend to hurt anyone). Everyone knows it and has noticed it, yet this still confuses “bytes and characters” the way C did. Eiffel had an implementation of UTF-8 strings which was different from the default ASCII string, and you could not access bytes from it; there was proper encapsulation and type checking. I happened to use a similar abstraction in a tiny Ada application. Unless a BOM is required at the beginning of each UTF-8 string, and this BOM is required to always be checked (I will have to check the new RM, but I feel the answer is No), merging both types into a single one is not that clean. Even if the answer were Yes, this would only be a dynamic check, not a static check. I feel it is more an implementation trick (which was indeed intended by the design of UTF-8, targeting some hard-to-solve context) than a clean formalization. Try to iterate over an element of type String. What do you get if it is a proper ISO 8859-1 string? You get Characters. What do you get if it is UTF-8? You get garbage and “random who-knows-what-it-is”, … _and the type system does not catch it_ (*), although catching exactly that is one of its primary purposes. By the way, if ISO/ANSI strings and UTF-8 strings are the same, then what is Wide_Character? The Unicode Basic Multilingual Plane, UTF-16LE, UTF-16BE, or anyone's guess?
This will not diminish Ada's value in the eyes of most people (**), but I believe some people, besides me, have noticed the same thing. (*) The two types are not even structurally compatible. (**) That's a library design flaw, not a language flaw! The difference between the two is that if a library part is not tightly tied into the language definition, as IO attributes or finalization behaviors are, one always has the option to work around it with one's own library. But one still loses the benefit of a standard library. -- “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University] “Structured Programming supports the law of the excluded muddle.” [Idem] Java: Write once, Never revisit ^ permalink raw reply [flat|nested] 100+ messages in thread
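Yannick's point about iterating over a String that actually holds UTF-8 can be illustrated with a Python sketch. Here `bytes` and `str` stand in for the two views; this is an analogy, not Ada semantics. Indexing the encoded form yields one element per byte, so "ë" shows up as two unrelated Latin-1 characters:

```python
# The same name seen as UTF-8 bytes (what a String holding UTF-8 contains)
# and as code points (what a Wide_Wide_String would contain).
name = "No\u00ebl"                            # "Noël"
encoded = name.encode("utf-8")

as_latin1_chars = [chr(b) for b in encoded]   # one element per byte
as_code_points = list(name)                   # one element per character

print(len(as_latin1_chars), [f"16#{ord(c):02X}#" for c in as_latin1_chars])
print(len(as_code_points), [f"16#{ord(c):02X}#" for c in as_code_points])
```

Unlike a subtype of String, Python keeps `bytes` and `str` as distinct types, so mixing them (for example, concatenating one with the other) raises an error instead of silently producing garbage.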
* Re: Why no Ada.Wide_Directories? 2011-10-18 2:59 ` Yannick Duchêne (Hibou57) @ 2011-10-18 4:07 ` Michael Rohan 2011-10-18 4:54 ` ytomino 2011-10-18 10:10 ` J-P. Rosen 2 siblings, 0 replies; 100+ messages in thread From: Michael Rohan @ 2011-10-18 4:07 UTC (permalink / raw) Hi, Just to confirm ytomino's take on things: while I started this on Ada.Directories, I have fallen into the practice of simply applying From_UTF8 to anything coming from the environment (Ada.Command_Line, Ada.Environment_Variables, etc.) and To_UTF8 on the way out. This works on my Linux system (en-US, Latin-1, no surprise). Since I use Wide_String internally, Strings crossing the external/internal interface need to be converted somehow, and UTF8 is as reasonable an option as any. As to the use of the Form parameter, additional standardization might be needed. With GNAT, the Form (for Open at least) can be used to define the encoding of the file contents but not of the file name. Take care, Michael. ^ permalink raw reply [flat|nested] 100+ messages in thread
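Michael's boundary-conversion practice can be sketched in Python for brevity. The names `to_utf8`, `from_utf8` and `list_directory` are hypothetical and merely echo his To_UTF8/From_UTF8. Internally everything is a Unicode string. It is turned into UTF-8 bytes only at the OS call, and bytes coming back are decoded. The sketch assumes the file system stores names as UTF-8, which is the usual Linux convention but is not enforced by the kernel:

```python
import os
import tempfile

def to_utf8(name: str) -> bytes:
    """Internal Unicode name -> bytes passed to the OS (hypothetical helper)."""
    return name.encode("utf-8")

def from_utf8(raw: bytes) -> str:
    """Bytes from the OS -> internal Unicode name; raises UnicodeDecodeError if not UTF-8."""
    return raw.decode("utf-8")

def list_directory(path: str) -> list:
    """List a directory, converting at the boundary in both directions."""
    # os.listdir given a bytes path returns the entries as bytes, undecoded.
    return sorted(from_utf8(entry) for entry in os.listdir(to_utf8(path)))

# Usage: create a file with a non-ASCII name and list it back.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(to_utf8(tmp), to_utf8("No\u00ebl.txt")), "wb").close()
    listing = list_directory(tmp)
print(ascii(listing))
```

Python's own `os.fsencode`/`os.fsdecode` do a similar job, but they use the platform's file-system encoding rather than hard-wiring UTF-8.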
* Re: Why no Ada.Wide_Directories? 2011-10-18 2:59 ` Yannick Duchêne (Hibou57) 2011-10-18 4:07 ` Michael Rohan @ 2011-10-18 4:54 ` ytomino 2011-10-18 9:54 ` Yannick Duchêne (Hibou57) 2011-10-18 10:10 ` J-P. Rosen 2 siblings, 1 reply; 100+ messages in thread From: ytomino @ 2011-10-18 4:54 UTC (permalink / raw) Excuse my digression. On Oct 18, 11:59 am, Yannick Duchêne (Hibou57) <yannick_duch...@yahoo.fr> wrote: > Eiffel had an implementation of UTF-8 string, which was different > than the default ASCII string, and ***you could not access bytes from it*** (about ***) Really? I'm a novice at Eiffel, but I think being able to access the bytes of an encoded string is worthwhile, so I could not believe it at first. I googled it. If you are talking about UNICODE_STRING, it seems to be decoded from UTF-8 into an array of code points (like Wide_Wide_String). http://www.maths.tcd.ie/~odunlain/eiffel/html/base/UNICODE_STRING.html If you are talking about Eiffel.NET, it seems to have byte_count and byte_item. http://www.eiffelroom.org/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-18 4:54 ` ytomino @ 2011-10-18 9:54 ` Yannick Duchêne (Hibou57) 2011-10-18 10:52 ` ytomino 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 9:54 UTC (permalink / raw) Le Tue, 18 Oct 2011 06:54:22 +0200, ytomino <aghia05@gmail.com> a écrit: > Excuse my digression. > > On Oct 18, 11:59 am, Yannick Duchêne (Hibou57) > <yannick_duch...@yahoo.fr> wrote: >> Eiffel had an implementation of UTF-8 string, which was different >> than the default ASCII string, and ***you could not access bytes from >> it*** > > (about ***) > > Really? I'm novice about Eiffel, but I think that accessing bytes of > encoded string is worth. I can not believe it at once. So I did > google. > If you talk about UNICODE_STRING, it seems decoded from UTF-8 to code- > points array (like Wide_Wide_String). > http://www.maths.tcd.ie/~odunlain/eiffel/html/base/UNICODE_STRING.html > If you talk about Eiffel.NET, it seems having byte_count and > byte_item. > http://www.eiffelroom.org/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net “Really?” Yes, ytomino ;) You obviously need to initialize it in some way, but beyond initialization you can't access individual bytes, and all the indexes you pass to UNICODE_STRING methods are character indexes, never byte indexes. In the former link, just search for “-- Get ” to find the basic accessors: there are two, one as a method and one as an operator (as you will easily guess, the syntax is Ada-inspired), and both expect a character index, not a byte index. You obviously have initializers and converters to and from other encodings, and methods to check whether the actual content fits some restricted Unicode range to avoid runtime errors (defensive programming), but the main interface accesses characters, not bytes at all. It's been a long time since I wrote any Eiffel.
Thanks for the above links, it was a pleasure to see them :) (a few days ago, someone else posted some stuff with Eiffel in it too) -- “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University] “Structured Programming supports the law of the excluded muddle.” [Idem] Java: Write once, Never revisit ^ permalink raw reply [flat|nested] 100+ messages in thread
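The encapsulation Yannick describes keeps UTF-8 storage underneath but exposes an interface that only ever speaks in character indexes. A minimal Python sketch of that idea follows. The class `Utf8Text` is hypothetical and does not model any actual Eiffel library. It keeps the UTF-8 bytes private and finds the n-th character by skipping continuation bytes, whose bit pattern is 10xxxxxx:

```python
class Utf8Text:
    """Hypothetical text type: UTF-8 inside, character indexes outside."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")    # private byte storage

    @staticmethod
    def _is_continuation(b: int) -> bool:
        return b & 0xC0 == 0x80              # bit pattern 10xxxxxx

    def __len__(self) -> int:
        # Number of characters = number of bytes that start a sequence.
        return sum(1 for b in self._data if not self._is_continuation(b))

    def __getitem__(self, index: int) -> str:
        """Return the character at a 0-based character index (linear scan)."""
        count = -1
        for pos, b in enumerate(self._data):
            if not self._is_continuation(b):
                count += 1
                if count == index:
                    end = pos + 1
                    while end < len(self._data) and self._is_continuation(self._data[end]):
                        end += 1
                    return self._data[pos:end].decode("utf-8")
        raise IndexError(index)

word = Utf8Text("No\u00ebl")
print(len(word), ascii(word[2]))
```

The trade-off is that character indexing becomes a linear scan over the bytes instead of a constant-time array access.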
* Re: Why no Ada.Wide_Directories? 2011-10-18 9:54 ` Yannick Duchêne (Hibou57) @ 2011-10-18 10:52 ` ytomino 2011-10-18 11:02 ` Yannick Duchêne (Hibou57) 0 siblings, 1 reply; 100+ messages in thread From: ytomino @ 2011-10-18 10:52 UTC (permalink / raw) On Oct 18, 6:54 pm, Yannick Duchêne (Hibou57) <yannick_duch...@yahoo.fr> wrote: > > “Really?” Yes Ytomino ;) You obviously need to initialize it in some way, > but until initialized, you can't access individuals bytes, and all indexes > you pass to UNICODE_STRING methods, are character index, never byte index. > > In the former link, just to a search using “-- Get ”, to search for basic > accessors: you have two, one as a method, one as an operator (you will > easily guess, the syntax is Ada inspired), and both expect a character > index, not a byte index. You obviously have initializers and converters to > and from other encoding, and methods to check if the actual content match > some restricted Unicode range to avoid runtime error (defensive > programming), but the main interface, accesses characters, not bytes at > all. > > It's a long time I did not write any Eiffel stuff. Thanks for the above > link, was a pleasure to see :) (some days ago, some one else posted some > stuff with Eiffel inside too) > > -- > “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on > Programming — Alan J. — P. Yale University] > “Structured Programming supports the law of the excluded muddle.” [Idem] > Java: Write once, Never revisit OK, I've understood. But, UNICODE_STRING is usually not called "UTF-8 string". Because the content is decoded. UNICODE_STRING seems just array of UCS-32 code points to me. It's called "UTF-32 string" commonly. (It's same as that Wide_Wide_String is not called UTF-8 string.) ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-18 10:52 ` ytomino @ 2011-10-18 11:02 ` Yannick Duchêne (Hibou57) 2011-10-18 21:18 ` ytomino 0 siblings, 1 reply; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-18 11:02 UTC (permalink / raw) Le Tue, 18 Oct 2011 12:52:01 +0200, ytomino <aghia05@gmail.com> a écrit: > OK, I've understood. > But, UNICODE_STRING is usually not called "UTF-8 string". Because the > content is decoded. > UNICODE_STRING seems just array of UCS-32 code points to me. It's > called "UTF-32 string" commonly. > (It's same as that Wide_Wide_String is not called UTF-8 string.) If I remember correctly from the time I dug into the SmallEiffel compiler's sources (back in 1999 and 2000), this was implemented with UTF-8 for memory efficiency. Maybe its successor, SmartEiffel, which is less memory efficient, was different. As Dmitry underlined, the best way is indeed to see it as a sequence of code points, as you first said (although it maps directly to code points, UTF-32 still formally refers to an encoding, albeit a straight and direct one… but never mind, that's just a detail). -- “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University] “Structured Programming supports the law of the excluded muddle.” [Idem] Java: Write once, Never revisit ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-18 11:02 ` Yannick Duchêne (Hibou57) @ 2011-10-18 21:18 ` ytomino 0 siblings, 0 replies; 100+ messages in thread From: ytomino @ 2011-10-18 21:18 UTC (permalink / raw) On Oct 18, 8:02 pm, Yannick Duchêne (Hibou57) <yannick_duch...@yahoo.fr> wrote: > Le Tue, 18 Oct 2011 12:52:01 +0200, ytomino <aghi...@gmail.com> a écrit: > > OK, I've understood. > > But, UNICODE_STRING is usually not called "UTF-8 string". Because the > > content is decoded. > > UNICODE_STRING seems just array of UCS-32 code points to me. It's > > called "UTF-32 string" commonly. > > (It's same as that Wide_Wide_String is not called UTF-8 string.) > > If my mind is still right since the time I get into SmallEiffel compiler's > sources (back to 1999 and 2000), this was implemented with UTF-8 for > memory efficiency. May be its successor, SmartEiffel, less memory > efficient, was different. Has underlined by Dmitry, the best way is to see > it as a sequence of code points, as you first said, indeed (although > directly mappable to code points, UTF-32, still formally refers to an > encoding, although a straight and direct encoding… but don't mind, that's > just a detail). Fuckin' great! I downloaded SmartEiffel's UNICODE_STRING.e and searched it. It has two arrays of UTF-16 values. UTF-16 array *A* holds UCS-2 characters or the first halves of surrogate pairs. UTF-16 array *B* holds the second halves of surrogate pairs. *B* is never allocated unless it is needed to hold at least one surrogate pair. It is certainly memory efficient, and the computational complexity does not increase. (This string is not to my liking, but it is interesting!) ^ permalink raw reply [flat|nested] 100+ messages in thread
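The two-array layout ytomino found can be sketched as follows. This is a hypothetical Python reconstruction from his description, not SmartEiffel's actual code. `first` holds one 16-bit unit per character, either a BMP character or a high surrogate. `second` is allocated only when a supplementary character appears and holds the matching low surrogate at the same index. The n-th character can therefore still be found in constant time:

```python
class SplitUtf16:
    """Hypothetical sketch: one unit per character in `first`, low surrogates in `second`."""

    def __init__(self, text: str) -> None:
        # Assumes `text` contains no lone surrogate code points.
        self.first = []
        self.second = None                 # allocated lazily
        for i, ch in enumerate(text):
            cp = ord(ch)
            if cp < 0x10000:
                self.first.append(cp)      # BMP character: one unit suffices
            else:
                v = cp - 0x10000           # 20 bits, split 10/10
                self.first.append(0xD800 + (v >> 10))
                if self.second is None:
                    self.second = [0] * len(text)
                self.second[i] = 0xDC00 + (v & 0x3FF)

    def code_point(self, i: int) -> int:
        """Constant-time access to the i-th character (0-based)."""
        unit = self.first[i]
        if 0xD800 <= unit <= 0xDBFF:       # high surrogate: combine with `second`
            return 0x10000 + ((unit - 0xD800) << 10) + (self.second[i] - 0xDC00)
        return unit

plain = SplitUtf16("abc")
mixed = SplitUtf16("a\U0001F600")
print(plain.second, [hex(u) for u in mixed.first], mixed.second)
```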
* Re: Why no Ada.Wide_Directories? 2011-10-18 2:59 ` Yannick Duchêne (Hibou57) 2011-10-18 4:07 ` Michael Rohan 2011-10-18 4:54 ` ytomino @ 2011-10-18 10:10 ` J-P. Rosen 2011-10-22 6:32 ` Michael Rohan 2 siblings, 1 reply; 100+ messages in thread From: J-P. Rosen @ 2011-10-18 10:10 UTC (permalink / raw) Le 18/10/2011 04:59, Yannick Duchêne (Hibou57) a écrit : > I feel it is more an > implementation trick (which was indeed intended by the design of UTF-8 > targeting some hardly solvable context), than a clean formalization. It is not (says the one who wrote the AI). The issue of using String or a different type was carefully investigated, and String was chosen mainly on the ground of usability (f.e., when you read from a text file, you don't know if it is encoded or not until you have read the BOM - would you read the BOM into a String or an Encoded_String?) This appears in the !discussion section of AI AI05-0137-1/05 Note that the AI carefully talks about "characters whose position numbers correspond to the encoding". An encoding is a way to store a character string in a more compact manner, but it still represents a string of characters. Compare that with packed arrays - that does not change the high level nature of the array. -- --------------------------------------------------------- J-P. Rosen (rosen@adalog.fr) Adalog a déménagé / Adalog has moved: 2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00 ^ permalink raw reply [flat|nested] 100+ messages in thread
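Rosen notes that the encoding is only known after the first bytes have been read. A Python sketch that inspects a byte prefix for a byte order mark illustrates this. The function name and the Latin-1 fallback for BOM-less input are assumptions for illustration; a real reader might well default to UTF-8 instead. The BOM constants are the ones in Python's standard `codecs` module:

```python
import codecs

# Check longer marks first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def sniff_bom(data: bytes, default: str = "latin-1") -> tuple:
    """Return (encoding name, data with any BOM stripped)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data      # no BOM: fall back to the assumed default

encoding, payload = sniff_bom(b"\xef\xbb\xbfNo\xc3\xabl")
print(encoding, ascii(payload.decode(encoding)))
```

Until the prefix has been inspected, the input has to sit in some byte-oriented container. That is the practical situation behind Rosen's argument for reading into a String first.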
* Re: Why no Ada.Wide_Directories? 2011-10-18 10:10 ` J-P. Rosen @ 2011-10-22 6:32 ` Michael Rohan 2011-10-22 7:25 ` Yannick Duchêne (Hibou57) 2011-10-25 19:26 ` Randy Brukardt 0 siblings, 2 replies; 100+ messages in thread From: Michael Rohan @ 2011-10-22 6:32 UTC (permalink / raw) Hi, There seem to be two major issues being considered here: * The handling of "string" data within Ada applications, i.e., should String be opaque with, perhaps, class-type interfaces giving views into this data as Latin-1, UTF8, UCS-2, etc.? * The more immediate issue I raised initially: what to do when you have a Wide_String and want to use it as a file name. I'm currently just converting such names to UTF8, which works well on Linux but will probably cause issues on Windows if I were to use non-Latin-1 strings. While the first issue is relatively involved, the second could be handled by the run-time (with the possibility of exceptions if a name could not be mapped, but that would be up to the application to handle). My initial question suggested there should be Wide_* versions of the packages that interface with the OS (Directories, Command_Line, Environment_Variables, etc.). Having implemented wrappers for these, it seems to me that extending the existing packages with additional routines for Wide_String/Wide_Wide_String would be cleaner. Such an extension of the existing packages might be something to consider for the next revision (but maybe it is too late?). Take care, Michael. ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-22 6:32 ` Michael Rohan @ 2011-10-22 7:25 ` Yannick Duchêne (Hibou57) 2011-10-25 19:26 ` Randy Brukardt 1 sibling, 0 replies; 100+ messages in thread From: Yannick Duchêne (Hibou57) @ 2011-10-22 7:25 UTC (permalink / raw) Le Sat, 22 Oct 2011 08:32:44 +0200, Michael Rohan <michael@zanyblue.com> a écrit: > My initial question suggested there should be Wide_* versions of the > packages that interface with the OS (Directories, Command_Line, > Environment_Variables, etc). Having implemented wrappers for these it > seems to me extending the existing packages to have additional routines > for Wide_String/Wide_Wide_String would be cleaner. I vote for a single additional Wide_Wide_String version (the String version is required for compatibility, and a Wide_String version would be useless, since a Wide_Wide_String version can do everything a Wide_String version can, and more). -- “Syntactic sugar causes cancer of the semi-colons.” [Epigrams on Programming — Alan J. — P. Yale University] “Structured Programming supports the law of the excluded muddle.” [Idem] Java: Write once, Never revisit ^ permalink raw reply [flat|nested] 100+ messages in thread
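The argument that a Wide_Wide_String version subsumes a Wide_String one rests on characters outside the Basic Multilingual Plane. Ada's Wide_Character covers 16#0000#..16#FFFF#, so a character above 16#FFFF# cannot be a single Wide_Character, while a Wide_Wide_Character can hold any code point. A small Python sketch counts code points against UTF-16 code units; the file names are made up for illustration:

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit UTF-16 code units needed for `text`."""
    return len(text.encode("utf-16-le")) // 2

for name in ("report.txt", "r\u00e9sum\u00e9.txt", "\U0001F600.txt"):
    fits_bmp = all(ord(ch) <= 0xFFFF for ch in name)
    print(f"{ascii(name):22} {len(name)} code points, "
          f"{utf16_units(name)} UTF-16 units, all in BMP: {fits_bmp}")
```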
* Re: Why no Ada.Wide_Directories? 2011-10-22 6:32 ` Michael Rohan 2011-10-22 7:25 ` Yannick Duchêne (Hibou57) @ 2011-10-25 19:26 ` Randy Brukardt 1 sibling, 0 replies; 100+ messages in thread From: Randy Brukardt @ 2011-10-25 19:26 UTC (permalink / raw) "Michael Rohan" <michael@zanyblue.com> wrote in message news:20586225.484.1319265164765.JavaMail.geo-discussion-forums@prgt10... ... >This extension of the existing packages would be something that might be >possible to >consider for the next revision (but maybe too late?). Ada 2012 is essentially finished (it will be really finished as soon as I finish fixing the latest batch of editorial comments). So it is way too late for anything but the most trivial changes. At this point, all suggestions are going into Ada 2020 (the provisional name for the following revision - where we'll have perfect vision of what Ada should be ;-). Randy. ^ permalink raw reply [flat|nested] 100+ messages in thread
* Re: Why no Ada.Wide_Directories? 2011-10-14 6:58 Why no Ada.Wide_Directories? Michael Rohan 2011-10-14 7:39 ` Yannick Duchêne (Hibou57) 2011-10-15 1:06 ` ytomino @ 2011-10-27 17:40 ` anon 2 siblings, 0 replies; 100+ messages in thread From: anon @ 2011-10-27 17:40 UTC (permalink / raw) Here is a reason from a link at Unicode.org: http://www.cl.cam.ac.uk/~mgk25/unicode.html "...An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte. If we want to have a UCS-4 file, we have to insert three 0x00 bytes instead before every ASCII byte. Using UCS-2 (or UCS-4) under Unix would lead to very severe problems. Strings with these encodings can contain as parts of many wide characters bytes like "\0" or "/" which have a special meaning in filenames and other C library function parameters. In addition, the majority of UNIX tools expects ASCII files and cannot read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc." So Wide_Character could cause problems in other parts of the OS or in the Ada/C libraries. And Ada does have "Safety and Security" concerns, like paragraph 4 in Annex H: 4 Restricting language constructs whose usage might complicate the demonstration of program correctness Plus, the goal of "reliability, maintainability, and efficiency" could not be kept if Ada.Directories used Wide_Character, because storing Wide_Character, whether 16-bit or 32-bit, is not as efficient as 8-bit for filenames. Just think about the old simple 8.3 file names. In Wide_Character those would take at least 16 + 6 bytes (UCS-2) or even 32 + 12 bytes (UCS-4). That means searching and comparing names could take 2 to 4 times longer and need 2 to 4 times more storage for each name, which is less efficient.
A quick note on maintainability: consider how many systems will actually be using (16/32-bit) Unicode for their filenames. So, for reliability and efficiency, Wide_Character should be kept to the routines and data that require the additional storage to be accurate, not to file names, which already suffer because files are normally on slower access media. Adding more time defeats the purpose of a timely, reliable program. In <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>, Michael Rohan <michael.k.rohan@gmail.com> writes: >Hi, > >I've working a little on accessing files and directories using Ada.Directories and have been using a thin wrapper layer to convert from Wide_String to UTF8 and back. It does, however, seem strange there is no Wide_Directories version in the std library. Was there a technical reason it wasn't included? > >Take care, >Michael ^ permalink raw reply [flat|nested] 100+ messages in thread
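The FAQ's objection can be checked byte by byte. In this Python sketch, the helper name is hypothetical, and U+012F is chosen only because its low byte equals '/'. The UCS-2/UTF-16 and UCS-4/UTF-32 forms put NUL or '/' bytes into names that contain neither character. UTF-8 never produces bytes below 16#80# for non-ASCII characters. The size column also shows the 2x and 4x growth mentioned above:

```python
# Which encodings of a file name put a NUL (16#00#) or '/' (16#2F#) byte
# into the byte string handed to a Unix/C API?
NUL, SLASH = 0x00, 0x2F

def special_bytes(name: str, encoding: str) -> list:
    """Sorted list of the NUL/'/' byte values present in the encoded name."""
    return sorted({b for b in name.encode(encoding) if b in (NUL, SLASH)})

# U+012F (i with ogonek) is used only because its low byte is 16#2F#.
for name in ("README.TXT", "\u012f"):
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        size = len(name.encode(enc))
        print(f"{ascii(name):14} {enc:10} {size:2} bytes, "
              f"special bytes: {special_bytes(name, enc)}")
```

UTF-8 sidesteps both problems, which is the conclusion the quoted FAQ itself reaches for Unix file names.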
end of thread, other threads:[~2011-10-28 15:00 UTC | newest] Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-10-14 6:58 Why no Ada.Wide_Directories? Michael Rohan 2011-10-14 7:39 ` Yannick Duchêne (Hibou57) 2011-10-14 9:07 ` Dmitry A. Kazakov 2011-10-14 12:48 ` Yannick Duchêne (Hibou57) 2011-10-14 12:54 ` Yannick Duchêne (Hibou57) 2011-10-15 1:06 ` ytomino 2011-10-15 6:55 ` Vadim Godunko 2011-10-15 12:34 ` ytomino 2011-10-15 8:38 ` Dmitry A. Kazakov 2011-10-15 13:12 ` Peter C. Chapin 2011-10-15 13:22 ` Ludovic Brenta 2011-10-15 14:47 ` Dmitry A. Kazakov 2011-10-16 5:48 ` Yannick Duchêne (Hibou57) 2011-10-17 0:15 ` Peter C. Chapin 2011-10-17 3:23 ` Yannick Duchêne (Hibou57) 2011-10-17 7:12 ` Simon Wright 2011-10-17 7:59 ` Dmitry A. Kazakov 2011-10-18 10:55 ` Peter C. Chapin 2011-10-18 12:27 ` Dmitry A. Kazakov 2011-10-16 5:51 ` Yannick Duchêne (Hibou57) 2011-10-17 21:41 ` Randy Brukardt 2011-10-18 7:29 ` Dmitry A. Kazakov 2011-10-18 14:06 ` Pascal Obry 2011-10-18 14:08 ` Pascal Obry 2011-10-19 21:32 ` Randy Brukardt 2011-10-17 21:33 ` Randy Brukardt 2011-10-17 23:47 ` ytomino 2011-10-18 1:10 ` Adam Beneschan 2011-10-18 2:32 ` ytomino 2011-10-18 4:46 ` ytomino 2011-10-18 9:32 ` Yannick Duchêne (Hibou57) 2011-10-18 10:00 ` Dmitry A. Kazakov 2011-10-18 10:06 ` Yannick Duchêne (Hibou57) 2011-10-18 12:01 ` Dmitry A. Kazakov 2011-10-18 15:02 ` Adam Beneschan 2011-10-18 15:16 ` Dmitry A. Kazakov 2011-10-18 23:42 ` Adam Beneschan 2011-10-19 8:12 ` Dmitry A. Kazakov 2011-10-19 21:43 ` Randy Brukardt 2011-10-20 7:37 ` Dmitry A. Kazakov 2011-10-20 11:04 ` Yannick Duchêne (Hibou57) 2011-10-20 12:21 ` Dmitry A. Kazakov 2011-10-20 12:38 ` Yannick Duchêne (Hibou57) 2011-10-20 14:31 ` Dmitry A. Kazakov 2011-10-20 15:54 ` Yannick Duchêne (Hibou57) 2011-10-20 17:35 ` Dmitry A. Kazakov 2011-10-21 12:53 ` Yannick Duchêne (Hibou57) 2011-10-21 13:41 ` Dmitry A. 
Kazakov 2011-10-25 19:22 ` Randy Brukardt 2011-10-25 19:35 ` Dmitry A. Kazakov 2011-10-26 22:41 ` Randy Brukardt 2011-10-27 7:43 ` Dmitry A. Kazakov 2011-10-27 15:13 ` Yannick Duchêne (Hibou57) 2011-10-27 19:39 ` Robert A Duff 2011-10-27 21:09 ` Yannick Duchêne (Hibou57) 2011-10-28 7:50 ` Dmitry A. Kazakov 2011-10-28 8:45 ` Yannick Duchêne (Hibou57) 2011-10-28 14:59 ` Dmitry A. Kazakov 2011-10-20 17:40 ` J-P. Rosen 2011-10-20 18:43 ` Dmitry A. Kazakov 2011-10-21 10:07 ` Vadim Godunko 2011-10-21 11:25 ` J-P. Rosen 2011-10-21 12:25 ` Yannick Duchêne (Hibou57) 2011-10-21 13:13 ` Dmitry A. Kazakov 2011-10-21 16:03 ` Yannick Duchêne (Hibou57) 2011-10-21 18:34 ` Dmitry A. Kazakov 2011-10-21 19:30 ` Yannick Duchêne (Hibou57) 2011-10-21 20:02 ` Dmitry A. Kazakov 2011-10-21 20:36 ` Yannick Duchêne (Hibou57) 2011-10-22 7:54 ` Dmitry A. Kazakov 2011-10-22 20:28 ` Yannick Duchêne (Hibou57) 2011-10-22 22:23 ` Yannick Duchêne (Hibou57) 2011-10-23 7:53 ` Dmitry A. Kazakov 2011-10-25 19:16 ` Randy Brukardt 2011-10-21 18:55 ` Vadim Godunko 2011-10-21 19:18 ` J-P. Rosen 2011-10-21 19:41 ` Yannick Duchêne (Hibou57) 2011-10-18 22:54 ` ytomino 2011-10-18 3:15 ` Yannick Duchêne (Hibou57) 2011-10-18 7:55 ` Dmitry A. Kazakov 2011-10-18 9:41 ` Yannick Duchêne (Hibou57) 2011-10-18 10:25 ` J-P. Rosen 2011-10-18 10:56 ` Yannick Duchêne (Hibou57) 2011-10-18 15:34 ` Adam Beneschan 2011-10-18 17:27 ` J-P. Rosen 2011-10-18 18:33 ` Adam Beneschan 2011-10-18 19:54 ` Yannick Duchêne (Hibou57) 2011-10-18 8:01 ` Dmitry A. Kazakov 2011-10-18 2:59 ` Yannick Duchêne (Hibou57) 2011-10-18 4:07 ` Michael Rohan 2011-10-18 4:54 ` ytomino 2011-10-18 9:54 ` Yannick Duchêne (Hibou57) 2011-10-18 10:52 ` ytomino 2011-10-18 11:02 ` Yannick Duchêne (Hibou57) 2011-10-18 21:18 ` ytomino 2011-10-18 10:10 ` J-P. Rosen 2011-10-22 6:32 ` Michael Rohan 2011-10-22 7:25 ` Yannick Duchêne (Hibou57) 2011-10-25 19:26 ` Randy Brukardt 2011-10-27 17:40 ` anon