From: "Bobby D. Bryant"
Newsgroups: comp.lang.ada
Subject: Re: Config_Files proposal {long}
Date: Sat, 29 Jun 2002 05:03:17 -0600

On Tue, 18 Jun 2002 11:07:08 -0600, Stephen Leake wrote:

> I've posted another example spec and implementation, at
>
> http://users.erols.com/leakstan/Stephe/Ada/Config_Files/config_files.html

I'm sorry that I haven't had time to follow these discussions carefully, but I'd still like to offer some thoughts on the proposal. (Please pardon anything that has already been discussed.)

"11. Additional files may be opened for read-only simultaneously in one Config_File object, using an append-read-only operation. Keys are searched for first in the writeable file, then in the additional read-only files. Keys that are created or modified are written to the writeable file when flushed."

I'm not sure I understand that. If it's saying "allow a config file to chain in other config files", great.
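
If chaining is what requirement 11 means, the lookup rule might be sketched as below in present-day Ada 2012. This is only an illustration under stated assumptions: each parsed file is modelled as a plain key-to-value map, and Chain, Read, Write and Key_Not_Found are hypothetical names, not part of the proposed Config_Files package.

```ada
with Ada.Text_IO;
with Ada.Containers.Indefinite_Ordered_Maps;

procedure Chained_Lookup_Demo is

   --  Hypothetical stand-in for one parsed config file: key => value.
   package Key_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type => String, Element_Type => String);

   type File_Array is array (Positive range <>) of Key_Maps.Map;

   --  One writeable file plus any number of chained read-only files.
   type Chain (Read_Only_Count : Natural) is record
      Writeable : Key_Maps.Map;
      Read_Only : File_Array (1 .. Read_Only_Count);
   end record;

   Key_Not_Found : exception;

   --  Search the writeable file first, then each read-only file in the
   --  order it was chained in; the first match wins.
   function Read (C : Chain; Key : String) return String is
   begin
      if C.Writeable.Contains (Key) then
         return C.Writeable.Element (Key);
      end if;
      for File of C.Read_Only loop
         if File.Contains (Key) then
            return File.Element (Key);
         end if;
      end loop;
      raise Key_Not_Found with Key;
   end Read;

   --  Created or modified keys only ever go to the writeable file.
   procedure Write (C : in out Chain; Key, Value : String) is
   begin
      C.Writeable.Include (Key, Value);
   end Write;

   Config : Chain (Read_Only_Count => 2);
begin
   --  Two chained read-only files that both define "Violins".
   Config.Read_Only (1).Include ("Violins", "Amati");
   Config.Read_Only (2).Include ("Violins", "Guarneri");
   Config.Read_Only (2).Include ("An_Int", "2");

   Ada.Text_IO.Put_Line (Read (Config, "Violins"));  --  Amati (first file wins)
   Ada.Text_IO.Put_Line (Read (Config, "An_Int"));   --  2

   --  The new value shadows the read-only ones; those maps are untouched.
   Write (Config, "Violins", "Guadagnini");
   Ada.Text_IO.Put_Line (Read (Config, "Violins"));  --  Guadagnini
end Chained_Lookup_Demo;
```

Because Write touches only the writeable map, flushing would never need to rewrite a chained file.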

If it's saying something else, then please add config-file chaining to your list of requirements.

"22. Provide a way to read and write opaque binary values (ie bitmaps for icons)."

Do you really want binary data in-line with text data? I think a better solution would be for the config file to just give the filename for binary data, so that after fetching the filename from the config file the program would use special-purpose library functions to read/write the binary data. People are going to need to specify filenames in config files from time to time anyway, so what I'm suggesting won't require any new mechanisms, and it would let you completely off-load responsibility for arbitrary binary formats.

"23. Provide a way to write comments. Comments are associated with a key (possibly a hierarchy level), and are preserved thru open and flush. Comments are intended to guide manually editing the file."

I agree with this, but I want to point out that it is _extremely_ problematic. If you support comments, then when people hand-edit a config file they will occasionally (and rightly) add a comment explaining why a value is there. But if a program later changes that value, how is it to know when the comment has been vitiated?

Add: I would like to see direct, high-level support for loading/saving 2-D tables, with the tables laid out in 2-D fashion (with row and column headers) in the config file. For instance, suppose the program in question is a wargame. Most such games require a table of Unit_Type x Terrain_Type --> Movement_Cost (and many other 2-D tables as well), and it would make life very easy for the scenario designer if s/he could simply type the data into the config file _as_a_2-D_table_.

Also, it would be nice to support a certain type of dynamism in such tables. To continue the wargame example, it is becoming increasingly popular to design games that are just minimalist engines driven by externally specified data.
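
Loading the Unit_Type x Terrain_Type --> Movement_Cost table from such a 2-D layout, with a check of its dimensions and of the row and column headers, might look roughly like the following Ada 2012 sketch. The layout (column headers on the first line, blank-separated cells), the sample unit and terrain names, and the names Words, Parse and Table_Error are all assumptions for illustration. The types are fixed enumerations here; a data-driven engine would instead take the header values from elsewhere in the file.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Table_Demo is
   --  Hypothetical scenario types, fixed at compile time in this sketch.
   type Unit_Type is (Infantry, Cavalry, Artillery);
   type Terrain_Type is (Clear, Forest, Swamp);
   type Cost_Table is array (Unit_Type, Terrain_Type) of Natural;

   type Text_Lines is array (Positive range <>) of Unbounded_String;

   Table_Error : exception;

   --  Split a line into words separated by one or more spaces.
   function Words (Line : String) return Text_Lines is
      Result : Text_Lines (1 .. Line'Length);
      Count  : Natural := 0;
      I      : Positive := Line'First;
   begin
      while I <= Line'Last loop
         if Line (I) = ' ' then
            I := I + 1;
         else
            declare
               J : Positive := I;
            begin
               while J < Line'Last and then Line (J + 1) /= ' ' loop
                  J := J + 1;
               end loop;
               Count := Count + 1;
               Result (Count) := To_Unbounded_String (Line (I .. J));
               I := J + 1;
            end;
         end if;
      end loop;
      return Result (1 .. Count);
   end Words;

   --  Line 1 holds the column headers; each later line holds a row header
   --  followed by one cost per terrain type.
   function Parse (Lines : Text_Lines) return Cost_Table is
      N_Units   : constant Natural := Unit_Type'Pos (Unit_Type'Last) + 1;
      N_Terrain : constant Natural := Terrain_Type'Pos (Terrain_Type'Last) + 1;
      Result    : Cost_Table;
   begin
      --  Minimal check: the table has the right dimensions.
      if Lines'Length /= N_Units + 1 then
         raise Table_Error with "wrong number of rows";
      end if;
      declare
         Header : constant Text_Lines := Words (To_String (Lines (Lines'First)));
      begin
         if Header'Length /= N_Terrain then
            raise Table_Error with "wrong number of columns";
         end if;
         --  Stronger check: headers name the expected values, in order.
         --  ('Value itself raises Constraint_Error for an unknown name.)
         for T in Terrain_Type loop
            if Terrain_Type'Value (To_String (Header (1 + Terrain_Type'Pos (T)))) /= T then
               raise Table_Error with "bad column header";
            end if;
         end loop;
      end;
      for U in Unit_Type loop
         declare
            Row : constant Text_Lines :=
              Words (To_String (Lines (Lines'First + 1 + Unit_Type'Pos (U))));
         begin
            if Row'Length /= N_Terrain + 1 then
               raise Table_Error with "wrong number of cells";
            elsif Unit_Type'Value (To_String (Row (1))) /= U then
               raise Table_Error with "bad row header";
            end if;
            for T in Terrain_Type loop
               Result (U, T) := Natural'Value (To_String (Row (2 + Terrain_Type'Pos (T))));
            end loop;
         end;
      end loop;
      return Result;
   end Parse;

   Text : constant Text_Lines :=
     (To_Unbounded_String ("           Clear  Forest  Swamp"),
      To_Unbounded_String ("Infantry       1       2      3"),
      To_Unbounded_String ("Cavalry        1       3      4"),
      To_Unbounded_String ("Artillery      1       4      6"));

   Costs : constant Cost_Table := Parse (Text);
begin
   Ada.Text_IO.Put_Line
     ("Cavalry into swamp costs" & Natural'Image (Costs (Cavalry, Swamp)));
end Table_Demo;
```

Run as written, it prints "Cavalry into swamp costs 4". If two column headers are swapped (say, "Clear Swamp Forest"), every name is still valid, but Parse raises Table_Error because the order is wrong.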

In such a data-driven game, the scenario designer might wish to enumerate the unit and terrain types in the config file and then follow them with the table of movement costs. It would be a very useful feature, IMO, if the table-loading were able to do some sanity checking as it loaded the table. (Minimally, ensure that the table has the correct dimensions; optimally, have it actually verify that the row and column headers have the correct values; ideally, by direct reference to the values of the defining fields elsewhere in the config file.)

Higher-dimensional tables are sometimes needed as well, though seemingly less often. They are also somewhat difficult to represent in a text file. So I would suggest that higher-dimensional tables be built up by "stacking" lower-dimensional tables appearing iteratively in the config file. (The same logic could of course be applied to the construction of 2-D tables, but they are a frequent and natural data type for humans, and they _can_ be specified in tabular fashion in a text file, so I think it would make good sense to support them directly with a high-level API.)

Other food for thought: I think I mentioned a couple of months ago that I am using GUILE (Scheme) for my config files. I won't go so far as to recommend adopting that as a standard, but I would like to mention a couple of things about it to provoke thought.

First off, compare the Scheme-like syntax to XML. Here's the example from your page:

   124076833 2 3.14159E+00 he said "hi there & goodbye" Stradivarious

And here's how the same thing would look in one of my config files right now:

   (configuration
     (numeric
       (interfaces
         (C
           (An_Unsigned 124076833)
           (An_Int 2)
         )
       )
       (Float
         (A_Float 3.14159E+00)
       )
     )
     (strings
       (Quoted he said "hi there & goodbye")
       (Violins Stradivarious)
     )
   )

The latter is, IMO, *much* easier to read and comprehend. It's also (at a guess) about 40% smaller.
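
The parenthesized form is also easy to read without a Scheme run-time system. The following is a minimal Ada 2012 sketch, not GUILE and not a full Scheme reader: it assumes every list begins with a name, treats any run of non-blank, non-parenthesis characters as an atom (so quoted strings containing blanks are not handled), and stores the result in a standard Ada.Containers multiway tree. Parse_List, Next_Atom, Print and the other names are hypothetical.

```ada
with Ada.Text_IO;
with Ada.Containers.Indefinite_Multiway_Trees;

procedure Sexp_Demo is

   --  Each list "(name item ...)" becomes a node holding "name" whose
   --  children are its items; a bare atom becomes a leaf node.
   package Trees is new Ada.Containers.Indefinite_Multiway_Trees (String);
   use Trees;

   Parse_Error : exception;

   Source : constant String :=
     "(configuration (numeric (interfaces (C (An_Unsigned 124076833)" &
     " (An_Int 2))) (Float (A_Float 3.14159E+00))))";

   Pos    : Positive := Source'First;  --  next character to read
   Config : Tree;

   function Is_Blank (C : Character) return Boolean is
     (C = ' ' or else C = ASCII.HT or else C = ASCII.LF);

   procedure Skip_Blanks is
   begin
      while Pos <= Source'Last and then Is_Blank (Source (Pos)) loop
         Pos := Pos + 1;
      end loop;
   end Skip_Blanks;

   --  An atom runs up to the next blank or parenthesis.
   function Next_Atom return String is
      Start : constant Positive := Pos;
   begin
      while Pos <= Source'Last
        and then not Is_Blank (Source (Pos))
        and then Source (Pos) /= '('
        and then Source (Pos) /= ')'
      loop
         Pos := Pos + 1;
      end loop;
      return Source (Start .. Pos - 1);
   end Next_Atom;

   --  Parse the list that starts at Pos and append it under Parent.
   procedure Parse_List (Parent : Cursor) is
      Self : Cursor;
   begin
      Pos := Pos + 1;  --  skip '('
      Skip_Blanks;
      declare
         Name : constant String := Next_Atom;
      begin
         if Name'Length = 0 then
            raise Parse_Error with "list must start with a name";
         end if;
         Config.Append_Child (Parent, Name);
         Self := Last_Child (Parent);
      end;
      loop
         Skip_Blanks;
         if Pos > Source'Last then
            raise Parse_Error with "missing ')'";
         elsif Source (Pos) = ')' then
            Pos := Pos + 1;
            return;
         elsif Source (Pos) = '(' then
            Parse_List (Self);
         else
            Config.Append_Child (Self, Next_Atom);
         end if;
      end loop;
   end Parse_List;

   --  Print the tree with two spaces of indentation per level.
   procedure Print (Node : Cursor; Level : Natural) is
      Child : Cursor := First_Child (Node);
   begin
      Ada.Text_IO.Put_Line (String'(1 .. 2 * Level => ' ') & Element (Node));
      while Has_Element (Child) loop
         Print (Child, Level + 1);
         Child := Next_Sibling (Child);
      end loop;
   end Print;

begin
   Skip_Blanks;
   if Pos > Source'Last or else Source (Pos) /= '(' then
      raise Parse_Error with "expected '('";
   end if;
   Parse_List (Root (Config));
   Print (First_Child (Root (Config)), 0);
end Sexp_Demo;
```

Run as written, it prints the tree with one level of indentation per nesting level:

   configuration
     numeric
       interfaces
         C
           An_Unsigned
             124076833
           An_Int
             2
       Float
         A_Float
           3.14159E+00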

While I don't advocate smallness for smallness' sake (and I think that "bloat" is usually merely a slur that people invoke against software systems that they don't like but can't come up with any cogent criticism of), the lean syntax can be very important in config files because it promotes readability -- partly through reduced clutter, and partly because there will be fewer times when non-semantic line wraps are required.

[In my example, an actual Scheme representation for the "Quoted" field would need to be expressed somewhat differently; I gave instead what I think it would look like if you adopted the parentheses syntax for what you're trying to do. The only thing that would need special treatment would be parentheses -- in fact, only close-parentheses in data that did not also include matching open parentheses.]

The only disadvantage I can think of is that the lack of labeled end markers makes it hard to see where very long lists end. When this becomes a problem for me I merely address it with a cosmetic comment, thus:

   ...
   (Floats
     (F1 1.2)
     (F2 2.3)
     (F3 4.2)
     (F4 1.9)
     ...
     (F468 7.2)
     (Comment: End of "Floats" section.)
   )
   ...

As a side note, notice also that using Scheme for config files means that there is no formal distinction between config files and script files, since code and data are represented the same way in Scheme. However, this requires a run-time system such as GUILE to make it work, and is thus far outside the scope of what you are trying to do. OTOH, if you *are* thinking about support for scripting sometime in the future, give the mechanism some thought now. I find it extremely convenient to include certain kinds of code in my config files, thus:

   ...
   (Penalty_Function (log n))
   ...

Finally, if I may say so without stepping on any toes, I would like to call attention to the curiously "flat" examples that you have for the Java and XML sections on your Web page, and to their emphasis on data types rather than on semantics.

For comparison, here is what one of my real config files looks like (substantially reduced and re-commented for the purposes of this post):

   (configuration
     (Comment: The next three values are symbolic names that will be
       used as keys for lookups in 1-D tables further down.)
     (use-problem legion-i)
     (use-workers first-five)
     (use-solution-strategy fixed-size-250)
     (worker-plot-colors "RGB:FF/00/00" "RGB:00/BF/00" "RGB:00/00/FF")
     (problems
       (Comment: "problems" is a list of problem-specific data, to be
         selected by the "use-problem" key defined above. I.e., the
         program grabs "use-problem" and extracts its value, then grabs
         "problems" and uses the previously obtained value of
         "use-problem" to fetch the proper record out of "problems".
         Due to the nature of the application, the data varies from
         problem to problem, and the program must decide what fields to
         ask for based on the value of the key. I.e., if "use-problem"
         is "phalanx-1" it will not ask for "barbarian-rate". That is
         hard-coded behavior embedded in a case statement in the
         application program, and I don't see any other obvious way to
         do it.)
       (legion-i
         (Comment: "use-map" is a key that will be looked up in a table
           below.)
         (use-map 21x21+3cities)
         (number-of-legions 5)
         (barbarian-rate 1.0)
         (game-length 200)
         (movement-granularity 10.0)
         (games-per-generation 3)
         (population-size 500)
       )
       (phalanx-1
         (use-map 41x31+5cities)
         (units-per-side 24)
         (turns-per-game 30)
         (games-per-generation 5)
         (population-size 10)
       )
     )
     (maps
       (Comment: "maps" is another lookup table. The key is obtained from
         the problem definition, above.)
       (21x21+3cities
         (map-width 21)
         (map-height 21)
         (number-of-cities 3)
       )
       (41x31+5cities
         (map-width 41)
         (map-height 31)
         (number-of-cities 5)
       )
     )
     (worker-configurations
       (Comment: "worker-configurations" is another lookup table.)
       (first-five
         (Comment: Notice that "worker-random-seeds" is a list that my
           program reads into an array. Currently I do this by iterating
           within the program, but it would be nice if the config-file
           parser did that for me.)
         (worker-random-seeds 1 2 3 4 5)
         (minimum-workers-required-to-start 1)
       )
       (three-require-three
         (worker-random-seeds 1 2 3)
         (minimum-workers-required-to-start 3)
       )
     )
     (solution-strategies
       (quick-test
         (hof-size 20)
         (children-each 20)
         (min-initial-hidden-units 1)
         (epochs
           (epoch
             (generations 2)
             (weight-mutation-rate .01)
             (size-penalty none)
           )
         )
       )
       (fixed-size-250
         (hof-size 50)
         (children-each 10)
         (min-initial-hidden-units 2)
         (epochs
           (Comment: Very importantly, notice that this is a *list* of
             records of the same type. The config-file parser must not
             simply grab the first and ignore the rest; rather, it must
             give them to me sequentially on demand.)
           (epoch
             (generations 50)
             (weight-mutation-rate .01)
             (size-penalty none)
           )
           (epoch
             (generations 100)
             (weight-mutation-rate .05)
             (size-penalty none)
           )
           (epoch
             (generations 100)
             (weight-mutation-rate .10)
             (size-penalty none)
           )
         )
       )
     )
   )

In addition to the raw GUILE bindings I have a package of higher-level support routines that makes life easier for the programmer. Included are a "Lookup" function and a "Second" function, overloaded by return type, for extracting the values. For example, after appropriate variable declarations and with the config file already loaded but not processed, my code has things like this:

   Problem := Lookup("use-problem", Config);
   Map := Lookup("use-map", Problem);
   X := Second(Lookup("map-width", Map));

Please look at some of the things I'm doing in the example, and consider how you would be able to do them (and what they would look like) with the various proposed syntaxes.

FWIW, if I were doing it myself I would probably start with an X-like syntax and remove the need for "end" markers by making the indentation semantically significant.
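
The Lookup and Second routines mentioned above are not shown. A minimal Ada 2012 sketch of helpers in that spirit might look like this, over a tree with one node per "(name item ...)" list, built by hand here in place of a parser. Every name (Add, Set, Lookup, Second, Values, For_Each) and the tree representation are assumptions, not the author's package. The sketch also covers the two wishes in the config-file comments: reading a list such as worker-random-seeds into an array, and handing back every repeated epoch record in order rather than only the first.

```ada
with Ada.Text_IO;
with Ada.Containers.Indefinite_Multiway_Trees;

procedure Lookup_Demo is
   --  One node per "(name item ...)" list, as a parser would build it.
   package Trees is new Ada.Containers.Indefinite_Multiway_Trees (String);
   use Trees;
   type Integer_Array is array (Positive range <>) of Integer;

   Key_Error   : exception;
   Config      : Tree;
   Total       : Integer := 0;
   Generations : constant Integer_Array := (50, 100, 100);

   --  Hypothetical helpers that build the tree by hand in place of a parser.
   function Add (Parent : Cursor; Name : String) return Cursor is
   begin
      Config.Append_Child (Parent, Name);
      return Last_Child (Parent);
   end Add;
   procedure Set (Parent : Cursor; Key, Value : String) is
      Node : constant Cursor := Add (Parent, Key);
   begin
      Config.Append_Child (Node, Value);
   end Set;

   --  First child of Parent whose name is Key.
   function Lookup (Key : String; Parent : Cursor) return Cursor is
      Child : Cursor := First_Child (Parent);
   begin
      while Has_Element (Child) loop
         if Element (Child) = Key then
            return Child;
         end if;
         Child := Next_Sibling (Child);
      end loop;
      raise Key_Error with Key;
   end Lookup;

   --  The item after the name, overloaded by result type.
   function Second (Node : Cursor) return String is
     (Element (First_Child (Node)));
   function Second (Node : Cursor) return Integer is
     (Integer'Value (Element (First_Child (Node))));

   --  Every item after the name, e.g. (worker-random-seeds 1 2 3 4 5).
   function Values (Node : Cursor) return Integer_Array is
      Result : Integer_Array (1 .. Natural (Child_Count (Node)));
      Child  : Cursor := First_Child (Node);
   begin
      for I in Result'Range loop
         Result (I) := Integer'Value (Element (Child));
         Child := Next_Sibling (Child);
      end loop;
      return Result;
   end Values;

   --  Hand every child named Key to Process, in file order.
   procedure For_Each
     (Key     : String;
      Parent  : Cursor;
      Process : not null access procedure (Node : Cursor))
   is
      Child : Cursor := First_Child (Parent);
   begin
      while Has_Element (Child) loop
         if Element (Child) = Key then
            Process (Child);
         end if;
         Child := Next_Sibling (Child);
      end loop;
   end For_Each;

   procedure Add_Generations (Epoch : Cursor) is
   begin
      Total := Total + Integer'(Second (Lookup ("generations", Epoch)));
   end Add_Generations;

   Top, Seeds, Epochs : Cursor;
begin
   Top := Add (Root (Config), "configuration");
   Set (Top, "use-problem", "legion-i");
   Set (Add (Add (Top, "problems"), "legion-i"), "population-size", "500");
   Seeds := Add (Top, "worker-random-seeds");
   for S in 1 .. 5 loop
      Config.Append_Child (Seeds, Integer'Image (S));
   end loop;
   Epochs := Add (Top, "epochs");
   for G of Generations loop
      Set (Add (Epochs, "epoch"), "generations", Integer'Image (G));
   end loop;

   declare  --  two-step lookup: the value of one key selects a record
      Name    : constant String  := Second (Lookup ("use-problem", Top));
      Problem : constant Cursor  := Lookup (Name, Lookup ("problems", Top));
      Size    : constant Integer := Second (Lookup ("population-size", Problem));
   begin
      For_Each ("epoch", Lookup ("epochs", Top), Add_Generations'Access);
      Ada.Text_IO.Put_Line (Name & Integer'Image (Size) & Integer'Image (Total)
        & Integer'Image (Values (Lookup ("worker-random-seeds", Top))'Length));
   end;
end Lookup_Demo;
```

Run as written, it prints "legion-i 500 250 5": the selected problem, its population size, the generations summed over all three epoch records, and the number of seeds read into the array.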

I used GUILE rather than an indentation-based syntax because it saved me from writing my own parser (and because I may need the scripting capabilities later), but the GUILE/Scheme syntax, when pretty-printed, is very similar to X-like with semantic indentation. To repeat your XML example yet again, you would get something like:

   configuration
     numeric
       interfaces
         C
           An_Unsigned 124076833
           An_Int 2
       Float
         A_Float 3.14159E+00
     strings
       Quoted he said "hi there & goodbye"
       Violins Stradivarious

Most human-friendly of all, IMO, but possibly prone to error, and surely difficult for syntax-highlighting editors unless you choose to limit the field names to a predefined set.

Sorry for the length; hopefully someone will find an interesting thing or two somewhere in there.

Bobby Bryant
Austin, Texas