From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Bobby D. Bryant" <bdbryant@mail.utexas.edu>
Newsgroups: comp.lang.ada
Subject: Re: Config_Files proposal {long}
Date: Sat, 29 Jun 2002 05:03:17 -0600
Organization: dis-
Message-ID: <afk460$9f5$1@geraldo.cc.utexas.edu>
References: <uwuswy0qr.fsf@gsfc.nasa.gov>

On Tue, 18 Jun 2002 11:07:08 -0600, Stephen Leake wrote:

> I've posted another example spec and implementation, at
> 
> http://users.erols.com/leakstan/Stephe/Ada/Config_Files/config_files.html

I'm sorry that I haven't had time to follow these discussions
carefully, but I'd still like to offer some thoughts on it.  (Please
pardon anything that has already been discussed.)


"11. Additional files may be opened for read-only simultaneously in
one Config_File object, using an append-read-only operation. Keys
are searched for first in the writeable file, then in the additional
read-only files. Keys that are created or modified are written to
the writeable file when flushed."

I'm not sure I understand that.  If it's saying "allow a config file
to chain in other config files", great.  If it's saying something
else, then please add what I just said to your list of requirements.
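
A rough sketch of the "chaining" reading of requirement 11, in Python
rather than Ada purely for brevity.  The three dictionaries are made-up
stand-ins for parsed config files, and the standard library's
collections.ChainMap stands in for the Config_File object: it searches
its mappings in order and sends every write to the first one, which is
the lookup and flush rule quoted above.

```python
from collections import ChainMap

# Hypothetical contents of three config files, already parsed into dicts.
user_file = {"editor.font": "fixed"}                          # the writeable file
site_file = {"editor.font": "courier", "editor.tabs": "8"}    # read-only
system_file = {"editor.tabs": "4", "editor.wrap": "72"}       # read-only

config = ChainMap(user_file, site_file, system_file)

# Lookups try the writeable file first, then the read-only files in order.
font = config["editor.font"]   # found in user_file
tabs = config["editor.tabs"]   # falls through to site_file
wrap = config["editor.wrap"]   # falls through to system_file

# Writes always land in the first (writeable) mapping.
config["editor.wrap"] = "80"
```

After the assignment, only user_file has changed; the read-only
dictionaries still hold their original values.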


"22. Provide a way to read and write opaque binary values (ie
bitmaps for icons)."

Do you really want binary data in-line with text data?  I think a
better solution would be for the config file to just give the
filename for binary data, so that after fetching the filename from
the config file the program would use special-purpose library
functions to read/write binary data.

People are going to need to specify filenames in config files from
time to time anyway, so what I'm suggesting won't require any new
mechanisms, and it would let you completely off-load responsibility
for arbitrary binary formats.
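
A minimal sketch of that indirection, in Python for brevity.  The key
name "icon-file", the helper load_icon, and the four bytes written to
the temporary file are all invented for the illustration; the point is
only that the config value is plain text and the binary read happens
elsewhere.

```python
import os
import tempfile

# Hypothetical setup: write a small binary file so the sketch runs anywhere.
icon_path = os.path.join(tempfile.mkdtemp(), "app_icon.bin")
with open(icon_path, "wb") as f:
    f.write(bytes([0x89, 0x50, 0x4E, 0x47]))

# The config itself stays pure text: the value is just a filename.
config = {"icon-file": icon_path}

def load_icon(cfg):
    """Fetch the filename from the config, then read the bytes separately."""
    with open(cfg["icon-file"], "rb") as f:
        return f.read()

icon_bytes = load_icon(config)
```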


"23. Provide a way to write comments. Comments are associated with a
key (possibly a hierarchy level), and are preserved thru open and
flush. Comments are intended to guide manually editing the file."

I agree with this, but I want to point out that it is _extremely_
problematic.  If you support comments then when people hand-edit a
config file they will occasionally (and rightly) add a comment
explaining why that value is there.  But if a program later changes
that value, how is it to know when the comment has been vitiated?


Add:

I would like to see direct, high-level support for loading/saving 2-D
tables, with the tables laid out in 2-D fashion (with row and column
headers) in the config file.  For instance, suppose the program in
question is a wargame.  Most such games require a table of Unit_Type
x Terrain_Type --> Movement_Cost (and many other 2-D tables as well),
and it would make life very easy for the scenario designer if s/he
could simply type the data into the config file _as_a_2-D_table_.

Also, it would be nice to support a certain type of dynamism in such
tables.  To continue the wargame example, it is becoming
increasingly popular to design games that are just minimalist
engines driven by externally specified data.  So the scenario
designer might wish to enumerate the unit and terrain types in the
config file and then follow them with the table of movement costs.
It would be a very useful feature, IMO, if the table-loading were
able to do some sanity checking as it loaded the table.  (Minimally,
ensure that the table has the correct dimensions; optimally, have
it actually verify that the row and column headers have the correct
values; ideally, by direct reference to the values of the defining
fields elsewhere in the config file.)
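
To make the idea concrete, here is a small Python sketch of a table
loader that does the "optimal" level of checking: it compares the row
and column headers against lists assumed to have been read from
elsewhere in the config file.  The function name load_table, the
whitespace-separated layout, and the wargame data are all assumptions
made for the example, not part of the proposal.

```python
def load_table(lines, row_names, col_names):
    """Parse a whitespace-separated 2-D table and check its headers.

    The first line holds the column headers; every later line starts with a
    row header followed by one integer per column.
    """
    header = lines[0].split()
    if header != list(col_names):
        raise ValueError(f"column headers {header} do not match {list(col_names)}")
    table = {}
    for line in lines[1:]:
        row, *cells = line.split()
        if len(cells) != len(col_names):
            raise ValueError(
                f"row {row!r} has {len(cells)} cells, expected {len(col_names)}")
        table[row] = dict(zip(col_names, map(int, cells)))
    if list(table) != list(row_names):
        raise ValueError(f"row headers {list(table)} do not match {list(row_names)}")
    return table

# Made-up wargame data: the types would be enumerated earlier in the config file.
unit_types = ["infantry", "cavalry"]
terrain_types = ["clear", "forest", "swamp"]
text = """\
          clear forest swamp
infantry      1      2     3
cavalry       1      3     4
"""
movement_cost = load_table(text.splitlines(), unit_types, terrain_types)
```

With this layout movement_cost["cavalry"]["swamp"] gives the cost
directly, and a table whose headers disagree with the enumerated types
is rejected at load time instead of producing wrong lookups later.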

Higher-dimensional tables are sometimes needed as well, though
seemingly less often.  They are also somewhat difficult to
represent in a text file.  So I would suggest that
higher-dimensional tables be built up by "stacking"
lower-dimensional tables appearing iteratively in the config file.
(The same logic could of course be applied to the construction of
2-D tables, but they are a frequent and natural data type for
humans, and they _can_ be specified in tabular fashion in a text
file, so I think it would make good sense to support them directly
with a high-level API.)
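
A sketch of the "stacking" idea in Python, under the assumption that
each 2-D table has already been loaded into a nested dictionary and
that the third dimension (here an invented "season" axis) is just the
name of the block each table came from.  The helper check_same_shape
is hypothetical; it shows the kind of consistency check that stacking
makes possible.

```python
# Made-up data: one 2-D movement-cost table per season, as if each had been
# read from its own named block in the config file.
costs_by_season = {
    "summer": {"infantry": {"clear": 1, "forest": 2},
               "cavalry":  {"clear": 1, "forest": 3}},
    "winter": {"infantry": {"clear": 2, "forest": 3},
               "cavalry":  {"clear": 2, "forest": 5}},
}

def check_same_shape(layers):
    """Verify every stacked 2-D table has identical row and column headers."""
    shapes = {
        name: {row: sorted(cols) for row, cols in table.items()}
        for name, table in layers.items()
    }
    first_name, first_shape = next(iter(shapes.items()))
    for name, shape in shapes.items():
        if shape != first_shape:
            raise ValueError(f"table {name!r} does not match {first_name!r}")

check_same_shape(costs_by_season)
winter_cavalry_forest = costs_by_season["winter"]["cavalry"]["forest"]
```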


Other food for thought:

I think I mentioned a couple of months ago that I am using GUILE
(Scheme) for my config files.  I won't go so far as to recommend
adopting that as a standard, but I would like to mention a couple of
things about it to provoke thought.

First off, compare the Scheme-like syntax to XML.  Here's the
example from your page:

<?xml version="1.0"?> <Config>
  <Numeric>
    <Interfaces>
      <C>
        <An_Unsigned> 124076833</An_Unsigned>
        <An_Int> 2</An_Int>
      </C>
    </Interfaces>
    <Float>
      <A_Float> 3.14159E+00</A_Float>
    </Float>
  </Numeric>
  <Strings>
    <Quoted>  he said &quot;hi there &amp; goodbye&quot; </Quoted>
    <Violins>Stradivarious</Violins>
  </Strings>
</Config>

And here's how the same thing would look in one of my config files
right now:

(configuration
  (numeric
    (interfaces
      (C
        (An_Unsigned 124076833)
        (An_Int              2)
      )
    )
    (Float
      (A_Float 3.14159E+00)
    )
  )
  (strings
    (Quoted  he said "hi there & goodbye")
    (Violins Stradivarious)
  )
)

The latter is, IMO, *much* easier to read and comprehend.  It's
also, at a rough guess, about 40% smaller.  While I don't advocate
smallness for smallness' sake (and I think that "bloat" is usually
merely a slur that people invoke against software systems that they
don't like but can't come up with any cogent criticism of), the
lean syntax can be very important in config files because it
promotes readability -- partly through reduced clutter, and partly
because there will be fewer times when non-semantic line wraps are
required.

[Per my example, an actual Scheme representation for the "Quoted"
field would need to be expressed somewhat differently; I gave
instead what I think it would look like if you adopted the
parentheses syntax for what you're trying to do.  The only thing
that would need special treatment would be parentheses -- in fact,
only close-parentheses in data that did not also include
matching open parentheses.]
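
For a sense of how little machinery the parenthesized syntax needs,
here is a minimal reader sketched in Python.  It is not GUILE and not
a full Scheme reader: it assumes values are either bare atoms or
double-quoted strings, so it does not handle the free-text "Quoted"
field discussed above, and the names TOKEN and parse are invented for
the example.

```python
import re

# Tokens: an open paren, a close paren, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing close parenthesis")
            pos += 1               # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected close parenthesis")
        return tok.strip('"')      # atoms and strings both come back as str

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing text after the top-level list")
    return tree

sample = """
(configuration
  (numeric
    (Float
      (A_Float 3.14159E+00)
    )
  )
  (strings
    (Violins Stradivarious)
  )
)
"""
tree = parse(sample)
```

The result is a list whose first element is the section name and whose
remaining elements are the nested sections, which is all a lookup
routine needs to walk.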

The only disadvantage I can think of is that the lack of labeled
end markers makes it hard to see where very long lists end.  When
this becomes a problem for me I merely address it with a cosmetic
comment, thus:

...
   (Floats
      (F1 1.2)
      (F2 2.3)
      (F3 4.2)
      (F4 1.9)
       ...
      (F468 7.2)
      (Comment: End of "Floats" section.)
   )
...


As a side note, notice also that using Scheme for config files
means that there is no formal distinction between config files and
script files, since code and data are represented the same way in
Scheme.  However, this requires a run-time system such as GUILE to
make it work, and is thus far out of the scope of what you are
trying to do.  OTOH, if you *are* thinking about support for
scripting sometime in the future, give the mechanism some thought
now.  I find it extremely convenient to include certain kinds of
code in my config files, thus:

...
   (Penalty_Function (log n))
...
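
Without a full Scheme run-time, a restricted version of the same
convenience can be had by evaluating only whitelisted functions.  The
Python sketch below is an assumption-laden illustration, not what
GUILE does: it takes the expression already parsed into nested lists,
and the table FUNCTIONS and the function evaluate are invented names.

```python
import math

# Whitelisted operations; anything else in the config is rejected.
FUNCTIONS = {
    "log": math.log,
    "sqrt": math.sqrt,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}

def evaluate(expr, env):
    """Evaluate a prefix expression given as nested lists, e.g. ["log", "n"]."""
    if isinstance(expr, list):
        name, *args = expr
        if name not in FUNCTIONS:
            raise ValueError(f"function {name!r} is not allowed")
        return FUNCTIONS[name](*(evaluate(a, env) for a in args))
    if expr in env:
        return env[expr]           # a variable such as n
    return float(expr)             # otherwise it must be a numeric literal

# (Penalty_Function (log n)) after parsing into nested lists:
entry = ["Penalty_Function", ["log", "n"]]
penalty = evaluate(entry[1], {"n": 100.0})
```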


Finally, if I may say so without stepping on any toes, I would like
to call attention to the curiously "flat" examples that you have
for the Java and XML sections on your Web page, and the emphasis on
data types rather than on semantics.  For comparison, here is what
one of my real config files looks like (substantially reduced and
re-commented for the purposes of this post):

(configuration

 (Comment:      The next three values are symbolic names that will be
                used as keys for lookups in 1-D tables further down.)
 (use-problem             legion-i)
 (use-workers             first-five)
 (use-solution-strategy   fixed-size-250)

 (worker-plot-colors
  "RGB:FF/00/00"
  "RGB:00/BF/00"
  "RGB:00/00/FF"
  )

 (problems
  (Comment:     "problems" is a list of problem-specific data, to
                be selected by the "use-problem" key defined above.
                I.e., the program grabs "use-problem" and extracts
                its value, then grabs "problems" and uses the
                previously obtained value of "use-problem" to
                fetch the proper record out of "problems".
                Due to the nature of the application, the data
                varies from problem to problem, and the program
                must decide what fields to ask for based on the
                value of the key.  I.e., if "use-problem" is
                "phalanx-1" it will not ask for "barbarian-rate".
                That is hard-coded behavior embedded in a case
                statement in the application program, and I don't
                see any other obvious way to do it.)
  (legion-i
   (Comment:    "use-map" is a key that will be looked up in a
                table below.)
   (use-map            21x21+3cities)
   (number-of-legions        5)
   (barbarian-rate         1.0)
   (game-length            200)
   (movement-granularity  10.0)
   (games-per-generation     3)
   (population-size        500)
   )
  (phalanx-1
   (use-map           41x31+5cities)
   (units-per-side         24)
   (turns-per-game         30)
   (games-per-generation    5)
   (population-size        10)
   )
  )

 (maps
  (Comment:       "maps" is another lookup table.  The key is
                  obtained from the problem definition, above.)
  (21x21+3cities
   (map-width              21)
   (map-height             21)
   (number-of-cities        3)
   )
  (41x31+5cities
   (map-width              41)
   (map-height             31)
   (number-of-cities        5)
   )
  )

 (worker-configurations
  (Comment:       "worker-configurations" is another lookup table.)
  (first-five
   (Comment:      Notice that "worker-random-seeds" is a list that
                  my program reads into an array.  Currently I do
                  this by iterating within the program, but it
                  would be nice if the config-file parser did that
                  for me.)
   (worker-random-seeds 1 2 3 4 5)
   (minimum-workers-required-to-start 1)
   )
  (three-require-three
   (worker-random-seeds 1 2 3)
   (minimum-workers-required-to-start 3)
   )
  )

 (solution-strategies
  (quick-test
    (hof-size        20)
    (children-each   20)
    (min-initial-hidden-units  1)
    (epochs
     (epoch
      (generations             2)
      (weight-mutation-rate  .01)
      (size-penalty         none)
      )
     )
    )
  (fixed-size-250
    (hof-size         50)
    (children-each    10)
    (min-initial-hidden-units  2)
    (epochs
     (Comment:     Very importantly, notice that this is a
                   *list* of records of the same type.  The
                   config-file parser must not simply grab
                   the first and ignore the rest; rather,
                   it must give them to me sequentially on
                   demand.)
     (epoch
      (generations            50)
      (weight-mutation-rate  .01)
      (size-penalty         none)
      )
     (epoch
      (generations           100)
      (weight-mutation-rate  .05)
      (size-penalty         none)
      )
     (epoch
      (generations           100)
      (weight-mutation-rate  .10)
      (size-penalty         none)
      )
     )
    )
  )

)

In addition to the raw GUILE bindings I have a package of
higher-level support routines that makes life easier for
the programmer.  Included are a "Lookup" function and an
overloaded-by-return-type "Second" function for extracting the
values.  For example, after appropriate variable declarations
and with the config file already loaded but not processed, my
code has things like this:

        Problem := Lookup("use-problem", Config);
        Map     := Lookup("use-map", Problem);
        X       := Second(Lookup("map-width", Map));
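
The support package itself is not shown here, so the following Python
sketch is only a guess at the general shape of such routines, working
on a config already parsed into nested lists.  The names lookup,
lookup_all, and second are stand-ins, and the symbolic-key indirection
is spelled out step by step rather than hidden inside the lookup.

```python
# A fragment of the config above, already parsed into nested lists.
config = ["configuration",
    ["use-problem", "legion-i"],
    ["problems",
        ["legion-i", ["use-map", "21x21+3cities"], ["number-of-legions", "5"]],
        ["phalanx-1", ["use-map", "41x31+5cities"], ["units-per-side", "24"]]],
    ["maps",
        ["21x21+3cities", ["map-width", "21"], ["map-height", "21"]],
        ["41x31+5cities", ["map-width", "41"], ["map-height", "31"]]],
]

def lookup_all(key, record):
    """Return every sub-list of record whose first element is key, in order."""
    return [item for item in record[1:]
            if isinstance(item, list) and item and item[0] == key]

def lookup(key, record):
    """Return the first sub-list headed by key, or raise KeyError."""
    matches = lookup_all(key, record)
    if not matches:
        raise KeyError(key)
    return matches[0]

def second(entry):
    """Return the value part of a (key value) pair."""
    return entry[1]

# Follow the symbolic keys: use-problem -> problems -> use-map -> maps.
problem_name = second(lookup("use-problem", config))
problem = lookup(problem_name, lookup("problems", config))
game_map = lookup(second(lookup("use-map", problem)), lookup("maps", config))
width = int(second(lookup("map-width", game_map)))
```

The lookup_all helper covers the repeated "epoch" records: it hands
back every matching record in file order instead of stopping at the
first.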


Please look at some of the things I'm doing in the example, and
consider how you would be able to do it (and what it would look
like) with the various proposed syntaxes.

FWIW, if I were doing it myself I would probably start with an
X-like syntax and remove the need for "end" markers by making the
indentation semantically significant.  I used GUILE instead because
it saved me writing my own parser (and because I may need the
scripting capabilities later), but the GUILE/Scheme syntax, when
pretty-printed, is very similar to X-like with semantic
indentations.  To repeat your XML example yet again, you would get
something like:

configuration
  numeric
    interfaces
      C
        An_Unsigned 124076833
        An_Int              2
    Float
      A_Float 3.14159E+00
  strings
    Quoted  he said "hi there & goodbye"
    Violins Stradivarious

Most human-friendly of all, IMO, but possibly prone to error, and
surely difficult for syntax-highlighting editors unless you chose
to limit the field names to a predefined set.
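
A sketch of how such a file might be read, in Python.  It assumes
indentation is done with spaces only, that a line followed by a more
deeply indented line is a section header, and that every other line is
a field name followed by its value; parse_indented is an invented
name.

```python
def parse_indented(text):
    """Turn an indentation-structured config into nested dicts.

    A line with deeper-indented lines below it becomes a section; any other
    line is split once into a field name and the rest of the line as its value.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    root = {}
    stack = [(-1, root)]            # (indent of the section header, its dict)
    for i, line in enumerate(lines):
        indent = len(line) - len(line.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()             # leave sections we have dedented out of
        parent = stack[-1][1]
        next_indent = -1
        if i + 1 < len(lines):
            nxt = lines[i + 1]
            next_indent = len(nxt) - len(nxt.lstrip(" "))
        if next_indent > indent:    # a section header
            child = {}
            parent[line.strip()] = child
            stack.append((indent, child))
        else:                       # a "name value" leaf
            name, _, value = line.strip().partition(" ")
            parent[name] = value.strip()
    return root

sample = """
configuration
  numeric
    interfaces
      C
        An_Unsigned 124076833
        An_Int              2
    Float
      A_Float 3.14159E+00
  strings
    Quoted  he said "hi there & goodbye"
    Violins Stradivarious
"""
tree = parse_indented(sample)
```

The error-proneness shows up immediately: an empty section is
indistinguishable from a field with no value, and one stray space
silently moves a field into a different section.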


Sorry for the length; hopefully someone will find an interesting
thing or two somewhere in there.

Bobby Bryant
Austin, Texas