From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,bc1361a952ec75ca
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-08-29 09:48:31 PST
Path: 
 archiver1.google.com!newsfeed.google.com!sn-xit-02!supernews.com!newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!cyclone1.gnilink.net!news-east.rr.com!cyclone.rdc-detw.rr.com!news.mw.mediaone.net!lsnws01.we.mediaone.net!typhoon.san.rr.com.POSTED!not-for-mail
Message-ID: <3B8D1B8F.9BDEDC38@san.rr.com>
From: Darren New <dnew@san.rr.com>
Organization: Boxes!
X-Mailer: Mozilla 4.77 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Progress on AdaOS
References: <9IFe7.12813$6R6.1221214@news1.cableinet.net>
 <9lghqu$ac6$1@nh.pace.co.uk> <3B7C3293.76F49097@home.com>
 <9lhefg$lgd$1@nh.pace.co.uk> <3B7D47F1.25D6FC78@boeing.com>
 <5ee5b646.0108171856.18631c4c@posting.google.com> <3B7F624B.7294D24F@acm.org>
 <lK8g7.7817$2u.56850@www.newsranger.com> <9lr6je$5hj$1@nh.pace.co.uk>
 <Pine.A41.4.10.10108201848570.29818-100000@acs5.bu.edu>
 <9ltoi7$4is$1@nh.pace.co.uk> <3B82789B.8D195045@home.com>
 <9ltuo8$70n$1@nh.pace.co.uk> <3B829450.879B0396@home.com>
 <uBW9YP9YCoMO@eisner.encompasserve.org> <9mdh4e$q3v$1@nh.pace.co.uk>
 <9me03r$c302@news.cis.okstate.edu> <3B8AB6C8.910130C8@san.rr.com>
 <9metfo$aai2@news.cis.okstate.edu> <3B8BC332.214F95CA@san.rr.com>
 <9mhmlg$9aa1@news.cis.okstate.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Wed, 29 Aug 2001 16:42:55 GMT
NNTP-Posting-Host: 24.165.20.229
X-Complaints-To: abuse@rr.com
X-Trace: typhoon.san.rr.com 999103375 24.165.20.229 (Wed,
 29 Aug 2001 09:42:55 PDT)
NNTP-Posting-Date: Wed, 29 Aug 2001 09:42:55 PDT
Xref: archiver1.google.com comp.lang.ada:12560
Date: 2001-08-29T16:42:55+00:00
List-Id: <comp.lang.ada>

David Starner wrote:
> 
> On Tue, 28 Aug 2001 16:13:42 GMT, Darren New <dnew@san.rr.com> wrote:
> > My point is to ask why not. My point is to ask "why limit yourself to
> > array-of-byte files" and you're answering "because anything else is not
> > a file." That's not helpful.
> 
> The way I feel, you keep asking "why limit yourself to citrus fruits
> as oranges" and I'm answering "because anything else isn't an orange."
> Until you give me a concrete definition for a file, this discussion
> isn't going anywhere.

A file would be a collection of data that can be shared between programs
and outlasts any given process. (Of course, if it's represented as a
process, it doesn't outlast the process that it is, but that's no
different than saying a file doesn't persist past the execution of the
"rm" program.)

Note, for example, that a shared memory segment falls into this model,
even under UNIX.

Now imagine taking the UNIX shared-memory-segment model, and expanding
it to be a shared-tagged-object model. You do whatever manipulations you
want, and your allocations all come from a storage pool that deals with
the stuff as a file.

> No matter how you define it, most files are arrays of bytes. Look up
> how hard drives and RAM work if you don't believe me.

Actually, most files are *not* arrays of bytes. Most files are arrays of
records, excluding compressed files (records being 1 bit long) and
streaming media perhaps. Under CP/M, files were arrays of sectors. Line
printers aren't arrays of bytes. The mouse isn't an array of bytes.
/dev/audio isn't an array of bytes. Fonts aren't arrays of bytes. For
that matter, images aren't arrays of bytes either, they're 3D arrays of
pixels. You just serialize them into arrays of bytes.

Which is irrelevant, considering that we seem to have no problem storing
the kinds of complex pointer-based information I'm talking about in RAM,
which is an array of bytes, almost.  (Note: no, RAM is *not* an array of
bytes either, on any hardware with memory management.)

> > Why not? Again, you're starting with a preconceived notion of "file"
> > being "an array of bytes", then claiming that the OS should be handling
> > that as the basic type.
> 
> I don't even know what you mean by "the basic type" here.

The fundamental type of data handled by the OS. In UNIX, there are four
fundamental types of files: regular files, directories, block-special,
and character-special.
 
> > For example, on the Amiga OS, files were tasks, as were device drivers,
> > etc. You could send a message to a file task to read and write portions
> > of the file into particular buffers, and when the message returned, the
> > I/O was complete. This allowed you to treat windows as files
> > (open("con:top/left/width/height/title")) and such. It allowed you to
> > write phonemes to the voice synthesizer and read lip positions back as
> > the sound was generated. Cool stuff like that.
> 
> And, wow, open ("con:1/1/20/20/My Window / file") is so much clearer than
> Create_Window (Top => 1, Left => 1, Width => 20, Height => 20,
> Title => "My Window / file"). 

Uh huh. And /dev/fd0 is so much clearer than INT21 calls, yes? The point
is that windows were consistent with the file paradigm in ways that they
aren't consistent in X-Windows, and this was possible because they were
based on active processes with complex data rather than on stuff built
into the kernel.

> Oops, we need to escape the / somehow
> in your call, don't we . . .

No, because the title came last. Also irrelevant. I can see it's
pointless for me to give counterexamples to your points if you're going
to actively attempt to ignore why I might be giving such
counterexamples.

> You're welcome to open a pipe to the voice synthesizer. It'll work
> just as well.

No, it won't, for reasons that are irrelevant to Ada, so I won't pursue
them here.
 
> > Imagine an OS where "files" are actually protected objects, fully typed,
> > that outlive your program's execution. Directories are simply persistent
> > protected objects that act as arrays mapping strings to these "files".
> 
> So we have a space-inefficient tangle of data that's hard to transfer
> between systems running different OS's and probably even between different
> programs on the same system.

It would be easy to transfer them between programs on the same OS, I
would think. At least, if those programs were written in Ada or
compatible with it. It's easy to move them to a different system: You
serialize them. But why make serialization a *required* step *every*
time you use a file, *including* the ones you have no intention of ever
moving to any other system?

Doing so is like using overlays instead of demand paging for your memory
management. Why would you, when you can build it into the OS and get
everyone using it for free?
 
> >> An OS could certainly flatten that into a file. It would probably
> >> take a lot of work to make sure it wasn't an OS/program/libada specific
> >> file, though.
> >
> > I don't see any particular need to flatten it as such. Again, you're
> > trying to map "files" onto "array of bytes in a UNIX-like file system".
> > That's not the point.
> 
> Hard drives, CDs, floppies and tapes are flat. Hence to store it you're
> going to need to flatten it.

Since RAM is flat, I guess it's impossible to use non-flat structures at
all, right? Why is it OK to use "tangled" data structures in memory, but
not in a file?

For that matter, why are files contiguous arrays of bytes? Why do you
insist that the OS handle the directories but you handle what's inside
the files? Why do you insist that the OS handle allocation of space for
the files, but not handle allocation of space inside them? That's the
point I'm trying to get at.

Anyway, no, hard drives, CDs, and floppies all have sectors. So they're
arrays of sectors, each of which is an array of bytes.

And again, why do you want directories built in? Why not just flatten
them yourself?
 
> > There's a reason folks put names on files, on procedures, on users. The
> > same reason holds for individual pieces of data in a file. It's a win
> > whenever you store more than one piece of data in a file, like more than
> > one line in a text file, more than one user in a password file, etc. And
> > if you're only storing one atomic piece of data in a file, it's because
> > you're using the file's name as an index anyway. Have you never written
> > a program where the name of a file is calculated by the program?
> > "Image01.jpg", "Image02.jpg", ...?
> 
> And how much of a win is storing that in one file over storing it in
> a directory and tarring / zipping it for transport?

Significant. My point is that you *do* use such things, you just don't
recognise it. You say "I don't need complex intra-file data, because I
can make complex inter-file data instead." You're kind of perverting the
directories to serve as keys in exactly the same way that whatshisname
complains that people perverted file names to include the data type.

If you haven't used more sophisticated file systems, you don't see what
you're missing, just like if you've never used directories or relational
databases or OO databases or strong typing, you might have a hard time
seeing the value of it.

Why is strong typing good in a programming language and bad in
persistent data?

> >> > The type for an "Active Server Page" with Ada code embedded in
> >> > HTML markup under control of CVS?
> >>
> >> text/x-asp.
> >
> > And how does CVS know it can handle this,
> 
> text/*
> > and how does the web browser
> > know it needs to invoke the Ada compiler and not the Java compiler when
> > rebuilding this page?
> 
> I would assume it reads the start of the file.

OK, so you're only solving half the problem here. The content-type on
the file isn't sufficient to figure out what you can do with the file
without reading the data. Why bother with it at all?

Why is strong typing good in a program and bad in a file?
 
> > The point is that no passive tagging system is going to be complete
> > enough that you can specify everything you need to know about the
> > program in the type tag.
> 
> Sure. And?

And why bother? What's the benefit of putting big complex hierarchical
types on files that are neither sufficient nor enforced? What keeps me
from writing a JPEG image into an Ada source file? If I can do that, why
do I need the file type?
 
> I've used Ada. Every so often in Ada, you're juggling twenty different
> types and having to constantly convert between them.

<sarcasm> Oh no. I've never had to juggle file types in a flat-file OS
either. Never had a problem with the wrong character set or line ending
translation, never had to convert from one image format to another, and
never had to change a file extension so some other program would accept
a text file and display it as HTML markup, and of course I've never
opened an image file with a text editor. </sarcasm>

> I fail to see the
> advantage to that in an OS. I don't want each program having its own
> set of types that are incompatible with everything else. I also want
> my editor to work any appropriate file, be it C, Ada, HTML, Tex,
> a diary, whatever.

Welcome to the wonderful world of object-oriented OSes. :-) As long as
the file type supports getting and setting lines of text, your editor
shouldn't have a problem with it. As long as it supports serialization
at all, your binary editor shouldn't have a problem with it.
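
Here's a rough sketch of what I mean, in Python rather than Ada purely
for brevity. Everything in it (LineFile, AdaSource, HtmlPage,
replace_all) is a name I made up for illustration, not part of any real
OS or library. The two "file types" store their contents differently,
and the editor routine neither knows nor cares.

```python
from typing import Protocol


class LineFile(Protocol):
    """Hypothetical interface: anything that can get and set lines of text."""

    def line_count(self) -> int: ...
    def get_line(self, index: int) -> str: ...
    def set_line(self, index: int, text: str) -> None: ...


class AdaSource:
    """Illustrative file type that keeps its lines in a list."""

    def __init__(self, lines):
        self._lines = list(lines)

    def line_count(self):
        return len(self._lines)

    def get_line(self, index):
        return self._lines[index]

    def set_line(self, index, text):
        self._lines[index] = text


class HtmlPage:
    """Illustrative file type that keeps one string and splits it on demand."""

    def __init__(self, text):
        self._text = text

    def line_count(self):
        return len(self._text.split("\n"))

    def get_line(self, index):
        return self._text.split("\n")[index]

    def set_line(self, index, text):
        lines = self._text.split("\n")
        lines[index] = text
        self._text = "\n".join(lines)


def replace_all(f: LineFile, old: str, new: str) -> int:
    """Editor routine: replace old with new, return how many lines changed."""
    changed = 0
    for i in range(f.line_count()):
        line = f.get_line(i)
        if old in line:
            f.set_line(i, line.replace(old, new))
            changed += 1
    return changed
```

The same replace_all works on an AdaSource and on an HtmlPage, because
all it relies on is the line-oriented interface.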

Why wouldn't you want your editor to work with inappropriate file types?

> You ask how the web browser knows it needs to invoke the Ada compiler
> in my situation. How does the web browser know anything about the
> file in your situation?

The same way the web browser knows the types of the windows and buttons
it displays. It does a "with File.Text.Html" and then uses the routines
therein.

> How does this type get created? 

The same way any other type gets created. You write a package for it.

> What happens
> if you change it from Ada to Chill?

Then you have to figure out how to serialize it or figure out how to
translate it without serializing it. Always coding every single program
to the worst-case conditions is going to lead to all kinds of problems.
What if the next HTML standard specifies that all HTML tags should be in
Hebrew? What happens if the next version of emacs uses FORTH for its
scripting language instead of elisp?

> How does the editor know what type
> to create?

You tell it. Same as now.

> > Errr... When talking about designing and writing an OS, saying "oh,
> > don't worry, the device drivers will take care of it" doesn't work.
> 
> Sure it does. It's called stratification. When talking about user mode
> utilities, we can hand certain things off to the kernel, and let
> the kernel worry about it.

But to do that, you have to come up with some universal semantics for
the file system, which is what we're talking about.

If your files have keyed records, your copy program copies keys and
records. If your files have arrays of bytes, your copy program copies
arrays of bytes. The complexity of the file system doesn't preclude
writing universal utilities.
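
To make that concrete, here's a toy sketch (Python again, only for
brevity). KeyedFile, ByteFile, units, put_unit and copy_file are all
hypothetical names of mine. The assumption is simply that every file
type can enumerate its own natural units and accept them back; given
that, one copy routine serves both kinds of file.

```python
class KeyedFile:
    """Illustrative file of keyed records; its natural unit is (key, record)."""

    def __init__(self, records=None):
        self._records = dict(records or {})

    def units(self):
        return iter(self._records.items())

    def put_unit(self, unit):
        key, record = unit
        self._records[key] = record

    def records(self):
        return dict(self._records)


class ByteFile:
    """Illustrative flat file; its natural unit is a single byte."""

    def __init__(self, data=b""):
        self._data = bytearray(data)

    def units(self):
        return iter(self._data)

    def put_unit(self, unit):
        self._data.append(unit)

    def data(self):
        return bytes(self._data)


def copy_file(src, dst):
    """Universal copy: move every unit from src to dst, return the count."""
    count = 0
    for unit in src.units():
        dst.put_unit(unit)
        count += 1
    return count
```

The copy routine never looks inside a unit, so a richer file system
costs it nothing.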
 
> > Why
> > not let the real data be the one with the pointers and strong types and
> > such, and only use streams when you want to communicate with a different
> > OS?
> 
> Because a plain text file is readable by 4 million different programs.
> Because the format for PNG is set in stone and readable by 40,000 different
> programs. 

Well, why don't you read what I said? Of course, the serialized data
formats are readable by any number of programs. That's why it's
important for programs that have data they want to port elsewhere to
serialize them. 

However, if the *only* way to store the data is to serialize it, then
every program has to go through the overhead of parsing the data.

Look at it this way. To deal with GIF, you use a library. To deal with
PNG, you use a library. To deal with XML, you use a library. This
library's job is to convert from one serialized format (the GIF or PNG
or XML file) to another serialized format (stored in the swap space). It
also converts back again. If you're going to have that internal format
anyway, why not allow it to get stored directly? If it's the
representation of a type that has a defined external serialization, you
write a "serialize" and "deserialize" routine for that type. But guess
what? You have to do that *anyway* in your model.

In addition, this library you use for GIF, why not make *that* the
content-type of the file, rather than some arbitrary string? Why not say
"this file is a GIF because the interface to it is libgif.so"? Or "this
file is a GIF because the interface to it is with File.Image.GIF"?
That's what I'm suggesting, in essence. Of course, once you do that,
files start acting more like persistent variables than like arrays of
bytes.
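
A toy model of that, with the caveat that it's an in-memory stand-in
and not a real file system: TypedStore, AdaText and GifImage are names
I invented for the sketch, and it's in Python only to keep it short.
The "type" of an entry is the interface you use to get at it, and the
store enforces it, so nothing lets you write an image over a source
file.

```python
class AdaText:
    """Illustrative interface for Ada source text."""

    def __init__(self, lines):
        self.lines = list(lines)


class GifImage:
    """Illustrative interface for an image: a grid of pixel values."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.pixels = [[0] * width for _ in range(height)]


class TypedStore:
    """Toy stand-in for a file system whose entries are typed objects."""

    def __init__(self):
        self._entries = {}

    def create(self, name, obj):
        if name in self._entries:
            raise FileExistsError(name)
        self._entries[name] = obj

    def open(self, name, interface):
        # The caller names the interface it intends to use; a mismatch fails.
        obj = self._entries[name]
        if not isinstance(obj, interface):
            raise TypeError(
                f"{name} is a {type(obj).__name__}, not a {interface.__name__}"
            )
        return obj

    def replace(self, name, obj):
        # An entry keeps its type for life: no GIF written over Ada source.
        current = self._entries[name]
        if type(obj) is not type(current):
            raise TypeError(
                f"{name} holds a {type(current).__name__}, "
                f"cannot store a {type(obj).__name__}"
            )
        self._entries[name] = obj
```

Opening "main.adb" as AdaText hands back the object itself, with no
parsing step; asking for it as a GifImage raises TypeError.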

Indeed, you don't *really* have arrays of bytes in a UNIX-like OS. You
have calls like "read" and "write" that transfer *something* from a file
into your buffers. The buffers are arrays of characters, but the files
aren't. The OS does the serialization for you there too. Surely writing
to /dev/audio isn't transferring an array of bytes anywhere. Writing to a
fragmented file isn't transferring an array of bytes - the OS is dealing
with disk blocks and channel programs and stuff like that.

Think about the Windows Registry. The number of times you actually want
to turn that into a text file is minute compared to the number of times
you want to access it as a structured data store. The registry is not
defined in terms of the bytes that are in the file. It's defined in
terms of the interfaces to it. This is a *far* superior way of working
things, upward-compatibility-wise and ease-of-use wise. (Witness the
fact that Win3.1 "INI file" interfaces still work with Windows, but
Emacs can't keep its internal formats working. :-)

> However, the internal format for Emacs has been known to change
> between versions. It's hard to keep internal structures the same; much
> easier to conform to a consistent external standard.

Errr, and the point is? Perhaps that emacs internal formats, not
intending to be ported, don't need to be serialized except for the
primitive nature of the data the OS is willing to handle? And that
therefore when you upgrade emacs, you have to recompile the scripts the
first time you use them, in return for having far better performance for
the other 99.44% of the time you're not upgrading emacs?

> Also, I've worked with programs that used a database and those that use
> plain text files, and the latter seem more reliable. 

Sure thing. Tell this to the folks running the airline reservation
programs.

-- 
Darren New 
San Diego, CA, USA (PST). Cryptokeys on demand.