comp.lang.ada
 help / color / mirror / Atom feed
From: Robert Dewar <robert_dewar@my-deja.com>
Subject: Re: Looking for keyed file package
Date: 1999/09/26
Date: 1999-09-26T00:00:00+00:00	[thread overview]
Message-ID: <7sju61$pdv$1@nnrp1.deja.com> (raw)
In-Reply-To: 1999Sep25.114101.1@eisner

In article <1999Sep25.114101.1@eisner>,
  Kilgallen@eisner.decus.org.nospam wrote:
> Since you say "in main memory as possible (and ...in cache",
> I gather you are talking about processor memory cache, whereas
> what I meant was having the index in the caches created by the
> indexed file system in main memory.

Yes, exactly

> Presuming the actual data will likely require a disk read,
> I think the difference between index data being in processor
> cache and being in the file system cache in main memory is
> not of consequence.

That's not necessarily the case, there can be substantial
processing involved in accessing the index, and processor
caching is a factor.

> The data I have been dealing with this month have a maximum
> of 16 bytes of key data for a record length of 600 bytes,
> so the standard recommendation to set the cache size large
> enough to hold the entire index tree in memory seems
> reasonable.

Sure if you have a small file where the indexes can fit in
main memory even if not compressed, that is a simple case
to deal with.

> Memory is cheap enough, however, that even not having the
> keys compressed would be acceptable in this setting. That
> would be quite different with 584 bytes of key data for
> a record length of 600 bytes.  Of course particular key
> data can be more or less susceptible to compression.

Actually that's wrong, with modern key compression techniques
you can get down to less than 16 bits per key on average quite
regardless of the distribution of key data.

This is a rather specialized area, and the techniques involved
are rather tricky. To get an intuitive feeling for why the above
is true, consider the following extremes

1. Keys are very very different, in this case you only need a
little bit of the key to differentiate them (this is trailing
key compression in action, discarding trailing data in the
key that is not significant).

2. Keys are very very similar, in this case leading key
compression becomes effective.

The combination is effective in all situations. As I mentioned
earlier, the technique used in the Realia COBOL compiler gets
down to about 16 bits per key, including control information,
pretty much regardless of key distribution.

Yes, of course for a small file, there is no issue, but the
point is that key compression greatly extends the size of
files that can be handled efficiently. Memory is cheap, but
files are in practice getting bigger at least as fast as
main memory size is increasing.

Consider as an example, the ebay files, do some simple
calculations here (remembering that you can do full searches
on titles and expect instant response). You will see that
files like this get pretty big, pretty quickly :-)


Sent via Deja.com http://www.deja.com/
Before you buy.




  reply	other threads:[~1999-09-26  0:00 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <37E817C6.80ED41E0@easystreet.com>
     [not found] ` <7saii8$5bl$1@nnrp1.deja.com>
1999-09-22  0:00   ` Looking for keyed file package David Botton
1999-09-22  0:00   ` Al Christians
1999-09-22  0:00     ` Vladimir Olensky
1999-09-22  0:00       ` Al Christians
1999-09-22  0:00         ` David Botton
1999-09-23  0:00     ` Robert Dewar
1999-09-22  0:00       ` Al Christians
1999-09-23  0:00   ` p.obry
1999-09-24  0:00     ` Robert Dewar
1999-09-24  0:00       ` Robert Dewar
1999-09-24  0:00         ` Larry Kilgallen
1999-09-25  0:00           ` Robert Dewar
1999-09-25  0:00             ` Larry Kilgallen
1999-09-26  0:00               ` Robert Dewar [this message]
1999-09-26  0:00                 ` David Botton
1999-09-28  0:00                   ` Robert Dewar
1999-09-28  0:00                     ` David Botton
1999-09-28  0:00                       ` Ted Dennison
1999-09-28  0:00                         ` Robert Dewar
1999-09-22  0:00 ` p.obry
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox