From: Dr Adrian Wrigley
Newsgroups: comp.lang.ada
Subject: Re: Problems with large records (GNAT) [continued]
Date: Thu, 01 Mar 2001 13:52:28 -0800
Message-ID: <3A9EC49C.9F5CD8D6@linuxchip.demon.co.uk>
References: <3A9E05B0.46B406ED@linuxchip.demon.co.uk>
Organization: ntl Cablemodem News Service

tmoran@acm.org wrote:

> I realize you are interested in the generic large-memory-with-Gnat
> problem, but, as they say, sometimes it's better to improve the
> algorithm than the hardware.

I have the same frustration with this problem as with things like
segmented memory architectures, short index registers, etc.  They all
tend to result in less robust code, or a lot more work.  Hitting one
of the various memory limits is one of the common problems I encounter
running GNAT/Linux.

I plan to go to (partial) intra-day data sometime, so that will need
a better representation.
...

> For historical data, stock prices can be 16 bit fixed point with
> delta of 1/8, rather than 32 bit floats (excluding
> Berkshire-Hathaway).  Even with prices in pennies nowadays, 24 bits
> should be quite enough for a stock price.  Similarly, a 24 bit
> Volume (16 million shares of one stock traded in one day) should
> normally be adequate, perhaps with an exception list for anything
> that doesn't fit.  A sixteen bit fixed point value, with suitable
> delta, should be fine for holding the split correction, or 24 bits
> if you really want to allow for even the most bizarre changes.

I decided that 16 bits was inadequate.  Even with prices in the range
$0.05 to $500, you need 20 bits to accommodate a delta representing 1%
at the bottom end: 1% of $0.05 is $0.0005, and $500 / $0.0005 is a
million steps, i.e. about 2**20.  Companies that have had a lot of
splits and dividends in their history have very small prices back in
the '70s.  Perhaps a 16-bit logarithm of the share price would be OK
(and it might even speed up volatility calculations!).

With volume, I think I really need better than a 32-bit range.  Once
you start to calculate weekly or monthly volumes, quite a number of
companies exceed 2**32 shares (and in some countries they even trade
fractional shares routinely).  Maybe you've seen the WWW sites of
historic data that show Intel's monthly share volume as things like
"-1518500200 shares" (a signed 32-bit counter wrapping around).  I
mentioned this problem to Yahoo nearly a year ago, but they haven't
fixed it.

When it comes down to it, it is a matter of confidence and simplicity.
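In Ada terms, the widths above come out something like this (a rough
sketch, not code from my application: the names are illustrative, and
the 21-bit price figure assumes GNAT rounds the requested delta down
to 2.0**(-11) and adds a sign bit):

   with Ada.Numerics.Elementary_Functions;
   use  Ada.Numerics.Elementary_Functions;

   procedure Price_Rep_Demo is

      --  1% resolution at $0.05 means a delta near 0.0005; over a
      --  $0.05 .. $500 range that is about a million steps, so 20
      --  magnitude bits plus a sign bit.
      type Price is delta 0.0005 range 0.0 .. 500.0;
      for Price'Size use 21;

      --  Weekly/monthly totals can exceed 2**32 shares, so go to
      --  64 bits (the "-1518500200 shares" figure is a signed
      --  32-bit counter wrapping around).
      type Volume is range 0 .. 2**63 - 1;
      for Volume'Size use 64;

      --  16-bit logarithmic price: constant *relative* resolution,
      --  about 0.014% per step over $0.05 .. $500.  Assumes the
      --  argument stays within that range.
      type Log_Price is mod 2**16;
      Min_Price : constant := 0.05;
      Scale     : constant Float :=
        Float (Log_Price'Last) / Log (500.0 / Min_Price);

      function Encode (P : Float) return Log_Price is
      begin
         return Log_Price (Log (P / Min_Price) * Scale);
      end Encode;

      function Decode (L : Log_Price) return Float is
      begin
         return Min_Price * Exp (Float (L) / Scale);
      end Decode;

      P : constant Price     := 1.375;          -- an eighth-based quote
      V : constant Volume    := 5_000_000_000;  -- comfortably > 2**32
      L : constant Log_Price := Encode (1.375);

   begin
      null;
   end Price_Rep_Demo;

A log encoding also turns the price ratios used in volatility
calculations into plain subtractions of the encoded values, which is
presumably where the speed-up would come from.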
Fixed point for this wide-ranging data doesn't give me the confidence
I want from a (mission-critical) financial application.  I hadn't
thought of using 24-bit values, but I think they would not be
worthwhile here given the issues involved.

> I don't know what kind of processing you are doing, but usually one
> processes a small number of complete time series, or the complete
> market for just a few days, so only a few rows or columns of the
> complete matrix need be in RAM at any one time.

That's why I want a very fast data access method...  I want to scan
all the stocks over all the times, and sometimes I access the data
sparsely as well.  With mmap, the data remain in RAM from one
invocation to the next, and can be completely scanned in only a few
seconds (something like the sketch below).  Maybe someday the Ada
standard will include a persistent object store package.  Loading data
from files into RAM tends to be amazingly slow when the file and the
in-memory representation are both as big as the physical memory - and
my machine has no free memory slots :(
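The mmap arrangement is essentially this (a stripped-down sketch, not
my production code: the file name and record layout are placeholders,
the flag values are the usual Linux ones, and all error checking is
omitted):

   with Interfaces.C; use Interfaces.C;
   with System;

   procedure Mmap_Scan_Demo is

      --  Thin bindings to open(2) and mmap(2).
      O_RDONLY   : constant int := 0;
      PROT_READ  : constant int := 1;
      MAP_SHARED : constant int := 1;

      function C_Open (Path : char_array; Flags : int) return int;
      pragma Import (C, C_Open, "open");

      function C_Mmap
        (Addr            : System.Address;
         Length          : size_t;
         Prot, Flags, FD : int;
         Offset          : long) return System.Address;
      pragma Import (C, C_Mmap, "mmap");

      type Day_Record is record
         Price  : Float;
         Volume : Float;
      end record;

      --  One flat array covering every stock and every day.
      type Series is array (1 .. 10_000_000) of Day_Record;

      FD   : constant int := C_Open (To_C ("prices.dat"), O_RDONLY);
      Base : constant System.Address :=
        C_Mmap (System.Null_Address, size_t (Series'Size / 8),
                PROT_READ, MAP_SHARED, FD, 0);

      --  Overlay the Ada array directly on the mapped file: there is
      --  no explicit load step, and the pages stay in the OS page
      --  cache from one run to the next.
      Data : Series;
      for Data'Address use Base;

      Total : Long_Float := 0.0;

   begin
      for I in Series'Range loop
         Total := Total + Long_Float (Data (I).Volume);
      end loop;
   end Mmap_Scan_Demo;

--
Adrian Wrigley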