From: Dr Adrian Wrigley
Newsgroups: comp.lang.ada
Subject: Re: Problems with large records (GNAT) [continued]
Date: Thu, 01 Mar 2001 13:52:28 -0800
Message-ID: <3A9EC49C.9F5CD8D6@linuxchip.demon.co.uk>
References: <3A9E05B0.46B406ED@linuxchip.demon.co.uk>
Organization: ntl Cablemodem News Service

tmoran@acm.org wrote:

> I realize you are interested in the generic large-memory-with-Gnat
> problem, but, as they say, sometimes it's better to improve the
> algorithm than the hardware.

I have the same frustration with this problem as with things like
segmented memory architectures, short index registers, etc.  They all
tend to result in less robust code, or a lot more work.  Hitting one
of the various memory limits is one of the common problems I encounter
running GNAT/Linux.

I plan to go to (partial) intra-day data sometime, so that will need
a better representation.
...

> For historical data, stock prices can be 16 bit fixed point with
> delta of 1/8, rather than 32 bit floats (excluding
> Berkshire-Hathaway).  Even with prices in pennies nowadays, 24 bits
> should be quite enough for a stock price.  Similarly, a 24 bit
> Volume (16 million shares of one stock traded in one day) should
> normally be adequate, perhaps with an exception list for anything
> that doesn't fit.  A sixteen bit fixed point value, with suitable
> delta, should be fine for holding the split correction, or 24 bits
> if you really want to allow for even the most bizarre changes.

I decided that 16 bits was inadequate.  Even with prices in the range
$0.05 to $500, you need 20 bits to accommodate a delta representing 1%
at the bottom end: 1% of $0.05 is $0.0005, and $500 / $0.0005 is a
million steps, i.e. about 2**20.  Companies that have had a lot of
splits and dividends in their history have very small prices back in
the '70s.  Perhaps a 16-bit logarithm of the share price would be OK
(and it might even speed up volatility calculations!).

With volume, I think I really need better than a 32-bit range.  Once
you start to calculate weekly or monthly volumes, quite a number of
companies exceed 2**32 shares (and in some countries they even trade
fractional shares routinely).  Maybe you've seen the WWW sites of
historic data that show Intel's monthly share volume as things like
"-1518500200 shares" (a signed 32-bit counter wrapping around).  I
mentioned this problem to Yahoo nearly a year ago, but they haven't
fixed it.

When it comes down to it, it is a matter of confidence and simplicity.
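In Ada terms, the widths above come out something like this (a rough
sketch, not code from my application: the names are illustrative, and
the 21-bit price figure assumes GNAT rounds the requested delta down
to 2.0**(-11) and adds a sign bit):

   with Ada.Numerics.Elementary_Functions;
   use  Ada.Numerics.Elementary_Functions;

   procedure Price_Rep_Demo is

      --  1% resolution at $0.05 means a delta near 0.0005; over a
      --  $0.05 .. $500 range that is about a million steps, so 20
      --  magnitude bits plus a sign bit.
      type Price is delta 0.0005 range 0.0 .. 500.0;
      for Price'Size use 21;

      --  Weekly/monthly totals can exceed 2**32 shares, so go to
      --  64 bits (the "-1518500200 shares" figure is a signed
      --  32-bit counter wrapping around).
      type Volume is range 0 .. 2**63 - 1;
      for Volume'Size use 64;

      --  16-bit logarithmic price: constant *relative* resolution,
      --  about 0.014% per step over $0.05 .. $500.  Assumes the
      --  argument stays within that range.
      type Log_Price is mod 2**16;
      Min_Price : constant := 0.05;
      Scale     : constant Float :=
        Float (Log_Price'Last) / Log (500.0 / Min_Price);

      function Encode (P : Float) return Log_Price is
      begin
         return Log_Price (Log (P / Min_Price) * Scale);
      end Encode;

      function Decode (L : Log_Price) return Float is
      begin
         return Min_Price * Exp (Float (L) / Scale);
      end Decode;

      P : constant Price     := 1.375;          -- an eighth-based quote
      V : constant Volume    := 5_000_000_000;  -- comfortably > 2**32
      L : constant Log_Price := Encode (1.375);

   begin
      null;
   end Price_Rep_Demo;

A log encoding also turns the price ratios used in volatility
calculations into plain subtractions of the encoded values, which is
presumably where the speed-up would come from.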
Fixed point for this wide-ranging data doesn't give me the confidence
I want from a (mission-critical) financial application.  I hadn't
thought of using 24-bit values, but I think they would not be
worthwhile here given the issues involved.

> I don't know what kind of processing you are doing, but usually one
> processes a small number of complete time series, or the complete
> market for just a few days, so only a few rows or columns of the
> complete matrix need be in RAM at any one time.

That's why I want a very fast data access method...  I want to scan
all the stocks over all the times, and sometimes I access the data
sparsely as well.  With mmap, the data remain in RAM from one
invocation to the next, and can be completely scanned in only a few
seconds (something like the sketch below).  Maybe someday the Ada
standard will include a persistent object store package.  Loading data
from files into RAM tends to be amazingly slow when the file and the
in-memory representation are both as big as the physical memory - and
my machine has no free memory slots :(
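The mmap arrangement is essentially this (a stripped-down sketch, not
my production code: the file name and record layout are placeholders,
the flag values are the usual Linux ones, and all error checking is
omitted):

   with Interfaces.C; use Interfaces.C;
   with System;

   procedure Mmap_Scan_Demo is

      --  Thin bindings to open(2) and mmap(2).
      O_RDONLY   : constant int := 0;
      PROT_READ  : constant int := 1;
      MAP_SHARED : constant int := 1;

      function C_Open (Path : char_array; Flags : int) return int;
      pragma Import (C, C_Open, "open");

      function C_Mmap
        (Addr            : System.Address;
         Length          : size_t;
         Prot, Flags, FD : int;
         Offset          : long) return System.Address;
      pragma Import (C, C_Mmap, "mmap");

      type Day_Record is record
         Price  : Float;
         Volume : Float;
      end record;

      --  One flat array covering every stock and every day.
      type Series is array (1 .. 10_000_000) of Day_Record;

      FD   : constant int := C_Open (To_C ("prices.dat"), O_RDONLY);
      Base : constant System.Address :=
        C_Mmap (System.Null_Address, size_t (Series'Size / 8),
                PROT_READ, MAP_SHARED, FD, 0);

      --  Overlay the Ada array directly on the mapped file: there is
      --  no explicit load step, and the pages stay in the OS page
      --  cache from one run to the next.
      Data : Series;
      for Data'Address use Base;

      Total : Long_Float := 0.0;

   begin
      for I in Series'Range loop
         Total := Total + Long_Float (Data (I).Volume);
      end loop;
   end Mmap_Scan_Demo;

--
Adrian Wrigley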