From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=0.2 required=5.0 tests=BAYES_00,INVALID_MSGID,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,5997b4b7b514f689
X-Google-Attributes: gid103376,public
From: Rex Reges
Subject: Re: Reading a line of arbitrary length
Date: 1997/02/14
Message-ID: <3304BCFB.207B@mds.lmco.com>#1/1
X-Deja-AN: 218814004
References: <5ds40o$rpo@fg70.rz.uni-karlsruhe.de> <33032AE2.666F@mds.lmco.com> <33037A74.44AF@mds.lmco.com>
Content-Type: text/plain; charset=us-ascii
Organization: M&DS
Mime-Version: 1.0
Reply-To: rex.r.reges@lmco.com
Newsgroups: comp.lang.ada
X-Mailer: Mozilla 3.0 (Win95; I)
Date: 1997-02-14T00:00:00+00:00
List-Id:

Robert Dewar wrote:
> If your point is that it is non-trivial to ensure efficient execution of
> programs that use variable length strings that are millions of characters
> long, then yes, that is true, it is non-trivial, and also non-important!
>

What you say about very large variable length strings is true. I did
not clearly express what I was talking about: one very long string,
and that string is of fixed length.

The case I was thinking of is a distributed real-time simulator. It
was distributed in that it consisted of several Ada mains which may or
may not run on the same CPUs within the same computer. It was
real-time in that some of the processes had to repeatedly complete a
specific amount of work within set frames of 120 Hz, 60 Hz and 30 Hz.

The problem was how to pass string information quickly. This string
information consisted of error messages, symbol names, and commands.

The solution used was to create a string heap that was similar to the
regular heap, except that persistent strings were compressed and that
each process could add to its string heap from entries in another
process's string heap.
Integer IDs for strings were provided at string creation time and were
guaranteed to be unique among all the processes. The code which used
the string heap was unaware of the compression and communication
mechanisms. This allowed strings to be passed around as if using
access types, but provided global distributed strings.

The large string issue arose in the string heap itself. A single large
fixed-length string was used as the heap. Start index and size were
used to keep track of each string heap entry. It was possible to save
the string heap to a file and then restore it later. This was useful
for the symbol table data, which doesn't change from one execution to
another. This caused a problem on the Vax: not being able to read in a
5-million-character string in one chunk.

I've left out a lot of details which are hopefully irrelevant to this
discussion, but I would like to mention the compression needs. The
symbol information consisted of 70,000 symbols which were fully
qualified Ada names. It also included the enumeration values for those
symbols which were enumeration types. This information alone was over
5 megabytes. The compression mechanism was simple: a vocabulary of
frequently occurring substrings was created, then each symbol was
saved as an array of vocabulary entries.

The constraints of the real-time response precluded the usual fixes
like storing strings in a file. It was necessary to keep the memory
usage to a minimum so that the programs could be locked in memory to
prevent page swapping.

It's good news to hear about the pattern matching! Up to now I've been
satisfied writing string tools as needed, but the general feeling I
got from other Ada programmers is that Ada doesn't do strings.

-- 
Rex Reges               or you can call me The Fixer
Systems Analyst         or you can call me The Lawyer
Lockheed Martin, M&DS   or you can call me The Doctor
(610)354-5047           or you can call me Rexasaurus
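
For readers who want something concrete, the string heap and the
vocabulary compression described in the post can be sketched roughly as
follows. This is an illustration only, written in Python rather than
Ada, and it is not the original code. The names StringHeap, add, get,
compress and expand are hypothetical. The sketch assumes that IDs are
plain counters local to one process (the post does not say how
cross-process uniqueness was achieved) and that the vocabulary is built
from the dot-separated components of each name (the post only says
"frequently occurring substrings").

```python
class StringHeap:
    """One fixed-length buffer; each entry is tracked by (start, size)."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)  # the single fixed-length "string"
        self.used = 0                      # index of the next free byte
        self.entries = {}                  # string ID -> (start, size)
        self.next_id = 1                   # assumption: process-local counter

    def add(self, text):
        """Copy text into the buffer and return its integer ID."""
        data = text.encode("ascii")
        if self.used + len(data) > len(self.buffer):
            raise MemoryError("string heap is full")
        start = self.used
        self.buffer[start:start + len(data)] = data
        self.used += len(data)
        string_id = self.next_id
        self.next_id += 1
        self.entries[string_id] = (start, len(data))
        return string_id

    def get(self, string_id):
        """Return the string stored under string_id."""
        start, size = self.entries[string_id]
        return self.buffer[start:start + size].decode("ascii")


def compress(name, vocabulary, index):
    """Encode a dotted name as a list of vocabulary entry numbers.

    Assumption: every dot-separated component becomes a vocabulary
    entry, whether or not it is frequent.
    """
    codes = []
    for part in name.split("."):
        if part not in index:
            index[part] = len(vocabulary)
            vocabulary.append(part)
        codes.append(index[part])
    return codes


def expand(codes, vocabulary):
    """Rebuild the dotted name from its vocabulary entry numbers."""
    return ".".join(vocabulary[code] for code in codes)


if __name__ == "__main__":
    heap = StringHeap(capacity=64)
    message_id = heap.add("Radar.Mode")
    print(message_id, heap.get(message_id))

    vocabulary, index = [], {}
    codes = compress("Sim.Radar.Mode", vocabulary, index)
    print(codes, expand(codes, vocabulary))
```

Callers only ever hold the integer ID, which is what lets the storage
layout, compression, and transport change without touching the code
that uses the strings. Names that share a prefix, such as the
components of fully qualified Ada names, reuse the same vocabulary
entries, which is where the space saving in this sketch comes from.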