From: Natasha Kerensikova <lithiumcat@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: Reading the whole standard input into a String
Date: Mon, 6 Jun 2011 10:46:20 +0000 (UTC)

Hello,

On 2011-06-06, Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> On Sun, 5 Jun 2011 16:20:39 +0000 (UTC), Natasha Kerensikova wrote:
>
>> However I still read
>> character by character
>
> You have to, because the definition of line end is language/OS/encoding
> dependent, so in order to detect line ends properly you need to scan
> characters one by one, maybe recoding them into the encoding used by the
> parser (e.g. UTF-8). It does not make much sense to read input by arbitrary
> size chunks. Read it line by line. If parser needs returns over the line
> margin (unlikely), then keep read lines cached.

The line end detection problem is exactly why I wanted unprocessed input
bytes. Each instance of the parser code can receive at least LF-terminated
lines (from Unix files) or CR+LF-terminated lines (from web forms), so
unless Ada.Text_IO can deal with both (and I guess I cannot really count
on it), I have to handle it in my own code.
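
To make that concrete, here is a rough sketch of the kind of scanning I
mean, working on raw input already held in a String. Next_Line is a name
made up for this example, and treating a lone CR as a line end is an
assumption on my part, not something either input source requires.

```ada
with Ada.Text_IO;

procedure Line_End_Demo is
   CR : constant Character := Character'Val (13);
   LF : constant Character := Character'Val (10);

   --  Hypothetical helper: starting at First, find one line in Raw.
   --  Last is the index of the last character before the terminator,
   --  Next is the index where the following line starts.
   procedure Next_Line
     (Raw   : in  String;
      First : in  Positive;
      Last  : out Natural;
      Next  : out Positive)
   is
      Pos : Positive := First;
   begin
      --  Skip everything that is neither CR nor LF.
      while Pos <= Raw'Last
        and then Raw (Pos) /= CR
        and then Raw (Pos) /= LF
      loop
         Pos := Pos + 1;
      end loop;

      Last := Pos - 1;

      if Pos > Raw'Last then
         Next := Pos;        --  no terminator: end of input
      elsif Raw (Pos) = CR
        and then Pos < Raw'Last
        and then Raw (Pos + 1) = LF
      then
         Next := Pos + 2;    --  CR LF counts as a single line end
      else
         Next := Pos + 1;    --  lone LF (or lone CR, by assumption)
      end if;
   end Next_Line;

   --  One LF-terminated line, one CR LF-terminated line, one unterminated.
   Sample : constant String := "unix" & LF & "web" & CR & LF & "tail";
   First  : Positive := Sample'First;
   Last   : Natural;
   Next   : Positive;
begin
   while First <= Sample'Last loop
      Next_Line (Sample, First, Last, Next);
      Ada.Text_IO.Put_Line ("[" & Sample (First .. Last) & "]");
      First := Next;
   end loop;
end Line_End_Demo;
```

The same loop handles both kinds of input without knowing in advance
which one it is getting, which is the whole point of wanting the bytes
unprocessed.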

>>  into a temporary buffer,
>
> Read it into the destination buffer.

Well the destination buffer for the processed text is a very inefficient
place to store input text, because the processing involves a lot of
insertions.

Moreover, because of the forward reference issue I detailed in another
post, I cannot see how I can escape the schema:
input stream --> temporary buffer --> output stream/buffer/storage

> Don't use Unbounded_String; that is a
> bad idea in almost all cases, this one included.

Would you explain why?

Unless there is a way to predict the length of the input, I need some
text container able to grow as much as needed while reading.

I will also need a similar container during the processing, and even
GNAT's improved Unbounded_String still does a lot of reallocations in
the process. So I was considering implementing my own container,
something like Chunked_Unbounded_String, which would allocate memory in
chunks of fixed size (probably provided by a generic package parameter,
on the order of a few kilobytes) and thereby greatly improve the
performance of many small Appends. But I guess you weren't calling
Unbounded_String a bad idea only because of performance, were you?
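
For reference, a minimal sketch of the chunked idea, nothing more. All
names here (Chunked_Strings, Chunked_String, Append, Length, To_String)
are invented for the example. Storing the chunks in
Ada.Containers.Doubly_Linked_Lists is only a convenience to keep it
short, and the tiny Chunk_Size is there to exercise chunk boundaries.

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Chunked_Demo is

   generic
      Chunk_Size : Positive;   --  characters per chunk
   package Chunked_Strings is
      type Chunked_String is limited private;
      procedure Append (Target : in out Chunked_String; Item : in String);
      function Length (Source : Chunked_String) return Natural;
      function To_String (Source : Chunked_String) return String;
   private
      subtype Chunk is String (1 .. Chunk_Size);
      package Chunk_Lists is new Ada.Containers.Doubly_Linked_Lists (Chunk);
      type Chunked_String is limited record
         Chunks : Chunk_Lists.List;
         Size   : Natural := 0;   --  total number of characters stored
      end record;
   end Chunked_Strings;

   package body Chunked_Strings is

      procedure Append (Target : in out Chunked_String; Item : in String) is
         Pos : Integer := Item'First;
      begin
         while Pos <= Item'Last loop
            declare
               --  Characters already used in the last chunk.
               Offset : constant Natural := Target.Size mod Chunk_Size;
               --  How much of Item fits in the current chunk.
               Count  : constant Positive :=
                 Integer'Min (Chunk_Size - Offset, Item'Last - Pos + 1);

               procedure Fill (C : in out Chunk) is
               begin
                  C (Offset + 1 .. Offset + Count) :=
                    Item (Pos .. Pos + Count - 1);
               end Fill;
            begin
               if Offset = 0 then
                  --  Last chunk is full (or there is none): add a new one.
                  Target.Chunks.Append (Chunk'(others => ' '));
               end if;
               Target.Chunks.Update_Element (Target.Chunks.Last, Fill'Access);
               Target.Size := Target.Size + Count;
               Pos := Pos + Count;
            end;
         end loop;
      end Append;

      function Length (Source : Chunked_String) return Natural is
      begin
         return Source.Size;
      end Length;

      function To_String (Source : Chunked_String) return String is
         Result : String (1 .. Source.Size);
         Done   : Natural := 0;

         procedure Copy (Position : Chunk_Lists.Cursor) is
            Count : constant Natural :=
              Integer'Min (Chunk_Size, Source.Size - Done);
         begin
            Result (Done + 1 .. Done + Count) :=
              Chunk_Lists.Element (Position) (1 .. Count);
            Done := Done + Count;
         end Copy;
      begin
         Source.Chunks.Iterate (Copy'Access);
         return Result;
      end To_String;

   end Chunked_Strings;

   package Small_Chunks is new Chunked_Strings (Chunk_Size => 4);
   Buffer : Small_Chunks.Chunked_String;
begin
   Small_Chunks.Append (Buffer, "Hello, ");
   Small_Chunks.Append (Buffer, "comp.lang.ada");
   Ada.Text_IO.Put_Line (Small_Chunks.To_String (Buffer));
   Ada.Text_IO.Put_Line (Natural'Image (Small_Chunks.Length (Buffer)));
end Chunked_Demo;
```

Each Append touches only the last chunk and allocates a new one when it
fills up, so text already stored is never copied again until To_String
is called.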


Thanks for your comments,
Natasha