comp.lang.ada
From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Subject: Re: Text_IO, was: Re: Something I don't understand
Date: Wed, 19 Feb 2014 23:45:47 +0200
Message-ID: <bmkn0bF2498U1@mid.individual.net>
In-Reply-To: <7gb1iv15zuh$.1qbeifwuyvuoa.dlg@40tude.net>

On 14-02-19 16:13 , Dmitry A. Kazakov wrote:
> On Wed, 19 Feb 2014 15:20:18 +0200, Niklas Holsti wrote:
> 
>> On 14-02-19 11:40 , Dmitry A. Kazakov wrote:
>>> On Wed, 19 Feb 2014 10:36:29 +0200, Niklas Holsti wrote:
>>>
>>>> On 14-02-18 11:31 , Dmitry A. Kazakov wrote:
>>>>> On Tue, 18 Feb 2014 11:00:44 +0200, Niklas Holsti wrote:
>>>
>>>>> To me text buffer, stream, file, string are all instances of the class of
>>>>> types over which Put dispatches. OK, we can call the abstract root type of
>>>>> the class "Text."
>>
>> My notion of type "Text" is an internal representation of text meant for
>> human reading and viewing. I don't see any logical need for making this
>> type a class; there would be only one predefined (and private) type.
> 
> Class is needed because there must be more than one implementation of the
> interface

Why? It is not expected that there should be more than one
implementation, in the same Ada program, of Ada.Containers.Vectors, for
example.

The "Text" type I am talking about is kind of a container for structured
text ("text" as in meaning (2) below).

> and because the interface itself needs to be extended in order to
> have Ada library implementations of reasonable size and complexity.

That does not convince me. Mere volume (e.g. number of operations) is
not, to me, a reason for splitting a coherent interface into different
packages.

>> By the way, perhaps the word "text" is ambiguous. I think it is time to
>> make a clear distinction between:
>>
>> (1) a text file (sometimes called an "ASCII file"), which is a sequence
>> of basic symbols (e.g. Character or Wide_Character) used to represent
>> *data* for either reading by another program, or for human reading
>> (without formatting), and
> 
> Sequence of symbols is a string.
> 
> Text is at least a sequence of strings (lines).

True for meaning (1), false for meaning (2) (see below).

Text (in meaning (2)) is a sequence of paragraphs (with more structure
at both higher and lower levels). Lines only appear after the text (2)
is rendered.

>> (2) a text meant only for human reading/viewing and therefore to be
>> rendered as nicely and readably as the chosen viewing device allows.
>> That some parts of the text can be seen as sequences of characters is
>> secondary, and the specific characters and their sequence can change
>> according to the rendering.
> 
> Rendered text is a text so long as the reader can reconstruct (1) from (2). So
> in effect (1) and (2) are equivalent in the sense that both are (1).

No, text (2) is *not* the result of rendering text (1). Text (2) is
logically structured text (probably some kind of tree) that has been
deliberately constructed to have structure.

There are, of course, forms of text (1), such as HTML or other mark-up
languages, which can represent text (2) and which can be interpreted
into text (2) which can then be rendered. It is also possible to extract
character strings from rendered text (2), discarding the structure --
for example, PDF-to-text conversion.

The existence of such interpretations or lossy conversions does not mean
that text (1) and text (2) are equivalent.

To make it very concrete: under Windows you use Notepad to create text
(1), but MS-Word to create text (2).

>> Ada.Text_IO implements mainly (1), with some basic support for
>> typewriter-style formatting (column spacing, line spacing, page tracking).
>>
>> The "Text" type I am talking about aims to be the internal
>> representation of (2), before rendering on some viewing device.
> 
> Why should anybody care about (2)?

Because users want to see nicely formatted output, and I am suggesting
that text (2) is a way to implement that in Ada, by separating the
logical construction of the text (2) from its rendering. As you say,
this is very like the idea of HTML, but HTML has other aims, too
(hypertext, web forms, ...).

I thought we (that is, you and I, perhaps not the other participants in
the thread) were discussing improving the ability to emit nicely
formatted, readable text from Ada programs. If not, I'll stop here.

> Why Text_IO should have anything to do with (1)?

Text (1) can also be processed with Sequential_IO, but usually text (1)
is divided into syntactical, meaningful tokens, such as keywords or
decimal numbers, and Text_IO provides (rudimentary) facilities for
generating such tokens on output, and lexically scanning and evaluating
such tokens on input.

The situation is the same in other programming languages, which all
provide such token-oriented input-output for text (1) files: for
example, printf/scanf in C.
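
As a minimal sketch of this token-level input-output in Ada (the
procedure name Token_Demo and the sample string are made up for
illustration), one integer token is scanned from a string and then
emitted again in a fixed-width field:

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Token_Demo is
   --  Hypothetical sample input: two integer tokens separated by blanks.
   Input : constant String := "  42 17";
   Value : Integer;
   Last  : Positive;
begin
   --  Lexically scan and evaluate the first integer token.
   --  Last is set to the index of the last character consumed.
   Ada.Integer_Text_IO.Get (From => Input, Item => Value, Last => Last);

   --  Generate a token on output, right-justified in a field of 6.
   Ada.Text_IO.Put ("Value =");
   Ada.Integer_Text_IO.Put (Item => Value, Width => 6);
   Ada.Text_IO.New_Line;

   --  Show what is left of the input after the scanned token.
   Ada.Text_IO.Put_Line ("Rest:" & Input (Last + 1 .. Input'Last));
end Token_Demo;
```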

> One of the issues Text_IO had was inference into text issues
> (e.g. pages etc). There is nothing wrong with pages except that 80% of
> formatting and editing does not care about pages.

That is true today, because output to printed paper sheets plays such a
small role today. Decades ago, when such output was very common at least
in information systems, pagination was quite important. Think Report
Generator, page headers/footers, page subtotals, page numbering... Of
course it can all be programmed without pagination support in Text_IO,
but that is the old argument about do-it-yourself vs. standard libraries
or standard language facilities.

OMG, this argument is old... Algol 60 left the definition of I/O
facilities to the implementation... Algol 68 put standard I/O for text
(1) into the language.

> The result is abstraction inversion, and you want to make it
> only worse.

I don't see that at all.

>>>> I'm thinking of two levels of "Put":
>>>>
>>>>    Put (To : in out Text; Item : in String);
>>>>       Add items to a Text, building a logically structured Text,
>>>>       but without rendering it yet. This will probably need
>>>>       some concept of "points in a Text where more stuff can
>>>>       be inserted" so that the Put can preserve or extend the
>>>>       logical Text structure.
>>>>
>>>>    Put (To : in out File; Item : in Text);
>>>>       Render the Text into some external File.
>>>>
>>>> The Text buffer intermediary means that each level of Put can (if
>>>> desired) be dispatching on one of the parameters, without needing
>>>> multiple dispatch.
>>>
>>> This was attempted before, many many times, actually. From PostScript to
>>> HTML, an intermediate language that would take care of separating higher
>>> level formatting from lower level rendering. It never worked, however many
>>> times it was tried.
>>
>> Uh... surely PostScript and HTML "work". I'm pretty sure that a large
>> fraction, perhaps even a majority of programs today generate most of
>> their human-readable output as HTML.
> 
> Which is why quality of text is so miserable and why the modern OS makes an
> i486 out of whatever many cores, gigahertz and terabytes monster you run it
> on.

You are changing the issue from "works" to "needs a lot of resources and
is misused". So you lose the argument :-)

> And of course generating HTML or parsing it is no way simpler than
> traditional formatting in any sense of that.

What do you mean by "traditional formatting"?

I'm not suggesting that the Ada programmer would write code to generate
HTML. The Ada programmer's code would create a "Text" (i.e. a text (2)),
which can be output as HTML, if it is emitted to a device/file/channel
which wants HTML. To other devices the same "Text" could be emitted in
other forms. A GUI toolkit could use its own text-rendering functions to
render the "Text" in a window.
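
A rough sketch of the two levels, under heavy assumptions: here a
"Text" is nothing more than a sequence of paragraphs, and the names
New_Paragraph, Output_Form and the Form parameter are hypothetical.
The explicit Form parameter merely stands in for whatever mechanism
(for example dispatching on the file or device) would choose the
output form.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Containers.Vectors;

procedure Text_Demo is

   package Paragraph_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Unbounded_String);

   --  Hypothetical "Text": just a sequence of paragraphs.
   type Text is record
      Paragraphs : Paragraph_Vectors.Vector;
   end record;

   type Output_Form is (Plain, HTML);  --  Assumed set of renderings.

   --  Start a new, empty paragraph at the end of the Text.
   procedure New_Paragraph (To : in out Text) is
   begin
      To.Paragraphs.Append (Null_Unbounded_String);
   end New_Paragraph;

   --  Level 1: add an item to the last paragraph, without rendering.
   procedure Put (To : in out Text; Item : in String) is
   begin
      if To.Paragraphs.Is_Empty then
         New_Paragraph (To);
      end if;
      To.Paragraphs.Replace_Element
        (To.Paragraphs.Last_Index,
         To.Paragraphs.Last_Element & Item);
   end Put;

   --  Level 2: render the whole Text to a file in the chosen form.
   procedure Put
     (To   : in Ada.Text_IO.File_Type;
      Item : in Text;
      Form : in Output_Form)
   is
   begin
      for P of Item.Paragraphs loop
         case Form is
            when Plain =>
               Ada.Text_IO.Put_Line (To, To_String (P));
               Ada.Text_IO.New_Line (To);
            when HTML =>
               Ada.Text_IO.Put_Line (To, "<p>" & To_String (P) & "</p>");
         end case;
      end loop;
   end Put;

   Report : Text;

begin
   --  The program only builds the logical structure ...
   Put (Report, "Total items: ");
   Put (Report, "42");
   New_Paragraph (Report);
   Put (Report, "Done.");

   --  ... and the same Text is then emitted in two different forms.
   Put (Ada.Text_IO.Standard_Output, Report, Plain);
   Put (Ada.Text_IO.Standard_Output, Report, HTML);
end Text_Demo;
```

The code that builds Report never mentions HTML; only the rendering
step knows about the output form.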

>>> And for sure, it will be even more hated than Text_IO page formatting is,
>>> because the overhead will be far bigger. Imagine describing the semantics
>>> of, say, conversion of File, Stream, String to Text and backward.
>>
>> Overhead compared to what?
> 
> Compared to direct dispatch to the implementation tailored for the given
> medium. Why do I need HTML in order to write a stream or memory string?

Stream or memory string are text (1), not text (2).

I'm not suggesting that Text_IO should be removed. Of course Ada
programs must be able to read and write files of text (1). It may even
be desirable to extend the token-level formatting abilities of Text_IO
or the Image functions, either by more parameters or by templates or
pictures.

But my main point is that if our goal is text formatted to modern
standards of appearance (with proportional fonts et cetera) then the
rendering cannot be made token by token, or Put (item) by Put (item);
the renderer must work on a whole text (2). Just as a browser must have
a whole HTML <table> in order to display any cell of the table in its
final form.
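
A small illustration of why the renderer needs the whole table, as a
sketch only (Column_Demo and the cell contents are made up): the
column width depends on every cell, so no cell can be emitted in its
final form until all of them are known.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Column_Demo is

   type Cell_List is array (Positive range <>) of Unbounded_String;

   --  Hypothetical one-column table.
   Cells : constant Cell_List :=
     (To_Unbounded_String ("id"),
      To_Unbounded_String ("7"),
      To_Unbounded_String ("a much longer cell"));

   Width : Natural := 0;

begin
   --  Pass 1: the width is the maximum over all cells, so it is
   --  not known until the last cell has been seen.
   for C of Cells loop
      Width := Natural'Max (Width, Length (C));
   end loop;

   --  Pass 2: only now can each cell be padded to its final width.
   for C of Cells loop
      Ada.Text_IO.Put_Line
        ("|" & Ada.Strings.Fixed.Head (To_String (C), Width) & "|");
   end loop;
end Column_Demo;
```

A Put-by-Put renderer would have had to emit "id" before knowing that
a much wider cell follows.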

>> If the need is to generate nicely formatted
>> output, rendered in device-specific ways, and typewriter formatting is
>> not enough, what is the alternative?
> 
> I don't understand the question. It is not about alternatives, the
> formatting must be done. It is about decomposition of the task into
> software components. I don't want any middlemen, especially such that are an
> order of magnitude more complex than direct formatting.

Can you give an example of what you mean by "direct formatting"?

>> The overheads of Text_IO are important only when processing large text
>> *data* files (meaning (1) of "text"). For generating human-readable text
>> (meaning (2)), especially in an interactive context, the overheads are
>> utterly negligible.
> 
> It is not completely true. Huge amounts of readable texts are processed
> without any human intervention. For those Text_IO performance is a big
> problem: Ada project compilation,

Ada source-code is clearly text (1), text as data, even if it is also
human-readable and human-writable. I agree that Text_IO overhead can be
significant in text (1) processing, as I said. I believe that GNAT, for
example, reads its source files by mapping the whole file into memory,
and does not use Text_IO, and not even the OS bulk read() function. (But
such tricks are of course system-specific, non-standard.)

> syntax highlighting, WYSIWYG text processing etc.

If you are talking about interactive applications, at most a screenful
of text is shown at a time, therefore Text_IO overheads would be
insignificant. But I think that this kind of application would not use
Text_IO anyway, because the formatting capabilities of Text_IO would not
be sufficient. My proposed "Text", that is text (2), should work better.

>> I don't see any need for converting a File/Stream *into* Text, unless
>> the File/Stream is a serialized representation of the full internal
>> structure of a Text object, in which case the File/Stream structure is
>> private and normal serialization/deserialization methods apply.
> 
> But your proposal was:
> 
>    procedure Get (From : in out Text; Value : out Integer);
>    procedure Get (From : in out File_Type; Item : out Text);

No, I proposed Put operations, not Get operations. "Text" (that is, text
(2)) is meant only for output, not for input. For input, use text (1).

Not all output is symmetric with input. Think about an audio device: it
is simple to generate audio containing any specified mix and score of
pure tones or instrument voices; it would be much more difficult to
recover the same score and instrument/voice parameters by digitizing and
analysing the generated audio signal. Even more so for the analogous
input/output of video.

>> I don't intend that the type "Text" should be so fancy and complete that
>> it could be used as such to implement an advanced word processor.
>> Following the same rationale as Ada.Containers, "Text" should provide as
>> much functionality as can be expected to be useful for (and used by)
>> many Ada programs and programmers, but programmers requiring high
>> performance or high/specific functionality would have to implement more
>> advanced "text" representations themselves.
> 
> And so you need it extensible, ergo, a class.

Not so, if we accept the same rationale as for Ada.Containers: they are
meant for use in applications without extreme or special demands on the
performance of the containers. The Ada container types are tagged types,
but you cannot create a better-performing or significantly more
functional container by deriving from a container type; you can only
extend the interface with new operations or override some operations to
work differently.

Ok, so that leads to ad-hoc polymorphism, which has some drawbacks, as
you have explained. But IMO not a show-stopper.
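
To illustrate the point about deriving from a container type, here is
a minimal sketch (the names Derive_Demo, Extended, Int_Vector and Sum
are hypothetical): the derivation adds an operation to the interface,
but the underlying representation and its performance stay exactly
those of the parent Vector.

```ada
with Ada.Text_IO;
with Ada.Containers.Vectors;

procedure Derive_Demo is

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   package Extended is
      --  Same representation as the parent; nothing is added to it.
      type Int_Vector is new Int_Vectors.Vector with null record;

      --  A new operation that extends the interface.
      function Sum (V : Int_Vector) return Integer;
   end Extended;

   package body Extended is
      function Sum (V : Int_Vector) return Integer is
         Total : Integer := 0;
      begin
         for I in V.First_Index .. V.Last_Index loop
            Total := Total + V.Element (I);
         end loop;
         return Total;
      end Sum;
   end Extended;

   V : Extended.Int_Vector;

begin
   --  Inherited operations work unchanged on the derived type.
   V.Append (10);
   V.Append (32);

   Ada.Text_IO.Put_Line ("Sum =" & Integer'Image (V.Sum));
end Derive_Demo;
```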

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .
