File output and buffering

comp.lang.ada

* File output and buffering
@ 2008-08-19 20:27 Maciej Sobczak
  2008-08-20  6:45 ` Georg Bauhaus
  2008-08-20  8:43 ` Maciej Sobczak
  0 siblings, 2 replies; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-19 20:27 UTC (permalink / raw)


It seems to me that the file output in standard Ada library is not
buffered:
1. There is no buffer-related operation in the whole library.
2. The semantics of output operations is defined in terms of the
effects on external file.
3. The performance of simple test is consistent with what can be
obtained in equivalent C code that flushes the channel after every
operation (ie. some 15-20x slower than with default buffering).

Let's suppose that I want to add buffering to my output. I can write
the stream type that does the necessary magic, but how can I reuse the
formatting machinery that is already available in Ada.Text_IO and
related packages?

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: File output and buffering
  2008-08-19 20:27 File output and buffering Maciej Sobczak
@ 2008-08-20  6:45 ` Georg Bauhaus
  2008-08-20  8:43 ` Maciej Sobczak
  1 sibling, 0 replies; 20+ messages in thread
From: Georg Bauhaus @ 2008-08-20  6:45 UTC (permalink / raw)

Maciej Sobczak wrote:

> Let's suppose that I want to add buffering to my output. I can write
> the stream type that does the necessary magic, but how can I reuse the
> formatting machinery that is already available in Ada.Text_IO and
> related packages?

Some formatting procedures from {Number}_IO and from Editing
can write to a String instead of to a File_Type.
Can you stream the strings to a buffer?

There is an article on AdaPower entitled something like
"How to access memory as a String". I think it will
illustrate reliable tricks, possibly of some use when
handling data in the "external" world.

In any case, char_array values are good for OS
procedures of names like write, read, and so on.

-- 
Georg Bauhaus
Y A Time Drain  http://www.9toX.de


* Re: File output and buffering
  2008-08-19 20:27 File output and buffering Maciej Sobczak
  2008-08-20  6:45 ` Georg Bauhaus
@ 2008-08-20  8:43 ` Maciej Sobczak
  2008-08-20  8:59   ` Maciej Sobczak
  1 sibling, 1 reply; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-20  8:43 UTC (permalink / raw)

On 19 Aug, 22:27, Maciej Sobczak <see.my.homep...@gmail.com> wrote:

> It seems to me that the file output in standard Ada library is not
> buffered:
> 1. There is no buffer-related operation in the whole library.
> 2. The semantics of output operations is defined in terms of the
> effects on external file.
> 3. The performance of simple test is consistent with what can be
> obtained in equivalent C code that flushes the channel after every
> operation (ie. some 15-20x slower than with default buffering).

Now I'm puzzled, because it looks like the files are written in chunks
of 32kB. In other words, nothing is written to the file until the
total output accumulated to 32kB and the step is preserved for each
future write - this indicates that the buffering is actually in use.

My original observations become questions:

1. Why is there no buffer-related operation in the whole library?
In particular: how can I *flush* the buffer?
This is very important for log files. I have discovered this exactly
because the log is not written synchronously with Put operations,
which makes it "a bit" less useful. How can I make sure that what I
Put is actually written? Closing a file after each Put does not make
much sense.

2. What about the semantics of Put?

3. Why is buffered Ada.Text_IO as slow as non-buffered C's stdio? Who
is eating the 20x factor?

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada


* Re: File output and buffering
  2008-08-20  8:43 ` Maciej Sobczak
@ 2008-08-20  8:59   ` Maciej Sobczak
  2008-08-20  9:21     ` Dmitry A. Kazakov
  2008-08-20 13:19     ` Georg Bauhaus
  0 siblings, 2 replies; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-20  8:59 UTC (permalink / raw)


On 20 Aug, 10:43, Maciej Sobczak <see.my.homep...@gmail.com> wrote:

I will answer myself:

> 1. Why is there no buffer-related operation in the whole library?

Heh, there is.

> In particular: how can I *flush* the buffer?

By calling Ada.Text_IO.Flush.

Which means that Georg Bauhaus fell into the trap of my confusion. :-)

Still valid question:

> 3. Why is buffered Ada.Text_IO as slow as non-buffered C's stdio? Who
> is eating the 20x factor?

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada




* Re: File output and buffering
  2008-08-20  8:59   ` Maciej Sobczak
@ 2008-08-20  9:21     ` Dmitry A. Kazakov
  2008-08-20 14:44       ` Maciej Sobczak
  2008-08-20 13:19     ` Georg Bauhaus
  1 sibling, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-20  9:21 UTC (permalink / raw)

On Wed, 20 Aug 2008 01:59:52 -0700 (PDT), Maciej Sobczak wrote:

> On 20 Aug, 10:43, Maciej Sobczak <see.my.homep...@gmail.com> wrote:
> 
> Still valid question:
> 
>> 3. Why is buffered Ada.Text_IO as slow as non-buffered C's stdio? Who
>> is eating the 20x factor?

Because of page formatting, I suggest. You can use text streams instead.
[But don't use String'Write! Although, the newest GNAT optimized that,
AFAIK.]

BTW, buffering does not make I/O faster. It obviously does the opposite.
Certainly, you didn't mean the "last-mile" buffer held by the driver, which
is usually inaccessible. In some older OSes you could directly write from a
user buffer mapped by the kernel, have record files etc. That was *fast*.
But then came C, Unix and Co., you know... (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: File output and buffering
  2008-08-20  8:59   ` Maciej Sobczak
  2008-08-20  9:21     ` Dmitry A. Kazakov
@ 2008-08-20 13:19     ` Georg Bauhaus
  2008-08-20 14:41       ` Maciej Sobczak
  1 sibling, 1 reply; 20+ messages in thread
From: Georg Bauhaus @ 2008-08-20 13:19 UTC (permalink / raw)


Maciej Sobczak wrote:

>> In particular: how can I *flush* the buffer?
> 
> By calling Ada.Text_IO.Flush.
> 
> Which means that Georg Bauhaus fell into the trap of my confusion. :-)

Sort of, but, as you say, the issue remains.
> Still valid question:
> 
>> 3. Why is buffered Ada.Text_IO as slow as non-buffered C's stdio? Who
>> is eating the 20x factor?


Text_IO is demonstrably slow. There are some speedy
shortcuts in the GNAT implementation of Put (e.g. Write_Buf).
But AFAICS there is (and has to be) a lot of protecting code
around the OS calls.

Using the following stupid programs for comparison,
and using strace, I get 3370 calls to write(2) from C,
but 50_000 from both C++ and Ada. Among other things open
to speculation (or open to inspection).  There are 4622
different lines in the 50_000 lines of output.
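A quick arithmetic check on the C figure, as a sketch under assumptions rather than a measurement: each line is 68 characters plus a newline, i.e. 69 bytes, so 50_000 lines come to 3_450_000 bytes. If stdio hands the data to write(2) in completely filled 1024-byte chunks (the buffer size is an assumption here, it is not stated in the post), the expected number of calls is 3370, which agrees with the strace count. The helper name expected_writes is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of write calls needed to push total_bytes
   through a buffer of buf_size bytes, assuming every call except possibly
   the last one carries a completely full buffer. */
size_t expected_writes(size_t total_bytes, size_t buf_size)
{
    assert(buf_size > 0);
    return (total_bytes + buf_size - 1) / buf_size;  /* ceiling division */
}
```

With buf_size set to 69 (one write per line) the same formula gives 50_000, the figure reported for the C++ and Ada programs.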

I think that if you have a formatted (constrained) string,
system I/O using fputs and flush might be a lot faster
(modulo threading issues).


#include <stdio.h>

int main()
{
    char s[68 + 1] =
"********************************************************************";

    for (int k = 0; k < 50000; ++k)
    {
        s[k % 68] = (char)(33 + k % 67);
        fputs(s, stdout), fputc('\n', stdout);
    }
    return 0;
}



#include <iostream>
#include <string>

int main()
{
    std::string s =
"********************************************************************";

    for (int k = 0; k < 50000; ++k)
    {
        s[k % 68] = static_cast<char>(33 + k % 67);
        std::cout << s << std::endl;
    }
    return 0;
}

with Ada.Text_IO;
procedure Ada_Wrt is
   S: String := (1 .. 68 => '*');
begin
   for K in 0 .. 50_000 - 1 loop
      S(1 + K rem 68) := Character'Val(33 + K rem 67);
      Ada.Text_IO.Put_Line(S);
   end loop;
end Ada_Wrt;

--
Georg Bauhaus
Y A Time Drain  http://www.9toX.de




* Re: File output and buffering
  2008-08-20 13:19     ` Georg Bauhaus
@ 2008-08-20 14:41       ` Maciej Sobczak
  0 siblings, 0 replies; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-20 14:41 UTC (permalink / raw)

On 20 Aug, 15:19, Georg Bauhaus <rm.dash-bauh...@futureapps.de> wrote:

> Using the following stupid programs for comparison,
> and using strace, I get 3370 calls to write(2) from C,
> but 50_000 from both C++ and Ada.

The C++ part can be explained by the fact that you did not use it
properly.

>         std::cout << s << std::endl;

Try this instead:

std::cout << s << '\n';

The difference is that std::endl performs *two* actions on the given
stream: it inserts the newline and... flushes. If you intend to only
insert the newline character, do what you mean. It is even less
typing.

(yes, 99% of "benchmarks" available on the web are broken for the same
reason)

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada


* Re: File output and buffering
  2008-08-20  9:21     ` Dmitry A. Kazakov
@ 2008-08-20 14:44       ` Maciej Sobczak
  2008-08-20 15:39         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-20 14:44 UTC (permalink / raw)


On 20 Aug, 11:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> BTW, buffering does not make I/O faster. It obviously does the opposite.

You must be using some strange timer or a specially distorted
definition of I/O.

Buffering allows to minimize the overhead that is there per each
physical output operation. If you can produce the same amount of data
but with less operations, then the total overhead is smaller.
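The grouping described here can be sketched in C as a small coalescing buffer. This is only an illustration under assumptions: the type out_buffer, the functions out_init, out_put, out_flush, counting_sink and count_sink_calls, and the 1024-byte capacity are all invented for this sketch, and the "sink" stands in for whatever the physical output operation is (for example write(2)). Nothing runs in the background; the sink is simply called less often.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_CAPACITY = 1024 };   /* assumed size, chosen for illustration */

/* Stand-in for a physical output operation (for example write(2)). */
typedef void (*sink_fn)(const char *data, size_t len, void *ctx);

typedef struct {
    char    data[BUF_CAPACITY];
    size_t  used;       /* bytes currently held in data */
    sink_fn sink;       /* called once per physical operation */
    void   *ctx;        /* passed through to sink unchanged */
} out_buffer;

void out_init(out_buffer *b, sink_fn sink, void *ctx)
{
    assert(b != NULL && sink != NULL);
    b->used = 0;
    b->sink = sink;
    b->ctx  = ctx;
}

/* Hand everything collected so far to the sink in one operation. */
void out_flush(out_buffer *b)
{
    if (b->used > 0) {
        b->sink(b->data, b->used, b->ctx);
        b->used = 0;
    }
}

/* Append len bytes, flushing only when the buffer becomes full. */
void out_put(out_buffer *b, const char *src, size_t len)
{
    while (len > 0) {
        size_t room = BUF_CAPACITY - b->used;
        size_t n = len < room ? len : room;
        memcpy(b->data + b->used, src, n);
        b->used += n;
        src += n;
        len -= n;
        if (b->used == BUF_CAPACITY)
            out_flush(b);
    }
}

/* Example sink that only counts how often it is called. */
void counting_sink(const char *data, size_t len, void *ctx)
{
    (void)data;
    (void)len;
    ++*(size_t *)ctx;
}

/* Usage sketch: push `lines` lines of `line_len` bytes each through the
   buffer and report how many physical operations that took. */
size_t count_sink_calls(size_t lines, size_t line_len)
{
    char line[128];
    size_t calls = 0;
    out_buffer b;

    assert(line_len > 0 && line_len <= sizeof line);
    memset(line, '*', line_len);
    line[line_len - 1] = '\n';

    out_init(&b, counting_sink, &calls);
    for (size_t k = 0; k < lines; ++k)
        out_put(&b, line, line_len);
    out_flush(&b);          /* do not forget the partially filled tail */
    return calls;
}
```

For 50000 lines of 69 bytes, the shape of Georg Bauhaus's test output, count_sink_calls(50000, 69) yields 3370 sink calls instead of 50000. Whether that saves time depends on how expensive each sink call really is.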

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada




* Re: File output and buffering
  2008-08-20 14:44       ` Maciej Sobczak
@ 2008-08-20 15:39         ` Dmitry A. Kazakov
  2008-08-21  7:10           ` Maciej Sobczak
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-20 15:39 UTC (permalink / raw)


On Wed, 20 Aug 2008 07:44:48 -0700 (PDT), Maciej Sobczak wrote:

> On 20 Aug, 11:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> 
>> BTW, buffering does not make I/O faster. It obviously does the opposite.
> 
> You must be using some strange timer or a specially distorted
> definition of I/O.
> 
> Buffering allows to minimize the overhead that is there per each
> physical output operation.

Buffering is used to make I/O in an asynchronous and/or conveyered way.
That does not make I/O faster in terms of latencies.

Any language buffer on top of numerous layered buffers, typical for an OS,
adds nothing, but overhead.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: File output and buffering
  2008-08-20 15:39         ` Dmitry A. Kazakov
@ 2008-08-21  7:10           ` Maciej Sobczak
  2008-08-21  9:24             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-21  7:10 UTC (permalink / raw)

On 20 Aug, 17:39, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> Buffering is used to make I/O in an asynchronous and/or conveyered way.

No, it is not asynchronous. Nothing happens in the background, the
operations are only grouped. The group is (usually) transmitted in the
synchronous fashion.

I do not know what is "conveyered".

> That does not make I/O faster in terms of latencies.

It does make it faster in terms of throughput.

Note: I do not imply that throughput is more valuable for optimization
than latency - these can be different goals and usually are.

> Any language buffer on top of numerous layered buffers, typical for an OS,
> adds nothing, but overhead.

It can reduce the overhead that is associated with the number of
requests. System calls are not free and there is also a significant
latency of the medium that is better to be avoided (like network
roundtrips or disk seek times).

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada


* Re: File output and buffering
  2008-08-21  7:10           ` Maciej Sobczak
@ 2008-08-21  9:24             ` Dmitry A. Kazakov
  2008-08-21 20:54               ` Maciej Sobczak
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-21  9:24 UTC (permalink / raw)

On Thu, 21 Aug 2008 00:10:52 -0700 (PDT), Maciej Sobczak wrote:

> On 20 Aug, 17:39, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> 
>> Buffering is used to make I/O in an asynchronous and/or conveyered way.
> 
> No, it is not asynchronous. Nothing happens in the background, the
> operations are only grouped. The group is (usually) transmitted in the
> synchronous fashion.
> 
> I do not know what is "conveyered".

Pipelined processing. When you refer to throughput, then it is increased
only because of existence of hidden conveyers, which ultimately always
boils down to some asynchronously working elements.

>> That does not make I/O faster in terms of latencies.
> 
> It does make it faster in terms of throughput.
> 
> Note: I do not imply that throughput is more valuable for optimization
> than latency - these can be different goals and usually are.
> 
>> Any language buffer on top of numerous layered buffers, typical for an OS,
>> adds nothing, but overhead.
> 
> It can reduce the overhead that is associated with the number of
> requests. System calls are not free and there is also a significant
> latency of the medium that is better to be avoided (like network
> roundtrips or disk seek times).

Well, here we need to clarify what is the I/O end point. When you say
"system call" it presumes that the end point is the driver. Let us fix it.
Now, the next question is where coalescing/pipelining is to happen. See
where it goes? Is the driver's interface a stream of units or else, also,
of blocks of units?

Case A. There is no back door to the driver, you have only a stream. What
can buffering add? Nothing, but overhead.

Case B. There is a back door for pushing bigger chunks of units. Then use
it in your application and it will go *faster* than whatever buffered
interface on top of the same thing!

Note also that A and B usually refer different protocol layers. It is
common to put a stream layer onto something block-oriented beneath, and
reverse. That stream is buffering and necessarily an overhead. Buffering is
always overhead. We buy it only because the alternative is inaccessible,
like to do DMA transfers from the application. But a language library is in
the *same* position as the application, so buffering there would gain
nothing, *from* performance perspective.

Ada.Text_IO is slow because of the buffering it does in order to implement
a protocol (pages) which you do not need. Classic abstraction inversion
case.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: File output and buffering
  2008-08-21  9:24             ` Dmitry A. Kazakov
@ 2008-08-21 20:54               ` Maciej Sobczak
  2008-08-21 21:27                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-21 20:54 UTC (permalink / raw)

On 21 Aug, 11:24, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> > I do not know what is "conveyered".
>
> Pipelined processing. When you refer to throughput, then it is increased
> only because of existence of hidden conveyers, which ultimately always
> boils down to some asynchronously working elements.

No, there is no asynchronous processing there (usually). There is
grouping that leads to smaller number of still synchronous operations.

> Well, here we need to clarify what is the I/O end point.

No, we do not need to, especially when it is already clear that we
would spiral down in an endless philosophy discussion about
definitions.

It is enough to get a clock and measure two simple test programs.
I can offer the test programs if needed.

> Ada.Text_IO is slow because of the buffering it does in order to implement
> a protocol (pages) which you do not need.

I do not see how paging could be related here.
Or at least I can imagine an implementation where the overhead of
bookkeeping pages is less than 15-20x.

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada


* Re: File output and buffering
  2008-08-21 20:54               ` Maciej Sobczak
@ 2008-08-21 21:27                 ` Dmitry A. Kazakov
  2008-08-22 11:53                   ` Maciej Sobczak
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-21 21:27 UTC (permalink / raw)

On Thu, 21 Aug 2008 13:54:25 -0700 (PDT), Maciej Sobczak wrote:

> On 21 Aug, 11:24, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> 
>>> I do not know what is "conveyered".
>>
>> Pipelined processing. When you refer to throughput, then it is increased
>> only because of existence of hidden conveyers, which ultimately always
>> boils down to some asynchronously working elements.
> 
> No, there is no asynchronous processing there (usually). There is
> grouping that leads to smaller number of still synchronous operations.

"Still synchronous operations" of items in the group? Come on, grouping
brings nothing if items are output synchronously to the caller. Coalescing
helps if and only if individual items in the group are output
asynchronously to the caller and to the receiver. In other words when the
interested parties re-synchronize only at the ends of a group. In which
state relatively to the output is the caller between the ends of a group?

> It is enough to get a clock and measure two simple test programs.
> I can offer the test programs if needed.

No thanks. We are actually paid for designing such tests, so we have plenty
of.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: File output and buffering
  2008-08-21 21:27                 ` Dmitry A. Kazakov
@ 2008-08-22 11:53                   ` Maciej Sobczak
  2008-08-22 13:22                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-22 11:53 UTC (permalink / raw)


On 21 Aug, 23:27, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> "Still synchronous operations" of items in the group? Come on, grouping
> brings nothing if items are output synchronously to the caller.

Of course it brings a lot - it minimizes the total overhead due to
smaller number of requests.

Ever tried to send each character in a separate mail instead of
sending one mail containing many characters?

> In which
> state relatively to the output is the caller between the ends of a group?

Why should I care? Sometimes I care only about throughput.

> > It is enough to get a clock and measure two simple test programs.
> > I can offer the test programs if needed.
>
> No thanks. We are actually paid for designing such tests, so we have plenty
> of.

Then why do you try so hard to distort this discussion?

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada




* Re: File output and buffering
  2008-08-22 11:53                   ` Maciej Sobczak
@ 2008-08-22 13:22                     ` Dmitry A. Kazakov
  2008-08-22 21:41                       ` Maciej Sobczak
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-22 13:22 UTC (permalink / raw)

On Fri, 22 Aug 2008 04:53:56 -0700 (PDT), Maciej Sobczak wrote:

> On 21 Aug, 23:27, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> 
>> "Still synchronous operations" of items in the group? Come on, grouping
>> brings nothing if items are output synchronously to the caller.
> 
> Of course it brings a lot - it minimizes the total overhead due to
> smaller number of requests.
> 
> Ever tried to send each character in a separate mail instead of
> sending one mail containing many characters?

It seems that you didn't read my posts. One last try. In your example, when
characters of a message are sent *synchronously* (assuming E-mail as the
transport layer, no back doors, etc) then each single character has to be
sent as a reply to the answer to the earlier mail. The very ability to send
multiple characters in one mail means that they are sent in parallel =
asynchronously. Compare it to parallel vs. serial communication. For the
rest see 

   http://en.wikipedia.org/wiki/Buffer_%28telecommunication%29

Note the category of the article, read the purposes of buffering.

>> In which
>> state relatively to the output is the caller between the ends of a group?
> 
> Why should I care?

Because it debunks your claim that the transfer of individual items is
synchronous. It is asynchronous, when makes sense.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: File output and buffering
  2008-08-22 13:22                     ` Dmitry A. Kazakov
@ 2008-08-22 21:41                       ` Maciej Sobczak
  2008-08-23 10:25                         ` Dmitry A. Kazakov
       [not found]                         ` <Q7adnfmCI6Ly6S3VnZ2dnUVZ_jOdnZ2d@earthlink.com>
  0 siblings, 2 replies; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-22 21:41 UTC (permalink / raw)

On 22 Aug, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> It seems that you didn't read my posts.

I've read them, but did not understand.

> One last try. In your example, when
> characters of a message are sent *synchronously* (assuming E-mail as the
> transport layer, no back doors, etc) then each single character has to be
> sent as a reply to the answer to the earlier mail.

Then we have a different notion of "synchronously".
When I write something to the file, the operation is synchronous when
the program *waits* for the transfer to complete.

> The very ability to send
> multiple characters in one mail means that they are sent in parallel =
> asynchronously.

Then we have a different notion of "asynchronously".
When I write something to the file, the operation is asynchronous when
the program can continue while the transfer is being handled.

And we have also a different notion of "parallel".
When I send a mail, it is transferred serially over a network cable.
The longer is the mail the longer it takes (hint: with parallel
communication the time of transmission would not depend on the number
of characters in the mail, since they would be sent, well, in
parallel).

> Compare it to parallel vs. serial communication.

Nothing to compare.

> For the
> rest see
>
>    http://en.wikipedia.org/wiki/Buffer_%28telecommunication%29

Short, but nice. Especially point d).

> Note the category of the article, read the purposes of buffering.

Yes, the purpose d) is what I'm talking about. I use buffers to group
data into smaller number of bigger units. This is where the
performance gain comes from.

> Because it debunks your claim that the transfer of individual items is
> synchronous. It is asynchronous, when makes sense.

No, it is synchronous, since the program has to wait until the
transfer completes (if the transfer is triggered at all - the buffer
makes that happen less frequently).

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada


* Re: File output and buffering
  2008-08-22 21:41                       ` Maciej Sobczak
@ 2008-08-23 10:25                         ` Dmitry A. Kazakov
  2008-08-23 13:41                           ` Steve
       [not found]                         ` <Q7adnfmCI6Ly6S3VnZ2dnUVZ_jOdnZ2d@earthlink.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-23 10:25 UTC (permalink / raw)

On Fri, 22 Aug 2008 14:41:18 -0700 (PDT), Maciej Sobczak wrote:

> On 22 Aug, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> 
>> One last try. In your example, when
>> characters of a message are sent *synchronously* (assuming E-mail as the
>> transport layer, no back doors, etc) then each single character has to be
>> sent as a reply to the answer to the earlier mail.
> 
> Then we have a different notion of "synchronously".
> When I write something to the file, the operation is synchronous when
> the program *waits* for the transfer to complete.

The transfer of the group, not the transfers of the individual items of.

>> The very ability to send
>> multiple characters in one mail means that they are sent in parallel =
>> asynchronously.
> 
> Then we have a different notion of "asynchronously".
> When I write something to the file, the operation is asynchronous when
> the program can continue while the transfer is being handled.

That is the same notion. Asynchronous = not synchronous. The semantics of a
transfer of a group of items does not depend on the order and exact timing
of the transfers of individual items. If any, because they might be not
transferred at all. Consider protocols which recode the group, digital
fountains, etc.

> And we have also a different notion of "parallel".
> When I send a mail, it is transferred serially over a network cable.

Wrong, they are printed and then sent per pigeon post. 

You have defined the transport layer as E-mail. That's it. Don't make
suggestions about how E-mail might work, there are lots of ways.

> The longer is the mail the longer it takes (hint: with parallel
> communication the time of transmission would not depend on the number
> of characters in the mail, since they would be sent, well, in
> parallel).

Nope I have a huge rack of multiplexed modems installed in the cellar.

You again make assumptions about possible implementations of the transport
layer, which weren't there when you presented the example. If the transport
were rather a synchronous bytes stream, then buffering obviously would
bring *nothing* to the throughput.

>>    http://en.wikipedia.org/wiki/Buffer_%28telecommunication%29
> 
> Short, but nice. Especially point d).

Right, it says "operated on as a unit", read my previous posts. Who
operates them as "a unit"? You need an independent asynchronous agent
capable to do so, otherwise it is not a unit. If you have such an agent,
and you can talk to it in terms of such units, then that is *without*
buffering, and it is faster than anything else. The purpose of d) is to
collect, it is merely an adapter between two protocols. Layered protocols
are always slower.

>> Note the category of the article, read the purposes of buffering.
> 
> Yes, the purpose d) is what I'm talking about. I use buffers to group
> data into smaller number of bigger units. This is where the
> performance gain comes from.

No, it is where you lose performance, because I just send bigger units
directly.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: File output and buffering
  2008-08-23 10:25                         ` Dmitry A. Kazakov
@ 2008-08-23 13:41                           ` Steve
  2008-08-23 14:33                             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 20+ messages in thread
From: Steve @ 2008-08-23 13:41 UTC (permalink / raw)

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:yiw2f938342v.xzb47swyx5h4$.dlg@40tude.net...
> On Fri, 22 Aug 2008 14:41:18 -0700 (PDT), Maciej Sobczak wrote:
>
>> On 22 Aug, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>> wrote:
>>
>>> One last try. In your example, when
>>> characters of a message are sent *synchronously* (assuming E-mail as the
>>> transport layer, no back doors, etc) then each single character has to 
>>> be
>>> sent as a reply to the answer to the earlier mail.
>>
>> Then we have a different notion of "synchronously".
>> When I write something to the file, the operation is synchronous when
>> the program *waits* for the transfer to complete.
>
> The transfer of the group, not the transfers of the individual items of.
>

Dmitry,

I have read enough of your posts on this newsgroup to know you're not a 
troll, but it sure is hard to tell from reading this thread.

In my experience (theory aside) sending one character at a time to an OS is 
considerably slower than buffering the data and sending blocks of data.

Several years ago I rewrote a driver on one of our systems that we used to 
communicate serially (using RS232) with a PLC (Programmable Logic 
Controller).  The driver was originally written to make separate calls to 
the OS for each character sent to the PLC.  The original implementation 
utilized approximately 15% of the CPU.  When I rewrote the driver to buffer 
the characters into blocks of up to 128 characters (defined by the PLC 
protocol) and make one OS call for the buffered data, the CPU utilization 
dropped to less than 1%.

This behavior makes perfect sense to me because for each call to the OS a 
buffer is allocated containing the data to be transmitted and placed in a 
queue for the OS.  The buffer itself contains more than just the data to be 
sent, it includes some overhead, sometimes significant in size.  The 
addition of the buffer to the OS queue often includes considerable overhead, 
context switches, mutexes, etc.  When the number of characters in the buffer 
is increased the overhead is not significantly increased.
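The reasoning in this paragraph can be put into a toy cost model, sketched here in C. Everything in it is an assumption made for illustration: the function name transfer_cost, the idea that each OS call costs a fixed amount regardless of size, and the example numbers in the note that follows. It is not a model of any particular OS or of the PLC driver described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cost model: every OS call pays a fixed overhead, every byte pays a
   small per-byte cost. Units are arbitrary; both constants are
   hypothetical inputs, not measurements. */
double transfer_cost(size_t bytes, size_t block_size,
                     double per_call, double per_byte)
{
    assert(block_size > 0);
    size_t calls = (bytes + block_size - 1) / block_size;  /* round up */
    return (double)calls * per_call + (double)bytes * per_byte;
}
```

With 1280 bytes, a made-up per-call overhead of 100 units and 1 unit per byte, single-character calls cost 129280 units while 128-byte blocks cost 2280. The per-byte part is the same in both cases; only the per-call part shrinks.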

Sure, if you're talking directly to hardware that only handles one 
character at a time then buffering and unbuffering data adds overhead.  But 
it is rare in these days to talk directly with the hardware.  Even the 
simpler systems often use a kernel or OS that makes buffering worthwhile.

If you're using TCP/IP to send data, if you're going to send a bunch of data 
at a time it would be silly to send one byte at a time.  IP has considerable 
overhead for each block.  You should try to include as much data as possible 
to minimize the number of packets sent and minimize the overhead.

I find it interesting that you seem to be arguing that it is always better 
not to buffer, when the original poster has indicated that he has tried 
both buffered and unbuffered approaches and observed that unbuffered was 
considerably slower on his system.  Either you are miscommunicating or you 
are just plain wrong.

Regards,
Steve

> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de 


* Re: File output and buffering
  2008-08-23 13:41                           ` Steve
@ 2008-08-23 14:33                             ` Dmitry A. Kazakov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry A. Kazakov @ 2008-08-23 14:33 UTC (permalink / raw)


I have repeated the argument, provided all possible explanations and
examples more than three times.

Since it goes in circles, let's put an end to this.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: File output and buffering
       [not found]                         ` <Q7adnfmCI6Ly6S3VnZ2dnUVZ_jOdnZ2d@earthlink.com>
@ 2008-08-23 22:00                           ` Maciej Sobczak
  0 siblings, 0 replies; 20+ messages in thread
From: Maciej Sobczak @ 2008-08-23 22:00 UTC (permalink / raw)

On 23 Aug, 22:34, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

> > Then we have a different notion of "synchronously".
> > When I write something to the file, the operation is synchronous when
> > the program *waits* for the transfer to complete.
>
>         If I may slip in, since this thread has wandered into comparisons
> that even I can't follow...
>
>         Define "complete"

So that the program can immediately terminate and still have the data
reliably stored.
Think about log files of various kinds (including database write ahead
logs) and the importance of having something confirmed.

>         Most I/O systems I've encountered are buffered by the OS...

Of course - and not only that. There are buffers everywhere, even in
hard drives. The semantics of an output operation from the program's
point of view can, however, be described in terms of a reasonably
understood best effort, or pushing data as far as it makes sense. For
example, if the hard drive can guarantee reliable storage at the level
of its own buffers, then it can confirm reception of the data without
actually storing them on the platters. From the point of view of the
program, the I/O operation can be considered finished, because from
that point nothing can mess things up.

> As far
> as an application is concerned, an I/O "write" operation is "complete"
> when the OS accepts the packet for buffering.

Exactly - provided that the packet was *copied* to OS buffers as
opposed to just passing a pointer to the program's data.

In practical terms:

Ada.Text_IO.Put_Line (File, "Hello");
Ada.Text_IO.Flush (File);
--  here we can crash without losing data

I consider the output operation above (triggered or ensured by Flush)
to be *synchronous with respect to the program*. When the Flush
operation returns the control back to the program, the data is already
stored in the external file (as AARM calls it), whatever that means,
even if the "external file" includes several layers of buffers. From
the program's perspective, it is "done".
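
The same idea can be sketched on the C side, where fflush plays roughly the
role of Ada.Text_IO.Flush. This is an illustration under assumptions, not a
statement about how any Ada runtime is implemented: visible_size and
flush_demo are hypothetical helpers, and the "before" count being zero
relies on the stdio implementation really holding a short write in a fully
buffered stream, which is typical but not guaranteed by the C standard.

```c
#include <stdio.h>

/* Size of the file as seen through a fresh, independent read handle.
   Returns -1 if the file cannot be opened. */
long visible_size(const char *path) {
    FILE *r = fopen(path, "rb");
    if (r == NULL) return -1;
    long n = 0;
    while (fgetc(r) != EOF) n++;
    fclose(r);
    return n;
}

/* Writes text through a fully buffered stream and reports how many bytes
   an independent reader sees before and after fflush.
   Returns 0 on success, -1 on any I/O error. */
int flush_demo(const char *path, const char *text, long *before, long *after) {
    FILE *w = fopen(path, "wb");
    if (w == NULL) return -1;
    if (setvbuf(w, NULL, _IOFBF, BUFSIZ) != 0) { fclose(w); return -1; }
    if (fputs(text, w) == EOF) { fclose(w); return -1; }
    *before = visible_size(path);  /* text is still in the stdio buffer */
    if (fflush(w) != 0) { fclose(w); return -1; }
    *after = visible_size(path);   /* text has been handed to the OS */
    fclose(w);
    return 0;
}
```

Once fflush has returned, the program could crash without losing the text,
because the data no longer lives in the process. fflush only hands the data
to the OS, though; surviving a crash of the whole machine needs a further
step such as POSIX fsync.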

If you want to contrast the above with an asynchronous version, the
output operation can be initiated by the program, but the program would
be allowed to continue without any guarantee about the amount of
data already stored (and with some provisions to get the status later
on).
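
A rough single-threaded model of that contrast, in C and for illustration
only. The async_writer type and the aw_* functions are hypothetical; aw_step
stands in for whatever background activity would really perform the
transfer. Submitting returns at once with a ticket, and the status can be
queried later.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 8

typedef enum { REQ_PENDING, REQ_DONE, REQ_FAILED } req_status;

/* Hypothetical deferred writer: requests are queued, then performed later. */
typedef struct {
    FILE       *out;
    const char *text[MAX_PENDING];   /* not copied: caller keeps these alive */
    req_status  status[MAX_PENDING];
    int         count;               /* requests submitted so far */
    int         next;                /* first request not yet performed */
} async_writer;

void aw_init(async_writer *w, FILE *out) {
    memset(w, 0, sizeof *w);
    w->out = out;
}

/* Queues a request and returns its ticket immediately, or -1 if the queue
   is full. No I/O happens here. */
int aw_submit(async_writer *w, const char *text) {
    if (w->count == MAX_PENDING) return -1;
    w->text[w->count] = text;
    w->status[w->count] = REQ_PENDING;
    return w->count++;
}

/* Performs at most one queued write. Returns 1 if a request was handled,
   0 if there was nothing to do. */
int aw_step(async_writer *w) {
    if (w->next == w->count) return 0;
    int i = w->next++;
    int ok = fputs(w->text[i], w->out) != EOF && fflush(w->out) == 0;
    w->status[i] = ok ? REQ_DONE : REQ_FAILED;
    return 1;
}

/* Status of a previously submitted request. */
req_status aw_status(const async_writer *w, int ticket) {
    return w->status[ticket];
}
```

Between aw_submit and the matching aw_step the program has no guarantee
that anything was stored. Because the queue holds pointers rather than
copies, the caller must also keep the data unchanged until the status
becomes REQ_DONE.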

Short coverage of what all this means in the context of databases:

http://www.orafaq.com/node/93

It is really well written.

I hope that you get what I'm trying to say here. Well, at least I'm
sure that I'm not inventing anything new.

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada

