comp.lang.ada
From: "Warren W. Gay VE3WWG" <ve3wwg@cogeco.ca>
Subject: Re: A Customer's Request For Open Source Software
Date: Fri, 05 Sep 2003 13:06:02 -0400
Message-ID: <Ln36b.22618$su.611773@news20.bellglobal.com>
In-Reply-To: <wvbrptihkolk.fsf@sun.com>

I had written a detailed reply to this, but my crappy
Netscape 7 had an error posting via NNTP and threw
the whole reply away (major grumbling here about that!).
So this will be a very brief repeat of the essential
items ;-)

olehjalmar kristensen - Sun Microsystems - Trondheim Norway wrote:
>>>>>>"WWGV" == Warren W Gay VE3WWG <ve3wwg@cogeco.ca> writes:
>     WWGV> You might wonder why buffered "block" devices are not good
>     WWGV> enough for the purpose. I can't answer to the specifics, but
>     WWGV> only that database engines are in a position to better manage
>     WWGV> the cache based upon what they "know" needs to be done. Another
>     WWGV> important reason to control caching details is that when a
>     WWGV> transaction is committed, you need to guarantee that the
>     WWGV> data is written to the disk media (or can be recovered if
>     WWGV> the database is to be restarted at that point).
> 
> The cache management will typically be the same with both OS files and
> raw devices. The main difference is as you say, that you have better
> control over the raw device with respect to the layout of your tables,
> so you may be able to optimize disk writes better.

Control is where it is at. With block devices you can only:

1) leave it up to the O/S when to flush out writes
2) call sync(2) and have all dirty blocks written out
3) call fsync(2) and have all of your own blocks related
    to the file descriptor written out.

With a block device, a return from write(2) does not guarantee
that your data has been recorded on the magnetic media.
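To make option 3 concrete, here is a minimal sketch in Python, whose os module wraps the same system calls. The function name durable_write and the file path are made up for illustration, and the usual caveat still applies: a drive or subsystem with its own cache may acknowledge the flush before the bits reach the platter.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data to path and return only after fsync(2) has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # write(2) may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Here the data may still live only in the kernel's buffer cache.
        os.fsync(fd)  # ask the O/S to flush this descriptor's dirty blocks
    finally:
        os.close(fd)
    return len(data)

if __name__ == "__main__":
    # Hypothetical usage: write one record into a scratch directory.
    scratch = os.path.join(tempfile.mkdtemp(), "commit.dat")
    durable_write(scratch, b"commit record\n")
```

Without the os.fsync() call, the function would return as soon as the kernel had copied the bytes into its own buffers, which is exactly the "promise" problem described here.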

With a raw device, it should be a guarantee, with the following
exceptions:

1) Smart caching IDE-like disk drives
2) EMC-like subsystems with electronic cache


> Most operating systems will allow you to wait until the data are
> even if you are using the file system, so there is no difference with
> respect to the durability of data.

With UNIX, a return from write(2) is only a promise that
it will someday be written. No specific timeline is
guaranteed (see above). I don't expect it is much different
with win32 systems.

> All portable DBMS need to do their own cache management, so if you are
> running on files, both the OS and the DBMS cache the same blocks,
> thereby wasting RAM. Also, the replacement strategies may be
> conflicting, resulting in suboptimal performance.

Yes, if the DBMS is expecting a raw device, and doing its own
caching, then using a block device (or file system file) is
asking for double-buffering. Very bad indeed for performance.

>     WWGV> The database
>     WWGV> must be recoverable at any given point anyhow, and this
>     WWGV> usually requires fine grained control over physical writes
>     WWGV> to the media. This aspect and performance means that the
>     WWGV> engine must balance performance with reliability (persistence),
>     WWGV> which are conflicting goals when using disk.
> 
>     WWGV> It is this last area where oodles of persistent fast memory
>     WWGV> (instead of disk), can make a world of difference. In this
>     WWGV> case persistence = memory performance, which of course is
>     WWGV> where the win is. If disks became obsolete (one can hope),
>     WWGV> then I could see that new database engine design (internals)
>     WWGV> will become much different than what it is today.  Certainly,
>     WWGV> many of the present compromises would be eliminated.
> 
>     WWGV> -- 
>     WWGV> Warren W. Gay VE3WWG
>     WWGV> http://home.cogeco.ca/~ve3wwg
> 
> Possibly, but keep in mind that most current DBMS's are already CPU
> bound when it comes to throughput. Fast disks and techniques like
> group commit ensures that the log is rarely a bottleneck, and all you
> need to recover is in the log. 

This is primarily where I disagree: I don't disagree that there
are fast disks and subsystems today. But keep in mind two things:

1) There are oodles of traditional slow disk based databases around
    still
2) The database engines were designed to handle the difficult case
    of #1.

Designing a new RDBMS would be much simpler if you could ignore
the performance cost of putting data on the magnetic media. But
I would venture that all of the popular database vendor offerings
available today were designed to get performance out of slow
disk I/O technologies.

Otherwise, the vendors could just insist on this new technology
and make life easier for themselves from a design and enhancement
point of view.
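The group commit mentioned in the quoted text is a good example of engineering around slow disk I/O: several transactions share the cost of one forced write. A rough sketch in Python, assuming a simple append-only log file; the class name GroupCommitLog and its methods are invented for illustration and do not reflect how any particular engine does it:

```python
import os
import tempfile

class GroupCommitLog:
    """Append-only log that batches records and issues one fsync per batch."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.pending = []     # commit records waiting to be made durable
        self.fsync_calls = 0  # how many forced writes we have paid for

    def submit(self, record):
        """Queue one transaction's commit record; nothing is durable yet."""
        self.pending.append(record)

    def flush(self):
        """Write all queued records, then force them out with a single fsync."""
        if not self.pending:
            return 0
        batch = b"".join(r + b"\n" for r in self.pending)
        view = memoryview(batch)
        while view:
            view = view[os.write(self.fd, view):]
        os.fsync(self.fd)
        self.fsync_calls += 1
        committed = len(self.pending)
        self.pending.clear()
        return committed

    def close(self):
        self.flush()
        os.close(self.fd)

if __name__ == "__main__":
    log = GroupCommitLog(os.path.join(tempfile.mkdtemp(), "wal.log"))
    for i in range(5):
        log.submit(b"txn %d commit" % i)
    log.flush()  # five transactions, one trip to the disk
    log.close()
```

A real engine would also have to make each committing transaction wait until the flush covering its record has returned; that bookkeeping is left out here.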

> What you can get with large non-volatile memory is much lower latency
> per transaction, that is, the response time for a single transaction
> can be dramatically lower, even if the throughput in terms of TPS
> stays the same.

It also buys an opportunity to re-design the database engine,
because now you can eliminate:

  1) the cost of slow disk I/O
  2) the worries about data persistence (no disk
     write operations need to be issued)

This opens the doors to a whole new set of database engine
design dynamics.

> And in case you were wondering if you could do away with the log, the
> answer is yes, if you create some kind of multi-version system. But
> you always need some way of keeping track of the history, so you can
> roll back your changes in case a transaction decides to abort for some
> reason. Actually, one may say that the log IS the database, all the
> rest is there only to give faster access to the latest version.

Going to fast, persistent memory opens a lot of doors indeed.
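For what it is worth, the "log IS the database" idea from the quoted text can be sketched in a few lines of Python. The record shapes ("set", "commit", "abort") and the function name replay are assumptions made up for this example, not any engine's actual log format:

```python
def replay(log):
    """Rebuild the committed key/value state from a sequence of log records.

    Hypothetical record shapes:
      ("set", txn, key, value), ("commit", txn), ("abort", txn)
    """
    state = {}    # the "database": latest committed value per key
    pending = {}  # uncommitted changes, grouped by transaction id
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "set":
            pending.setdefault(txn, []).append((record[2], record[3]))
        elif kind == "commit":
            # Only now do the transaction's changes become visible.
            for key, value in pending.pop(txn, []):
                state[key] = value
        elif kind == "abort":
            # Rolling back is just forgetting the pending changes.
            pending.pop(txn, None)
    # Transactions with no commit record (e.g. after a crash) are dropped.
    return state

if __name__ == "__main__":
    history = [
        ("set", 1, "a", 10),
        ("set", 2, "a", 99),
        ("abort", 2),
        ("set", 1, "b", 20),
        ("commit", 1),
        ("set", 3, "c", 30),  # never committed
    ]
    print(replay(history))
```

Everything else in the engine (tables, indexes, caches) can be viewed as a faster way of getting at what this replay would produce.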
-- 
Warren W. Gay VE3WWG
http://home.cogeco.ca/~ve3wwg



