comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Verified compilers?
Date: Sun, 11 Mar 2012 10:47:36 +0100
Message-ID: <1oln2mjlozzh5$.1mxrd97lrgndo.dlg@40tude.net>
In-Reply-To: 9s1s7tF6pcU1@mid.individual.net

On Sat, 10 Mar 2012 22:35:09 +0200, Niklas Holsti wrote:

> On 12-03-10 12:36 , Dmitry A. Kazakov wrote:
>>
>> 1234 is an example of a language (of decimal literal).
> 
> Or an example of octal literals, or hexadecimal literals, or telephone 
> numbers. The meaning is not intrinsic in the string; it must be defined 
> by other means.

Which only supports my point.

>> The point is that
>> the meaning of the sentence in that language does depend on how you would
>> parse it.
> 
> Is there a "not" missing in that sentence?

Yes, sorry.

>> An OCR system could simply compare pictures. What human brain
>> does, nobody knows. An Ada program would yield a machine number. Where is a
>> tree? Which tree?
> 
> It is not intrinsic in the string.

Yes, the meaning is a mapping of which the string is the argument. The way
this mapping is computed (if it is computable at all) is irrelevant.

> The grammar, and the corresponding 
> parse trees, are tools for defining the strings in the language, and the 
> meaning of each string.

No, only the former. Example:

   <numeral> ::= <decimal-digit>[<numeral>]

This does not define the meaning of 1234. It may hint to the human reader
that <numeral> has something to do with numbers, but numbers and their
properties are defined elsewhere (in mathematics). The mapping, that a
<numeral> corresponds to the number from the set X identified by the means
Y, must be stated in addition to the grammar.
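To make the separation concrete, here is a sketch in Python (chosen only
for compactness; the point is language-independent). The recognizer
implements the production above and nothing else, while the meaning is a
separate mapping defined by positional notation:

```python
def is_numeral(s):
    # Recognizer for <numeral> ::= <decimal-digit>[<numeral>]:
    # it only decides membership in the language, nothing more.
    return len(s) > 0 and all(c in "0123456789" for c in s)

def meaning(s):
    # The semantic mapping, stated in addition to the grammar:
    # positional notation, i.e. digits weighted by powers of ten.
    assert is_numeral(s)
    return sum(int(c) * 10 ** i for i, c in enumerate(reversed(s)))

print(meaning("1234"))  # 1234
```

The same recognizer could just as well be paired with a mapping to octal
values or telephone numbers; nothing in the production forces this choice.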

>>     +
>>    / \
>>   4  +
>>      / \
>>     *
>>    / \
>>   3  10
>>
>> This one? There is a countable infinity of such trees. A German would say
>> that this is twelve hundred, four and thirty (12,4,3). A way different
>> structure then?
> 
> Yes, the grammar of (decimal) literals can be written in several ways, 
> and all of these grammars can be attributed to define the same meaning 
> (numerical value) of the string (of course they can also be attributed 
> in different ways, to define different meanings). So what? I never 
> claimed that the grammar is unique, just that grammars and parse trees 
> are IMO the unique known method to define the meanings of complex sentences.

1. If not unique, then you cannot claim that the structure of a particular
tree is inherent to the language. It is not.

2. If the same grammar may be used with different meanings, then the
grammar does not define any of them.

>> The meaning has nothing to do with parsing.
> 
> Ok, if you can define the meaning in some other way, and not by parsing. 
> How do you define it?

   http://en.wikipedia.org/wiki/Positional_notation

does not even use the word "grammar" when describing positional notation.
(Of course parsing is involved, e.g. your browser will parse the page in
order to render it. But that defines neither the meaning of this page nor
the meaning of numerals.) Maybe our disagreement lies in that you think
that all definitions have to be constructive? They need not be. Anyway, I
can always skip parsing and define the meaning by brute force:

"1" means 1
"2" means 2
"3" means 3
...
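For any finite fragment of the language, this brute-force definition is
literally a table (a sketch; the bound 10000 is arbitrary):

```python
# The meaning as an explicit table: no grammar, no parsing, no
# structure, just an enumeration of string/value pairs up to a bound.
MEANING = {str(n): n for n in range(10000)}

print(MEANING["1234"])  # 1234
```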

>> In fact parsing has to be validated against the meaning.
> 
> Why? If you define the meaning without using parsing, why is that 
> validation necessary?

Because the meaning exists independently of the parsing, so I must show
that the result of translation corresponds to the meaning. "1234" must be
translated into 1234, because that is the meaning of "1234", not the other
way around.

>>>>> How else could you explain why "2*3+4*5" means 26, and not 70, nor
>>>>> something else?
>>>>
>>>> Explain? It is a subject of proof. There are more than one way to do it.
>>>
>>> Operator precedence, perhaps? See earlier post...
>>
>> I didn't mean variations of operator precedence generating different
>> results. I meant that 2*3+4*5 = 26 is a statement to be proven. It is
>> either true or false (or undecidable). 26 is the value of the formula.
> 
> The truth of the formula depends on the meaning assigned to the 
> expressions in the formula. Before one can prove anything, one must 
> define the proof rules, including the rules for modifying sentences 
> (expressions) while preserving their meaning.

Yes.

> The latter rules are 
> derived from the rules that define the meaning of an expression based on 
> the structure (parse tree) of the expression, and the properties of the 
> operators in the expression.

No, they are derived from the algebraic properties of numbers.

> Note that the tree is not yet the meaning, but the meaning of the tree 
> (and thus the meaning of the sentence) is defined by structural 
> inductive rules over the tree.

Sure, the tree is just another representation. The meanings of the string
and of its tree are supposed to be the same. Neither is the definition of
the meaning; they both merely have it in the given context. Moreover, this
meaning will be preserved during the following stages of compilation. The
object code will have it, and the executable code will have it too. Keeping
this meaning intact is what makes the compiler correct.

>>>>> In practice, the role of the parse/derivation tree appears in the fact
>>>>> that parser/compiler generators let you attach "semantic actions" to
>>>>> grammar productions, and often help you to consider larger parts of the
>>>>> derivation by passing information around the parse/derivation tree in
>>>>> well-defined ways. Attribute grammars are one example.
>>>>
>>>> Parsing tree is just a less ambiguous representation of the program.
>>>
>>> "Just"? It is the first step in the assignment of meaning to a sentence.
>>
>> It could be a step of evaluating the sentence, validating it etc. But the
>> meaning of the sentence is already assigned by the sentence itself
>> (provided the sentence is legal).
> 
> No, meanings are not inherent in sentences. There must be a definition 
> of the meaning of an arbitrary sentence: a function "meaning" from the 
> set of sentences to the set of meanings.

This is precisely the same thing. "A sentence in the language X" reads as
follows: a set of strings + a domain set + a mapping from the former to
the latter. Without that it is not a sentence.

> For use in computation, this 
> function must be defined in a practically computable way.

No, because the domain set in most cases is incomputable or ill-defined.

>>>> You don't even need to generate it in order to evaluate 2*3+4*5
>>>> using a finite state machine.
>>>
>>> You cannot do it with a finite state machine (assuming you want to allow
>>> literals of any size and expressions of any length). You need a stack of
>>> numbers, too.
>>
>> First, stack is not tree.
> 
> True, but it is the simplest structure that lets finite state machines 
> handle this kind of computation.

That presumes a certain class of machines. There could be others, computing
the result without a stack, without any memory for that matter, just by
transitions.
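For numerals, at least, that is easy to illustrate: the value can be
computed purely by transitions, with one accumulator and neither a stack
nor a tree (a sketch; strictly speaking, the unbounded accumulator makes
this a transducer rather than a pure finite automaton):

```python
def evaluate_numeral(s):
    # One transition per character, left to right; no parse tree
    # and no stack are ever built.
    value = 0
    for c in s:
        value = value * 10 + int(c)
    return value

print(evaluate_numeral("1234"))  # 1234
```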

OT. I have wondered what would have happened if hardware development had
taken another path. I mean that 20 years ago computing was cheaper,
relative to volatile storage, than it is now. Consider that ratio shifting
back; it is possible when molecular, optical, or whatever other
architectures evolve. That might require approaches quite different from
the ones presently in use. The same is true for a shift in the opposite
direction.

>> Second, regarding finiteness, that is of course
>> true, just because an infinite language cannot be mapped (injectively) onto
>> a finite number of states. But this has nothing to do with the "structure."
>> Once you limit the language you can parse it by brute force, no trees, no
>> stacks, just strcmp.
> 
> Eh? I don't understand what you are talking about. Finite languages? Not 
> worth discussing, IMO.

All computers are finite, and all programming languages are effectively
finite, both in terms of the source code length and the number of states a
compiler program may have.

>>> The generation of parsing machines from grammars is one of the main
>>> achievements of formal language and parsing theory. Why sweat to make
>>> the machine manually, when you can do it automatically?
>>
>> Because:
>>
>> 1. You need not to generate anything. All infix expressions can be parsed
>> by the same engine controlled by a few tables.
> 
> Infix expressions are a simple case.

And the rest is just trivial recursive descent.

>> 2. The efforts are immense. I already gave an example of Ada grammar
>> capturing operator precedence, association constraints (e.g. 2**+6), left
>> to right, right to left etc. It is a huge work.
> 
> Hardly huge. There are only 6 levels: expression, relation, 
> simple_expression, term, factor, primary.

For a grammar, it is, because it is hard to reason about productions.

>> And you also need to verify
>> if the mess of productions you have written indeed means that "unary + is
>> illegal as power." Because productions is NOT the meaning, they are just a
>> program.
> 
> And have you created tables for parsing Ada? How long did it take you, 
> and are you sure they are correct?

I know they are correct, because the table holds only a symbol and its
left and right precedence levels, which are directly described in the LRM.
Association constraints are verified by a single function, which takes the
left and right operators. When the left one is ** and the right one is
unary +, that is an error. And note how simple it is to generate a
meaningful error message in this case.

>> 3. You need tools, you need integration of the generated code into your
>> project. You need to keep track of changes of the grammar program, of the
>> generated program, of the rest.
> 
> True. But many SW projects already use one or more such tools: 
> preprocessors, GUI code generators, DB interface generators, ...

Ah that sort of argument: people already keep on dying in car accidents.
Why should we bother tightening the belts? (:-))

>> Formally speaking, the language being parsed consists of all possible
>> strings. Nothing is a syntax error. If the input is illegal in the language
>> 1 (e.g. Ada), then in the language 2 it is legal and yields "closing
>> bracket is missing for the opening bracket found at xx:yy."
> 
> I don't understand your point. Are you being ironical?

Not at all. The point is that 7#FF# needs to be parsed although it is
illegal. You have to make your parser tolerant to a level typical of a
human reader. This necessarily means that the grammar being parsed is not
the original grammar, but one relaxed to accept the malformed literal
above, as well as the illegal formula 2**+6, for which the original
grammar has no production.
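A sketch of what such tolerance can look like for based literals (Python,
hypothetical names): the relaxed scanner first accepts any base#digits#
shape, and only then checks the digits against the base, which is exactly
what lets it say something useful about 7#FF#.

```python
import re

def scan_based_literal(text):
    # Relaxed shape first: any base, any hexadecimal digits.
    # Rejecting 7#FF# at the grammar level would leave nothing
    # to attach a meaningful diagnostic to.
    m = re.fullmatch(r"(\d+)#([0-9A-Fa-f]+)#", text)
    if m is None:
        raise SyntaxError("malformed based literal: %r" % text)
    base, digits = int(m.group(1)), m.group(2)
    if not 2 <= base <= 16:
        raise SyntaxError("base %d is outside the range 2 .. 16" % base)
    for d in digits:
        if int(d, 16) >= base:
            raise SyntaxError("digit %r is invalid for base %d"
                              % (d, base))
    return int(digits, base)

print(scan_based_literal("16#FF#"))  # 255
```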

>>> If you want to describe what language your hand-written parser accepts,
>>> and what its processing means, I think you have to present the grammar
>>> and describe the meaning of the sentences.
>>
>> Or, I can present the tables with the same effect.
> 
> I'll bet any users would prefer the grammar to the tables.

Maybe. They are indoctrinated in YACC etc., as they are in C and Java.
Users do prefer Java, but we both agree that Java is inferior to Ada.

I frequently have to write compilers for domain-specific languages under
very tight time constraints. Nobody pays for a compiler these days.
Customers just expect the job done as part of a larger project. They are
not even aware that this requires writing a compiler.

Simplicity is one advantage of the table-driven approach. Another is
reuse. You can reuse not only the parser itself but 80% of the code around
it from project to project. That is because it is Ada, and because
recursive descent parsers map ideally onto object/procedural
decomposition.

>>>>>> On one hand these classes do not capture real differences
>>>>>> between syntaxes of programming languages.
>>>>>
>>>>> What sort of "real differences" do you mean?
>>>>
>>>> Like between C++ and Ada.
>>>
>>> Sorry, not clear yet. I'm sure you don't mean just that one uses {} and
>>> the other begin-end, but what do you mean?
>>
>> I mean that a human reader instantly recognizes that the language is "like
>> C++" or "like Ada".
> 
> Sorry, not clear yet. Is the human reader looking at some C++ code and 
> some Ada code, or at a C++ grammar and an Ada grammar?

At the code sample. The only recognizable thing about a written grammar
is: dear God, why should this happen to me!

>> This means that somewhere in the brain there is a
>> picture, a class of syntaxes, we, as humans, are capable to operate.
> 
> I suspect that if you show some non-programmers examples of C++ and Ada 
> code, they will find both equally incomprehensible.

Nothing in Plato's sense; the brain above is that of a trained programmer
who was exposed to procedural languages.

>> If anybody were presented by a sample from this or that class, he would not
>> be able to figure out what it were. It is like the FFT of an image. Can you
>> tell if the picture shows a cat or a girl by looking at its spectrum? Ergo,
>> spectral analysis is poor for scene recognition.
> 
> It is a poor way for humans to recognize scenes, sure. But it can be 
> useful in machine vision and recognition applications.

Yes, but scenes are defined in terms of human perception, so you have to
play by its rules. Compilers are similar in that respect, because a human
programmer is always involved.

>> (The difference is though
>> spectral analysis is useful for other things, while grammars are useful for
>> nothing... (:-))
> 
> Going back to the differences between C++ and Ada, one systematic 
> difference that comes to mind is the "overloading" of the same terminal 
> symbols in C++, especially the closing braces, '}'. Writing too few or 
> too many closing braces can lead to confusing error messages. In 
> contrast, Ada uses different terminal symbol combinations (end if, end 
> loop, end case, end <identifier>) which usually lets the compiler give 
> more informative error messages.
> 
> I don't think this corresponds to a difference in the formal language 
> classes of C++ and Ada, but I think the difference could be observed by 
> some kind of "grammar metrics" (or "language metrics", if you prefer) 
> that would give Ada a better score than C++, on this metric. But such 
> grammar metrics have not been developed AFAIK.

The problem is that we cannot rationalize this. We just don't know how the
human brain works. Formal languages have failed to reflect the human
perception of a language, which is necessary for building a compiler.
Other failures, alleged or not, we have already discussed.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


