From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Verified compilers?
Date: Sat, 10 Mar 2012 11:36:35 +0100
Organization: cbb software GmbH
Reply-To: mailbox@dmitry-kazakov.de

On Sat, 10 Mar 2012 00:13:19 +0200, Niklas Holsti wrote:

> On 12-03-08 22:40 , Dmitry A. Kazakov wrote:
>> On Thu, 08 Mar 2012 20:02:25 +0200, Niklas Holsti wrote:
>>
>>> On 12-03-07 22:17 , Dmitry A. Kazakov wrote:
>>>> On Wed, 07 Mar 2012 19:49:04 +0200, Niklas Holsti wrote:
>>>>
>>>>> On 12-03-07 11:12 , Dmitry A. Kazakov wrote:
>>>>>
>>>>>> I was always wondering why people bother to study grammars, LR,
>>>>>> LL parsers and other useless stuff.
>>>>>
>>>>> Grammars and the notion of parsing (derivations) rigorously define
>>>>> what a given string of characters means, in some formally defined
>>>>> language.
>>>>
>>>> No. A grammar defines some formal language: a set of strings.
>>>
>>> Ok, I was being too brief. You are right that a grammar defines a
>>> language. But notice that I was also speaking of parsing and
>>> derivations, not just grammars.
>>>
>>> If you want to define the meanings of the sentences of a language in
>>> a rigorous way, you have basically two options:
>>>
>>> 1. Define the meaning operationally, by giving a standard
>>> interpreter, a black box, for example a Turing machine, that accepts
>>> the language, and define that the actions of that machine, while
>>> reading a sentence, are the meaning of that sentence. (From your
>>> point of view, the meaning could be the "translation" that appears
>>> on the output tape.)
>>>
>>> 2. Define the meaning mathematically in some way. Note that by
>>> "mathematically" I don't necessarily mean formally proven math, but
>>> mathematically precise definitions with the usual amount of
>>> hand-waving and shortcuts.
>>
>> 1 and 2 are the same to me. In both cases it is an input for some
>> machine. An inference system based on the Peano axioms, or whatever,
>> is no less a machine than the Turing one. The only difference is in
>> the computational power of such machines.
>
> I see a big difference in the structure of the machine.

Which is more or less equivalent to its power.
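To make option 1 concrete on a toy case: for the language of decimal
literals the "standard interpreter" is a one-register machine reading
left to right. A minimal sketch in Ada (the name Meaning is invented
here, and digits-only input is assumed):

   --  The run of this trivial machine is the "meaning" of the literal
   --  in the operational sense of option 1. Sketch only: the name is
   --  made up, input is assumed to contain digits only.
   function Meaning (Literal : String) return Natural is
      Value : Natural := 0;
   begin
      for C of Literal loop  --  one register, no derivation tree
         Value := Value * 10 + (Character'Pos (C) - Character'Pos ('0'));
      end loop;
      return Value;
   end Meaning;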
>>> As far as I know, all methods -- formal or informal -- for defining
>>> the meaning of sentences in formal languages are based on how the
>>> sentence is derived, or equivalently how it is parsed.
>>
>> No. 1234 has the same meaning for an Ada parser, a human reader and
>> an OCR system.
>
> 1234 has no meaning, except as a literal string, until you give it
> some other meaning, for example as the integer one thousand two
> hundred and thirty four.

1234 is an example of a sentence in a language (that of decimal
literals). The point is that the meaning of the sentence in that
language does not depend on how you would parse it. An OCR system could
simply compare pictures. What the human brain does, nobody knows. An
Ada program would yield a machine number. Where is a tree? Which tree?

       +
      / \
     4   +
        / \
       *
      / \
     3   10

This one? There is a countable infinity of such trees. A German would
say that this is twelve hundred, four and thirty (12,4,3). A way
different structure, then? The meaning has nothing to do with parsing.
In fact parsing has to be validated against the meaning.

>>> How else could you explain why "2*3+4*5" means 26, and not 70, nor
>>> something else?
>>
>> Explain? It is a subject of proof. There is more than one way to do
>> it.
>
> Operator precedence, perhaps? See earlier post...

I didn't mean variations of operator precedence generating different
results. I meant that 2*3+4*5 = 26 is a statement to be proven. It is
either true or false (or undecidable). 26 is the value of the formula.

>> BTW the "meaning" you are advocating for looks rather like 23*45*+.
>
> More like: the expression is a SUM of (a TERM that is the PRODUCT of
> (the number 2) and (the number 3)) and (a TERM that is the PRODUCT of
> (the number 4) and (the number 5)), together with a rule for
> evaluating such expression trees.

The above is alike; the difference is that it uses Reverse Polish
notation instead of your tree. Why is your "meaning" better?

>>> In practice, the role of the parse/derivation tree appears in the
>>> fact that parser/compiler generators let you attach "semantic
>>> actions" to grammar productions, and often help you to consider
>>> larger parts of the derivation by passing information around the
>>> parse/derivation tree in well-defined ways. Attribute grammars are
>>> one example.
>>
>> A parsing tree is just a less ambiguous representation of the
>> program.
>
> "Just"? It is the first step in the assignment of meaning to a
> sentence.

It could be a step of evaluating the sentence, validating it etc. But
the meaning of the sentence is already assigned by the sentence itself
(provided the sentence is legal).

> The second step is a computation defined inductively on the parse
> tree.

The second step is semantic analysis, if we consider compilers. The AST
gets a lot of kicking at this stage: it is transformed, manipulated,
pruned etc.

>> You don't even need to generate it in order to evaluate 2*3+4*5
>> using a finite state machine.
>
> You cannot do it with a finite state machine (assuming you want to
> allow literals of any size and expressions of any length). You need a
> stack of numbers, too.

First, a stack is not a tree. Second, regarding finiteness, that is of
course true, simply because an infinite language cannot be mapped
(injectively) onto a finite number of states. But this has nothing to
do with the "structure." Once you limit the language you can parse it
by brute force: no trees, no stacks, just strcmp.
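Without limiting the language, the tree-free evaluation I mean looks
like this. A sketch in Ada, not anybody's library code: two stacks (the
number stack conceding the finiteness point) and a priority function,
values computed on the fly, no tree ever built. All names are invented
and operands are single digits only:

   with Ada.Text_IO;
   procedure Eval_No_Tree is
      Input : constant String := "2*3+4*5";

      Operands  : array (1 .. Input'Length) of Integer;
      Operators : array (1 .. Input'Length) of Character;
      N_Top : Natural := 0;  --  top of the operand stack
      O_Top : Natural := 0;  --  top of the operator stack

      function Priority (Op : Character) return Natural is
        (case Op is
            when '+' | '-' => 1,
            when '*' | '/' => 2,
            when others    => 0);

      procedure Reduce is  --  apply the top operator to the top operands
         R : constant Integer := Operands (N_Top);
         L : constant Integer := Operands (N_Top - 1);
      begin
         N_Top := N_Top - 1;
         Operands (N_Top) :=
           (case Operators (O_Top) is
               when '+'    => L + R,
               when '-'    => L - R,
               when '*'    => L * R,
               when '/'    => L / R,
               when others => 0);  --  unreachable in this sketch
         O_Top := O_Top - 1;
      end Reduce;
   begin
      for C of Input loop
         if C in '0' .. '9' then
            N_Top := N_Top + 1;
            Operands (N_Top) := Character'Pos (C) - Character'Pos ('0');
         else
            --  reduce while the stacked operator binds at least as
            --  tightly as the incoming one
            while O_Top > 0
              and then Priority (Operators (O_Top)) >= Priority (C)
            loop
               Reduce;
            end loop;
            O_Top := O_Top + 1;
            Operators (O_Top) := C;
         end if;
      end loop;
      while O_Top > 0 loop
         Reduce;
      end loop;
      Ada.Text_IO.Put_Line (Integer'Image (Operands (1)));  --  26
   end Eval_No_Tree;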
> The generation of parsing machines from grammars is one of the main
> achievements of formal language and parsing theory. Why sweat to make
> the machine manually, when you can do it automatically?

Because:

1. You need not generate anything. All infix expressions can be parsed
by the same engine controlled by a few tables (see the sketch below).

2. The efforts are immense. I already gave an example of an Ada grammar
capturing operator precedence, association constraints (e.g. 2**+6),
left-to-right, right-to-left etc. It is a huge amount of work. And you
also need to verify that the mess of productions you have written
indeed means that "unary + is illegal as a power." Because productions
are NOT the meaning, they are just a program.

3. You need tools, you need integration of the generated code into
your project. You need to keep track of changes to the grammar program,
to the generated program, to the rest.

>> And you certainly do not need it when using your head
>
> How do you know what your head is using? :-)

Last time I did it, there were no trees inside! (:-))

>> Do you want to say that the trace of productions applied during
>> parsing is the program semantics?
>
> That trace (or rather the parse tree) is the structure on which the
> semantics is defined, by structurally inductive definitions.

We have to disagree here. To me the AST is just a representation, e.g.
another language. Together with the operations defined on ASTs, it is a
machine, into whose language the parser might translate the input.
There exist other machines, and there is nothing special about this
particular one.

>> This is a remnant of a delusion shared by some that formal languages
>> could describe, if not natural languages, then at least machine
>> languages. I thought nobody seriously believed in this now.
>
> Natural languages, ok, since they have no precise definition to begin
> with, so can only be approximated by formal languages. But I am
> amazed by your position regarding machine languages. Perhaps we have
> a different notion of what "describing" means.
>
>> Care to carve a grammar where
>>
>>    declare
>>       X : Integer := 1.0 + 2;
>>
>> would be a syntax error?
>
> Easy, the grammar that defines only the empty string :-)
>
> But I guess you are thinking about types, and how to make the mixing
> of integers and floating point illegal. It is trivial to write a
> grammar that makes it a syntax error to mix integer literals and
> floating point literals in the same expression, but you have to
> duplicate the definition of expressions and have separate metasymbols
> Integer_Expression and FP_Expression, and all the way down to
> literals.

... and so for all types, user-defined ones included. And also "a
return is required in the function." And also that the alternatives of
a case statement do not intersect and cover all choices... Just for a
start. (The grammar is not the semantics.)

> If you are thinking about identifiers with types, use an attribute
> grammar that checks types by means of a symbol table, and there you
> are. Perhaps you don't want to call that a "syntax error", but at
> least it is a "parse-time error".

That is yet another nail in the coffin. Grammars are not only incapable
of describing language semantics beyond classroom exercises; even if
they could, that would not yet make a compiler. A compiler should also
explain why it rejects an illegal program. Formally speaking, the
language being parsed consists of all possible strings. Nothing is a
syntax error. If the input is illegal in language 1 (e.g. Ada), then in
language 2 it is legal and yields "closing bracket is missing for the
opening bracket found at xx:yy."
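As for point 1 above, the "few tables" might look like the following
sketch (all names invented here). One generic engine shifts or reduces
by comparing the right priority of the stacked operator with the left
priority of the incoming one; association falls out of the numbers:

   package Infix_Tables is
      type Operator is (Add, Sub, Mul, Div, Pow);

      Left_Priority  : constant array (Operator) of Positive :=
        (Add => 2, Sub => 2, Mul => 4, Div => 4, Pow => 7);
      Right_Priority : constant array (Operator) of Positive :=
        (Add => 2, Sub => 2, Mul => 4, Div => 4, Pow => 6);
      --  Stacked Pow (right 6) < incoming Pow (left 7): shift, so
      --  2**3**4 groups as 2**(3**4). Stacked Sub (right 2) >=
      --  incoming Sub (left 2): reduce, so 1-2-3 groups as (1-2)-3.
   end Infix_Tables;

A constraint like "unary + may not follow **" would then be one more
small table consulted by the same engine, instead of another thicket of
productions.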
>>>> Furthermore the merits of formal grammar definitions are
>>>> debatable, because the complexity of grammar descriptions usually
>>>> makes it impossible to judge if a given string does belong to the
>>>> formal language or not.
>>>
>>> Entirely wrong. Can you give an example of this impossibility?
>>
>> Are you claiming that if I gave you a random set of production rules
>> and a random string you could tell me [in 30 seconds time] if that
>> string belongs to the language?
>
> What have my personal abilities to do with anything?

If nobody can understand grammar descriptions, they cannot serve as a
meaning-carrier.

> Perhaps you want to use informal methods, and ask Jill, Joe, and John
> if the sentence is legal? Perhaps with majority voting?

Absolutely. To me returning a limited object from a function is
rubbish, but the majority voted it meaningful. Anyway, there is a rock
bottom to any formalism; beyond it everything becomes informal.

> From this point of view, grammars are an application-specific
> programming language, for programming parsers.

Yes.

> Nothing wrong with that.

What is wrong is that they serve that purpose rather poorly.

> And it was made possible by the research into formal languages and
> parsing.

These are not worth the paper they were printed on. But that is my
personal opinion. CS in general turned out to be quite fruitless. Maybe
because the subject is too hard for our contemporary mathematics. Maybe
because humans are simply stupid, and we will never advance beyond our
limits. Who knows.

> If you want to describe what language your hand-written parser
> accepts, and what its processing means, I think you have to present
> the grammar and describe the meaning of the sentences.

Or I can present the tables, to the same effect.

> You can show the grammar in BNF, and describe the meaning informally,
> but I would be very surprised if you could define the meaning without
> referring to the BNF, that is, without using some structural
> induction method (informally, of course).

BTW, mathematical notation existed long before BNF.

>>>> On one hand these classes do not capture real differences between
>>>> the syntaxes of programming languages.
>>>
>>> What sort of "real differences" do you mean?
>>
>> Like those between C++ and Ada.
>
> Sorry, not clear yet. I'm sure you don't mean just that one uses {}
> and the other begin-end, but what do you mean?

I mean that a human reader instantly recognizes that a language is
"like C++" or "like Ada". This means that somewhere in the brain there
is a picture, a class of syntaxes, which we, as humans, are capable of
operating with. On the contrary, the classes of formal languages are
difficult for us to recognize. If anybody were presented with a sample
from this or that class, he would not be able to figure out which class
it was. It is like the FFT of an image. Can you tell if a picture shows
a cat or a girl by looking at its spectrum? Ergo, spectral analysis is
poor for scene recognition. (The difference is, though, that spectral
analysis is useful for other things, while grammars are useful for
nothing... (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de