From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: [Slightly OT] How to process lightweight text markup languages?
Date: Tue, 20 Jan 2015 20:44:35 +0100
Organization: cbb software GmbH
Message-ID: <1wclq766iu82d.b0k1hx30rgrt.dlg@40tude.net>

On Tue, 20 Jan 2015 18:47:13 +0000 (UTC), Natasha Kerensikova wrote:

> On 2015-01-18, Dmitry A. Kazakov wrote:
>> On Sun, 18 Jan 2015 18:04:08 +0000 (UTC), Natasha Kerensikova wrote:
>>
>> [...]
>>
>>> My latest attempts involve keeping the online architecture with separate
>>> input and output types and streams, and keeping a stack of currently
>>> opened constructs, with a dynamically dispatching ending test on each
>>> character for each construct on the stack. It feels horribly
>>> inefficient and complicated.
>>
>> Nothing complicated and most efficient in the sense that depending on the
>> formal language classification you could not be able eliminate the stack
>> (memory) in some or other form. You could think of it in terms of possible
>> states of the parser.
>> If the number of states is infinite you must have
>> stack or else the parser itself must be infinite. Simple example is parsing
>> the language of balanced brackets: ()(()).
>
> Well it feels insurmountably complicated to me, that's why I posted in
> first place -- to be enlightened.

Nothing complicated to me, so far.

> What I still can't make fit with what I know is how to deal
> simultaneously with the precedence and the "implicit escaping", which is
> further mudded by the interpretation of what is in the constructs
> depends on the particular current construct.
>
> To put it in a grammar-like way (even though I doubt the considered
> language has a grammar), I would have something like:
>
> formatted-text ::= code-fragment | link | ... | literal-text
> code-fragment ::= '`' literal-text '`'
> link ::= '[' formatted-text ']' '(' url [ link-title ] ')'
> link-title ::= '"' literal-text '"'

[]  - brackets
()  - brackets
``  - quotation marks
""  - quotation marks

> So if you remember my example,
>
> [alpha`](http://example.com/)`](http://other.com)

Since `` are quotation marks, the above should be:

+
|_ []
|  |_ +
|     |_ alpha
|     |_ ](http://example.com/)
|_ ()
   |_ http://other.com

+ is an assumed infix catenation operation. No backtracking is needed.

[...]

> Am I right so far? Am I missing something?

Distinguishing lexical and syntactical elements? You don't bother with
operators until the expression terms (lexemes) are matched. Once you have
matched them, you never go back. They are all on the stack if not already
bound by an operation. If `` is declared literal, it is a term of the
expression, atomically matched. It naturally takes precedence over
anything else.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
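The two-pass idea discussed in this thread can be illustrated with a minimal Python sketch. It is added for illustration and is not taken from the thread. A lexical pass matches each backtick span atomically, so `` takes precedence over the brackets. A syntactic pass then makes a single left-to-right sweep with a stack of open [ and ( groups and never re-reads a token. All names (Text, Code, Group, tokenize, parse, show) are hypothetical. The sketch makes these assumptions: only [ ] ( ) and backticks are recognised; an unmatched backtick or bracket is kept as literal text; catenation (+) is represented implicitly as a node's list of children; and a []/() pair is not combined into a link node.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical node types for this sketch; not from any real library.
@dataclass
class Text:
    value: str

@dataclass
class Code:
    value: str  # contents of a `...` span, matched atomically by the lexer

@dataclass
class Group:
    brackets: str  # "[]" or "()"; children are implicitly catenated (the "+")
    children: List["Node"] = field(default_factory=list)

Node = Union[Text, Code, Group]

PAIRS = {"[": "]", "(": ")"}
CLOSERS = set(PAIRS.values())

def tokenize(src: str) -> List[tuple]:
    """Lexical pass: produce OPEN/CLOSE/TEXT/CODE tokens.

    A backtick span is consumed in one step, so brackets inside it never
    reach the parser. A backtick with no partner stays literal (assumption)."""
    tokens, buf, i = [], [], 0

    def flush() -> None:
        if buf:
            tokens.append(("TEXT", "".join(buf)))
            buf.clear()

    while i < len(src):
        c = src[i]
        if c == "`":
            j = src.find("`", i + 1)
            if j != -1:
                flush()
                tokens.append(("CODE", src[i + 1:j]))
                i = j + 1
                continue
            buf.append(c)  # unmatched backtick: literal text
        elif c in PAIRS:
            flush()
            tokens.append(("OPEN", c))
        elif c in CLOSERS:
            flush()
            tokens.append(("CLOSE", c))
        else:
            buf.append(c)
        i += 1
    flush()
    return tokens

def parse(src: str) -> List[Node]:
    """Syntactic pass: one sweep with a stack of open groups, no backtracking."""
    root: List[Node] = []
    stack: List[Group] = []  # currently open [ or ( groups

    def current() -> List[Node]:
        return stack[-1].children if stack else root

    for kind, value in tokenize(src):
        if kind == "OPEN":
            stack.append(Group(value + PAIRS[value]))
        elif kind == "CLOSE" and stack and PAIRS[stack[-1].brackets[0]] == value:
            group = stack.pop()
            current().append(group)  # bind the finished group to its parent
        elif kind == "CODE":
            current().append(Code(value))
        else:  # TEXT, or a closer with no matching opener: keep it literal
            current().append(Text(value))
    # Groups still open at the end degrade to literal text (assumption).
    while stack:
        group = stack.pop()
        current().append(Text(group.brackets[0]))
        current().extend(group.children)
    return root

def show(nodes: List[Node], indent: str = "") -> List[str]:
    """Render the tree as indented lines, one node per line."""
    lines: List[str] = []
    for n in nodes:
        if isinstance(n, Group):
            lines.append(f"{indent}{n.brackets}")
            lines.extend(show(n.children, indent + "  "))
        elif isinstance(n, Code):
            lines.append(f"{indent}code {n.value!r}")
        else:
            lines.append(f"{indent}text {n.value!r}")
    return lines

if __name__ == "__main__":
    example = "[alpha`](http://example.com/)`](http://other.com)"
    print("\n".join(show(parse(example))))
```

On the example from the thread, this produces the same shape as the tree drawn in the reply: a [] group holding the text alpha and the code span ](http://example.com/), followed by a () group holding http://other.com. On ()(()) it produces two top-level () groups, the second containing a nested one. That unbounded nesting is the reason a stack cannot be eliminated.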