comp.lang.ada
* Encapsulation: What to export
@ 2017-11-27 19:25 Victor Porton
  2017-11-27 20:00 ` Dmitry A. Kazakov
  2017-11-27 21:39 ` Simon Wright
  0 siblings, 2 replies; 22+ messages in thread
From: Victor Porton @ 2017-11-27 19:25 UTC


I am writing free software containing a kinda parser, which converts from 
external representation into the internal format of my Ada program.

What should be in the public package interface and what in package body 
only?

Should I export only the parser for the main object (as it is the only 
parser used by the rest of the program)?

Or should I export parsers for all subobjects (because exporting only the
main parser and not the rest is kinda asymmetric: some parsers are exported
and some are not)?

-- 
Victor Porton - http://portonvictor.org



* Re: Encapsulation: What to export
  2017-11-27 19:25 Encapsulation: What to export Victor Porton
@ 2017-11-27 20:00 ` Dmitry A. Kazakov
  2017-11-27 20:34   ` Victor Porton
  2017-11-27 21:39 ` Simon Wright
  1 sibling, 1 reply; 22+ messages in thread
From: Dmitry A. Kazakov @ 2017-11-27 20:00 UTC


On 2017-11-27 20:25, Victor Porton wrote:
> I am writing free software containing a kinda parser, which converts from
> external representation into the internal format of my Ada program.

What you describe is not a parser, it is a deserialization operation.

> What should be in the public package interface and what in package body
> only?

Serialization/deserialization are public operations of the type (and the 
medium type, e.g. stream type). If the type is public so must be the 
operation. If private, the operation cannot be made public anyway.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Encapsulation: What to export
  2017-11-27 20:00 ` Dmitry A. Kazakov
@ 2017-11-27 20:34   ` Victor Porton
  2017-11-27 20:35     ` Victor Porton
  2017-11-28  2:12     ` Randy Brukardt
  0 siblings, 2 replies; 22+ messages in thread
From: Victor Porton @ 2017-11-27 20:34 UTC


Dmitry A. Kazakov wrote:

> On 2017-11-27 20:25, Victor Porton wrote:
>> I am writing free software containing a kinda parser, which converts from
>> external representation into the internal format of my Ada program.
> 
> What you describe is not a parser, it a deserialization operation.
> 
>> What should be in the public package interface and what in package body
>> only?
> 
> Serialization/deserialization are public operations of the type (and the
> medium type, e.g. stream type). If the type is public so must be the
> operation. If private, the operation cannot be made public anyway.

I am parsing not a string but an RDF tree. It is similar to converting an AST
(abstract syntax tree) to another format. RDF is similar to an AST but more
abstract.

So it is NOT a deserialization operation in the Ada sense.

-- 
Victor Porton - http://portonvictor.org



* Re: Encapsulation: What to export
  2017-11-27 20:34   ` Victor Porton
@ 2017-11-27 20:35     ` Victor Porton
  2017-11-28  2:12     ` Randy Brukardt
  1 sibling, 0 replies; 22+ messages in thread
From: Victor Porton @ 2017-11-27 20:35 UTC


Victor Porton wrote:

> Dmitry A. Kazakov wrote:
> 
>> On 2017-11-27 20:25, Victor Porton wrote:
>>> I am writing free software containing a kinda parser, which converts
>>> from external representation into the internal format of my Ada program.
>> 
>> What you describe is not a parser, it a deserialization operation.
>> 
>>> What should be in the public package interface and what in package body
>>> only?
>> 
>> Serialization/deserialization are public operations of the type (and the
>> medium type, e.g. stream type). If the type is public so must be the
>> operation. If private, the operation cannot be made public anyway.
> 
> I parse not a string but an RDF tree. It is similar to convert AST
> (abstract syntax tree) to another format. RDF is similar to an AST but
> more abstract.

In fact, RDF is a directed graph format.

I transform this special kind of directed graph (which I receive as
input) into my program's internal format.

> So it is NOT a deserialization operation in Ada sense.
> 
-- 
Victor Porton - http://portonvictor.org



* Re: Encapsulation: What to export
  2017-11-27 19:25 Encapsulation: What to export Victor Porton
  2017-11-27 20:00 ` Dmitry A. Kazakov
@ 2017-11-27 21:39 ` Simon Wright
  1 sibling, 0 replies; 22+ messages in thread
From: Simon Wright @ 2017-11-27 21:39 UTC


Victor Porton <porton@narod.ru> writes:

> I am writing free software containing a kinda parser, which converts from
> external representation into the internal format of my Ada program.
>
> What should be in the public package interface and what in package body
> only?
>
> Should I export only the parser for the main object (as it is the only
> parser used by the rest of the program)?
>
> or should I export parsers for all subobjects (because exporting only the
> main parser and not exporting the rest ones is kinda asymmetric: some parser
> is exported and some are not)?

The public part of the spec should contain only what clients need to
see.
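
A minimal sketch of that advice, under stated assumptions: the type Raw_Book
(a stand-in for the RDF input), the package Books, and all converter names
are hypothetical and not taken from the original program. Only the main
converter appears in the package spec; the converters for subobjects are
declared in the body.

```ada
with Ada.Text_IO;

procedure Export_Demo is

   --  Hypothetical external representation (a stand-in for the RDF input).
   type Raw_Book is record
      Raw_Pages : Integer;
      Raw_Year  : Integer;
   end record;

   package Books is
      type Book is private;

      --  The only converter the rest of the program needs.
      function To_Book (Raw : Raw_Book) return Book;

      function Pages (B : Book) return Positive;
   private
      type Book is record
         Page_Count : Positive;
         Pub_Year   : Positive;
      end record;
   end Books;

   package body Books is

      --  Converters for subobjects live in the body only: clients cannot
      --  call them, so they can change without affecting any client.
      function To_Page_Count (Raw : Integer) return Positive is
      begin
         if Raw <= 0 then
            raise Constraint_Error with "page count must be positive";
         end if;
         return Raw;
      end To_Page_Count;

      function To_Year (Raw : Integer) return Positive is
      begin
         if Raw <= 0 then
            raise Constraint_Error with "year must be positive";
         end if;
         return Raw;
      end To_Year;

      function To_Book (Raw : Raw_Book) return Book is
      begin
         return (Page_Count => To_Page_Count (Raw.Raw_Pages),
                 Pub_Year   => To_Year (Raw.Raw_Year));
      end To_Book;

      function Pages (B : Book) return Positive is
      begin
         return B.Page_Count;
      end Pages;

   end Books;

   B : constant Books.Book :=
     Books.To_Book ((Raw_Pages => 320, Raw_Year => 2017));

begin
   Ada.Text_IO.Put_Line ("Pages:" & Positive'Image (Books.Pages (B)));
   --  Books.To_Page_Count (10) would be rejected here: not in the spec.
end Export_Demo;
```

If a client later turns out to need one of the subobject converters, its
declaration can be moved into the spec without breaking existing callers;
going the other way (hiding something already exported) can break them.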


* Re: Encapsulation: What to export
  2017-11-27 20:34   ` Victor Porton
  2017-11-27 20:35     ` Victor Porton
@ 2017-11-28  2:12     ` Randy Brukardt
  2017-11-28 15:22       ` Victor Porton
  1 sibling, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2017-11-28  2:12 UTC


"Victor Porton" <porton@narod.ru> wrote in message 
news:ovhst9$1a43$1@gioia.aioe.org...
...
> I parse not a string but an RDF tree. It is similar to convert AST 
> (abstract
> syntax tree) to another format. RDF is similar to an AST but more 
> abstract.
>
> So it is NOT a deserialization operation in Ada sense.

Calling the operation of creating an RDF (or XML or HTML or SGML ...) tree 
from text "parsing" is a gross distortion of what is really going on. As Dmitry 
says, it is much more a deserialization operation, since you don't need any 
sort of traditional parser to implement it. You just need text operations 
(lexical analysis) and a stack of pending tree nodes (both of which are 
needed even if you had used a traditional parser anyway). In general, any 
"language" where the node types are immediately determined (as in all of the 
above, the prefix determines the type of node) does not need parsing, 
because there are never any situations where the choice of what to do has to 
be deferred.
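
The technique described above (a lexer plus a stack of pending tree nodes)
can be sketched roughly as follows. This is only an illustration under
assumptions: the token kinds, the array-based node store, and the sample
input are all hypothetical, and the lexer output is hard-coded.

```ada
with Ada.Text_IO;

procedure Tree_Builder_Demo is

   type Token_Kind is (Open_Tag, Close_Tag, Char_Data);

   --  Hypothetical lexer output for the input  <a><b>x</b>y</a>
   Tokens : constant array (1 .. 6) of Token_Kind :=
     (Open_Tag, Open_Tag, Char_Data, Close_Tag, Char_Data, Close_Tag);

   type Node_Kind is (Element, Text_Leaf);

   type Node is record
      Kind   : Node_Kind;
      Parent : Natural;   --  index of the parent node; 0 for the root
   end record;

   Max_Nodes : constant := 16;

   Nodes      : array (1 .. Max_Nodes) of Node;
   Node_Count : Natural := 0;

   --  Stack of pending nodes: elements opened but not yet closed.
   Stack : array (1 .. Max_Nodes) of Positive;
   Top   : Natural := 0;

   --  Append a node whose parent is the innermost pending element.
   procedure Add (K : Node_Kind) is
   begin
      Node_Count := Node_Count + 1;
      Nodes (Node_Count) :=
        (Kind   => K,
         Parent => (if Top = 0 then 0 else Stack (Top)));
   end Add;

begin
   for T of Tokens loop
      --  The token kind alone decides the action; nothing is deferred.
      case T is
         when Open_Tag =>
            Add (Element);
            Top := Top + 1;
            Stack (Top) := Node_Count;
         when Close_Tag =>
            if Top = 0 then
               raise Constraint_Error with "close tag without open tag";
            end if;
            Top := Top - 1;
         when Char_Data =>
            Add (Text_Leaf);
      end case;
   end loop;

   for I in 1 .. Node_Count loop
      Ada.Text_IO.Put_Line
        (Natural'Image (I) & " " & Node_Kind'Image (Nodes (I).Kind)
         & " parent" & Natural'Image (Nodes (I).Parent));
   end loop;
end Tree_Builder_Demo;
```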

                                                   Randy.




* Re: Encapsulation: What to export
  2017-11-28  2:12     ` Randy Brukardt
@ 2017-11-28 15:22       ` Victor Porton
  2017-11-28 22:36         ` Randy Brukardt
  0 siblings, 1 reply; 22+ messages in thread
From: Victor Porton @ 2017-11-28 15:22 UTC


Randy Brukardt wrote:

> "Victor Porton" <porton@narod.ru> wrote in message
> news:ovhst9$1a43$1@gioia.aioe.org...
> ...
>> I parse not a string but an RDF tree. It is similar to convert AST
>> (abstract
>> syntax tree) to another format. RDF is similar to an AST but more
>> abstract.
>>
>> So it is NOT a deserialization operation in Ada sense.
> 
> Calling the operation of creating an RDF (or XML or HTML or SGML ...) tree
> from "parsing" is a gross distortion from what really is going on. As

No. I "parse" RDF (not a text file) and create data in another format.

> Dmitry says, it is much more a deserialization operation, since you don't
> need any sort of traditional parser to implement it. You just need text

I write an "untraditional" parser which extracts data from RDF.

> operations (lexical analysis) and a stack of pending tree nodes (both of
> which are needed even if you had used a traditional parser anyway). In
> general, any "language" where the node types are immediately determined
> (as in all of the above, the prefix determines the type of node) does not
> need parsing, because there are never any situations where the choice of
> what to do has to be deferred.
> 
>                                                    Randy.
-- 
Victor Porton - http://portonvictor.org


* Re: Encapsulation: What to export
  2017-11-28 15:22       ` Victor Porton
@ 2017-11-28 22:36         ` Randy Brukardt
  2017-11-28 23:29           ` Shark8
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Randy Brukardt @ 2017-11-28 22:36 UTC


"Victor Porton" <porton@narod.ru> wrote in message 
news:ovjv01$1r5c$1@gioia.aioe.org...
> Randy Brukardt wrote:
>
>> "Victor Porton" <porton@narod.ru> wrote in message
>> news:ovhst9$1a43$1@gioia.aioe.org...
>> ...
>>> I parse not a string but an RDF tree. It is similar to convert AST
>>> (abstract
>>> syntax tree) to another format. RDF is similar to an AST but more
>>> abstract.
>>>
>>> So it is NOT a deserialization operation in Ada sense.
>>
>> Calling the operation of creating an RDF (or XML or HTML or SGML ...) 
>> tree
>> from "parsing" is a gross distortion from what really is going on. As
>
> No. I "parse" RDF (not a text file) and create data in other format.

Yes, you (do something, but don't parse) RDF and create some other format. 
Format transformations surely don't require parsing. My objection is that 
what you are doing to create that other format is not parsing; it's rather 
just a text to tree transformation that is deterministic (really a very 
simple state machine). Calling that "parsing" trivializes the much more 
complex languages that parsers can make sense of. None of the on-line 
languages (with the possible exception of CSS) require any parsing; SGML was 
designed to not require parsing and all of these other formats kept the 
basic design of SGML.

                                          Randy.




* Re: Encapsulation: What to export
  2017-11-28 22:36         ` Randy Brukardt
@ 2017-11-28 23:29           ` Shark8
  2017-11-29  4:44             ` gautier_niouzes
  2017-11-29  8:40             ` Dmitry A. Kazakov
  2017-11-29  7:31           ` G. B.
  2017-11-29 18:46           ` Victor Porton
  2 siblings, 2 replies; 22+ messages in thread
From: Shark8 @ 2017-11-28 23:29 UTC


On Tuesday, November 28, 2017 at 3:36:23 PM UTC-7, Randy Brukardt wrote:
> None of the on-line 
> languages (with the possible exception of CSS) require any parsing; SGML was 
> designed to not require parsing and all of these other formats kept the 
> basic design of SGML.

Doesn't there have to be some sort of parsing for HTML? Specifically the TABLE-tag? (And possibly keeping track of opening-/closing tags, in general.)

I mean, that's why it's not Regular and you [therefore] can't use RegEx for it.



* Re: Encapsulation: What to export
  2017-11-28 23:29           ` Shark8
@ 2017-11-29  4:44             ` gautier_niouzes
  2017-11-29 20:32               ` Randy Brukardt
  2017-11-29  8:40             ` Dmitry A. Kazakov
  1 sibling, 1 reply; 22+ messages in thread
From: gautier_niouzes @ 2017-11-29  4:44 UTC


Shark8:

> Doesn't there have to be some sort of parsing for HTML? Specifically the TABLE-tag? (And possibly keeping track of opening-/closing tags, in general.)

For sure, HTML needs to be parsed. You can find a parser here:

https://sf.net/p/wasabee/code/HEAD/tree/zrt_dev/common/wasabee-hypertext-parsing.adb


* Re: Encapsulation: What to export
  2017-11-28 22:36         ` Randy Brukardt
  2017-11-28 23:29           ` Shark8
@ 2017-11-29  7:31           ` G. B.
  2017-11-29  7:38             ` G. B.
  2017-11-29  8:14             ` Simon Wright
  2017-11-29 18:46           ` Victor Porton
  2 siblings, 2 replies; 22+ messages in thread
From: G. B. @ 2017-11-29  7:31 UTC


Randy Brukardt <randy@rrsoftware.com> wrote:
>; SGML was 
> designed to not require parsing and all of these other formats kept the 
> basic design of SGML.

SGML, as opposed to misrepresentations of what it is,
is by definition and design configurable so as to support
writers who like to omit, e.g., closing tags.
This is just one example of when SGML may
require substantial amounts of parsing in order to find
the (set of) permissible sentence structures.
Somewhat like an error-correcting parser.

I agree that a fail-fast reader of a serialized tree
is not a parser.

Full SGML, it’s not-grammar-defining parts at least,
is more like Markdown. The latter is notoriously
hard to parse correctly, given the many ways in
which writers can make reasonable mistakes, which
are therefore to be recognized and corrected by Markdown
parsers. Most are not mature enough to cope.
And Markdown typically doesn’t even have
comments.


* Re: Encapsulation: What to export
  2017-11-29  7:31           ` G. B.
@ 2017-11-29  7:38             ` G. B.
  2017-11-29  8:14             ` Simon Wright
  1 sibling, 0 replies; 22+ messages in thread
From: G. B. @ 2017-11-29  7:38 UTC


G. B. <nonlegitur@nmhp.invalid> wrote:

> Full SGML, it’s

its, rather.







* Re: Encapsulation: What to export
  2017-11-29  7:31           ` G. B.
  2017-11-29  7:38             ` G. B.
@ 2017-11-29  8:14             ` Simon Wright
  1 sibling, 0 replies; 22+ messages in thread
From: Simon Wright @ 2017-11-29  8:14 UTC


G. B. <nonlegitur@nmhp.invalid> writes:

> the many ways in which writers can make reasonable mistakes which are
> therefore to be recognized and corrected , by Markdown parsers.

Or the ways in which a writer can forget/mistake the markdown dialect
they are writing for. Each site I write markdown for has its own
variant. Standardisation would be great.


* Re: Encapsulation: What to export
  2017-11-28 23:29           ` Shark8
  2017-11-29  4:44             ` gautier_niouzes
@ 2017-11-29  8:40             ` Dmitry A. Kazakov
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry A. Kazakov @ 2017-11-29  8:40 UTC


On 29/11/2017 00:29, Shark8 wrote:

> Doesn't there have to be some sort of parsing for HTML?

Yes. Parsing presumes a human-readable language. HTML was meant to be 
human-readable.

> I mean, that's why it's not Regular and you [therefore] can't use RegEx for it.

It is a different matter. There are classes of formal languages. Regular 
expressions describe one such class (the regular languages). If a language 
lies outside that class, its sentences cannot be 
recognized/described/generated by regular expressions.

This is not directly related to parsing and parsing is only partially 
about recognizing valid statements. A parser must also deal with invalid 
ones in order to provide meaningful diagnostics and have an output, 
beyond just yes-no.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Encapsulation: What to export
  2017-11-28 22:36         ` Randy Brukardt
  2017-11-28 23:29           ` Shark8
  2017-11-29  7:31           ` G. B.
@ 2017-11-29 18:46           ` Victor Porton
  2017-11-29 19:43             ` Dmitry A. Kazakov
  2017-11-29 20:42             ` Randy Brukardt
  2 siblings, 2 replies; 22+ messages in thread
From: Victor Porton @ 2017-11-29 18:46 UTC


Randy Brukardt wrote:

> "Victor Porton" <porton@narod.ru> wrote in message
> news:ovjv01$1r5c$1@gioia.aioe.org...
>> Randy Brukardt wrote:
>>
>>> "Victor Porton" <porton@narod.ru> wrote in message
>>> news:ovhst9$1a43$1@gioia.aioe.org...
>>> ...
>>>> I parse not a string but an RDF tree. It is similar to convert AST
>>>> (abstract
>>>> syntax tree) to another format. RDF is similar to an AST but more
>>>> abstract.
>>>>
>>>> So it is NOT a deserialization operation in Ada sense.
>>>
>>> Calling the operation of creating an RDF (or XML or HTML or SGML ...)
>>> tree
>>> from "parsing" is a gross distortion from what really is going on. As
>>
>> No. I "parse" RDF (not a text file) and create data in other format.
> 
> Yes, you (do something, but don't parse) RDF and create some other format.
> Format transformations surely don't require parsing. My objection is that
> what you are doing to create that other format is not parsing; it's rather
> just a text to tree transformation that is deterministic (really a very

It is not a text-to-tree transformation.

It is a transformation from a tree (in fact a directed graph) into another 
format. (This digraph is created by parsing a text file, but that is a 
different story, because I use an existing binding to a C library to parse 
the text format into an RDF graph.)

Transformation from a tree is very similar to parsing (yes, I use this word) 
from an abstract syntax tree to another format.

When reading an AST, the word "parsing" is correct, even though it is not in 
any way text parsing.

I do use the word "parsing" because it uses techniques very similar to text 
parsing, namely I use recursive descent.

> simple state machine). Calling that "parsing" trivializes the much more
> complex languages that parsers can make sense of. None of the on-line
> languages (with the possible exception of CSS) require any parsing; SGML
> was designed to not require parsing and all of these other formats kept
> the basic design of SGML.
> 
>                                           Randy.
-- 
Victor Porton - http://portonvictor.org



* Re: Encapsulation: What to export
  2017-11-29 18:46           ` Victor Porton
@ 2017-11-29 19:43             ` Dmitry A. Kazakov
  2017-11-29 19:57               ` Victor Porton
  2017-11-29 20:42             ` Randy Brukardt
  1 sibling, 1 reply; 22+ messages in thread
From: Dmitry A. Kazakov @ 2017-11-29 19:43 UTC


On 2017-11-29 19:46, Victor Porton wrote:

> I do use the word "parsing" because it uses techniques very similar to text
> parsing, namely I use recursive descent.

The commonly used term for what you do is "depth-first tree traversal".

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Encapsulation: What to export
  2017-11-29 19:43             ` Dmitry A. Kazakov
@ 2017-11-29 19:57               ` Victor Porton
  0 siblings, 0 replies; 22+ messages in thread
From: Victor Porton @ 2017-11-29 19:57 UTC


Dmitry A. Kazakov wrote:

> On 2017-11-29 19:46, Victor Porton wrote:
> 
>> I do use the word "parsing" because it uses techniques very similar to
>> text parsing, namely I use recursive descent.
> 
> The commonly used term for what you do is "depth-first tree traversal".

Tree traversal is not the same as recursive descent. For example recursive 
descent may analyze the same fragment of a tree more than once or it may 
skip some parts of the tree entirely.
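
The distinction drawn here can be illustrated with a small sketch. It is not
the actual converter: the node type, the Ignored marker, and the summing
"conversion" are hypothetical, chosen only to show a recursive routine that
deliberately skips part of the tree instead of visiting every node.

```ada
with Ada.Text_IO;

procedure Descent_Demo is

   type Node;
   type Node_Access is access Node;

   type Node_Kind is (Relevant, Ignored);

   type Node is record
      Kind  : Node_Kind;
      Value : Integer;
      Left  : Node_Access;
      Right : Node_Access;
   end record;

   --  Recursive conversion of a (hypothetical) tree into a single number.
   --  Unlike a plain traversal, it does not visit every node: a subtree
   --  rooted at an Ignored node is skipped entirely.
   function Sum_Relevant (N : Node_Access) return Integer is
   begin
      if N = null or else N.Kind = Ignored then
         return 0;
      end if;
      return N.Value + Sum_Relevant (N.Left) + Sum_Relevant (N.Right);
   end Sum_Relevant;

   --  The Relevant child (100) below an Ignored node is never reached.
   Skipped : constant Node_Access :=
     new Node'(Ignored, 10, new Node'(Relevant, 100, null, null), null);

   Root : constant Node_Access :=
     new Node'(Relevant, 1, Skipped, new Node'(Relevant, 2, null, null));

begin
   Ada.Text_IO.Put_Line (Integer'Image (Sum_Relevant (Root)));
end Descent_Demo;
```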

However, I suggest we stop arguing about terminology, as it is off-topic 
here and there is no utility in this discussion.

-- 
Victor Porton - http://portonvictor.org


* Re: Encapsulation: What to export
  2017-11-29  4:44             ` gautier_niouzes
@ 2017-11-29 20:32               ` Randy Brukardt
  2017-11-29 21:08                 ` gautier_niouzes
  0 siblings, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2017-11-29 20:32 UTC


<gautier_niouzes@hotmail.com> wrote in message 
news:b9353594-3831-4fcb-b27c-b0afca754a60@googlegroups.com...
> Shark8:
>
>> Doesn't there have to be some sort of parsing for HTML? Specifically the 
>> TABLE-tag? (And possibly keeping track of opening-/closing tags, in 
>> general.)
>
> For sure, HTML needs to be parsed. You find a parser here:

Really? We don't have any parser (just a lexer) in the search engine 
crawler. As I recall, section closes are counted rather than anything more 
complex. I stand by my original statement.

                                Randy.




* Re: Encapsulation: What to export
  2017-11-29 18:46           ` Victor Porton
  2017-11-29 19:43             ` Dmitry A. Kazakov
@ 2017-11-29 20:42             ` Randy Brukardt
  2017-11-29 22:18               ` Shark8
  1 sibling, 1 reply; 22+ messages in thread
From: Randy Brukardt @ 2017-11-29 20:42 UTC


"Victor Porton" <porton@narod.ru> wrote in message 
news:ovmva2$1q20$1@gioia.aioe.org...
...
> I do use the word "parsing" because it uses techniques very similar to 
> text
> parsing, namely I use recursive descent.

I'd argue that most of what is typically called "recursive descent" per se 
isn't parsing, either, as it is a free-form glob whose set of languages is 
rather hard to specify formally. (There is a rather restrictive subset of 
"recursive descent" which strictly accepts LL(1) languages; that is the only 
thing, IMHO, that deserves the term. That's rarely used in practice for a 
variety of reasons, the main one being that LL(1) languages aren't very 
interesting.)

>...  For example recursive descent may analyze the same fragment of a tree
> more than once or it may skip some parts of the tree entirely.

That's not "recursive descent", that's just some code doing whatever it is 
that you need to do. Nothing wrong with that, but calling that "parsing" or 
"recursive descent" or any other well-known term is just abusing those 
terms.

                                             Randy.



* Re: Encapsulation: What to export
  2017-11-29 20:32               ` Randy Brukardt
@ 2017-11-29 21:08                 ` gautier_niouzes
  2017-11-30 21:42                   ` Randy Brukardt
  0 siblings, 1 reply; 22+ messages in thread
From: gautier_niouzes @ 2017-11-29 21:08 UTC


Randy Brukardt:

> Really? We don't have any parser (just a lexer) in the search engine 
> crawler. As I recall, section closes are counted rather than anything more 
> complex.

Sure. You could say the same for gathering identifiers from Ada sources for a search engine for Ada sources. A lexer is ok for that job. Would you conclude that Ada doesn't need to be parsed?
Or conversely, how would you manage to display HTML lists or tables without a parser? With a search engine crawler you just throw away the HTML structures. This is okay for your crawler: you just need the text between the tags.



* Re: Encapsulation: What to export
  2017-11-29 20:42             ` Randy Brukardt
@ 2017-11-29 22:18               ` Shark8
  0 siblings, 0 replies; 22+ messages in thread
From: Shark8 @ 2017-11-29 22:18 UTC


On Wednesday, November 29, 2017 at 1:42:07 PM UTC-7, Randy Brukardt wrote:
> 
> >...  For example recursive descent may analyze the same fragment of a tree
> > more than once or it may skip some parts of the tree entirely.
> 
> That's not "recursive descent", that's just some code doing whatever it is 
> that you need to do. Nothing wrong with that, but calling that "parsing" or 
> "recursive descent" or anything other well-known term is just abusing those 
> terms.

I'd suggest "tree transformation"... except that somewhat implies that the result is the same type of tree [like a Procedure Transform( Item : in out Tree )] when it sounds like he's converting some other sort of tree into the sort he wants [like a Function Convert( Input : RDF_Tree ) return Whatever_Tree].

Perhaps "Tree Conversion" then?
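
The two signatures mentioned above can be put side by side in a small
compilable sketch. The record types here are placeholders invented for the
illustration (real trees would be recursive structures), and the bodies do
arbitrary work just so the program runs.

```ada
with Ada.Text_IO;

procedure Conversion_Demo is

   --  Placeholder types; real trees would be recursive structures.
   type Tree is record
      Depth : Natural;
   end record;

   type RDF_Tree is record
      Triple_Count : Natural;
   end record;

   type Whatever_Tree is record
      Node_Count : Natural;
   end record;

   --  "Transformation": same type in and out, modified in place.
   procedure Transform (Item : in out Tree) is
   begin
      Item.Depth := Item.Depth + 1;
   end Transform;

   --  "Conversion": consumes one type and produces a different one.
   function Convert (Input : RDF_Tree) return Whatever_Tree is
   begin
      return (Node_Count => Input.Triple_Count);
   end Convert;

   T : Tree := (Depth => 1);
   W : Whatever_Tree;

begin
   Transform (T);
   W := Convert ((Triple_Count => 3));
   Ada.Text_IO.Put_Line ("Depth:" & Natural'Image (T.Depth));
   Ada.Text_IO.Put_Line ("Nodes:" & Natural'Image (W.Node_Count));
end Conversion_Demo;
```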


* Re: Encapsulation: What to export
  2017-11-29 21:08                 ` gautier_niouzes
@ 2017-11-30 21:42                   ` Randy Brukardt
  0 siblings, 0 replies; 22+ messages in thread
From: Randy Brukardt @ 2017-11-30 21:42 UTC


<gautier_niouzes@hotmail.com> wrote in message 
news:1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com...
> Randy Brukardt:
>
>> Really? We don't have any parser (just a lexer) in the search engine
>> crawler. As I recall, section closes are counted rather than anything 
>> more
>> complex.
>
> Sure. You could say the same for gathering identifiers from Ada sources 
> for a search engine for Ada sources. A lexer is ok for that job. Would you 
> conclude that Ada doesn't need to be parsed ?
> Or reversely, how would you manage to display HTML lists or tables without 
> a parser ? With a search engine crawler you just throw away the HTML 
> structures. This is okay for your crawler: you just need the text between 
> the tags.

Not really true, since we tag the URLs with the type of reference 
(automatic, like images, or manual, like links) and that requires 
identifying the enclosing construct as well as the "attribute" containing 
the URL. And there are a few cases where the meaning of the attribute is 
different in different constructs.

As far as lists or tables goes, you can build an HTML tree without any 
parsing, so I don't see any requirement to parse HTML. (It might be easier 
to build a parser using a tool than a hand-constructed tree builder if one's 
needs are complex enough, but that doesn't change the underlying issue.)

                         Randy.


