From: Hyman Rosen
Subject: Re: Announce: OpenToken 2.0 released
Date: 2000-02-01T18:16:05+00:00
Newsgroups: comp.lang.ada
Organization: PANIX Public Access Internet and UNIX, NYC
References: <3890C62B.18309585@telepath.com> <876unj$jcs$1@nnrp1.deja.com>

Ted Dennison writes:
> Proper binary support is not in OT because it has just never come up
> before. But as you can see, it could be modified fairly easily to
> support parsing binaries. But using a sentinel character for the end of
> file has always seemed like a nice simplification. So what are the uses
> of parsing binaries? I kinda thought that binaries are, by their very
> nature, already parsed.

Well, at one point I was writing code to parse Adobe PDF files. They have a binary format, where arbitrary 8-bit bytes can appear, and a structure which I think lends itself well to syntax-oriented parsing.

In general, I like to avoid arbitrary restrictions in tools. Before GNU, most classic UNIX utilities had arbitrary limits, especially on line size. This led to unexpected and sometimes silent breakage when the tools were fed files with lines which were too large. And the tool reporting the problem isn't of much help, when I still have that file I need to process and the tool won't work.

By the way, the normal C/C++ style for handling EOF is to give the character reader a return type wide enough to hold any value of the character set plus one out-of-band value representing EOF. The usual is '#define EOF (-1)' and 'int getchar(void)'.
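
To make that concrete, here is a minimal sketch of the idiom in C. The function name count_byte and its parameters are made up for illustration and have nothing to do with OpenToken. The point is only that the result of getc is kept in an int, so a data byte of 0xFF comes back as 255 and can never be mistaken for EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count how often `target` occurs in a binary stream.
 * If `total` is non-null, it receives the number of bytes read. */
size_t count_byte(FILE *in, unsigned char target, size_t *total)
{
    int c;              /* int, not char: holds 0..UCHAR_MAX plus EOF */
    size_t hits = 0;
    size_t n = 0;

    assert(in != NULL);

    /* getc returns each byte as an unsigned char converted to int,
     * so no byte value in the data collides with the negative EOF. */
    while ((c = getc(in)) != EOF) {
        n++;
        if (c == target)
            hits++;
    }

    if (total != NULL)
        *total = n;
    return hits;
}
```

Had c been declared as a plain char, then depending on whether char is signed the loop would either stop at the first 0xFF byte or never see EOF at all. For a binary file the stream should be opened with "rb", for example count_byte(f, 0xFF, &n) on a FILE * returned by fopen(path, "rb").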