From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,8de7eedad50552f1
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!news4.google.com!news.glorb.com!npeer.de.kpn-eurorings.net!newsfeed.arcor.de!news.arcor.de!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Ada bench : count words
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.14.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <1cssfg6rke0bv$.1ou69sudrkdgz$.dlg@40tude.net>
 <OaWdnR5o0rTNKN3fRVn-sg@comcast.com>
Date: Wed, 23 Mar 2005 08:25:03 +0100
Message-ID: <1ico7d29vnxgh$.70o3jmlo8zmf$.dlg@40tude.net>
NNTP-Posting-Date: 23 Mar 2005 08:25:00 MET
NNTP-Posting-Host: bf05fc12.newsread4.arcor-online.net
X-Trace: 
 DXC=Nb0`M88`bi0CDTkn::RD0?:ejgIfPPld4jW\KbG]kaM8dbobRQ:W=W2UU`JMhHn\K7WRXZ37ga[7:eh9Z=IkP^14Z0NLZFc\ZD>
X-Complaints-To: abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:9772
Date: 2005-03-23T08:25:00+01:00
List-Id: <comp.lang.ada>

On Tue, 22 Mar 2005 18:16:16 -0600, tmoran@acm.org wrote:

>>It seems (from the description) that the separators are HT, SP, LF.
>> ...
> 
>   The "same thing" FAQ says "We prefer plain vanilla programs - after all
> we're trying to compare language implementations not programmer effort and
> skill."  So the "Ada version" should match the "Timing Trials" C version,
> which uses a block read, not the line oriented "gets".  So
> Ada.Text_IO.Get_Line should not be used.  (I grant that probably zero
> people paid attention to this "prefer".)
> 
>   When I browsed to the "input" page and cut&pasted to save the part that
> appeared to be the input file, the file size was 6102, whereas it's
> apparently supposed to be 6096.  Also, if CR is not a separator, then the
> lines consisting of CR-LF only should count CR as a short word, no?

Probably yes. Look at this:

/* -*- mode: c -*-
 * $Id: wc-gcc.code,v 1.9 2005/03/21 08:36:50 bfulgham Exp $
 * http://www.bagley.org/~doug/shootout/
 *
 * Author: Waldemar Hebisch (hebisch@math.uni.wroc.pl)
 * Optimizations: Michael Herf (mike@herfconsulting.com)
 * Further Revisions: Paul Hsieh (qed@pobox.com)
 */

#include <stdio.h>
#include <unistd.h>
#include <limits.h>

#define BSIZ 4096

unsigned long ws[UCHAR_MAX + 1];
unsigned long nws[UCHAR_MAX + 1];
unsigned char buff[BSIZ];

int main(void) {
    unsigned long prev_nws = 0x10000L, w_cnt = 0, l_cnt = 0, b_cnt = 0, cnt;

    /* Fill tables */
    for (cnt = 0; cnt <= UCHAR_MAX; cnt++) {
        ws[cnt] = (cnt == ' ' || cnt == '\n' || cnt == '\t')
                  + (0x10000L & -(cnt == '\n'));
        nws[cnt] = !(cnt == ' ' || cnt == '\n' || cnt == '\t') + 0x10000L;
    }

    /* Main loop */
    while (0 != (cnt = read (0, buff, BSIZ))) {
        unsigned long vect_count = 0;
	unsigned char *pp, *pe;

	b_cnt += cnt;
	pe = buff + cnt;
	pp = buff;

	while (pp < pe) {
	    vect_count +=  ws[*pp] & prev_nws;
	    prev_nws    = nws[*pp];
	    pp ++;
	}
	w_cnt += vect_count  & 0xFFFFL;
	l_cnt += vect_count >> 16;
    }

    w_cnt += 1 & prev_nws;

    printf ("%lu %lu %lu\n", l_cnt, w_cnt, b_cnt);
    return 0;
}

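This will count CR as a word: only SP, LF and HT are in the separator
table, so a line consisting of CR-LF only contributes one word.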
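
To make the CR effect easier to see, here is a minimal sketch of the same
counting rule written as a plain state machine. It is not part of the
benchmark program; the names wc_counts, is_sep and wc_count are
hypothetical, and it assumes the separator set is exactly SP, LF and HT as
in the table above.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the benchmark: line and word counts. */
struct wc_counts {
    unsigned long lines;
    unsigned long words;
};

/* Same separator set as the ws/nws tables: SP, LF, HT. CR is not in it. */
static int is_sep(unsigned char c) {
    return c == ' ' || c == '\n' || c == '\t';
}

/* A word ends when a separator follows a non-separator, or at end of
   input if the last byte was a non-separator. Every LF is one line. */
struct wc_counts wc_count(const unsigned char *buf, size_t len) {
    struct wc_counts r = {0, 0};
    int in_word = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (is_sep(buf[i])) {
            if (in_word)
                r.words++;
            in_word = 0;
            if (buf[i] == '\n')
                r.lines++;
        } else {
            in_word = 1;
        }
    }
    if (in_word)
        r.words++;
    return r;
}
```

For the 7 bytes "a b\r\n\r\n" this gives 2 lines and 3 words: "a", "b\r"
and the lone "\r". With the CRs stripped ("a b\n\n") it gives 2 lines and
2 words.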

> Or is the file supposed to be Unix-style, with CRs removed?

Yes if they accept "read", and no if they don't. The usual schism in all
contests of this sort...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de