From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: a07f3367d7,3a6a9f1d654285ba X-Google-Attributes: gida07f3367d7,public,usenet X-Google-NewGroupId: yes X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news2.google.com!postnews.google.com!d4g2000yqa.googlegroups.com!not-for-mail From: jonathan Newsgroups: comp.lang.ada Subject: Re: Ada Shootout program for K-Nucleotide (patches) Date: Tue, 18 Aug 2009 09:59:45 -0700 (PDT) Organization: http://groups.google.com Message-ID: <4d48b846-bb2d-4126-86c2-487b2244c9ad@d4g2000yqa.googlegroups.com> References: <4a743343$0$32674$9b4e6d93@newsspool2.arcor-online.net> <4bc4b12d-40f8-4140-8ef6-326d9e6b8adf@k30g2000yqf.googlegroups.com> <4a897b61$0$30221$9b4e6d93@newsspool1.arcor-online.net> NNTP-Posting-Host: 143.117.23.233 Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Trace: posting.google.com 1250614785 17977 127.0.0.1 (18 Aug 2009 16:59:45 GMT) X-Complaints-To: groups-abuse@google.com NNTP-Posting-Date: Tue, 18 Aug 2009 16:59:45 +0000 (UTC) Complaints-To: groups-abuse@google.com Injection-Info: d4g2000yqa.googlegroups.com; posting-host=143.117.23.233; posting-account=Jzt5lQoAAAB4PhTgRLOPGuTLd_K1LY-C User-Agent: G2/1.0 X-HTTP-UserAgent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.12) Gecko/2009072220 Iceweasel/3.0.6 (Debian-3.0.6-1),gzip(gfe),gzip(gfe) Xref: g2news2.google.com comp.lang.ada:7850 Date: 2009-08-18T09:59:45-07:00 List-Id: OK, I did some more work on knuceotide.adb. I'll say in advance that the code may be .. a bit messy. There is sometimes no way around this if you are trying for fast rather than elegant. The 1st and most important change improves running time of the bounded string comparison operator "=". This "=" is how the hash routine decides if it has found the desired item, so it could not be more important. I'm rather proud of this because it the speedup surprised me: apparently comparison of 32 or 64 bit ints is a lot faster than comparing the Characters one by one in the string. So I used an unchecked_conversion from strings of 4 characters to 32 bit int. The speedup was 15-20%. With this change and with the second change described below at the end, the program running time on my machine falls from about 53 sec to about 37 sec, using GNAT 4.3.2 and gnatmake -O3 -gnatp. with Unchecked_Conversion; separate (KNucleotide_1) package body Fragments_1 is -- old version of "=" -- function "=" (Left, Right: Bounded_String) return Boolean is -- begin -- return Left.Last = Right.Last -- and then Left.Data(1 .. Left.Last) = Right.Data(1 .. Right.Last); -- end "="; -- new version of "=". (Tell me if its right;) Bytes_per_Word : constant := 4; -- 4 good, even on 64-bit archs. type Uns is mod 2**(8*Bytes_per_Word); for Uns'Size use 8*Bytes_per_Word; subtype Str is String(1..Bytes_per_Word); function To_Uns is new Unchecked_Conversion (Str, Uns); function "=" (Left, Right: Bounded_String) return Boolean is strt : Integer := 1; fnsh : Integer := Bytes_per_Word; Last : constant Integer := Left.Last; begin if Last /= Right.Last then return False; end if; loop exit when fnsh > Last; if To_Uns (Left.Data(strt..fnsh)) /= To_Uns (Right.Data (strt..fnsh)) then return False; end if; strt := strt + Bytes_per_Word; fnsh := fnsh + Bytes_per_Word; end loop; for i in strt .. Last loop if Left.Data(i) /= Right.Data(i) then return False; end if; end loop; return True; end "="; The second change gives another speedup of about half that of the previous one. I am shamelessly advocating removing a layer of abstraction. Remember, I moved the declaration of the bounded string to the public part, so that the procedures can now see the 2 components of the record: Data, and Last. Here all we do is avoid a call to function To_Bounded_String in package Fragments_1. Below this call is commented out: -- -- Calculate the calculates the nucleotide frequencies -- procedure Calculate_Frequencies (Length : Frequencies) is Key : Fragment; Len : constant Integer := Length; begin -- NEW initialization of Key.Last: Key.Last := Len; Table.Reset; for I in 1 .. Buffer'Last - Len + 1 loop -- NEW initialization of Key.Data: Key.Data(1 .. Len) := Buffer (I .. I + Len - 1); declare -- OLD initialization of Key.Data: --Key : constant Fragment := -- Fragments_1.To_Bounded_String (Buffer (I .. I+Len-1)); Element : constant Element_Access := Table.Get (Key); begin I glance at the shootout benchmark site suggests we get a respectable result. Their machine is slower than mine, so I would expect somewhere around 55 sec on their machine. (But I am all the more impressed by the ones that are faster than 50 sec.) By the way, I notice that the top c++ entry uses a static 250 megabyte array to store the data. We can also gain a few seconds by doing the following. (Notice the optional 250 megabyte string. Yes, it worked.) procedure Read_Section is --Buffer : String (1 .. 1024*1024*256); -- best; ~8% faster than 10240 Buffer : String (1 .. 1024*1024*64); -- almost as good as above. --Buffer : String (1 .. 1024*10); -- standard setting --Buffer : String (1 .. 8); -- this works fine Jonathan