From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 02 Dec 2011 10:30:11 +0100
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
 rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
References: <slrnjd9tpk.1lme.lithiumcat@sigil.instinctive.eu>
 <4ed4fc37$0$2537$ba4acef3@reader.news.orange.fr>
 <op.v5q874xcule2fv@douda-yannick>
 <ouubrb3trn06$.1jl5q3ausoy2v.dlg@40tude.net> <jb6gn0$47g$1@munin.nbi.dk>
 <de6vkicxgv4x$.1c89iragml7xf$.dlg@40tude.net>
 <op.v5t3efz4lzeukk@jellix.jlfencey.com>
 <7nz692j39hkt$.146ba4w7yczck$.dlg@40tude.net>
In-Reply-To: <7nz692j39hkt$.146ba4w7yczck$.dlg@40tude.net>
Message-ID: <4ed89aa8$0$7616$9b4e6d93@newsspool1.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 02 Dec 2011 10:30:16 CET
NNTP-Posting-Host: 5d718ee9.newsspool1.arcor-online.net
X-Complaints-To: usenet-abuse@arcor.de
Xref: news1.google.com comp.lang.ada:19304
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Date: 2011-12-02T10:30:16+01:00
List-Id: <comp.lang.ada>

On 02.12.11 09:27, Dmitry A. Kazakov wrote:
> On Fri, 02 Dec 2011 00:26:29 +0100, Vinzent Hoefler wrote:
>
>> Dmitry A. Kazakov wrote:
>>
>>> This is very likely. But my concern was not performance, rather the idea of
>>> having long strings. Since long text strings do not exist in "nature"
>>> (:-)), nobody should like to have them.
>>
>> Hmm. What's the average length of a DNA-string? ;)
>
> Yes, this is what I had in mind when specifically indicated strings as
> *text* ones.
>
> DNA chain is not a text string. Furthermore it would likely have some
> specific operations and a representation tailored substring search.

I have tried this once.  The sequence data uses just a handful of
characters, so I mapped them to a 4-bit type, and even tried fewer bits.
I added lots of purportedly smart unchecked conversions and some shifting,
and made my head spin thinking about which combinations of "characters"
might allow clever additions, rather than shifts and the like, for
obtaining information about substrings or single "characters". I noted
that addition is faster than shifting or logical operations on the
processor, etc. I also tried specializing the search. If there is a
solution, it seems tricky, perhaps to be found by someone with more than
ordinary combinatorial skills. In my case the effort produced only
minuscule advantages, sometimes the opposite, at the cost of a large
number of specialized subprograms.
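
To make the 4-bit idea concrete, here is a minimal sketch that leaves the
packing to the compiler through representation clauses, instead of the
hand-written unchecked conversions and shifts described above. The names
Nibble_Demo, Nucleotide and Sequence, and the five-letter alphabet, are
assumptions made for illustration; whether a 4-bit component size is
accepted depends on the compiler.

```ada
with Ada.Text_IO;

procedure Nibble_Demo is

   --  Hypothetical alphabet: four bases plus 'N' for "unknown".
   --  Five values need 3 bits; 4 bits are requested to match the text.
   type Nucleotide is ('A', 'C', 'G', 'T', 'N');
   for Nucleotide'Size use 4;

   --  Ask for 4-bit components, i.e. two symbols per byte.
   type Sequence is array (Positive range <>) of Nucleotide;
   for Sequence'Component_Size use 4;

   --  A string literal is legal here because Nucleotide is a
   --  character type (an enumeration with character literals).
   S : constant Sequence := "GATTACAN";

begin
   Ada.Text_IO.Put_Line
     ("Bits per symbol:" & Integer'Image (Sequence'Component_Size));
   Ada.Text_IO.Put_Line
     ("Symbols in S:   " & Integer'Image (S'Length));

   --  Indexing looks like ordinary array access; the compiler
   --  generates whatever shifting and masking is needed.
   Ada.Text_IO.Put_Line
     ("Pos of S (1):   " & Integer'Image (Nucleotide'Pos (S (1))));
end Nibble_Demo;
```

This keeps the source free of specialized subprograms, but it says nothing
about speed: the extraction of each 4-bit component still costs shifts and
masks, only hidden ones.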

Then I stopped and went back to a comparatively stupid subtype of String.

There might still be an algorithm that uses the standard String type
and some fast String search, but on rearranged data. Instead of mapping
each DNA_Character'(x) to the same Character'(x) of the subject string,
and provided there are no more than 16 different DNA characters, map each
pair of DNA characters from the enumeration type ('A', 'C', 'G', 'T', ...)
to a single ordinary Character, where 'Pos gives the usual sequence
0, 1, 2, ... For example, String'('G', 'C') becomes
     Character'Val (2 * 2**4 + 1) = '!'.
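
The pair mapping can be sketched as follows. This is only an illustration
under stated assumptions: the four-letter alphabet, the names
Pack_Pairs_Demo, DNA_String and Pack, and the choice of
Ada.Strings.Fixed.Index as the "fast String search" are all mine, not a
tested solution.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Pack_Pairs_Demo is

   --  Hypothetical alphabet; any enumeration of at most 16 values works.
   type DNA_Character is ('A', 'C', 'G', 'T');
   type DNA_String is array (Positive range <>) of DNA_Character;

   --  First symbol goes to the high 4 bits, second to the low 4 bits.
   function Pack (Left, Right : DNA_Character) return Character is
   begin
      return Character'Val
        (DNA_Character'Pos (Left) * 2**4 + DNA_Character'Pos (Right));
   end Pack;

   --  Packs an even-length sequence into a String of half the length.
   function Pack (Source : DNA_String) return String is
      Result : String (1 .. Source'Length / 2);
      J      : Positive;
   begin
      if Source'Length mod 2 /= 0 then
         raise Constraint_Error with "odd number of DNA characters";
      end if;
      for I in Result'Range loop
         J := Source'First + 2 * (I - 1);
         Result (I) := Pack (Source (J), Source (J + 1));
      end loop;
      return Result;
   end Pack;

   Subject : constant String := Pack ("ACGCTT");
   Pattern : constant String := Pack ("GC");

begin
   --  The example from the text: the pair ('G', 'C') becomes '!'.
   Ada.Text_IO.Put (Pack ('G', 'C'));
   Ada.Text_IO.New_Line;

   --  Ordinary String search on the packed data; the result is an
   --  index counted in pairs, not in single DNA characters.
   Ada.Text_IO.Put_Line
     (Natural'Image (Ada.Strings.Fixed.Index (Subject, Pattern)));
end Pack_Pairs_Demo;
```

One catch shows up immediately: a search on packed data only finds
patterns that start on a pair boundary and have even length. Matches at
odd offsets would need, for instance, a second packed copy of the subject
shifted by one DNA character, which is part of why this gets tricky.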

Or store a few DNA characters in exact floats to be processed
in parallel with SIMD instructions of SSE on Intelian processors,
maybe. Isn't this an interesting challenge in some circles?

If someone achieves any of this using plain Ada, the solution should
create some interest in higher order languages.