comp.lang.ada
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
Subject: Re: Ann: Natools.Chunked_Strings, beta 1
Date: Fri, 02 Dec 2011 10:30:11 +0100
Message-ID: <4ed89aa8$0$7616$9b4e6d93@newsspool1.arcor-online.net>
In-Reply-To: <7nz692j39hkt$.146ba4w7yczck$.dlg@40tude.net>

On 02.12.11 09:27, Dmitry A. Kazakov wrote:
> On Fri, 02 Dec 2011 00:26:29 +0100, Vinzent Hoefler wrote:
>
>> Dmitry A. Kazakov wrote:
>>
>>> This is very likely. But my concern was not performance, rather the idea of
>>> having long strings. Since long text strings do not exist in "nature"
>>> (:-)), nobody should like to have them.
>>
>> Hmm. What's the average length of a DNA-string? ;)
>
> Yes, this is what I had in mind when specifically indicated strings as
> *text* ones.
>
> DNA chain is not a text string. Furthermore it would likely have some
> specific operations and a representation tailored substring search.

I have tried this once.  Sequence information is given using just
a handful of characters. I mapped those to some 4-bit type, even tried less.
I added lots of purportedly smart unchecked conversions and some shifting,
made my head spin by thinking about which combinations of "characters"
might allow clever additions, instead of shifts and the like,
for obtaining info about substrings or single "characters", noted that
addition is faster than shifting or logical operations on the processor,
etc. I tried specializing searching. If there is a solution, it seems tricky,
perhaps to be found by someone with more than ordinary combinatorial skills.
In my case this effort produced only minuscule advantages, sometimes
the opposite, and the cost was a large number of specialized subprograms.

Then I stopped and went back to a comparatively stupid subtype of String.

There might still be an algorithm that uses the standard String type
and some fast String search, but on rearranged data: such that not just
one DNA_Character'(x) maps to the same Character'(x) of the subject string, but,
if there are no more than 16 different DNA characters, a pair of
DNA characters from an enumeration type ('A', 'C', 'G', 'T', ...) maps
to a single ordinary Character, where 'Pos gives the usual sequence of
0, 1, 2, ... so that, for example, String'('G', 'C') becomes
     Character'Val (2 * 2**4 + 1) = '!'.
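
A minimal sketch of that pairing, assuming an enumeration with only the
four literals named above. The names DNA_String, Pack_Pair, Pack_All and
Pack_DNA_Demo are made up for illustration, and odd-length input is simply
rejected here. It only shows the packing step, not a search:

```ada
with Ada.Text_IO;

procedure Pack_DNA_Demo is

   --  Hypothetical enumeration; up to 16 literals would fit in 4 bits.
   --  'Pos yields 0, 1, 2, 3 for 'A', 'C', 'G', 'T'.
   type DNA_Character is ('A', 'C', 'G', 'T');
   type DNA_String is array (Positive range <>) of DNA_Character;

   --  Two DNA characters become one Character: the first one goes
   --  into the high 4 bits, the second one into the low 4 bits.
   function Pack_Pair (Hi, Lo : DNA_Character) return Character is
   begin
      return Character'Val
        (DNA_Character'Pos (Hi) * 2**4 + DNA_Character'Pos (Lo));
   end Pack_Pair;

   --  Pack an even-length DNA_String into an ordinary String of
   --  half the length (assumption: odd lengths are an error).
   function Pack_All (Source : DNA_String) return String is
      Result : String (1 .. Source'Length / 2);
   begin
      if Source'Length mod 2 /= 0 then
         raise Constraint_Error with "odd number of DNA characters";
      end if;
      for I in Result'Range loop
         Result (I) := Pack_Pair
           (Source (Source'First + 2 * (I - 1)),
            Source (Source'First + 2 * (I - 1) + 1));
      end loop;
      return Result;
   end Pack_All;

   Packed : constant String := Pack_All ("GCAT");

begin
   --  The example from the text: ('G', 'C') packs to code 33, '!'.
   if Pack_Pair ('G', 'C') /= '!' then
      raise Program_Error;
   end if;
   Ada.Text_IO.Put_Line
     ("packed length:" & Integer'Image (Packed'Length));
   Ada.Text_IO.Put_Line
     ("first code:" & Integer'Image (Character'Pos (Packed (Packed'First))));
end Pack_DNA_Demo;
```

A pattern packed the same way could then be handed to an ordinary String
search, though only matches that start on a pair boundary would be found
directly; patterns at odd offsets would need separate handling.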

Or store a few DNA characters in exact floats to be processed
in parallel with SIMD instructions of SSE on Intelian processors,
maybe. Isn't this an interesting challenge in some circles?

If someone achieves any of this using plain Ada, the solution should
create some interest in higher order languages.
