comp.lang.ada
* Implementing character sets for Wide_Character
From: Martin Trenkmann @ 2015-03-06 18:01 UTC (permalink / raw)


I need to implement a containment check for Wide_Character in a wide character set. I have two approaches in mind.

1. Using an array type similar to Ada.Strings.Maps.Character_Set.

   type Wide_Character_Set is array (Wide_Character) of Boolean with Pack;
   
   XYZ_Charset_Set : constant Wide_Character_Set
     := (Wide_Character'Val (100) .. Wide_Character'Val (900) => True,
         others                                               => False);

2. Using a function with a case statement.

   function Is_In_XYZ_Charset_Set (Item : Wide_Character) return Boolean is
   begin
      case Item is
         when Wide_Character'Val (100) .. Wide_Character'Val (900) => return True;
         when others                                               => return False;
      end case;
   end Is_In_XYZ_Charset_Set;


I know that using an array will consume more memory, but with the Pack aspect it should only be 8 KB (65536 bits) - please correct me if I am wrong. The function approach is more memory friendly, but might be a bit slower than an array lookup.
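
The 8 KB estimate can be checked on a given compiler. A minimal sketch, assuming that Pack yields one bit per Boolean component (which the language recommends but the compiler decides); the procedure name Show_Set_Size is made up for illustration:

```ada
with Ada.Text_IO;

procedure Show_Set_Size is
   --  Same declaration as in approach 1.
   type Wide_Character_Set is array (Wide_Character) of Boolean with Pack;
begin
   --  'Size is reported in bits. With one bit per component this would be
   --  65536 bits, i.e. 8192 bytes, but the actual value depends on the
   --  compiler and target.
   Ada.Text_IO.Put_Line
     ("Bits : " & Integer'Image (Wide_Character_Set'Size));
   Ada.Text_IO.Put_Line
     ("Bytes: " & Integer'Image (Wide_Character_Set'Size / 8));
end Show_Set_Size;
```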

Should I definitely avoid one of the solutions or is it just a matter of available memory?

Thanks for your help.

Martin



* Re: Implementing character sets for Wide_Character
From: Bob Duff @ 2015-03-06 18:15 UTC (permalink / raw)


Martin Trenkmann <martin.trenkmann@posteo.de> writes:

> I need to implement a containment check for Wide_Character in a wide
> character set. I have two approaches in mind.

If you really care about speed, then implement both and measure the
speed.  It's not clear that the bit map will be faster -- using more
memory harms cache behavior.

Also take a look at package Ada.Strings.Wide_Maps.
It contains type Wide_Character_Set, represented as a sorted
sequence of ranges.  Measure that one, too.
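
A minimal sketch of the same 100 .. 900 set built with the standard package. To_Set, Wide_Character_Range and Is_In are declared in Ada.Strings.Wide_Maps; the names Wide_Maps_Demo and XYZ_Set are made up for illustration:

```ada
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Text_IO;

procedure Wide_Maps_Demo is
   --  One range is enough here; To_Set also accepts an array of ranges.
   XYZ_Set : constant Wide_Character_Set :=
     To_Set (Wide_Character_Range'(Low  => Wide_Character'Val (100),
                                   High => Wide_Character'Val (900)));
begin
   --  Inside the range.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_In (Wide_Character'Val (500), XYZ_Set)));
   --  Just outside the range.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_In (Wide_Character'Val (901), XYZ_Set)));
end Wide_Maps_Demo;
```

This should print TRUE followed by FALSE.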

Another possibility is an array of pointers to bit maps,
with "null" in place of "all-False".
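
One way the array-of-pointers idea could look, as a rough sketch only. The page size of 256 and all names (Page, Paged_Set, Include, Is_In) are assumptions chosen for illustration, and the pages are never freed:

```ada
with Ada.Text_IO;

procedure Paged_Set_Demo is
   --  16-bit code = 8-bit page number followed by 8-bit bit number.
   type Page_Index is range 0 .. 255;
   type Bit_Index  is range 0 .. 255;

   type Page is array (Bit_Index) of Boolean with Pack;
   type Page_Access is access Page;

   --  A null entry stands for a page that is all False.
   type Paged_Set is array (Page_Index) of Page_Access;

   procedure Include (Set : in out Paged_Set; Item : Wide_Character) is
      Code : constant Natural    := Wide_Character'Pos (Item);
      P    : constant Page_Index := Page_Index (Code / 256);
   begin
      if Set (P) = null then
         Set (P) := new Page'(others => False);  --  allocate on first use
      end if;
      Set (P).all (Bit_Index (Code mod 256)) := True;
   end Include;

   function Is_In (Item : Wide_Character; Set : Paged_Set) return Boolean is
      Code : constant Natural     := Wide_Character'Pos (Item);
      P    : constant Page_Access := Set (Page_Index (Code / 256));
   begin
      return P /= null and then P.all (Bit_Index (Code mod 256));
   end Is_In;

   XYZ_Set : Paged_Set := (others => null);
begin
   for C in Wide_Character'Val (100) .. Wide_Character'Val (900) loop
      Include (XYZ_Set, C);
   end loop;

   --  500 lies on an allocated page; 5000 lies on a null page.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_In (Wide_Character'Val (500), XYZ_Set)));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_In (Wide_Character'Val (5000), XYZ_Set)));
end Paged_Set_Demo;
```

For the 100 .. 900 range only four of the 256 pages get allocated, so the sparse parts of the set cost one null pointer each.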

It all depends on what kinds of sets you're going to create,
and what operations you need to do.

Why Wide_Character rather than Wide_Wide_Character?

- Bob



* Re: Implementing character sets for Wide_Character
From: G.B. @ 2015-03-06 18:18 UTC (permalink / raw)


On 06.03.15 19:01, Martin Trenkmann wrote:
> I know that using an array will consume more memory, but with the Pack aspect it should only be 8 KB - please correct me if I am wrong. The function approach is more memory friendly, but might be a bit slower than an array lookup.

The function's switch may well be translated into a table,
so memory consumption may be the same, albeit in a different
section of the executable. It might be even more then, since
the packing does not occur. If so, it may be possible to
prevent this by using IF.

As with the translation of the switch, the speed of lookup
can only be established by experiment, I think. Measuring has
usually been the only reliable predictor for a given implementation.



* Re: Implementing character sets for Wide_Character
From: Dmitry A. Kazakov @ 2015-03-06 19:24 UTC (permalink / raw)


On Fri, 06 Mar 2015 19:01:33 +0100, Martin Trenkmann wrote:

> I need to implement a containment check for Wide_Character in a wide
> character set. I have two approaches in mind.

The key question is which operations you are going to implement on the set,
e.g. set complement.
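
To illustrate with complement: both a packed Boolean array and Ada.Strings.Wide_Maps.Wide_Character_Set provide "not", whereas a membership function has no set value to complement. A minimal sketch; the names Complement_Demo, Bit_Set, In_Array, In_Ranges and Probe are made up for illustration:

```ada
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Text_IO;

procedure Complement_Demo is
   --  Named Bit_Set to avoid clashing with Wide_Maps.Wide_Character_Set.
   type Bit_Set is array (Wide_Character) of Boolean with Pack;

   In_Array : constant Bit_Set :=
     (Wide_Character'Val (100) .. Wide_Character'Val (900) => True,
      others                                               => False);
   --  "not" is predefined for one-dimensional Boolean arrays.
   Out_Array : constant Bit_Set := not In_Array;

   In_Ranges : constant Wide_Character_Set :=
     To_Set (Wide_Character_Range'(Low  => Wide_Character'Val (100),
                                   High => Wide_Character'Val (900)));
   --  "not" for Wide_Character_Set is declared in Ada.Strings.Wide_Maps.
   Out_Ranges : constant Wide_Character_Set := not In_Ranges;

   Probe : constant Wide_Character := Wide_Character'Val (50);
begin
   --  50 is outside 100 .. 900, so it belongs to both complements.
   Ada.Text_IO.Put_Line (Boolean'Image (Out_Array (Probe)));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_In (Probe, Out_Ranges)));
end Complement_Demo;
```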

> Should I definitely avoid one of the solutions or is it just a matter of
> available memory?

As Robert asked, why Wide_Character? It is pretty much useless for any
purpose. None of the conventional OSes uses Wide_Character (UCS-2). E.g. the
Windows API W-variants use UTF-16, not UCS-2.

If you need Unicode, use plain String holding UTF-8. An implementation of
Unicode character sets and maps (Ada 95) is here:

http://www.dmitry-kazakov.de/ada/strings_edit.htm#7.9

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Implementing character sets for Wide_Character
From: Martin Trenkmann @ 2015-03-06 21:02 UTC (permalink / raw)


>> I need to implement a containment check for Wide_Character in a wide
>> character set. I have two approaches in mind.
> 
> If you really care about speed, then implement both and measure the
> speed.  It's not clear that the bit map will be faster -- using more
> memory harms cache behavior.

Yes that's true.

> Also take a look at package Ada.Strings.Wide_Maps.
> It contains type Wide_Character_Set, represented as a sorted
> sequence of ranges.  Measure that one, too.

Don't know why I overlooked that package. I don't intend to reinvent the wheel - thanks for the hint.

> Why Wide_Character rather than Wide_Wide_Character?

Because the rather old character set I have to deal with assigns Japanese characters to 16-bit values.

- Martin


* Re: Implementing character sets for Wide_Character
From: Martin Trenkmann @ 2015-03-06 21:06 UTC (permalink / raw)


>> I know that using an array will consume more memory, but with the Pack aspect it should only be 8 KB - please correct me if I am wrong. The function approach is more memory friendly, but might be a bit slower than an array lookup.
> 
> The function's switch may well be translated into a table,
> so memory consumption may be the same, albeit in a different
> section of the executable. It might be even more then, since
> the packing does not occur. If so, it may be possible to
> prevent this by using IF.
> 
> As with the translation of the switch, the speed of lookup
> can only be established by experiment, I think. Measuring has
> usually been the only reliable predictor for a given implementation.

Thanks for pointing out that both methods may consume the same amount of memory. So in the end it's up to the compiler and caching effects which version runs faster. In the meantime, however, I was made aware of package Ada.Strings.Wide_Maps, which I will use now.

- Martin




* Re: Implementing character sets for Wide_Character
From: Jeffrey Carter @ 2015-03-06 21:48 UTC (permalink / raw)


On 03/06/2015 11:01 AM, Martin Trenkmann wrote:
> I need to implement a containment check for Wide_Character in a wide character set. I have two approaches in mind.

3.

function Is_In_XYZ_Charset_Set (Item : Wide_Character) return Boolean is
begin -- Is_In_XYZ_Charset_Set
   return Item in Wide_Character'Val (100) .. Wide_Character'Val (900);
end Is_In_XYZ_Charset_Set;

-- 
Jeff Carter
"You empty-headed animal-food-trough wiper."
Monty Python & the Holy Grail
04


* Re: Implementing character sets for Wide_Character
From: Bob Duff @ 2015-03-06 22:21 UTC (permalink / raw)


Martin Trenkmann <martin.trenkmann@posteo.de> writes:

> Thanks for pointing out that both methods may consume the same amount of
> memory. So in the end it's up to the compiler and caching effects which version
> runs faster.

Compilers can do all sorts of things, which is one reason why it's wise
to measure.
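
A rough measuring harness could look like the sketch below. It assumes Ada 2012 (expression function, aspects); the names Measure_Lookup, Bit_Set, Table, In_Range and the repeat count are made up for illustration. Scanning all codes in order is not a realistic access pattern, so real input data would give a more trustworthy answer.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;   use Ada.Text_IO;

procedure Measure_Lookup is
   type Bit_Set is array (Wide_Character) of Boolean with Pack;

   Table : constant Bit_Set :=
     (Wide_Character'Val (100) .. Wide_Character'Val (900) => True,
      others                                               => False);

   function In_Range (Item : Wide_Character) return Boolean is
     (Item in Wide_Character'Val (100) .. Wide_Character'Val (900));

   Rounds : constant := 1_000;  --  arbitrary repeat count
   Hits   : Natural;
   Start  : Time;
begin
   --  Variant 1: packed array lookup.
   Hits  := 0;
   Start := Clock;
   for Round in 1 .. Rounds loop
      for C in Wide_Character loop
         if Table (C) then
            Hits := Hits + 1;
         end if;
      end loop;
   end loop;
   --  Printing Hits keeps the result in use, so the loop is less likely
   --  to be optimized away entirely.
   Put_Line ("array   :" & Duration'Image (To_Duration (Clock - Start))
             & " s, hits" & Natural'Image (Hits));

   --  Variant 2: range test in a function.
   Hits  := 0;
   Start := Clock;
   for Round in 1 .. Rounds loop
      for C in Wide_Character loop
         if In_Range (C) then
            Hits := Hits + 1;
         end if;
      end loop;
   end loop;
   Put_Line ("function:" & Duration'Image (To_Duration (Clock - Start))
             & " s, hits" & Natural'Image (Hits));
end Measure_Lookup;
```

Both variants should report the same hit count; only the timings are of interest, and they will vary with compiler, optimization level and hardware.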

But I doubt any Ada compiler would generate a jump table for the case
statement you showed in your OP.  That would be a 2**16-entry table with
almost all of the entries duplicated.  I'd expect something more like a
subtract, a compare, and a conditional jump.  And the function should be
inlined.

> ...In the meantime, however, I was made aware of package
> Ada.Strings.Wide_Maps, which I will use now.

And keep using it, unless you discover efficiency problems.  ;-)

- Bob

