comp.lang.ada
* Does OpenToken support Unicode
@ 2011-12-15 14:09 mtrenkmann
  2011-12-15 15:16 ` Dmitry A. Kazakov
  2011-12-17  0:58 ` Stephen Leake
  0 siblings, 2 replies; 7+ messages in thread
From: mtrenkmann @ 2011-12-15 14:09 UTC (permalink / raw)


Hey guys,

I am a student at Bauhaus University Weimar (Germany), currently
writing my Master's thesis, in which I implement an ASN.1 to Ada
compiler and runtime codec. For the parsing part I am using the
OpenToken library. Since some aspects of ASN.1 deal with Unicode, I
would like to ask whether there is any built-in support for that, or
whether it could be added by the user in some way.

For example, is it somehow possible for

procedure OpenToken.Recognizer.Analyze
   (The_Token : in out Instance;
    Next_Char : in     Character;
    Verdict   :    out Analysis_Verdict) is abstract;

to support Wide_Wide_Character for the Next_Char parameter?

Thanks in advance for any advice.

Martin




* Re: Does OpenToken support Unicode
  2011-12-15 14:09 Does OpenToken support Unicode mtrenkmann
@ 2011-12-15 15:16 ` Dmitry A. Kazakov
  2011-12-17  0:58 ` Stephen Leake
  1 sibling, 0 replies; 7+ messages in thread
From: Dmitry A. Kazakov @ 2011-12-15 15:16 UTC (permalink / raw)


On Thu, 15 Dec 2011 06:09:13 -0800 (PST), mtrenkmann wrote:

> I am a student at Bauhaus University Weimar (Germany), currently
> writing my Master's thesis, in which I implement an ASN.1 to Ada
> compiler and runtime codec. For the parsing part I am using the
> OpenToken library. Since some aspects of ASN.1 deal with Unicode, I
> would like to ask whether there is any built-in support for that, or
> whether it could be added by the user in some way.
>
> For example, is it somehow possible for
>
> procedure OpenToken.Recognizer.Analyze
>    (The_Token : in out Instance;
>     Next_Char : in     Character;
>     Verdict   :    out Analysis_Verdict) is abstract;
>
> to support Wide_Wide_Character for the Next_Char parameter?
> 
> Thanks in advance for any advice.

I don't use OpenToken for my parsing projects, but normally the encoding
should play almost no role, whatever tool you are using. Recode the input
into UTF-8, if it is not already UTF-8, and process it as if it were
ordinary character strings.
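
A minimal sketch of why this tends to work for delimiter-driven scanning:
every byte of a multi-byte UTF-8 sequence has its high bit set, so a scanner
looking for ASCII delimiters never mistakes one for a delimiter. The
procedure name and the sample text are made up for illustration, and the
non-ASCII bytes are spelled out so the source stays plain ASCII.

```ada
with Ada.Text_IO;

procedure UTF8_As_Bytes is
   --  A German greeting containing u-umlaut (bytes C3 BC) and sharp s
   --  (bytes C3 9F), followed by ", Welt".  Sample data, not from the thread.
   Input : constant String :=
     "Gr" & Character'Val (16#C3#) & Character'Val (16#BC#)
          & Character'Val (16#C3#) & Character'Val (16#9F#) & "e, Welt";

   Comma : Natural := 0;
begin
   --  Scan byte by byte for an ASCII delimiter.  Bytes belonging to a
   --  multi-byte UTF-8 sequence are all >= 16#80#, so none equals ','.
   for I in Input'Range loop
      if Input (I) = ',' then
         Comma := I;
         exit;
      end if;
   end loop;

   --  2 ASCII bytes + 4 UTF-8 bytes + 'e' puts the comma at byte 8.
   pragma Assert (Comma = 8);

   --  The slice is still valid UTF-8 and can be passed on unchanged.
   Ada.Text_IO.Put_Line (Input (Input'First .. Comma - 1));
end UTF8_As_Bytes;
```

The limitation is that anything classifying individual non-ASCII characters
(letters, digits, case folding) still has to decode the byte sequences first.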

P.S. My condolences regarding ASN.1. We had a couple of parsers/protocols
implemented for which the documentation was in ASN.1. I still remember how
dreadful it was.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Does OpenToken support Unicode
  2011-12-15 14:09 Does OpenToken support Unicode mtrenkmann
  2011-12-15 15:16 ` Dmitry A. Kazakov
@ 2011-12-17  0:58 ` Stephen Leake
  2012-01-23 22:03   ` mtrenkmann
  2012-01-23 22:48   ` mtrenkmann
  1 sibling, 2 replies; 7+ messages in thread
From: Stephen Leake @ 2011-12-17  0:58 UTC (permalink / raw)


mtrenkmann <martin.trenkmann@googlemail.com> writes:

> For example, is it somehow possible for
>
> procedure OpenToken.Recognizer.Analyze
>    (The_Token : in out Instance;
>     Next_Char : in     Character;
>     Verdict   :    out Analysis_Verdict) is abstract;
>
> to support Wide_Wide_Character for the Next_Char parameter?

Copy the package, add _Wide_Wide to the name, change all occurrences of
Character to Wide_Wide_Character, and compile. It will probably fail, so
iterate until you've modified all the necessary packages.

If you are ambitious, instead of copy and edit, change them into
generics on the character type.
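
A rough sketch of what "generic on the character type" could look like for
a single recognizer. This is not OpenToken's code: Generic_Keyword, Instance,
Verdict and the keyword-matching logic are all hypothetical stand-ins, kept
small enough to compile as one unit.

```ada
with Ada.Text_IO;

procedure Generic_Recognizer_Demo is

   --  Hypothetical miniature recognizer, generic on the character type.
   generic
      type Char_Type is (<>);   --  any discrete type, e.g. Character
      type String_Type is array (Positive range <>) of Char_Type;
   package Generic_Keyword is
      type Verdict is (Matches, So_Far_So_Good, Failed);

      type Instance (Length : Natural) is record
         Keyword : String_Type (1 .. Length);
         Next    : Positive := 1;   --  index of the next expected character
      end record;

      procedure Analyze
        (The_Token : in out Instance;
         Next_Char : in     Char_Type;
         Result    :    out Verdict);
   end Generic_Keyword;

   package body Generic_Keyword is
      procedure Analyze
        (The_Token : in out Instance;
         Next_Char : in     Char_Type;
         Result    :    out Verdict) is
      begin
         if The_Token.Next > The_Token.Length
           or else The_Token.Keyword (The_Token.Next) /= Next_Char
         then
            Result := Failed;
         elsif The_Token.Next = The_Token.Length then
            The_Token.Next := The_Token.Next + 1;
            Result := Matches;
         else
            The_Token.Next := The_Token.Next + 1;
            Result := So_Far_So_Good;
         end if;
      end Analyze;
   end Generic_Keyword;

   --  The same source now serves Wide_Wide_Character without copy and edit.
   package Wide_Wide_Keyword is new Generic_Keyword
     (Char_Type => Wide_Wide_Character, String_Type => Wide_Wide_String);
   use Wide_Wide_Keyword;

   Tok : Instance := (Length => 2, Keyword => "ok", Next => 1);
   V   : Verdict;
begin
   Analyze (Tok, 'o', V);
   pragma Assert (V = So_Far_So_Good);
   Analyze (Tok, 'k', V);
   pragma Assert (V = Matches);
   Ada.Text_IO.Put_Line (Verdict'Image (V));
end Generic_Recognizer_Demo;
```

Instantiating the same package with Character and String would give the
narrow version, which is the main attraction over maintaining two copies.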

-- 
-- Stephe




* Re: Does OpenToken support Unicode
  2011-12-17  0:58 ` Stephen Leake
@ 2012-01-23 22:03   ` mtrenkmann
  2012-01-23 22:48   ` mtrenkmann
  1 sibling, 0 replies; 7+ messages in thread
From: mtrenkmann @ 2012-01-23 22:03 UTC (permalink / raw)


Just to close this thread, here is what I have done.

Beginning at the Text_Feeder level, I changed all occurrences of
Character/String variables that are involved in storing parsing data
(buffers, lexemes, etc.) to the Wide_Wide_Character/Wide_Wide_String
types.

Then I provided a derivation of Text_Feeder that reads UTF-8
(multibyte) characters from Ada.Text_IO and decodes them into
Wide_Wide_Characters. The decoding is currently based on
System.WCh_Con (GNAT).

As mentioned by Stephe I also tried to implement a generic solution
regarding the character type, but that wasn't completely possible. For
instance in the top-level OpenToken package there are constants for
EOL and EOF that are of type Character. Text_Feeder.Text_IO uses
Ada.Text_IO.Get_Line which is not generic. Furthermore, as far as I
know, Ada exceptions cannot carry Wide_Wide_Strings to report the
lexemes of unexpected/unrecognized tokens ...

To support constants and non-generic Ada procedures, one has to turn
them into formal parameters of generic OpenToken packages, right?
Maybe this could end in a generic instantiation nightmare. This
brings me to the question of why some packages in Ada are prefixed
with Wide_Wide_ rather than being generic. (Sorry for this question,
but I come from the C++ universe.)

Ok, thanks again for your previous hints. If there is any interest I
will provide the modified OpenToken code with UTF-8 support after
finishing my thesis.

-- Martin




* Re: Does OpenToken support Unicode
  2011-12-17  0:58 ` Stephen Leake
  2012-01-23 22:03   ` mtrenkmann
@ 2012-01-23 22:48   ` mtrenkmann
  2012-01-24 10:40     ` Georg Bauhaus
  2012-01-24 13:47     ` Stephen Leake
  1 sibling, 2 replies; 7+ messages in thread
From: mtrenkmann @ 2012-01-23 22:48 UTC (permalink / raw)


Just to close this thread, here is what I have done.

Beginning at the Text_Feeder level, I changed all occurrences of
Character/String variables that are involved in storing parsing data
(buffers, lexemes, etc.) to the Wide_Wide_Character/Wide_Wide_String
types.

Then I provided a derivation of Text_Feeder that reads UTF-8
(multibyte) characters from Ada.Text_IO and decodes them into
Wide_Wide_Characters. The decoding is currently based on
System.WCh_Con (GNAT).
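
For comparison, a compiler-independent way to do the decoding step is
sketched below. It does not use System.WCh_Con; it swaps in the standard
Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings, and it assumes
a whole line has already been read into a String. The procedure name and
the sample bytes are invented for the example.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Decode_Line_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  One line as it might arrive from Ada.Text_IO.Get_Line:
   --  'a', then U+20AC (EURO SIGN) as the three UTF-8 bytes E2 82 AC.
   Raw : constant String :=
     "a" & Character'Val (16#E2#) & Character'Val (16#82#)
         & Character'Val (16#AC#);

   --  Decode raises Ada.Strings.UTF_Encoding.Encoding_Error on bad input.
   Decoded : constant Wide_Wide_String := UTF.Decode (Raw);
begin
   --  Four bytes in, two characters out.
   pragma Assert (Decoded'Length = 2);
   pragma Assert
     (Wide_Wide_Character'Pos (Decoded (Decoded'Last)) = 16#20AC#);
   Ada.Text_IO.Put_Line (Integer'Image (Decoded'Length));
end Decode_Line_Demo;
```

Decoding line by line like this only works if a line is never split in the
middle of a multi-byte sequence, which holds when the buffer is large enough
for the whole line.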

As mentioned by Stephe I also tried to implement a generic solution
regarding the character type, but that wasn't completely possible. For
instance in the top-level OpenToken package there are constants for
EOL and EOF that are of type Character. Text_Feeder.Text_IO uses
Ada.Text_IO.Get_Line which is not generic. Furthermore, as far as I
know, Ada exceptions cannot carry Wide_Wide_Strings to report the
lexemes of unexpected tokens ...

To support constants and non-generic Ada procedures, one has to turn
them into formal parameters of generic OpenToken packages, right?
Maybe this could end in a generic instantiation nightmare. This
brings me to the question of why some packages in Ada are prefixed
with Wide_Wide_ rather than being generic. (Sorry for this question,
but I come from the C++ universe.)

Ok, thanks again for your previous hints. If there is any interest I
will provide the modified OpenToken code with UTF-8 support after
finishing my thesis.

-- Martin




* Re: Does OpenToken support Unicode
  2012-01-23 22:48   ` mtrenkmann
@ 2012-01-24 10:40     ` Georg Bauhaus
  2012-01-24 13:47     ` Stephen Leake
  1 sibling, 0 replies; 7+ messages in thread
From: Georg Bauhaus @ 2012-01-24 10:40 UTC (permalink / raw)


On 23.01.12 23:48, mtrenkmann wrote:

> To support constants and non-generic Ada procedures, one has to turn
> them into formal parameters of generic OpenToken packages, right?
> Maybe this could end in a generic instantiation nightmare. This
> brings me to the question of why some packages in Ada are prefixed
> with Wide_Wide_ rather than being generic. (Sorry for this question,
> but I come from the C++ universe.)

The question (not the first time someone asks) seems quite justified
by Ada, since it does have such generic standard packages, that is,
packages that feature a formal of a scalar type, such as
Generic_Elementary_Functions, or the traditional I/O packages.

One answer is backward compatibility, IIRC.
Guessing, it might not have felt nice to have both Character
packages and generic character packages; using a 1-1 onto
correspondence to replace packages with instances of packages
was perhaps still considered a cause of forced modification of
working systems...

For today, and new systems, maybe using streams might lead
to a solution that can be adapted to some abstract character type.

There is some overlap between containers' subprograms and string
subprograms, which might be another path.




* Re: Does OpenToken support Unicode
  2012-01-23 22:48   ` mtrenkmann
  2012-01-24 10:40     ` Georg Bauhaus
@ 2012-01-24 13:47     ` Stephen Leake
  1 sibling, 0 replies; 7+ messages in thread
From: Stephen Leake @ 2012-01-24 13:47 UTC (permalink / raw)


mtrenkmann <martin.trenkmann@googlemail.com> writes:

> Just to close this thread, here is what I have done.

Thanks for the update.

> Beginning at the Text_Feeder level, I changed all occurrences of
> Character/String variables that are involved in storing parsing data
> (buffers, lexemes, etc.) to the Wide_Wide_Character/Wide_Wide_String
> types.
>
> Then I provided a derivation of Text_Feeder that reads UTF-8
> (multibyte) characters from Ada.Text_IO and decodes them into
> Wide_Wide_Characters. The decoding is currently based on
> System.WCh_Con (GNAT).
>
> As mentioned by Stephe I also tried to implement a generic solution
> regarding the character type, but that wasn't completely possible. For
> instance in the top-level OpenToken package there are constants for
> EOL and EOF that are of type Character. 

Yes, that's an annoying hack. You could try moving them down lower.

> Text_Feeder.Text_IO uses Ada.Text_IO.Get_Line which is not generic.

You'd have to write a generic wrapper for Ada.Text_IO. That might be
useful in other contexts, but it is a lot of work.

> Furthermore, as far as I know, Ada exceptions cannot carry
> Wide_Wide_Strings to report the lexemes of unexpected tokens ...

True, but they can carry UTF-8.
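
A small sketch of that idea, assuming the Ada 2012 package
Ada.Strings.UTF_Encoding.Wide_Wide_Strings is available: the lexeme is
encoded to UTF-8 when the exception is raised and decoded again in the
handler. Syntax_Error and the sample lexeme are hypothetical names, not
OpenToken's.

```ada
with Ada.Exceptions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Exception_UTF8_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   Syntax_Error : exception;   --  hypothetical; stands in for a parser error

   --  Offending lexeme: U+03A9 (GREEK CAPITAL LETTER OMEGA) followed by 'x'.
   Lexeme : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#3A9#), 2 => 'x');
begin
   --  Exception messages are plain Strings, so carry the lexeme as UTF-8.
   raise Syntax_Error with UTF.Encode (Lexeme);
exception
   when E : Syntax_Error =>
      declare
         Round_Trip : constant Wide_Wide_String :=
           UTF.Decode (Ada.Exceptions.Exception_Message (E));
      begin
         pragma Assert (Round_Trip = Lexeme);
         Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Message (E));
      end;
end Exception_UTF8_Demo;
```

One caveat: an implementation is allowed to truncate exception messages
(to no fewer than 200 characters), so a very long lexeme could be cut in the
middle of a multi-byte sequence and then fail to decode.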

> To support constants and non-generic Ada procedures one has to turn
> them into formal parameters of generic OpenToken packages, right?

Right.
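
To make the shape of that concrete, here is a hedged sketch in which the
end-of-line constant becomes a generic formal object and the line reader
becomes a generic formal subprogram. Generic_Feeder, Get and Canned_Get_Line
are invented for illustration and do not mirror OpenToken's Text_Feeder
interface; the wide instance reads from a canned source so the example runs
without any input.

```ada
with Ada.Text_IO;

procedure Generic_Feeder_Demo is

   generic
      type Char_Type is (<>);
      type String_Type is array (Positive range <>) of Char_Type;
      EOL : in Char_Type;                  --  formal object replaces a constant
      with procedure Get_Line              --  formal subprogram replaces the
        (Item : out String_Type;           --  hard-wired Ada.Text_IO.Get_Line
         Last : out Natural);
   package Generic_Feeder is
      --  Read one line into Buffer and terminate it with EOL.
      --  Buffer must have room for at least the EOL marker.
      procedure Get (Buffer : out String_Type; Used : out Natural);
   end Generic_Feeder;

   package body Generic_Feeder is
      procedure Get (Buffer : out String_Type; Used : out Natural) is
         Last : Natural;
      begin
         Get_Line (Buffer (Buffer'First .. Buffer'Last - 1), Last);
         Used := Last + 1;
         Buffer (Used) := EOL;
      end Get;
   end Generic_Feeder;

   --  Canned line source so the demo needs no input file.
   procedure Canned_Get_Line
     (Item : out Wide_Wide_String; Last : out Natural) is
   begin
      Item (Item'First .. Item'First + 1) := "hi";
      Last := Item'First + 1;
   end Canned_Get_Line;

   --  The standard routine plugs in directly for the narrow case
   --  (instantiated here only to show that the profile matches).
   package Narrow_Feeder is new Generic_Feeder
     (Char_Type   => Character,
      String_Type => String,
      EOL         => Character'Val (10),
      Get_Line    => Ada.Text_IO.Get_Line);

   package Wide_Feeder is new Generic_Feeder
     (Char_Type   => Wide_Wide_Character,
      String_Type => Wide_Wide_String,
      EOL         => Wide_Wide_Character'Val (10),
      Get_Line    => Canned_Get_Line);

   Buffer : Wide_Wide_String (1 .. 8);
   Used   : Natural;
begin
   Wide_Feeder.Get (Buffer, Used);
   pragma Assert (Used = 3);
   pragma Assert (Buffer (3) = Wide_Wide_Character'Val (10));
   Ada.Text_IO.Put_Line (Natural'Image (Used));
end Generic_Feeder_Demo;
```

Every constant and every non-generic routine adds one more formal
parameter, which is where the instantiation lists start to grow.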

> Maybe this could end in a generic instantiation nightmare.

Well, complicated anyway :).

> This brings me to the question of why some packages in Ada are
> prefixed with Wide_Wide_ rather than being generic. (Sorry for this
> question, but I come from the C++ universe.)

Good point. For example, Ada.Numerics.Generic_Elementary_Functions is
generic, and instantiations are provided for the various float types.
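
As a brief illustration of that pattern, the sketch below instantiates the
standard generic for a user-defined float type. The type Metres and the
package name Metre_Functions are made up for the example.

```ada
with Ada.Numerics.Generic_Elementary_Functions;
with Ada.Text_IO;

procedure Elementary_Demo is
   type Metres is digits 6;   --  hypothetical user-defined float type

   --  One generic, instantiated for whichever float type is needed.
   package Metre_Functions is
     new Ada.Numerics.Generic_Elementary_Functions (Metres);

   Root : constant Metres := Metre_Functions.Sqrt (16.0);
begin
   pragma Assert (abs (Root - 4.0) < 0.001);
   Ada.Text_IO.Put_Line (Metres'Image (Root));
end Elementary_Demo;
```

The predefined Ada.Numerics.Elementary_Functions is simply this generic
instantiated for Float, which is the arrangement the string packages lack.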

There may be a problem with the functions that convert to other string
types, but those could be moved to child packages.

-- 
-- Stephe



