Equivalent keys/elements in Ada.Containers Maps and Sets

comp.lang.ada
 help / color / mirror / Atom feed

From: Niklas Holsti <niklas.holsti@nospam.please>
Subject: Equivalent keys/elements in Ada.Containers Maps and Sets
Date: Tue, 23 Jan 2007 14:56:16 +0200
Date: 2007-01-23T14:56:16+02:00	[thread overview]
Message-ID: <45b60602$0$24602$39db0f71@news.song.fi> (raw)

To learn about Ada.Containers in Ada 2005 I have been reading the LRM 
section A.18 and John Barnes' Rationale. Some details about Maps and 
Sets are puzzling me enough to make me consider submitting a couple of 
comments to ada-auth. As I'm a novice in Ada.Containers I thought I 
would first ask here, on comp.lang.ada, in case I am misunderstanding 
things or raking up issues that have been discussed before. I looked 
through the Containers-related Ada Issues for Ada 2005 on ada-auth.org 
but found none that seemed relevant.

The points I want to ask about are so technical that I felt it necessary 
to make this posting rather long and detailed.

-- Background

The generic container packages for Maps and Sets deal with a concept 
called "equivalent keys" or "equivalent elements". This equivalence is 
called an "equivalence relation" in RM A.18.4(5/2) for Maps and RM 
A.18.7(5/2) for Sets. An equivalence relation is defined as a reflexive, 
symmetric and transitive relation: any key or element X is equivalent to 
itself; if X is equivalent to Y then Y is equivalent to X; and if X and 
Y are equivalent and Y and Z are equivalent then X and Z are equivalent.

I think I understand the idea, which is that two different values of the 
formal types Key_Type (for Maps) or Element_Type (for Sets) may still be 
similar enough (that is, "equivalent") to count as the same key or the 
same element when considered as (unique) keys to a map or as (unique) 
elements of a set. This is analogous to user-defined relational 
operators that only inspect meaningful components of their parameters.

-- First problem: "equivalence" in Ordered Maps and Sets

For hashed Maps and Sets, the actual "equivalence" is directly defined 
at instantiation time by the actual functions associated to the formal 
functions Equivalent_Keys (for Hashed_Maps) and Equivalent_Elements (for 
Hashed_Sets). The actual functions are required to be equivalence 
relations in A.18.5(44/2) and A.18.8(66/2). I have no problem with that.

For the Ordered_Maps and Ordered_Sets, however, the "equivalence" is 
defined *indirectly* by the actual ordering functions associated to the 
formal functions "<" on Keys or Elements, respectively, at instantiation 
time. Two keys/elements X and Y are defined to be "equivalent" if they 
are incomparable, that is, if both X < Y and Y < X are False. This is in 
A.18.6(55/2) and A.18.9(78/2).

The first problem I see is that the "equivalence" relation defined in 
this way from "<" is *not* necessarily an equivalence relation because 
it may not be transitive. It is obviously symmetric because "and" is 
commutative. It is reflexive because the actual for "<" is required to 
be a strict ordering and thus irreflexive. However, one can easily make 
an example of a strict ordering "<" in which the "incomparable" relation 
is not transitive; say we have three values X, Y, Z where the only 
relation is X < Z; thus X and Y are incomparable and Y and Z are 
incomparable, but X and Z are not incomparable.

So what? What if this "equivalence" is not transitive and is thus not 
really an equivalence (without quotes)? Well, firstly I think it is 
quite confusing to call something an "equivalence" when it is not 
transitive, but secondly the semantics of the Map and Set operations are 
defined in terms of "equivalent" keys or elements, and this can lead to 
surprising results if the "equivalence" relation is not transitive.

For example, by A.18.7(22/2) the function Equivalent_Sets reports two 
sets S1 and S2 as equivalent if they have the same number of elements 
and if for every element E1 of S1 there is an element E2 of S2 such that 
E1 and E2 are "equivalent". (Note the asymmetry of this definition: it 
"traverses" S1 and "searches" for equivalent elements in S2.)

Now, if "equivalence" of elements is not transitive, we can have two 
sets S1 and S2 such that Equivalent_Sets (S1, S2) is True even if S2 
contains an element that is not "equivalent" to any element in S1.

This strange result is possible because two different elements of S1 can 
be "equivalent" to the *same* element of S2, leaving one or more 
elements of S2 unpaired with any "equivalent" elements of S1.

A consequence is that Equivalent_Sets is not necessarily symmetric; it 
can return True for parameters (S1, S2) but False for (S2, S1).

If "equivalence" of elements is transitive, this strange result cannot 
happen. If two different elements E1 and E1' of S1 are "equivalent" to 
the same element E2 of S2, then by transitivity E1 and E1' are 
"equivalent" to each other, which is impossible because no Set can 
contain two or more "equivalent" elements, by A.18.6(5/2).

I haven't looked for similar strange results from the other operations 
because I think that the case of Equivalent_Sets is strange enough.

There is also a problem of definition. The definition of the "=" 
operator for Maps in A.18.4(22/2, 23/2) implicitly assumes that each key 
in the Left map is equivalent to at most one key in the Right map. This 
assumption is false if "equivalence" of keys is not transitive; then 
there may be many "equivalent" keys in the Right map that could be 
matched against the same key in the Left map, so the definition of "=" 
for Maps is ambiguous.

To avoid this problem I think that the RM paragraphs that specify the 
properties required or expected of the actual "<" functions should be 
extended to say that the "incomparable" relation must be transitive. 
This is A.18.6(56/2) and A.18.9(79/2).

-- Second problem: "=" for Element_Type in Sets

All generic packages for Maps and Sets take a generic formal "=" 
operator for the Element_Type. According to the Annotated LRM it is 
meant to be used only in the "=" operator for Maps and Sets. The actual 
"=" function is required to be deterministic, symmetric and reflexive. 
This is all understandable and natural.

For Sets, however, I find it puzzling that no relationship is required 
or expected between this "=" and the other formal functions 
Equivalent_Elements, Hash and "<". For example, in the case of two 
elements E1 and E2 in an Ordered_Set it would be very natural to require 
that if E1 "=" E2 then we cannot also have E1 "<" E2 or E2 "<" E1, in 
other words we should have E1 "equivalent" to E2.

One point where this may be significant is the definition of the "=" 
operator on Sets in A.18.7(20/2). The definition requires each element 
of the Left set to be compared for "=" with the elements of the Right 
set. If the actual "=" function is unrelated to the Hash function or to 
the "<" operator for elements, I don't see how the "=" operator on Sets 
can be implemented in less than O(N**2) complexity.

Moreover, the "=" operator for Sets as now defined can report two sets 
S1 and S2 as "=" although S2 contains elements that are not "=" to any 
elements of S1. This is because the definition is asymmetric (it 
"traverses" S1 and "searches" S2), so two different elements of S1 can 
be "=" to the same element of S2, leaving some elements of S2 unmatched 
with elements of S1.

I think the definition of "=" for Sets expects the implementation to 
compare only "equivalent" elements of the Left and Right Sets. If 
"equivalence" is transitive this means that each element of Left can be 
compared with "=" to exactly one element of Right (or, for Hashed_Sets, 
to the elements in one bucket, if the implementation prefers to use "=" 
directly and not first check Equivalent_Elements).

I know that there are no RM requirements or advice on the complexity of 
the "=" operator for Sets, but under the current rules I think that the 
natural, fast implementation that only compares the equivalent elements 
is wrong.

To solve this problem I think that A.18.6(3/2) should be extended to 
also require or expect that two "=" elements are "equivalent".

For Hashed_Sets this means that if E1 "=" E2 then 
Equivalent_Elements(E1,E2) must be True. A consequence is that Hash(E1) 
= Hash(E2).

For Ordered_Sets this means that if E1 "=" E2 then both E1 "<" E2 and
E2 "<" E1 must be False.

-- Last words

Should I submit these comments to ada-comment@ada-auth.org? Or were the 
issues already considered in the design of Ada.Containers? I would be 
grateful for your advice.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

next             reply	other threads:[~2007-01-23 12:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-23 12:56 Niklas Holsti [this message]
2007-01-23 15:45 ` Equivalent keys/elements in Ada.Containers Maps and Sets Matthew Heaney
2007-01-23 16:47   ` Matthew Heaney
2007-01-23 23:04   ` Niklas Holsti
2007-01-24  0:43     ` Randy Brukardt
2007-01-24  7:52       ` Niklas Holsti
2007-01-24 21:06         ` Randy Brukardt
2007-01-24 16:50       ` Matthew Heaney
2007-01-24 21:18         ` Randy Brukardt
2007-01-24 21:27           ` Adam Beneschan
2007-01-24 22:50             ` Matthew Heaney
2007-01-25  2:22               ` Adam Beneschan
2007-01-26  0:08                 ` Georg Bauhaus
2007-01-24 20:56       ` Adam Beneschan
2007-01-24 20:57         ` Adam Beneschan
2007-01-24 22:56         ` Matthew Heaney
2007-01-25  0:44           ` Randy Brukardt
2007-01-24 16:11     ` Matthew Heaney
2007-01-24 18:12       ` Niklas Holsti

replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox