From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,99222a5bd46ef3c9,start
X-Google-Attributes: gid103376,public
From: smize@news.imagin.net (Samuel Mize)
Subject: GOTO considered necessary (reworked)
Date: 1997/06/11
Message-ID: <5nn2fm$11dk$1@prime.imagin.net>
X-Deja-AN: 247729316
Organization: ImagiNet Communications Ltd, Arlington, Texas
Reply-To: smize@imagin.net (Samuel Mize)
Newsgroups: comp.lang.ada
Date: 1997-06-11T00:00:00+00:00
List-Id: <comp.lang.ada>


Greetings,

[-------------------------------------------------------------------]
[ I'm re-doing this since a lot of people were apparently at an Ada ]
[ conference recently, and weren't reading net news.  This version  ]
[ is updated based on previous comments.  -- Mize                   ]
[-------------------------------------------------------------------]

I'm rebeating a dead horse, in order to write up a consensus paper to
submit to adahome.  I'll appreciate your input.

This one concerns use of "goto" statements.

I have scanned through the back c.l.a messages in DejaNews.  I request
your comments on this summary.

Samuel Mize

                        USE OF "GOTO" IN ADA
                        --------------------

1.0 SUMMARY:
------------

a) In general, "goto" statements should be avoided.  They create confusing
   execution flow structures -- at first, or after a few maintenance changes.
   They also reduce opportunities for compiler optimization of code.

b) As a rule of thumb, if your code needs a "goto" to work efficiently
   (or at all), you may have overlooked a simpler, better design.

c) Exceptions to "goto" avoidance:

   c1) "Goto"s can be used to create a well-defined code structure that is
       not supported in the language.  Examples are in Section 2.0.

   c2) "Goto"s are acceptable in machine-generated code, IF a human is
       NEVER expected to read the code.  (For example, a module generated
       by a tool from data-flow diagrams.)

   c3) Finite State Machines (FMSs) are often implemented with "goto"s.
       These are also called Finite State Automatons (FSAs).  This is
       discussed in detail in Section 3.0.  These are used in compiler
       theory, business processes, command and control, ASIC or other
       custom chips.

   c4) Other usages are suspect, but may be OK.  If a "goto" really
       improves clarity, use it -- but document WHY you used it, and
       why other approaches were found lacking.  Look at your design at
       a higher level to see if you can eliminate "goto" usage with a
       clearer design approach -- the advantage here, of course, is not
       the mechanical elimination of the "goto", but the clearer design.
       See section 4.0, and the example in section 5.0.

d) Eliminating a "goto" may prevent maintenance errors in the future (for
   instance, code being put on the wrong side of a label).  However, it also
   may not be worth the effort to redesign without the "goto."  Using an
   extraneous state variable or other kludge is no improvement on a "goto".

e) Norm Cohen recommended Knuth's article in the December 1974 Computing
   Surveys as "the best discussion I've ever seen of the goto issue."
   Other respected Ada developers have concurred.

f) The shared Ada culture avoids "goto" statements.  Gellerich et al examined
   34428 files containing 8.5 million lines of code, coded by many projects,
   institutions, and programmers.  They found a "goto" density of 1 "goto" per
   13614 lines of code.  They excluded some directly-translated Fortran, and
   some code written to validate Ada compilers (which must test "goto").
   ["Where does GOTO go to?", Gellerich Kosiol and Erhard, U of Stuttgart,
   http://www.informatik.uni-stuttgart.de/ifi/ps/Gellerich/publications.html]


2.0 UNSUPPORTED CODE STRUCTURES
-------------------------------
The central feature of structured programming is not some specific set
of code structures, but the fact that well-defined structures are used
to organize the code.  Arbitrary "goto"s are avoided; transfers of
control occur as part of a larger, coherent structure that helps the
reader understand the overall program's activity.

Modern languages like Ada provide a large set of useful structures: "if"
statements (supporting "then", "elsif", and "else" clauses), "case"
statements, "while" and "for" loops, etc.

However, there are some useful structures that are not directly supported.
This will always be true.  Some of these structures are rarely used, so
are not worth the cost to add into the compiler; others may be invented,
or may be domain-specific.

It is generally acceptable to use a "goto" to implement a well-defined
code structure that is not directly implemented in the language.

For instance, the Ada gurus at ACT use "goto" in the GNAT compiler.  On the
other hand, it occurs in only 3 places (in 3.07), and all are instances of
the "not-found" pattern shown below.

Many recommend using "flag" comments to identify jumps that are not part of
a structure.  This includes "return" and "exit" statements, as well as "goto"
statements.  Thus, a loop might look like:

   loop
      ...
      exit when A; -------------------EXIT
      ...
      if B then
         return; ---------------------RETURN
      ...
      if C then
         goto CONTINUE; --------------GOTO
      ...
      <<CONTINUE>>
   end loop;

For example, here are three structures that require "goto" in Ada:

a) Process continuation logic

   A sequential process may have several natural completion points.  One
   example is parsing a string that ends with optional items.  Avoiding
   "goto" in this case requires increasing levels of "if" indentation:

      [process first part of string]
      if [not end-of-line] then
         [process option1]

         if [not end-of-line] then
            [process option2]

            if [not end-of-line] then
               [process option3]

               if [not end-of-line] then
                  [process option4]
                   ...
               endif;
            endif;
         endif;
      endif;

   This is hard to read, and confusing since all the parsing is really at
   the same logical level.  A "completed" flag may also be used, but this
   does not usually improve clarity over the "goto" solution:

      [process first part of string]
      if [end-of-line] then
         goto NO_MORE_OPTIONS;
      end if;

      [process option1]
      if [end-of-line] then
         goto NO_MORE_OPTIONS;
      end if;

      [process option2]
      if [end-of-line] then
         goto NO_MORE_OPTIONS;
      end if;

      [process option3]
      if [end-of-line] then
         goto NO_MORE_OPTIONS;
      end if;
      [process option4]
      ...
      <<NO_MORE_OPTIONS>>

   The same effect can be achieved by putting the processing sequence into a
   block statement and raising an exception on end-of-line.  However, this
   is generally considered even worse than the "goto" solution.  Most
   standards limit exceptions to error conditions, not normal processing.

b) "Continue" loop logic

   The "continue" function of C has to be emulated in Ada with a "goto":

      loop
        ...
        goto <<CONTINUE>>
        ...
        <<CONTINUE>>
      end loop;

   This is a special case of the process continuation structure.

c) "Not-Found" loop logic

   Loops frequently scan through a sequence -- for instance, an array --
   followed by special processing if the item was not found.  An example
   would be inserting an item in an ordered array:

      Items: array (Positive range Lo .. Hi) of Item;
      Last_Item: Natural := Items'First - 1;
      ...
      -- for each item to be inserted

         for I in Items'First .. Last_Item loop
            -- duplicates are allowed, and inserted as separate items
            if Items (I) >= New_Item then
               Items (I + 1 .. Last_Item + 1) := Items (I .. Last_Item);
               Items (I) := New_Item;
               goto ITEM_INSERTED;
            end if;
         end loop;

         -- New_Item's place not found before end of array
         Last_Item := Last_Item + 1;
         Items (Last_Item) := Item;

         <<ITEM_INSERTED>>
         ...

   Avoiding "goto" in this case requires one of these approaches:

    - A "found" boolean.  When element is processed in loop, Found is set
      and loop exits.  No-element processing is skipped when Found is true.

    - An "Index" variable, initially set past the end of the array, and reset
      in the loop if the element is found.  Processing is always done after
      loop exit, based on this index.

    - Last-iteration detection in the loop.

    Each of these is also unclear in its own way.


3.0 FINITE STATE MACHINES (FSMs)
--------------------------------
An FSM is defined by a set of states.  In each state, some processing may
occur; then the abstract "machine" selects a state to transition into (this
may be the same state again).

Some developers find that "goto"s improve FSM clarity, as they more directly
reflect the FSM being modelled.  Other developers prefer a case statement
inside a loop.

The "goto" implementation generally looks like:


    <<State_1>> -- A label like this exists for each state of the machine.
       declare
          [local variables]
       begin
          [processing for state 1]

          if [condition] then        -- A variable holding the "current"
             goto [State_X];         -- state is not needed -- it is
          elsif [condition] then     -- represented by the current place
             goto [State_Y];         -- of executiion.
          else
             goto [State_Z];
          end if;
       end;

       -- Defensive programming: ensures that the machine does
       -- not accidentally "fall through" to the next state.
       raise No_Transition_Occurred; 
                                       
    <<State2>>
       ...

The "loop-and-case" alternative would be:

    type State is (Initial_State, State1, ... Staten);
    Current_State: State;
   
    -- variables that may affect the next transition
    ...

    function Next_State (Current_State: State) return State;
   
    loop
       exit when
         (Current_State = [a terminal state])
         or 
         (Current_State = [another terminal state])
             ...;
   
       -------------------------------------------------------- AAA

       case Current_State is
          when State_1 =>
             [Processing for state 1]
             Current_State := Next_State (Current_State);
          ...
       end case;

       if Debug then
          Text_Io.Put_Line (State'Image (Current_State)); ----- BBB
       end if;

    end loop;

Next_State can use an if/elsif cascade, or an array or hash-table lookup,
or whatever is appropriate for the application-specific transition table.

Different developers will argue for the clarity of either approach.

Aside from clarity, the "goto" design has these advantages:

a) The transition logic is separate from the processing logic for each state.
   This may be good or bad, depending on how distinct these naturally are.
   If they are naturally separate, having transition logic gathered into a
   separate function is clearer.  If they are intertwined, the transition
   function will duplicate much of the logic from the processing.

b) It is more efficient.  Instead of looping and and evaluating a case
   expression, it jumps directly to the next state.

c) It may be easier to locate accidentally "killed" states: a
   cross-reference tool could identify unused statement labels.

Aside from clarity, the "loop-and-case" design has these advantages:

a) The transition logic is separate from the processing logic for each state.

b) There is a place to put processing that must occur at the start or end of
   each state (note lines AAA and BBB).

c) The possible states are listed in an enumeration, usable for I/O or to
   represent a "memory" of previous states, if needed.

Adding or deleting a state is about equally hard for both designs.  It is
harder with the case structure because you have to edit in two locations,
keeping the type definition and the case structure synchronized.  It is
harder with the "goto" structure because the logic that jumps to the new
(or deleted) state is less localized -- you have to hunt through all the
code for the "goto"s.  This seems to be a wash.

The final decision must be made by the developer, according to clarity or
other advantages for the specific application, within the constraints of
the project and the overall design.


4.0 IMPROVING A "GOTO"-BASED DESIGN
---------------------------------
The existence of "goto"s in code creates some purely mechanical problems for
maintenance programmers.  New code may be added on the wrong side of a "goto"
target, and "goto"-based structures are often harder to maintain when
intervening code changes significantly.  This creates a small but definite
advantage to eliminating many "goto" statements.

However, once a "goto"-based design has been coded, it may not be worth the
effort to redesign it to eliminate the "goto."  This is especially the case
if the "goto" is close to its target, and it is unlikely that other code
would be inserted between them.

In many cases, use of a "goto" indicates that a clearer, simpler design may
be available.  This is especially the case if the "goto" is not part of a
clearly-defined design structure.  The advantage to eliminating THESE "goto"
statements is not the mechanical elimination of a "goto", but the clearer
design that results.

Extraneous state variables or other unclear design kludges do not improve on
a "goto".  If there is no overall design improvement that replaces a "goto"
with a cleaner structure, you may as well leave the "goto".  For instance,
the code in section 5.0 would not be improved by mechanically replacing the
"goto" by adding a "Done" boolean or some other kludge -- this would simply
trade one unclear design for another.

The critical point is that a program should be built in discrete sections,
each of which accomplishes a clear task.  These sections should be defined
and separated by control-flow structures that show their relationship.  This
lets you (or a later maintenance programmer) define clearly in what state the
program will be when it reaches a given point, and reason about what the
section of code is supposed to be doing.  Formally defined approaches to this
kind of design are called "structured programming."

For instance, you should be able to say what a loop does each iteration, and
what it has done once it exits.  (If you like semi-mathematical jargon, it
needs an invariant and a termination condition).

So, IF (1) it's worth the effort to review a "goto"-based design, and
(2) the code can be reorganized into a clearer structure, then you should
do so to eliminate the "goto".  If a "goto" is needed to create the
appropriate structure for your code, use it -- and document WHY you used it.

The best approach to eliminating "goto" from your code is to write code in
the first place that divides the task into logical chunks, organized by
simple control structures.  If you design your overall approach carefully,
you will only use "goto" when it really does improve your code.

5.0 EXAMPLE OF "GOTO"-REDUCING DESIGN IMPROVEMENT
-----------------------------------------------
Some code was posted to the net (quoting an example by Knuth) as one example
of a "goto" being a reasonable design decision.  It appears at first to be an
example of "not-found" loop logic.  However, a little more analysis reveals a
non-"goto" design that is clearer.

Again, the advantage to eliminating these "goto" statements is not the
mechanical elimination of a "goto", but the clearer design that results.

The code as given (with minor corrections) was: - - - - - - - - - - - - -

   type Node_Type;

   type Tree_Type is access Node_Type;

   type Node_Type is
      record
         Count       : Natural := 0;
         Left, Right : Tree_Type;
      end record;

   type Bit_Vector_Index is Natural range 0 .. Natural'Last - 1;
   type Bit_Vector_Type is array (Bit_Vector_Index) of Boolean;

   type Bit_Vector_Pointer_Type is access Bit_Vector_Type;

   Root : Tree_Type := new Node_Type;
   BV   : Bit_Vector_Pointer_Type;
   Bit  : Bit_Vector_Index;

-- BV points to a bit vector describing a path (False = left, True = right)
-- through the tree pointed to by Root.  We are counting the number of times
-- a given bit vector is encountered in the Count component of the
-- corresponding tree node.  Root starts out pointing to a tree consisting
-- only of a root node, and new nodes are added to the tree as needed.  The
-- loop below has three exits, and the actions appropriate when one of the
-- exits is taken is different from the actions appropriate when the other
-- two are taken, so a goto comes in handy.  Here's the code:

      Node := Root;
      Bit := BV'First;
      loop
         if Bit > BV'Last then
            goto Increment;
         end if;
         if BV(Bit) then
            if Node.Left = null then
               Node.Left := new Node;
               Node := Node.Left;
               exit;
            else
               Node := Node.Left;
            end if;
         else
            if Node.Right = null then
               Node.Right := new Node;
               Node := Node.Right;
               exit;
            else
               Node := Node.Right;
            end if;
         end if;
         Bit := Bit + 1;
      end loop;

      -- We've reached a leaf.  Read the rest of BV.all while extending
      --    the tree along the corresponding path only.
      for I in Bit+1 .. BV'Last loop
         if BV(I) then
            Node.Left := new Node;
            Node := Node.Left;
         else
            Node.Right := new Node;
            Node := Node.Right;
         end if;
      end loop;

   <<Increment>>
      Node.Count := Node.Count + 1;


It would be clearer to just let the first loop do all the work.  Eliminate
the "exit" statements and turn it into a "for" loop over the array bounds.
This works, but it checks for "null" in nodes that were just created.  If
this makes a significant performance difference for your application, you can
hand-optimize the code.  (The compiler probably can't do this optimization.)

First, we could simply replace the "goto" with an exit statement.  The loop
parameter specification would compute a null range and skip the loop.

The end processing could also be enclosed in an "if" statement to skip it if
the entire boolean array were processed.

However, this would still be less readable, since nodes are created inside
two different loops.  Instead, let's use two clearly-defined loops:

    -- Move to the last existing node on the path
    FOR all existing nodes on path
        move to node

    -- Create any new nodes
    FOR all new nodes on path
        create the node

The new code is: - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

     -- Declarations the same
   begin

      Node := Root;
      Bit := BV'First;

      ---------------------------------------------
      -- Move to the connect-node for the first  --
      -- node that needs to be created (if any). --
      ---------------------------------------------
      -- We are manually simulating a "for" loop to
      -- retain the index after loop termination.
      loop
         exit when Bit > BV'Last;

         if BV(Bit) then
            if Node.Left = null then
               exit;
            else
               Node := Node.Left;
            end if;
         else
            if Node.Right = null then
               exit;
            else
               Node := Node.Right;
            end if;
         end if;
         Bit := Bit + 1;
      end loop;

      ---------------------------------------------------
      -- We are at the last existing node in the path. --
      -- If there are any non-existent nodes on the    --
      -- path, Bit defines the first; create them.     --
      ---------------------------------------------------

      for I in Bit .. BV'Last loop
         if BV(I) then
            Node.Left := new Node;
            Node := Node.Left;
         else
            Node.Right := new Node;
            Node := Node.Right;
         end if;
      end loop;

      -------------------------------------------------------
      -- Increment the count at the node identified by BV. --
      -------------------------------------------------------

      Node.Count := Node.Count + 1;

This is exactly as efficient, and easier to maintain.  The loops have
clear missions, node creation is more localized, and the Increment
statement is only reached by one path of execution.

-- end --

-- 
Samuel Mize -- smize@imagin.net -- Team Ada
(personal net account)