From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,99222a5bd46ef3c9,start X-Google-Attributes: gid103376,public From: smize@news.imagin.net (Samuel Mize) Subject: GOTO considered necessary (reworked) Date: 1997/06/11 Message-ID: <5nn2fm$11dk$1@prime.imagin.net> X-Deja-AN: 247729316 Organization: ImagiNet Communications Ltd, Arlington, Texas Reply-To: smize@imagin.net (Samuel Mize) Newsgroups: comp.lang.ada Date: 1997-06-11T00:00:00+00:00 List-Id: Greetings, [-------------------------------------------------------------------] [ I'm re-doing this since a lot of people were apparently at an Ada ] [ conference recently, and weren't reading net news. This version ] [ is updated based on previous comments. -- Mize ] [-------------------------------------------------------------------] I'm rebeating a dead horse, in order to write up a consensus paper to submit to adahome. I'll appreciate your input. This one concerns use of "goto" statements. I have scanned through the back c.l.a messages in DejaNews. I request your comments on this summary. Samuel Mize USE OF "GOTO" IN ADA -------------------- 1.0 SUMMARY: ------------ a) In general, "goto" statements should be avoided. They create confusing execution flow structures -- at first, or after a few maintenance changes. They also reduce opportunities for compiler optimization of code. b) As a rule of thumb, if your code needs a "goto" to work efficiently (or at all), you may have overlooked a simpler, better design. c) Exceptions to "goto" avoidance: c1) "Goto"s can be used to create a well-defined code structure that is not supported in the language. Examples are in Section 2.0. c2) "Goto"s are acceptable in machine-generated code, IF a human is NEVER expected to read the code. (For example, a module generated by a tool from data-flow diagrams.) c3) Finite State Machines (FMSs) are often implemented with "goto"s. These are also called Finite State Automatons (FSAs). This is discussed in detail in Section 3.0. These are used in compiler theory, business processes, command and control, ASIC or other custom chips. c4) Other usages are suspect, but may be OK. If a "goto" really improves clarity, use it -- but document WHY you used it, and why other approaches were found lacking. Look at your design at a higher level to see if you can eliminate "goto" usage with a clearer design approach -- the advantage here, of course, is not the mechanical elimination of the "goto", but the clearer design. See section 4.0, and the example in section 5.0. d) Eliminating a "goto" may prevent maintenance errors in the future (for instance, code being put on the wrong side of a label). However, it also may not be worth the effort to redesign without the "goto." Using an extraneous state variable or other kludge is no improvement on a "goto". e) Norm Cohen recommended Knuth's article in the December 1974 Computing Surveys as "the best discussion I've ever seen of the goto issue." Other respected Ada developers have concurred. f) The shared Ada culture avoids "goto" statements. Gellerich et al examined 34428 files containing 8.5 million lines of code, coded by many projects, institutions, and programmers. They found a "goto" density of 1 "goto" per 13614 lines of code. They excluded some directly-translated Fortran, and some code written to validate Ada compilers (which must test "goto"). ["Where does GOTO go to?", Gellerich Kosiol and Erhard, U of Stuttgart, http://www.informatik.uni-stuttgart.de/ifi/ps/Gellerich/publications.html] 2.0 UNSUPPORTED CODE STRUCTURES ------------------------------- The central feature of structured programming is not some specific set of code structures, but the fact that well-defined structures are used to organize the code. Arbitrary "goto"s are avoided; transfers of control occur as part of a larger, coherent structure that helps the reader understand the overall program's activity. Modern languages like Ada provide a large set of useful structures: "if" statements (supporting "then", "elsif", and "else" clauses), "case" statements, "while" and "for" loops, etc. However, there are some useful structures that are not directly supported. This will always be true. Some of these structures are rarely used, so are not worth the cost to add into the compiler; others may be invented, or may be domain-specific. It is generally acceptable to use a "goto" to implement a well-defined code structure that is not directly implemented in the language. For instance, the Ada gurus at ACT use "goto" in the GNAT compiler. On the other hand, it occurs in only 3 places (in 3.07), and all are instances of the "not-found" pattern shown below. Many recommend using "flag" comments to identify jumps that are not part of a structure. This includes "return" and "exit" statements, as well as "goto" statements. Thus, a loop might look like: loop ... exit when A; -------------------EXIT ... if B then return; ---------------------RETURN ... if C then goto CONTINUE; --------------GOTO ... <> end loop; For example, here are three structures that require "goto" in Ada: a) Process continuation logic A sequential process may have several natural completion points. One example is parsing a string that ends with optional items. Avoiding "goto" in this case requires increasing levels of "if" indentation: [process first part of string] if [not end-of-line] then [process option1] if [not end-of-line] then [process option2] if [not end-of-line] then [process option3] if [not end-of-line] then [process option4] ... endif; endif; endif; endif; This is hard to read, and confusing since all the parsing is really at the same logical level. A "completed" flag may also be used, but this does not usually improve clarity over the "goto" solution: [process first part of string] if [end-of-line] then goto NO_MORE_OPTIONS; end if; [process option1] if [end-of-line] then goto NO_MORE_OPTIONS; end if; [process option2] if [end-of-line] then goto NO_MORE_OPTIONS; end if; [process option3] if [end-of-line] then goto NO_MORE_OPTIONS; end if; [process option4] ... <> The same effect can be achieved by putting the processing sequence into a block statement and raising an exception on end-of-line. However, this is generally considered even worse than the "goto" solution. Most standards limit exceptions to error conditions, not normal processing. b) "Continue" loop logic The "continue" function of C has to be emulated in Ada with a "goto": loop ... goto <> ... <> end loop; This is a special case of the process continuation structure. c) "Not-Found" loop logic Loops frequently scan through a sequence -- for instance, an array -- followed by special processing if the item was not found. An example would be inserting an item in an ordered array: Items: array (Positive range Lo .. Hi) of Item; Last_Item: Natural := Items'First - 1; ... -- for each item to be inserted for I in Items'First .. Last_Item loop -- duplicates are allowed, and inserted as separate items if Items (I) >= New_Item then Items (I + 1 .. Last_Item + 1) := Items (I .. Last_Item); Items (I) := New_Item; goto ITEM_INSERTED; end if; end loop; -- New_Item's place not found before end of array Last_Item := Last_Item + 1; Items (Last_Item) := Item; <> ... Avoiding "goto" in this case requires one of these approaches: - A "found" boolean. When element is processed in loop, Found is set and loop exits. No-element processing is skipped when Found is true. - An "Index" variable, initially set past the end of the array, and reset in the loop if the element is found. Processing is always done after loop exit, based on this index. - Last-iteration detection in the loop. Each of these is also unclear in its own way. 3.0 FINITE STATE MACHINES (FSMs) -------------------------------- An FSM is defined by a set of states. In each state, some processing may occur; then the abstract "machine" selects a state to transition into (this may be the same state again). Some developers find that "goto"s improve FSM clarity, as they more directly reflect the FSM being modelled. Other developers prefer a case statement inside a loop. The "goto" implementation generally looks like: <> -- A label like this exists for each state of the machine. declare [local variables] begin [processing for state 1] if [condition] then -- A variable holding the "current" goto [State_X]; -- state is not needed -- it is elsif [condition] then -- represented by the current place goto [State_Y]; -- of executiion. else goto [State_Z]; end if; end; -- Defensive programming: ensures that the machine does -- not accidentally "fall through" to the next state. raise No_Transition_Occurred; <> ... The "loop-and-case" alternative would be: type State is (Initial_State, State1, ... Staten); Current_State: State; -- variables that may affect the next transition ... function Next_State (Current_State: State) return State; loop exit when (Current_State = [a terminal state]) or (Current_State = [another terminal state]) ...; -------------------------------------------------------- AAA case Current_State is when State_1 => [Processing for state 1] Current_State := Next_State (Current_State); ... end case; if Debug then Text_Io.Put_Line (State'Image (Current_State)); ----- BBB end if; end loop; Next_State can use an if/elsif cascade, or an array or hash-table lookup, or whatever is appropriate for the application-specific transition table. Different developers will argue for the clarity of either approach. Aside from clarity, the "goto" design has these advantages: a) The transition logic is separate from the processing logic for each state. This may be good or bad, depending on how distinct these naturally are. If they are naturally separate, having transition logic gathered into a separate function is clearer. If they are intertwined, the transition function will duplicate much of the logic from the processing. b) It is more efficient. Instead of looping and and evaluating a case expression, it jumps directly to the next state. c) It may be easier to locate accidentally "killed" states: a cross-reference tool could identify unused statement labels. Aside from clarity, the "loop-and-case" design has these advantages: a) The transition logic is separate from the processing logic for each state. b) There is a place to put processing that must occur at the start or end of each state (note lines AAA and BBB). c) The possible states are listed in an enumeration, usable for I/O or to represent a "memory" of previous states, if needed. Adding or deleting a state is about equally hard for both designs. It is harder with the case structure because you have to edit in two locations, keeping the type definition and the case structure synchronized. It is harder with the "goto" structure because the logic that jumps to the new (or deleted) state is less localized -- you have to hunt through all the code for the "goto"s. This seems to be a wash. The final decision must be made by the developer, according to clarity or other advantages for the specific application, within the constraints of the project and the overall design. 4.0 IMPROVING A "GOTO"-BASED DESIGN --------------------------------- The existence of "goto"s in code creates some purely mechanical problems for maintenance programmers. New code may be added on the wrong side of a "goto" target, and "goto"-based structures are often harder to maintain when intervening code changes significantly. This creates a small but definite advantage to eliminating many "goto" statements. However, once a "goto"-based design has been coded, it may not be worth the effort to redesign it to eliminate the "goto." This is especially the case if the "goto" is close to its target, and it is unlikely that other code would be inserted between them. In many cases, use of a "goto" indicates that a clearer, simpler design may be available. This is especially the case if the "goto" is not part of a clearly-defined design structure. The advantage to eliminating THESE "goto" statements is not the mechanical elimination of a "goto", but the clearer design that results. Extraneous state variables or other unclear design kludges do not improve on a "goto". If there is no overall design improvement that replaces a "goto" with a cleaner structure, you may as well leave the "goto". For instance, the code in section 5.0 would not be improved by mechanically replacing the "goto" by adding a "Done" boolean or some other kludge -- this would simply trade one unclear design for another. The critical point is that a program should be built in discrete sections, each of which accomplishes a clear task. These sections should be defined and separated by control-flow structures that show their relationship. This lets you (or a later maintenance programmer) define clearly in what state the program will be when it reaches a given point, and reason about what the section of code is supposed to be doing. Formally defined approaches to this kind of design are called "structured programming." For instance, you should be able to say what a loop does each iteration, and what it has done once it exits. (If you like semi-mathematical jargon, it needs an invariant and a termination condition). So, IF (1) it's worth the effort to review a "goto"-based design, and (2) the code can be reorganized into a clearer structure, then you should do so to eliminate the "goto". If a "goto" is needed to create the appropriate structure for your code, use it -- and document WHY you used it. The best approach to eliminating "goto" from your code is to write code in the first place that divides the task into logical chunks, organized by simple control structures. If you design your overall approach carefully, you will only use "goto" when it really does improve your code. 5.0 EXAMPLE OF "GOTO"-REDUCING DESIGN IMPROVEMENT ----------------------------------------------- Some code was posted to the net (quoting an example by Knuth) as one example of a "goto" being a reasonable design decision. It appears at first to be an example of "not-found" loop logic. However, a little more analysis reveals a non-"goto" design that is clearer. Again, the advantage to eliminating these "goto" statements is not the mechanical elimination of a "goto", but the clearer design that results. The code as given (with minor corrections) was: - - - - - - - - - - - - - type Node_Type; type Tree_Type is access Node_Type; type Node_Type is record Count : Natural := 0; Left, Right : Tree_Type; end record; type Bit_Vector_Index is Natural range 0 .. Natural'Last - 1; type Bit_Vector_Type is array (Bit_Vector_Index) of Boolean; type Bit_Vector_Pointer_Type is access Bit_Vector_Type; Root : Tree_Type := new Node_Type; BV : Bit_Vector_Pointer_Type; Bit : Bit_Vector_Index; -- BV points to a bit vector describing a path (False = left, True = right) -- through the tree pointed to by Root. We are counting the number of times -- a given bit vector is encountered in the Count component of the -- corresponding tree node. Root starts out pointing to a tree consisting -- only of a root node, and new nodes are added to the tree as needed. The -- loop below has three exits, and the actions appropriate when one of the -- exits is taken is different from the actions appropriate when the other -- two are taken, so a goto comes in handy. Here's the code: Node := Root; Bit := BV'First; loop if Bit > BV'Last then goto Increment; end if; if BV(Bit) then if Node.Left = null then Node.Left := new Node; Node := Node.Left; exit; else Node := Node.Left; end if; else if Node.Right = null then Node.Right := new Node; Node := Node.Right; exit; else Node := Node.Right; end if; end if; Bit := Bit + 1; end loop; -- We've reached a leaf. Read the rest of BV.all while extending -- the tree along the corresponding path only. for I in Bit+1 .. BV'Last loop if BV(I) then Node.Left := new Node; Node := Node.Left; else Node.Right := new Node; Node := Node.Right; end if; end loop; <> Node.Count := Node.Count + 1; It would be clearer to just let the first loop do all the work. Eliminate the "exit" statements and turn it into a "for" loop over the array bounds. This works, but it checks for "null" in nodes that were just created. If this makes a significant performance difference for your application, you can hand-optimize the code. (The compiler probably can't do this optimization.) First, we could simply replace the "goto" with an exit statement. The loop parameter specification would compute a null range and skip the loop. The end processing could also be enclosed in an "if" statement to skip it if the entire boolean array were processed. However, this would still be less readable, since nodes are created inside two different loops. Instead, let's use two clearly-defined loops: -- Move to the last existing node on the path FOR all existing nodes on path move to node -- Create any new nodes FOR all new nodes on path create the node The new code is: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -- Declarations the same begin Node := Root; Bit := BV'First; --------------------------------------------- -- Move to the connect-node for the first -- -- node that needs to be created (if any). -- --------------------------------------------- -- We are manually simulating a "for" loop to -- retain the index after loop termination. loop exit when Bit > BV'Last; if BV(Bit) then if Node.Left = null then exit; else Node := Node.Left; end if; else if Node.Right = null then exit; else Node := Node.Right; end if; end if; Bit := Bit + 1; end loop; --------------------------------------------------- -- We are at the last existing node in the path. -- -- If there are any non-existent nodes on the -- -- path, Bit defines the first; create them. -- --------------------------------------------------- for I in Bit .. BV'Last loop if BV(I) then Node.Left := new Node; Node := Node.Left; else Node.Right := new Node; Node := Node.Right; end if; end loop; ------------------------------------------------------- -- Increment the count at the node identified by BV. -- ------------------------------------------------------- Node.Count := Node.Count + 1; This is exactly as efficient, and easier to maintain. The loops have clear missions, node creation is more localized, and the Increment statement is only reached by one path of execution. -- end -- -- Samuel Mize -- smize@imagin.net -- Team Ada (personal net account)