Thursday, October 15, 2009

Error handling is hard. Let's use goto!

Error conditions in subroutines are hard to handle properly. Both
of the common strategies, checking return values and exceptions, are
prone to leaving cases uncovered that aren't discovered until runtime
And leaving a function in the middle of its operation is always hard
--- you have to be careful to leave the program's world in a
consistent state.

Checking return values is subject to, well, failure to check the
return value:

Check return value; local cleanup

// returns non-zero value on error
int do_stuff() {
  int flag;
  set_up();
  flag = first_call();
  if (flag != 0) {
     tear_down();
     return 1;
  }
  flag = second_call();
  if (flag != 0) {
     cleanup_first_call();
     tear_down();
     return 1;
  }
  flag = third_call();
  // oops, didn't check the flag, or cleanup first and second calls
  flag = fourth_call();
  if (flag != 0) {
     cleanup_third_call();
     // oops, didn't clean up after second_call.  Maybe it was added later
     // and the developer missed this cleanup clause.
     cleanup_first_call();
     tear_down();
     return 1;
  }
  tear_down();
  return 0;
}

Exceptions seem like a nice fix. They make sure that you're not
going to get through to fourth_call if third_call blows up. Here's
the first naive attempt:

Pass-through exceptions; no cleanup

// raises an exception on error, but has problems
void do_stuff() {
  set_up();
  first_call();
  second_call();
  third_call();
  fourth_call();
  tear_down();
}

That's pretty, isn't it? We got rid of all that horrible error
handling code that was confusing the core flow, etc. and so forth.
But actually, we're now deep in the thickets.

Here's the first problem: if any of first_call, etc. raise an
exception, tear_down() isn't going to get called. We can use
try/catch to guard against this problem:

Exceptions rethrown by tear-down guard

// make sure to call tear_down on exception
void do_stuff() {
  set_up();
  try {
    first_call();
    second_call();
    third_call();
    fourth_call();
  } catch (...) {
    tear_down();
    throw;
  }
  tear_down();
}

Since tear_down is supposed to get run in all exit paths , in C++
we can use a stack-allocated helper class, implementing the RAII
(Resource Acquisition Is Initialization) pattern, to make this a
little less error prone:

Pass-through exceptions; RAII guard

class SetUpHelper {
public:
  SetUpHelper() {set_up();}
  ~SetUpHelper() {tear_down();}
}
void do_stuff() {
  // automatically call tear-down when the destructor runs
  SetUpHelper raii_setup;
  first_call();
  second_call();
  third_call();
  fourth_call();
}

That's better, but there's still something wrong here: nothing is
ever calling our cleanup functions, cleanup_first_call(), etc. If
there's an exceptional exit, we're presently going to leave the
program's internals in some inconsistent state. Let's fix that up
too, try/catch style:

Re-thrown exceptions with local cleanup

void do_stuff() {
  SetUpHelper raii_setup;
  first_call();
  try {
    second_call();
  } catch (...) {
    cleanup_first_call();
    throw;
  }
  try {
    third_call();
  } catch (...) {
    cleanup_second_call();
    cleanup_first_call();
    throw;
  }
  try {
    fourth_call();
  } catch (...) {
    cleanup_third_call();
    // oops, forgot to clean up second_call. Maybe it was added later
    // and the developer missed this handler
    cleanup_first_call();
    throw;
  }
}

That's not substantially better than our original flag-based
handling --- the error logic is all inline with the main path, there
are just as many opportunities to forget to try/catch or call the
right cleanup sequence, there's code duplication, etc.

We can clean up a little of the code duplication, and reduce the
opportunity for error, by nesting our handler logic:

Re-thrown exceptions with nested cleanup

void do_stuff() {
  SetUpHelper raii_setup;
  first_call();
  try {
    second_call();
    try {
      third_call();
      try {
        fourth_call();
      } catch (...) {
        cleanup_third_call();
        throw:
      }
    } catch (...) {
      cleanup_second_call();
      throw;
    }
  } catch (...) {    
    cleanup_first_call();
    throw;
  }
}

Is this really an improvement? Our main-line logic isn't
intrinsically nested, but here we've imposed a nesting structure just
to handle errors. It adds considerable visual and structural
complexity. Furthermore, the nesting interleaves main-line logic and
clean-up logic nearly as badly as our original flag logic did.
Ugly.

With all due respect to the "...Considered Harmful" philosophy,
C and C++ give us a better option than we've seen so far: goto.

Check return value; goto-based cleanup

// returns non-zero value on error
int do_stuff() {
  int flag;
  set_up();
  flag = first_call();
  if (flag != 0) { goto first_failed;}
  flag = second_call();
  if (flag != 0) { goto second_failed;}
  flag = third_call();
  if (flag != 0) { goto third_failed;}
  flag = fourth_call();
  if (flag != 0) { goto fourth_failed;}
  tear_down();
  return 0;
fourth_failed:
  cleanup_third_call();
third_failed:
  cleanup_second_call();
second_failed:
  cleanup_first_call();
first_failed:
  tear_down();
  return 1;
}

Loook at that! Our main-line logic is linear. It's not cluttered
with clean-up logic, but the correlation between a main-line exit and
a clean-up entry is clearly named. There's a regularity to the logic
which is easy to inspect for missing flag checks. There's no code
duplication calling the cleanup functions. This code
is readable.

We do have one bit of duplicate code, the two calls to tear_down().
The duplication here is a lot less verbose, and a lot more obvious to
someone reading the code, than having an extra RAII class.

To be sure, there is still room for error. You could forget to
check a flag. If you were to swap the order of second_call and
third_call, you'd need to do so in two places to make cleanup behave
properly sane; of course, this is actually just as true in the nested
exceptions example.

Finally, there's one dangling bit of ugliness we haven't addressed
anywhere yet. What if, say, second_cleanup can fail too? If it's
returning a flag, we're ignoring it in every example above. Oops!

If it's throwing an exception... well, in our exception-based
examples, it's doing so from the middle of another exception's catch
clause. That's pretty ugly too!

In my next post, I'll describe a novel error handling strategy that
we're trying out to address some of the problems inherent to the
schemes above. (By the way, it's not dependent on goto, so those of
you holding air sickness bags can put them down.)