Benutzer:Rdiez/ErrorHandling: Unterschied zwischen den Versionen

Aus /dev/tal
Wechseln zu: Navigation, Suche
(Die Seite wurde neu angelegt: „{{BenutzerSeitenNichtVeraendernWarnung|rdiez}} = Error Handling in General and C++ Exceptions in Particular = == Introduction == === Motivation === The softwa…“)
 
(Abrupt Termination)
Zeile 62: Zeile 62:
 
Leaving a [http://en.wikipedia.org/wiki/Memory_leak memory], [http://en.wikipedia.org/wiki/Handle_leak handle] or [http://en.wikipedia.org/wiki/Resource_leak resource leak] behind is not an option either, because the application will crash later on for a seemingly random reason. The user will probably not be able to provide an accurate error report, and the error will not be easy to reproduce either. The real cause will be very hard to discover and the user will quickly loose confidence in the general application stability.
 
Leaving a [http://en.wikipedia.org/wiki/Memory_leak memory], [http://en.wikipedia.org/wiki/Handle_leak handle] or [http://en.wikipedia.org/wiki/Resource_leak resource leak] behind is not an option either, because the application will crash later on for a seemingly random reason. The user will probably not be able to provide an accurate error report, and the error will not be easy to reproduce either. The real cause will be very hard to discover and the user will quickly loose confidence in the general application stability.
  
Abrupt termination is always unpleasant, but a controlled crash at least lets the user know what went wrong and, although it may sound counterintuitive, it will probably help improve the software quality in the long run, as there will be an incentive to fix the error quickly together with a helpful panic report.
+
Abrupt termination is always unpleasant, but a controlled crash at least lets the user know what went wrong. Although it may sound counterintuitive, such an immediate crash will probably help improve the software quality in the long run, as there will be an incentive to fix the error quickly together with a helpful panic report.
 +
 
 +
If you are worried about adding panic points, keep in mind that you will not be able to completely rule out abrupt termination anyway. Just touching a NULL pointer, calling some OS syscall with the wrong pointer or just using too much stack space at the wrong place may terminate your application at once.
  
 
== The rest of the article has not been written yet ==
 
== The rest of the article has not been written yet ==

Version vom 16. August 2013, 18:57 Uhr

Warning sign
Dies sind die persönlichen Benutzerseiten von rdiez, bitte nicht verändern! Ausnahmen sind nur einfache Sprachkorrekturen wie Tippfehler, falsche Präpositionen oder Ähnliches. Alles andere bitte nur dem Benutzer melden!


Error Handling in General and C++ Exceptions in Particular

Introduction

Motivation

The software industry does not seem to take software quality seriously, and a good part of it falls into the error-handling category. After putting up for years with so much misinformation, so many half-truths and with a general sentiment of apathy on the subject, I finally decided to write a lengthy article about error handling in general and C++ exceptions in particular.

I am not a professional technical writer and I cannot afford the time to start a long discussion on the subject, but I still welcome feedback, so feel free to drop me a line if you find any mistakes or would like to see some other aspect covered here.

Scope

This document focuses on the "normal" software development scenarios for user-facing applications or for non-critical embedded systems. There are of course other areas not covered here: there are systems where errors are measured, tolerated, compensated for or even incorporated into the decision process.

Audience

This document is meant for software developers who have already gathered a reasonable amount of programming experience. The main goal is to give practical information and describe effective techniques for day-to-day work.

Although you can probably guess how C++ exceptions work from the source code examples below, it is expected that you already know the basics, especially the concept of stack unwinding upon raising (throwing) an exception. Look into your favourite C++ book for a detailed description of exception semantics and their syntax peculiarities.

Causes of Neglect

Proper error-handling logic is what sets professional developers apart. Writing quality error handlers requires continuous discipline during development, because it is a tedious task that can easily cost more than the normal application logic for the sunny-day scenario that the developer is paid to write. Testing error paths manually with the debugger is recommended practice, but that doesn't make it any less time consuming. Repeatable test cases that feed the code with invalid data sequences in order to trigger and test each possible error scenario is a rare luxury. This is why error handling in general needs constant encouraging through systematic code reviews or through separate testing personnel. In my experience, lack of good error handling is also symptomatic that the code hasn't been properly developed and tested. A quick look at the error handlers in the source code can give you a pretty reliable measurement of the general code quality.

It is hard to assess how much value robust error handling brings to the end product, and therefore any extra development costs in this field are hard to justify. Free-software developers are often investing their own spare time and frequently take shortcuts in this area. Software contracts are usually drafted on positive terms describing what the software should do, and robustness in the face of errors gets then relegated to some implied general quality standards that are not properly described or quantified. Furthermore, when a customer tests a software product for acceptance, he is primarily worried about fulfilling the contractual obligations in the "normal", non-error case, and that tends to be hard enough. That the software is brittle either goes unnoticed or is not properly rated in the software bug list.

As a result, small software errors often cascade into great disasters, because all the error paths in between fail one after the next one across all the different software layers and communicating devices, as error handlers hardly ever got any attention. But even in this scenario, the common excuse sounds like "yes, but my part wouldn't have failed in the previous one hadn't in the first place".

In addition to all of the above, when the error-handling logic does fail, or when it does not yield helpful information for troubleshooting purposes, it tends to impact first and foremost the users' budget, and not the developer's, and that normally happens after the delivery and payment dates. Even if the error does come back to the original developer, it may find its way through a separate support department, which may even be able to provide a work-around and further justify the business case for that same support department. If nothing else helps, the developer's urgent help is then suddenly required for a real-world, important business problem, which may help make that original developer a well-regarded, irreplaceable person. After all, only the original person understands the code well enough to figure out what went wrong, and any newcomers will shy away from making any changes to a brittle codebase. This scenario can also hold true in open-source communities, where social credit from quickly fixing bugs may be more relevant than introducing those bugs in the first place. All these factors conspire to make poor error handling an attractive business strategy.

As a result, error handling gets mostly neglected, and that reflects in our day-to-day experience with computer software. I have seen plenty of jokes around about unhelpful or funny error messages. Many security issues have their roots in incorrect error detection or handling, and such issues are still getting patched on a weekly rhythm for operating system releases that have been considered stable for years.

Goals Overview

These are the main goals for a good error-handling strategy:

  1. Provide helpful error messages.
  2. Deliver the error messages timely and to the right person.
    The developer may want more information than the user.
  3. Limit the fallout after an error condition.
    Only the operation that failed should be affected, the rest should continue to run.
  4. Reduce the development costs of:
    • adding error checks to the source code.
    • repurposing existing code.

Compromises

Coding the error-handling logic can be costly, and sometimes compromises must be made:

Unpleasant Error Messages

In order to keep development costs under control, the techniques described below may tend to generate error messages that are too long or unpleasant to read. However, such drawbacks easily outweight the disadvantages of delivering too little error information. After all, errors should be the exception rather than the rule, so users should not need to read too many error messages during normal operation.

Abrupt Termination

It may often be desirable to let an application panic on a severe error than to try and cope with the error condition or ignore it altogether.

Some errors are just too expensive or virtually impossible to handle. An example could be a failed close( file_descriptor ); syscall, which should never fail, and when it does, there is not much the error handler can do about it. These errors are symptomatic of a serious logic error, but usually this kind of error is easy to fix.

Other error conditions may indicate that some memory is corrupt or that some data structure has invalid information that hasn't been detected soon enough. If the application carries on, its behaviour may well be undefined (it may act randomly), which may be even more undesirable than an instant crash.

Leaving a memory, handle or resource leak behind is not an option either, because the application will crash later on for a seemingly random reason. The user will probably not be able to provide an accurate error report, and the error will not be easy to reproduce either. The real cause will be very hard to discover and the user will quickly loose confidence in the general application stability.

Abrupt termination is always unpleasant, but a controlled crash at least lets the user know what went wrong. Although it may sound counterintuitive, such an immediate crash will probably help improve the software quality in the long run, as there will be an incentive to fix the error quickly together with a helpful panic report.

If you are worried about adding panic points, keep in mind that you will not be able to completely rule out abrupt termination anyway. Just touching a NULL pointer, calling some OS syscall with the wrong pointer or just using too much stack space at the wrong place may terminate your application at once.

The rest of the article has not been written yet