Back to blog

/

Libraries

error-stack

Announcing error-stack

A context-aware error library for Rust that supports arbitrary attached user data

June 10th, 2022

Tim DiekmannSenior Platform Engineer, HASH

This is an old blog post and may contain code which does not compile with the current error-stack version. Please check out the latest documentation and examples for more information.

We are proud to announce a new error library for Rust: error-stack - a context-aware error library with arbitrary attached user data.

This post outlines some of our experiences with error-handling in Rust, why we built the error-stack library, the architectural choices we made along the way, and what you can expect from the crate.

Our error-handling journey

At HASH, we've been building a simulation engine — written in Rust — called HASH Engine (or 'hEngine' for short). It's a large codebase, with a number of contributors, and its size and complexity has grown over time (check out the code on GitHub). Across some big refactors, as well as everyday feature development, we found error handling became unwieldy and ourselves struggling to keep up. In a bid to improve our debugging process and reduce maintenance overheads we decided to revisit our strategy for managing errors.

Up until recently our approach to error handling centered around using thiserror, a fantastic crate which abstracts away a lot of Rust's verbosity when creating errors. Previously we would create enumerations (generally large, top-level ones, per crate and/or major module) to describe our different error types, and use thiserror's macros to handle a lot of the trait implementations. We didn't want to be using Box<dyn Error> everywhere, meaning we wanted typed errors, so we utilized thiserror's ability to generate From implementations from a macro to handle the transformation of our error types across the module and crate boundaries. (thiserror lets you succinctly define a From implementation like so: IoError(#[from] std::io::Error)).

Handling errors across module and crate boundaries

Our approach led us to adopt a pattern where we'd call functions across boundaries (for example, from the engine module to our simulation crate), necessitating transformation between error types. For ergonomic reasons, we'd then create a From implementation for the whole Error enumeration:

// mod engine

#[derive(ThisError)]
enum Error {
    #[error("Simulation error: {0}")]
    Simulation(#[from] crate::simulation::Error),
    ...
}

This would then often turn into error messages that looked like the following:

Error: Engine Error: Simulation Error: Experiment Error: Datastore Error: No such file or directory (os error 2)

These error messages would tell us very little about the context that led to the error and weren't very helpful in actually tracking down the causes of problems. And while we have logging infrastructure in place, which is helpful with identifying events that lead to errors, it relies on a few things:

  • the logging level is set to something detailed enough to point the person debugging to the correct location (not knowing how to reproduce the error can make it difficult to just rerun with RUST_LOG=trace);
  • there are actually enough log statements in the correct places to give enough information;
  • the log statements have been written in a way that's helpful, and provide the correct information (which is a challenge by itself, as you'll often find you needed information you hadn't thought about before).

Perhaps most importantly though, hEngine is a multi-threaded, multi-language project. The consequence of this is that there will often be a mix of logs from various threads appearing before an error, resulting in it not always being straightforward to tell which piece of logic may be causing a problem.

It's worth mentioning that thiserror is able to capture backtraces, which helps a fair bit. We started utilizing this by adding a backtrace field for some of our error variants, however, in doing so we ran into two issues.

Firstly, we needed to explicitly add the field to all error variants expected to have a backtrace:

// mod engine

#[derive(ThisError)]
enum Error {
    #[error("Simulation error: {0}")]
    Simulation(#[from] crate::simulation::Error, Backtrace),
    ...
}

Secondly, this required rerunning the program with RUST_BACKTRACE=1, or always running the program with it on.

Additionally, while backtraces help us locate the line of code which generated the error (which helps with disambiguating the example error message shown above), it's still missing a lot of other information that could prove helpful, such as which iteration of a loop caused the error, or what path was missing.

Defining base errors

As mentioned, we've tried to keep our errors well-typed by defining them as variants within various enumerations. This introduced a lot of overhead that we struggled with during some periods of rapid development. A significant subset of errors seemed to happen once, and as such, we'd generally define a variant like so on our enums:

#[derive(ThisError)]
enum Error {
    /// Used when errors need to propagate but are too unique to be typed
    #[error("{0}")]
    Unique(String),
    ...
}

impl From<&str> for Error {
    fn from(s: &str) -> Self {
        Error::Unique(s.to_string())
    }
}

impl From<String> for Error {
    fn from(s: String) -> Self {
        Error::Unique(s)
    }
}

This was ergonomic at the time of error creation because we could quickly create one:

Error::from("Failed to initialize Context Package Creators")

However, in practice, we found ourselves often misusing these due to their relative convenience. This would lead to errors such as the following…

OrchestratorError::from(format!("Not a valid initial state file: {path:?}"))

…where we ended converting the Path to a (Debug) string rather than attaching the actual object in case needed at the point of handling.

This was through no fault of thiserror, but our usage of it in practice, and the patterns we found ourselves settling into.

Error conversions

thiserror supplies a macro to generate From implementations for your error types, which is great when it works, helping avoid boilerplate becoming huge and unmaintainable. Unfortunately, it can't handle cyclic dependencies where you might have two-way conversions between your error types. While we have since majorly refactored our code-base to have a much better separation of concerns, we used to have great interdependency which caused this problem to occur more often than ideal. Due to this, we found ourselves employing an embarrassingly reproachable pattern in our code:

// mod experiment_controller
...
    .map_err(|experiment_err| Error::from(experiment_err.to_string()))?;

We already had an experiment::Error From experiment_controller::Error implementation, which meant that we were unable to define a From implementation going the other way. To get around this (initially temporarily, but as with many things, this ended up sticking around for a while) we simply turned the experiment error into a String and utilized the Error::Unique variant again.

This was clearly not ideal, and we ended up with a weird mixture of well-defined error variants and ones that were just captured in strings. The conversions also resulted in information loss depending on the Display implementation, including the loss of important pieces of information like the backtrace.

So, why another error handling library?

We really love the Rust ecosystem, and although there are many great error crates out there, we struggled to find one that aligned with our needs, and which worked well with some of our project's inherent complexity. The closest error crate we could find was anyhow (or its fork eyre), however, we had some additional wants:

  1. encourage the user to provide a new error type if the scope is changed, usually by crossing module/crate boundaries (for example, a ConfigParseError when parsing a configuration file vs. an IoError when reading a file from disk)
  2. be able to use those error types within return types, without needing to handle difficult From logic
  3. be able to attach any data to an error without a lot of configuration [1], not just string-like types, which then can be requested when handling the error.

Based on our experiences outlined above, we made a few additional design decisions that differentiated our approach to error handling from those existing crates designed to facilitate it. Generally speaking, these decisions were made to force developers to be more explicit when creating or chaining errors, but most importantly, we don't supply in-built support for creating errors from strings. While it is convenient, we want to encourage developers to think more about their errors by building concrete error types like std::io::Error or std::num::ParseIntError

It's still possible to create something like Error::Unique but we do not provide shortcut functionality for that within the library, hoping that a lot of its uses can instead be solved through “attachments”.

  • [1] Clarification: eyre and other libraries are able to support the inclusion of arbitrary data, whether that be through a custom Handler implementation, or by creating Error types that wrap the data. We'd like to be able to easily attach anything to the Err arm of a Result without needing bespoke configurations.

What is error-stack?

error-stack is centered around taking a base error and building up a full richer picture of it as it propagates. We call this a Report (you might recognize the naming choice of Report from the eyre crate, which we thought was fitting).

This Report on the error is made-up of a combination of two main concepts:

  1. Contexts
  2. Attachments

A Report is organized as a stack, where you push contexts and attachments. We call this the Frame stack as each context and attachment is contained in a Frame alongside the code location of where it was created. The Report is able to iterate through the Frame stack from the most recent one to the root (on a 'Last In, First Out' basis).

Contexts

These are views of the world, they help describe the current section of code's way of seeing the error — a high-level description of the error. The current context of a Report<T> is captured in its generic argument.

When an error is created, this might often be the base error type, so for a given error E, this will be turned into Report<E> when creating a Report from E. We expect this parameter to change at major module boundaries and across crate boundaries, where the specificity of the error might not have much meaning anymore:

fn parse_config(path: impl AsRef<Path>) -> Result<Config, Report<ParseConfigError>> {
    let path = path.as_ref();

    // First, we have a base error:
    let io_result = fs::read_to_string(path);      // Result<File, std::io::Error>

    // Convert the base error into a Report:
    let io_report = io_result.report();            // Result<File, Report<std::io::Error>>

    // Change the context, which will eventually return from the function:
    let config_report = io_report
        .change_context(ParseConfigError::new())?; // Result<File, Report<ParseConfigError>>

    // ...
}

Having the concept of a current context has an important implication: when leaving the scope of a function it's likely that the return type from the caller function has a different type signature. This implies that a new Context has to be added. This encourages the developer to think more about their error types and forces reasonable good cause traces.

For a Context we provide a trait, which is implemented for std::error::Error by default (if the Error is Send + Sync + 'static):

pub trait Context: Display + Debug + Send + Sync + 'static {
    fn provide<'a>(&'a self, demand: &mut Demand<'a>) {}
}

provide() is an optional feature making use of the Provider RFC. When providing data here, they later can be requested by calling Report::request_ref() or Report::request_value().

Note: The implementation for the Provider RFC is not available on nightly at the time of writing, so we have vendored in the functionality for now. We will switch to the upstream implementation in core::any with a future release of error-stack. Using the Provider API currently requires the Rust nightly compiler.

Attachments

Frequently you'll want to attach specific information to an error, requiring it to be encapsulated within the error type, similar to crates like anyhow or failure where you are able to attach string-like types. However, error-stack is not only able to attach messages but also any other data to the Report, for example, it's possible to attach a Suggestion. Using the example from above, we add a contextual message and a Suggestion (not offered by error-stack) to the Report:

fn parse_config(path: impl AsRef<Path>) -> Result<Config, Report<ParseConfigError>> {
    let path = path.as_ref();

    let content = fs::read_to_string(path)
        .report()
        .change_context(ParseConfigError::new())
        .attach_printable_lazy(|| format!("Could not read file {path:?}"))
        .attach(Suggestion("Use a file you can read next time!"))?;

    // ...
}

It's then possible to request the Suggestions from the Report, which returns an iterator (most recent attachment first) as you can attach the same type multiple times:

if let Err(report) = parse_config("config.json") {
    eprintln!("{report:?}");
    for suggestion in report.request_ref::<Suggestion>() {
        eprintln!("Suggestion: {suggestion}");
    }
}

Note: As mentioned before, calling request_ref() on a Report currently requires a nightly compiler as this is using the Provider API.

This will eventually output:

Could not parse configuration file
             at main.rs:6:10
      - Could not read file "config.json"
      - 1 additional opaque attachment

Caused by:
   0: No such file or directory (os error 2)
             at main.rs:5:10

Stack backtrace:
   0: error_stack::report::Report<T>::new
             at error-stack/src/report.rs:187:18
   1: error_stack::context::<impl core::convert::From<C> for error::report::Report<C>>::from
             at error-stack/src/context.rs:87:9
   2: <core::result::Result<T,E> as error::result::IntoReport>::report
             at error-stack/src/result.rs:204:31
   3: parse_config
             at main.rs:4:19
   4: main
             at main.rs:13:26
   5: ...

Suggestion: Use a file you can read next time!

What's the catch?

You might think that this comes with a lot of performance overhead, but with speed being a key concern of hEngine, we've taken this into account. The impact on the ‘happy' success code path is minimal; a Report is only one pointer in size and requesting from a Report only happens at the time of error reporting. We use a similar data layout as anyhow and thus expect similar performance.

However, we've purposefully added in some requirements that result in a little bit of extra upfront development overhead. We think having explicitly typed Result values is worth requiring, and especially helpful in identifying different types across module/crate boundaries, so we still want users to create types (Contexts). This approach makes error-stack usable not only for binaries but also for libraries. In most cases, the caller of a library only cares about the error directly returned from a library, not an underlying OS error. As the current context is always known, the user directly has all type information without optimistically trying to downcast to an error type (which remains possible). This also implies that more times than not a developer is forced to add a new context because the type system requires it. We've found from experience that this automatically greatly improves error reporting.

We haven't made it easy to create a new Report from a String because it requires the type to implement Context. It's still possible to use a newtype UniqueError(String) if a user wants to, but we discourage this usage because we've found that a lot of those places benefit from attaching a string message on a well-defined context instead. In any case, other libraries most probably still have to change the context from UniqueError to their context because of the explicit type requirement.

By consciously imposing more restrictions in some areas we hope to reduce the overall cost of development, in keeping with the philosophy underpinning Rust itself.

Some other cool things about error-stack

We don't have space here to talk about all of the features of error-stack, but here are a few more teasers:

  • The complete crate is written for no-std environments. However, when using std, a blanket implementation for Context for any Error is provided
  • The blanket implementation for Error also makes the library compatible with almost all other libraries using the official trait
    • This means that if you pick this up, you should be able to migrate gradually (like we're doing with hEngine)
  • Just like eyre, the crate is able to set its own Display and Debug hooks that are called when printing the Report, which means you can write your own custom formatting and do other cool things
  • We have in-built support for backtraces on the nightly channel and feature-flagged support for tracing to capture a SpanTrace

Check out the error-stack repository for even more information about the library.

The future of error-stack

As previously noted, we are still in the process of migrating our own codebase to error-stack and we intend to make further improvements as we apply the crate to more of hEngine and our internal code at HASH.

During development of the library we included a Provider implementation while waiting for the implementation of the Provider RFC to be merged. This is the reason why the API currently requires the nightly release channel, and we will be updating the library to swap to the core::any implementation in version 0.2.0.

We cannot guarantee full backward compatibility until this crate hits v1.0.0. However, we will aim to avoid breaking changes, and in the event of any will announce them beforehand and provide a migration guide.

Contributions are greatly welcome. View the project code on GitHub and, if you like it, please star the repository!

Acknowledgments

We would like to thank the Error Project Group for their work on the Error trait and the Provider RFC. It's unlikely we would have written this library without the latter — or at least not in a way we would have been truly satisfied with.

License

This crate is published under the MIT License.

View and star error-stack on GitHub >

Get new posts in your inbox

Get notified when new long-reads and articles go live. Follow along as we dive deep into new tech, and share our experiences. No sales stuff.

Join our community of HASH developers