neuer-error 0.2.0

Ergonomic error handling for machines and humans.
# Why another error library?

I have tried many approaches to error handling over the years. Most notably, I have built production services using `anyhow`/`(color-)eyre`, `thiserror`/`snafu` and `error-stack`. All of them are great libraries, each with its own strengths and weaknesses. Still, I was never fully satisfied: every one of them annoyed me at some point.

First, there is the erased error approach (`anyhow`/`eyre`), which is really ergonomic. You can just provide context as human-readable messages and interface with every other error you encounter. However, I ran into problems once I needed to know whether and when I should retry after an error. You can use dynamic `downcast`ing, but you know... That does not feel clean and is not at all discoverable. Bugs that only surface at runtime are almost guaranteed in the long run. Additionally, this approach is recommended for applications only anyway. When you refactor your application into crates, you suddenly have libraries, and changing their error handling is a whole lot of work. Ergonomics are quite important though. For many people, this approach is the go-to for initial experiments, which end up being a product later. Sometimes it might even be too ergonomic: you forget to provide context until you see the error message and have no clue what is going on.
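To make the `downcast` complaint concrete, here is a minimal sketch using only the standard library's `Box<dyn Error>` in place of `anyhow`/`eyre`. The `RateLimited` type and the `call_service` and `should_retry` functions are made up for illustration and are not part of any of the libraries mentioned. Note that nothing in the signature of `call_service` tells you which concrete types are worth downcasting to.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical error type, only here for illustration.
#[derive(Debug)]
struct RateLimited {
    retry_after_secs: u64,
}

impl fmt::Display for RateLimited {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "rate limited, retry after {}s", self.retry_after_secs)
    }
}

impl Error for RateLimited {}

// Erased error type: ergonomic, but the concrete type is hidden from callers.
type BoxError = Box<dyn Error>;

// Hypothetical operation that fails in a different way on each attempt.
fn call_service(attempt: u32) -> Result<(), BoxError> {
    match attempt {
        0 => Err(Box::new(RateLimited { retry_after_secs: 2 })),
        1 => Err(Box::new(io::Error::new(io::ErrorKind::TimedOut, "timed out"))),
        _ => Err("invalid credentials".into()),
    }
}

// The caller has to guess which concrete types might be inside the box.
// Forgetting a type here still compiles and silently means "do not retry".
fn should_retry(err: &(dyn Error + 'static)) -> bool {
    if err.downcast_ref::<RateLimited>().is_some() {
        return true;
    }
    if let Some(io_err) = err.downcast_ref::<io::Error>() {
        return io_err.kind() == io::ErrorKind::TimedOut;
    }
    false
}
```

A caller would pass the boxed error as `should_retry(&*err)`. The retry decision works, but it lives far away from the place where the error was created.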

Next, there are the proc-macros for your own error types (`thiserror`/`snafu`). They are great at showing what information you have. You can even access that information statically typed. They also provide helpers for capturing backtraces or source locations. The downside is that you now have separate types everywhere, which you need to convert between. This is usually not an issue at all. In fact, it is helpful for automatically providing the error message you want. On the other hand, you have to define the conversion for every error type you interface with. Furthermore, you have to do that for every different error message! Unless of course you have a variant that includes an error message string, but that kind of defeats the point of going for typed enums in the first place. I often found myself being too lazy to browse to the enum and add a variant, so I just re-used one that already existed. But now the error message is less helpful. And I still have to implement methods to comfortably collect the information I want. Let's say I want to know whether I should retry an HTTP request: I have an enum with a variant for the HTTP error, one for serialization/deserialization, one for internal stuff and one for user data validation. Now I need to match on my variants, try to find the getters in the source error types that provide the information I need, and combine all that. Not a big issue, but when you do it often enough, it gets annoying. It was still the best approach I knew, so I used it. You just have to be determined enough to follow best practices, right?
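The retry scenario above can be sketched roughly like this. The sketch is dependency-free, so the `Display` and `From` impls that `thiserror` or `snafu` would normally generate are written by hand. The `ServiceError` enum, its variants, `parse_port` and the retry rules (retry on HTTP 429 and 5xx, and on timed-out or reset connections) are assumptions made up for this example.

```rust
use std::fmt;
use std::io;
use std::num::ParseIntError;

// Hypothetical typed error enum, one variant per failure source.
#[derive(Debug)]
enum ServiceError {
    Http { status: u16 },
    Io(io::Error),
    Parse(ParseIntError),
    Validation(String),
}

impl fmt::Display for ServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ServiceError::Http { status } => write!(f, "HTTP request failed with status {status}"),
            ServiceError::Io(err) => write!(f, "I/O failure: {err}"),
            ServiceError::Parse(err) => write!(f, "could not parse number: {err}"),
            ServiceError::Validation(msg) => write!(f, "invalid user data: {msg}"),
        }
    }
}

impl std::error::Error for ServiceError {}

// One conversion per source error type you interface with.
impl From<io::Error> for ServiceError {
    fn from(err: io::Error) -> Self {
        ServiceError::Io(err)
    }
}

impl From<ParseIntError> for ServiceError {
    fn from(err: ParseIntError) -> Self {
        ServiceError::Parse(err)
    }
}

impl ServiceError {
    // Match every variant and dig the relevant detail out of each source error.
    fn is_retryable(&self) -> bool {
        match self {
            ServiceError::Http { status } => matches!(*status, 429 | 500..=599),
            ServiceError::Io(err) => matches!(
                err.kind(),
                io::ErrorKind::TimedOut | io::ErrorKind::ConnectionReset
            ),
            ServiceError::Parse(_) | ServiceError::Validation(_) => false,
        }
    }
}

// `?` uses the `From<ParseIntError>` impl above to convert the error.
fn parse_port(raw: &str) -> Result<u16, ServiceError> {
    Ok(raw.trim().parse::<u16>()?)
}
```

Everything is statically typed and discoverable, but each new source error or message means another variant, another `From` impl and another arm in `is_retryable`.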

Then I tried `error-stack` and initially liked it. It provides a middle ground: a generic type that contains your custom error kind and information. On top of that, it handles conversion for you, it allows stacking multiple errors into one (useful for validation) and it allows dynamic attachments for other information you might want to add. In the end, it was quite cumbersome though. Specifying generics, having to manually call conversion methods to switch the contained context, adding attachments... It adds up quickly. You also still have the opaque dynamic `downcast`s for the attachments.

Recently, I came across [this blog post](https://fast.github.io/blog/stop-forwarding-errors-start-designing-them/). It inspired me to challenge my approach and re-think error handling. After all, I was doing bad stuff with my errors: I was not thinking about what the human developer needs, nor about what upstream code needs in order to recover. I often ended up with error messages that were not helpful. I wanted something that encourages best practices by design. So I researched a bit. The `exn` library mentioned in the post provides a wrapper to capture source locations without backtraces, quite handy! Though you still need your machine-recovery information. And I am unsure how it works when bubbling up through multiple layers. I haven't actually tried it, sorry ^^. Then the post showed the error design of the [Apache OpenDAL library](https://github.com/apache/opendal/pull/977). It is quite interesting, though of course tailored very closely to their use-case. So I wondered: is it actually possible to have the same approach generically? Both for libraries and applications? For all the different use-cases? Probably not, but maybe enough to be useful? Does it make sense to try? Of course, it was fun! So I did, and this is the result. I probably invested too much time and effort, but I hope it will at least be interesting and useful to some people! :)