or Why it is hard to get Graceful Shutdowns right
I used to think shutdowns were easy: the process is going away, and the OS will take care of all my resources. Unfortunately, that’s not entirely the case. Shutdowns are easy, but perfectly graceful ones are not as intuitive. Ungraceful shutdowns bring about crashes or, even worse, unexpected changes to state, and nobody likes either.
Disclaimer: Most of my knowledge is limited to Windows scenarios, but I’d imagine most modern operating systems have parallel mechanisms.
tl;dr: globals and TLS objects can make graceful shutdown a nightmare
- Pressing "X" sends a signal
- Threads & Global Objects make things tough
- Just don’t bother with TLS
Pressing "X" sends a signal
To start things off, let’s see what happens when a user ends the application proactively, that is, when the exit is triggered by the outside world.
When a user presses close or "X", the OS starts a thread in the process (the OS is almighty and can do such things), and this thread executes the signal-handling code. Exactly how this works depends on the signal raised.
Different Exit Signals in Windows
Pressing X in Windows emits a CTRL_CLOSE_EVENT signal. However, that is not the only signal a user can send to end an app. Others include CTRL_BREAK_EVENT (SIGBREAK) and CTRL_C_EVENT (SIGINT), which we frequently use (via Ctrl+Break and Ctrl+C) to end command-line apps without having to close the CMD window. There are also shutdown and logoff events (CTRL_SHUTDOWN_EVENT and CTRL_LOGOFF_EVENT), but I am not going to discuss those.
Using a Ctrl handler
While Windows could, in theory, just kill apps whenever users want, that would make a lot of developers mad, including the Windows developers themselves. Apps need some time to clean up the resources they have consumed, for example to save the work in progress or to let a web app’s clients know it is shutting down. For these reasons, Windows lets developers handle those signals first.
For Windows console apps, SetConsoleCtrlHandler is the right way to achieve this. For example:
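A minimal sketch of such a handler, assuming the application’s main loop polls a flag. The names g_stopRequested, RequestStop, RunUntilStopped and OnConsoleCtrl are hypothetical. Because the handler runs on the thread the OS injects, it only flips an atomic flag and leaves the real cleanup to the main thread. The Windows-specific registration is guarded by _WIN32 so the flag logic compiles anywhere.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical flag polled by the main loop. Atomic because the control
// handler runs on a different thread than the loop that reads it.
std::atomic<bool> g_stopRequested{false};

void RequestStop() { g_stopRequested.store(true); }

// Hypothetical main loop: keeps working until a stop is requested, then
// returns normally so cleanup can happen on the main thread.
// Returns the number of iterations it ran.
int RunUntilStopped(const std::function<void()>& doWork) {
    assert(static_cast<bool>(doWork));
    int iterations = 0;
    while (!g_stopRequested.load()) {
        doWork();
        ++iterations;
    }
    return iterations;
}

#ifdef _WIN32
// Called by the OS on a thread it creates inside our process.
BOOL WINAPI OnConsoleCtrl(DWORD ctrlType) {
    switch (ctrlType) {
    case CTRL_C_EVENT:
    case CTRL_BREAK_EVENT:
        RequestStop();
        return TRUE;   // handled: handlers registered before this one are not called
    default:
        return FALSE;  // not handled: pass the signal on to the next handler
    }
}

// Register once at startup; TRUE adds the handler, FALSE would remove it.
bool InstallConsoleCtrlHandler() {
    return SetConsoleCtrlHandler(OnConsoleCtrl, TRUE) != FALSE;
}
#endif
```

In this sketch, main() would call InstallConsoleCtrlHandler() at startup, hand the real work to RunUntilStopped(), and do its cleanup after the loop returns, on the main thread rather than on the injected handler thread.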
You can add more handlers by calling SetConsoleCtrlHandler again. According to the Microsoft docs, handlers are kept in a stack: the last handler registered is the first to be called, and as soon as one of them returns TRUE, the rest of the list is not processed.
I have not included CTRL_CLOSE_EVENT above because it’s special. While handling CTRL_BREAK_EVENT lets the program continue running, that’s not the case for CTRL_CLOSE_EVENT, which is what pressing X sends. For this one, Windows gives the app a time limit to exit, after which it kills the process no matter what. I could not find documentation for how long that period is, but on my machine, it’s about 5 seconds.
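Since the process ends once a CTRL_CLOSE_EVENT handler returns, the handler has to hold the process open while the rest of the app cleans up, and it has to give up before the deadline. Here is a sketch of that hand-off under those assumptions. ShutdownCoordinator, Shutdown and OnConsoleClose are hypothetical names, and the 4-second budget is only a guess picked to stay under the roughly 5 seconds I measured. The Windows part is again guarded by _WIN32.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical hand-off between the close handler (on the OS-injected
// thread) and the main thread, which does the actual cleanup.
class ShutdownCoordinator {
public:
    // Handler thread: ask the main thread to wind down.
    void RequestStop() {
        std::lock_guard<std::mutex> lock(m_);
        stopRequested_ = true;
    }

    // Main thread: poll this from the main loop.
    bool StopRequested() {
        std::lock_guard<std::mutex> lock(m_);
        return stopRequested_;
    }

    // Main thread: call once cleanup (saving work, notifying clients) is done.
    void MarkCleanupDone() {
        std::lock_guard<std::mutex> lock(m_);
        cleanupDone_ = true;
        cv_.notify_all();
    }

    // Handler thread: block until cleanup is done or the budget runs out.
    // Returns true if cleanup finished in time.
    bool WaitForCleanup(std::chrono::milliseconds budget) {
        assert(budget.count() > 0);
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, budget, [this] { return cleanupDone_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool stopRequested_ = false;
    bool cleanupDone_ = false;
};

// Intentionally never destroyed: the handler thread may still be inside
// WaitForCleanup() while the main thread returns from main and the runtime
// destroys globals, so this object must not be one of those globals.
ShutdownCoordinator& Shutdown() {
    static ShutdownCoordinator* instance = new ShutdownCoordinator();
    return *instance;
}

#ifdef _WIN32
BOOL WINAPI OnConsoleClose(DWORD ctrlType) {
    if (ctrlType != CTRL_CLOSE_EVENT) {
        return FALSE;  // leave Ctrl+C / Ctrl+Break to other handlers
    }
    Shutdown().RequestStop();
    // Assumed budget, chosen to stay under the observed ~5 s kill deadline.
    (void)Shutdown().WaitForCleanup(std::chrono::milliseconds(4000));
    return TRUE;  // for CTRL_CLOSE_EVENT the system terminates the process after this returns
}
#endif
```

The main loop would poll Shutdown().StopRequested(), clean up, and then call Shutdown().MarkCleanupDone(). The coordinator is deliberately leaked because two threads share it right up to the very end, which is exactly the kind of lifetime problem global objects create.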
Threads & Global Objects make things tough
Let’s get real and make things a little more complicated. Single-threaded applications with no global objects or state can be as straightforward as the snippet above, but that’s not really exciting (or fast), is it?
Let’s first see how processes exit in Windows. There are multiple ways, but the one we’re discussing here involves Windows terminating the process by spawning a thread that calls ExitProcess(). This triggers a series of events; the interesting ones for us concern threads and global resources.
Global resources are tricky. They are constructed and destroyed by the C runtime. To oversimplify, let’s assume they are constructed before main is called and destroyed after all threads have signaled to the kernel that they have exited.
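A small sketch of the ordering rules the C++ runtime applies to globals. Logger, Cache and ConstructionLog are hypothetical names. Within one source file, globals are built in definition order before main and torn down in reverse order afterwards. Across source files the construction order is unspecified, which is one more reason not to let globals lean on each other.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the globals below are constructed. A
// function-local static is created on first call, so it is ready even
// when the first caller is itself a global constructor.
std::vector<std::string>& ConstructionLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical global services.
struct Logger {
    Logger() { ConstructionLog().push_back("Logger"); }
};

struct Cache {
    Cache() {
        // Cache relies on Logger already existing. That only holds because
        // g_logger is defined earlier in this same file.
        assert(!ConstructionLog().empty() && ConstructionLog().front() == "Logger");
        ConstructionLog().push_back("Cache");
    }
};

// Constructed in this order before main runs. Destroyed in reverse order
// after main returns (or exit() is called): g_cache first, then g_logger,
// so Cache could still use g_logger while it is being torn down.
Logger g_logger;
Cache g_cache;
```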
See the gotcha here? During those roughly 5 seconds of cleanup, it’s the application’s responsibility to remove every dependency that global objects have on threads. Even with heavy use of reference-counted pointers, things can get out of hand, for example when global objects keep other objects alive for too long by holding references to them. Destroying those objects abruptly can then trigger an access violation. For example:
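A minimal sketch of the "keeps others alive for too long" situation and one way around it. Session, SessionRegistry and g_sessions are hypothetical names. The global registry holds shared_ptrs, so nothing it references dies until it lets go. The remedy shown here is to let go explicitly during the shutdown window instead of leaving it to static destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-client object. In a real server it might own a socket
// or be serviced by a worker thread.
struct Session {
    explicit Session(int sessionId) : id(sessionId) {}
    int id;
};

// Hypothetical global registry. Because it stores shared_ptrs, every Session
// it knows about stays alive until the registry lets go. If that only
// happens in ~SessionRegistry(), it happens during static destruction, long
// after the threads that used those sessions are gone.
class SessionRegistry {
public:
    void Add(std::shared_ptr<Session> session) {
        assert(session != nullptr);
        std::lock_guard<std::mutex> lock(m_);
        sessions_.push_back(std::move(session));
    }

    // Call during graceful shutdown, while the app still controls its
    // threads, so Session destructors run at a predictable time.
    void ReleaseAll() {
        std::vector<std::shared_ptr<Session>> released;
        {
            std::lock_guard<std::mutex> lock(m_);
            released.swap(sessions_);
        }
        // The released vector is destroyed here, outside the lock, so a
        // Session destructor that calls back into the registry cannot deadlock.
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> lock(m_);
        return sessions_.size();
    }

private:
    std::mutex m_;
    std::vector<std::shared_ptr<Session>> sessions_;
};

SessionRegistry g_sessions;  // hypothetical global instance
```

ReleaseAll() would be called from the main thread after the worker threads have been told to stop and joined, well inside the cleanup window.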
To keep things safe, the model below makes sense for deciding which objects may hold references to which:
In short, bigger rectangles cannot assume smaller ones are alive, especially in their destructors.
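One way to encode "bigger rectangles cannot assume smaller ones are alive" in C++, assuming shared_ptr ownership: the longer-lived object keeps only std::weak_ptr references to shorter-lived ones and checks liveness with lock() every time, including in its destructor. Worker and WorkerDirectory are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object owned by a worker thread (a "smaller rectangle").
struct Worker {
    explicit Worker(std::string workerName) : name(std::move(workerName)) {}
    std::string name;
};

// Hypothetical long-lived directory (a "bigger rectangle"). It observes
// workers through weak_ptrs, so it never extends their lifetime and never
// touches one that has already been destroyed.
class WorkerDirectory {
public:
    void Register(const std::shared_ptr<Worker>& worker) {
        assert(worker != nullptr);
        std::lock_guard<std::mutex> lock(m_);
        workers_.push_back(worker);
    }

    // Names of the workers that are still alive right now.
    std::vector<std::string> LiveNames() {
        std::lock_guard<std::mutex> lock(m_);
        std::vector<std::string> names;
        for (const auto& weak : workers_) {
            if (auto worker = weak.lock()) {
                names.push_back(worker->name);
            }
        }
        return names;
    }

    ~WorkerDirectory() {
        // Even here, only touch workers that lock() proves are alive.
        for (const auto& weak : workers_) {
            if (auto worker = weak.lock()) {
                std::fprintf(stderr, "worker %s is still alive at teardown\n",
                             worker->name.c_str());
            }
        }
    }

private:
    std::mutex m_;
    std::vector<std::weak_ptr<Worker>> workers_;
};
```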
Just don’t bother with TLS
I am not a TLS fan, and I might go into more detail on that in another post, but in this last section I would like to raise a special warning about why TLS (thread-local storage) is evil in this context.
The standard way to declare TLS variables in C++ is the thread_local keyword (shudder). For example:
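A minimal sketch of a thread_local object whose destructor depends on a global, which is the coupling that makes TLS painful at shutdown. ThreadCache, t_cache, g_cachesDestroyed and the helper functions are hypothetical names. One caveat I believe makes this worse on Windows: ExitProcess() terminates the other threads without a DLL_THREAD_DETACH notification, so, as far as I know, the thread_local destructors of threads still running at that point never get to run.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A global that the thread_local destructor below depends on.
std::atomic<int> g_cachesDestroyed{0};

// Hypothetical per-thread cache.
struct ThreadCache {
    int hits = 0;
    ~ThreadCache() { ++g_cachesDestroyed; }  // reaches out to a global
};

// One ThreadCache per thread, created the first time that thread uses it
// and destroyed when that thread exits normally.
thread_local ThreadCache t_cache;

int RecordHits(int n) {
    assert(n >= 0);
    t_cache.hits += n;
    return t_cache.hits;
}

// Runs RecordHits on a brand-new thread and waits for that thread to exit,
// which also runs that thread's ThreadCache destructor.
int RecordHitsOnNewThread(int n) {
    int result = 0;
    std::thread worker([&result, n] { result = RecordHits(n); });
    worker.join();
    return result;
}
```

Each thread sees its own counter, and each normal thread exit bumps the global, so the TLS object and the global now have to be reasoned about together.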
The problem with TLS is mostly the added complexity. On one hand, lifetime management becomes even more complicated: where there’s TLS, there are usually global objects too, and their lifetimes have to be managed carefully against each other. On the other hand, maintaining TLS code is no fun. The learning curve is steep, and I have rarely seen anyone change a TLS-heavy part of the code bug-free on the first try. In my experience, a mix of thread pools and object pools can replace TLS in most cases.
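A sketch of the object-pool half of that idea, assuming the per-thread state is something like a reusable scratch buffer. Scratch, ScratchPool, Returner and Handle are hypothetical names. Instead of each thread owning a hidden thread_local copy, a task borrows an object for its duration and hands it back automatically, so every object is owned by the pool and dies at a point the application chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical scratch buffer that might otherwise have been thread_local.
struct Scratch {
    std::vector<char> bytes = std::vector<char>(4096);
};

// Minimal object pool. The pool must outlive every handle it hands out.
class ScratchPool {
public:
    // Deleter that gives the object back to the pool instead of freeing it.
    struct Returner {
        ScratchPool* pool;
        void operator()(Scratch* scratch) const { pool->Return(scratch); }
    };
    using Handle = std::unique_ptr<Scratch, Returner>;

    Handle Borrow() {
        std::lock_guard<std::mutex> lock(m_);
        Scratch* scratch = nullptr;
        if (idle_.empty()) {
            owned_.push_back(std::make_unique<Scratch>());
            scratch = owned_.back().get();
        } else {
            scratch = idle_.back();
            idle_.pop_back();
        }
        return Handle(scratch, Returner{this});
    }

    std::size_t Created() {
        std::lock_guard<std::mutex> lock(m_);
        return owned_.size();
    }

    std::size_t Idle() {
        std::lock_guard<std::mutex> lock(m_);
        return idle_.size();
    }

private:
    void Return(Scratch* scratch) {
        assert(scratch != nullptr);
        std::lock_guard<std::mutex> lock(m_);
        idle_.push_back(scratch);
    }

    std::mutex m_;
    std::vector<std::unique_ptr<Scratch>> owned_;  // owns every Scratch ever created
    std::vector<Scratch*> idle_;                   // the ones not currently borrowed
};
```

In a thread-pool setup, each task would call Borrow() at its start instead of reaching for a thread_local, and the pool would be destroyed only after the thread pool has been joined, which keeps the lifetimes nested the way the rectangle model above suggests.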