Ephemeral Servers on Hetzner
I use servers on Hetzner Cloud to do most of my client development. A good server type for this use case, CPX51, costs €54.90/mo (at time of writing). While this is pretty affordable, the server sits idle most of the time, wasting money. This applies even if the server is powered off; it has to be deleted to stop the billing. I developed a solution that boots from a detachable volume, which allowed me to reduce costs to €20.68/mo, a 62% reduction! [1]
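The arithmetic behind that figure can be sketched as follows. The prices are the ones quoted in this post, and the 160 hours of use per month is an assumption taken from the footnote; the break-even helper is an illustrative addition and assumes simple linear hourly billing.

```python
# Prices quoted in this post (EUR, at time of writing); treat them as assumptions.
ALWAYS_ON_MONTHLY = 54.90   # CPX51 kept for the whole month
VOLUME_MONTHLY = 6.60       # 150 GB cloud volume, billed whether or not a server exists
HOURLY_RATE = 0.088         # CPX51 hourly price
HOURS_USED = 160            # assumed hours of use per month


def ephemeral_monthly_cost(hours_used: float) -> float:
    """The volume is always billed; the server only for the hours it exists."""
    return VOLUME_MONTHLY + HOURLY_RATE * hours_used


def break_even_hours() -> float:
    """Hours per month at which the ephemeral setup costs as much as always-on."""
    return (ALWAYS_ON_MONTHLY - VOLUME_MONTHLY) / HOURLY_RATE


cost = ephemeral_monthly_cost(HOURS_USED)
saving = 1 - cost / ALWAYS_ON_MONTHLY
print(f"{cost:.2f} EUR/mo, {saving:.0%} cheaper, break-even at {break_even_hours():.0f} h")
```

The saving shrinks as usage grows, so the approach pays off mainly when the server really is unused for most of the month.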
My goals for this project are:
- A server that boots from a Hetzner Volume and can be deleted when not in use
- A daemon, running on the server, that automatically deletes the server when it detects that the server is idle
Booting from a Cloud Volume
Aside: Ideas I tried that didn't work
I went through several iterations of this before arriving at the current solution. Here are some of the ideas that I tried, and why they didn’t work for me.
- Configuring the bootloader to just boot from the detachable volume. For some reason, GRUB is unable to detect the detachable volume. I was unable to work around this, and Hetzner support declined to offer any assistance.
- Create a minimal Linux image that can be used to kexec into the detachable volume’s kernel. This actually worked, but I found that creating a server from a custom image is dramatically slower than using a stock image (from a few seconds for a stock image to multiple minutes for a custom image).
- Using rescue mode and a script to kexec into the detachable volume’s kernel. This also worked, but it requires a complex script to boot the server (scripting SSH commands) and it was quite a bit slower than a stock image boot.
The first challenge is just booting from a cloud volume. I ended up creating a cloud-init script that does the kexec. So, the boot process looks like this:
- On my laptop, I run hcloud server create, passing a script in the user data that performs the kexec.
- The stock image boots and runs the user data script, which performs a kexec.
- The volume boots, and reformats the server’s fixed disk for use as ephemeral storage.
In order to build this, we need to create a volume to boot from. This process is actually quite simple: we just clone a standard Hetzner image onto a detachable volume, then make a few modifications.
Once you’re in rescue mode, there’s a lengthy procedure to follow. You can see the full script by running hetzner-bootable-volume show-prepare-script VOLUME_ID, and I will outline the steps here (line numbers refer to the reference script):
- (Lines 253-268) Mount the stock image’s root partition and chroot into it. Note that the chroot command actually applies to the “current shell”, instead of starting a subprocess. This wouldn’t work in a typical shell script, but works here because we are running this script directly from stdin. We install kexec-tools, then exit the chroot and unmount the filesystem.
- (Lines 270-276) Set up the partitions on the detachable volume. The entire drive contains a single partition, which is the root. We use cp to clone the filesystem from the stock image to the detachable volume, then randomize its permanent ID and resize it to fill the new disk size.
- (Lines 292-301) Create a script /boot/hbv-kexec that does the actual kexec. We store this script on the bootable volume in case it needs to be customized in the future.
- (Lines 303-323) Create a script /usr/local/sbin/hbv-ephemeral-drive that reformats the fixed drive. We run this script on every boot. The fixed drive will contain a swap partition, and a usable “ephemeral” partition.
- (Lines 325-335) Create the hbv-ephemeral-drive systemd unit, which runs the script we just added.
- (Lines 337-343) Prepare the new filesystem for initial boot, by telling cloud-init that it has never run before, and enabling the systemd unit we created.
- (Lines 345-367) Perform the initial boot into the new filesystem.
After the initial boot has completed, we perform one last task (lines 190-243): disabling most of cloud-init. Most of what cloud-init does (like running commands on “first” boot) doesn’t make sense when the volume is used on multiple servers.
Once that whole process is completed, the script to create a server from the volume is pretty simple:
The user data necessary to do the kexec is also simple:
The grub-editenv line is necessary to fix an issue with rebooting the detachable volume. Without it, GRUB thinks that the last boot failed, and so it will drop into an interactive menu for you to repair the system. The other lines simply set up a chroot, then execute the hbv-kexec script we put on the detachable volume.
Remember that we disabled user data on the detachable volume, so this doesn’t conflict or cause boot loops.
When you are done using your server, you can use hcloud server delete to delete it.
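As a rough sketch of the create and delete calls (not the author's actual script), the two command lines could be assembled like this. The server name, volume name, user data file name, image, and location are placeholders chosen for illustration; the flags are standard hcloud server create options.

```python
import shlex


def create_command(name, volume, user_data_file,
                   server_type="cpx51", image="debian-12", location="fsn1"):
    """Build an `hcloud server create` argv that attaches the boot volume."""
    return [
        "hcloud", "server", "create",
        "--name", name,
        "--type", server_type,
        "--image", image,          # placeholder stock image; it only runs until the kexec
        "--location", location,    # placeholder; must match the volume's location
        "--volume", volume,        # the bootable volume prepared earlier
        "--user-data-from-file", user_data_file,
    ]


def delete_command(name):
    """Build the matching `hcloud server delete` argv."""
    return ["hcloud", "server", "delete", name]


# Hypothetical names, printed rather than executed.
print(shlex.join(create_command("dev-client-a", "dev-client-a-root", "kexec-user-data.sh")))
print(shlex.join(delete_command("dev-client-a")))
```

Passing the resulting list to subprocess.run would execute it; printing first makes it easy to check the command before any billing starts.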
Automatically shutting down idle servers
The next goal is to build a daemon that will detect when the system is not in use and shut it down. This is done in two parts:
- A script that monitors the system and runs poweroff when it is idle for long enough.
- A script that calls hcloud server delete on itself when the system is shutting down.
Before we can do this, we need to define what “idle” means. There isn’t a mouse to jiggle or a screen saver like on a laptop, but instead we can list out a few signals and monitor them:
- Logged in users over SSH. This can be monitored by looking at the access time (atime) on all active PTS devices (/dev/pts/*).
- Active GUI sessions (I wrote support for this, but never use graphical sessions). This is done by running xprintidle for each user that has an X server running.
- A special marker file to signal that the server should be kept on. This allows you to run hbv-auto-shutdown caffeinate command and leave for the weekend; the server will just shut down after command finishes.
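A minimal sketch of the first and third signals, not the actual hbv-auto-shutdown code. The function names, the 30-minute limit, and the marker path supplied by the caller are assumptions made for illustration.

```python
import glob
import os

IDLE_LIMIT = 30 * 60  # assumed idle threshold, in seconds


def last_activity(pts_glob: str, floor: float) -> float:
    """Newest access time among the terminals, but never older than `floor`."""
    latest = floor
    for path in glob.glob(pts_glob):
        if os.path.basename(path) == "ptmx":
            continue  # the multiplexer device is not a user session
        try:
            latest = max(latest, os.stat(path).st_atime)
        except OSError:
            continue  # the terminal closed between glob and stat
    return latest


def should_power_off(last_active: float, now: float, marker_path: str,
                     limit: float = IDLE_LIMIT) -> bool:
    """Idle for long enough, and no marker file asking to stay awake."""
    if os.path.exists(marker_path):
        return False
    return now - last_active >= limit
```

A polling loop could call last_activity("/dev/pts/*", floor=start_time) with the daemon's own start time as the floor, so that a freshly booted server with nobody logged in yet still gets a full idle window before poweroff.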
The result of this is hbv-auto-shutdown. Install that script to /usr/local/bin/, then add a systemd unit like this:
Activating this daemon will automatically shut down the system when it goes idle, but this will leave the server in Hetzner and it will still be billed normally. To actually delete the server, we use another simple script:
This script first checks that we are actually powering off the server (as a safety measure), and then simply calls hcloud server delete. The systemd unit that we use merits some explanation:
This is a bit backward. We actually bring this unit up early in the boot, immediately after the network is available (1). Since there is no ExecStart but RemainAfterExit is set, the service is marked started without doing anything else. When the service comes down, which is after network.target, user.slice, and machine.slice, it invokes hbv-self-destruct (2).
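The safety check in the self-destruct script could look roughly like the sketch below. How the real script detects a poweroff is not shown in this post; this version assumes it inspects the output of systemctl list-jobs for a queued start job on poweroff.target, and the function names are hypothetical.

```python
def is_powering_off(list_jobs_output: str) -> bool:
    """True if a start job for poweroff.target is queued.

    Expects the text printed by `systemctl list-jobs`, where each job line
    has the columns: job id, unit, type, state.
    """
    for line in list_jobs_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "poweroff.target" and fields[2] == "start":
            return True
    return False


def self_destruct_command(list_jobs_output: str, server: str):
    """Return the delete argv, or None for a reboot or a manual service stop."""
    if not is_powering_off(list_jobs_output):
        return None
    return ["hcloud", "server", "delete", server]
```

Returning None for anything other than a poweroff means a plain reboot, or someone stopping the unit by hand, leaves the server in place.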
The last thing to do is provide your server with an API key to use for hcloud:
Activate these with systemctl enable --now hbv-auto-shutdown and systemctl enable --now hbv-self-destruct.
Improving performance with RAID1
Benchmarks suggest that Hetzner volumes are able to sustain about 300 MiB/s of throughput, while local disks can achieve 700 MiB/s or more. By leveraging RAID1, we can use the server’s fixed drive as a RAID mirror, which improves the read throughput to the same level as a local disk (the write throughput will remain limited).
This is actually very simple to do: every boot, we reformat the local drive, then configure a RAID array with the two drives, where the detachable volume is “write mostly”, leaving the other to be used as a fast cache. The key commands are:
The second command treats the added drive as a blank slate, so Linux will immediately begin mirroring the main drive onto it, and automatically using it for faster reads once that finishes. If you want, you can check the progress:
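For scripted use, the same progress information can be read from /proc/mdstat. This is a small sketch with hypothetical function names; it assumes the usual "recovery = NN.N%" (or "resync = NN.N%") progress line that the kernel prints while a mirror is rebuilding.

```python
import re

# Matches the progress line shown in /proc/mdstat during a rebuild.
_PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([0-9.]+)%")


def sync_progress(mdstat_text: str):
    """Return (kind, percent) for the first array that is syncing, else None."""
    match = _PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def read_sync_progress(path: str = "/proc/mdstat"):
    """Read the live status file and report rebuild progress, if any."""
    with open(path) as handle:
        return sync_progress(handle.read())
```

A result of None means no rebuild is in progress, which is the point at which reads can be served from the local drive.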
Using this RAID1 setup on the root partition is a little bit more complicated; we need to modify the initramfs to run these key commands. You can use the enable-raid1 script to do the whole process.
Conclusion
I’ve been using this iteration of hetzner-bootable-volume for 2 months now, and earlier versions of this concept for over a year. It allows me to have a completely separate development environment for each of my clients, which is especially useful when they have complex VPN requirements. By doing my development on a server in a datacenter, my docker pulls and other bandwidth-heavy activities are always fast, even when I am working over a mobile hotspot.
Footnotes
[1] €6.60 (150 GB cloud volume) + €0.088/hr (CPX51) * 160 hr/mo = €20.68/mo. Current prices can be found at Hetzner Pricing.