Personal websites are weird. We are mostly past the era of having them, as things like Twitter and hosted blog services like Medium have taken over, but I’m a holdout. I run my own blog and also have a landing page on the apex domain, since the server that hosts the blog (Google App Engine + Cloudflare) is different from the server that hosts the other content on my domain (the NAS at home I am currently resting my feet on while writing this).
Regardless, I want to have some level of introduction and link all of the other “social” presences elsewhere on the web, so I have this:
Except this isn’t a normal landing page! Every time you go to it, it will pick a random OS image and boot it in front of your eyes! I wrote the code for it in 2014 as a remix of what a friend made, and it has since served roughly 40,000 sessions.
At the same time, I also set up a recorder, so I have a recording of every session that has taken place on the site. Most sessions are boring and end fairly quickly. Some people quickly realise what’s happening and have a nostalgia tour around an operating system they once used, and others spend the next 15 minutes playing Solitaire on Windows™ 98.
The way this system works is that every time you load the site, it dials a WebSocket to another server of mine, which picks a VM image, launches it with QEMU, and then hands control of the VM’s VNC socket over to the WebSocket. The browser then uses noVNC to display the VNC session.
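To make that flow concrete, here is a minimal sketch of the server side in Python. It is not the site's actual code: the image names, the `qemu-system-i386` binary, the `-hda`/`-snapshot` options, and all function names are assumptions for illustration, and the WebSocket framing is left out (the client is treated as a plain byte stream).

```python
import asyncio
import os
import random

# Hypothetical image names; the post does not list the real ones.
IMAGES = ["win98.qcow2", "win31.qcow2", "os2warp.qcow2"]


def pick_image(images, rng=random):
    """Pick one VM image at random for this session."""
    return rng.choice(images)


def qemu_args(image, vnc_socket_path):
    """Build a QEMU command line that serves VNC on a Unix socket.

    The binary name and the -hda/-snapshot choices are assumptions.
    No monitor option is set here, which matches the setup described
    at this point in the post.
    """
    return [
        "qemu-system-i386",
        "-hda", image,
        "-snapshot",  # discard guest disk writes when the session ends
        "-vnc", f"unix:{vnc_socket_path}",
    ]


async def pump(reader, writer):
    """Copy bytes from reader to writer until the source closes."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle_session(client_reader, client_writer, workdir):
    """Boot a random image and relay the client to its VNC socket."""
    image = pick_image(IMAGES)
    vnc_path = os.path.join(workdir, "vnc.sock")
    proc = await asyncio.create_subprocess_exec(*qemu_args(image, vnc_path))
    try:
        # Wait for QEMU to create the VNC socket, or bail out if it died.
        while not os.path.exists(vnc_path):
            if proc.returncode is not None:
                raise RuntimeError("QEMU exited before opening its VNC socket")
            await asyncio.sleep(0.1)
        vnc_reader, vnc_writer = await asyncio.open_unix_connection(vnc_path)
        await asyncio.gather(
            pump(client_reader, vnc_writer),
            pump(vnc_reader, client_writer),
        )
    finally:
        if proc.returncode is None:
            proc.terminate()
```

The relay is deliberately dumb: once the VNC socket is connected, the server only copies bytes in both directions, so everything the guest exposes over VNC is reachable from the browser.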
Since I realised I was essentially running untrusted code, I took a few precautions:
- Recordings are uploaded to an external location after the tab is closed
- QEMU does not run as root
- The system is not used for anything else; nothing other than QEMU and the Debian base is installed
- The user it runs as does not have many privileges
- The QEMU version is kept up to date with security patches
This had all been running well. However, one morning when I was taking a screenshot of the site for a friend, I misfired on the keyboard combination and was instead greeted with this in the page:
Huh, I didn’t know you could launch the QEMU monitor console from a VNC session…
The monitor console is used to administer or debug the guest in case of any issues. It can also be used to add drives in real time.
I mentioned this in a chat room with a bunch of friends, and after a bit of playing around we realised how severe it was for this console to be accessible.
The initial worry was that you could mount .ssh/authorized_keys as a drive and overwrite the permitted SSH keys. However, this would have been somewhat mitigated because SSH is not publicly accessible, since the server is behind NAT. The real issue became clear when the live migration feature was discovered. Since the feature lets you run any Unix command to handle the migration data, it became an easy way to spawn a shell inside the system. While I looked for the code repository so I could fix it, and for the SSH keys to get into the system, I challenged a friend to give it a go.
As expected, it worked, and because of the recording we have a copy of what he did:
The script is pretty simple. It dials back to his server (since my server was behind NAT, it was more convenient for the shell to work in the opposite direction), spawns a shell, and connects the shell to the socket to allow remote access:
After this, I wondered if the site that I was originally inspired by had this bug too. But it didn’t. After discussing it with him, we discovered that the flag that prevents this from happening appears to be -nographic. I later discovered that this issue can also be prevented by setting a -monitor <somewhere> flag. Either way, it was a narrow miss for my friend too.
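One way to keep this from regressing is to check the QEMU command line before launching anything. The sketch below is an illustration rather than the fix I actually deployed: the function names and the example command line are hypothetical, and it uses `-monitor none`, which the QEMU documentation describes as the way to disable the default monitor.

```python
def monitor_is_pinned(args):
    """True if the argv sets -nographic or an explicit -monitor target.

    Without either flag, a graphical QEMU (including one served over
    VNC) exposes the monitor as a virtual console to whoever holds
    the keyboard.
    """
    return "-nographic" in args or "-monitor" in args


def with_monitor_disabled(args):
    """Return a copy of args with the monitor disabled if it was left at its default."""
    if monitor_is_pinned(args):
        return list(args)
    return list(args) + ["-monitor", "none"]


# Hypothetical command line for one session.
base = ["qemu-system-i386", "-hda", "win98.qcow2", "-vnc", ":1"]
safe = with_monitor_disabled(base)
assert monitor_is_pinned(safe)
```

Calling `with_monitor_disabled` on every command line just before launch means a new image or a refactored launcher cannot quietly fall back to the default monitor.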
Once again, the world of security is complex and surprise features can often be fatal. In this case, however, despite the hole that was discovered, I would like to reflect on a few things that prevented this from being much worse:
- sudo wasn’t set up at all
- There was no other service running on the system, so nothing else could have been compromised as a result (since the host system was set up to treat the VM system as hostile)
- The server was not publicly reachable; it was proxied behind nginx and NAT. This prevented some very simple attacks from happening
- Recordings were stored remotely, and always have been, meaning an audit can start to check whether anyone else discovered this on my site before me.
You can’t prevent yourself from having security incidents, but you can change how the incidents play out when they do happen.