In May 2018, after the XSS was fixed, I realised that Google Sites was using an unpatched version of Google Caja, so I checked whether it was vulnerable to the XSS. However, the XSS wasn’t exploitable there.
After a few tests, I realised that the Google Sites Caja server would only fetch Google-owned resources like https://www.google.com or https://www.gstatic.com, but not any external resource like https://www.facebook.com.
That’s strange behavior, because this functionality is meant to fetch external resources, so it looked like a broken feature. More interestingly, it is hard to determine whether an arbitrary URL belongs to Google or not, given the breadth of Google services. Unless…
Finding an SSRF in Google
Whenever I find an endpoint that fetches arbitrary content server-side, I always test for SSRF. I had done it a hundred times on Google services but never had any luck. Anyway, the only explanation for the weird behavior of the Google Caja server was that the fetching was happening on the internal Google network, which is why it could fetch Google-owned resources but not external ones. I already knew this was a bug; now the question was whether it was a security bug!
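To make the SSRF idea concrete, here is a minimal sketch of the kind of check a server-side fetcher can apply before requesting a user-supplied URL. It is only an illustration, not Google's code: the function name `is_internal_url` and the example addresses are my own assumptions, and a real defence would also have to resolve hostnames and re-check the resulting IPs (and handle redirects).

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_url(url: str) -> bool:
    """Return True if the URL's host is an IP literal in a non-public range.

    Hypothetical helper for illustration. Hostnames are NOT resolved here,
    so this alone is not a complete SSRF defence.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. "www.facebook.com"); a real check would
        # resolve the name and test every returned address.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


if __name__ == "__main__":
    # 10.0.0.1 is an arbitrary example address in the 10.0.0.0/8 private range.
    for candidate in ("http://10.0.0.1/", "http://8.8.8.8/", "https://www.facebook.com"):
        print(candidate, is_internal_url(candidate))
```

A fetcher that skips this kind of check and runs inside a private network will happily request internal addresses on behalf of an outside user, which is exactly what makes SSRF interesting.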
I opened the file and indeed it was full of private information from Google! \o/
Google, from the inside
First, I want to say that I didn’t scan Google’s internal network. I only made 3 requests in the network to confirm the vulnerability and immediately sent a report to Google VRP. It took Google 48 hours to fix the issue (I reported it on a Saturday), so in the meantime I couldn’t help but try 2-3 more requests to pivot the SSRF vulnerability into unrestricted file access or RCE, but without luck.
The first request was to http://10.x.x.201/. It responded with a server status monitoring page of a “Borglet”. After a Google search, I could confirm that I was indeed inside Borg, Google’s internal large-scale cluster management system (here is an overview of the architecture). Google open-sourced Borg’s successor, Kubernetes, in 2014. It seems that while Kubernetes is getting more and more popular, Google is still relying on Borg for its internal production infrastructure, but I can tell you it’s not because of the design of Borg interfaces! (edit: this is intended as a joke 😛 )
The second request was to http://10.x.x.1/ and it was also a monitoring page for another Borglet. The third request was to http://10.x.x.1/getstatus, a different status monitoring page of a Borglet with more details on the jobs, such as permissions and arguments.
Each Borglet represents a machine, a server.
On the hardware side, both servers were using Haswell CPUs @2.30GHz with 72 cores, which corresponds to a set of 2 or 3 Xeon E5 v3. CPU usage on both servers was at 77%. They had 250GB of RAM, 70% of which was in use. They had 1 HDD each with 2TB and no SSD. The HDDs were almost empty, with only 15GB used, so the data is stored elsewhere.
The processing jobs (alloc and tasks) are very diverse. I believe this optimizes resource usage, with some jobs using memory, others using CPU or network, some with high priority, etc. Some services seem very active: Video encoding, Gmail and Ads. That should not be surprising since video processing is very heavy, Gmail is one of the main Google services and Ads is, well, Google’s core business. 😉
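As a quick sanity check on those figures, the percentages can be turned into absolute numbers. This is just back-of-the-envelope arithmetic on the values quoted above; the variable names are mine, and I am assuming 2TB means 2000GB.

```python
# Figures quoted from the Borglet status pages.
cores = 72
cpu_usage = 0.77        # 77% CPU usage
ram_gb = 250
ram_usage = 0.70        # 70% of RAM in use
disk_gb = 2000          # 2TB HDD, assuming 1TB = 1000GB
disk_used_gb = 15

busy_cores = cores * cpu_usage                 # core-equivalents kept busy
ram_used_gb = ram_gb * ram_usage               # RAM in use, in GB
disk_used_pct = disk_used_gb / disk_gb * 100   # share of the disk in use

print(f"~{busy_cores:.0f} of {cores} cores busy")
print(f"~{ram_used_gb:.0f}GB of {ram_gb}GB RAM in use")
print(f"{disk_used_pct:.2f}% of the disk in use")
```

In other words, CPU and memory are heavily loaded while the local disk holds less than 1% of its capacity, which fits the idea that the data lives elsewhere.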
I didn’t see Google Sites or Caja in the jobs list, so either the SSRF was going through a proxy or the Borglet on 10.x.x.201 was from a different network than the 10.x.x.201 IP I saw in my Google App Engine instance logs.
Regarding the architecture, we can find jobs related to almost all of the components of the Google Stack, in particular MapReduce, BigTable, Flume, GFS… On the technology side, Java seems to be heavily used. I didn’t see any mention of Python, C++, NodeJS or Go, but that doesn’t mean they weren’t used, so don’t draw conclusions. 😛
I should mention that Borg, like Kubernetes, relies on containers like Docker, and on VMs. For video processing, it seems they are using gVisor, a Google open-source tool that looks like a trade-off between container performance and VM security.