I was explaining on Hacker News why Windows fell behind Linux in terms of operating system kernel performance and innovation. And out of nowhere an anonymous Microsoft developer who contributes to the Windows NT kernel wrote a fantastic and honest response acknowledging this problem and explaining its cause. His post has been deleted! Why the censorship? I am reposting it here. This is too insightful to be lost. [Edit: The anonymous poster himself deleted his post as he thought it was too cruel and did not help make his point, which is about the social dynamics of spontaneous contribution. However he let me know he does not mind the repost at the condition I redact the SHA1 hash info, which I did.] [Edit: A second statement, apologetic, has been made by the anonymous person. See update at the bottom.]
I'm a developer in Windows and contribute to the NT kernel. (Proof: the SHA1 hash of revision #102 of [Edit: filename redacted] is [Edit: hash redacted].) I'm posting through Tor for obvious reasons.
Windows is indeed slower than other operating systems in many scenarios, and the gap is worsening. The cause of the problem is social. There's almost none of the improvement for its own sake, for the sake of glory, that you see in the Linux world.
Granted, occasionally one sees naive people try to make things better. These people almost always fail. We can and do improve performance for specific scenarios that people with the ability to allocate resources believe impact business goals, but this work is Sisyphean. There's no formal or informal program of systemic performance improvement. We started caring about security because pre-SP3 Windows XP was an existential threat to the business. Our low performance is not an existential threat to the business.
See, component owners are generally openly hostile to outside patches: if you're a dev, accepting an outside patch makes your lead angry (due to the need to maintain this patch and to justify in in shiproom the unplanned design change), makes test angry (because test is on the hook for making sure the change doesn't break anything, and you just made work for them), and PM is angry (due to the schedule implications of code churn). There's just no incentive to accept changes from outside your own team. You can always find a reason to say "no", and you have very little incentive to say "yes".
There's also little incentive to create changes in the first place. On linux-kernel, if you improve the performance of directory traversal by a consistent 5%, you're praised and thanked. Here, if you do that and you're not on the object manager team, then even if you do get your code past the Ob owners and into the tree, your own management doesn't care. Yes, making a massive improvement will get you noticed by senior people and could be a boon for your career, but the improvement has to be very large to attract that kind of attention. Incremental improvements just annoy people and are, at best, neutral for your career. If you're unlucky and you tell your lead about how you improved performance of some other component on the system, he'll just ask you whether you can accelerate your bug glide.
Is it any wonder that people stop trying to do unplanned work after a little while?
Another reason for the quality gap is that that we've been having trouble keeping talented people. Google and other large Seattle-area companies keep poaching our best, most experienced developers, and we hire youths straight from college to replace them. You find SDEs and SDE IIs maintaining hugely import systems. These developers mean well and are usually adequately intelligent, but they don't understand why certain decisions were made, don't have a thorough understanding of the intricate details of how their systems work, and most importantly, don't want to change anything that already works.
These junior developers also have a tendency to make improvements to the system by implementing brand-new features instead of improving old ones. Look at recent Microsoft releases: we don't fix old features, but accrete new ones. New features help much more at review time than improvements to old ones.
(That's literally the explanation for PowerShell. Many of us wanted to improve cmd.exe, but couldn't.)
- We can't touch named pipes. Let's add %INTERNAL_NOTIFICATION_SYSTEM%! And let's make it inconsistent with virtually every other named NT primitive.
- We can't expose %INTERNAL_NOTIFICATION_SYSTEM% to the rest of the world because we don't want to fill out paperwork and we're not losing sales because we only have 1990s-era Win32 APIs available publicly.
- We can't touch DCOM. So we create another %C#_REMOTING_FLAVOR_OF_THE_WEEK%!
- XNA. Need I say more?
- Why would anyone need an archive format that supports files larger than 2GB?
- Let's support symbolic links, but make sure that nobody can use them so we don't get blamed for security vulnerabilities (Great! Now we get to look sage and responsible!)
- We can't touch Source Depot, so let's hack together SDX!
- We can't touch SDX, so let's pretend for four releases that we're moving to TFS while not actually changing anything!
- Oh god, the NTFS code is a purple opium-fueled Victorian horror novel that uses global recursive locks and SEH for flow control. Let's write ReFs instead. (And hey, let's start by copying and pasting the NTFS source code and removing half the features! Then let's add checksums, because checksums are cool, right, and now with checksums we're just as good as ZFS? Right? And who needs quotas anyway?)
- We just can't be fucked to implement C11 support, and variadic templates were just too hard to implement in a year. (But ohmygosh we turned "^" into a reference-counted pointer operator. Oh, and what's a reference cycle?)
Look: Microsoft still has some old-fashioned hardcore talented developers who can code circles around brogrammers down in the valley. These people have a keen appreciation of the complexities of operating system development and an eye for good, clean design. The NT kernel is still much better than Linux in some ways --- you guys be trippin' with your overcommit-by-default MM nonsense --- but our good people keep retiring or moving to other large technology companies, and there are few new people achieving the level of technical virtuosity needed to replace the people who leave. We fill headcount with nine-to-five-with-kids types, desperate-to-please H1Bs, and Google rejects. We occasionally get good people anyway, as if by mistake, but not enough. Is it any wonder we're falling behind? The rot has already set in.
Edit: This anonymous poster contacted me, still anonymously, to make a second statement, worried by the attention his words are getting:
All this has gotten out of control. I was much too harsh, and I didn't intend this as some kind of massive exposé. This is just grumbling. I didn't appreciate the appetite people outside Microsoft have for Kremlinology. I should have thought through my post much more thoroughly. I want to apologize for presenting a misleading impression of what it's like on the inside.
First, I want to clarify that much of what I wrote is tongue-in-cheek and over the top --- NTFS does use SEH internally, but the filesystem is very solid and well tested. The people who maintain it are some of the most talented and experienced I know. (Granted, I think they maintain ugly code, but ugly code can back good, reliable components, and ugliness is inherently subjective.) The same goes for our other core components. Yes, there are some components that I feel could benefit from more experienced maintenance, but we're not talking about letting monkeys run the place. (Besides: you guys have systemd, which if I'm going to treat it the same way I treated NTFS, is an all-devouring octopus monster about crawl out of the sea and eat Tokyo and spit it out as a giant binary logfile.)
In particular, I don't have special insider numbers on poaching, and what I wrote is a subjective assessment written from a very limited point of view --- I watched some very dear friends leave and I haven't been impressed with new hires, but I am *not* HR. I don't have global facts and figures. I may very well be wrong on overall personnel flow rates, and I shouldn't have made the comment I did: I stated it with far more authority than my information merits.
Windows and Microsoft still have plenty of technical talent. We do not ship code that someone doesn't maintain and understand, even if it takes a little while for new people to ramp up sometimes. While I have read and write access to the Windows source and commit to it once in a while, so do tens and tens of thousands of other people all over the world. I am nobody special. I am not Deep Throat. I'm not even Steve Yegge. I'm not the Windows equivalent of Ingo Molnar. While I personally think the default restrictions placed on symlinks limited their usefulness, there *was* a reasoned engineering analysis --- it wasn't one guy with an ulterior motive trying to avoid a bad review score. In fact, that practically never happens, at least consciously. We almost never make decisions individually, and while I maintain that social dynamics discourage risk-taking and spontaneous individual collaboration, I want to stress that we are not insane and we are not dysfunctional. The social forces I mentioned act as a drag on innovation, and I think we should do something about the aspects of our culture that I highlighted, but we're far from crippled. The negative effects are more like those incurred by mounting an unnecessary spoiler on a car than tearing out the engine block. What's indisputable fact is that our engineering division regularly runs and releases dependable, useful software that runs all over the world. No matter what you think of the Windows 8 UI, the system underneath is rock-solid, as was Windows 7, and I'm proud of having been a small part of this entire process.
I also want to apologize for what I said about devdiv. Look: I might disagree with the priorities of our compiler team, and I might be mystified by why certain C++ features took longer to implement for us than for the competition, but seriously good people work on the compiler. Of course they know what reference cycles are. We're one of the only organizations on earth that's built an impressive optimizing compiler from scratch, for crap's sake.
Last, I'm here because I've met good people and feel like I'm part of something special. I wouldn't be here if I thought Windows was an engineering nightmare. Everyone has problems, but people outside the company seem to infuse ours with special significance. I don't get that. In any case, I feel like my first post does wrong by people who are very dedicated and who work quite hard. They don't deserve the broad and ugly brush I used to paint them.
P.S. I have no problem with family people, and want to retract the offhand comment I made about them. I work with many awesome colleagues who happen to have children at home. What I really meant to say is that I don't like people who see what we do as more of a job than a passion, and it feels like we have a lot of these people these days. Maybe everyone does, though, or maybe I'm just completely wrong.