NULL came from C. It interfered with type-safety (it depends on an implicit conversion from void* to typed pointers), so C++ introduced 0 as a better way to express null pointers. That led to problems of its own, because 0 isn't a pointer, it's an int. C++11 introduced nullptr, which embodies the idea of a null pointer better than NULL or 0. Yet NULL and 0-as-a-null-pointer remain valid. Why? If nullptr is better than both of them, why keep the inferior ways around? Backward-compatibility, that's why. Eliminating NULL and 0-as-a-null-pointer would break existing programs. In fact, it would probably break every egg in C++'s basket. Nevertheless, I'm suggesting we get rid of NULL and 0-as-a-null-pointer, thus eliminating the confusion and redundancy inherent in having three ways to say the same thing (two of which we discourage people from using). But read on. If I declare a variable of a built-in type and I don't provide an initializer, the variable is sometimes automatically set to zero (null for pointers). The rules for when "zero initialization" takes place are well defined, but they're a pain to remember. Why not just zero-initialize all built-in types that aren't explicitly initialized, thus eliminating not only the pain of remembering the rules, but also the suffering associated with debugging problems stemming from uninitialized variables? Because it can lead to unnecessary work at runtime. There's no reason to set a variable to zero if, for example, the first thing you do is pass it to a routine that assigns it a value.
So let's take a page out of D's book (in particular, page 30 of The D Programming Language) and zero-initialize built-ins by default, but specify that void as an initial value prevents initialization:
int x; // always zero-initialized int x = void; // never zero-initializedThe only effect such a language extension would have on existing code would be to change the initial value of some variables from indeterminate (in cases where they currently would not be zero-initialized) to specified (they would be zero-initialized). That doesn't lead to any backward-compatibility problems in the traditional sense, but I can assure you that some people will still object. Default zero initialization could lead to a few more instructions being executed at runtime (even taking into account compilers' ability to optimize away dead stores), and who wants to tell developers of a finely-tuned safety-critical realtime embedded system (e.g., a pacemaker) that their code might now execute some instructions they didn't plan on? I do. Break those eggs! This does not make me a crazy man. Keep reading. Ten standard containers offer a member function that eliminates all elements with a specified value (or, for map containers, a specified key): list, forward_list, set, multiset, map, multimap, unordered_set, unordered_multiset, unordered_map, unordered_multimap. In eight of these ten containers, the member function is named erase. In list and forward_list, it's named remove. This is inconsistent in two ways. First, different containers use different member function names to accomplish the same thing. Second, the meaning of "remove" as an algorithm is different from that as a container member function: the remove algorithm can't eliminate any container elements, but the remove member functions can. Why do we put up with this inconsistency? Because getting rid of it would break code. Adding a new erase member function to list and forward_list would be easy enough, and it would eliminate the first form of inconsistency, but getting rid of the remove member functions would render code calling them invalid. I say scramble those eggs! Hold your fire. I'm not done yet. C++11's override specifier enables derived classes to make explicit which functions are meant to override virtual functions inherited from base classes. Using override makes it possible for compilers to diagnose a host of overriding-relating errors, and it makes derived classes easier for programmers to understand. I cover this in my trademark scintillating fashion (ahem) in Item 12 of Effective Modern C++, but in a blog post such as this, it seems tacky to refer to something not available online for free, and that Item isn't available for free--at least not legally. So kindly allow me to refer you to this article as well as this StackOverflow entry for details on how using override improves your code.
Given the plusses that override brings to C++, why do we allow overriding functions to be declared without it? Making it possible for compilers to check for overriding errors is nice, but why not require that they do it? It's not like we make type checking optional, n'est-ce pas?You know where this is going. Requiring that overriding functions be declared override would cause umpty-gazillion lines of legacy C++ to stop compiling, even though all that code is perfectly correct. If it ain't broke, don't fix it, right? Wrong!, say I. Those old functions may work fine, but they aren't as clear to class maintainers as they could be, and they'll cause inconsistency in code bases as newer classes embrace the override lifestyle. I advocate cracking those eggs wide open.
Don't get me wrong. I'm on board with the importance of backward compatibility. Producing software that works is difficult and expensive, and changing it is time-consuming and error-prone. It can also be dangerous. There's a reason I mentioned pacemakers above: I've worked with companies who use C++ as part of pacemaker systems. Errors in that kind of code can kill people. If the Standardization Committee is going to make decisions that outlaw currently valid code (and that's what I'd like to see it do), it has to have a very good reason.
Or maybe not. Maybe a reason that's merely decent suffices as long as existing code can be brought into conformance with a revised C++ specification in a way that's automatic, fast, cheap, and reliable. If I have a magic wand that allows me to instantly and flawlessly take all code that uses NULL and 0 to specify null pointers and revises the code to use nullptr instead, where's the downside to getting rid of NULL and 0-as-a-null-pointer and revising C++ such that the only way to specify a null pointer is nullptr? Legacy code is easily updated (the magic wand works instantly and flawlessly), and we don't have to explain to new users why there are three ways to say the same thing, but they shouldn't use two of them. Similarly, why allow overriding functions without override if the magic wand can instantly and flawlessly add override to existing code that lacks it?
The eggs in C++ that I want to break are the old ways of doing things--the ones the community now acknowledges should be avoided. NULL and 0-as-a-null-pointer are eggs that should be broken. So should variables with implicit indeterminate values. list::remove and forward_list::remove need to go, as do overriding functions lacking override. The newer, better eggs are nullptr, variables with indeterminate values only when expressly requested, list::erase and forward_list::erase, and override.
All we need is a magic wand that works instantly and flawlessly.
In general, that's a tall order, but I'm willing to settle for a wand with limited abilities. The flawless part is not up for negotiation. If the wand could break valid code, people could die. Under such conditions, it'd be irresponsible of the Standardization Committee to consider changing C++ without the above-mentioned very good reason. I want a wand that's so reliable, the Committee could responsibly consider changing the language for reasons that are merely decent.
I'm willing to give ground on instantaneousness. The flawless wand must certainly run quickly enough to be practical for industrial-sized code bases (hundreds of millions of lines or more), but as long as it's practical for such code bases, I'm a happy guy. When it comes to speed, faster is better, but for the speed of the magic wand, good enough is good enough.
The big concession I'm willing to make regards the wand's expressive power. It need not perform arbitrary changes to C++ code bases. For Wand 1.0, I'm willing to settle for the ability to make localized source code modifications that are easy to algorithmically specify. All the examples I discussed above satisfy this constraint:
- The wand should replace all uses of NULL and of 0 as a null pointer with nullptr. (This alone won't make it possible to remove NULL from C++, because experience has shown that some code bases exhibit "creative" uses of NULL, e.g., "char c = (char) NULL;". Such code typically depends on undefined behavior, so it's hard to feel too sympathetic towards it, but that doesn't mean it doesn't exist.)
- The wand should replace all variable definitions that lack explicit initializers and that are currently not zero-initialized with an explicit initializer of void.
- The wand should replace uses of list::remove and forward_list::remove with uses of list::erase and forward_list::erase. (Updating the container classes to support the new erase member functions would be done by humans, i.e. by STL implementers. That's not the wand's responsibility.)
- The wand should add override to all overriding functions.
The magic wand exists--or at least the tool needed to make it does. It's called Clang. All hail Clang! Clang parses and performs semantic analysis on C++ source code, thus making it possible to write tools that modify C++ programs. Two of the transformations I discussed above appear to be part of clang-tidy (the successor to clang-modernize): replacing NULL and 0 as null pointers with nullptr and adding override to overriding functions. That makes clang-tidy, if nothing else, a proof of concept. That has enormous consequences. In recent years, the Standardization Committee's approach to backward compatibility has been to preserve it at all costs unless (1) it could be demonstrated that only very little code would be broken and (2) the cost of the break was vastly overcompensated for by a feature enabled by the break. Hence the Committee's willingness to eliminate auto's traditional meaning in C and C++98 (thus making it possible to give it new meaning in C++11) and its C++11 adoption of the new keywords alignas, alignof, char16_t, char32_t, constexpr, decltype, noexcept, nullptr, static_assert, and thread_local. Contrast this with the perpetual deprecation of setting bool variables to true by applying ++ to them. When C++14 was adopted, that construct had been deprecated for some 17 years, yet it remains part of C++. Given its lengthy stint on death row, it's hard to imagine that a lot of code still depends on it, but my guess is that the Committee sees nothing to be gained by actually getting rid of the "feature," so, failing part (2) of the break-backward-compatibility test, they leave it in. Incidentally, code using ++ to set a bool to true is another example of the kind of thing that a tool like clang-tidy should be able to easily perform. (Just replace the use of ++ with an assignment from true.)
Clang makes it possible for the Standardization Committee to retain its understandable reluctance to break existing code without being quite so conservative about how they do it. Currently, the way to avoid breaking legacy software is to ensure that language revisions don't affect it. The sole tool in the backward-compatibility toolbox is stasis: change nothing that could affect old code. It's a tool that works, and make no mistake about it, that's important. The fact that old C++ code continues to be valid in modern C++ is a feature of great importance to many users. It's not just the pacemaker programmers who care about it.Clang's contribution is to give the Committee another way to ensure backward compatibility: by recognizing that tools can be written to automatically modify old code to conform to revised language specifications without any change in semantics. Such tools, provided they can be shown to operate flawlessly (i.e., they never produce transformed programs that behave any differently from the code they're applied to) and at acceptable speed for industrial-sized code bases, give the Standardization Committee more room to get rid of the parts of C++ where there's consensus that we'd rather not have them in the language.
Here's how I envision this working:
- Stage 1a: The Standardization Committee identifies features of the language and/or standard library that they'd like to get rid of and whose use they believe can be algorithmically transformed into valid and semantically equivalent code in the current version or a soon-to-be-adopted version of C++. They publish a list of these features somewhere. The Standard is probably not the place for this list. Perhaps a technical report would be a suitable avenue for this kind of thing.
- Stage 1b: Time passes, during which the community has the opportunity to develop tools like clang-tidy for the features identified in Stage 1a and to get experience with them on nontrivial code bases. As is the case with compilers and libraries, the community is responsible for implementing the tools, not the Committee.
- Stage 2a: The Committee looks at the results of Stage 1b and reevaluates the desirability and feasibility of eliminating the features in question. For the features where they like what they see, they deprecate them in the next Standard.
- Stage 2b: More time passes. The community gets more experience with the source code transformation tools needed to automatically convert bad eggs (old constructs) to good ones (the semantically equivalent new ones).
- Stage 3: The Committee looks at the results of Stage 2b and again evaluates the desirability and feasibility of eliminating the features they deprecated in Stage 2a. Ideally, one of the things they find is that virtually all code that used to employ the old constructs has already been converted to use the new ones. If they deem it appropriate, they remove the deprecated features from C++. If they don't, they either keep them in a deprecated state (executing the moral equivalent of a goto to Stage 2b) or they eliminate their deprecated status.
What do you think?