My previous posts on modules have received a lot of attention. I’m happy that I’ve been able to kick-start a lot of conversation, but I’ve also seen that a large part of the community is still unclear on what modules actually are.

There is a lot of ground to cover. I can’t do it all in one sitting, and I doubt you’d want to read the entire thing in one go. I’ll be breaking this up, starting at the most high-level aspects and drilling down over time. I intend these posts will clarify and discuss what modules are, what they can do, and what they are intended to do, what they cannot do, and how they are used.

Here is a hello-world program using C++ Modules:

export module speech; export const char* get_phrase() { return "Hello, world!";
}
// main.cpp
import speech; import <iostream>; int main() { std::cout << get_phrase() << '\n';
}

This is an extremely straightforward example. We have a single source file that exposes a module speech. Our main.cpp imports speech and uses the single function defined therein: get_phrase.

The effect of importing a module is to make the exported entities declared within that module to become visible by the importing translation unit. No more. No less.

A few questions (which will be answered below): Why export module instead of just module? How are modules named? What are the rules on import? What does export do? What are these “partition” things people keep talking about?

Module Names

A module is identified by the aptly named module-name. The following is the grammar of module-name:

module-name: [module-name-qualifier] identifier ; module-name-qualifier: identifier "." | module-name-qualifier identifier "." ;

This means that a module’s name is some non-zero number of identifiers joined by a literal dot .. The identifier rules are the same as the rest of the language, except that the identifiers export and module may not be used in a module name, obviously.

What’s the significance of the dot . ? Literally nothing. It’s purely for the benefit of you, the developer. A module named boost.asio.async_completion makes it easier to understanding the logical hierarchy than a module named boost_asio_async_completion, but there is no semantic difference between the two naming styles according to the standard.

A New C++ Source Entity

For all C++’s history, there has been one standard concept to encapsulate the idea of a C++ source unit: The translation unit.

C++ Modules introduce a new type of translation unit called a module unit. The definition is fairly simple:

A module unit is a translation unit that contains a module-declaration.

What is a “module declaration”? We’ve already seen one in our example. The grammar is very simple:

module-declaration: ["export"] "module" module-name [module-partition] [attribute-specifier-seq] ";" ; module-partition: ":" module-name ;

Basically, any file that contains a module line at the top level is a module unit. (The next section will cover the meaning of module-partition).

Important subdivisions: There are several different types of module units, and it is important to understand the meaning of each of them:

  • A module interface unit is a module unit where the module-declaration contains the export keyword. There can be any number of these in a module.
  • A module implementation unit is any module unit that is not a module interface unit (Does not have the export keyword in the module-declaration).
  • A module partition is a module unit where the module-declaration contains the module-partition component.
  • A module interface partition is a module interface unit that is also a module partition (Contains both the export keyword and the module-partition component).
  • A module implementation partition is a module implementation unit that is also a module partition. (Contains the module-partition component but no export keyword).
  • The primary module interface unit is the non-partition module unit that is also a module interface unit. There must be exactly one primary interface unit in a module. All other module interface units must be module partitions.

It isn’t named explicitly, and it isn’t entirely clear from the above reading, but it is possible to have a module implementation unit that is not a partition. A module-declaration without export and without a module-partition label is a module implementation unit. There is no way to import this module unit from another file, but it is useful to provide the implementation of entities declared in different module units.

Modules Partitions

Just like with C++ headers, there is no requirement that modules be split and subdivided into multiple files. Nevertheless, large source files can become intractable, so C++ modules also have a way to subdivide a single module into distinct translation units that are merged together to form the total module. These subdivisions are known as partitions.

Suppose we have two enormous, cumbersome, and unwieldy functions that we don’t want to include in the same module:

export module speech; export const char* get_phrase_en() { return "Hello, world!";
} export const char* get_phrase_es() { return "¡Hola Mundo!";
}

Whoa! That’s a ton of code! Let’s subdivide it using partitions:

// speech.cpp
export module speech; export import :english;
export import :spanish;
// speech_english.cpp
export module speech:english; export const char* get_phrase_en() { return "Hello, world!";
}
// speech_spanish.cpp
export module speech:spanish; export const char* get_phrase_es() { return "¡Hola Mundo!";
}
// main.cpp
import speech; import <iostream>;
import <cstdlib>; int main() { if (std::rand() % 2) { std::cout << get_phrase_en() << '\n'; } else { std::cout << get_phrase_es() << '\n'; }
}

What’s going on here?

  • We have one module named speech
  • Speech has two partitions: english and spanish
  • The syntax export module <module-name>:<part-name> declares that the given module unit is a module interface partition belonging to <module-name> with the partition named by <part-name>.
  • The syntax import :<part-name> (with a leading colon) imports the partition named by <part-name>. The given <part-name> must belong to the importing module. Translation units that are not module units of a module A are not allowed to import partitions from A.
  • The syntax export import :<part-name> makes the exported entities in the module partition visible as part of the module interface.

The name of a module partition follows the same rules as module names, except private is not allowed.

When a user imports a module, then all entities described in all module interface units for that module become visible in the importing file. Remember: module interface partitions are module interface units.

The module unit that contains export module <module-name> (with no partition name) is known as the primary module interface unit. There must be exactly one primary module interface unit, but any number of module interface partitions.

In the above example, get_phrase_en and get_phrase_es both live in the speech module. The subdivision into partitions is not exposed to users.

The module interface is defined as the union of all module interface units within that module.

We can classify the above source files as such:

  • speech.cpp is the primary module interface unit.
  • speech_english.cpp and speech_spanish.cpp are module interface partitions.
  • main.cpp is a regular translation unit.

Another way we could have subdivided the prior example might look like this:

// speech.cpp
export module speech; export import speech.english;
export import speech.spanish;
// speech_english.cpp
export module speech.english; export const char* get_phrase_en() { return "Hello, world!";
}
// speech_spanish.cpp
export module speech.spanish; export const char* get_phrase_es() { return "¡Hola Mundo!";
}
// main.cpp
import speech; import <iostream>;
import <cstdlib>; int main() { if (std::rand() % 2) { std::cout << get_phrase_en() << '\n'; } else { std::cout << get_phrase_es() << '\n'; }
}

Instead of using partitions, we move the declarations of get_phrase_en and get_phrase_es into their own modules, and the speech module export imports them. The export import <name> syntax declares that users who import the module will transitively import the module of the given name.

The content of main.cpp is unaffected between the two layouts. It implicitly imports speech.english and speech.spanish by its import of speech.

If you familiar with some other languages’ module designs, it should be noted that this is not valid:

// speech.cpp
export module speech; // NOT OK:
export import .english;
export import .spanish;

Python users might be familiar with such syntax as a qualified relative import, where the leading . tells the module mechanism to look for a sibling module with the given name. C++ Modules do not work like this as their is no intrinsic hierarchy. The compiler sees no relationship between modules speech, speech.english, and speech.spanish. The above snippet is completely nonsensical in the eyes of the language.

speech.english and speech.spanish are not submodules of speech. They are completely disjoint modules.

So what’s the deal? If both of these appear identical to downstream users, why would you choose one over the other? The answer is, of course, tradeoffs:

When using partitions, every entity in the interface partitions is part of the same module. The module that owns an entity is intended to be part of that entity’s ABI! This means that moving an entity from one module to another is potentially ABI breaking. (It should be noted that this author strongly discourages mixing and matching library versions in such a way that this would be a problem, but that’s another topic.)

When using “submodules”, you give users the ability to be more granular in what they import. Despite the potential speed-up from modules, an import boost; that imports the entirety of Boost could be deathly expensive to compile times!

Module Implementation Units

So far we’ve poked at module interface units, but there’s another type of module unit: The module implementation unit.

A module implementation unit is a module unit which does not have the export keyword before the module keyword in its module-declaration. The implementation units belong to a named module. Entities declared within an implementation unit are visible only to the module of which they belong. This has the probable advantage of keeping details hidden and possible benefit of helping accelerate incremental builds as modification to the implementation units might not effect downstream modules. The details of when these benefits kick in and when they are inhibited will be a subject of a future post.

Here’s what our example would look like if we use implementation units:

// speech.cpp
export module speech; import :english;
import :spanish; export const char* get_phrase_en();
export const char* get_phrase_es();
// speech_english.cpp
module speech:english; const char* get_phrase_en() { return "Hello, world!";
}
// speech_spanish.cpp
module speech:spanish; const char* get_phrase_es() { return "¡Hola Mundo!";
}

This looks similar to before, but has a few changes:

  • The import of the partitions in the primary interface unit no longer has the export keyword.
  • The module declarations in the partitions are also missing the export keyword. This makes the partitions implementation partitions.
  • The functions are defined in the implementation units without the export keyword. Their exported-ness is not relevant in the implementation units.
  • The functions are declared in the primary interface unit with the export keyword. This makes the functions part of the module interface, even if not defined.

This is analogous to having a header file that declares two functions and two source files that define those functions. Changes to the implementation units do not effect the interface, and there is no way to modify the module interface by making changes to the implementation units.

I’ve been thinking about it, and it doesn’t seem like we have too much code to divide up into multiple files. Still, I’d like to be able to save on incremental build times when I only modify the implementation. Can I do that without partitions? Yes!

// speech.cpp
export module speech; export const char* get_phrase_en();
export const char* get_phrase_es();
// speech_impl.cpp
module speech; const char* get_phrase_en() { return "Hello, world!";
} const char* get_phrase_es() { return "¡Hola Mundo!";
}

The partition syntax has disappeared, and we only have two source files for our module. The module speech; declaration (without export) declares that module unit to be an implementation unit (not a partition). There is no way to import this file separately, nor is there reason to do so. These two files will be used to create the module speech.

Restrictions on [export] {import,module} and Partitions

The token sequence export import may look weird at first, but it’s meaning is just what it sounds like: Import the given module, and then export that import so that downstream importers will also import the same module transitively.

Because of the way they are defined, there are a few important restrictions on how you can import and export module partitions.

export import is only allowed for interface partitions

The following is not allowed:

export module A; export import :Foo; // NO! :Foo is not an interface unit!

The reasoning is fairly simple: Because A:Foo does not contribute to the interface of A, it is nonsensical to propagate the entities of A:Foo to importers.

export module must appear once per module

Suppose we are defining a module Cats. In order to define it, we’ll need at least one module interface unit that is not a partition of Cats.

export module Cats; export void meow();
export void purr();

What happens if we add another module unit that extends Cats?

export module Cats; export void hiss();

Not allowed! The export module without a partition name is the primary interface unit, and there can only be one.

The current nature of compilers could not cope with such a design. Two compiler invocations would be unable to know if these should be “merged,” or if you’ve very quickly rewritten and moved the definition of Cats to another file. A compiler could theoretically see both files simultaneously and do the merge, but then you have questions about how the two files interact. Supporting such a design would be very complex with little benefit.

Additionally, supporting multiple primary interface units raises another question: What stops a user from injecting things into someone else’s module by defining another Cats file?

export is not allowed in implementation units

This one is self-explanatory. Allowing an implementation unit to export an entity (or module import) is senseless.

All interface partitions must be re-exported from the primary interface unit

Because an interface partition may extend the interface of a module (i.e. introduce entities not declared in the primary interface unit), it is essential that a compiler be able to see the entirety of a module’s interface just by looking into the primary interface unit.

export module Cats:Sounds; export void meow();
export void hiss();
export module Cats:Behaviors; export void eat();
export void sleep();
export module Cats; export import :Sounds;
// importer.cpp
import Cats; void foo() { meow(); // Okay
 hiss(); // Okay
 eat(); // Er...
}

importer.cpp uses eat, and eat is defined with export in an interface partition, but how is the compiler supposed to know that eat is a member of Cats? The primary Cats unit does not export-import :Behaviors!

The program above is ill-formed, no diagnostic required! NDR is one of the most frightening terms in the C++ standard. If often means undefined behavior.

However: You can reasonably expect that this issue will most often result in a compile error, as the definition of eat is not visible. It may simply be unclear why eat is not visible when you can clearly see export void eat(); defined in export module Cats:Behaviors. So you’ll most likely see a diagnostic, just not a diagnostic related to the missing re-export.

Module implementation units are spooky beasts

One can define a module interface and implementation in separate files without the need for partitions. The may be unintuitive, but is perfectly reasonable upon inspection.

export module Cats; export void dream();
export void sleep();
// cats_sleep.cpp
module Cats; import sleep_info; void sleep() { if (is_rem_sleep()) { dream(); }
}

In the above, cats_sleep.cpp is an implementation unit for Cats. The primary interface unit makes no mention of it, so you might expect this to be an issue. What’s going on?

The answer is that the Cats interface unit contains sufficient information that downstream users can import and use the module successfully. They need not see the implementation of sleep and dream, so there is no need to mention where it is defined. It can be assumed that the definitions will be resolved by the linker.

Another thing you may notice: cats_sleep.cpp calls dream, but we never import or declare it! This is perfectly fine: non-partition module implementation units implicitly import the module of which they are a member. Since there is no way for the interface to import this anonymous implementation unit, there is no risk of cyclic imports.

This begs a question: Why make a module implementation partition at all? You could just make anonymous implementation units instead, right?

Not always. Even though the entities in implementation units are not visible to importers, they are visible to other members of the module as long as they are imported. This is one possible way you might use PIMPL with modules (and before you ask: There are still good reasons to use PIMPL with C++ modules)

module Gadgets:PrivWidget; struct PrivWidget { // ...
};
export module Gadgets; import :PrivWidget; export class Widget { std::unique_ptr<PrivWidget> _data; // ...
}; export Widget get_widget() { auto priv = std::make_unique<PrivWidget>(1, 2, 3); return Widget{std::move(priv)};
}

C++ modules are a huge change. To discuss every aspect of them will take more than a single post. We’ve covered the bare basics, but there is so much more to discuss.

Watch this space for follow-ups!