JEP draft: Universal Generics (Preview)

Summary


Allow Java type variables to range over both reference types and primitive value types. Produce warnings when a type variable or a primitive value type might be assigned null. This is a preview language feature.

Non-Goals

The core primitive value types feature is introduced in JEP 401. This JEP is only concerned with supporting primitive value types as type arguments.

In the future, we expect the JVM to optimize the performance of primitive value type parameterizations, with help from the Java compiler. But for now, generics continue to be implemented via erasure.

Significant adjustments to generic standard library code are expected in response to these language changes, but those adjustments will be pursued in a separate JEP. Future work may also refactor implementations of hand-specialized primitive code.

Motivation

A common programming task is to take code that solves a problem for values of a particular type and extend that code to work on values of other types. Java developers can use three different strategies to perform this task:

  • Hand-specialized code. Rewrite the same code multiple times (perhaps with copy and paste), using different types each time.

  • Subtype polymorphism. Change the types in the solution to be a common supertype of all anticipated operand types.

  • Parametric polymorphism. Replace the types in the solution with type variables, instantiated by callers with whatever types they need to operate on.

The java.util.Arrays.binarySearch methods are a good illustration of all three strategies:

static int binarySearch(Object[] a, Object key)
static <T> int binarySearch(T[] a, T key, Comparator<? super T> c)
static int binarySearch(char[] a, char key)
static int binarySearch(byte[] a, byte key)
static int binarySearch(short[] a, short key)
static int binarySearch(int[] a, int key)
static int binarySearch(long[] a, long key)
static int binarySearch(float[] a, float key)
static int binarySearch(double[] a, double key)

The first variant uses subtype polymorphism. It works for all arrays of reference types, which share the common supertype Object[]. The search key can, similarly, be any object. The behavior of the method depends on dynamic properties of the arguments—at run time, do the array components and key support comparison to each other?

The second variant uses parametric polymorphism. It also works for all arrays of reference types, but asks the caller to provide a comparison function. The parameterized method signature ensures at compile time, for each call site, that the array components and key are of types supported by the provided comparison function.

Additional variants use hand specialization. These work on arrays of basic primitive types, which do not have a useful common supertype. Unfortunately, this means there are seven copies of a nearly identical method, adding a lot of noise to the API specification and violating the DRY (don't repeat yourself) principle.
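As a rough illustration in today's Java, the sketch below calls one overload from each family. The class name BinarySearchStrategies, its method names, and the sample arrays are invented for this example; only the Arrays.binarySearch overloads come from the list above.

```java
import java.util.Arrays;
import java.util.Comparator;

class BinarySearchStrategies {

    // Subtype polymorphism: the Object[] overload. Whether the elements and
    // the key can be compared to each other is only discovered at run time.
    static int searchObjects() {
        Object[] words = {"apple", "banana", "cherry"};
        return Arrays.binarySearch(words, "banana");
    }

    // Parametric polymorphism: the <T> overload. The compiler checks that the
    // comparator accepts the component type and the key type.
    static int searchWithComparator() {
        String[] wordsByLength = {"fig", "kiwi", "banana"};
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        return Arrays.binarySearch(wordsByLength, "kiwi", byLength);
    }

    // Hand specialization: the int[] overload, one of seven near-copies.
    static int searchInts() {
        int[] primes = {2, 3, 5, 7, 11};
        return Arrays.binarySearch(primes, 7);
    }

    public static void main(String[] args) {
        System.out.println(searchObjects());
        System.out.println(searchWithComparator());
        System.out.println(searchInts());
    }
}
```

Each call returns the index of the key in its (already sorted) array. The three call sites look alike, but only the second has its element and key types checked against the comparison logic at compile time.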

Primitive value types, introduced in JEP 401, are a new kind of type, allowing developers to operate directly on custom-defined primitives. Primitive values have lightweight conversions to reference types, and can then participate in subtyping relationships. Arrays of primitive values also support these conversions (e.g., an array of values can be treated as an Object[]). Thus, primitive value types work out of the box with APIs that rely on subtype polymorphism, like the Object[] variant of binarySearch.

Unfortunately, Java's approach to parametric polymorphism is designed only for reference types. Thus, a primitive value type, just like a basic primitive type (int, double, etc.), cannot be a type argument. If Point is a primitive value type, an attempt to sort an array of Points with a comparison function would require choosing a reference type as the instantiation of T, and then providing a comparison function that works on all values of that reference type. Primitive value types do come with a sharp companion reference type—in this case, Point.ref—but using a type like Point.ref as a type argument leads to a few problems:

  • The natural way to write a comparison function for Points is with parameters of type Point. But to work with the generic Comparator interface, a lambda expression would need to declare parameters of type Point.ref. (Similarly, if the code needed to store the Comparator in a local variable, the variable's type would be Comparator<Point.ref>.)

  • The parameter type Point.ref raises the possibility of null inputs, which the function would need to appropriately respond to (perhaps with non-null assertions).

  • Most significantly, in the future, we'd like to optimize calls to the comparison function by passing flattened Point values directly in registers. But the reference type Point.ref interferes with flattening.

For these reasons, it would be useful if most generic APIs supported primitive value types in addition to reference types. The language can achieve this by relaxing the requirement that type arguments must be reference types, and adjusting the treatment of type variables, bounds, and inference accordingly.

A significant implication that developers need to account for is that a universal type variable might now represent a type that does not permit null. Java compilers can produce warnings, much like the unchecked warnings introduced in Java 5, to alert developers to this possibility. And the language can provide some new features to address the warnings.

Returning to the problem of hand-specializing for the basic primitive types, in JEP 402 the language will be updated to treat the basic primitive types as primitive value types. At that point, the basic primitives will be able to take advantage of both subtype and parametric polymorphism, and future APIs will no longer need to produce hand specializations for each basic primitive type. Type variables will range over all Java types.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

Type variables and bounds

Previously, Java's type variable bounds were interpreted according to the language's subtyping relation. We now say that a type S is bounded by a type T if any of the following is true:

  • S is a subtype of T (where every type is a subtype of itself, and reference types are subtypes of many other types, per their class declarations and other subtyping rules)

  • S is a primitive value type whose corresponding reference type is bounded by T

  • S is a type variable with an upper bound that is bounded by T; or T is a type variable with a lower bound, and S is bounded by the lower bound of T

As usual, type variables are declared with upper bounds, and those declared without bounds (<T>) implicitly have upper bound Object (<T extends Object>). Any type may act as an upper bound, and any type may be provided as a type argument instantiating a type variable, as long as the type argument is bounded by the type variable's upper bounds.

Where Point is a primitive value type, the type List<Point> is valid, because Point is bounded by Object.

Type variables can thus range over almost any type, and are no longer assumed to represent a reference type.

Wildcards also have bounds, which again may be any type. Similar bounds checks are performed when testing that one parameterized type is a subtype of another.

Where primitive class Point implements Shape, the type List<Point> is a subtype of List<? extends Shape>, and the type List<Shape> is a subtype of List<? super Point>, because Point is bounded by Shape.

Type argument inference is enhanced to support inferring primitive value types. Because primitive value types are "lower" than reference types in the "bounded by" graph, when an inference variable has no equality bounds, inference will prefer a primitive value lower bound.

The invocation List.of(new Point(3.0, -1.0)) generally has inferred type List<Point>; if it occurs in an assignment context, with target type Collection<Point.ref>, it has inferred type List<Point.ref>.
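The wildcard and inference examples above have direct counterparts in today's Java if reference types stand in for Point. The following sketch is only an analogy, not the preview feature: Integer and Number play the roles of Point and Shape, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class BoundsToday {

    // Accepts List<Integer>, List<Double>, and so on: the type argument
    // must be within the upper bound Number.
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // Accepts List<Integer>, List<Number>, List<Object>: Integer must be
    // within the type argument.
    static void addOneTwo(List<? super Integer> sink) {
        sink.add(1);
        sink.add(2);
    }

    public static void main(String[] args) {
        // No target type: inference uses the argument, giving List<Integer>.
        var fromArgument = List.of(1);
        List<Integer> check = fromArgument;

        // A target type changes the inferred type argument to Number.
        Collection<Number> fromTarget = List.of(1);

        List<Number> sink = new ArrayList<>();
        addOneTwo(sink);
        System.out.println(sum(check) + " " + fromTarget.size() + " " + sink);
    }
}
```

Under this proposal the same checks would use the "bounded by" relation rather than subtyping, so that a primitive value type could appear where Integer appears here.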

These changes to type variables, bounds checking, and inference are applied automatically. Many generic APIs will smoothly handle primitive value types without any intervention from API authors.

(To do: In combination with JEP 402, there is some source compatibility risk due to type inference preferring int over Integer in existing code. There is also a risk that users of a migrated, reference-favoring primitive class will encounter unexpected .val types. Requires further exploration.)

Null pollution and null warnings

References can be null, but primitive value types are not reference types, so JEP 401 prohibits assigning null to a primitive value type.

Point p = null; // error

When we allow type variables to range over a wider set of types, we must ask developers to make fewer assumptions about the instantiations of a type variable. Specifically, it is usually improper to assign null to a variable with a type variable type, because the type variable may be instantiated by a primitive value type.

class C<T> { T x = null; /* shouldn't do this */ }
C<Point> c = new C<Point>();
Point p = c.x; // fails at run time: c.x is null

In this example, the type of the field x is erased to Object, so at run time a C<Point> will happily store a null, even though this violates the expectations of the compile-time type. This scenario is an example of null pollution, a new kind of heap pollution. Like other forms of heap pollution, the problem is detected at run time when the program attempts to assign a value to a variable whose erased type does not support it (in this case, the assignment to p).
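Today's Java has a close analogue in the interaction of erasure with unboxing. The sketch below is an analogy rather than the preview feature itself: Integer and int stand in for a null-friendly reference type and a null-hostile primitive type, and the names Holder and NullPollutionToday are made up for illustration.

```java
class Holder<T> {
    // Compiles today without any warning; at run time the field is just an
    // Object reference, so it stores null without complaint.
    T x = null;
}

class NullPollutionToday {

    // Returns true if reading the stored null into a primitive variable fails.
    static boolean unboxingFails() {
        Holder<Integer> h = new Holder<>();
        try {
            int i = h.x; // unboxing conversion applied to null
            return false;
        } catch (NullPointerException e) {
            // The failure surfaces at the use site, far from "T x = null".
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingFails());
    }
}
```

As in the C<Point> example, the declaration that introduces the null and the statement that fails can be in unrelated parts of a program, which is why a compile-time warning at the source of the null is useful.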

As with other forms of heap pollution, the compiler produces null warnings to discourage null pollution:

  • A warning occurs when a null literal is assigned to a type variable type.

  • A warning occurs when a non-final field with a type variable type is left uninitialized by a constructor.

(There are also null warnings for certain value conversions, discussed in a later section.)

class Box<T> {

    T x;

    public Box() {} // warning: uninitialized field

    T get() {
        return x;
    }

    void set(T newX) {
        x = newX;
    }

    void clear() {
        x = null; // warning: null assignment
    }

    T swap(T oldX, T newX) {
        T currentX = x;
        if (currentX != oldX)
            return null; // warning: null assignment
        x = newX;
        return oldX;
    }

}

A significant portion of existing generic code produces null warnings, having been written with an understanding that type variables are reference types. Developers are encouraged, as they are able, to update their code to eliminate sources of null pollution.

Generic code that compiles without null warnings can safely be instantiated with primitive value types: it doesn't introduce null pollution or risk downstream NullPointerExceptions.
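One way to eliminate the warnings in Box, sketched here in today's Java, is to remove the states that need null. This is an assumption about how an author might respond, not a rewrite prescribed by this JEP; the name NullFreeBox, the dropped clear method, the explicit null checks, and the Optional return type are all illustrative choices.

```java
import java.util.Objects;
import java.util.Optional;

final class NullFreeBox<T> {

    private T x;

    // The field is always initialized, so there is no "empty" state.
    NullFreeBox(T initial) {
        x = Objects.requireNonNull(initial);
    }

    T get() {
        return x;
    }

    void set(T newX) {
        x = Objects.requireNonNull(newX);
    }

    // Reports failure with an empty Optional instead of returning null.
    // Uses the same != comparison as the original Box.swap.
    Optional<T> swap(T oldX, T newX) {
        if (x != oldX) {
            return Optional.empty();
        }
        x = Objects.requireNonNull(newX);
        return Optional.of(oldX);
    }

    public static void main(String[] args) {
        NullFreeBox<String> box = new NullFreeBox<>("a");
        System.out.println(box.swap("a", "b").isPresent());
        System.out.println(box.get());
    }
}
```

No null literal is assigned to a T and no T field is left uninitialized, so neither of the two warnings listed above would apply to this class.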

In a future release, the physical layout of generic code will be specialized for each primitive value type. At that point, null pollution will be detected earlier, and code that has failed to address the warnings may become unusable. Code that has addressed the warnings is specialization-ready: future JVM enhancements will not disrupt the program's functionality.

Reference type variable types

When generic code needs to work with null, the language offers a few special features to ensure that a type variable type is a (null-friendly) reference type.

  • A type variable that is bounded by IdentityObject (either directly or via an identity class bound) is always a reference type.

    class C<T extends Reader> { T x = null; /* ok */ }
    FileReader r = new C<FileReader>().x;

  • A type variable whose declaration is modified by the contextual keyword ref prohibits non-reference type arguments, and thus is always a reference type.

    class C<ref T> { T x = null; /* ok */ }
    FileReader r = new C<FileReader>().x;
    Point.ref p = new C<Point.ref>().x;

  • A type variable use may be modified by the syntax .ref, which represents a mapping from the instantiating type to its tightest bounding reference type (e.g., Point maps to Point.ref, while FileReader maps to FileReader).

    class C<T> { T.ref x = null; /* ok */ }
    FileReader r = new C<FileReader>().x;
    Point.ref p = new C<Point.ref>().x;
    Point.ref p2 = new C<Point>().x;

(New syntax above is subject to change.)

In the last case, the types T and T.ref are two distinct type variable types. Assignments between the two types are allowed, as a form of reference conversion or value conversion.

class C<T> {
    T.ref x = null;
    void set(T arg) { x = arg; /* ok */ }
}

A type variable that is bounded by IdentityObject or declared with the ref modifier is a reference type variable. All other type variables are called universal type variables.

Similarly, a type that names a reference type variable or has the form T.ref is called a reference type variable type, while a type that names a universal type variable without .ref is called a universal type variable type.

Warnings on value conversion

Primitive value conversions allow a primitive reference type to be converted to a primitive value type, mapping an object reference to the object itself. Per JEP 401, if the reference is null, the conversion fails at run time.

Point.ref pr = null;
Point p = pr; // NullPointerException

When value conversion is applied to a type variable type, there is no runtime check, but the conversion may be a source of null pollution.

T.ref tr = null;
T t = tr; // t is polluted

To help prevent both NullPointerExceptions and null pollution, value conversions produce null warnings unless the compiler can prove that the reference being converted is non-null.

class C<T> {
    T.ref x = null;
    T get() { return x; } // warning: possible null value conversion
    T.ref getRef() { return x; }
}

C<Point> c = new C<>();
Point p1 = c.get();
Point p2 = c.getRef(); // warning: possible null value conversion

If a parameter, local variable, or final field has a reference type variable type, the compiler may be able to prove, at certain usages, that the variable's value is non-null. In that case, value conversion may occur without a null warning. The proof is similar to the control-flow analysis that determines whether a variable has been initialized before use.

<T> T deref(T.ref val, T alternate) {
    if (val == null) return alternate;
    return val; // no warning
}
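The same shape can be tried in today's Java with boxing in place of value conversion. In this sketch Integer stands in for T.ref and int for T; that substitution, and the class name NullCheckedUnboxing, are assumptions made only so the example runs on a current JDK. Today's compiler neither warns about the unboxing nor performs the proof; the point is the control flow that would justify omitting the warning.

```java
class NullCheckedUnboxing {

    static int deref(Integer val, int alternate) {
        if (val == null) {
            return alternate;
        }
        // On every path that reaches this line, val has been tested against
        // null, so the unboxing conversion cannot fail.
        return val;
    }

    public static void main(String[] args) {
        System.out.println(deref(null, 7));
        System.out.println(deref(5, 7));
    }
}
```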

Parameterized type conversions

Unchecked conversions traditionally allow a raw type to be converted to a parameterization of the same class. These conversions are unsound, and so are accompanied by unchecked warnings.

As developers make changes like applying .ref to certain type variable uses, they may end up with parameterized types (e.g., List<T.ref>) in API signatures that are out of sync with other code. To smooth migration, the allowed set of unchecked conversions is expanded to include the following parameterized-to-parameterized conversions:

  • Changing a type argument of a parameterized type from a universal type variable type (T) to its reference type (T.ref), or vice versa

    List<T.ref> newList() { return Arrays.asList(null, null); }
    List<T> list = newList(); // unchecked warning

  • Changing a type argument of a parameterized type from a primitive value type (Point, LocalDate.val) to its reference type (Point.ref, LocalDate), or vice versa

    void plot(Function<Point.ref, Color> f) { ... }
    Function<Point, Color> gradient = p -> Color.gray(p.x());
    plot(gradient); // unchecked warning

  • Changing a wildcard bound in a parameterized type from a universal type variable type (T) or a primitive value type (Point, LocalDate.val) to its reference type (T.ref, Point.ref, LocalDate), or vice versa (where the conversion is not already allowed by subtyping)

    Supplier<? extends T.ref> nullFactory() { return () -> null; }
    Supplier<? extends T> factory = nullFactory(); // unchecked warning

  • Recursively applying an unchecked conversion to any type argument or wildcard bound of a parameterized type

    Set<Map.Entry<String, T>> allEntries() { ... }
    Set<Map.Entry<String, T.ref>> entries = allEntries(); // unchecked warning

These unchecked conversions may seem easily avoidable in small code snippets, but the flexibility they offer will significantly ease migration as different program components or libraries adopt universal generics at different times.

In addition to unchecked assignments, these conversions can be used by unchecked casts and method overrides.

interface Calendar<T> {
    Set<T> get(Set<LocalDate> dates);
}

class CalendarImpl<T> implements Calendar<T> {
    Set<T.ref> get(Set<LocalDate.val> dates) { ... } // unchecked warning
}
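Java already lets an overriding method differ from the overridden signature by an unchecked conversion, which is the mechanism being extended here. The sketch below shows the existing rule in today's Java, with a raw return type in the position where the new rules would allow T.ref or LocalDate.val; Source, LegacySource, and OverrideDemo are hypothetical names.

```java
import java.util.List;

interface Source<T> {
    List<T> items();
}

class LegacySource implements Source<String> {
    // The raw return type List is accepted as an override of List<String>
    // only by unchecked conversion; without the annotation, javac reports
    // an unchecked warning on this method.
    @Override
    @SuppressWarnings({"rawtypes", "unchecked"})
    public List items() {
        return List.of("a", "b");
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        Source<String> source = new LegacySource();
        System.out.println(source.items());
    }
}
```

Callers that use the interface type see List<String> and are unaffected, which is the property that lets implementations and their clients migrate at different times.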

Compiling to class files

Generic classes and methods continue to be implemented via erasure: generated bytecode replaces type variables with their erased bounds. So, within generic APIs, primitive objects will generally be operated on as references.

The usual rules for detecting heap pollution apply: casts are inserted at certain program points to assert that a value has the expected runtime type. In the case of primitive value types, this includes checking that the value is non-null.
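The placement of these compiler-inserted casts can be observed in today's Java with ordinary heap pollution. The following is a minimal sketch using a raw type to smuggle an Integer into a List<String>; the class and method names are invented. Under this proposal the analogous check for a primitive value type would also reject null, which today's Java cannot demonstrate.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureCasts {

    // Deliberately pollutes the heap: the returned list claims to hold
    // Strings but actually holds an Integer.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> pollutedList() {
        List raw = new ArrayList();
        raw.add(42);
        return raw;
    }

    // Returns true if the pollution is detected where a String is required.
    static boolean castFailsAtUse() {
        List<String> strings = pollutedList();

        // No cast is needed here: the erased return type of get is Object.
        Object first = strings.get(0);

        try {
            // Here the compiler inserts a cast to String, which fails.
            String s = strings.get(0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailsAtUse());
    }
}
```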

The Signature attribute is extended to encode additional forms of compile-time type information:

  • Type variables declared as ref T

  • Type variable uses of the form T.ref

  • Primitive value types appearing as type arguments and type variable/wildcard bounds

Alternatives

We could ask developers to always use primitive reference types when making use of generic APIs. This is not a very good solution, as argued in the Motivation section.

We could also ask API authors to opt in to universal type variables, rather than making type variables universal by default. But the goal is for universal generics to be the norm, and in practice there's no reason most type variables can't be universal. An opt-in would introduce too much friction and lead to a fragmented Java ecosystem.

As noted, the erasure-based compilation strategy does not allow for the performance we might hope for from generic APIs operating on primitive objects. In the future we expect to enhance the JVM to allow for compilation that produces heterogeneous classes specialized to different type arguments. But by prioritizing language changes first in this JEP, developers can write more expressive code now and make their generic APIs specialization-ready, while anticipating performance improvements in the future.

We could avoid introducing new warnings and accept null pollution as a routine fact of programming with primitive value types. This would make for a "cleaner" compilation experience, but the unpredictability of generic APIs at run time would not be pleasant. Ultimately, we want developers who use null in generic APIs to notice and think carefully about how their usage interacts with primitive value types.

At the other extreme, we could treat some or all of the warnings as errors. But we don't want to introduce source and migration incompatibilities: legacy code and users of legacy APIs should still compile successfully, even if there are new warnings.

Risks and Assumptions

The success of these features depends on Java developers learning about and adopting an updated model for the interaction of type variables with null. New warnings will be highly visible, and they will need to be understood and appreciated, not ignored, for them to have their desired effect.

Making these features available before specialized generics presents some challenges. Some developers may be dissatisfied with the performance (comparing ArrayList<Point> to Point[], for example), and develop incorrect long-term intuitions about the costs of using generics for primitive value types. Other developers may make suboptimal choices when applying .ref, not noticing any ill effects until running on a specialization-supporting VM, long after the code has been changed.

Dependencies

JEP 401, Primitive Objects, is a prerequisite.

A follow-up JEP will update the standard libraries, addressing null warnings and making the libraries specialization-ready.

Another follow-up JEP will introduce runtime specialization of generic APIs in the JVM.