Generics & Variance

By Tushar Gupta

Photo by Sergi Kabrera on Unsplash

Hello Generics & Type-Safety Lovers.

I have been wanting to write a piece on Generics & Variance from a long time but just couldn't start. There are already so many great articles & videos on this topic that I couldn't think of a way to add some value. But here I am making an attempt. If you are an expert, you could validate the content below or If you are a beginner you might just get some new takeaways. Let’s start.

The concept of Generics was first pioneered in a programming language called ML (Meta Language) in 1973. This approach allowed programmers to write common functions that differ only on types those functions operate on. This helped reduce code duplication. For ex:

// Standard ML
// A generic swap function
fun swap (y, x) = (x, y)

Here function swap is of type 'a * 'b -> 'b * 'a.This seemingly strange syntax is used to describe Generic Types in ML and other similar inspired languages. 'a is a type variable, denoting any possible type. It could be an Int, Float, String or any specific Type defined at the time of use case or instantiation. A similar function in Java & Swift would be:

// Java
public <A, B> Pair<B, A> swap(Pair<A, B> pair) {
return Pair.create(pair.second, pair.first);
}
// Swift
func swap<A, B>(_ pair: (A, B)) -> (B, A) {
return (pair.1, pair.0)
}

Here Pair<A,B> in the Java snippet is a Generic Type which can hold any two types. A& B are called Type Parameters. Prior to Generics we would need to write an IntPair, StringPair, int_swap, string_swap separately for every unique type.

These vague parametrized types are called Generics in C#, Java, Rust, Swift, TypeScript and a few others. They are known as Parametric Polymorphism in languages like ML, Haskell, Scala and Templates in C++.

Generics introduce the concept of type parameters, which make it possible to design structures(classes, interfaces) and functions(methods) that defer the specification of one or more types until the time of instantiation by client code.

Here are a few advantages of writing a Generic Code:

  1. Stronger Type Checking at Compile Time.
  2. Allows us to write code which is applicable to many types with the same underlying behavior. E.g: List<T>.
  3. Optimized code.

A shrewd reader will immediately question the 3rd point in the above list. Optimized? In what sense? This point needs some correction. It should be stated as Optimized ByteCode. But, let's first deal with first two points.

Consider these two code snippets below (in Java).

// Java
List<String> words = new ArrayList<>();
words.add("Hello ");
words.add("world!");
String s = words.get(0) + words.get(1);
System.out.println(s); // Hello World
// Java
List words = new ArrayList();
words.add("Hello ");
words.add("world!");
String s = ((String)words.get(0)) + ((String)words.get(1))
System.out.println(s); // Hello World

These snippets produce absolutely identical bytecode. In fact Java Compiler, transforms the 1st snippet, using Generics, into the 2nd snippet. Compiler makes sure that a List<String> only holds a list of String at compile time so we don’t face issues at runtime. Since compiler takes care of automatic casting, it fends of any ClassCastException that could be thrown at Runtime. Thus better type safety. There is also the matter of List<T> read as List of T. For a method such as size() on List all we need is a count of elements and not the underlying type of those elements. Thus better abstraction, free of Type specific information. This trickery used by Java Compiler is called Type Erasure.

There is another way for the compilers to operate on generic code. They can be compiled away to generate separate specialized versions of generic functions. Confused?. Let’s take a very simple example in Rust.

// Rust
fn print_hash<T: Hash>(t: &T) {
println!("The hash is {}", t.hash())
}
print_hash(&true);      // instantiates T = bool
print_hash(&12_i64); // instantiates T = Int64

Java compiler would have only one copy of print_hash function, But Rust compiler actually produces two copies of this function one for each concrete argument type. What’s the advantage? Advantage is that compiler can optimize the code according to the concrete type in picture. This means any call to print_hash will be in-lined at the place of the call; a direct static dispatch call to the relevant implementation saving any indirection in function invocation. This is called Monomorphization — converting a polymorphic code to monomorphic code.

// Compiled Code
__print_hash_bool(&true);
__print_hash_i64(&12_i64);

If the benefits seem minimal, consider types: Array<Bool> & Array<Int>. Compiler produces specialized copies: Array_Bool & Array_Int for Bool & Int respectively. Compiler can now layout memory according to the size of Bool & Int, in-line them (when possible) and save on space. Each method on Array likewise generates specialized code saving on indirection to method calls. Hence the generic code is as fast and memory efficient as if you had handwritten the specialized code directly. How is this for a comparison: An ArrayList<Boolean> in Java, depending on implementation, uses at-least 16 times more memory than it would in Rust.

Seems amazing isn’t it; there are downsides as well: Code Bloat & Slow Compilation Times. Compiler instantiates an independent copy of the generic code for each distinct combination of generic parameters it is invoked with, leading to huge binary size and compiling the same functions potentially many many times.

If you have reached here, then you already know what is type safety (compile time). Catching errors even before a program is run is always good. Generics help us write re-usable type safe code. Consider an example from the book Effective Java:

// Java
// 1
private final Collection stampsUntyped = ... ;
stampsUntyped.add(new Coin())

// 2
private final Collection<Stamp> stampsTyped = ... ;
stampsTyped.add(new Coin())
^
Type Mismatch

There are two ways we can create a collection of stamps.

In 1, we have a raw collection, meaning it can hold anything. There is a possibility that we might add an instance of class Coin to it. It will compile successfully and will crash at run time when we retrieve a stamp from the collection.

In 2, we have a Generic collection, where the generic parameter is defined to be a Stamp. This is good. This is type safe. There is no way we can add an instance of Coin to a collection of stamps. Compiler comes to our rescue and does not let this code compile.

Imagine someone putting a java.util.Date instance into a collection that is supposed to contain only java.sql.Date instances. Parametrized types would prevent this at compile time.

If you use raw types, you lose all the safety and expressiveness benefits of Generics.

___________________________________________________________________

Question:
What is the difference between
List & List<Object>? 
(Don’t fret, we will answer this shortly.)

___________________________________________________________________

Consider this code snippet (again from Effective Java Book):

// Java
static int numElementsInCommon(Set s1, Set s2) {
int result = 0;
for (Object o1 : s1)
if (s2.contains(o1))
result++;
return result;
}

This method numElementsinCommon calculates the number of common elements between two sets. There is no need to make these input Sets generic since we are only accessing and comparing elements of one another. It seems we do not always need type safety.

But, can we make it better with Generics? If yes, better how?
The answer is Yes, we can, by using Wildcard(?). This is NOT a question mark. In Java, its called a Wildcard.

Hold tight, we will get to wildcards in a short while. First we need to understand some principles.

Subtyping must seem like a familiar or known territory and it actually is. Its a key feature of Object-oriented languages such as Java and used by us too often. When we extend or implement a certain Type, the more specific type (i.e. the child/subtype) can fit in a bigger container (i.e. the Parent/supertype). For example:

  • Integer is a subtype of Number.
  • Double is a subtype of Number.
  • String is a subtype of Object.
  • String[] is a subtype of Object[].
  • ArrayList is a subtype of List.

Here onward, we will denote A -> B as A is a subtype of B & !-> B as A is NOT a subtype of B .

And, Substitution Principle says:

If S -> T, it means that any term of type S can be safely used in a context where a term of type T is expected.

Now, do you remember the name of this article. I know this is a long one but it’s time to begin our journey to the interesting part — Variance. Let’s consider this class hierarchy below for all the discussion ahead. All examples will be in Java.

// Java
public class Animal {
String name;
int troubleMakerRating;

Animal(String name) {
this.name = name;
this.troubleMakerRating = new Random().nextInt();
}
}

public class Dog extends Animal {
Dog(String name) {
super(name);
}
}

public class Cat extends Animal {
Cat(String name) {
super(name);
}
}

There is a class Animal. Dog & Cat extend Animal, because a dog is an animal, a cat is an animal. Each Animal has a name and a troubleMakerRating. So according to the Substitution Principle (SP), If we have a collection of Animals, it’s permissible to add a Dog or a Cat instance to it:

List<Animal> animals = new ArrayList<Animal>();
animals.add(new Animal("Some Lion"));
animals.add(new Dog("German Shepherd"));
animals.add(new Cat("RagDoll"));

Here subtyping is in two forms:
1. Adding to list is permitted because Dog/Cat is a subtype of Animal.
2. Assignment operation is permitted because ArrayList is a subtype of List.

So, if Dog -> Animal , ArrayList -> List it seems very reasonable that an ArrayList<Dog> -> List<Animal>. Let’s write this down in code:

List<Animal> animals = new ArrayList<Dog>();

Try compiling above piece of code. It won’t. Compiler throws Incompatible Types. Compiler does not let us substitute List<Animal> with ArrayList<Dog> because its not safe.

List<Animal> animals = new ArrayList<Dog>(); // Compilation ERROR !!
animals.add(new Cat("Birman"));

If there was no compilation error at the first line, then under Substitution Principle, it would be permissible to add an instance of Cat to the list of animals which is actually a list of Dog and Cats and Dogs don’t go along well. Issues would be seen at run time when we are parsing through the list of Animals thinking they are all Dogs but a Cat pops out. Remember the rule: Compile Time Error is Good, Run time Error is bad. That’s why, Substitution Principle does not apply here. i.e. ArrayList<Dog> !-> List<Animal>.

But there are also cases where we want Substitution Principle to work, i.e. We want the Compiler to relax a bit as we know what we are doing.

void printNames(Collection<Animal> animals) {
animals.forEach(animal -> System.out.println(animal.name));
}
List<Dog> myDogs = new ArrayList<>();
myDogs.add(new Dog("German Shepherd"));
myDogs.add(new Dog("Bulldog"));
myDogs.add(new Dog("Pug"));

printNames(myDogs); // Compilation ERROR !! Incompatible Types

A function printNames takes in a list of animals and prints their name. So, we should be able reuse the function to print the names of a dog list. After all the function printNames is only accessing the animals from the list and not trying to write into it. But compiler’s job is to be safe, so it errors out on printNames(myDogs). Over smart Compiler!!

And, what if we need to compare two dogs using a generic Animal Comparator:

// 1
void printNameOfMostTroubleMakerDog(List<Dog> dogs,
Comparator<Dog> comparator) {

Dog maxTroubleMaker = dogs.get(0);
    for (int i = 1; i < dogs.size(); i++) {
if (comparator.compare(maxTroubleMaker, dogs.get(i)) > 0) {
maxTroubleMaker = dogs.get(i);
}
}
    System.out.println(maxTroubleMaker.name 
+ " is the most trouble making Dog.");
}
// 2
Comparator<Animal> troubleMakerComparator = Comparator.comparingInt(animal -> animal.troubleMakerRating);
// 3 - Compilation ERROR !! Incompatible Types
printNameOfMostTroubleMakerDog(myDogs, troubleMakerComparator)
  1. printNameOfMostTroubleMakerDog takes a list of dogs and prints the name of most trouble making dog based on an input comparator.
  2. We create a reusable troubleMakerComparator which compares trouble maker rating of each animal.
  3. It does not compile, because printNameOfMostTroubleMakerDog is expecting a Comparator<Dog> but we are passing in a Comparator<Animal>.

Compiler should have allowed this. If a Comparator<Animal> can compare two Animals, it definitely can compare two Dogs.

Now the question arises, if we know that some of the substitutions like above are safe, shouldn’t we be able to convey this to compiler too? If the answer to every question was No, there would be no need for this article. 😛 
Remember WildCards ( ?). Yes we are going to use them now.

Variance refers to how subtyping between more complex types relates to subtyping between their components. For example, how should a list of Cats relate to a list of Animals, Or how should a function returning a Cat relate to a function returning a Animal.

There are four possible variances:

  • Covariance: Preserves ordering of subtyping relation. i.e.
     if A -> B, then Box<A> -> Box<B>
  • Contravariance: Reverses the ordering of subtyping relation. i.e. 
    if A -> B, then Box<B> -> Box<A>
  • Bivariance: If both of the above applies. i.e.
     if A -> B, then Box<A> -> Box<B> & Box<B> -> Box<A>
  • Invariance: Neither of Covariance or Contravariance apply.

To solve the substitution problem we faced when using printNames & printNameOfMostTroubleMakerDog, we will make use of Covariance & Contravariance respectively.

If somehow, ArrayList<Dog> becomes a subtype of List<Animal>, printNames function becomes reusable for any list which contains types that extend Animal. This is how we do it:

void printNames(Collection<? extends Animal> animals) {
...
}

printNames(myDogs); // Compiles cleanly

Now printNames(myDogs); type-checks and compiles. Notice the change in signature. printNames does not accept a Collection<Animal> anymore but a Collection<? extends Animal>. When we use ? extends, we are basically telling compiler that a collection of items whose Type is upper bound to Animal is accepted by the function printNames.

This does two things:

  1. It makes List<Dog> -> Collection<Animal>. This is called Covariance.
  2. And, inside printNames we can NOT add any item to the collection, but we can parse or access the items out of the list. We can not add because it’s possible that this list might be a list of dogs (a homogeneous list) or list of some animals (a heterogeneous list). Since we are not sure what the underlying list is, compiler does not allow any addition to the collection.

We can also use wildcards when declaring variables. Ex:

1|  List<Dog> dogs = new ArrayList<Dog>();
2| dogs.add(new Dog("PitBull"));
3| dogs.add(new Dog("Boxer"));
4| List<? extends Animal> animals = dogs;
5| animals.add(new Cat("Sphynx Cat")); // Compilation ERROR !!

Without ? extends, 4th line would have caused compilation error because List<Dog> !-> List<Animal>, but 5th would have worked because Cat -> Animal. With ? extends 4th line compiles because now List<Dog> -> List<Animal> but 5th does not compile because you cannot add a Cat to a List<? extends Animal>, since it might be a list of some other subtype of Animal.

Another way to understand is When we extend, there is atleast an upper bound type, so we know what type we are getting out of the Box. So GET is allowed, because that's the safest option for the compiler.

General Rule of Thumb:
If a structure contains elements with a type of the form ? extends E, we can GET elements out of the structure, but we CAN NOT PUT elements into it.

Similarly, to solve for the comparator problem, we change the function printNameOfMostTroubleMakerDog to include a wildcard and make it ? super Dog. i.e:

void printNameOfMostTroubleMakerDog(List<Dog> dogs,
Comparator<? super Dog> comparator) {
...
}
...
// Compiles Cleanly
printNameOfMostTroubleMakerDog(myDogs, troubleMakerComparator);

Now printNameOfMostTroubleMakerDog(myDogs, troubleMakerComparator) type-checks. ? super Dog means any type whose lower bound is Dog i.e. any type which is a supertype of Dog. So Animal & Object are acceptable types here. This also makes Comparator<Animal> a subtype of Comparator<Dog> for this method. This is Contravariance. There must be some corollary here from the GET-PUT principle we saw above and in fact there is. We can now PUT items inside a such a structure but we can not GET items out if it. We will need a different example for it.

public static void add_N_DogsTo(List<? super Dog> someList,
int count) {
    if (count == 0)
return;

for (int i = 0; i < count; i++)
// Adding to the list is allowed
someList.add(new Dog("Stray Dog " + i ));
    // Reading from the list is NOT allowed. Compilation ERROR !!                                     
Animal d = someList.get(0);
}

If we have a function to add ’N’ dogs to some list, whose lower bound is a Dog, adding is allowed but reading from list isn’t. This is because, during PUT we have a concrete type we can add, but during a GET command, we can never know the type of output object. It could be an Animal or an Object.

Note: Object d = someList.get(0); works because in Java, everything extends Object. So there is always an upper bound.

Another way to understand is When we super, there is atleast a lower bound type, so we know what's smallest type we can put in the box. So PUT is allowed, because that's the safest option for the compiler.

General Rule of Thumb:
If a structure contains elements with a type of the form ? super E, we can PUT elements into the structure, but we CAN NOT GET elements out of it.

Phew.. That was a lot, right?. It was but that’s it folks. Before we part our ways, let’s just answer the two questions we left hanging above.

1). What is the difference between List & List<Object>?

  • A List has opted out of the Generic Type Checking, where as a List<Object> has explicitly told the compiler that it is capable of holding items of type Object which basically is everything in Java. How does that help? Read next point:
  • List list = new List<String> is possible but
    List<Object> list = new List<String> is NOT. Because, Generics in Java are Invariant, for reasons already discussed in Subtyping & Substitution Principle.

2). How do we make the following code snippet better by using Generics where Type safety is not of concern?

// Java
static int numElementsInCommon(Set s1, Set s2) {
int result = 0;
for (Object o1 : s1)
if (s2.contains(o1))
result++;
return result;
}

To repeat what I said above, we don’t need to make Sets generic here (i.e. Set<T>) because it’s only comparing elements of both. But what if we add a wildcard here:

// Java
static int numElementsInCommon(Set<?> s1, Set<?> s2) {
int result = 0;
for (Object o1 : s1)
if (s2.contains(o1))
result++;
return result;
}

What does adding a wildcard buy us?. How do Set & Set<?> differ? 
We have actually made this function much safer to use. When using Set, we can put any element inside the function and modify it’s content, but since Generics are Invariant in Java, Compiler prohibits us to add anything inside a Set<?>. That’s not to say that we can’t call set.removeAll() or set.pop() inside the function and not modify its content. Also we can even add null to the Set<?> collection. But, any kind of safety is always beneficial.

Done and dusted. Hope this long read helped you understand how these concepts are interwoven and why its important to learn them both together.

Thanks for reading through guys. Please share 🗣 in your circles and clap 👏🏻 If you enjoyed it.

If you have any queries or you have anything to add or correct in this article feel free to put (no pun intended) them in comments and I will get (again, no pun intended 😛) back to you as soon as possible.

Follow Androidiots for more such amazing content.