A practical look at JEP-389 in JDK16 with libsodium


JDK 16 is coming, and with it the incubating JEP-389 (Foreign Linker API).

The Foreign Linker API is a very convenient and attractive way to connect to the native world. Let’s have a practical look at this API that should supersede JNI. In order to do so I wanted Java code to interact with the infamous libsodium.

First I will focus on using the foreign linker API, then I will show how to use jextract in its current state (it is still being actively developed).

Note that JEP-389 is still incubating, so the examples below are likely to become obsolete with the next JDK as the API and its behavior are further refined.
The following examples were based on JDK 16 release candidate build 36 (2021/2/8).

Let’s try to reproduce the example from the libsodium sealed box documentation. That page has a simple code snippet that could be interesting to reproduce in Java.

Crypto sealed box example
#define MESSAGE (const unsigned char *) "Message"
#define MESSAGE_LEN 7
#define CIPHERTEXT_LEN (crypto_box_SEALBYTES + MESSAGE_LEN)

/* Recipient creates a long-term key pair */
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);

/* Anonymous sender encrypts a message using an ephemeral key pair
 * and the recipient's public key */
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);

/* Recipient decrypts the ciphertext */
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
                         recipient_pk, recipient_sk) != 0) {
    /* message corrupted or not intended for this recipient */
}

One of the cool things about jshell is that you can try small ideas with a rapid feedback loop. With the right configuration, it is also possible to play with the foreign linker.

Allow jshell to use the foreign module
$ jshell --add-modules jdk.incubator.foreign -R-Dforeign.restricted=permit

Then within jshell, let’s try out a simple smoke test.

Smoke testing the foreign module
jshell> import java.lang.invoke.*;
jshell> import jdk.incubator.foreign.*;
jshell> var getpid = CLinker.getInstance()
   ...>     .downcallHandle(
   ...>         LibraryLookup.ofDefault().lookup("getpid").get(),
   ...>         MethodType.methodType(long.class),
   ...>         FunctionDescriptor.of(CLinker.C_LONG)
   ...>     );
getpid ==> MethodHandle()long

jshell> (long) getpid.invokeExact();
$4 ==> 53699

jshell> ProcessHandle.current().pid()
$5 ==> 53699

Yes it works! It really is easy to try things out almost for free, without leaving Java, which is really neat. Now I would like to focus on the small libsodium example within a project. I’ll explain how to use the API along the way.

Incubating modules are not resolved by default. Hence, it is required to add the jdk.incubator.foreign module when invoking the compilation command.

$ javac --add-modules jdk.incubator.foreign ...

This module also needs to be declared when running the code, together with the foreign.restricted system property, which is required to be able to invoke native code.

$ java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...

If you like to play with jshell, these two options are needed as well. As in the earlier jshell command, the system property is passed to the remote JVM with the -R prefix.

$ jshell -R-Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...

Then comes the question of configuring the build tool. I am using Gradle; the configuration is likely similar for other build tools.

// ...
java {
    toolchain {
        languageVersion.set(JavaLanguageVersion.of(16))
    }
}

tasks {
    withType<JavaCompile>().configureEach {
        options.forkOptions.jvmArgs = listOf(
            "--add-opens", "jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED" (1)
        )
        options.compilerArgs = listOf(
            "--add-modules", "jdk.incubator.foreign" (2)
        )
        options.release.set(16)
    }
    withType<JavaExec>().configureEach {
        jvmArgs("-Dforeign.restricted=permit", (3)
                "--add-modules", "jdk.incubator.foreign")
    }
    withType<Test>().configureEach {
        useJUnitPlatform()
        jvmArgs("-Dforeign.restricted=permit", (4)
                "--add-modules", "jdk.incubator.foreign")
    }
}
1 Gradle itself can run on a different JDK, but the code needs to be compiled with JDK 16. At this time Gradle 6.8.2 does not support by default the new module restrictions introduced with JDK 16, hence it is necessary to explicitly open modules. See gradle/gradle#15538.
2 Lets the compiler know about the jdk.incubator.foreign module.
3 Configures the tasks that execute a main class. While this is not immediately useful, IntelliJ IDEA will pick up this configuration when you run a main method from the IDE.
4 Configures the test tasks to be able to run tests that use jdk.incubator.foreign.

The first lines of the C example make use of a few macros (the lines starting with #define). We can assume that MESSAGE will be a method parameter, MESSAGE_LEN will be derived from the message parameter, and CIPHERTEXT_LEN is also derived from the message but needs another constant, crypto_box_SEALBYTES.

The first thing needed is to acquire the crypto_box_SEALBYTES constant. Looking at crypto_box.h, there’s a function size_t crypto_box_sealbytes(void); that returns this constant.

It’s simple and it will be the first method I will present here.

The first challenge is to map the return type size_t, an unsigned integer type. Since the constant is well below the integer max value and I’d like to use it as an array size, I will map it to an int.

crypto_box_sealbytes (.java)
MethodHandle crypto_box_sealbytes = CLinker.getInstance()
    .downcallHandle(
        libsodiumLookup.lookup("crypto_box_sealbytes").get(),
        MethodType.methodType(int.class),
        FunctionDescriptor.of(CLinker.C_INT)
    );

var crypto_box_SEALBYTES = (int) crypto_box_sealbytes.invokeExact();

The Java type and the C descriptor must match, otherwise the call will fail at runtime with an IllegalArgumentException.

Example 1. Carrier mismatch long != b32

If the java method type used long.class, and the C descriptor was C_INT, the code would have failed with a carrier mismatch.

java.lang.IllegalArgumentException: Carrier size mismatch: long != b32[abi/kind=INT]

For reference, CLinker.C_INT is actually a MemoryLayout; layouts are used to model native memory.

The next part of the example is a little more involved: the crypto_box_keypair function takes two array pointers, recipient_pk and recipient_sk, and the generated key pair will be written to the given byte arrays.

unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);

In order to initialize the size of these arrays, the code needs two constants, crypto_box_PUBLICKEYBYTES and crypto_box_SECRETKEYBYTES. Accessing these two works the same way as for crypto_box_SEALBYTES.

The C mapping is easy to get: a void function that takes 2 pointers, FunctionDescriptor.ofVoid(C_POINTER, C_POINTER). In Java the method type requires a type called MemoryAddress, which represents the pointer address.

The pointers need to point to some memory. That’s what the MemorySegment type is for. Before invoking the method the necessary memory will be allocated via MemorySegment::allocateNative, and the respective memory segment address will be passed.

crypto_box_keypair (.java)
MethodHandle crypto_box_keypair = CLinker.getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_keypair").get(),
    MethodType.methodType(
        void.class,
        MemoryAddress.class, // pk
        MemoryAddress.class  // sk
    ),
    FunctionDescriptor.ofVoid(C_POINTER, C_POINTER)
);

var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes());
crypto_box_keypair.invokeExact(recipientPublicKey.address(), recipientSecretKey.address());

var kp = new CryptoBoxKeyPair(
    recipientPublicKey.toByteArray(),
    recipientSecretKey.toByteArray()
);
This code works, but there is something that must be taken care of: the native segment lifecycle.

The above code snippet never deallocates native memory. Fortunately in JDK 16 the MemorySegment class implements AutoCloseable, so declaring the segments in a try-with-resources block will solve the issue.

try (var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
     var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes())) {
    crypto_box_keypair.invokeExact(recipientPublicKey.address(), recipientSecretKey.address());

    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}

However, JEP-389 comes with the concept of scopes, which allow expressing the temporal bounds of these segments. In JDK 16 look for the NativeScope class: it allows registering segments in a code section and allocating segments anywhere in this section.

crypto_box_keypair with NativeScope (.java)
try (var scope = NativeScope.unboundedScope()) {
    var recipientPublicKey = scope.allocate(crypto_box_publickeybytes());
    var recipientSecretKey = scope.allocate(crypto_box_secretkeybytes());
    crypto_box_keypair.invokeExact(recipientPublicKey.address(), recipientSecretKey.address());

    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}

In order to get the off-heap content back into Java types, the code can call any of the to{The Java Type} methods on the MemorySegment instance (toByteArray in the code above); they take care of the conversion.

The next method to call is crypto_box_seal, which also takes pointers and a message length.

unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);

When looking at the C signature however we notice something unusual for Java developers: the message length argument is of type unsigned long long!

In C or C++, a long long declaration means the type is at least 8 bytes (64 bits), so a Java long is what is needed.

In particular here’s a breakdown of the signed integers. It is incomplete as they can be declared differently (eg. long is the same as long int, or long long is the same as long long int), this wikipedia page has a more complete overview of C data types.

Table 1. Signed integers

int

A signed integer type with "the natural size suggested by the architecture of the execution environment", with a minimum of 2 bytes (16 bits, $[-32767; +32767]$).

On a 64-bit CPU, int is typically 4 bytes and the range becomes $[-2147483648; +2147483647]$.

long

A signed integer type that is at least 4 bytes ($[-2147483647; +2147483647]$).

On 64-bit Linux and macOS, long is 8 bytes and the range becomes $[-9223372036854775808; +9223372036854775807]$; on 64-bit Windows it stays at 4 bytes.

long long

A signed integer type that is at least 8 bytes ($[-9223372036854775807; +9223372036854775807]$).

On a 64-bit CPU, long long is still 8 bytes long.

When you start to study these C data types a bit more, you’ll notice two things that just don’t match with Java types:

  • unsigned integers: while they have the same width as their signed counterparts, their range, and therefore their arithmetic, is different:

    • unsigned long's range is at least $[0; +4294967295]$, and $[0; +18446744073709551615]$ on 64-bit Linux or macOS

    • unsigned long long's range is $[0; +18446744073709551615]$ (on a 64-bit CPU)

  • long doubles can be larger than 64 bits. I never had to use those, but it seems they can be as big as 128 bits (16 bytes).

As a reminder size_t is unsigned.
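Java has no unsigned primitive types, but the standard library can reinterpret the bits of an int or a long as an unsigned value. The sketch below is plain Java, with no foreign API involved; the class name UnsignedValues and the sizeToInt helper are mine and only there for illustration. It shows how a value coming from an unsigned C type could be read, and how a size_t carried in a long could be narrowed to an int without silently overflowing.

```java
final class UnsignedValues {

    // Reinterprets the 32 bits of an int as an unsigned value, widened to a long.
    static long unsignedInt(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // Renders the 64 bits of a long as an unsigned decimal number.
    static String unsignedLong(long bits) {
        return Long.toUnsignedString(bits);
    }

    // Narrows a size_t-like value carried in a long to an int usable as an array size.
    // Throws ArithmeticException when the value does not fit in [0; Integer.MAX_VALUE].
    static int sizeToInt(long size) {
        if (size < 0) {
            // As an unsigned value, a negative long is above Long.MAX_VALUE.
            throw new ArithmeticException("size too large: " + Long.toUnsignedString(size));
        }
        return Math.toIntExact(size);
    }

    public static void main(String[] args) {
        System.out.println(unsignedInt(-1));                   // all 32 bits set
        System.out.println(unsignedLong(-1L));                 // all 64 bits set
        System.out.println(Long.compareUnsigned(-1L, 1L) > 0); // unsigned comparison
        System.out.println(sizeToInt(48L));                    // a small size fits in an int
    }
}
```

With all bits set, the two helpers give back the upper bounds of the 32-bit and 64-bit unsigned ranges, 4294967295 and 18446744073709551615.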

crypto_box_seal definition (.c)
SODIUM_EXPORT
int crypto_box_seal(unsigned char *c, const unsigned char *m,
                    unsigned long long mlen, const unsigned char *pk)
            __attribute__ ((nonnull(1, 4)));

Also, for this post, I intend to pass a short String message, which is backed by an array whose length can only be an int.

var crypto_box_seal = CLinker.getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal").get(),
    MethodType.methodType(int.class,
        MemoryAddress.class, // cipherText, output buffer
        MemoryAddress.class, // message
        long.class,          // message length
        MemoryAddress.class  // publicKey
    ),
    FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER, C_LONG_LONG, C_POINTER)
);

try (var scope = NativeScope.unboundedScope()) {
    var cipherText = scope.allocate(crypto_box_sealbytes() + message.length());
    var ret = (int) crypto_box_seal.invokeExact(
        cipherText.address(),
        CLinker.toCString(message, StandardCharsets.US_ASCII, scope).address(),
        (long) message.length(),
        scope.allocateArray(C_CHAR, publicKey).address()
    );
    return cipherText.toByteArray();
}

There are a few things to notice:

  1. I am specifically passing the US_ASCII charset, as I know that the byte array representation of the string will be 1 byte per char, implying I can use the String::length method. If the string used characters that do not fit in a single byte, I would have needed to extract the byte array using the UTF-8 charset encoder first and use the length of the byte array instead.

  2. The var ret is not used, however due to the dynamic nature of invokeExact, the compiler needs the exact signature on the call-site, that’s why the result of this invocation is assigned to an int variable even if it is not used.

    Without this assignment the JVM would have raised a WrongMethodTypeException, in this case the exception message helps to identify the type differences in the signature:

    java.lang.invoke.WrongMethodTypeException: expected (MemoryAddress,MemoryAddress,long,MemoryAddress)int but found (MemoryAddress,MemoryAddress,long,MemoryAddress)void
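This strictness is a property of MethodHandle::invokeExact itself rather than of the foreign linker, so it can be reproduced with a plain Java method. The sketch below is only an illustration: the class InvokeExactDemo and its status method are hypothetical stand-ins for a native function returning an int status code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

final class InvokeExactDemo {

    // Plain Java stand-in for a native function that returns an int status code.
    static int status(long length) {
        return length >= 0 ? 0 : -1;
    }

    // Handle of type (long)int, comparable to a downcall handle returning C_INT.
    static MethodHandle statusHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                    InvokeExactDemo.class, "status",
                    MethodType.methodType(int.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Drops the result: the call site type is (long)void, so invokeExact rejects it.
    static boolean failsWhenResultIsDropped() {
        MethodHandle handle = statusHandle();
        try {
            handle.invokeExact(5L);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Casts the result: the call site type is (long)int and matches the handle.
    static int succeedsWithCast() {
        MethodHandle handle = statusHandle();
        try {
            return (int) handle.invokeExact(5L);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("dropped result fails: " + failsWhenResultIsDropped());
        System.out.println("cast result returns: " + succeedsWithCast());
    }
}
```

The cast on the call site is what tells the compiler which return type to record, which is why an unused variable is enough to make the call pass.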

The last method call of this snippet ends the libsodium crypto box example. The function crypto_box_seal_open takes pointers and a ciphered text length, so let’s apply again what has been done for crypto_box_seal.

crypto_box_seal_open (.c)
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
                         recipient_pk, recipient_sk) != 0) {
    /* message corrupted or not intended for this recipient */
}
crypto_box_seal_open (.java)
var crypto_box_seal_open = getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal_open").get(),
    MethodType.methodType(int.class,
        MemoryAddress.class, // message
        MemoryAddress.class, // cipherText
        long.class,          // cipherText.length
        MemoryAddress.class, // public key
        MemoryAddress.class  // secret key
    ),
    FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER, C_LONG_LONG, C_POINTER, C_POINTER)
);

try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(
        decipheredText.address(),
        scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(),
        scope.allocateArray(C_CHAR, secretkey).address());
    return CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII);
}

Yet running this code raises an error:

java.lang.IndexOutOfBoundsException: Out of bound access on segment MemorySegment{ id=0x6f11d841 limit: 20 }; new offset = 20; new length = 1
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.outOfBoundException(AbstractMemorySegmentImpl.java:495)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBoundsSmall(AbstractMemorySegmentImpl.java:465)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBounds(AbstractMemorySegmentImpl.java:446)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkAccess(AbstractMemorySegmentImpl.java:401)
    at java.base/java.lang.invoke.MemoryAccessVarHandleByteHelper.checkAddress(MemoryAccessVarHandleByteHelper.java:80)
    at java.base/java.lang.invoke.MemoryAccessVarHandleByteHelper.get(MemoryAccessVarHandleByteHelper.java:113)
    at jdk.incubator.foreign/jdk.incubator.foreign.MemoryAccess.getByteAtOffset(MemoryAccess.java:105)
    at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.strlen(SharedUtils.java:259)
    at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.toJavaStringInternal(SharedUtils.java:249)
    at jdk.incubator.foreign/jdk.incubator.foreign.CLinker.toJavaString(CLinker.java:342)

I didn’t get why this code failed at first.

CLinker::toJavaString is the mirror function of the CLinker::toCString, so it looked correct.

The exception message indicates the segment has a size of 20, which is the length of the string Hello foreign code !. The new offset = 20 part indicates the segment was read up to the 20th byte / character, and new length = 1 suggests toJavaString needs to read an additional character but can’t.

The javadoc of toJavaString says (emphasis is mine) :

Converts a null-terminated C string stored at given address into a Java string, using the platform’s default charset.

This immediately clicked: nothing in libsodium implies the message is a string. Its API takes a pointer to a memory region and the length to read in that memory region. For that matter, the message could be any binary payload.

Let’s look at the string Hello

  1. Libsodium’s seal function will be passed the following byte array: CLinker.toCString("Hello", StandardCharsets.US_ASCII).toByteArray() → 48656C6C6F00

  2. But since the code is using String::length, libsodium will only seal up to 5 bytes: 48656C6C6F.

  3. Then, when opening the seal, the content of the MemorySegment that contains the decrypted message will be 48656C6C6F.

  4. But CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII) expects the memory segment to be a valid C string, terminated by the \0 character. Since the actual decrypted memory segment is not terminated by '\0', the code emits an error.

This suggests the code to use is new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII). There are other possibilities, like not using CLinker::toCString with the crypto_box_seal function, or incrementing the length by 1 when CLinker::toCString is used so that the terminator is sealed too.

For reference, here are the bytes returned by String::getBytes and CLinker::toCString.

  • "Hello".getBytes(US_ASCII) → 48656C6C6F

  • CLinker.toCString("Hello", US_ASCII).toByteArray() → 48656C6C6F00
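The difference between the two byte arrays, and the reason a strlen-style scan runs past the end of an unterminated buffer, can be reproduced without any native code. The following is a minimal sketch in plain Java; CStringBytes and its methods are hypothetical helpers written for this post, not the JDK implementation of toCString or toJavaString.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CStringBytes {

    // Bytes of the string alone, as String::getBytes returns them.
    static byte[] rawBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Same bytes followed by a single 0 byte, the way a C string is laid out.
    static byte[] cStringBytes(String s) {
        byte[] raw = rawBytes(s);
        return Arrays.copyOf(raw, raw.length + 1); // copyOf pads with zeros
    }

    // Walks the buffer until a 0 byte is found, like a bounds-checked strlen.
    // Without a terminator the array access ends up out of bounds.
    static int strlen(byte[] buffer) {
        for (int offset = 0; ; offset++) {
            if (buffer[offset] == 0) {
                return offset;
            }
        }
    }

    // True when scanning the buffer for a terminator leaves its bounds.
    static boolean strlenFails(byte[] buffer) {
        try {
            strlen(buffer);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Upper-case hexadecimal dump of the bytes.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(rawBytes("Hello")));
        System.out.println(hex(cStringBytes("Hello")));
        System.out.println(strlen(cStringBytes("Hello")));
        System.out.println(strlenFails(rawBytes("Hello")));
    }
}
```

Decoding with an explicit length, as new String(bytes, charset) does, avoids the scan entirely, which is why it works on the unterminated decrypted buffer.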

For this blog post I’d like to keep the assumption the sealed message is a String, which leads to the following correct code :

try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(
        decipheredText.address(),
        scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(),
        scope.allocateArray(C_CHAR, secretkey).address());
    return new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII);
}

Also, I have intentionally left out the returned status of crypto_box_seal_open to focus on the foreign module API, but it would make sense to check the returned value before returning the buffer, as suggested in the libsodium documentation.

I didn’t cover everything this API has to offer, like the upcall stubs, which are a way to pass a function pointer to the native code, nor did I cover every feature of JEP-389, like the MemorySegment or MemoryLayout API.

At this time I find this API a pleasure to use compared to JNI. Note that I don’t have experience with JNA, so I may be lacking perspective there.

There are a few pitfalls, like CLinker::toJavaString or the MemorySegment lifecycles, which get more complicated if those segments are shared between threads. I found the API well-designed and well documented, but if you’re a novice in this area, you’ll likely need other materials. Package-wide documentation, in jdk.incubator.foreign, should definitely fill this gap in my opinion.

The chosen example was concise in native code, but writing the stubs in Java quickly becomes tedious and verbose. JDK developers felt the same way, as they are also investing energy in a tool named jextract whose goal is to reduce the amount of tedious work. I’ll show in a section below what can be done with the current state of jextract.

MemorySegments have the same constraints as DirectByteBuffers, i.e. by default the size of the segment can’t go over Runtime.getRuntime().maxMemory().

Allocating a segment much bigger than maxMemory
Exception in thread "main" java.lang.OutOfMemoryError: Cannot reserve 2147483648 bytes of direct buffer memory (allocated: 8192, limit: 522190848)

This limit is configurable by setting the -XX:MaxDirectMemorySize={size} flag.

var memorySegment = MemorySegment.allocateNative(nativeSegmentSize);
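Since the error message above is the one of direct buffers, the ceiling can also be observed with the long-standing ByteBuffer API. The sketch below makes one assumption worth stating: that on HotSpot, when -XX:MaxDirectMemorySize is not set, the ceiling equals Runtime.getRuntime().maxMemory(). The class DirectMemoryProbe and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

final class DirectMemoryProbe {

    // Assumed default ceiling for direct memory when -XX:MaxDirectMemorySize is not set.
    static long defaultCeilingBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Tries to reserve a direct buffer; false when the JVM refuses with an OutOfMemoryError.
    static boolean canAllocateDirect(int bytes) {
        try {
            ByteBuffer buffer = ByteBuffer.allocateDirect(bytes);
            return buffer.capacity() == bytes;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("assumed ceiling: " + defaultCeilingBytes() + " bytes");
        System.out.println("1 KiB direct buffer: " + canAllocateDirect(1024));
    }
}
```

Note that ByteBuffer.allocateDirect takes an int, so a direct buffer cannot reach 2 GiB, while MemorySegment.allocateNative takes a long size.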

There’s one interesting thing with this API: it is possible to access the address via MemorySegment::address, and one can get its hexadecimal representation via Long.toHexString(memorySegment.address().toRawLongValue()).

MemoryAddress{ base: null offset=0x7fc513fff010 }

If you are on Linux, you can use pmap from the procps package to inspect the memory mappings of the JVM.

pmap output of a 2GiB native segment
151: java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign -XX:MaxDirectMemorySize=2100m MemorySegments.java
Address Kbytes RSS Dirty Mode Mapping
...
0000557635ba1000 4 0 0 r-x-- java
0000557635ba3000 4 0 0 r---- java
0000557635ba4000 4 0 0 rw--- java
0000557636d4b000 132 16 16 rw--- [ anon ]
00007fc513fff000 2097156 1811456 1811456 rw--- [ anon ] (1)
00007fc594000000 132 0 0 rw--- [ anon ]
00007fc594021000 65404 0 0 ----- [ anon ]
...
1 This is the mapping that holds the allocated segment. 2 GiB ⇐⇒ 2097152 KiB, so this mapping is larger than the segment by one page (4 KiB). And in fact the base address of the segment is 0x7fc513fff010, 16 bytes after the start of the mapping.

In this case the difference is not related to alignment, although alignment could produce a similar gap. What is important is that the address of a MemorySegment may be contained in a larger memory mapping.
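The numbers from the pmap output can be checked with a bit of arithmetic. The sketch below hard-codes the values shown above and assumes the segment is exactly 2 GiB (2147483648 bytes); the class MappingArithmetic is a hypothetical helper.

```java
final class MappingArithmetic {

    static final long MAPPING_START = 0x7fc513fff000L; // start of the anon mapping in pmap
    static final long MAPPING_KIB = 2_097_156L;        // Kbytes column of that mapping
    static final long SEGMENT_BASE = 0x7fc513fff010L;  // address reported by the MemorySegment
    static final long SEGMENT_BYTES = 2L * 1024 * 1024 * 1024; // assumed segment size, 2 GiB

    // First address after the mapping.
    static long mappingEnd() {
        return MAPPING_START + MAPPING_KIB * 1024;
    }

    // First address after the segment.
    static long segmentEnd() {
        return SEGMENT_BASE + SEGMENT_BYTES;
    }

    // Bytes of the mapping located before the segment.
    static long leadingBytes() {
        return SEGMENT_BASE - MAPPING_START;
    }

    // Bytes of the mapping located after the segment.
    static long trailingBytes() {
        return mappingEnd() - segmentEnd();
    }

    // True when the whole segment lies inside the mapping.
    static boolean segmentFitsInMapping() {
        return leadingBytes() >= 0 && trailingBytes() >= 0;
    }

    public static void main(String[] args) {
        System.out.println("mapping end: 0x" + Long.toHexString(mappingEnd()));
        System.out.println("leading bytes: " + leadingBytes());
        System.out.println("trailing bytes: " + trailingBytes());
        System.out.println("segment fits: " + segmentFitsInMapping());
    }
}
```

The mapping ends at 0x7fc594000000, which is exactly where the next mapping of the pmap output starts, and the 16 leading plus 4080 trailing bytes add up to the extra 4 KiB page.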

One important and useful distinction with DirectByteBuffers is the presence of a MemorySegment::close method, which immediately frees the native mapping when called. DirectByteBuffers used to be challenging because they had no explicit method to free the native mapping, and as such had to wait for the GC to kick in to be freed.

Another thing to keep in mind is that the memory mapping is zeroed, which means a big segment will take a noticeable time to get initialized. As with DirectByteBuffers this pattern is interesting when inspecting off-heap memory.

Usually it is more practical to use the NativeScope API, as it is easier to reason about the boundaries of the involved memory mapping. Using a larger MemorySegment could be interesting when it has to be sliced and shared among various threads. Also, given the high initialization cost of large segments, such a segment is likely to have the same lifecycle as the application. Typically, in a few years, Netty, Aeron, Kafka, Cassandra, …​ could make use of this API!

One thing that caught me off-guard is that closing a slice (created by MemorySegment::asSlice) also closes the underlying segment.

Finally, when the code requires a new native allocation, the JVM appears to be able to grow native mappings. In short, the JVM tries to put these segments in a bigger memory mapping.

The access modes allow defining a set of permissions on the MemorySegment; by default all permissions are given. In the example below the segment won’t be readable.

var ms = MemorySegment.allocateNative(segmentSize)
                      .withAccessModes(MemorySegment.WRITE | MemorySegment.CLOSE);
ms.asByteBuffer().getLong(); (1)
1 Throws UnsupportedOperationException: Required access mode READ ; current access modes: [WRITE, CLOSE]

I am not quite sure how to use these at this time. It certainly would be useful to prevent a slice from being closed though.

Also, the WRITE and READ permissions only apply to the Java object; the native memory mapping isn’t affected, which is expected since it can hold multiple MemorySegments.

Until JEP-389, we used a FileChannel and a MappedByteBuffer to memory map a file. JEP-389 also takes care of this use case, with the mapFile factory method.

try (var mmaped = MemorySegment.mapFile(
        path, (1)
        0, (2)
        Files.size(path), (3)
        FileChannel.MapMode.READ_ONLY (4)
)) {
    // ...
}
1 A path eg Path.of("…​")
2 The base offset
3 The size of the mapping, here the complete file
4 The mapping mode

What is really nice here is that the MemorySegment is also immediately freed when the code leaves the try-with-resources block.
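For comparison, here is a minimal sketch of the older approach with FileChannel and MappedByteBuffer, using only java.nio. The class MappedByteBufferDemo and the temporary file are hypothetical and only there to make the example self-contained. Closing the channel does not release the mapping; the buffer stays mapped until it is garbage collected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedByteBufferDemo {

    // Maps the whole file read-only and decodes its content as US-ASCII.
    static String readMapped(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(
                    FileChannel.MapMode.READ_ONLY, // the mapping mode
                    0,                             // the base offset
                    channel.size());               // the size of the mapping, here the complete file
            byte[] content = new byte[buffer.remaining()];
            buffer.get(content);
            return new String(content, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file, then reads it back through a mapping.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("mapped", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.US_ASCII));
            return readMapped(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello foreign code !"));
    }
}
```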

I mentioned that MemorySegment implements AutoCloseable; this won’t be the case in the next JDK release. In the same manner, I mentioned NativeScope earlier, which is a JDK 16 API, but in the current state of Panama it will be replaced by a slightly different construct.

try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment.allocateNative(layout, scope);
    MemorySegment.mapFile(… , scope);
    CLinker.upcallStub(… , scope);
}

Given the current state I have doubts JEP-389 will get out of incubating for JDK 17. JEP-389 is working well, but I think the developers may need more time to get this API right. They are doing a fantastic job in my opinion.

jextract is still being baked and was not ready to be included in JDK 16 for incubation, but since it complements JEP-389, I wanted to give it a try and showcase its usefulness.

This tool leverages the native libclang and as such the jdk.incubator.foreign module.

In order to be able to use it, one should download the Panama JDK here: https://jdk.java.net/panama/. Don’t be scared by the early access label, JDK 17 (very early at this stage) or the other warnings; you just need jextract, not the Panama JDK itself.

When I started to bootstrap work on JDK 16 and libsodium, the built Panama JDK didn’t contain jextract. As I wasn’t sure, I voiced this on Twitter, and Oracle engineers confirmed to me this was a bug in the release (JDK-8261733). If this ever happens again, or if you want to try the latest jextract, you’ll need to build the Panama JDK.

Again, the jextract tool is still being baked at this time. That means that everything below can become obsolete at any time.

The first thing I need is to get the headers of libsodium, and for that I cloned the repo. Then checked out the 1.0.18 tag as I intend to target this released binary.

$ git clone https://github.com/jedisct1/libsodium.git
Cloning into 'libsodium'...
remote: Enumerating objects: 151, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 32369 (delta 74), reused 86 (delta 41), pack-reused 32218
Receiving objects: 100% (32369/32369), 8.24 MiB | 10.52 MiB/s, done.
Resolving deltas: 100% (19205/19205), done.
$ git checkout 1.0.18

Headers are located in the src/libsodium/include folder. Now let’s use jextract.

First contact with jextract
$ jextract -d libsodium-jextract \ (1)
    -l sodium \ (2)
    --target-package com.github.bric3.sodium \ (3)
    -I src/libsodium/include/ \ (4)
    -I src/libsodium/include/sodium \ (4)
    --filter sodium.h \ (5)
    src/libsodium/include/sodium.h (6)
src/libsodium/include/sodium/export.h:5:10: fatal error: 'stddef.h' file not found
1 Destination of the generated sources
2 Name without the JNI prefix and suffix (or path) of the library to load
3 Indicates the target package of the generated source
4 Includes of the library (some files include others in the library)
5 Only includes symbols from the given file, otherwise symbols of other includes may be extracted
6 The C header file

Obviously the standard C headers are not discovered by jextract. I tried to solve this by declaring the system includes in /usr/include and /usr/include/linux (/usr/include/linux/stddef.h), but the error only moved a bit further, to unknown type name 'size_t'. It is a known issue that on some platforms jextract has trouble finding the system headers (JDK-8262127).

size_t is a standard C alias representing an unsigned integer type. I found help in this old thread from November 2018. Instead of using the includes under /usr/include, it is necessary to use the includes of the compiler; on my docker image they were located here: /usr/lib/gcc/x86_64-redhat-linux/8/include.

Also I noticed that jextract generates classes by default, but you can pass a --source option to configure it to generate sources instead.

On the next run of jextract the extraction process stopped on the file version.h.

Includes the compiler headers
$ jextract \
    -d libsodium-jextract \
    -l sodium \
    --source \ (1)
    --target-package com.github.bric3.sodium \
    -I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (2)
    -I src/libsodium/include/ \
    -I src/libsodium/include/sodium \
    --filter sodium.h \
    src/libsodium/include/sodium.h
src/libsodium/include/sodium.h:5:10: fatal error: 'sodium/version.h' file not found
1 Generates the sources
2 The compiler includes installed on this Linux image

In the libsodium repository there’s a file named version.h.in, and upon inspection of its content I noticed placeholders that suggest a preliminary phase in the libsodium build will generate the final version.h. In native sources this usually happens via a combination of ./autogen.sh and ./configure.

Let’s prepare the code base.

Configure libsodium codebase
$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: creating directory build-aux
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: configure.ac: not using Autoheader
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:75: installing 'build-aux/compile'
configure.ac:9: installing 'build-aux/config.guess'
configure.ac:9: installing 'build-aux/config.sub'
configure.ac:10: installing 'build-aux/install-sh'
configure.ac:10: installing 'build-aux/missing'
src/libsodium/Makefile.am: installing 'build-aux/depcomp'
parallel-tests: installing 'build-aux/test-driver'
autoreconf: Leaving directory `.'
Downloading config.guess and config.sub...
Done.
$ ./configure
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether UID '0' is supported by ustar format... yes
checking whether GID '0' is supported by ustar format... yes
checking how to create a ustar tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating builds/Makefile
config.status: creating contrib/Makefile
config.status: creating dist-build/Makefile
config.status: creating libsodium.pc
config.status: creating libsodium-uninstalled.pc
config.status: creating msvc-scripts/Makefile
config.status: creating src/Makefile
config.status: creating src/libsodium/Makefile
config.status: creating src/libsodium/include/Makefile
config.status: creating src/libsodium/include/sodium/version.h (1)
config.status: creating test/default/Makefile
config.status: creating test/Makefile
config.status: executing depfiles commands
config.status: executing libtool commands
1 Configuring version.h with version values

Finally, this time jextract worked as expected.

$ jextract \
    -d libsodium-jextract \
    -l sodium \
    --source \
    --target-package com.github.bric3.sodium \
    -I /usr/lib/gcc/x86_64-redhat-linux/8/include \
    -I src/libsodium/include/ \
    -I src/libsodium/include/sodium \
    --filter sodium.h \
    src/libsodium/include/sodium.h

However, when I opened sodium_h.java it was empty.

public final class sodium_h {
    /* package-private */ sodium_h() {}
}

In the 1.x tree the sodium.h file only includes other headers. When I explicitly filtered on sodium.h, jextract dropped the symbols declared in these included headers.

How can I keep the declarations of the other headers? At this time the jextract help is a bit vague.

$ jextract --help
Non-option arguments:
[String] -- header file

Option                         Description
------                         -----------
-?, -h, --help                 print help
-C <String>                    pass through argument for clang
-I <String>                    specify include files path
-d <String>                    specify where to place generated files
--filter <String>              header files to filter
-l <String>                    specify a library
--source                       generate java sources
-t, --target-package <String>  target package for specified header files

Looking at the jextract source code was the way to go. First, the code suggests that it is possible to pass multiple filters (--filter), just like it is possible to pass multiple includes (-I). That is not very practical with many values though; isn't it possible to pass a pattern?

This is answered in the document Using the jextract tool, or in the source code of the Filter class: it is possible to pass --filter a part of the path, as the current code just verifies that this string is contained in the header path.

Concretely I can use the string sodium as a filter to include headers located in include/sodium/ folder.
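To make the substring behaviour concrete, here is a minimal sketch. It is not the actual jextract Filter class: the class name HeaderFilterSketch, the keep method and the header paths are made up for illustration, and it only assumes that a header is kept when its path contains the filter string.

```java
import java.util.List;
import java.util.stream.Collectors;

final class HeaderFilterSketch {
    // Hypothetical helper: keeps the headers whose path contains the filter string.
    static List<String> keep(List<String> headerPaths, String filter) {
        return headerPaths.stream()
                .filter(path -> path.contains(filter))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative paths only.
        List<String> headers = List.of(
                "src/libsodium/include/sodium.h",
                "src/libsodium/include/sodium/crypto_box.h",
                "/usr/include/stddef.h");

        // Only the umbrella header matches, so declarations from the included headers are dropped.
        System.out.println(keep(headers, "sodium.h"));
        // Both libsodium headers match, the system header is still excluded.
        System.out.println(keep(headers, "sodium"));
    }
}
```

With such a rule, a short string like sodium is enough to select every header under include/sodium/ while leaving system headers out.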

$ jextract \
  -d libsodium-jextract \ (1)
  --source \ (2)
  --target-package com.github.bric3.sodium \ (3)
  -l sodium \ (4)
  -I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (5)
  -I src/libsodium/include/ \ (6)
  -I src/libsodium/include/sodium \ (6)
  --filter sodium \ (7)
  src/libsodium/include/sodium.h (8)
1 Destination of the generated sources
2 Extracts, or more precisely generates, sources instead of classes
3 Indicates the target package of the generated source
4 Name without the JNI prefix and suffix (or path) of the library to load
5 Includes C definitions or includes like size_t, stddef.h etc.
6 Includes of the library (some files include others in the library)
7 Only includes symbols from headers whose path contains the given string, otherwise symbols of other includes may be extracted
8 The C header file
$ ls -lh libsodium-jextract-f/com/github/bric3/sodium/
total 956K
-rw-r--r--. 1 root root 557 Feb 16 14:10 C.java
-rw-r--r--. 1 root root 8.8K Feb 16 14:10 RuntimeHelper.java
-rw-r--r--. 1 root root 350K Feb 16 14:10 sodium_h.java
-rw-r--r--. 1 root root 124K Feb 16 14:10 sodium_h_0.java
-rw-r--r--. 1 root root 329K Feb 16 14:10 sodium_h_constants_0.java
-rw-r--r--. 1 root root 131K Feb 16 14:10 sodium_h_constants_1.java

Let’s have a look at what jextract generated. The entry point is the class sodium_h. In particular, let’s compare the method stubs to those I wrote earlier:

  • crypto_box_sealbytes

  • crypto_box_keypair

  • crypto_box_seal

  • crypto_box_seal_open

The libsodium headers declare a function named crypto_box_sealbytes, whose role is to return the constant crypto_box_SEALBYTES. This constant is defined with the C preprocessor directive #define, so it is not visible as a symbol when performing a library lookup. The native crypto_box_sealbytes function compensates for this limitation.

jextract, however, reads the headers, and in doing so it actually extracts the constant crypto_box_SEALBYTES. It is still exposed as a method, and it is declared in a different class: sodium_h_0#crypto_box_SEALBYTES.

Note that sodium_h extends sodium_h_0, so one will write

sodium_h.crypto_box_SEALBYTES()

Behind the scenes this call invokes sodium_h_constants_1#crypto_box_SEALBYTES. As for sodium_h, the constants are split in two classes due to the class limits: sodium_h_constants_1 extends sodium_h_constants_0.
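The reason sodium_h.crypto_box_SEALBYTES() compiles even though the method is declared in sodium_h_0 is plain Java static member inheritance. The sketch below mimics that layout with made-up names (StaticInheritanceSketch, Constants, Header0, Header) and a placeholder value; it is not the generated code.

```java
final class StaticInheritanceSketch {
    // Stand-in for sodium_h_constants_1: holds the value itself.
    static class Constants {
        static final long SEAL_BYTES = 48L; // placeholder value for illustration
    }

    // Stand-in for sodium_h_0: declares the accessor method.
    static class Header0 {
        static long sealBytes() {
            return Constants.SEAL_BYTES;
        }
    }

    // Stand-in for sodium_h: declares nothing, but inherits the static accessor.
    static final class Header extends Header0 {
    }

    public static void main(String[] args) {
        // The call is written against the subclass, yet resolves to Header0.sealBytes().
        System.out.println(Header.sealBytes());
    }
}
```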

When I accessed this constant for the first time, I got this error:

java.lang.ExceptionInInitializerError
    at com.github.bric3.sodium.sodium_h_0.crypto_box_PUBLICKEYBYTES(sodium_h_0.java:1511)
    at com.github.bric3.sodium.Libsodium$JextractedLibsodium.crypto_box_keypair(Libsodium.java:263)
    at com.github.bric3.sodium.LibsodiumTest.can_invoke_crypto_box_keypair(LibsodiumTest.java:44)
Caused by: java.lang.IllegalArgumentException: Library not found: sodium
    at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.lookup(LibrariesHelper.java:94)
    at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.loadLibrary(LibrariesHelper.java:60)
    at jdk.incubator.foreign/jdk.incubator.foreign.LibraryLookup.ofLibrary(LibraryLookup.java:150)
    at com.github.bric3.sodium.RuntimeHelper.lambda$libraries$0(RuntimeHelper.java:46)
    at com.github.bric3.sodium.RuntimeHelper.libraries(RuntimeHelper.java:49)
    at com.github.bric3.sodium.sodium_h_constants_0.<clinit>(sodium_h_constants_0.java:14)

The stacktrace points to this code:

sodium_h_constants_0.LIBRARIES
static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
    "sodium", (1)
});
1 This is the value I passed to the jextract command.

The value above is the value I used in the -l sodium option of jextract, yet this value here is obviously incorrect for my use case.

Work-around 1: With jextract

It is not yet clear from the jextract usage description at this time, but the -l option accepts either:

  1. A library name, which has to be available on one of the paths declared in the JVM system property java.library.path

    linux

    /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib

    macOS

    /Users/bric3/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.

    The library file name must conform to JNI conventions; libsodium.23.dylib or libtasn1.so.6.5.5 won’t work as they contain version numbers.

  2. Or an absolute path, e.g. /usr/local/opt/libsodium/lib/libsodium.23.dylib.
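To check what the first form resolves to on a given machine, the standard JDK can print both the platform file name for a library name and the search path. This is a small sketch with a made-up class name (LibraryNameSketch); it assumes the lookup follows the same naming convention as System.loadLibrary.

```java
import java.io.File;

final class LibraryNameSketch {
    // Maps a bare library name to the platform-specific file name,
    // e.g. libsodium.so on Linux or libsodium.dylib on macOS.
    static String platformFileName(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        System.out.println(platformFileName("sodium"));

        // Directories searched when a library is referenced by name only.
        String libraryPath = System.getProperty("java.library.path", "");
        for (String dir : libraryPath.split(File.pathSeparator)) {
            System.out.println(dir);
        }
    }
}
```

A versioned file such as libsodium.23.dylib does not match the mapped name, which is why it cannot be found by name.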

However, the actual library path depends on the system, on the library version and on the installation mechanism. I could have used jextract with -l /usr/local/opt/libsodium/lib/libsodium.23.dylib, but then the generated code cannot run on Linux without modifications.

My final objective for this code is to declare the libsodium bindings in Java, and link with the actual libsodium on the platform, macOS or Linux.

Work-around 2: Modify generated code

LIBRARIES is a static final variable that is used by other static variables in the same class. While it is possible to edit the sodium_h_constants_0 class, it is still difficult to make this LibraryLookup code configurable without a significant refactoring.

Oracle engineers are aware of this problem (JDK-8262126), so we might see it fixed in the final JEP-389 release.

For this article, the easiest solution is to declare the local libsodium path in the code, as I did in the first section of this blog.

static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
    "/usr/local/opt/libsodium/lib/libsodium.23.dylib"
});

Later I’ll rework this initialization with custom code that finds the actual libsodium on the current platform.
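Such custom code could look like the sketch below: try a list of candidate locations per operating system and keep the first one that exists. Everything here is an assumption for illustration (the class name LibsodiumLocatorSketch, the method names, and the candidate paths other than the Homebrew one used above); real installations may put the library elsewhere.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Predicate;

final class LibsodiumLocatorSketch {
    // Hypothetical candidate locations, chosen from the value of the os.name property.
    static List<Path> candidates(String osName) {
        if (osName.toLowerCase(Locale.ROOT).contains("mac")) {
            return List.of(
                    Path.of("/usr/local/opt/libsodium/lib/libsodium.23.dylib"),
                    Path.of("/usr/local/lib/libsodium.dylib"));
        }
        return List.of(
                Path.of("/usr/lib64/libsodium.so.23"),
                Path.of("/usr/lib/libsodium.so.23"),
                Path.of("/usr/local/lib/libsodium.so"));
    }

    // Returns the first candidate accepted by the predicate, in declaration order.
    static Optional<Path> firstMatching(List<Path> candidates, Predicate<Path> exists) {
        return candidates.stream().filter(exists).findFirst();
    }

    public static void main(String[] args) {
        String found = firstMatching(candidates(System.getProperty("os.name")), Files::isRegularFile)
                .map(Path::toString)
                .orElse("libsodium not found in the candidate locations");
        System.out.println(found);
    }
}
```

The resulting absolute path could then be handed to the library lookup instead of a hard-coded value. Passing the existence check as a predicate keeps the selection logic testable without touching the file system.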

Now let’s profit from the generated functions. In the same order as before, I start with crypto_box_keypair, which is straightforward. The arguments are still carrier types like MemorySegment, which means we still need to take care of the scope and lifecycle of these allocations.

try (var scope = NativeScope.unboundedScope()) {
    var recipientPublicKey = scope.allocate(sodium_h.crypto_box_PUBLICKEYBYTES());
    var recipientSecretKey = scope.allocate(sodium_h.crypto_box_SECRETKEYBYTES());
    sodium_h.crypto_box_keypair(recipientPublicKey, recipientSecretKey); (1)
    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}

The IDE might suggest a method named crypto_box_keypair$MH; the suffix $MH simply indicates that it returns the MethodHandle for this native method, which is basically what I showed in the first part of this blog post.

By reflex, I always like to navigate the code I’m invoking. The methods we invoke here are just the public API methods: they check for null and declare a correct call site (correct return type, correct argument types).

sodium_h.crypto_box_keypair
public static MethodHandle crypto_box_keypair$MH() {
    return RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(), "unresolved symbol: crypto_box_keypair");
}
public static int crypto_box_keypair ( Addressable pk, Addressable sk) {
    var mh$ = RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(), "unresolved symbol: crypto_box_keypair");
    try {
        return (int)mh$.invokeExact(pk.address(), sk.address());
    } catch (Throwable ex$) {
        throw new AssertionError("should not reach here", ex$);
    }
}
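The same wrapper pattern (a handle resolved once, a null check, then invokeExact with an exactly typed call site) can be tried without any native library. The sketch below applies it to Math.max from the standard library; the class name HandleWrapperSketch and its members are made up, and it only illustrates the shape of the generated wrapper, not the foreign linker itself.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Objects;

final class HandleWrapperSketch {
    // Resolved once, like the generated $MH_ constants; null mirrors an unresolved symbol.
    private static final MethodHandle MAX = lookupMax();

    private static MethodHandle lookupMax() {
        try {
            return MethodHandles.lookup().findStatic(
                    Math.class, "max", MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    // Typed wrapper: the cast and the argument types must match the handle type exactly.
    static int max(int a, int b) {
        MethodHandle mh = Objects.requireNonNull(MAX, "unresolved symbol: max");
        try {
            return (int) mh.invokeExact(a, b);
        } catch (Throwable ex) {
            throw new AssertionError("should not reach here", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7)); // prints 7
    }
}
```

invokeExact performs no conversion, so the wrapper is the place where Java-friendly parameter types are turned into exactly what the handle expects.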

Going further down to see how the MethodHandle is declared:

sodium_h_constants_0.crypto_box_keypair$MH
static final FunctionDescriptor crypto_box_keypair$FUNC_ = FunctionDescriptor.of(
    C_INT,
    C_POINTER,
    C_POINTER
);
static final MethodHandle crypto_box_keypair$MH_ = RuntimeHelper.downcallHandle(
    LIBRARIES, "crypto_box_keypair",
    "(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I", (1)
    crypto_box_keypair$FUNC_, false
);
static final java.lang.invoke.MethodHandle crypto_box_keypair$MH() { return crypto_box_keypair$MH_; }
1 Note that the Java method signature is declared with a String instead of the Java API MethodType.

This code creates the down-call stub. The only difference with the handcrafted handle in the section above is that the method signature is declared as a String.

(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I breakdown
  • Ljdk/incubator/foreign/MemoryAddress ⇒ arg0

  • Ljdk/incubator/foreign/MemoryAddress ⇒ arg1

  • I ⇒ int return type
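A descriptor string like this one can be turned back into a MethodType with the standard java.lang.invoke API. The sketch below uses java.lang.String parameters instead of MemoryAddress so that it runs without the incubator module; the class name DescriptorSketch is made up for illustration.

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {
    // Parses a JVM method descriptor, resolving class names with this class's loader.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        // Same shape as the generated descriptor: two reference arguments, int return.
        MethodType type = parse("(Ljava/lang/String;Ljava/lang/String;)I");

        System.out.println(type.parameterType(0)); // arg0
        System.out.println(type.parameterType(1)); // arg1
        System.out.println(type.returnType());     // return type

        // The String form and the MethodType API describe the same signature.
        MethodType sameType = MethodType.methodType(int.class, String.class, String.class);
        System.out.println(type.equals(sameType));
    }
}
```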

The other two methods in this example, crypto_box_seal and crypto_box_seal_open, are similar and don’t require writing the tedious handle declarations by hand.

Some of the C types involved, unsigned ones in particular, raised a few questions about how to map them in Java in the first section, where I used jdk.incubator.foreign manually. Also, there is a statement at this time about jextract not supporting some wide types:

  • jextract does not support certain C types bigger than 64 bits (e.g. long double).

How does it handle these unsupported types? The answer is in the source code.

There we learn that unsigned types are represented with their signed counterpart, and that the types wider than 64 bits are represented with a specific unsupported layout during header processing. The symbols with unsupported layouts won’t be generated, as the JEP-389 linker won’t be able to link them.
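On the Java side, a value that C declares as unsigned therefore arrives in a signed carrier with the same bits. The standard library offers helpers to read those bits as unsigned; the sketch below (class and method names are made up) shows the idea for a 64-bit and an 8-bit value.

```java
final class UnsignedSketch {
    // Reads the 64 bits of a Java long as a C unsigned long long.
    static String asUnsignedDecimal(long bits) {
        return Long.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        long allBitsSet = -1L; // same bit pattern as ULLONG_MAX in C

        System.out.println(asUnsignedDecimal(allBitsSet));             // prints 18446744073709551615
        System.out.println(Long.compareUnsigned(allBitsSet, 1L) > 0);  // prints true

        byte rawByte = (byte) 0xFF; // an unsigned char holding 255
        System.out.println(Byte.toUnsignedInt(rawByte));               // prints 255
    }
}
```

The bits passed to and from the native function are unchanged; only the interpretation on the Java side needs care, for example when comparing or printing lengths.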

Some details on jextract's primitive types handling

The enum below, from jextract, shows how native primitive types are mapped to their respective memory layout, whether they are supported or not.

enum Kind {
    /** {@code void} type. */
    Void("void", null),
    /** {@code Bool} type. */
    Bool("_Bool", CLinker.C_CHAR),
    /** {@code char} type. */
    Char("char", CLinker.C_CHAR),
    /** {@code char16} type. */
    Char16("char16", UnsupportedLayouts.CHAR16),
    /** {@code short} type. */
    Short("short", CLinker.C_SHORT),
    /** {@code int} type. */
    Int("int", CLinker.C_INT),
    /** {@code long} type. */
    Long("long", CLinker.C_LONG),
    /** {@code long long} type. */
    LongLong("long long", CLinker.C_LONG_LONG),
    /** {@code int128} type. */
    Int128("__int128", UnsupportedLayouts.__INT128),
    /** {@code float} type. */
    Float("float", CLinker.C_FLOAT),
    /** {@code double} type. */
    Double("double",CLinker.C_DOUBLE),
    /** {@code long double} type. */
    LongDouble("long double", UnsupportedLayouts.LONG_DOUBLE),
    /** {@code float128} type. */
    Float128("float128", UnsupportedLayouts._FLOAT128),
    /** {@code float16} type. */
    HalfFloat("__fp16", UnsupportedLayouts.__FP16),
    /** {@code wchar} type. */
    WChar("wchar_t", UnsupportedLayouts.WCHAR_T);

Those types can be qualified, in particular integer types can be unsigned:

jdk.internal.jextract.impl.TypeMaker#makeTypeInternal
case UShort: {
    Type chType = Type.primitive(Primitive.Kind.Short);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UInt: {
    Type chType = Type.primitive(Primitive.Kind.Int);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULong: {
    Type chType = Type.primitive(Primitive.Kind.Long);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULongLong: {
    Type chType = Type.primitive(Primitive.Kind.LongLong);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UChar: {
    Type chType = Type.primitive(Primitive.Kind.Char);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}

Going further, we can see that signed and unsigned integers use the same memory layout, e.g. long long and unsigned long long both use the layout C_LONG_LONG.

public static MemoryLayout getLayout(Type t) {
    Supplier<UnsupportedOperationException> unsupported = () ->
            new UnsupportedOperationException("unsupported: " + t.kind());
    switch(t.kind()) {
        case UChar, Char_U:
        case SChar, Char_S:
            return Primitive.Kind.Char.layout().orElseThrow(unsupported);
        case Short:
        case UShort:
            return Primitive.Kind.Short.layout().orElseThrow(unsupported);
        case Int:
        case UInt:
            return Primitive.Kind.Int.layout().orElseThrow(unsupported);
        case ULong:
        case Long:
            return Primitive.Kind.Long.layout().orElseThrow(unsupported);
        case ULongLong:
        case LongLong:
            return Primitive.Kind.LongLong.layout().orElseThrow(unsupported); (1)
        case UInt128:
        case Int128:
            return Primitive.Kind.Int128.layout().orElseThrow(unsupported); (2)
        case Enum:
            return valueLayoutForSize(t.size() * 8).layout().orElseThrow(unsupported);
        case Bool:
            return Primitive.Kind.Bool.layout().orElseThrow(unsupported);
        case Float:
            return Primitive.Kind.Float.layout().orElseThrow(unsupported);
        case Double:
            return Primitive.Kind.Double.layout().orElseThrow(unsupported);
        case LongDouble:
            return Primitive.Kind.LongDouble.layout().orElseThrow(unsupported);
        case Complex:
            throw new UnsupportedOperationException("unsupported: " + t.kind());
        case Record:
            return getRecordLayout(t);
        case Vector:
            return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
        case ConstantArray:
            return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
        case IncompleteArray:
            return MemoryLayout.ofSequence(getLayout(t.getElementType()));
        case Unexposed:
            Type canonical = t.canonicalType();
            if (canonical.equalType(t)) {
                throw new TypeMaker.TypeException("Unknown type with same canonical type: " + t.spelling());
            }
            return getLayout(canonical);
        case Typedef:
        case Elaborated:
            return getLayout(t.canonicalType());
        case Pointer:
        case BlockPointer:
            return C_POINTER;
        default:
            throw new UnsupportedOperationException("unsupported: " + t.kind());
    }
}
1 C_LONG_LONG will be used for both long long and unsigned long long.
2 Native types longer than 64 bits are still represented internally by jextract.

jextract identifies unsupported types and represents them correctly during the C header processing, but the symbols that use them are skipped during the Java generation.

private static final String ATTR_LAYOUT_KIND = "jextract.abi.unsupported.layout.kind";

public static final ValueLayout __INT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "__int128");
public static final ValueLayout LONG_DOUBLE = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "long double");
public static final ValueLayout _FLOAT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "_float128");
public static final ValueLayout __FP16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "__fp16");
public static final ValueLayout CHAR16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "char16");
public static final ValueLayout WCHAR_T = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "wchar_t");

static boolean isUnsupported(MemoryLayout vl) { (1)
    return vl.attribute(ATTR_LAYOUT_KIND).isPresent();
}

static String getUnsupportedTypeName(MemoryLayout vl) {
    return (String) vl.attribute(ATTR_LAYOUT_KIND).orElseThrow(IllegalArgumentException::new);
}
1 Invoked during java representation generation.

In the end jextract is useful, but there are a few little hiccups along the way. The generated code is currently lacking in usability. It is also a tad verbose; I wish there were a way to eliminate some unneeded generated methods. Using jextract is a bit obscure as well: there are a few pitfalls, and it may require peeking at the jdk.incubator.jextract source code (in the panama repository).

While I mention these points, this should not diminish the work done on this tool and what it could become. When ready, it could be leveraged by Gradle, JetBrains IntelliJ IDEA, etc.
