Introducing Package Diff


GitHub is a great tool for viewing changes to code repositories, and it is the de facto source code host for the Open Source community. Likewise, npm is a great package registry for distributing JavaScript modules, and is the registry of choice for the Node.js community. That said, neither GitHub nor any other code repository guarantees a 1:1 relationship between code commits and npm package releases.

By convention, running npm version creates a git tag with a format like v1.0.0, but again, nothing strictly enforces that the tagged commit is the snapshot of code released to npm. In fact, it often isn’t: an author may create a version, modify their README in a follow-up commit, and only then run npm publish. There are even prepublish scripts and .npmignore files, which all but guarantee differences between the source code repository and the package contents.

There is no requirement that code being uploaded in an npm module is equivalent to the code stored publicly in a git repository.

That’s what we built Package Diff for. That and inspiration from Mikeal Rogers:

This tool does essentially what Mikeal described: It downloads package tarballs from the npm registry (the only source of truth about a package’s contents), and recursively compares the files within the two packages.
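The approach can be sketched in a few lines of Python. This is an illustration of the technique only, not Package Diff's actual implementation. The function names (tarball_url, read_package, diff_packages, pack) are hypothetical, and the URL layout in tarball_url is the conventional one for the public registry; the authoritative location is the dist.tarball field in the registry's package metadata. The sketch also assumes files live under a single top-level directory inside the tarball (usually package/).

```python
import difflib
import io
import tarfile


def tarball_url(name: str, version: str) -> str:
    """Conventional public-registry tarball URL (an assumption; see dist.tarball)."""
    base = name.split("/")[-1]  # scoped packages drop the @scope from the file name
    return f"https://registry.npmjs.org/{name}/-/{base}-{version}.tgz"


def read_package(tgz_bytes: bytes) -> dict[str, bytes]:
    """Map each regular file in a gzipped tarball to its contents."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Strip the first path component (usually "package/").
            path = member.name.split("/", 1)[-1]
            files[path] = tar.extractfile(member).read()
    return files


def diff_packages(old: dict[str, bytes], new: dict[str, bytes],
                  context: int = 3) -> dict[str, str]:
    """Return {path: unified diff} for every file that differs between versions."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue
        # Added or removed files are diffed against empty content.
        a = (before or b"").decode("utf-8", errors="replace").splitlines(keepends=True)
        b = (after or b"").decode("utf-8", errors="replace").splitlines(keepends=True)
        changes[path] = "".join(
            difflib.unified_diff(a, b, fromfile=f"a/{path}", tofile=f"b/{path}", n=context)
        )
    return changes


def pack(files: dict[str, bytes]) -> bytes:
    """Build an in-memory .tgz shaped like an npm tarball (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=f"package/{path}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Two fake "releases" built in memory, so the demo needs no network access.
    v1 = pack({"index.js": b"module.exports = 1;\n", "README.md": b"hi\n"})
    v2 = pack({"index.js": b"module.exports = 2;\n", "extra.js": b"// new\n"})
    for path, patch in diff_packages(read_package(v1), read_package(v2)).items():
        print(path)
        print(patch)
```

In real use, the bytes passed to read_package would come from downloading the two tarballs instead of from pack. The important design point is that both sides of the comparison come from the published artifacts, never from the git repository.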

Each comparison is represented by a permalink containing the package name and two version numbers. A table of contents on the left lists the files changed between the two versions. The right column shows each changed file, along with a few lines of context around every change.

Screenshot of Package Diff showing an introduced vulnerability

This tool can be used for many purposes. Developers can use it to view changes in their modules over time, perhaps to discover why a module has increased in size. It can also be used in conversations to explain why a new package release has violated semver. It can even provide a convenient GUI for viewing diffs when a package isn’t hosted in a repository with a UI, such as a personal git repository.

The use-case we’re most excited about at Intrinsic is security audits. As mentioned before, the underlying git repository cannot always be trusted to show package differences (such was the case with the event-stream incident), which means a GitHub URL won’t always cut it. We’re excited about the potential for this tool to be used in postmortems of compromised packages, whether the problem was introduced accidentally or maliciously, since Package Diff will always show the exact differences between package releases. For example, here is a list of a few security issues introduced into modules:

Please experiment with Package Diff and reply in the comments with any interesting package comparisons you find!

This article was written by me, Thomas Hunter II. I work at a company called Intrinsic (btw, we’re hiring!) where we specialize in writing software for securing Node.js applications. We currently have a product which follows the Least Privilege model for securing applications. Our product proactively protects Node.js applications from attackers, and is surprisingly easy to implement. If you are looking for a way to secure your Node.js applications, give us a shout at hello@intrinsic.com.

Original Banner Photo by Alberto Restifo.