ECMAScript regular expressions are getting better! · Mathias Bynens

Support for regular expressions was added to ECMAScript 3 in 1999.

Sixteen years later, ES6/ES2015 introduced Unicode mode (the u flag), sticky mode (the y flag), and the RegExp.prototype.flags getter.

This article highlights what’s happening in the world of JavaScript regular expressions right now. Spoiler: it’s quite a lot — there are more RegExp-related proposals currently advancing through the TC39 standardization process than there have been updates to RegExp in the history of ECMAScript!

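We’ll discuss the following ES2018 features and ECMAScript proposals: dotAll mode (the s flag), lookbehind assertions, named capture groups, Unicode property escapes, String.prototype.matchAll, and legacy RegExp features.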

dotAll mode (the s flag)

By default, . matches any character except for line terminators:

/foo.bar/u.test('foo\nbar');
// → false

(It doesn’t match astral Unicode symbols either, but we fixed that by enabling the u flag.)

ES2018 introduces dotAll mode, enabled through the s flag. In dotAll mode, . matches line terminators as well.

/foo.bar/su.test('foo\nbar');
// → true
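As a minimal sketch of the difference, the snippet below runs the same pattern against each of the four line terminators ECMAScript recognizes (\n, \r, U+2028, and U+2029), with and without the s flag. The variable names are illustrative only; the dotAll getter reports whether the flag is set.

```javascript
// The four ECMAScript line terminators: LF, CR, LS, PS.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

const withoutDotAll = /^foo.bar$/u;
const withDotAll = /^foo.bar$/su;

// Without the s flag, `.` refuses to match any line terminator.
const defaultResults = lineTerminators.map((lt) => withoutDotAll.test(`foo${lt}bar`));

// With the s flag, `.` matches every one of them.
const dotAllResults = lineTerminators.map((lt) => withDotAll.test(`foo${lt}bar`));

console.log(defaultResults); // → [false, false, false, false]
console.log(dotAllResults); // → [true, true, true, true]
console.log(withDotAll.dotAll); // → true
```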

Lookbehind assertions

Lookarounds are zero-width assertions that match a string without consuming anything. ECMAScript currently supports lookahead assertions, which do this in the forward direction. Positive lookahead ensures a pattern is followed by another pattern:

const pattern = /\d+(?= dollars)/u;
const result = pattern.exec('42 dollars');
// → result[0] === '42'

Negative lookahead ensures a pattern is not followed by another pattern:

const pattern = /\d+(?! dollars)/u;
const result = pattern.exec('42 pesos');
// → result[0] === '42'

ES2018 adds support for lookbehind assertions. Positive lookbehind ensures a pattern is preceded by another pattern:

const pattern = /(?<=\$)\d+/u;
const result = pattern.exec('$42');
// → result[0] === '42'

Negative lookbehind ensures a pattern is not preceded by another pattern:

const pattern = /(?<!\$)\d+/u;
const result = pattern.exec('€42');
// → result[0] === '42'
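The following sketch applies both kinds of lookbehind to a slightly longer input. The sample strings and variable names are made up for illustration. It also shows a subtlety worth knowing: a negative lookbehind on its own can still match in the middle of a number, because the second digit of '$42' is preceded by a digit rather than by a dollar sign.

```javascript
const prices = 'Items: $10, €20, $30';

// Positive lookbehind: digits directly preceded by a dollar sign.
const dollarAmounts = prices.match(/(?<=\$)\d+/gu);
console.log(dollarAmounts); // → ['10', '30']

// Negative lookbehind alone can match part of a number:
// in '$42', the digit '2' is preceded by '4', not by '$'.
const partial = /(?<!\$)\d+/u.exec('$42');
console.log(partial[0]); // → '2'

// Also excluding digits in the lookbehind avoids the partial match.
const strict = /(?<![$\d])\d+/u;
const rejected = strict.exec('$42');
const accepted = strict.exec('€42');
console.log(rejected); // → null
console.log(accepted[0]); // → '42'
```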

Named capture groups

Currently, each capture group in a regular expression is numbered and can be referenced using that number:

const pattern = /(\d{4})-(\d{2})-(\d{2})/u;
const result = pattern.exec('2017-01-25');
// → result[0] === '2017-01-25'
// → result[1] === '2017'
// → result[2] === '01'
// → result[3] === '25'

This is useful, but not very readable or maintainable. Whenever the order of capture groups in the pattern changes, the indices need to be updated accordingly.

ES2018 adds support for named capture groups, enabling more readable and maintainable code.

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
const result = pattern.exec('2017-01-25');
// → result.groups.year === '2017'
// → result.groups.month === '01'
// → result.groups.day === '25'
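Named groups are available in a few more places than the groups object. The sketch below, using illustrative variable names and sample strings, destructures the groups, refers to them in a replacement string with $<name>, and uses a \k<name> backreference inside a pattern.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Destructure the named groups straight from the match result.
const { year, month, day } = datePattern.exec('2017-01-25').groups;
console.log(year, month, day); // → 2017 01 25

// Named groups can be referenced in replacement strings via $<name>.
const reordered = '2017-01-25'.replace(datePattern, '$<day>/$<month>/$<year>');
console.log(reordered); // → '25/01/2017'

// Inside the pattern itself, \k<name> is a backreference to a named group.
const doubledWord = /\b(?<word>\w+) \k<word>\b/u;
const hasDuplicate = doubledWord.test('this is is a test');
console.log(hasDuplicate); // → true
```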

Unicode property escapes

The Unicode Standard assigns various properties and property values to every symbol. For example, to get the set of symbols that are used in the Greek script, search the Unicode database for symbols whose Script_Extensions property is set to Greek.

Unicode property escapes make it possible to access these Unicode character properties natively in ECMAScript regular expressions. For example, the pattern \p{Script_Extensions=Greek} matches every symbol that is used in the Greek script.

const regexGreekSymbol = /\p{Script_Extensions=Greek}/u;
regexGreekSymbol.test('π');
// → true

Previously, developers wishing to use equivalent regular expressions in JavaScript had to resort to large run-time dependencies or build scripts, both of which lead to performance and maintainability problems. With built-in support for Unicode property escapes, creating regular expressions based on Unicode properties couldn’t be easier.
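A few more property escapes, as a rough sketch with made-up sample strings: a whole-string script check, a General_Category value written without the property name, and the negated form \P{…}. All of them require the u flag.

```javascript
// Whole-string check: every symbol must be used in the Greek script.
const greekOnly = /^\p{Script_Extensions=Greek}+$/u;
const isGreek = greekOnly.test('αβγ');
const isLatinGreek = greekOnly.test('abc');
console.log(isGreek, isLatinGreek); // → true false

// General_Category values can be written without the property name.
const uppercase = 'Hello World ΩΔ'.match(/\p{Uppercase_Letter}/gu);
console.log(uppercase.length); // → 4

// \P{…} (capital P) negates the property: strip everything that is not a letter.
const lettersOnly = 'a1-β2_c3'.replace(/\P{Letter}/gu, '');
console.log(lettersOnly); // → 'aβc'
```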


String.prototype.matchAll

A common use case of global (g) or sticky (y) regular expressions is applying them to a string and iterating through all of the matches, including capturing groups. The String.prototype.matchAll proposal makes this easier than ever before.

const string = 'Magic hex numbers: DEADBEEF CAFE 8BADF00D';
const regex = /\b[0-9a-fA-F]+\b/g;
for (const match of string.matchAll(regex)) {
	console.log(match);
}


The match object for each loop iteration is equivalent to what regex.exec(string) would return.

// Iteration 1:
[
	'DEADBEEF',
	index: 19,
	input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]
// Iteration 2:
[
	'CAFE',
	index: 28,
	input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]
// Iteration 3:
[
	'8BADF00D',
	index: 33,
	input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]
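To illustrate the "including capturing groups" part, here is a small sketch that collects a named group from every match, first with matchAll and then with the manual exec loop it replaces. The input string and variable names are assumptions for the example. It uses a global regular expression throughout, which is what engines that ship matchAll today expect.

```javascript
const text = 'Dates: 2017-01-25, 2018-06-30';
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/gu;

// With matchAll: each match object exposes its index and capture groups.
const viaMatchAll = [];
for (const match of text.matchAll(dateRegex)) {
	viaMatchAll.push({ index: match.index, year: match.groups.year });
}

// Equivalent manual loop with exec, which relies on the regex's mutable lastIndex.
const viaExec = [];
dateRegex.lastIndex = 0;
let execMatch;
while ((execMatch = dateRegex.exec(text)) !== null) {
	viaExec.push({ index: execMatch.index, year: execMatch.groups.year });
}

console.log(viaMatchAll);
// → [{ index: 7, year: '2017' }, { index: 19, year: '2018' }]
```

The matchAll version never touches lastIndex on the original regular expression, so there is no state to reset between uses.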


Keep in mind that this proposal is still in the process of being standardized, and as such, its API is subject to change. The descriptions and code examples in this article match the latest versions of the proposal at the time of writing. This proposal can make it into ES2019, at the earliest.

Legacy RegExp features

Another proposal specifies certain legacy RegExp features, such as the RegExp.prototype.compile method and the static properties from RegExp.$1 to RegExp.$9. Although these features are deprecated, unfortunately they cannot be removed from the web platform without introducing compatibility issues. Thus, standardizing their behavior and getting engines to align their implementations is the best way forward. This proposal is important for web compatibility.
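For readers who have never met these features, here is a brief sketch of what they look like in an engine that ships them, such as V8. It is shown for illustration only and is not a recommendation to use them in new code.

```javascript
// Legacy static properties: updated as a side effect of the most recent successful match.
/(\d{4})-(\d{2})/.exec('2017-01');
const legacyYear = RegExp.$1;
const legacyMonth = RegExp.$2;
console.log(legacyYear, legacyMonth); // → 2017 01

// Legacy compile method: re-initializes an existing RegExp object in place.
const reusable = /a/;
reusable.compile('b', 'g');
const recompiled = [reusable.source, reusable.flags];
console.log(recompiled); // → ['b', 'g']
```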
