PHP RFC: JIT


It's no secret that the performance jump of PHP 7 originally grew out of attempts to implement JIT for PHP. We started these efforts at Zend (mostly by Dmitry) back in 2011, and since then we have tried 3 different implementations. We never proposed releasing any of them, for three main reasons:

  1. They resulted in no substantial performance gains for typical Web apps.

  2. They were complex to develop and maintain.

  3. We still had other directions we could explore to improve performance without having to use JIT.

Even though most of the fundamentals for JIT-enabling PHP haven't changed, we believe there is a good case for JIT-enabling PHP today.

First, we believe we have reached the limit of what other optimization strategies can do for PHP's performance. In other words, we cannot improve PHP's performance further without using JIT.

Second, using JIT may open the door to PHP being used more often in other, non-Web, CPU-intensive scenarios, where the performance benefits will be very substantial, and for which PHP is probably not even being considered today.

Lastly, making JIT available can give us (with additional effort) the ability to develop built-in functions in PHP instead of (or in addition to) C, without suffering the huge performance penalty such a strategy would incur in today's non-JITted engine. This, in turn, can open the door to faster innovation, as well as to more secure implementations that are less susceptible to memory-management errors, buffer overflows and similar issues associated with C-based development.

We propose to include JIT in PHP 8 and to invest additional effort in improving its performance and usability.

In addition, we propose to consider including JIT in PHP 7.4 as an experimental feature (disabled by default).

PHP JIT is implemented as an almost independent part of OPcache. It can be enabled or disabled both at PHP compile time and at run time. When enabled, the native code generated for PHP files is stored in an additional region of the OPcache shared memory, and the op_array→opcodes[].handler pointers point to the JIT-ed code. This approach doesn't require any engine modifications.
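The reason no engine changes are needed is that the VM always dispatches through each opcode's handler pointer, so the JIT only has to redirect those pointers to native code. The C sketch below models that idea. All of its types and functions (`op`, `op_array`, `interp_add`, `jitted_add`, `install_jit_code`) are hypothetical and heavily simplified: they are not the real `zend_op` / `zend_op_array` definitions, the model uses one handler per opcode, and the "native code" is just another C function standing in for code emitted into OPcache shared memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of an op_array.
 * These are NOT the real zend_op / zend_op_array definitions. */
typedef struct exec_state {
    long acc;        /* stand-in for the VM's operand state */
    int native_ops;  /* how many ops ran through "JIT-ed" code */
} exec_state;

typedef struct op op;
typedef void (*op_handler)(exec_state *st, const op *o);

struct op {
    op_handler handler;  /* the VM always calls through this pointer */
    long operand;
};

typedef struct op_array {
    op *opcodes;
    size_t count;
} op_array;

/* Generic interpreter handler. */
static void interp_add(exec_state *st, const op *o) {
    st->acc += o->operand;
}

/* Stand-in for native code a JIT would emit into shared memory;
 * here it is simply a C function with the same signature. */
static void jitted_add(exec_state *st, const op *o) {
    st->acc += o->operand;
    st->native_ops++;
}

/* The dispatch loop never changes, whatever the handlers point to. */
static void execute(const op_array *arr, exec_state *st) {
    for (size_t i = 0; i < arr->count; i++) {
        arr->opcodes[i].handler(st, &arr->opcodes[i]);
    }
}

/* "Enabling the JIT" for an op_array only repoints its handlers. */
static void install_jit_code(op_array *arr, op_handler native) {
    assert(native != NULL);
    for (size_t i = 0; i < arr->count; i++) {
        arr->opcodes[i].handler = native;
    }
}
```

Usage: build an `op_array` whose handlers point to `interp_add` and run `execute()`; then call `install_jit_code(&arr, jitted_add)` and run `execute()` again. The result is the same and `execute()` is untouched; only the pointers it calls through have changed.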

We use DynAsm (developed for the LuaJIT project) to generate native code. It is a very lightweight and advanced tool, but it assumes good, very low-level knowledge of the target assembly languages. In the past we tried LLVM, but its code generation was almost 100 times slower, making it prohibitively expensive to use. Currently we support only x86 and x86_64 on POSIX platforms. Windows support should be relatively straightforward, but was (and still is) a low priority for us. DynAsm also supports ARM, ARM64, MIPS, MIPS64 and PPC, so in theory we should be able to support all of the platforms that are popular for PHP deployments (given enough effort).

The effectiveness of the JIT can be demonstrated with the Mandelbrot benchmark published at https://gist.github.com/dstogov/12323ad13d3240aee8f1, where it makes the code more than 4 times faster (0.011 sec vs. 0.046 sec on PHP 7.4).

    function iterate($x, $y)
    {
        // BAILOUT and MAX_ITERATIONS are constants defined elsewhere in the benchmark script.
        $cr = $y - 0.5;
        $ci = $x;
        $zr = 0.0;
        $zi = 0.0;
        $i = 0;
        while (true) {
            $i++;
            $temp = $zr * $zi;
            $zr2 = $zr * $zr;
            $zi2 = $zi * $zi;
            $zr = $zr2 - $zi2 + $cr;
            $zi = $temp + $temp + $ci;
            if ($zi2 + $zr2 > BAILOUT)
                return $i;
            if ($i > MAX_ITERATIONS)
                return 0;
        }
    }
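To reproduce the hot loop outside PHP, here is a C sketch of the same arithmetic as the PHP `iterate()` method, with every variable held as an unboxed `double` or `int`. This is roughly the form the JIT-generated loop works on once the operand types are known, but it is an illustration, not the JIT's output. `MAX_ITERATIONS = 1000` matches the `cmp $0x3e8, %edx` in the generated listing; `BAILOUT = 16` is an assumption, since its value is not stated in this text.

```c
#include <assert.h>

/* MAX_ITERATIONS = 1000 matches `cmp $0x3e8, %edx` in the listing.
 * BAILOUT = 16 is an ASSUMED value, not given in this RFC text. */
#define BAILOUT 16.0
#define MAX_ITERATIONS 1000

/* Same arithmetic as the PHP iterate() method, using unboxed values. */
static int iterate(double x, double y) {
    double cr = y - 0.5;
    double ci = x;
    double zr = 0.0, zi = 0.0;
    int i = 0;
    for (;;) {
        i++;
        double temp = zr * zi;
        double zr2 = zr * zr;
        double zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        /* Escape test uses the squares of the previous z, as in the PHP code. */
        if (zi2 + zr2 > BAILOUT) return i;
        if (i > MAX_ITERATIONS) return 0;
    }
}
```

For example, a point inside the set such as `iterate(0.0, 0.0)` runs all iterations and returns 0, while `iterate(0.0, 10.5)` escapes on the second iteration and returns 2.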

The following is the complete assembly code generated for the PHP function above, with the main loop between .L5 and .L7:

JIT$Mandelbrot::iterate: ; (/home/dmitry/php/bench/b.php)
    sub $0x10, %esp
    cmp $0x1, 0x1c(%esi)
    jb .L14
    jmp .L1
.ENTRY1:
    sub $0x10, %esp
.L1:
    cmp $0x2, 0x1c(%esi)
    jb .L15
    mov $0xec3800f0, %edi
    jmp .L2
.ENTRY2:
    sub $0x10, %esp
.L2:
    cmp $0x5, 0x48(%esi)
    jnz .L16
    vmovsd 0x40(%esi), %xmm1
    vsubsd 0xec380068, %xmm1, %xmm1
.L3:
    mov 0x30(%esi), %eax
    mov 0x34(%esi), %edx
    mov %eax, 0x60(%esi)
    mov %edx, 0x64(%esi)
    mov 0x38(%esi), %edx
    mov %edx, 0x68(%esi)
    test $0x1, %dh
    jz .L4
    add $0x1, (%eax)
.L4:
    vxorps %xmm2, %xmm2, %xmm2
    vxorps %xmm3, %xmm3, %xmm3
    xor %edx, %edx
.L5:
    cmp $0x0, EG(vm_interrupt)
    jnz .L18
    add $0x1, %edx
    vmulsd %xmm3, %xmm2, %xmm4
    vmulsd %xmm2, %xmm2, %xmm5
    vmulsd %xmm3, %xmm3, %xmm6
    vsubsd %xmm6, %xmm5, %xmm7
    vaddsd %xmm7, %xmm1, %xmm2
    vaddsd %xmm4, %xmm4, %xmm4
    cmp $0x5, 0x68(%esi)
    jnz .L19
    vaddsd 0x60(%esi), %xmm4, %xmm3
.L6:
    vaddsd %xmm5, %xmm6, %xmm6
    vucomisd 0xec3800a8, %xmm6
    jp .L13
    jbe .L13
    mov 0x8(%esi), %ecx
    test %ecx, %ecx
    jz .L7
    mov %edx, (%ecx)
    mov $0x4, 0x8(%ecx)
.L7:
    test $0x1, 0x39(%esi)
    jnz .L21
.L8:
    test $0x1, 0x49(%esi)
    jnz .L23
.L9:
    test $0x1, 0x69(%esi)
    jnz .L25
.L10:
    movzx 0x1a(%esi), %ecx
    test $0x496, %ecx
    jnz JIT$$leave_function
    mov 0x20(%esi), %eax
    mov %eax, EG(current_execute_data)
    test $0x40, %ecx
    jz .L12
    mov 0x10(%esi), %eax
    sub $0x1, (%eax)
    jnz .L11
    mov %eax, %ecx
    call zend_objects_store_del
    jmp .L12
.L11:
    mov 0x4(%eax), %ecx
    and $0xfffffc10, %ecx
    cmp $0x10, %ecx
    jnz .L12
    mov %eax, %ecx
    call gc_possible_root
.L12:
    mov %esi, EG(vm_stack_top)
    mov 0x20(%esi), %esi
    cmp $0x0, EG(exception)
    mov (%esi), %edi
    jnz JIT$$leave_throw
    add $0x1c, %edi
    add $0x10, %esp
    jmp (%edi)
.L13:
    cmp $0x3e8, %edx
    jle .L5
    mov 0x8(%esi), %ecx
    test %ecx, %ecx
    jz .L7
    mov $0x0, (%ecx)
    mov $0x4, 0x8(%ecx)
    jmp .L7
.L14:
    mov %edi, (%esi)
    mov %esi, %ecx
    call zend_missing_arg_error
    jmp JIT$$exception_handler
.L15:
    mov %edi, (%esi)
    mov %esi, %ecx
    call zend_missing_arg_error
    jmp JIT$$exception_handler
.L16:
    cmp $0x4, 0x48(%esi)
    jnz .L17
    vcvtsi2sd 0x40(%esi), %xmm1, %xmm1
    vsubsd 0xec380068, %xmm1, %xmm1
    jmp .L3
.L17:
    mov %edi, (%esi)
    lea 0x50(%esi), %ecx
    lea 0x40(%esi), %edx
    sub $0xc, %esp
    push $0xec380068
    call sub_function
    add $0xc, %esp
    cmp $0x0, EG(exception)
    jnz JIT$$exception_handler
    vmovsd 0x50(%esi), %xmm1
    jmp .L3
.L18:
    mov $0xec38017c, %edi
    jmp JIT$$interrupt_handler
.L19:
    cmp $0x4, 0x68(%esi)
    jnz .L20
    vcvtsi2sd 0x60(%esi), %xmm3, %xmm3
    vaddsd %xmm4, %xmm3, %xmm3
    jmp .L6
.L20:
    mov $0xec380240, (%esi)
    lea 0x80(%esi), %ecx
    vmovsd %xmm4, 0xe0(%esi)
    mov $0x5, 0xe8(%esi)
    lea 0xe0(%esi), %edx
    sub $0xc, %esp
    lea 0x60(%esi), %eax
    push %eax
    call add_function
    add $0xc, %esp
    cmp $0x0, EG(exception)
    jnz JIT$$exception_handler
    vmovsd 0x80(%esi), %xmm3
    jmp .L6
.L21:
    mov 0x30(%esi), %ecx
    sub $0x1, (%ecx)
    jnz .L22
    mov $0x1, 0x38(%esi)
    mov $0xec3802b0, (%esi)
    call rc_dtor_func
    jmp .L8
.L22:
    mov 0x4(%ecx), %eax
    and $0xfffffc10, %eax
    cmp $0x10, %eax
    jnz .L8
    call gc_possible_root
    jmp .L8
.L23:
    mov 0x40(%esi), %ecx
    sub $0x1, (%ecx)
    jnz .L24
    mov $0x1, 0x48(%esi)
    mov $0xec3802b0, (%esi)
    call rc_dtor_func
    jmp .L9
.L24:
    mov 0x4(%ecx), %eax
    and $0xfffffc10, %eax
    cmp $0x10, %eax
    jnz .L9
    call gc_possible_root
    jmp .L9
.L25:
    mov 0x60(%esi), %ecx
    sub $0x1, (%ecx)
    jnz .L26
    mov $0x1, 0x68(%esi)
    mov $0xec3802b0, (%esi)
    call rc_dtor_func
    jmp .L10
.L26:
    mov 0x4(%ecx), %eax
    and $0xfffffc10, %eax
    cmp $0x10, %eax
    jnz .L10
    call gc_possible_root
    jmp .L10
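The listing shows the type guards that keep the fast path safe. For `$cr = $y-0.5`, the code at .L2 checks whether the type tag of `$y` equals 5 (`IS_DOUBLE` in PHP 7) and subtracts directly in an SSE register; .L16 handles tag 4 (`IS_LONG`) by converting with `vcvtsi2sd` first; any other type falls through to .L17, which calls the generic `sub_function`. The C sketch below mirrors that guard chain. The `value` struct and `sub_half()` are hypothetical and much simpler than the real `zval` and engine code, and where the real code would call `sub_function`, this sketch simply returns 0.

```c
#include <assert.h>

/* Hypothetical tagged value; the real zval layout is different.
 * TYPE_LONG/TYPE_DOUBLE mirror PHP 7's IS_LONG (4) and IS_DOUBLE (5),
 * the constants compared in the listing. TYPE_OTHER stands for every
 * other type in this sketch. */
enum { TYPE_LONG = 4, TYPE_DOUBLE = 5, TYPE_OTHER = 6 };

typedef struct {
    unsigned char type;
    union { long lval; double dval; } v;
} value;

/* Computes y - 0.5 the way the JIT-ed prologue does. Returns 1 and
 * stores the result on a fast path; returns 0 where the real code
 * would call the generic sub_function instead. */
static int sub_half(const value *y, double *out) {
    if (y->type == TYPE_DOUBLE) {      /* .L2:  cmp $0x5; vsubsd      */
        *out = y->v.dval - 0.5;
        return 1;
    }
    if (y->type == TYPE_LONG) {        /* .L16: cmp $0x4; vcvtsi2sd   */
        *out = (double)y->v.lval - 0.5;
        return 1;
    }
    return 0;                          /* .L17: call sub_function      */
}
```

The design choice this illustrates: the common cases cost one compare and branch each, while rarely used types stay correct by deferring to the engine's generic routine.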

Separate votes will be held for PHP 8 and PHP 7.4.

In PHP 8 we are going to improve JIT and perform optimized code generation after initial profiling of hot functions. This would allow us to apply speculative optimizations and to generate code only for the paths that are actually executed. Deeper integration of JIT with preloading and FFI is also possible, and perhaps a standardized way of developing (and providing) built-in functions written in PHP, and not just in C.
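Profiling of hot functions can take many forms; one simple, common approach is a per-function call counter that triggers compilation once a threshold is crossed. The C sketch below shows only that idea. The `func_profile` struct, the `on_call()` hook and the threshold of 3 are hypothetical and are not taken from this RFC or from the PHP sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold, chosen only for illustration. */
#define HOT_THRESHOLD 3

typedef struct {
    unsigned calls;   /* profiling counter */
    bool compiled;    /* has optimized code been generated? */
} func_profile;

/* Hypothetical hook run on every call of an interpreted function.
 * Returns true exactly once: on the call that reaches the threshold,
 * which is when a JIT would generate optimized code for it. */
static bool on_call(func_profile *f) {
    if (f->compiled)
        return false;           /* already running native code */
    if (++f->calls >= HOT_THRESHOLD) {
        f->compiled = true;     /* stand-in for compiling the function */
        return true;
    }
    return false;
}
```

In a real profiling JIT, the counter phase would also record observed operand types, which is what makes the speculative optimizations mentioned above possible.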

This project requires a 50%+1 majority.

As PHP 7.4 has already been branched and its engine is not expected to change significantly (changes that would otherwise require corresponding changes to the JIT implementation), we can also consider including JIT in PHP 7.4 as an experimental feature (disabled by default).
