Contributing the Go Compiler: Adding New Tilde (~) Operator

By Furkan Türkal

The middle-end, also known as an optimizer, performs optimizations on the Intermediate Representation to improve the performance and the quality of the produced machine code. The middle end contains those optimizations that are independent of the CPU architecture being targeted. [11]

The main phases of the middle end include the following:

3.1. Intermediate Representation (IR)

High-Level Overview of IR [8]

^ Here’s the 30,000-foot view of how Machine Code is generating.

An intermediate representation is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive for further processing, such as optimization and translation.

An intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. The term comes from their use in compilers, where the source code of a program is translated into a form more suitable for code-improving transformations before being used to generate object or machine code for a target machine.

Use of an intermediate representation allows compiler systems like the GNU Compiler Collection and LLVM to be used by many different source languages to generate code for many different target architectures. [7]

All Abstract Syntax Representation related stuff stored under ./src/cmd/compile/internal/ir/ as ir package.

3.1.1. Node

Node is the abstract interface to an IR node.

We need to define the new opcode as OCOM under consts:

type Op uint8

// Node ops.
const
(


OXXX Op = iota ... OPLUS // +X ONEG // -X OCOM // ~X ...

)

After we declared new Node here, we need to run $ stringer -type=Op -trimprefix=O node.go again to automatically generate OCOM OpCode and do update to ./ir/op_string.go file:

Stringer to ir/op_string.go

By doing so, now we can easily convert the OpCode to String using [1]struct{} map:

func (i Op) String() string {
if i >= Op(len(_Op_index)-1) {
return "Op(" + strconv.FormatInt(int64(i), 10) + ")" }

return _Op_name[_Op_index[i]:_Op_index[i+1]]


}

Now what we need to do here is that we have to update the unary map table that declared under ./noder/noder.go.

Find the unOps table:

var unOps = [...]ir.Op { syntax.Recv: ir.ORECV, ... syntax.Tilde: ir.OCOM

}

We defined new a Syntax token here, and thus compiler will not throw invalid Operator panic anymore:

func (p *noder) unOp(op syntax.Operator) ir.Op {
if uint64(op) >= uint64(len(unOps)) || unOps[op] == 0 {
panic("invalid Operator") }

return unOps[op]


}

I think it’s a good checkpoint to see how the error is changed after we implemented those. Run the ./test.sh file:

panic: cannot SetOp COM on XXX
goroutine 1 [running]:cmd/compile/internal/ir.(*UnaryExpr).SetOp(0xc000076230, 0x1a682e0) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/ir/expr.go:672 +0x10ccmd/compile/internal/ir.NewUnaryExpr(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/ir/expr.go:665cmd/compile/internal/noder.(*noder).expr(0x100cc5b, 0x1a71358, 0xc0004000c0) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/noder/noder.go:751 +0x1887

cmd/compile/internal/noder.(*noder).exprs(0x231d000, 0xc000064030, 0x1, 0x203000)

Wow, the panic message is changed. Let’s investigate that where the error is coming from.

3.1.2. Expr

Expr is a Node that can appear as an expression.

type Expr interface { Node isExpr()

}

We initialize the unary operator at ./ir/expr.go in a function called NewUnaryExpr():

// A UnaryExpr is a unary expression Op X,// or Op(X) for a builtin function that does not end up being a call.

type

UnaryExpr struct { miniExpr X Node}

func NewUnaryExpr(pos src.XPos, op Op, x Node) *UnaryExpr {

n := &UnaryExpr{X: x} n.pos = pos n.SetOp(op)

return n


}

But n.SetOp(op) function throw error here because it falls into the default case, which is throwing panic:

func (n *UnaryExpr) SetOp(op Op) {
switch op {
default:
panic(n.no("SetOp " + op.String()))
case OBITNOT, ONEG, OCOM, ..., ..., OVARLIVE: n.op = op

}

Adding OCOM to the just right of ONEG will fix the problem. Now jump to SameSafeExpr() function and add OCOMto UnaryExpr casting case.

// SameSafeExpr checks whether it is safe to reuse one of l and r// instead of computing both. SameSafeExpr assumes that l and r are// used in the same statement or expression.

func SameSafeExpr

(l Node, r Node) bool {
if l.Op() != r.Op() || !types.Identical(l.Type(), r.Type()) {
return false } ...

case ONOT, OBITNOT, OPLUS, ONEG, OCOM:

l := l.(*UnaryExpr) r := r.(*UnaryExpr)

return SameSafeExpr(l.X, r.X)

...

}

Rerun the tests:

typecheck [0xc0001398b0]. COM tc(2) # neg.go:4. . LITERAL-15 untyped int # neg.go:4

./neg.go:4:13: internal compiler error: typecheck COM

goroutine 1 [running]:runtime/debug.Stack() /Users/furkan.turkal/src/public/go/src/runtime/debug/stack.go:24 +0x65cmd/compile/internal/base.FatalfAt(0xc0001398b0, 0x191d86a, 0x1a7fb48, 0xc000146be0, 0x1824634, 0x188e860) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:227 +0x157cmd/compile/internal/base.Fatalf(...) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:196cmd/compile/internal/typecheck.typecheck1(0x1a7fb48, 0xc0001398b0, 0x12) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/typecheck/typecheck.go:484 +0x298acmd/compile/internal/typecheck.typecheck(0x1a7fb48, 0xc0001398b0, 0x12) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/typecheck/typecheck.go:371 +0x4b0cmd/compile/internal/typecheck.typecheckslice(0xc0001028c0, 0x1, 0x11ab40a, 0x1a7eba8)

/Users/furkan.turkal/src/public/go/src/cmd/compile/internal/typecheck/typecheck.go:177 +0x68

What we need to do here is simply define our new syntax token in the typecheck.go file. Then, find the typecheck1() function, and jump to theunaryExpr case.

// typecheck1 should ONLY be called from typecheck.
func
typecheck1(n ir.Node, top int) ir.Node { ...

case ir.OBITNOT, ir.ONEG, ir.ONOT, ir.OPLUS, ir.OCOM:

n := n.(*ir.UnaryExpr)

return tcUnaryArith(n)

...

}

So we may want to know what happens in the function called tcUnaryArith(n). So let’s investigate it:

// tcUnaryArith typechecks a unary arithmetic expression.
func
tcUnaryArith(n *ir.UnaryExpr) ir.Node { ...

if !okfor[n.Op()][defaultType(t).Kind()] {

base.Errorf("invalid operation: %v (operator %v not defined on %s)", n, n.Op(), typekind(t)) n.SetType(nil)

return n

} ...

}

If we’d rerun the test, we’d have panicked here due to this check:!okfor[n.Op()][defaultType(t).Kind()]

Jump to ./typecheck/universe.go file. You will see two vars at the top of the file:

var (
okfor [ir.OEND][]bool
iscmp [ir.OEND]bool
)

I was really impressed when I first saw this file. We are defining the OpCodes what operations they will do.

var (
okforeq [types.NTYPE]bool
okforadd [types.NTYPE]bool
okforand [types.NTYPE]bool
okfornone [types.NTYPE]bool
okforbool [types.NTYPE]bool
okforcap [types.NTYPE]bool
okforlen [types.NTYPE]bool
okforarith [types.NTYPE]bool
)

The OpCode we created OCOM should able to do some arithmetic operations as we mentioned in the proposal:

  • We should not allow ~ for bools. i.e. ~true (since Go already support this with !true) So, not okforbool
  • We should allow ~ for arithmetics. i.e. ~7. So, okforarith
  • There are no such things called !~, ~~, ~> or <~, So, eliminate all the others but okforarith

Let’s do a little operation on the InitUniverse() function as we mentioned above:

// InitUniverse initializes the universe block.
func InitUniverse
() { ... ... ...

// unary
okfor[ir.OBITNOT] = okforand[:]


okfor[ir.ONEG] = okforarith[:]
okfor[ir.OCOM] = okforarith[:] ...

}

So, the okfor[n.Op()][defaultType(t).Kind()] function will never return false once we put the OCOM in okfor map.

Run the ./test.sh again:

./neg.go:4:12: internal compiler error: unexpected untyped expression: <node COM>
goroutine 1 [running]:runtime/debug.Stack() /Users/furkan.turkal/src/public/go/src/runtime/debug/stack.go:24 +0x65cmd/compile/internal/base.FatalfAt(0xc0004101e0, 0x1930ce3, 0x9844128, 0xc00011eeb8, 0x12200e8, 0x1a7fb68) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:227 +0x157cmd/compile/internal/base.Fatalf(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:196cmd/compile/internal/typecheck.convlit1(0x1a7fb68, 0xc0004101e0, 0x8, 0x203000, 0x0)

/Users/furkan.turkal/src/public/go/src/cmd/compile/internal/typecheck/const.go:123 +0xb8c

Whops, we forgot the modify ./typecheck/const.go file.

We did not define the Token inside the tokenForOp map:

var tokenForOp = [...]token.Token { ...

}

Find the convlit1 function:

// convlit1 converts an untyped expression n to type t. If n already// has a type, convlit1 has no effect.//

func

convlit1(n ir.Node, t *types.Type, explicit bool, context func() string) ir.Node { ... ... ...

case ir.OPLUS, ir.ONEG, ir.OCOM, ...:

... ... ...

}

Run the ./test.sh again:

./neg.go:4:13: internal compiler error: unexpected expr: COM <node COM>
goroutine 1 [running]:runtime/debug.Stack() /Users/furkan.turkal/src/public/go/src/runtime/debug/stack.go:24 +0x65cmd/compile/internal/base.FatalfAt(0x1942672, 0x19273d4, 0x8, 0xc00011efa8, 0x100cc5b, 0x116fff9) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:227 +0x157cmd/compile/internal/base.Fatalf(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:196cmd/compile/internal/escape.(*escape).exprSkipInit(0xc0004101e0, 0xc0004181c0, 0x0, 0x0, 0xc000410000, 0x1a7fb68, 0xc0004101e0)

/Users/furkan.turkal/src/public/go/src/cmd/compile/internal/escape/escape.go:590 +0x2025

Wow, it looks like it’s time to move on to Escape Analysis.

3.2. Escape Analysis

Escape analysis, one of the phases of the Go compiler. It analyses the source code and determines what variables should be allocated on the stack and which ones should escape to the heap. Go statically defines what should be heap or stack-allocated during the compilation phase. This analysis is available via the flag -gcflags="-m" when compiling and/or running your code. [12]

Here we analyze functions to determine which Go variables can be allocated on the stack. The two key invariants we have to ensure are: (1) pointers to stack objects cannot be stored in the heap, and (2) pointers to a stack object cannot outlive that object (e.g., because the declaring function returned and destroyed the object’s stack frame or its space is reused across loop iterations for logically distinct variables).

We implement this with static data-flow analysis of the AST. First, we construct a directed weighted graph where vertices (termed “locations”) represent variables allocated by statements and expressions, and edges represent assignments between variables (with weights representing addressing/dereference counts).

Next, we walk the graph looking for assignment paths that might violate the invariants stated above. If a variable v’s address is stored in the heap or elsewhere that may outlive it, then v is marked as requiring heap allocation.

To support interprocedural analysis, we also record data-flow from each function’s parameters to the heap and to its result parameters. This information is summarized as “parameter tags”, which are used at static call sites to improve escape analysis of function arguments. [13]

Find the exprSkipInit function:

func (e *escape) exprSkipInit(k hole, n ir.Node) { ...

switch n.Op() {


default: base.Fatalf("unexpected expr: %s %v", n.Op().String(), n) ...

case ir.OPLUS, ir.ONEG, ir.OCOM, ir.OBITNOT:

n := n.(*ir.UnaryExpr) e.unsafeValue(k, n.X) ...

}

Find the unsafeValue function:

// unsafeValue evaluates a uintptr-typed arithmetic expression looking// for conversions from an unsafe.Pointer.

func

(e *escape) unsafeValue(k hole, n ir.Node) { ...

switch n.Op() {

...

case ir.OPLUS, ir.ONEG, ir.OCOM, ir.OBITNOT:

n := n.(*ir.UnaryExpr) e.unsafeValue(k, n.X) ...

}

Find the mayAffectMemory function:

// mayAffectMemory reports whether evaluation of n may affect the program's// memory state. If the expression can't affect memory state, then it can be// safely ignored by the escape analysis.

func

mayAffectMemory(n ir.Node) bool { ...

switch n.Op() {

...

case ir.OLEN, ..., ir.ONEG, ir.OCOM, ir.OALIGNOF, ...:

n := n.(*ir.UnaryExpr)

return mayAffectMemory(n.X)

Run the ./test.sh again:

walk [0xc0004101e0]. COM tc(1) int # neg.go:4 int. . LITERAL-15 tc(1) int # neg.go:4

./neg.go:4:13: internal compiler error: walkExpr: switch 1 unknown op COM

goroutine 1 [running]:runtime/debug.Stack() /Users/furkan.turkal/src/public/go/src/runtime/debug/stack.go:24 +0x65cmd/compile/internal/base.FatalfAt(0xc0004101e0, 0x1930e0c, 0x1a7fb68, 0xc0001272a0, 0xc00009d2b0, 0xc00009d2b0) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:227 +0x157cmd/compile/internal/base.Fatalf(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:196cmd/compile/internal/walk.walkExpr1(0x1a7fb68, 0xc0004101e0, 0xc0004101e0)

/Users/furkan.turkal/src/public/go/src/cmd/compile/internal/walk/expr.go:82 +0xe9f

3.3. Walk

We walk the given ir.Func and their all statements here.

func Walk(fn *ir.Func) { ir.CurFunc = fn ... order(fn) ... walkStmtList(ir.CurFunc.Body) ...

}

A Func corresponds to a single function in a Go program (and vice versa: each function is denoted by exactly one *Func).

There are multiple nodes that represent a Func in the IR:* The ONAME node (Func.Nname) is used for plain references to it.* The ODCLFUNC node (the Func itself) is used for its declaration code.

*The OCLOSURE node (Func.OClosure) is used for a reference to a function literal.

type Func struct { miniNode Body Nodes

Nname *Name // ONAME node
OClosure *ClosureExpr // OCLOSURE node


// ONAME nodes for all params/locals for this func/closure, does NOT include closurevars until transforming closures during walk.
Dcl []*Name

// ClosureVars lists the free variables that are used within a // function literalClosureVars []*Name

// Enclosed functions that need to be compiled, populated during walk.
Closures []*Func

...

// Parents records the parent scope of each scope within a // function.Parents []ScopeID


...
Pragma PragmaFlag // go:xxx function annotation.
...
NumDefers int32 // number of defer calls in the function
NumReturns int32 // number of explicit returns in the function. ...}

func NewFunc(pos src.XPos) *Func {

f := new(Func) f.pos = pos

f.op = ODCLFUNC ...

return

f


}

Jump to ./walk/expr.go file and find walkExpr1 function:

func walkExpr1(n ir.Node, init *ir.Nodes) ir.Node {
switch n.Op() {
default: ir.Dump("walk", n) base.Fatalf("walkExpr: switch 1 unknown op %+v", n.Op())

panic("unreachable")

...

case ir.ONOT, ir.ONEG, ir.OCOM, ir.OPLUS, ..., ir.OIDATA:

n := n.(*ir.UnaryExpr) n.X = walkExpr(n.X, init)

return n

...

}

Just to ./walk/walk.go file and find mayCall function:

// mayCall reports whether evaluating expression n may require// function calls, which could clobber function call arguments/results// currently on the stack.

func

mayCall(n ir.Node) bool { ...

return ir.Any(n, func(n ir.Node) bool {

...

switch n.Op() {

...

// When using soft-float, these ops might be rewritten to function calls // so we ensure they are evaluated first.

case

ir.OADD, ir.OSUB, ir.OMUL, ir.ONEG, ir.OCOM:


return ssagen.Arch.SoftFloat && isSoftFloat(n.Type()) ... ...

}

Run the ./test.sh:

./neg.go:4:13: internal compiler error: ‘main’: unhandled expr COM
goroutine 9 [running]:runtime/debug.Stack() /Users/furkan.turkal/src/public/go/src/runtime/debug/stack.go:24 +0x65cmd/compile/internal/base.FatalfAt(0xc00002414c, 0xc00009ef70, 0x10, 0xc0003cb780, 0x10, 0x2094108) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:227 +0x157cmd/compile/internal/base.Fatalf(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/base/print.go:196cmd/compile/internal/ssagen.(*ssafn).Fatalf(0xc00039c1e0, 0x0, 0x1922c8b, 0x0, 0xc000064570, 0x1, 0x0) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/ssagen/ssa.go:7481 +0x187cmd/compile/internal/ssagen.(*state).Fatalf(…) /Users/furkan.turkal/src/public/go/src/cmd/compile/internal/ssagen/ssa.go:976cmd/compile/internal/ssagen.(*state).expr(0xc0003f8100, 0x1a7fb68, 0xc00039c1e0)

/Users/furkan.turkal/src/public/go/src/cmd/compile/internal/ssagen/ssa.go:3198 +0x8c44

Here we are. The SSA. Let’s see what SSA means exactly in the next section. It’s time to move on to Compiler Back-End.