OpenCensus Web: Unlocking Full End-to-End Observability for Your Entire Stack


OpenCensus Web is a tool to trace and monitor the user-perceived performance of your web pages. It can help determine whether or not your web pages are experiencing performance issues that you might otherwise not know how to diagnose. Web application owners want to monitor the operational health of their applications so that they can better understand actual user performance; however, capturing relevant telemetry from your web applications is often very difficult. Today, we are introducing OpenCensus Web (OC Web) to make instrumenting and exporting metrics and distributed traces from web applications simple and automatic.
The OpenCensus project provides a set of language-specific instrumentation libraries that collect traces and metrics from applications and export them to tracing and monitoring backends like Prometheus, Zipkin, Jaeger, Stackdriver, and others.

The OpenCensus Web library is an implementation of OpenCensus that focuses on frontend web application code that executes in the browser. OC Web instruments web pages and collects user-side performance data, including latency and distributed traces, which gives developers the information to diagnose frontend issues and monitor overall application health.

Overshadowing the work on OC Web, the wider OpenCensus family of projects is merging with OpenTracing into OpenTelemetry. OpenCensus Web’s functionality will be migrated into OpenTelemetry JS once this project is ready, although OC Web will continue working as an alpha release in the meantime.


Architecture

OC Web interacts with three application components:
  • Frontend web server: renders the initial HTML to the browser including the OC Web library code and configuration. This would typically be instrumented with an OpenCensus server-side library (Go, Java, etc.). We also suggest that you create an endpoint in the server that receives HTTP/JSON traces and proxies to the OpenCensus Agent.
  • Browser JS: the OC Web library code that runs in the browser. This measures user interactions and collects browser data and writes them to the OpenCensus Agent as spans via HTTP/JSON.
  • OpenCensus Agent: receives traces from the frontend web server proxy endpoint or directly from the browser JS, and exports them to a trace backend (e.g. Stackdriver, Zipkin).
OC Web requires the OpenCensus Agent, which will proxy and re-export telemetry to your backend of choice. For more details see the documentation.


Features

Initial page load tracing

You can use OC Web to capture traces of initial page loads, which will even capture events that take place before the OC Web library was loaded by the browser! Initial page load traces show you which resources may be causing poor website performance, and contain data that you can’t typically capture from a distributed tracing system.

To measure the time of the overall initial page load interaction, OC Web waits until after the document load event and generates spans from the initial load performance timings via the browser's Navigation Timing and Resource Timing APIs. Below is a sample trace from OC Web that has been exported to Zipkin and captured from the initial load example app. Notice that there is an overall ‘nav./’ span for the user navigation experience until the browser load event fires.

This example also includes ‘/’ spans for the client and server side measurements of the initial HTML load. These spans are connected by the server sending back a ‘window.traceparent’ variable in the W3C Trace Context format, which is necessary because the browser does not send a trace context header for the initial page load. The server side spans also indicate how much time was spent parsing and rendering the template:

Notice the long js task span in the previous image, which indicates a CPU-bound JavaScript event loop that took 80.095ms, as measured by the Long Tasks browser API.


Span annotations for DOM and network events

Spans captured by OC Web also include detailed annotations for DOM events like `domInteractive` and `first-paint`, as well as network events like domainLookupStart and secureConnectionStart. Here is a similar trace exported to Stackdriver Trace with the annotations expanded:


User Interactions

For single page applications there are often subsequent interactions after the initial load (e.g. user clicks a button or navigates to a different section of the page). Measuring end-user interactions within a browser application adds useful data for your application:
  • Ability to relate an initial page render with subsequent on-page interactions
  • Visibility into slowness as perceived by the end user, for example, an unresponsive page after clicking
Currently, OC Web tracks clicks and route transitions by monkey-patching the Angular Zone.js library. OC Web tracks the subsequent synchronous and asynchronous tasks (e.g. setTimeouts, XHRs, etc.) caused by the interaction even if there are several concurrent interactions.
All browser click events are traced as long as the click is done in a DOM element (e.g. button) and the clicked element is not disabled. When the user clicks the element, a new Zone is created to measure this interaction and determine the total time.

To name this root span, we provide developers with the option of adding the attribute data-ocweb-id to elements and give a custom name to the interaction. For the next example, the resulting name will be ‘Save edit user info’:


<button type="submit" data-ocweb-id="Save edit user info">       Save changes </button>
This helps you to identify the traces related to a specific element. Also, this may avoid ambiguity when there are similar interaction. If you don’t add this attribute, OC Web will use the DOM element ID, the tag name plus the event involved in the interaction. For example, clicking this button:
<button id="save_changes"> Save changes </button>
will generate a span named : “button#save_changes click”.

Automatic tracing for route transitions

OC Web traces route transitions between the different sections of your page by monkey-patching the History API. OC Web will name these interactions with the pattern ‘Navigation /path/to/page’. The following screenshot of a trace exported to Stackdriver from the user interaction example shows a Navigation trace which includes several network calls before the route transition is complete:

OC Web allows you to instrument your web application with custom spans for tasks or code involved in a user interaction. Here is a code snippet that shows how to do this:

import { tracing } from '@opencensus/web-instrumentation-zone';
  // Start child span of the current root span on the current interaction.
  // This must run in in code that the button is running.
  const childSpan = tracing.tracer.startChildSpan({
    name: 'name of your child span'
  // Finish the child span at the end of it's operation

See the OC Web documentation for more details.

Automatic spans for HTTP requests and Browser performance data

OC Web automatically intercepts and generates spans for HTTP requests generated by user interactions. Additionally, OC Web attaches Trace Context Headers to each intercepted HTTP request, using the W3C Trace Context format. This is only done for same-origin requests or requests that match a provided regex. If your servers are also instrumented with OpenCensus, these requests will continue to be traced throughout your backend services! This lets you know if the issues are related to either the front-end or the server-side.

OC Web also includes Performance API data to make annotations like domainLookupStart and responseEnd and generates spans for any CORS preflight requests.

The next screenshot shows a trace exported to Stackdriver as result of the user interaction example. There, you can see the several network calls with the automatic generated spans (e.g. ‘Sent./sleep’) with annotations, the server-side spans (e.g. ‘/sleep’ and ‘ocweb.handlerequest’) and CORS Preflight related spans:



Relate user interactions back to the initial page load tracing

OC Web attaches the initial page load trace id to the user interactions as an attribute and a span link. This enables you to do a trace search by attribute to find the initial load trace and its interactions traces via a single attribute query as well as letting you understand the whole navigation of a user through the application for a given page load.

The next screenshot shows a search by initial_load_trace_id attribute containing all user interaction traces after the initial page loaded:


Making it Real

With OC Web and a few lines of instrumentation, you can now export distributed traces from your web application. Start exploring the initial load and user interaction examples and you're welcome to poke around the source code and send us feedback via either Gitter or contributing with Pull Requests!

By Cristian González – OpenCensus Team – Software Engineering intern at Google Summer 2019 and student of Computer and Systems Engineering at Universidad Nacional de Colombia.

Special thanks to Dave Raffensperger for being initial creator of OC Web and guiding me in the process to develop i

t.