Running untrusted JavaScript as a SaaS is hard. This is how I tamed the demons.

Building secure and robust SaaS platforms that execute custom JavaScript has become a critical competency. Driven by the surging adoption of microservice and serverless architectures, more web apps need to sandbox untrusted code now than ever before.

Yet incident after incident has shown cracks forming under the weight of scale. The demons come out to play.

This 2600+ word guide dives deep into the layered defenses implemented while creating an automated browser testing stack powered by arbitrary user-provided scripts. You'll learn:

  • The immense risks involved with executing unvetted external code
  • Strategies employed to contain and compartmentalize damage
  • How applying limits both encouraged innovation and prevented exploitation

Let's start at the beginning – understanding the compelling reasons for allowing Pandora's box to be opened in the first place…

The Rising Tide of JavaScript-based Attacks

With JavaScript powering functionality on over 97% of websites, it represents the world's most ubiquitous and portable programming language. Consequently, web apps leveraging rich client-side experiences are surging in popularity across both consumer and enterprise sectors.

However, increased reliance on externally sourced JavaScript for mission critical services brings heightened risks. Attackers leveraging script injection have found a highly effective and stealthy new vector for infiltration.

Rise of JavaScript-based attacks across industries

High profile supply chain attacks through compromised NPM packages have amplified concerns. And dependency confusion issues in bundlers like Webpack have introduced entire classes of exploits.

In response, services allowing some degree of dynamic script execution are putting mitigation layers in place. Striking the right balance between developer experience and air tight security has proved enormously difficult.

Real-World JavaScript Attacks in Action

To ground the threat model involved with sandboxing user scripts, let's walk through real-world examples of JavaScript-based attacks in action:

Crypto Mining

By spinning up background mining threads, attackers can profitably siphon away CPU cycles. In-browser mining payloads represent the most common client-side attacks today. Without containerization, server resources could be secretly siphoned:

async function illicitMining() {
  while (true) {
    await mineCryptocurrency(); // fthn.wtf library
  }
}

illicitMining();

Command Execution

Despite running in a restricted environment, savvy attackers utilize JavaScript's async facilities to break out and access the underlying system:

import { exec } from 'child_process';

exec('rm -rf /', (err, stdout, stderr) => {
  // Folder deletion attempted!
});

Resource Exhaustion

Common yet disruptive attacks involve bringing environments to their knees by exhausting critical resources – crashing threads, starving CPU, and filling disk space:

const crypto = require('crypto');

while (true) {
  let filler = crypto.randomBytes(1000000); // 1 MB per iteration
}

This code allocates 1 MB buffers faster than the garbage collector can reclaim them, rapidly escalating memory utilization.

Session Hijacking

Attackers may attempt to access cookies or storage from other user sessions running in contextually shared processes. Code and data both require isolation:

const sessions = require('/tmp/sessions.json');
console.log(sessions); // Prints all active user sessions!

Supply Chain Backdoors

Injected downstream dependencies allow bad actors to stealthily compromise thousands of apps relying on trusted packages:

npm i backdoored-lib // Malicious version published
import { hack } from 'backdoored-lib';
hack(); // Exfiltrates secrets!

This small sample of dirty JavaScript tricks illustrates the risks involved with community-contributed code. Even strict sandboxing may not eliminate all attack surfaces…

The Journey to Safely Enabling User Script Execution

My SaaS requirement for allowing client code execution stemmed from a browser test automation suite I set out to build named Skriptable.

The vision was enabling teams to easily record, parameterize, schedule and monitor Puppeteer scripts exercising critical user journeys across websites and web apps. Implicit in this goal was safely running arbitrary JavaScript provided by users on internal infrastructure.

Despite understanding the substantial risks early on, I believed striking the right balance was achievable without compromising user experience. The key would be applying layered controls for damage containment rather than eliminating flexibility.

Weighing Tradeoffs Around Sandbox Strictness

More than any other factor, the level of sandboxing security controls impacts developer experience. Aggressive lockdowns translate to reduced functionality and convenience.

I needed to take a pragmatic stance on balancing safety with usability. Otherwise adoption would undoubtedly suffer.

Here were my guiding principles on constraints:

  • Stable – Average scripts unlikely to crash environments
  • Performant – Tests execute reasonably quickly
  • Functional – Utilities like Lodash available out of box
  • Flexible – Handles diverse browser testing use cases
  • Secure – Greatly reduces attack surface and blast radius

Note that capabilities like downloading unverified binaries or importing native Node.js modules are intentionally omitted. The tradeoff against unconstrained open source contributions tilted strongly in favor of curation over anarchy.

With clear goals set, it was time to start coding against compromise…

Layer 1: Request Throttling to Prevent Resource Exhaustion

The first requirement involved adding capacity controls by throttling traffic to API endpoints accepting scripts. Unchecked volumes risk resource exhaustion through simultaneous runaway processes.

To avoid rocky launches seen by services overwhelmed by early popularity, request limiting provided an early mechanism for graceful degradation under load. Kubernetes pod autoscaling would provide additional elasticity on the worker tier.

Rate limiting involved implementing an Express middleware layer in the API gateway:

import rateLimit from 'express-rate-limit';

app.use(
  rateLimit({
    windowMs: 60 * 1000, // 1 minute window
    max: 100,            // max requests per IP per window
    handler: (req, res) => {
      res.status(429).send("Too many requests!");
    }
  })
);

Now all endpoints had surge protection against excessive requests. But even at steady state traffic, directly executing scripts in server contexts still posed high risks…

Layer 2: Asynchronous Job Processing

The next refactor involved moving script execution out of web nodes into isolated background workers triggered via asynchronous job queues.

Shift user code execution into async background workers

This well proven distributed computing pattern brings huge fault tolerance advantages:

  • Decouples processes – Separates concerns between user facing and processing tiers
  • Adds reliability – Retries failed executions without blocking workflows
  • Enables scaling – Workers increase parallelism without downtime
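
The retry behavior can be sketched in process. This is a hedged, minimal illustration only (the production system delegates queuing and acknowledgement to RabbitMQ; `processJob` and its bounded backoff are invented for this sketch):

```javascript
// Retry a job handler a bounded number of times with a short backoff,
// surfacing failure only after the final attempt. The web tier never
// blocks on a flaky run -- it just reads the job's terminal status.
async function processJob(job, handler, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return { status: 'ok', result: await handler(job) };
    } catch (err) {
      if (attempt === maxRetries) {
        return { status: 'failed', error: err.message };
      }
      // Back off before retrying (attempt * 100ms, kept short for brevity)
      await new Promise((resolve) => setTimeout(resolve, attempt * 100));
    }
  }
}
```

A handler that fails transiently succeeds on a later attempt; one that fails every time resolves to a `failed` status instead of throwing into the caller's workflow.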

Implementation complexity did rise substantially across queue provisioning, message packaging, log aggregation, and state notifications.

Thankfully battle hardened open source message brokers like RabbitMQ lowered the effort here. The hard part remained appropriately sandboxing the worker execution contexts…

Layer 3: Launcher/Runner Containers for Process Isolation

With a scalable async job pipeline in place, attention shifted to hardening the worker processes actually executing user scripts.

Initial experiments running code directly within worker processes enabled trivial access to all environment variables, files and more. Even simple scripts could easily escalate privileges:

import fs from 'fs';

const secrets = JSON.parse(fs.readFileSync('/etc/secrets', 'utf8'));
console.log(secrets); // Oops!

The isolation goal transitioned from purely sandboxing JavaScript to isolating processes themselves. This would prevent privilege leaks even from underlying runtimes.

Adopting a single purpose launcher/runner architecture introduced the requisite separation:

Launcher/Runner Process Isolation

  • Launcher – Coordinates queues, triggers runners, handles logging
  • Runner – Executes untrusted code in disposable containers

Implementation Challenges with Process Sandboxing

Significant effort went into hardening interprocess communications between launchers and runners:

  • Directions – Restricting bidirectional command flows
  • Payloads – Scrubbing & encoding all messages
  • Timeouts – Preventing hangs from freezing execution
  • Rejection – Limiting invalid message types

Especially tricky was handling programmatic output from sandboxed runnables back up to callers.

Ultimately a uni-directional firehose pub-sub stream for all console output proved necessary given restrictions preventing access to host filesystems for log writing.

Process isolation constraints ultimately mandated replacing simple method invocations with a dedicated messaging bus among containers.
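
The bus only helps if the launcher distrusts everything arriving on it. Here is a hedged sketch of such a gate, combining the rejection, scrubbing, and timeout-friendly bounding rules listed above (the `ALLOWED_TYPES` set and `validateRunnerMessage` helper are invented for illustration):

```javascript
// Only a small set of runner -> launcher message types is accepted, and
// payloads are coerced to bounded plain strings before any handler sees
// them -- no object graphs cross the trust boundary.
const ALLOWED_TYPES = new Set(['console', 'result', 'error']);

function validateRunnerMessage(raw, maxLen = 10000) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'malformed JSON' };
  }
  if (!ALLOWED_TYPES.has(msg.type)) {
    return { ok: false, reason: 'rejected message type' };
  }
  // Scrub: force the payload into a length-capped string
  const payload = String(msg.payload ?? '').slice(0, maxLen);
  return { ok: true, type: msg.type, payload };
}
```

Malformed JSON and unknown message types are dropped at the boundary, while oversized payloads are truncated rather than trusted.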

Layer 4: Docker Containers for OS-Level Sandboxing

The next layer of security protections locked down the operating system layer.

Docker containers provide user-space sandboxes ideal for encapsulating untrustworthy processes and languages like JavaScript. Beyond stability wins, containers enabled precisely limiting the resources available to processes via Linux control groups:

docker run -it --cpus=.5 --memory=512m node:12-alpine

Here CPU allocation is capped at 50% of a core while max memory is set to 512 MB. Restricting hardware access hardens security.
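
Further flags shrink the blast radius beyond CPU and memory. This is an illustrative selection rather than the exact production invocation, and some restrictions need relaxing for Chromium to run (for example, a writable /tmp via --tmpfs):

```shell
# Cap process counts (fork bombs), drop all Linux capabilities, freeze
# the root filesystem, and cut network access for the runner container.
docker run -it \
  --cpus=.5 --memory=512m \
  --pids-limit=64 \
  --read-only \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  --network=none \
  node:12-alpine
```

Each flag removes a capability the untrusted script would otherwise inherit for free, in line with the least-privilege principle applied throughout.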

Templatizing containers also simplified preparing identical environments, reducing variability:

/Dockerfile

FROM node:12-alpine
RUN apk add --no-cache chromium
WORKDIR /home/node/app
...

The declarative specifications serve as version controlled golden images. Expanding server scale becomes possible through immutable infrastructure patterns.

Integration challenges still arose around granting just enough privileges for Puppeteer to drive headless Chromium inside containers while limiting damage potential. Striking the right capability concessions following the principle of least privilege remains an iterative process.

Browser Automation in Containers Tradeoffs

Significant configuration tweaking was required for Node + Puppeteer to function smoothly within containers:

  • Video recording – Used a shared in-container filesystem instead of host writing
  • Screenshots – Streamed images rather than host disk dumps
  • Network access – Utilized fixtures instead of live websites
  • Process management – Precluded binding containers together

These constraints did impose usability limits. But the stability, portability and security dividends proved well worth the isolation tradeoffs.

For true JavaScript lockdown though, intersecting runtime sandboxes took things furthest…

Layer 5: JavaScript Sandboxing with VM2

The final and most restrictive isolation barrier involved sandboxing the Node.js runtime itself. Exposing unvetted code to the full JavaScript API surface area enables countless malicious activities.

JavaScript virtual machines limit the binding surface area to globals and functions. Access can be finely controlled by whitelisting:

JavaScript Virtual Machine Sandbox Layers

VM2 provided critical capabilities for whitelisting language primitives, stubbing restricted APIs, preventing code escapes and more:

import { VM } from 'vm2';

const vm = new VM({
  timeout: 1000, // kill runaway synchronous code
  sandbox: {
    // Allow
    console: { log: console.log },

    // Disallow
    process: undefined,
    module: undefined,
    require: undefined
  }
});

vm.run(untrustedCode); // Locked down execution!

Here console.log is permitted but module imports are disabled. The sandbox aligns with least privilege principles – exposing only what‘s essential.

Hardening VM configurations took weeks of trial and error though. The many escape hatches baked into JavaScript work against confinement goals. Async callbacks and promise rejections provided rich attack vectors, for instance.

Incrementally locking things down without interfering with legitimate use cases demanded methodical testing and vulnerability scanning tools. I ended up authoring custom instrumentation on top of libraries like Esprima to prevent access bypasses.

Managing the tension between usability and security continues being an iterative process. But combining OS and programming language isolation layers now blocks the vast majority of malicious exploits.

Key Takeaways from Shipping Hardened JavaScript Execution Pipelines

While threats remain ever-present requiring constant vigilance, the multi-layered sandboxing foundation powering Skriptable has proved robust enough for a public launch.

Over 18 months later, zero breaches have occurred despite hundreds of thousands of scripts executed. Performance overheads stay well within interactive boundaries. Developers continuously invent new ways to leverage the platform.

The most significant learning was that contributions require curation, not elimination. An Alaska-sized lockdown may seem appealing for its removal of dangers. But the inhospitable environment chills innovation and creativity.

By cultivating a walled garden instead of enacting draconian prohibitions, the JS community can responsibly advance the open source ecosystem. Come, contribute – safely!

For those interested, reach out directly to learn more about Skriptable or schedule a custom security assessment around your sandboxing controls!
