Playwright E2E Testing in 2026: The Setup That Actually Scales

Three years ago, the typical E2E test suite at a web agency ran on Cypress, took 20 minutes in CI, flaked constantly on login tests, and was the thing developers turned off before a deploy “just this once.” Playwright changed the calculus. Not because it’s magic, but because it fixed the specific things that made E2E testing miserable.

This is a practical setup guide, not a comparison of why Playwright beats Cypress. That argument is settled. The real question is how to configure Playwright so it doesn’t accumulate technical debt the way the old suites did.

What Playwright Gets Right by Default

When you install Playwright, three things work out of the box that took real effort to configure in older tools.

Auto-waiting. Playwright waits for elements to be actionable before interacting with them. It doesn’t click a button and hope the animation finished. It checks that the element is visible, enabled, stable (not mid-animation), and able to receive pointer events (not covered by another element) before acting. This eliminates an entire category of flaky tests without hand-written waitForSelector calls.

Browser contexts. Each test gets a fresh browser context by default, with isolated cookies, localStorage, and session state. Contexts share a single browser process, which keeps them fast, while still giving you clean state between tests without spawning new browser instances.

Parallelism from day one. Playwright runs test files in parallel by default, across worker processes and across projects (browsers). Tests inside a single file run in order unless you enable fullyParallel. You don’t opt into parallelism. You opt out if you need to.
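As a rough sketch of what tuning or opting out looks like, assuming the standard defineConfig entry point: fullyParallel and workers are real config keys, and the values here are only illustrative.

```typescript
// playwright.config.ts (fragment, illustrative values)
import { defineConfig } from '@playwright/test'

export default defineConfig({
  // Also run the tests inside each file in parallel, not just the files themselves.
  fullyParallel: true,
  // Opting out: a single worker runs everything one test at a time.
  // workers: 1,
})
```

For a single file whose tests must run in order, calling test.describe.configure({ mode: 'serial' }) at the top of that file is the narrower opt-out.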

The Folder Structure That Scales

The structure Microsoft recommends in their docs gets you started but falls apart around 50 tests. This one scales:

tests/
  e2e/
    auth/
      login.spec.ts
      logout.spec.ts
      password-reset.spec.ts
    checkout/
      cart.spec.ts
      payment.spec.ts
    fixtures/
      auth.fixture.ts
      db.fixture.ts
playwright/
  global-setup.ts
  .auth/
    user.json
playwright.config.ts

The fixtures/ folder is where the real scaling happens. Playwright’s fixture system lets you define reusable setup logic (authenticated browser contexts, seeded database states, stubbed API responses) and compose them into tests without repetitive beforeEach blocks.

Authentication: Do It Once, Reuse It Everywhere

The single biggest performance win in any Playwright suite is handling authentication correctly. Most teams re-login in every test. That’s slow, fragile, and unnecessary.

Playwright’s storageState feature lets you authenticate once, save the session to a file, and reuse it across tests.

// tests/e2e/fixtures/auth.fixture.ts
import { test as base, type Page } from '@playwright/test'

type AuthFixtures = {
  authenticatedPage: Page
}

export const test = base.extend<AuthFixtures>({
  authenticatedPage: async ({ browser }, use) => {
    const context = await browser.newContext({
      storageState: 'playwright/.auth/user.json',
    })
    const page = await context.newPage()
    await use(page)
    await context.close()
  },
})

The auth state file is generated once in a global setup script:

// playwright/global-setup.ts
import { chromium } from '@playwright/test'

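async function globalSetup() {
  // Mirror the baseURL fallback from playwright.config.ts so login hits the same origin as the tests.
  const baseURL = process.env.BASE_URL ?? 'http://localhost:3000'
  const browser = await chromium.launch()
  const page = await browser.newPage()

  // Log in through the real UI once.
  await page.goto(`${baseURL}/login`)
  await page.fill('[data-testid="email"]', process.env.TEST_USER_EMAIL!)
  await page.fill('[data-testid="password"]', process.env.TEST_USER_PASSWORD!)
  await page.click('[data-testid="submit"]')
  await page.waitForURL('**/dashboard')

  // Persist cookies and localStorage for the authenticatedPage fixture to load.
  await page.context().storageState({ path: 'playwright/.auth/user.json' })
  await browser.close()
}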

export default globalSetup

Point the globalSetup option in playwright.config.ts at this script, and every test using the authenticatedPage fixture skips the login flow entirely. On a suite with 100 tests, this can cut runtime by around 40%, depending on how much of each test was spent logging in.
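Wiring the script in is a single config key. A minimal fragment, assuming the config file sits at the repository root next to the playwright/ folder:

```typescript
// playwright.config.ts (fragment)
import { defineConfig } from '@playwright/test'

export default defineConfig({
  // Runs once before the whole suite and writes playwright/.auth/user.json.
  globalSetup: './playwright/global-setup.ts',
})
```

The saved file holds live session cookies, so playwright/.auth belongs in .gitignore.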

Config That Won’t Bite You Later

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test'

export default defineConfig({
  testDir: './tests/e2e',
  timeout: 30_000,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 4 : undefined,
  reporter: process.env.CI
    ? [['github'], ['html', { outputFolder: 'playwright-report' }]]
    : 'list',
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
  },
})

Three things here worth explaining:

retries: 2 in CI only. Local retries hide flakiness. In CI, a retry with trace capture lets you diagnose the failure without a full re-run. Don’t set retries locally unless you’re debugging a specific timing issue.

trace: 'on-first-retry' captures a full Playwright Trace Viewer file (network requests, DOM snapshots, console logs, screenshots at each step) but only when a test is retried, meaning something probably went wrong. Tracing every test would generate gigabytes of files.

webServer starts your dev server automatically when running tests. The reuseExistingServer option means local runs use whatever server is already running on port 3000 (good for iteration speed), while CI always starts fresh.

Writing Tests That Don’t Rot

The most common reason E2E test suites become unmaintainable: tests depend on CSS classes and DOM structure. When you redesign the UI, 60 tests break.

The fix is test IDs. Add data-testid attributes to interactive elements and query by those instead. Playwright’s getByTestId locator reads data-testid by default:

// Bad: depends on class names and structure
await page.click('.checkout-button.btn-primary')

// Good: independent of styling
await page.getByTestId('checkout-submit').click()

For elements that users actually see and interact with, the role-, label-, and text-based locators (getByRole, getByLabel, getByText) are the right choice:

await page.getByRole('button', { name: 'Complete purchase' }).click()
await page.getByLabel('Email address').fill('user@example.com')
await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible()

getByRole queries by ARIA role and accessible name, not by markup. If the button is exposed to assistive technology and labeled correctly, the test passes. If a developer replaces a <div role="button"> with a proper <button>, the test still passes. If they remove the accessible name, the test catches it.

API Mocking for Stable Tests

Tests that call real external APIs flake when the API is slow, returns different data, or changes its shape. Playwright’s page.route() intercepts requests:

test('shows error when payment fails', async ({ page }) => {
  await page.route('**/api/checkout', async (route) => {
    await route.fulfill({
      status: 400,
      contentType: 'application/json',
      body: JSON.stringify({ error: 'Card declined' }),
    })
  })

  await page.goto('/checkout')
  await page.getByRole('button', { name: 'Pay now' }).click()
  await expect(page.getByText('Card declined')).toBeVisible()
})

Mocking the payment error path is exactly the kind of test that’s hard to write with a real API and easy with route interception. The alternative is setting up a test Stripe account with a specific card number, which is brittle and slow.
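Once a suite mocks more than one error path, the fulfill boilerplate is worth factoring out. The sketch below is one way to do that, not a Playwright API: jsonError and RouteLike are hypothetical names, and RouteLike is a minimal structural type covering only the part of Playwright’s Route the helper touches, which keeps the helper checkable without a browser.

```typescript
// Minimal structural type (an assumption for this sketch): just the part of
// Playwright's Route that the helper uses.
interface RouteLike {
  fulfill(options: { status: number; contentType: string; body: string }): Promise<void>
}

// Hypothetical helper: builds a handler that answers any intercepted request
// with a JSON error payload of the shape { error: message }.
function jsonError(status: number, message: string) {
  return async (route: RouteLike): Promise<void> => {
    await route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify({ error: message }),
    })
  }
}

// Intended use inside a spec:
//   await page.route('**/api/checkout', jsonError(400, 'Card declined'))
//   await page.route('**/api/checkout', jsonError(429, 'Too many requests'))
```

Each error-path test then differs only in the status, the message, and the text it expects on screen.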

CI Integration

Playwright works well in GitHub Actions with minimal config:

# .github/workflows/e2e.yml
name: E2E Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npx playwright install --with-deps chromium webkit
      - run: npm run test:e2e
        env:
          CI: true
          BASE_URL: http://localhost:3000
          TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
          TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 7

Two things to notice. First, install only the browsers you test against (chromium and webkit here), not all three: installing Firefox when you’re not testing on it adds ~200MB and 30+ seconds to every CI run. Second, upload the report only on failure. The HTML report has everything you need to diagnose a broken test.

The Maintenance Problem That Catches Teams Off Guard

Playwright tests are cheaper to write than Cypress tests. They’re not free to maintain. A suite of 200 tests requires ongoing work as the application changes.

The investment that pays back fastest: Playwright component testing (for React/Vue/Svelte). It sits between unit tests and full E2E tests, with a real browser and real component rendering, but isolated from the full application stack. Component tests are faster and more specific than E2E tests and catch UI regressions earlier.

The split that works at agencies: E2E tests cover critical user journeys (signup, checkout, core workflows). Component tests cover UI state. Unit tests cover logic. Run E2E on every PR, run component tests on every push, run unit tests on every commit.

What “Good Coverage” Looks Like

There is no right number of E2E tests. There is a right set of tests.

Every path a paying user takes to complete a core action should have at least one E2E test. If a user can sign up, activate their account, and make a purchase, those three flows need tests. Everything else is secondary.

Error paths (failed payments, invalid inputs, rate limiting) matter more than most teams realize. Happy-path tests pass even when error handling is completely broken. Writing one test per error state you care about takes 30 minutes and catches a disproportionate number of production bugs.

Start there. Add tests when bugs slip through to production. The suite that’s useful at 50 tests is more valuable than the one that tried to cover everything at 500.
