Playwright in 2026: End-to-End Testing That Survives Code Changes

The average E2E test suite lives about six weeks before someone starts skipping it. The tests get flaky, the failures are false alarms, nobody trusts them, and eventually they’re running on a separate cron job that nobody checks.

Playwright solved the tooling problem years ago. Its auto-wait, browser context isolation, and network interception are genuinely good. Tests break because of how they are written, not because of Playwright’s limitations. Here’s what the wrong way looks like and what to do instead.

Why Most Suites Fail

E2E suites break for three reasons, in order of frequency:

Fragile selectors. Tests that select by div:nth-child(3) or #root > div > button.MuiButton-root break whenever the UI is reorganized, even when the feature works fine.
No isolation. Tests that depend on database state left by earlier tests fail unpredictably depending on run order.
Overly wide scope. One test tries to cover the entire user journey, so it breaks when any step changes.

Playwright can’t fix any of these. Those are authoring decisions.

Select by Role, Not by Structure

The most durable selector strategy is to use the same signals a screen reader uses: ARIA roles, labels, and text content. Playwright’s getByRole, getByLabel, and getByText locators target these.

// Fragile: breaks when class names or DOM order change
page.locator('.form-container > div:first-child > button');

// Durable: describes what the element IS to the user
page.getByRole('button', { name: 'Submit order' });
page.getByLabel('Email address');
page.getByRole('alert').getByText('Payment failed');

If the button’s accessible name changes, the test should fail (because the feature changed). If the button’s class changes, the test should not fail. The ARIA approach draws that line correctly.

For elements without natural accessible names, add data-testid attributes. These survive CSS class renames, DOM restructuring, and framework migrations:

<div data-testid="order-summary">
  <span data-testid="order-total">$42.00</span>
</div>

page.getByTestId('order-summary');
page.getByTestId('order-total');

Set a convention: data-testid attributes are test infrastructure. They don’t get removed without a corresponding test update. Put that in your contributing guide.
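
One way to back that convention with the type checker is to keep the IDs in a shared constant that both the app and the tests import. The sketch below is an assumption about project layout, not something Playwright requires; TestIds and testIdSelector are hypothetical names.

```typescript
// test-ids.ts (hypothetical shared module imported by both app code and tests)
const TestIds = {
  orderSummary: 'order-summary',
  orderTotal: 'order-total',
} as const;

// Union of the literal ID strings: 'order-summary' | 'order-total'
type TestId = (typeof TestIds)[keyof typeof TestIds];

// Builds the CSS attribute selector that matches a given test ID
function testIdSelector(id: TestId): string {
  return `[data-testid="${id}"]`;
}

console.log(testIdSelector(TestIds.orderTotal)); // prints [data-testid="order-total"]
```

In a spec, page.getByTestId(TestIds.orderTotal) then fails type-checking if someone deletes the key, which surfaces the removal before the test run does.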

Isolation with Fixtures

Tests that share state are a slow-burning problem. They pass locally, fail in CI, and produce different results depending on which tests ran before. The fix is fixtures that set up and tear down their own data.

Playwright’s test.extend lets you define per-test setup that runs before each test and tears down after:

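// fixtures.ts
import { randomUUID } from 'node:crypto';
import { test as base } from '@playwright/test';
// createTestUser and deleteTestUser are your own helpers, not part of Playwright
import { createTestUser, deleteTestUser } from './helpers';

type Fixtures = {
  user: { email: string; password: string; id: string };
};

export const test = base.extend<Fixtures>({
  user: async ({}, use) => {
    // A UUID keeps emails unique even when parallel workers start in the same millisecond
    const user = await createTestUser({
      email: `test-${randomUUID()}@example.com`,
      password: 'TestPass123!',
    });
    await use(user); // the test body runs here
    await deleteTestUser(user.id); // teardown runs after the test finishes
  },
});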

Each test gets a fresh user that doesn’t exist before the test and is cleaned up after. No state leaks between tests, and tests run in any order.

// auth.spec.ts
import { test } from './fixtures';
import { expect } from '@playwright/test';

test('user can log in', async ({ page, user }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill(user.email);
  await page.getByLabel('Password').fill(user.password);
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page).toHaveURL('/dashboard');
});

API Setup for Heavy Flows

Driving the UI through every step of a setup flow before each test is slow and fragile. If you need a user with a completed order to test the refund flow, create that order through the API, not through the UI.

// refund.spec.ts
import { test, expect } from '@playwright/test';

// Assumption: an API token for a seeded test account, supplied via an
// environment variable (TEST_API_TOKEN is a placeholder name)
const testToken = process.env.TEST_API_TOKEN;

test('user can request a refund', async ({ page, request }) => {
  // Create the order directly via the API; no UI clicks needed for setup
  const response = await request.post('/api/orders', {
    data: { items: [{ sku: 'WIDGET-001', qty: 1 }] },
    headers: { Authorization: `Bearer ${testToken}` },
  });
  expect(response.ok()).toBeTruthy(); // fail fast if the setup call itself broke
  const { orderId } = await response.json();

  // Now test the UI for the actual scenario under test
  await page.goto(`/orders/${orderId}`);
  await page.getByRole('button', { name: 'Request refund' }).click();
  await page.getByRole('combobox', { name: 'Reason' }).selectOption('Item not received');
  await page.getByRole('button', { name: 'Submit request' }).click();
  await expect(page.getByRole('alert')).toContainText('Refund requested');
});

This keeps the test focused on the refund UI, not on the order creation UI. If order creation is tested elsewhere, testing it again here is redundant and fragile.

Page Objects, Done Simply

Page objects reduce duplication and keep tests readable. The usual mistake is making them too abstract. A page object should model actions and assertions, not replicate the page’s entire structure.

// pages/CheckoutPage.ts
import { Page, expect } from '@playwright/test';

export class CheckoutPage {
  constructor(private page: Page) {}

  async fillShipping(address: { street: string; city: string; zip: string }) {
    await this.page.getByLabel('Street address').fill(address.street);
    await this.page.getByLabel('City').fill(address.city);
    await this.page.getByLabel('ZIP code').fill(address.zip);
  }

  async placeOrder() {
    await this.page.getByRole('button', { name: 'Place order' }).click();
  }

  async expectOrderConfirmation() {
    await expect(this.page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
  }
}

// checkout.spec.ts
import { test } from '@playwright/test';
import { CheckoutPage } from './pages/CheckoutPage';

test('completes checkout with valid payment', async ({ page }) => {
  const checkout = new CheckoutPage(page);
  await page.goto('/checkout');
  await checkout.fillShipping({ street: '123 Main St', city: 'Portland', zip: '97201' });
  await checkout.placeOrder();
  await checkout.expectOrderConfirmation();
});

Page objects work best when they wrap the actions a user takes, not when they mirror the component hierarchy. If the UI is reorganized but the checkout flow still works the same way, the page object should survive unchanged.

Handling Flaky Async UI

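Playwright auto-waits for a lot: before acting on an element it checks that the element is visible, stable, enabled, and able to receive events, and web-first assertions such as toBeVisible retry until they pass or time out. It does not wait for network requests triggered by a click to settle, and some UI patterns still need explicit handling.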

Waiting for a specific network request:

// Start listening before the click so the response can't be missed
const saveResponsePromise = page.waitForResponse(resp => resp.url().includes('/api/profile'));
await page.getByRole('button', { name: 'Save changes' }).click();
const saveResponse = await saveResponsePromise;
// Assert on the status here so a failed save reports the real status instead of timing out
expect(saveResponse.status()).toBe(200);

Waiting for an element to disappear (loading state):

await page.getByRole('button', { name: 'Save changes' }).click();
await expect(page.getByTestId('loading-spinner')).toBeHidden();
await expect(page.getByRole('alert')).toContainText('Saved');

Polling for delayed UI changes:

// Some UIs animate or update after a delay
await expect(page.getByTestId('order-status')).toHaveText('Processing', { timeout: 10000 });

Avoid page.waitForTimeout. Fixed sleeps cause both slowness and flakiness: too long in some environments, too short in others.
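
To make the difference concrete, here is a minimal, framework-free sketch of condition-based waiting. pollUntil is a hypothetical helper written for illustration, and the simulated status change stands in for a real UI update; in an actual suite, Playwright’s web-first assertions and expect.poll already do this for you.

```typescript
// Hypothetical helper: re-checks a condition until it holds or the deadline passes.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return; // returns as soon as the condition is true
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Simulated background job that flips the status after roughly 120ms
let orderStatus = 'Pending';
setTimeout(() => { orderStatus = 'Processing'; }, 120);

// Resolves shortly after the status flips instead of sleeping for a fixed worst-case duration
const waited = pollUntil(() => orderStatus === 'Processing', { timeoutMs: 2000 });
```

The wait ends when the condition becomes true, so a fast environment finishes quickly and a slow one gets the full timeout before the helper gives up.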

What Belongs in E2E Tests

E2E tests are slow and expensive to maintain. They should cover things that can only be tested end-to-end: full user flows that cross multiple systems, third-party integrations, and the integration between frontend and backend that unit tests can’t reach.

Things that belong in E2E:

Complete checkout flow from cart to confirmation
Auth flows: login, logout, password reset, session expiry
File upload and download
Notifications triggered by webhooks
Payment provider redirects

Things that belong in unit or integration tests instead:

Validation logic on forms
API response handling in isolation
State management in components
Individual API endpoints

An E2E suite with 200 tests usually means unit tests aren’t covering enough. A suite with 30 focused flows covering the critical paths is more maintainable and more trustworthy.
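
As an illustration of the second list, form validation can usually be pulled into a pure function and checked without a browser. The validateSignup function and its rules below are hypothetical, invented for this sketch.

```typescript
type SignupForm = { email: string; password: string };

// Hypothetical validation rules, invented for this sketch
function validateSignup(form: SignupForm): string[] {
  const errors: string[] = [];
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(form.email)) {
    errors.push('Enter a valid email address');
  }
  if (form.password.length < 12) {
    errors.push('Password must be at least 12 characters');
  }
  return errors;
}

// Each case is a plain function call; no page load, login, or selector is involved
console.log(validateSignup({ email: 'a@example.com', password: 'correct-horse-battery' }).length); // prints 0
console.log(validateSignup({ email: 'not-an-email', password: 'short' }).length); // prints 2
```

The E2E suite then only needs one test confirming that validation errors reach the screen at all, not one per rule.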

Running in CI

A few configuration choices that make CI runs reliable:

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0,   // retry on CI only, not locally
  workers: process.env.CI ? 4 : undefined,
  reporter: [['html'], ['github']],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',  // record a trace on the first retry of a failed test
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});

retries: 2 on CI gives each test up to three attempts. A test that passes on a retry is reported as flaky instead of passed, so intermittent failures stay visible without blocking the build. A test that fails three times in a row is probably a real problem. Traces, screenshots, and video from failed runs are worth enabling; they make it possible to debug CI failures without re-running locally.
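
The arithmetic behind the retries setting can be sketched as a small function. classify is hypothetical and written for illustration, not Playwright’s internal code, though its three labels mirror the passed, flaky, and failed categories in Playwright’s reports.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[i] is true if attempt i passed; only the first run plus `retries` attempts count
function classify(attempts: boolean[], retries: number): Outcome {
  const allowed = attempts.slice(0, retries + 1);
  const firstPass = allowed.indexOf(true);
  if (firstPass === 0) return 'passed';
  if (firstPass > 0) return 'flaky'; // failed at least once, then passed on a retry
  return 'failed'; // never passed within the allowed attempts
}

console.log(classify([true], 2)); // prints passed
console.log(classify([false, true], 2)); // prints flaky
console.log(classify([false, false, false], 2)); // prints failed
console.log(classify([false, true], 0)); // prints failed
```

The last line is the local case from the config above: with zero retries, the first failure is final, which keeps flakiness from hiding during development.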

For the baseline run, use separate test databases and mock any external payment or email providers. The goal is a hermetic run that passes reliably without network access to third-party services.

The Test That’s Worth Keeping

A test worth keeping tells you something unambiguous when it fails. If a test fails because a class name changed, it’s generating noise. If it fails because the checkout flow no longer completes, it’s doing its job.

The selectors you write determine which category your tests fall into. Role-based selectors, isolated fixtures, and focused flows aren’t extra work. They’re what separates a test suite people trust from one people disable.
