Playwright in 2026: End-to-End Testing That Survives Code Changes
Most E2E test suites break as fast as the features they cover. Here's how to write Playwright tests that hold up when the UI changes, the data changes, and the team keeps shipping.
Anurag Verma
The average E2E test suite lives about six weeks before someone starts skipping it. The tests get flaky, the failures are false alarms, nobody trusts them, and eventually they’re running on a separate cron job that nobody checks.
Playwright solved the tooling problem years ago. Its auto-wait, browser context isolation, and network interception are genuinely good. Tests break because of how they were written, not because of Playwright’s limitations. Here’s what the wrong way looks like and what to do instead.
Why Most Suites Fail
E2E suites break for three reasons, in order of frequency:
- Fragile selectors. Tests that select by div:nth-child(3) or #root > div > button.MuiButton-root break whenever the UI is reorganized, even when the feature works fine.
- No isolation. Tests that depend on database state left by earlier tests fail unpredictably depending on run order.
- Overly wide scope. One test tries to cover the entire user journey, so it breaks when any step changes.
Playwright can’t fix any of these. Those are authoring decisions.
Select by Role, Not by Structure
The most durable selector strategy is to use the same signals a screen reader uses: ARIA roles, labels, and text content. Playwright’s getByRole, getByLabel, and getByText locators target these.
// Fragile — breaks when class names or DOM order changes
page.locator('.form-container > div:first-child > button');
// Durable — describes what the element IS to the user
page.getByRole('button', { name: 'Submit order' });
page.getByLabel('Email address');
page.getByRole('alert').getByText('Payment failed');
If the button’s accessible name changes, the test should fail (because the feature changed). If the button’s class changes, the test should not fail. The ARIA approach draws that line correctly.
For elements without natural accessible names, add data-testid attributes. These survive CSS class renames, DOM restructuring, and framework migrations:
<div data-testid="order-summary">
  <span data-testid="order-total">$42.00</span>
</div>
page.getByTestId('order-summary');
page.getByTestId('order-total');
Set a convention: data-testid attributes are test infrastructure. They don’t get removed without a corresponding test update. Put that in your contributing guide.
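One way to make that convention enforceable, sketched under the assumption that the app and the tests can import from a shared TypeScript module: keep the IDs in a single constants object, so deleting one becomes a compile error in every test that still uses it. The module name, the `testIds` object, and the `testIdAttr` helper are all hypothetical.

```typescript
// test-ids.ts (hypothetical shared module, imported by both app code and tests)
export const testIds = {
  orderSummary: 'order-summary',
  orderTotal: 'order-total',
} as const;

// Union of every registered ID: 'order-summary' | 'order-total'
export type TestId = (typeof testIds)[keyof typeof testIds];

// Builds the attribute object to spread onto an element,
// e.g. in JSX: <div {...testIdAttr(testIds.orderSummary)}>
export function testIdAttr(id: TestId): { 'data-testid': TestId } {
  return { 'data-testid': id };
}
```

A test would then call page.getByTestId(testIds.orderTotal) instead of repeating the string, so a rename happens in one place.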
Isolation with Fixtures
Tests that share state are a slow-burning problem. They pass locally, fail in CI, and produce different results depending on which tests ran before. The fix is fixtures that set up and tear down their own data.
Playwright’s test.extend lets you define per-test setup that runs before each test and tears down after:
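// fixtures.ts
import { randomUUID } from 'node:crypto';
import { test as base } from '@playwright/test';
import { createTestUser, deleteTestUser } from './helpers';
type Fixtures = {
  user: { email: string; password: string; id: string };
};
export const test = base.extend<Fixtures>({
  user: async ({}, use) => {
    // A random suffix keeps emails unique even when parallel workers
    // create users in the same millisecond (Date.now() alone can collide)
    const user = await createTestUser({
      email: `test-${randomUUID()}@example.com`,
      password: 'TestPass123!',
    });
    await use(user); // the test body runs here
    await deleteTestUser(user.id); // teardown, runs after the test finishes
  },
});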
Each test gets a fresh user that doesn’t exist before the test and is cleaned up after. No state leaks between tests, and tests run in any order.
// auth.spec.ts
import { test } from './fixtures';
import { expect } from '@playwright/test';
test('user can log in', async ({ page, user }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill(user.email);
  await page.getByLabel('Password').fill(user.password);
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page).toHaveURL('/dashboard');
});
API Setup for Heavy Flows
Driving the UI through every step of a setup flow before each test is slow and fragile. If you need a user with a completed order to test the refund flow, create that order through the API, not through the UI.
// Set up the precondition through the API
import { test, expect } from '@playwright/test';
// Assumption: the token comes from a TEST_API_TOKEN environment variable (hypothetical name)
const testToken = process.env.TEST_API_TOKEN ?? '';
test('user can request a refund', async ({ page, request }) => {
  // Create the order directly via the API; no UI clicks needed for setup
  const order = await request.post('/api/orders', {
    data: { items: [{ sku: 'WIDGET-001', qty: 1 }] },
    headers: { Authorization: `Bearer ${testToken}` },
  });
  expect(order.ok()).toBeTruthy(); // fail fast if the setup call itself broke
  const { orderId } = await order.json();
  // Now test the UI for the actual scenario under test
  await page.goto(`/orders/${orderId}`);
  await page.getByRole('button', { name: 'Request refund' }).click();
  await page.getByRole('combobox', { name: 'Reason' }).selectOption('Item not received');
  await page.getByRole('button', { name: 'Submit request' }).click();
  await expect(page.getByRole('alert')).toContainText('Refund requested');
});
This keeps the test focused on the refund UI, not on the order creation UI. If order creation is tested elsewhere, testing it again here is redundant and fragile.
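If several tests need the same precondition, the API call can move into a small helper. The sketch below is one possible shape, not a prescribed one: `createOrderViaApi` is a hypothetical name, and `ApiClientLike` is a minimal stand-in for the parts of Playwright's `request` fixture the helper uses, which that fixture is expected to satisfy structurally. The `/api/orders` endpoint and the `orderId` field are taken from the example above.

```typescript
// Minimal shapes this sketch relies on (stand-ins, not Playwright's own types).
interface ApiResponseLike {
  ok(): boolean;
  status(): number;
  json(): Promise<unknown>;
}
interface ApiClientLike {
  post(
    url: string,
    options: { data: unknown; headers: Record<string, string> },
  ): Promise<ApiResponseLike>;
}
interface OrderItem {
  sku: string;
  qty: number;
}

// Hypothetical helper: creates an order through the API and returns its ID.
// Throws on a non-2xx response so a broken precondition fails loudly during setup.
export async function createOrderViaApi(
  client: ApiClientLike,
  token: string,
  items: OrderItem[],
): Promise<string> {
  const response = await client.post('/api/orders', {
    data: { items },
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok()) {
    throw new Error(`Order setup failed with HTTP ${response.status()}`);
  }
  const body = (await response.json()) as { orderId: string };
  return body.orderId;
}
```

Because the helper depends only on the narrow interface, it can be checked with a fake client and reused from a fixture or directly inside a test.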
Page Objects, Done Simply
Page objects reduce duplication and keep tests readable. The usual mistake is making them too abstract. A page object should model actions and assertions, not replicate the page’s entire structure.
// pages/CheckoutPage.ts
import { expect, type Page } from '@playwright/test';
export class CheckoutPage {
  constructor(private readonly page: Page) {}
  async fillShipping(address: { street: string; city: string; zip: string }) {
    await this.page.getByLabel('Street address').fill(address.street);
    await this.page.getByLabel('City').fill(address.city);
    await this.page.getByLabel('ZIP code').fill(address.zip);
  }
  async placeOrder() {
    await this.page.getByRole('button', { name: 'Place order' }).click();
  }
  async expectOrderConfirmation() {
    await expect(this.page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
  }
}

// checkout.spec.ts
import { test } from '@playwright/test';
import { CheckoutPage } from './pages/CheckoutPage'; // adjust the path to your layout
test('completes checkout with valid payment', async ({ page }) => {
  const checkout = new CheckoutPage(page);
  await page.goto('/checkout');
  await checkout.fillShipping({ street: '123 Main St', city: 'Portland', zip: '97201' });
  await checkout.placeOrder();
  await checkout.expectOrderConfirmation();
});
Page objects work best when they wrap the actions a user takes, not when they mirror the component hierarchy. If the UI is reorganized but the checkout flow still works the same way, the page object should survive unchanged.
Handling Flaky Async UI
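Playwright auto-waits for most things: actions wait for the target element to be visible, stable, and enabled; page.goto waits for the page to load; and web-first assertions retry until they pass or time out. It does not wait for network requests triggered by a click to finish, and some UI patterns still need explicit handling.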
Waiting for a specific network request:
// Start listening before the click so the response cannot be missed
const saveResponsePromise = page.waitForResponse(resp => resp.url().includes('/api/profile'));
await page.getByRole('button', { name: 'Save changes' }).click();
const saveResponse = await saveResponsePromise;
// Assert on the status separately: a failed save then reports its status instead of timing out
expect(saveResponse.status()).toBe(200);
Waiting for an element to disappear (loading state):
await page.getByRole('button', { name: 'Save changes' }).click();
await expect(page.getByTestId('loading-spinner')).not.toBeVisible();
await expect(page.getByRole('alert')).toContainText('Saved');
Polling for delayed UI changes:
// Some UIs animate or update after a delay
await expect(page.getByTestId('order-status')).toHaveText('Processing', { timeout: 10000 });
Avoid page.waitForTimeout. Fixed sleeps cause both slowness and flakiness: too long in some environments, too short in others.
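To see why a retrying wait beats a fixed sleep, the idea can be sketched as a standalone helper. In real tests Playwright's retrying assertions and its built-in expect.poll are the better choice; the `pollUntil` function and its option names below are hypothetical and only illustrate the mechanism: return as soon as the condition holds, and fail only once the full deadline has passed.

```typescript
interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Re-checks `condition` until it returns true or the deadline passes.
// Resolves with the number of checks made; rejects if time runs out first.
export async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs }: PollOptions,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let checks = 0;
  while (true) {
    checks += 1;
    if (await condition()) return checks;
    // Stop if the next check would land after the deadline
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms after ${checks} checks`);
    }
    await sleep(intervalMs);
  }
}
```

A fast environment pays for one or two checks, a slow one gets the whole timeout, and neither needs a tuned sleep value.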
What Belongs in E2E Tests
E2E tests are slow and expensive to maintain. They should cover things that can only be tested end-to-end: full user flows that cross multiple systems, third-party integrations, and the seams between frontend and backend that unit tests can’t reach.
Things that belong in E2E:
- Complete checkout flow from cart to confirmation
- Auth flows: login, logout, password reset, session expiry
- File upload and download
- Notifications sent by webhook triggers
- Payment provider redirects
Things that belong in unit or integration tests instead:
- Validation logic on forms
- API response handling in isolation
- State management in components
- Individual API endpoints
An E2E suite with 200 tests usually means unit tests aren’t covering enough. A suite with 30 focused flows covering the critical paths is more maintainable and more trustworthy.
Running in CI
A few configuration choices that make CI runs reliable:
// playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0, // retry on CI only, not locally
  workers: process.env.CI ? 4 : undefined, // undefined keeps Playwright's default locally
  reporter: [['html'], ['github']],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry', // record a trace when a failed test is retried for the first time
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
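retries: 2 on CI absorbs transient flakiness without masking systematic failures. A test that fails three times in a row is probably a real problem, and Playwright reports a test that passes only on a retry as flaky, so the instability stays visible. Traces and screenshots on failure are worth enabling; they make debugging CI failures possible without re-running locally. With trace: 'on-first-retry', traces are only recorded where retries are enabled, which in this config means CI.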
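The reasoning behind the retry setting can be written out as a small classifier. This is an illustrative sketch, not Playwright's implementation: `classifyRun` is a hypothetical function that takes the pass/fail result of each attempt in order and labels the outcome as a pass on the first attempt, a pass only after a retry, or a failure on every attempt.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[i] is true if attempt i passed. With retries: 2 there are at most
// three attempts, and the runner stops at the first passing one.
export function classifyRun(attempts: boolean[]): Outcome {
  if (attempts.length === 0) throw new Error('A run needs at least one attempt');
  if (attempts[0]) return 'passed';            // passed first time: healthy
  if (attempts.includes(true)) return 'flaky'; // failed, then passed on a retry: investigate
  return 'failed';                             // failed every attempt: likely a real problem
}
```

Here classifyRun([false, true]) yields 'flaky' and classifyRun([false, false, false]) yields 'failed'. Only the second should block a merge, but the first still deserves a ticket.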
For the baseline run, use separate test databases and mock any external payment or email providers. The goal is a hermetic run that passes reliably without network access to third-party services.
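A minimal sketch of mocking a payment provider at the network layer, assuming the browser calls a hypothetical https://payments.example.com API and the app expects a JSON body with `status` and `transactionId` fields. `RouteLike` is a stand-in for the one method of Playwright's `Route` that the handler uses, so the handler can be passed to page.route and also exercised on its own.

```typescript
// The only part of Playwright's Route this sketch needs (structural stand-in).
interface RouteLike {
  fulfill(options: { status: number; contentType: string; body: string }): Promise<void>;
}

// Hypothetical response shape; adjust to whatever your provider actually returns.
const approvedPayment = { status: 'approved', transactionId: 'test-txn-001' };

// Answers the request locally instead of letting it reach the real provider.
export async function fulfillApprovedPayment(route: RouteLike): Promise<void> {
  await route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify(approvedPayment),
  });
}

// In a test (the URL pattern is an assumption):
//   await page.route('https://payments.example.com/**', fulfillApprovedPayment);
```

This only intercepts requests made by the browser. Provider calls made from your backend need a stub or sandbox on the server side instead.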
The Test That’s Worth Keeping
When a test worth keeping fails, it tells you unambiguously that something real broke. If a test fails because a class name changed, it’s generating noise. If it fails because the checkout flow no longer completes, it’s doing its job.
The selectors you write determine which category your tests fall into. Role-based selectors, isolated fixtures, and focused flows aren’t extra work. They’re what separates a test suite people trust from one people disable.