Temporal for Durable Workflows: How We Finally Stopped Losing Background Jobs

Background jobs that crash mid-execution lose all their state. Temporal solves this by making workflows durable state machines that survive process restarts, deploys, and outages. Here's what it looks like in TypeScript and Python.

Anurag Verma

A client comes to you with a problem: their order fulfillment pipeline occasionally drops orders. The process sends an email, charges the card, creates a shipping label, and updates inventory. If the server crashes between step two and step three, the card is charged but no label is created. Support has to manually sort it out.

The standard fix is a job queue with retry logic. You enqueue tasks, workers pick them up, and if a task fails it gets retried. This works until the failure mode is “process crashed in the middle of a task” rather than “the task returned an error.” A worker that dies mid-execution leaves no record of how far it got. Depending on how the queue acknowledges work, the task is either lost or redelivered and run again from the beginning. Now the card is charged twice.

Temporal solves this at the architecture level. Instead of queuing discrete tasks, you write the whole process as a workflow in ordinary code. Temporal makes that workflow durable: if the worker process crashes at any point, the workflow resumes from its last completed step when a worker picks it up again.

How Temporal Works

Temporal separates workflows from activities:

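Workflows are the orchestration logic. They define the sequence of steps, handle retries, and maintain state. Workflow code must be deterministic: no unseeded random numbers, no reading the system clock directly, no I/O. The SDKs supply deterministic stand-ins instead: the TypeScript sandbox replaces Math.random and Date, and the Python SDK provides workflow.random() and workflow.now().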

Activities are the actual side effects. Charging a card, sending an email, calling an API. Activities can do anything. They can fail, and Temporal will retry them according to a policy you define.
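The determinism rule follows from how recovery works. Temporal re-runs workflow code and checks what it does against the recorded history. The sketch below is a toy model of that check, not Temporal’s implementation. `WorkflowFn`, `replay`, and the two example workflows are hypothetical names. A workflow whose step order depends on a random number fails replay, while a deterministic one passes.

```typescript
// Toy model of replay (not the Temporal SDK): a workflow is a function that
// returns the activities it schedules, given a source of randomness.
type WorkflowFn = (random: () => number) => string[];

// Unsafe: the order of steps depends on a value that can differ on replay.
const unsafeWorkflow: WorkflowFn = (random) =>
  random() < 0.5 ? ['chargeCard', 'createShipment'] : ['createShipment', 'chargeCard'];

// Safe: the same input always schedules the same steps.
const safeWorkflow: WorkflowFn = () => ['chargeCard', 'createShipment'];

// Re-runs the workflow and compares each scheduled step with the recorded history.
function replay(workflow: WorkflowFn, history: string[], random: () => number): void {
  const steps = workflow(random);
  steps.forEach((step, i) => {
    if (i < history.length && history[i] !== step) {
      throw new Error(`nondeterminism at step ${i}: history has ${history[i]}, code scheduled ${step}`);
    }
  });
}

// The original run saw random() = 0.1; the replay after a crash sees 0.9.
const recorded = unsafeWorkflow(() => 0.1);

let unsafeFailed = false;
try {
  replay(unsafeWorkflow, recorded, () => 0.9);
} catch {
  unsafeFailed = true; // history says chargeCard first; the code now says createShipment
}

// The deterministic workflow replays cleanly no matter what random() returns.
replay(safeWorkflow, safeWorkflow(() => 0.1), () => 0.9);
```

Temporal workers run a comparable check during replay. When replayed code schedules something the history doesn’t contain, the worker fails the workflow task instead of silently diverging.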

The Temporal server persists a full event history for every workflow. When a worker picks up a workflow it doesn’t have in memory (after a restart, for example), it replays that history to reconstruct the workflow’s current state. Your workflow code runs again from the beginning. Every activity call that already completed returns its recorded result from the history instead of executing again, so the actual side effects don’t repeat.

Workflow: ProcessOrder

├── Activity: ValidateOrder     ← runs once, result cached
├── Activity: ChargeCard         ← runs once, result cached
├── Activity: CreateShipment     ← CRASH HERE
│                                   Worker restarts
│                                   Replay: ValidateOrder → cached
│                                   Replay: ChargeCard → cached
│                                   CreateShipment → actually executes again
├── Activity: UpdateInventory
└── Activity: SendConfirmation

The card is not charged twice. The replay knows ChargeCard already completed.
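To make the diagram concrete, here is a toy model of replay, not Temporal’s implementation. An in-memory array plays the role of the event history. `makeRunner`, `processOrder`, and the `effects` counters are hypothetical names. The first run crashes inside createShipment. The second run replays validateOrder and chargeCard from the history and only executes createShipment for real.

```typescript
// Toy model of durable execution (not the Temporal SDK). The log holds the
// result of every completed activity; re-running the workflow against it
// returns those results instead of repeating the side effects.
type ActivityLog = { activity: string; result: unknown }[];
type Runner = <T>(activity: string, fn: () => T) => T;

function makeRunner(eventLog: ActivityLog): Runner {
  let seq = 0;
  return <T>(activity: string, fn: () => T): T => {
    const recorded = eventLog[seq];
    seq++;
    if (recorded !== undefined) {
      return recorded.result as T; // replay: already completed, skip the side effect
    }
    const result = fn(); // not in the log yet: execute for real
    eventLog.push({ activity, result }); // recorded only after it succeeds
    return result;
  };
}

// Counters stand in for real side effects (card charges, shipping labels).
const effects = { charges: 0, shipments: 0 };

function processOrder(run: Runner, orderId: string, crashInShipment: boolean): string {
  run('validateOrder', () => true);
  run('chargeCard', () => {
    effects.charges++;
    return `pi_${orderId}`;
  });
  return run('createShipment', () => {
    if (crashInShipment) throw new Error('worker crashed');
    effects.shipments++;
    return `TRACK-${orderId}`;
  });
}

const eventLog: ActivityLog = [];
try {
  processOrder(makeRunner(eventLog), '1001', true); // first attempt dies in createShipment
} catch {
  // The worker "restarts"; the log still holds validateOrder and chargeCard.
}
const tracking = processOrder(makeRunner(eventLog), '1001', false);
```

In Temporal the history lives on the server rather than in the worker’s memory. That is why the second run can happen on a different machine.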

Getting Started: TypeScript

npm install @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity

Run a local Temporal server for development:

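# with the Temporal CLI (install first, e.g. brew install temporal); web UI on localhost:8233
temporal server start-dev
# or with docker, via Temporal's docker-compose repo (includes a database and the web UI)
git clone https://github.com/temporalio/docker-compose.git
cd docker-compose && docker compose up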

Define Activities

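Activities are plain async functions. They live in their own file because the TypeScript SDK bundles workflow code separately and runs it in a deterministic sandbox. The workflow file imports only the activities’ types.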

// src/activities.ts
import { ApplicationFailure } from '@temporalio/activity';
// Hypothetical module: db, stripe, shippo, and mailer stand in for your own
// database, Stripe, shipping, and email clients.
import { db, stripe, shippo, mailer } from './clients';

export async function validateOrder(orderId: string): Promise<{ valid: boolean; amount: number }> {
  const order = await db.orders.findById(orderId);
  if (!order) {
    // Non-retryable: this order doesn't exist, so retrying can't help
    throw ApplicationFailure.nonRetryable(`Order ${orderId} not found`);
  }
  return { valid: true, amount: order.total };
}

export async function chargeCard(orderId: string, amount: number): Promise<string> {
  // Activities can be retried after a crash, so pass an idempotency key:
  // Stripe returns the original result for a repeated key instead of charging again.
  const result = await stripe.paymentIntents.create(
    {
      amount, // in the smallest currency unit (cents for USD)
      currency: 'usd',
      metadata: { orderId },
    },
    { idempotencyKey: `charge-${orderId}` },
  );
  return result.id;
}

export async function createShipment(orderId: string): Promise<string> {
  // Illustrative call: adapt to your shipping provider's SDK
  const label = await shippo.transactions.create({ orderId });
  return label.trackingNumber;
}

export async function updateInventory(orderId: string): Promise<void> {
  await db.orders.updateStatus(orderId, 'shipped');
}

export async function sendConfirmation(orderId: string, trackingNumber: string): Promise<void> {
  await mailer.send({ to: await db.orders.getEmail(orderId), trackingNumber });
}
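Temporal retries an activity that fails or times out. That includes the case where chargeCard’s request reached the payment provider but the worker crashed before reporting success. Activities therefore run at least once, so side effects like payments need an idempotency key that the provider deduplicates on. stripe-node accepts one through the `idempotencyKey` request option.

The sketch below models this with a hypothetical `FakePaymentProvider` and a simulated retry loop. It is not Stripe’s or Temporal’s code.

```typescript
// Hypothetical stand-in for a payment API that honors idempotency keys:
// a repeated key returns the original charge instead of creating a new one.
class FakePaymentProvider {
  private chargesByKey = new Map<string, string>();
  chargeCount = 0;

  createCharge(amount: number, idempotencyKey: string): string {
    const existing = this.chargesByKey.get(idempotencyKey);
    if (existing !== undefined) return existing; // duplicate request: no new charge
    this.chargeCount++;
    const chargeId = `ch_${this.chargeCount}_${amount}`;
    this.chargesByKey.set(idempotencyKey, chargeId);
    return chargeId;
  }
}

// Simulates an activity whose first attempt charges the card but whose worker
// dies before reporting success, so the activity is retried.
function chargeWithRetries(
  provider: FakePaymentProvider,
  orderId: string,
  amount: number,
  useStableKey: boolean,
): string {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A stable key ties every attempt to the same order; a per-attempt key does not.
    const key = useStableKey ? `charge-${orderId}` : `charge-${orderId}-attempt-${attempt}`;
    const chargeId = provider.createCharge(amount, key);
    const workerCrashedBeforeReporting = attempt === 1;
    if (!workerCrashedBeforeReporting) return chargeId;
  }
  throw new Error('retries exhausted');
}

const provider = new FakePaymentProvider();
const chargeId = chargeWithRetries(provider, '1001', 4999, true);
```

With a stable key, the retry gets back the charge created by the crashed attempt. With a per-attempt key, the same retry would create a second charge.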

Define the Workflow

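Workflow code must be deterministic, so it never calls activities directly. Instead, Temporal’s proxyActivities returns stubs. Each call asks the Temporal server to schedule the activity, and the activity’s result is recorded in the workflow’s history. That recording is what makes the call durable.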

// src/workflows.ts
import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities';

const { validateOrder, chargeCard, createShipment, updateInventory, sendConfirmation } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30 seconds',
    retry: {
      initialInterval: '1 second',
      backoffCoefficient: 2,
      maximumAttempts: 5,
    },
  });

export async function processOrderWorkflow(orderId: string): Promise<void> {
  const { valid, amount } = await validateOrder(orderId);

  if (!valid) {
    return;
  }

  const paymentIntentId = await chargeCard(orderId, amount);
  const trackingNumber = await createShipment(orderId);
  await updateInventory(orderId);
  await sendConfirmation(orderId, trackingNumber);
}
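The retry block in proxyActivities above translates into a concrete schedule. This sketch computes the waits, assuming Temporal’s exponential back-off formula: each wait is initialInterval * backoffCoefficient^(retry - 1), capped at maximumInterval, which defaults to 100 times initialInterval. `retryDelaysMs` and `RetryPolicySketch` are hypothetical helpers, not SDK exports.

```typescript
// Sketch of the retry arithmetic, assuming the exponential formula
// delay(n) = min(initialInterval * backoffCoefficient^(n - 1), maximumInterval).
interface RetryPolicySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumAttempts: number;
  maximumIntervalMs?: number; // Temporal's default cap is 100x the initial interval
}

// Returns the wait before each retry; the first attempt starts immediately.
function retryDelaysMs(policy: RetryPolicySketch): number[] {
  const cap = policy.maximumIntervalMs ?? policy.initialIntervalMs * 100;
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maximumAttempts; retry++) {
    const delay = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry - 1);
    delays.push(Math.min(delay, cap));
  }
  return delays;
}

// The policy passed to proxyActivities above: 1 second, doubling, 5 attempts.
const delays = retryDelaysMs({ initialIntervalMs: 1000, backoffCoefficient: 2, maximumAttempts: 5 });
// delays is [1000, 2000, 4000, 8000]
```

So a chargeCard that never succeeds gives up after five attempts. Between them it waits 1 + 2 + 4 + 8 = 15 seconds, plus up to 30 seconds per attempt from startToCloseTimeout.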

Start a Worker

// src/worker.ts
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    workflowsPath: require.resolve('./workflows'),
    activities,
    taskQueue: 'order-processing',
  });
  await worker.run();
}

run().catch(console.error);

Trigger the Workflow

// src/trigger.ts
import { Client } from '@temporalio/client';
import { processOrderWorkflow } from './workflows';

async function startOrder(orderId: string) {
  const client = new Client(); // default connection: localhost:7233

  await client.workflow.start(processOrderWorkflow, {
    taskQueue: 'order-processing',
    workflowId: `order-${orderId}`, // idempotency key: one open run per order
    args: [orderId],
  });
}

startOrder('1001').catch(console.error);

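The workflowId is your idempotency key. By default, calling start with the ID of a workflow that is still running fails with a WorkflowExecutionAlreadyStartedError instead of launching a second execution. Your API handler can therefore retry safely: catch that error and treat it as success. Once the run has closed, the default ID reuse policy lets a new run start with the same ID.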
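By default, Temporal rejects a start whose workflow ID belongs to a still-running execution, and it allows the ID to be reused once that execution has closed. The toy model below shows how an API handler can lean on that rule: a retried request hits the duplicate-ID error and treats it as success. It is not the Temporal client; `WorkflowRegistry`, `AlreadyStartedError`, and `handleOrderSubmitted` are hypothetical names.

```typescript
// Toy model (not the Temporal client) of workflow-ID uniqueness under the
// default policies: a running ID can't be started again, a closed ID can.
class AlreadyStartedError extends Error {
  constructor(workflowId: string) {
    super(`workflow ${workflowId} is already running`);
    Object.setPrototypeOf(this, AlreadyStartedError.prototype); // keeps instanceof working on ES5 targets
  }
}

class WorkflowRegistry {
  private running = new Set<string>();
  startedRuns = 0;

  start(workflowId: string): void {
    if (this.running.has(workflowId)) throw new AlreadyStartedError(workflowId);
    this.running.add(workflowId);
    this.startedRuns++;
  }

  close(workflowId: string): void {
    this.running.delete(workflowId);
  }
}

// A retry-safe API handler: a duplicate start means the order is already being processed.
function handleOrderSubmitted(registry: WorkflowRegistry, orderId: string): 'started' | 'already-running' {
  try {
    registry.start(`order-${orderId}`);
    return 'started';
  } catch (err) {
    if (err instanceof AlreadyStartedError) return 'already-running';
    throw err;
  }
}

const registry = new WorkflowRegistry();
const first = handleOrderSubmitted(registry, '1001'); // 'started'
const retried = handleOrderSubmitted(registry, '1001'); // 'already-running'
```

After the run closes, the same ID starts a fresh run. If an order must never be processed twice, even after its first run has finished, that calls for a stricter workflowIdReusePolicy.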

Python SDK

Temporal has a first-class Python SDK that mirrors the TypeScript structure.

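# activities.py
from temporalio import activity
from temporalio.exceptions import ApplicationError

# db and stripe stand in for your own async clients (assumed, not shown here)

@activity.defn
async def validate_order(order_id: str) -> dict:
    order = await db.orders.find(order_id)
    if not order:
        # Non-retryable: retrying can't make a missing order appear
        raise ApplicationError(f"Order {order_id} not found", non_retryable=True)
    return {"valid": True, "amount": order.total}

@activity.defn
async def charge_card(order_id: str, amount: int) -> str:
    # Activities can be retried, so pass an idempotency key tied to the order
    result = await stripe.create_payment_intent(
        amount=amount, idempotency_key=f"charge-{order_id}"
    )
    return result["id"]

# create_shipment, update_inventory, and send_confirmation follow the same pattern

# workflows.py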
from datetime import timedelta
from temporalio import workflow
from temporalio.common import RetryPolicy

with workflow.unsafe.imports_passed_through():
    from activities import validate_order, charge_card, create_shipment

@workflow.defn
class ProcessOrderWorkflow:
    @workflow.run
    async def run(self, order_id: str) -> None:
        retry_policy = RetryPolicy(
            maximum_attempts=5,
            initial_interval=timedelta(seconds=1),
            backoff_coefficient=2.0,
        )
        result = await workflow.execute_activity(
            validate_order,
            order_id,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=retry_policy,
        )
        if not result["valid"]:
            return

        await workflow.execute_activity(
            charge_card,
            args=[order_id, result["amount"]],
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=retry_policy,
        )
        # ... remaining steps

When to Use Temporal vs the Alternatives

The right tool depends on what you’re building.

Scenario | Recommendation
Simple task queue (emails, webhooks) | BullMQ (Node) or Celery (Python)
Scheduled jobs, cron-style | Native cron or cloud scheduler
Multi-step processes with retries | Temporal
Long-running workflows (days/weeks) | Temporal
Processes that need to pause and wait | Temporal
Complex saga patterns (distributed transactions) | Temporal

Temporal’s overhead (running a server, learning the SDK, writing deterministic workflow code) is not worth it for a simple “retry this HTTP call three times.” It pays off when you have multi-step processes where partial completion causes real problems. It also pays off when you need to orchestrate work across multiple services with a guarantee that a step, once completed, is never run again.
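For the saga case in the table, the usual shape is a list of steps, each paired with a compensating action that undoes it. This is a minimal, framework-free sketch of that pattern. `SagaStep` and `runSaga` are hypothetical names. In a Temporal workflow, each action and compensation would be an activity call.

```typescript
// Minimal saga sketch: each step pairs an action with a compensating action.
// If a step fails, completed steps are undone in reverse order.
interface SagaStep {
  label: string;
  action: () => void;
  compensate: () => void;
}

function runSaga(steps: SagaStep[]): void {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch (err) {
      // Roll back what already happened, newest first, then surface the failure.
      for (const done of completed.reverse()) done.compensate();
      throw err;
    }
  }
}

// Example: the charge succeeds, the shipment fails, so the charge is refunded.
const sagaLog: string[] = [];
let sagaError: unknown;
try {
  runSaga([
    { label: 'chargeCard', action: () => sagaLog.push('charge'), compensate: () => sagaLog.push('refund') },
    {
      label: 'createShipment',
      action: () => {
        throw new Error('carrier API down');
      },
      compensate: () => sagaLog.push('void label'),
    },
  ]);
} catch (err) {
  sagaError = err;
}
// sagaLog is ['charge', 'refund']
```

Because the rollback loop would run as workflow code, a worker crash in the middle of compensating resumes the rollback instead of leaving a half-refunded order.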

Signals and Queries

Two features change how you think about workflow orchestration.

Signals let external code send events into a running workflow. A human approval step, an external event, a cancellation request:

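// In the workflow file
import { condition, defineSignal, proxyActivities, setHandler } from '@temporalio/workflow';
import type * as activities from './activities';

// cancelOrder and processOrder are hypothetical activities, defined in activities.ts like the ones above
const { cancelOrder, processOrder } = proxyActivities<typeof activities>({
  startToCloseTimeout: '30 seconds',
});

export const approveSignal = defineSignal<[string]>('approve');

export async function requiresApprovalWorkflow(orderId: string): Promise<void> {
  let approvedBy: string | undefined;

  setHandler(approveSignal, (approverEmail: string) => {
    approvedBy = approverEmail;
  });

  // Wait up to 7 days; condition() resolves to false if the timeout expires first
  const approved = await condition(() => approvedBy !== undefined, '7 days');

  if (!approved) {
    await cancelOrder(orderId);
    return;
  }

  await processOrder(orderId);
}

// From anywhere (API handler, admin panel), using a Client and the exported approveSignal
await client.workflow.getHandle(`order-${orderId}`).signal(approveSignal, 'manager@company.com');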

Queries let you inspect running workflow state without interrupting it. Useful for status APIs that need to show “step 3 of 5, waiting for payment confirmation.”

Production Deployment

Temporal Cloud is the managed option. They run the Temporal server; you just connect workers. For most teams, this is the right call. Self-hosting the Temporal cluster adds operational burden that’s rarely worth it unless you have strict data sovereignty requirements.

Workers are stateless and horizontally scalable. Deploy as many as you need; they’ll pull work from the task queue. One Temporal cluster can serve multiple application environments if you use separate namespaces.

The Temporal web UI (included with both Cloud and self-hosted) shows every workflow execution, its state, history, and any failures. It’s the debugging tool you wish you had for your current job queue setup.

For teams building products where background processes affect money, inventory, or user accounts, Temporal replaces a category of defensive code (status checks, idempotency tables, manual recovery scripts) with a programming model where correctness is the default.
