Child Workflows
Spawn child workflows from a parent and poll their progress for batch processing, report generation, and other multi-workflow orchestration scenarios.
Use child workflows when a single workflow needs to orchestrate many independent units of work. Each child runs as its own workflow with a separate event log, retry boundary, and failure scope -- if one child fails, it doesn't take down the parent or siblings.
When to use child workflows
Child workflows are the right choice when:
- Work units are independent. Each child can run without knowing about the others (e.g., processing individual documents, generating separate reports).
- You need isolated failure boundaries. A failing child should not abort unrelated work. The parent decides how to handle failures.
- You want massive fan-out. Spawning 50 or 500 children is practical because each runs on its own infrastructure.
- You need per-item observability. Each child workflow has its own run ID, status, and event log for monitoring.
For simpler cases where steps share a single event log, use direct await composition instead.
Basic pattern: spawn and poll
The core pattern has three parts:
- A step that calls start() to spawn a child workflow and returns the run ID
- A polling loop in the parent workflow that checks child status with getRun()
- A step that retrieves the child's return value once it completes
import { sleep } from "workflow";
import { getRun, start } from "workflow/api";
declare function pollUntilComplete(runIds: string[]): Promise<void>; // @setup
declare function collectResults(runIds: string[]): Promise<Array<{ documentId: string; summary: string }>>; // @setup
// Child workflow -- processes a single document
export async function processDocument(documentId: string) {
  "use workflow";
  const content = await fetchDocument(documentId);
  const analysis = await analyzeContent(content);
  const summary = await generateSummary(analysis);
  return { documentId, summary };
}
async function fetchDocument(documentId: string): Promise<string> {
  "use step";
  const res = await fetch(`https://docs.example.com/api/${documentId}`);
  return res.text();
}
async function analyzeContent(content: string): Promise<string> {
  "use step";
  // Call analysis API
  return `analysis of ${content.length} chars`;
}
async function generateSummary(analysis: string): Promise<string> {
  "use step";
  // Generate summary from analysis
  return `Summary: ${analysis}`;
}
// Parent workflow -- orchestrates document processing
export async function processDocumentBatch(documentIds: string[]) {
  "use workflow";
  // Spawn a child workflow for each document
  const runIds = await spawnChildren(documentIds);
  // Poll until all children complete
  await pollUntilComplete(runIds);
  // Collect results
  const results = await collectResults(runIds);
  return { processed: results.length, results };
}
async function spawnChildren(
  documentIds: string[]
): Promise<string[]> {
  "use step";
  const runIds: string[] = [];
  for (const docId of documentIds) {
    const run = await start(processDocument, [docId]);
    runIds.push(run.runId);
  }
  return runIds;
}
Polling loop
The parent workflow polls child statuses in a loop, sleeping between checks. This is durable -- if the parent replays, the sleep and status checks replay from the event log.
import { sleep } from "workflow";
import { getRun } from "workflow/api";
const POLL_INTERVAL = "30s";
const MAX_POLL_ITERATIONS = 120; // 60 minutes at 30s intervals
async function pollUntilComplete(runIds: string[]): Promise<void> {
  let iteration = 0;
  while (iteration < MAX_POLL_ITERATIONS) {
    const status = await checkStatuses(runIds);
    if (status.running === 0) {
      if (status.failed > 0) {
        throw new Error(
          `${status.failed} of ${runIds.length} children failed`
        );
      }
      return; // All completed successfully
    }
    iteration += 1;
    await sleep(POLL_INTERVAL);
  }
  throw new Error("Timed out waiting for children to complete");
}
async function checkStatuses(
  runIds: string[]
): Promise<{ running: number; completed: number; failed: number }> {
  "use step";
  let running = 0;
  let completed = 0;
  let failed = 0;
  for (const runId of runIds) {
    const run = getRun(runId);
    const status = await run.status;
    if (status === "completed") completed += 1;
    else if (status === "failed" || status === "cancelled") failed += 1;
    else running += 1; // queued, starting, running
  }
  return { running, completed, failed };
}
async function collectResults(
  runIds: string[]
): Promise<Array<{ documentId: string; summary: string }>> {
  "use step";
  const results = [];
  for (const runId of runIds) {
    const run = getRun(runId);
    const value = await run.returnValue;
    results.push(value as { documentId: string; summary: string });
  }
  return results;
}
Fan-out pattern: chunked spawning
When spawning hundreds of children, batch the start() calls to avoid overwhelming the system. Use multiple spawn steps, each launching a chunk of children.
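The chunking itself is plain array slicing and can be factored into a small helper. The following is a minimal sketch; `chunkItems` is a hypothetical name used for illustration and is not part of the workflow API.

```typescript
// Hypothetical helper (not part of the workflow API): split a list into
// consecutive chunks of at most `size` items, preserving order.
function chunkItems<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// 25 report IDs with a chunk size of 10 yield chunks of 10, 10, and 5.
const reportIds = Array.from({ length: 25 }, (_, i) => `report-${i}`);
const reportChunks = chunkItems(reportIds, 10);
```

Each chunk would then be passed to one spawn step. The full example below inlines the same slicing inside the parent workflow.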
import { start } from "workflow/api";
declare function pollUntilComplete(runIds: string[]): Promise<void>; // @setup
const CHUNK_SIZE = 10;
export async function largeReportBatch(reportConfigs: Array<{ id: string; query: string }>) {
  "use workflow";
  // Spawn children in chunks
  const allRunIds: string[] = [];
  for (let i = 0; i < reportConfigs.length; i += CHUNK_SIZE) {
    const chunk = reportConfigs.slice(i, i + CHUNK_SIZE);
    const runIds = await spawnReportChunk(chunk);
    allRunIds.push(...runIds);
  }
  // Poll until all complete
  await pollUntilComplete(allRunIds);
  const results = await collectReportResults(allRunIds);
  return { total: results.length, results };
}
async function spawnReportChunk(
  configs: Array<{ id: string; query: string }>
): Promise<string[]> {
  "use step";
  const runIds: string[] = [];
  for (const config of configs) {
    const run = await start(generateReport, [config.id, config.query]);
    runIds.push(run.runId);
  }
  return runIds;
}
async function generateReport(reportId: string, query: string) {
  "use workflow";
  const data = await queryDatabase(reportId, query);
  const formatted = await formatReport(reportId, data);
  return { reportId, formatted };
}
declare function queryDatabase(reportId: string, query: string): Promise<string>; // @setup
declare function formatReport(reportId: string, data: string): Promise<string>; // @setup
declare function collectReportResults(
  runIds: string[]
): Promise<Array<{ reportId: string; formatted: string }>>; // @setup
Error handling
Tolerating partial failures
Not every batch requires 100% success. Track which children failed and compare the failure rate against a threshold (maxFailureRate in the example below), so the parent can continue when some children fail while still surfacing the failures.
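To make the threshold arithmetic concrete, here is a minimal sketch of the check in isolation. `exceedsFailureRate` is a hypothetical helper introduced for illustration; it mirrors the comparison used in the example that follows and is not part of the workflow API.

```typescript
// Hypothetical helper (not part of the workflow API): true when the share of
// failed children is strictly above the allowed rate.
function exceedsFailureRate(
  failedCount: number,
  totalCount: number,
  maxFailureRate: number
): boolean {
  // Math.max guards against division by zero for an empty batch.
  const failureRate = failedCount / Math.max(1, totalCount);
  return failureRate > maxFailureRate;
}

// With 20 children and a 10% threshold, two failures (exactly 10%) are
// tolerated because the comparison is strict; a third failure (15%) is not.
const twoFailuresAbort = exceedsFailureRate(2, 20, 0.1);
const threeFailuresAbort = exceedsFailureRate(3, 20, 0.1);
```

Because the rate is measured against the total number of children and checked on every poll, the parent can abort as soon as failures cross the threshold, without waiting for the remaining children to finish.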
import { sleep } from "workflow";
import { getRun } from "workflow/api";
const POLL_INTERVAL = "30s";
const MAX_POLL_ITERATIONS = 120;
async function pollWithPartialFailures(
  runIds: string[],
  maxFailureRate: number
): Promise<{ completed: string[]; failed: string[] }> {
  let iteration = 0;
  const completedIds: string[] = [];
  const failedIds: string[] = [];
  while (iteration < MAX_POLL_ITERATIONS) {
    const status = await checkDetailedStatuses(runIds);
    completedIds.length = 0;
    failedIds.length = 0;
    for (const entry of status) {
      if (entry.status === "completed") completedIds.push(entry.runId);
      else if (entry.status === "failed" || entry.status === "cancelled")
        failedIds.push(entry.runId);
    }
    const active = runIds.length - completedIds.length - failedIds.length;
    // Check if failure rate exceeds threshold
    const failureRate = failedIds.length / Math.max(1, runIds.length);
    if (failureRate > maxFailureRate) {
      throw new Error(
        `Failure rate ${(failureRate * 100).toFixed(1)}% exceeds ` +
          `threshold of ${(maxFailureRate * 100).toFixed(1)}%`
      );
    }
    if (active === 0) {
      return { completed: completedIds, failed: failedIds };
    }
    iteration += 1;
    await sleep(POLL_INTERVAL);
  }
  throw new Error("Timed out waiting for children");
}
async function checkDetailedStatuses(
  runIds: string[]
): Promise<Array<{ runId: string; status: string }>> {
  "use step";
  const statuses = [];
  for (const runId of runIds) {
    const run = getRun(runId);
    const status = await run.status;
    statuses.push({ runId, status });
  }
  return statuses;
}
Retrying failed children
When a child fails, the parent can spawn a replacement and continue polling. Track restart counts to prevent infinite retry loops.
import { sleep } from "workflow";
declare function checkDetailedStatuses(runIds: string[]): Promise<Array<{ runId: string; status: string }>>; // @setup
const POLL_INTERVAL = "30s";
const MAX_POLL_ITERATIONS = 120;
// spawnReplacement should be a "use step" function, because start() must be
// called from a step rather than directly from workflow code.
async function pollWithRetries(
  initialRunIds: string[],
  maxRestartsPerChild: number,
  spawnReplacement: (index: number) => Promise<string>
): Promise<void> {
  // Key both maps by the child's position in the batch so that a replacement
  // run inherits the restart count of the run it replaces.
  const activeRuns = new Map<number, string>();
  const restartCounts = new Map<number, number>();
  initialRunIds.forEach((runId, index) => activeRuns.set(index, runId));
  let iteration = 0;
  while (iteration < MAX_POLL_ITERATIONS) {
    const statuses = await checkDetailedStatuses(
      Array.from(activeRuns.values())
    );
    const statusByRunId = new Map(
      statuses.map((s) => [s.runId, s.status])
    );
    for (const [index, runId] of activeRuns.entries()) {
      // Treat a missing status as still running
      const status = statusByRunId.get(runId) ?? "running";
      if (status === "completed") {
        activeRuns.delete(index);
        continue;
      }
      if (status === "failed" || status === "cancelled") {
        const restarts = (restartCounts.get(index) ?? 0) + 1;
        restartCounts.set(index, restarts);
        if (restarts > maxRestartsPerChild) {
          throw new Error(
            `Child ${index} exceeded restart limit (${maxRestartsPerChild})`
          );
        }
        // Swap in the replacement run; it is checked on the next iteration
        const newRunId = await spawnReplacement(index);
        activeRuns.set(index, newRunId);
      }
    }
    if (activeRuns.size === 0) return;
    iteration += 1;
    await sleep(POLL_INTERVAL);
  }
  throw new Error("Timed out waiting for children");
}
Tips
- start() must be called from a step, not directly from a workflow function. Wrap it in a "use step" function.
- getRun() must also be called from a step. The polling loop lives in the workflow, but the actual status check is a step.
- Set a max iteration count on polling loops to prevent runaway workflows. Calculate the count from your expected max duration and poll interval.
- Use chunked spawning for large batches. Spawning 500 children in a single step can time out. Break it into chunks of 10-50.
- Each child has its own retry semantics. Steps inside child workflows retry independently. The parent only sees the child's final status.
- Use deploymentId: "latest" if children should run on the most recent deployment. See the start() API reference for compatibility considerations.
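The iteration cap mentioned in the tips can be derived instead of hard-coded. A minimal sketch, assuming both durations are tracked in seconds; `maxPollIterations` is a hypothetical helper, not part of the workflow API.

```typescript
// Hypothetical helper (not part of the workflow API): derive the polling
// iteration cap from the longest acceptable wait and the poll interval.
function maxPollIterations(
  maxWaitSeconds: number,
  pollIntervalSeconds: number
): number {
  if (maxWaitSeconds <= 0 || pollIntervalSeconds <= 0) {
    throw new RangeError("Both durations must be positive");
  }
  // Round up so the loop sleeps for at least the full wait.
  return Math.ceil(maxWaitSeconds / pollIntervalSeconds);
}

// 60 minutes at a 30-second interval gives 120 iterations, matching the
// MAX_POLL_ITERATIONS constant used in the polling examples.
const derivedMaxIterations = maxPollIterations(60 * 60, 30);
```

The cap only bounds time spent sleeping. Each iteration also runs a status-check step, so the real wall-clock limit will be somewhat longer than the nominal duration.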
Key APIs
- start() -- spawn a new workflow run and get its run ID
- getRun() -- retrieve a workflow run's status and return value
- sleep() -- durably pause between polling iterations
- "use workflow" -- marks the orchestrator function
- "use step" -- marks functions with full Node.js access