Skip to content

Writing Tests

Testing is central to Eigenvue’s reliability. Every algorithm has both TypeScript and Python generators, and both must produce identical output. This page explains every layer of the testing strategy.


Test fixtures are golden JSON files that define the expected step output for a specific set of inputs. They are the single source of truth that both TypeScript and Python tests validate against.

Fixtures live alongside the TypeScript generator they test:

algorithms/classical/binary-search/tests/
found.fixture.json
not-found.fixture.json
single-element.fixture.json
edge-left.fixture.json
edge-right.fixture.json
generator.test.ts
{
"description": "Human-readable description of this test case.",
"inputs": {
"array": [1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
"target": 13
},
"expected": {
"stepCount": 9,
"terminalStepId": "found",
"result": 6,
"stepIds": [
"initialize",
"calculate_mid",
"search_right",
"calculate_mid",
"search_left",
"calculate_mid",
"search_right",
"calculate_mid",
"found"
],
"keyStates": [
{ "stepIndex": 0, "field": "left", "value": 0 },
{ "stepIndex": 0, "field": "right", "value": 9 },
{ "stepIndex": 1, "field": "mid", "value": 4 },
{ "stepIndex": 8, "field": "result", "value": 6 }
]
}
}
FieldTypeDescription
descriptionstringWhat this test case verifies
inputsobjectAlgorithm input parameters
expected.stepCountnumberTotal number of steps
expected.terminalStepIdstringThe id of the final (terminal) step
expected.resultanyThe result field in the terminal step’s state
expected.stepIdsstring[]Ordered list of step IDs
expected.keyStatesarraySpot-checks on specific state fields
  1. Cover the core paths. At minimum: success case, failure case, edge cases.
  2. Keep arrays small. 5—10 elements. Easy to compute expected values by hand.
  3. Use surgical state checks. Only assert the fields that matter for the test. Writing out the entire expected state of every step is brittle and hard to maintain.
  4. Name files descriptively. found.fixture.json, not-found.fixture.json, single-element.fixture.json, edge-left.fixture.json.

algorithms/<category>/<algorithm>/tests/generator.test.ts

Every generator test file follows the same pattern:

import { describe, it, expect } from "vitest";
import { runGenerator } from "@/engine/generator";
import myGenerator from "../generator";
// Import fixtures
import foundFixture from "./found.fixture.json";
import notFoundFixture from "./not-found.fixture.json";
// ── Helper ────────────────────────────────────────────────────────────────
function assertFixture(fixture: {
description: string;
inputs: Record<string, unknown>;
expected: {
stepCount: number;
terminalStepId: string;
result: unknown;
stepIds: string[];
keyStates: Array<{ stepIndex: number; field: string; value: unknown }>;
};
}) {
const result = runGenerator(myGenerator, fixture.inputs);
const { steps } = result;
expect(steps.length).toBe(fixture.expected.stepCount);
const terminal = steps[steps.length - 1]!;
expect(terminal.id).toBe(fixture.expected.terminalStepId);
expect(terminal.isTerminal).toBe(true);
expect(terminal.state.result).toBe(fixture.expected.result);
expect(steps.map((s) => s.id)).toEqual(fixture.expected.stepIds);
for (const check of fixture.expected.keyStates) {
expect(steps[check.stepIndex]!.state[check.field]).toEqual(check.value);
}
}

Every generator test file should include these four categories:

Validate against golden fixtures:

describe("Fixture Tests", () => {
it("found: default example", () => assertFixture(foundFixture));
it("not found: target absent", () => assertFixture(notFoundFixture));
});

Verify structural invariants of the step format:

describe("Format Compliance", () => {
const result = runGenerator(myGenerator, defaultInputs);
it("produces a valid StepSequence", () => {
expect(result.formatVersion).toBe(1);
expect(result.algorithmId).toBe("my-algorithm");
expect(result.generatedBy).toBe("typescript");
});
it("all step indices are contiguous", () => {
result.steps.forEach((step, i) => expect(step.index).toBe(i));
});
it("exactly one terminal step, it is the last", () => {
const terminals = result.steps.filter((s) => s.isTerminal);
expect(terminals.length).toBe(1);
expect(terminals[0]!.index).toBe(result.steps.length - 1);
});
it("all step IDs match pattern", () => {
for (const step of result.steps) {
expect(step.id).toMatch(/^[a-z0-9][a-z0-9_-]*$/);
}
});
it("all code highlight lines are positive integers", () => {
for (const step of result.steps) {
for (const line of step.codeHighlight.lines) {
expect(Number.isInteger(line)).toBe(true);
expect(line).toBeGreaterThanOrEqual(1);
}
}
});
});

Verify that state snapshots are independent:

describe("State Immutability", () => {
it("step states are independent snapshots", () => {
const result = runGenerator(myGenerator, someInputs);
const arrays = result.steps.map((s) => s.state.array as number[]);
// All arrays should contain the same values.
for (const arr of arrays) {
expect(arr).toEqual(someInputs.array);
}
// But they should NOT be the same reference.
for (let i = 0; i < arrays.length - 1; i++) {
expect(arrays[i]).not.toBe(arrays[i + 1]);
}
});
it("mutating one step's state does not affect others", () => {
const result = runGenerator(myGenerator, someInputs);
const step0Array = result.steps[0]!.state.array as number[];
step0Array[0] = 9999;
expect((result.steps[1]!.state.array as number[])[0]).not.toBe(9999);
});
});

Cover boundary conditions specific to your algorithm:

describe("Edge Cases", () => {
it("single element, target found", () => {
const result = runGenerator(myGenerator, { array: [42], target: 42 });
const terminal = result.steps[result.steps.length - 1]!;
expect(terminal.state.result).toBe(0);
});
it("single element, target not found", () => {
const result = runGenerator(myGenerator, { array: [42], target: 99 });
const terminal = result.steps[result.steps.length - 1]!;
expect(terminal.state.result).toBe(-1);
});
});
Terminal window
cd web
# Run all tests
npm run test
# Run tests for a specific algorithm
npx vitest run algorithms/classical/binary-search/tests/
# Watch mode (re-runs on file changes)
npx vitest watch algorithms/classical/binary-search/tests/
# With coverage report
npx vitest run --coverage

python/tests/generators/classical/test_binary_search.py
"""Binary Search Generator — Python Tests."""
import pytest
from eigenvue.runner import run_generator
class TestBinarySearchFound:
"""Test cases where the target is found."""
def test_default_example(self) -> None:
steps = run_generator("binary-search", {
"array": [1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
"target": 13,
})
assert len(steps) == 9
assert steps[-1]["id"] == "found"
assert steps[-1]["isTerminal"] is True
assert steps[-1]["state"]["result"] == 6
def test_single_element(self) -> None:
steps = run_generator("binary-search", {
"array": [42],
"target": 42,
})
assert steps[-1]["id"] == "found"
assert steps[-1]["state"]["result"] == 0
class TestBinarySearchNotFound:
"""Test cases where the target is not found."""
def test_target_absent(self) -> None:
steps = run_generator("binary-search", {
"array": [2, 4, 6, 8, 10],
"target": 5,
})
assert steps[-1]["id"] == "not_found"
assert steps[-1]["state"]["result"] == -1
class TestBinarySearchInvariants:
"""Test structural invariants of the step sequence."""
def test_index_contiguity(self) -> None:
steps = run_generator("binary-search", {
"array": [1, 3, 5, 7, 9],
"target": 5,
})
for i, step in enumerate(steps):
assert step["index"] == i
def test_single_terminal(self) -> None:
steps = run_generator("binary-search")
terminals = [s for s in steps if s.get("isTerminal", False)]
assert len(terminals) == 1
assert terminals[0] is steps[-1]
Terminal window
cd python
# Run all tests
pytest
# Run tests for a specific module
pytest tests/generators/classical/test_binary_search.py
# Verbose output
pytest -v
# Stop on first failure
pytest -x
# With coverage
pytest --cov=eigenvue

The parity test verifies that Python and TypeScript produce identical JSON output. This is the most important test in the project.

The script scripts/verify-step-parity.py:

  1. Loads each golden fixture (already in camelCase JSON format).
  2. Deserializes it into Python Step dataclasses via from_dict().
  3. Re-serializes it back to camelCase dicts via to_dict().
  4. Compares the re-serialized output to the original JSON byte-for-byte.

If the round-trip is clean, the Python dataclasses are proven wire-compatible with the TypeScript interfaces.

Terminal window
python scripts/verify-step-parity.py

Expected output:

============================================================
Cross-Language Step Parity Verification
============================================================
Verifying: binary-search-found.fixture.json
PASS
Verifying: binary-search-not-found.fixture.json
PASS
Verifying: single-element.fixture.json
PASS
Verifying: minimal-valid.fixture.json
PASS
============================================================
ALL FIXTURES PASSED
============================================================

See Cross-Language Parity for debugging strategies for common parity failures.


Layout tests verify that layout functions produce correct primitives for known step data.

web/src/engine/layouts/__tests__/
import { describe, it, expect } from "vitest";
describe("array-with-pointers layout", () => {
it("produces N elements for an N-element array", () => {
// Create a mock step with a known array
// Call the layout function
// Assert the number of element primitives
});
it("all primitive IDs are unique", () => {
// IDs must be unique for animation diffing
const ids = scene.primitives.map((p) => p.id);
expect(new Set(ids).size).toBe(ids.length);
});
it("elements fit within canvas bounds", () => {
for (const p of scene.primitives) {
if (p.kind === "element") {
expect(p.x).toBeGreaterThanOrEqual(0);
expect(p.x).toBeLessThanOrEqual(canvasSize.width);
expect(p.y).toBeGreaterThanOrEqual(0);
expect(p.y).toBeLessThanOrEqual(canvasSize.height);
}
}
});
it("handles empty array gracefully", () => {
// Should return an empty primitives array, not throw
});
it("processes highlightElement visual action", () => {
// Create a step with a highlightElement action
// Verify the corresponding element has the correct fill color
});
});

End-to-end tests verify that algorithms render correctly in the browser and that user interactions (stepping forward/backward, changing inputs) work.

tests/e2e/
import { test, expect } from "@playwright/test";
test("binary search visualization loads and steps forward", async ({ page }) => {
await page.goto("/algorithms/binary-search");
// Wait for the visualization to load
await expect(page.locator("canvas")).toBeVisible();
// Verify the step counter starts at 1
await expect(page.locator("[data-testid='step-counter']")).toContainText("1");
// Click the "Next" button
await page.click("[data-testid='next-step']");
// Verify the step counter advances
await expect(page.locator("[data-testid='step-counter']")).toContainText("2");
});
test("binary search handles custom inputs", async ({ page }) => {
await page.goto("/algorithms/binary-search");
// Open the input editor
await page.click("[data-testid='edit-inputs']");
// Modify the target value
await page.fill("[data-testid='input-target']", "99");
// Run the algorithm
await page.click("[data-testid='run-algorithm']");
// Should reach "not found" terminal state
// Navigate to the last step
await page.click("[data-testid='last-step']");
await expect(page.locator("[data-testid='step-title']")).toContainText("Not Found");
});
Terminal window
# Install Playwright browsers (first time only)
npx playwright install
# Run E2E tests
npx playwright test
# Run with UI mode (interactive)
npx playwright test --ui
# Run a specific test file
npx playwright test tests/e2e/binary-search.spec.ts

Many algorithms involve floating-point arithmetic (attention weights, gradient values, loss computations). When comparing floating-point values across TypeScript and Python, use a tolerance of +/-1e-9.

// For individual values
expect(actual).toBeCloseTo(expected, 9); // 9 decimal digits
// For arrays of floats
for (let i = 0; i < actual.length; i++) {
expect(actual[i]).toBeCloseTo(expected[i], 9);
}
import pytest
# For individual values
assert actual == pytest.approx(expected, abs=1e-9)
# For lists of floats
assert actual_list == pytest.approx(expected_list, abs=1e-9)

JavaScript uses IEEE 754 double-precision (64-bit) floats, and Python’s float is also IEEE 754 double-precision. Both have about 15—17 significant decimal digits of precision. A tolerance of 1e-9 allows for small accumulation errors from different ordering of operations while still catching meaningful numerical bugs.


Before submitting a PR, verify:

  • All Vitest tests pass: cd web && npm run test
  • All pytest tests pass: cd python && pytest
  • Cross-language parity passes: python scripts/verify-step-parity.py
  • New algorithm has fixtures for: found, not found, single element
  • Format compliance tests are included
  • State immutability tests are included
  • Edge case tests cover algorithm-specific boundaries
  • No test uses random data (all tests are deterministic)