Game snapshots
A game target scripts a walkthrough of your game — inputs, waits, assertions — and snapshots semantic game state at named markers, with optional screenshots at the same markers. The state diff is the gate, so a gameplay change reads like one:
```
~ $.markers.game-over.state.Player.position[0]: 398 → 438
~ $.markers.game-over.tick: 174 → 194
```

The CLI is engine-agnostic: it launches the engine and speaks a small versioned protocol (stdio JSON-lines) to a thin engine-side adapter. Godot 4.x is the first supported engine.
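To make the diff lines above concrete, here is a minimal sketch of how two snapshot documents could be walked into `~ path: old → new` lines. It assumes snapshots are plain JSON-compatible dicts and lists. The `+`/`-` prefixes for added and removed paths are hypothetical, and this is an illustration of the output format, not the CLI's diff implementation.

```python
from typing import Any, Iterator


def _path(parent: str, key: Any) -> str:
    # List indices render as [i], dict keys as .key ("$.markers.game-over...").
    return f"{parent}[{key}]" if isinstance(key, int) else f"{parent}.{key}"


def diff(old: Any, new: Any, path: str = "$") -> Iterator[str]:
    """Yield one line per difference between two JSON-like values."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = _path(path, key)
            if key not in new:
                yield f"- {child}"  # hypothetical marker for a removed path
            elif key not in old:
                yield f"+ {child}"  # hypothetical marker for an added path
            else:
                yield from diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        # Same-length lists are compared element by element.
        for i, (a, b) in enumerate(zip(old, new)):
            yield from diff(a, b, _path(path, i))
    elif old != new:
        # Leaf change (or a list that changed length): report old and new values.
        yield f"~ {path}: {old} → {new}"


if __name__ == "__main__":
    before = {"markers": {"game-over": {"tick": 174, "state": {"Player": {"position": [398, 160]}}}}}
    after = {"markers": {"game-over": {"tick": 194, "state": {"Player": {"position": [438, 160]}}}}}
    for line in diff(before, after):
        print(line)
    # ~ $.markers.game-over.state.Player.position[0]: 398 → 438
    # ~ $.markers.game-over.tick: 174 → 194
```

Keys are visited in sorted order so the output is stable across runs, which matters when the diff itself is the gate.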
Configure a target
```json
{
  "kind": "game",
  "name": "godot-demo",
  "engine": "godot",
  "project": "examples/game/godot",
  "walkthrough": "examples/game/walkthrough.json"
}
```

Options:
- `engine` (required): adapter id; `"godot"` is the only engine today.
- `project` (required): the engine project directory (contains `project.godot`).
- `walkthrough` (required): path to the walkthrough script (JSON, described below).
- `mode`: `"semantic"` (default) runs fully headless and captures state only; `"visual"` adds one screenshot per marker and needs a display (a brief window locally, `xvfb-run` on Linux CI).
- `enginePath`: engine binary. Falls back to `DUNGBEETLE_GODOT_PATH`, then `godot` on `PATH`.
- `seed`: RNG seed applied before the run (default `0`).
- `physicsFps`: fixed physics tick rate (default `60`).
- `screenshotMode`: `"advisory"` (default) reports visual changes without failing the run; `"strict"` makes visual changes gate like any other diff.
- `pixelTolerance`: visual tolerance for this target (`maxChangedRatio`, `perChannelThreshold`). Defaults to the game kind's non-zero tolerance (`0.002` / `3`), which is sized to absorb GPU/driver rasterization drift, not to hide gameplay changes.
- `markers`: per-marker overrides, e.g. `{ "game-over": { "pixelTolerance": { "maxChangedRatio": 0 } } }`.
- `timeoutMs`: whole-run watchdog (defaults to the lifecycle wait timeout). Silence is fatal, so a broken game script can never hang CI.
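The defaults and the `enginePath` fallback chain above can be sketched as a small resolver. This is an illustration under assumptions: the names `resolve_target` and `resolve_engine_path` are hypothetical, a partial `pixelTolerance` is assumed to keep the other field's default, and `timeoutMs` is omitted because its default comes from the lifecycle wait timeout.

```python
import os
import shutil
from typing import Any, Mapping, Optional

# Documented defaults for game targets.
DEFAULTS: dict[str, Any] = {
    "mode": "semantic",
    "seed": 0,
    "physicsFps": 60,
    "screenshotMode": "advisory",
    "pixelTolerance": {"maxChangedRatio": 0.002, "perChannelThreshold": 3},
}
REQUIRED = ("engine", "project", "walkthrough")


def resolve_engine_path(target: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """enginePath, then DUNGBEETLE_GODOT_PATH, then `godot` on PATH (None if nothing found)."""
    return target.get("enginePath") or env.get("DUNGBEETLE_GODOT_PATH") or shutil.which("godot")


def resolve_target(target: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Hypothetical helper: validate required options and fill in defaults."""
    missing = [k for k in REQUIRED if k not in target]
    if missing:
        raise ValueError(f"game target missing required option(s): {', '.join(missing)}")
    if target["engine"] != "godot":
        raise ValueError(f"unknown engine {target['engine']!r}; only 'godot' is supported")
    resolved = {**DEFAULTS, **target}
    # Assumption: a partial pixelTolerance override keeps the other field's default.
    resolved["pixelTolerance"] = {**DEFAULTS["pixelTolerance"], **target.get("pixelTolerance", {})}
    resolved["enginePath"] = resolve_engine_path(target, env)
    return resolved
```

Passing `env` explicitly keeps the fallback chain testable without touching the real environment.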
Walkthrough scripts
A walkthrough is a JSON file with a `steps` array. There are five step types:

```json
{
  "description": "Menu → collect three squares → game over.",
  "steps": [
    { "wait": 10 },
    { "screenshot": "menu" },
    { "input": "ui_accept" },
    { "waitFor": "Main:started == true", "timeoutTicks": 60 },
    { "input": "move_right", "mode": "down" },
    { "waitFor": "Player:collected >= 3", "timeoutTicks": 600 },
    { "input": "move_right", "mode": "up" },
    { "screenshot": "game-over" },
    { "assert": "Player:collected >= 3" }
  ]
}
```

- `input`: press an input-map action, not a raw key. Action-level injection at physics-tick boundaries is measurably deterministic; raw event injection is not. `mode` is `"tap"` (default; `ticks` sets the hold duration), `"down"`, or `"up"`.
- `wait`: advance N physics ticks.
- `waitFor`: poll a predicate each tick until it is true. `timeoutTicks` is mandatory, so a walkthrough can never wait forever.
- `screenshot`: a named marker. It always captures semantic state, plus a screenshot in visual mode. Names are kebab-case and unique; they key the snapshot, the baseline PNGs, and the review UI sections.
- `assert`: a predicate that must hold; otherwise the step fails, reporting its index.
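The structural rules above (exactly one step type per step, mandatory `timeoutTicks`, unique kebab-case markers, valid `input` modes) are easy to check before launching the engine. A rough sketch that reports each issue with its step index; the kebab-case pattern, the issue wording, and the name `validate_walkthrough` are assumptions, not the CLI's actual validator.

```python
import re
from typing import Any

STEP_TYPES = ("input", "wait", "waitFor", "screenshot", "assert")
INPUT_MODES = ("tap", "down", "up")
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # assumed definition of kebab-case


def validate_walkthrough(doc: dict[str, Any]) -> list[str]:
    """Return human-readable issues, each prefixed with its step index."""
    steps = doc.get("steps")
    if not isinstance(steps, list):
        return ["walkthrough: 'steps' must be an array"]
    issues: list[str] = []
    seen_markers: set[str] = set()
    for i, step in enumerate(steps):
        kinds = [k for k in STEP_TYPES if k in step] if isinstance(step, dict) else []
        if len(kinds) != 1:
            issues.append(f"step {i}: expected exactly one of {', '.join(STEP_TYPES)}")
            continue
        kind = kinds[0]
        if kind == "input" and step.get("mode", "tap") not in INPUT_MODES:
            issues.append(f"step {i}: input mode must be one of {', '.join(INPUT_MODES)}")
        elif kind == "wait" and not (isinstance(step["wait"], int) and step["wait"] >= 0):
            issues.append(f"step {i}: wait must be a non-negative tick count")
        elif kind == "waitFor" and "timeoutTicks" not in step:
            # No timeout means the walkthrough could wait forever.
            issues.append(f"step {i}: waitFor requires timeoutTicks")
        elif kind == "screenshot":
            name = step["screenshot"]
            if not isinstance(name, str) or not KEBAB.match(name):
                issues.append(f"step {i}: marker {name!r} is not kebab-case")
            elif name in seen_markers:
                issues.append(f"step {i}: duplicate marker {name!r}")
            else:
                seen_markers.add(name)
    return issues
```

Run against the example walkthrough, this returns an empty list; a `waitFor` without `timeoutTicks` yields `"step 0: waitFor requires timeoutTicks"` when it is the first step.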
Predicates read node properties by scene path, in the form `"NodePath:property <op> value"` with `==`, `!=`, `>=`, `<=`, `>`, or `<`. `"Main"` (the scene root's name) and `"."` both address the root.
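The predicate grammar is small enough to sketch in a few lines. This is an assumed evaluation model, not the adapter's: node state is modelled as a dict from root-relative path to a property dict (the root itself under `"."`), literals are parsed as JSON so `true`, `3`, and `"x"` work, and bare words fall back to strings.

```python
import json
import operator
import re
from typing import Any

OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}
# Two-character operators are listed first so ">=" is never read as ">" plus "=3".
PREDICATE = re.compile(r"^\s*([^:\s]+):(\w+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def parse_literal(text: str) -> Any:
    try:
        return json.loads(text)  # true, 3, 1.5, "quoted"
    except json.JSONDecodeError:
        return text  # assumption: bare words compare as strings


def evaluate(predicate: str, nodes: dict[str, dict[str, Any]], root_name: str = "Main") -> bool:
    """Evaluate 'NodePath:property <op> value' against a path -> properties map."""
    m = PREDICATE.match(predicate)
    if m is None:
        raise ValueError(f"malformed predicate: {predicate!r}")
    path, prop, op, raw = m.groups()
    if path == root_name:
        path = "."  # the root's name and "." address the same node
    return OPS[op](nodes[path][prop], parse_literal(raw))
```

For example, with `nodes = {".": {"started": True}, "Player": {"collected": 3}}`, both `"Main:started == true"` and `".:started == true"` evaluate to `True`.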
Exposing game state
Semantic state is an allowlist: only nodes in the engine-side `dungbeetle` group are captured (class + position), and a node can contribute game-specific fields via `get_dungbeetle_state()`:
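```gdscript
extends Node2D  # assumed base class; any Node2D exposes the captured position
var collected := 0
var lives := 3

func _ready() -> void:
	add_to_group("dungbeetle")  # opt in: only group members are captured

func get_dungbeetle_state() -> Dictionary:
	return {"collected": collected, "lives": lives}  # game-specific fields
```

Snapshot model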
```json
{
  "kind": "game",
  "engine": "godot",
  "markers": {
    "menu": { "tick": 6, "state": { "Player": { "position": [100, 160] } } },
    "game-over": { "tick": 174, "state": { "Player": { "position": [398, 160] } } }
  },
  "screenshots": { "menu": { "sha256": "…" }, "game-over": { "sha256": "…" } }
}
```

- Engine and adapter versions are runtime metadata, not snapshot content, so an engine patch release never diffs a baseline. Version compatibility is `doctor`'s job.
- Screenshots are stored in the baseline as digests; the PNGs live alongside it as `<name>.<marker>.png`.
- The marker `tick` is part of the snapshot: a run that ends 20 ticks later is a gameplay change worth seeing.
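Storing digests keeps the baseline JSON small and diffable while the PNG bytes sit next to it. A minimal sketch of that pairing, assuming the digest is SHA-256 over the PNG file bytes as written to disk (the snapshot shows `sha256` fields but not exactly what is hashed) and using hypothetical helper names:

```python
import hashlib
from pathlib import Path


def screenshot_path(baseline_dir: Path, target: str, marker: str) -> Path:
    # Baseline PNGs live alongside the snapshot as <name>.<marker>.png.
    return baseline_dir / f"{target}.{marker}.png"


def screenshot_digest(png_bytes: bytes) -> str:
    # Assumption: the digest covers the encoded PNG bytes, not decoded pixels.
    return hashlib.sha256(png_bytes).hexdigest()


def screenshots_entry(baseline_dir: Path, target: str, markers: list[str]) -> dict[str, dict[str, str]]:
    """Build a snapshot-style "screenshots" object from the PNGs on disk."""
    return {
        m: {"sha256": screenshot_digest(screenshot_path(baseline_dir, target, m).read_bytes())}
        for m in markers
    }
```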
Determinism
The adapter enforces determinism on every run, with no opt-out: seeded RNG, a fixed physics timestep, vsync off, and action-level input injection. On identical hardware, repeated runs measure byte-identical in both state and pixels.
What that means across platforms:
| scope | semantic state | pixels |
|---|---|---|
| same machine | byte-identical | byte-identical |
| same OS + GPU class | identical | default tolerance absorbs driver drift |
| across OS / GPU | identical — promised | not promised — generate baselines where you compare; screenshots stay advisory |
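The two `pixelTolerance` knobs can be read as follows: a pixel counts as changed when any channel differs by more than `perChannelThreshold`, and a marker passes when the fraction of changed pixels is at most `maxChangedRatio`. That reading is an assumption about the exact semantics (including `>` versus `>=`), sketched here over flat RGBA byte buffers of equal size; decoding PNGs is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelTolerance:
    max_changed_ratio: float = 0.002  # game-kind default
    per_channel_threshold: int = 3    # game-kind default


def changed_ratio(a: bytes, b: bytes, threshold: int, channels: int = 4) -> float:
    """Fraction of pixels where any channel differs by more than `threshold`."""
    if len(a) != len(b) or len(a) % channels:
        raise ValueError("buffers must be equal-length RGBA data")
    pixels = len(a) // channels
    if pixels == 0:
        return 0.0
    changed = 0
    for p in range(0, len(a), channels):
        if any(abs(a[p + c] - b[p + c]) > threshold for c in range(channels)):
            changed += 1
    return changed / pixels


def within_tolerance(a: bytes, b: bytes, tol: PixelTolerance = PixelTolerance()) -> bool:
    return changed_ratio(a, b, tol.per_channel_threshold) <= tol.max_changed_ratio
```

Under this reading, a per-marker override of `maxChangedRatio: 0` still forgives channel differences up to the threshold, but fails on any single pixel beyond it.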
Verify it empirically with the flake harness:

```shell
dungbeetle flake --config dungbeetle.config.json --repeat 5
```

This captures each target N times with no baseline and reports run-to-run divergence per marker. It exits non-zero on any flake, so wire it into CI.
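Conceptually, the flake harness compares N captures of the same target marker by marker. A minimal sketch of that comparison, assuming each run is a dict from marker name to its captured entry (tick, state, screenshot digest); the function names and the exit-code mapping are illustrative, not the CLI's code.

```python
import json
from typing import Any


def canonical(value: Any) -> str:
    # Key-sorted JSON so equal states compare equal regardless of key order.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def flaky_markers(runs: list[dict[str, Any]]) -> dict[str, int]:
    """Map each divergent marker to the number of distinct captures seen for it."""
    markers = set().union(*(run.keys() for run in runs))
    divergent: dict[str, int] = {}
    for marker in sorted(markers):
        # A marker missing from some run counts as a distinct (null) capture.
        variants = {canonical(run.get(marker)) for run in runs}
        if len(variants) > 1:
            divergent[marker] = len(variants)
    return divergent


def exit_code(runs: list[dict[str, Any]]) -> int:
    # Non-zero on any flake, so CI fails loudly.
    return 1 if flaky_markers(runs) else 0
```

Comparing canonical serializations rather than raw objects keeps the check byte-oriented, matching the byte-identical guarantee for same-machine runs.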
Doctor checks
`dungbeetle doctor` reports, per target:

- whether the project exists;
- whether the walkthrough is valid, with step indices on every issue;
- whether the adapter is installed and its protocol is compatible, with an upgrade hint naming which side to update;
- the engine version (it runs `--version`);
- the enforced determinism knobs;
- typos in marker overrides;
- a size warning for visual mode on anonymous pushes.
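The marker-override typo check can be pictured as cross-referencing the target's `markers` keys against the walkthrough's `screenshot` steps and suggesting the nearest defined name. A sketch under assumptions: the function name and message wording are hypothetical, and `difflib.get_close_matches` stands in for whatever matching the CLI actually uses.

```python
import difflib
from typing import Any


def marker_override_issues(target: dict[str, Any], walkthrough: dict[str, Any]) -> list[str]:
    """Report `markers` override keys that no screenshot step defines."""
    defined = [s["screenshot"] for s in walkthrough.get("steps", [])
               if isinstance(s, dict) and "screenshot" in s]
    issues = []
    for key in target.get("markers", {}):
        if key in defined:
            continue
        # Suggest the closest defined marker name, if any is reasonably similar.
        close = difflib.get_close_matches(key, defined, n=1)
        hint = f"; did you mean {close[0]!r}?" if close else ""
        issues.append(f"markers.{key}: no screenshot step defines this marker{hint}")
    return issues
```

Without this check, an override keyed `gameover` would silently never apply, and the `game-over` marker would keep the default tolerance.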
The adapter
The Godot addon lives at `adapters/godot` in the CLI repository: copy `addons/dungbeetle/` into your project and enable the plugin (or register the `Dungbeetle` autoload by hand). It is inert outside CLI-launched runs, so shipping it in your game costs nothing. Breakage caused by engine releases is fixed in adapter patch releases, never through CLI churn; see the adapter README for the supported Godot versions and the compatibility policy.
Try it
The Godot example walks the whole loop — green run, deliberate gameplay change, semantic diff — in about two minutes.