Runtime mode: checking
Data source: checking
BigQuery: checking
Explanation mode: checking
Repo data: checking
Cloud config: checking
Env file: checking
Project: checking
BigQuery dataset: checking
Policy loaded: checking
Offline episodes: checking

Adaptive LiveOps Decision Engine

Policy Replay Simulator with one RL Decision Agent, deterministic safety guardrails, cinematic replay, and benchmark evidence.
The RL Decision Agent scores interventions, applies an internal red safety/risk gate, serves one action, and logs outcomes for explanation.
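
The score, gate, serve, and log sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: the action names, the `churn_risk` threshold, the `SAFE_FALLBACK` action, and the log record fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fallback served when every scored action is blocked.
SAFE_FALLBACK = "no_intervention"


@dataclass
class Decision:
    action: str
    score: float
    gated: bool  # True when the gate changed what would otherwise be served


def passes_safety_gate(action: str, state: dict) -> bool:
    """Deterministic rule (assumed for illustration): never raise
    difficulty for a player whose churn risk is above 0.7."""
    return not (action == "raise_difficulty" and state["churn_risk"] > 0.7)


def decide(state: dict, scores: dict, log: list) -> Decision:
    """Rank scored interventions, apply the gate, serve one action, log it."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_action = ranked[0][0] if ranked else None

    # Default to the fallback; replaced by the best action that passes the gate.
    decision = Decision(SAFE_FALLBACK, 0.0, gated=True)
    for action, score in ranked:
        if passes_safety_gate(action, state):
            decision = Decision(action, score, gated=(action != top_action))
            break

    # The log entry is what an explanation console would later read from.
    log.append({
        "state": dict(state),
        "served": decision.action,
        "gated": decision.gated,
    })
    return decision
```

In this sketch the explanation side only reads `log`; it never influences which action `decide` returns.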

Runtime mode

Local path uses bundled repo files. Cloud path uses BigQuery/Gemini only when configured and confirmed by /health.
Runtime: checking
Chat: checking
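
One way to express the "cloud only when configured and confirmed by /health" rule is a small pure function over the health payload. The field names `cloud_configured` and `bigquery_ok` are assumptions made for this sketch; the actual /health response schema is not shown on this page.

```python
def select_runtime_mode(health: dict) -> str:
    """Return "cloud" only when cloud is both configured and confirmed.

    `health` stands in for the parsed /health response. The keys used
    here are hypothetical placeholders for whatever the service reports.
    """
    configured = bool(health.get("cloud_configured"))
    confirmed = bool(health.get("bigquery_ok"))
    # Any missing or false signal falls back to the bundled repo files.
    return "cloud" if configured and confirmed else "local"
```

Defaulting to the local path on any missing signal keeps the simulator usable when cloud credentials are absent or the health check has not completed.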

Ask the RL agent explanation console

Ask about the current RL recommendation, the red safety gate, OPE, rollout evidence, or what to do next.

Scenario controls

How to test a scenario
1. Load a preset below, or move sliders to create a custom player state. Slider changes recalculate the RL recommendation automatically.
2. The RL recommendation refreshes automatically after edits; Preview decision is optional.
3. The primary play button automatically applies the served RL action after the red safety gate.
4. Click Play Next Match to apply the served action, run the battle replay, and refresh the metrics.
Ready: choose a preset or move sliders. Recommendation updates automatically.

Preset scenarios

Live metrics

Win probability: --
Frustration: --
Churn risk: --
Power gap: --
Cold start: --
History confidence: --

Manual mode lets you inspect one recommendation or play one match at a time. Auto RL Plan runs several match-policy cycles automatically and shows how the fixed policy's recommendations change as the player state updates after each simulated outcome.

Policy Replay Simulator

state snapshot -> replay events -> updated telemetry
Player: -- power
Boss: -- power
Change a slider or select a preset; the RL recommendation refreshes automatically. Click Play Next Match to advance.
Progress: 0%
Player HP: 100
Boss HP: 100
Last action: none

Autonomous RL Rollout

Run a 10-match policy loop to show: state -> RL recommendation -> applied action -> simulated match -> updated telemetry -> next recommendation.
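
The loop above (state, recommendation, applied action, simulated match, updated telemetry, next recommendation) can be sketched as below. The policy rule, the action names, and the match and telemetry dynamics are toy assumptions chosen only to make the loop runnable; they are not the simulator's real model.

```python
import random


def clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def recommend(state: dict) -> str:
    # Placeholder for the fixed policy: ease off when frustration is high.
    return "ease_difficulty" if state["frustration"] > 0.5 else "hold"


def simulate_match(state: dict, action: str, rng: random.Random) -> bool:
    # Toy dynamics (assumed): easing difficulty adds 0.1 to win probability.
    bonus = 0.1 if action == "ease_difficulty" else 0.0
    return rng.random() < clamp01(state["win_probability"] + bonus)


def update_telemetry(state: dict, won: bool) -> dict:
    # Toy telemetry update (assumed): wins lower frustration, losses raise it.
    new_state = dict(state)
    new_state["frustration"] = clamp01(state["frustration"] + (-0.1 if won else 0.1))
    return new_state


def run_rollout(state: dict, matches: int = 10, seed: int = 0) -> list:
    rng = random.Random(seed)  # seeded so a rollout can be replayed exactly
    history = []
    for i in range(matches):
        action = recommend(state)                  # state -> recommendation
        won = simulate_match(state, action, rng)   # applied action -> match
        next_state = update_telemetry(state, won)  # match -> updated telemetry
        history.append({"match": i + 1, "action": action, "won": won,
                        "before": state, "after": next_state})
        state = next_state                         # feeds the next recommendation
    return history
```

Each history record keeps both the before and after state, so a rollout log of this shape would carry the same before/after pairs that the actual-telemetry chart plots per match.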

No automatic rollout has run yet.

Estimate vs actual match response

Left: the policy estimate updates when presets or controls change. Right: actual telemetry appends only after Play Next Match completes.

Control Preview: Estimated Outcomes (control-change history)
Series: win, frustration, churn
Each dot is a recalculation after a preset load or slider change. These are estimated outcomes before a simulated match is run.
X-axis: control update sequence, not match time

Actual match telemetry (two points per match: before / after)
Series: win probability, frustration, churn
Actual telemetry updates only after completed simulated matches. Separators indicate different matches. Each match adds a before point and an after point.
Adds before/after points when Play Next Match runs and updates the player state.

Benchmark

Policy comparison: safety-gated RL vs baselines

Historical and predicted trajectory

Seven-day view for the selected scenario. Hover over the cards to see why churn, frustration, or win probability changed.

Agent operations, health checks, and 7-day progress

Operational evidence for the same RL Decision Agent: health checks, policy metrics, audit/explanation tools, OPE, recent logs, dataset tools, and day-by-day progress cards.

The red safety/risk gate is an internal serving component. The explanation console describes what happened; it does not choose actions.

Operational tools

Focused checks update the results panel without opening a second page.

Live RL Decision Agent Path

Synced with simulator
Choose a preset or move sliders above. The served action, safety gate, expected effect, and match context will appear here automatically.

Explanation console

Ask about the current recommendation, red safety/risk gate, OPE confidence, rollout evidence, or next action.

RL explanation console ready.
Operations raw JSON (debug payload, collapsed by default)
{}

Dataset summary

Rows: loading
Players: loading
Frustration: derived

Frustration is derived from behavioral telemetry such as retries, failed challenges, near misses, idle time, fatigue, and power gap.
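
As a rough illustration of deriving frustration from these signals, the sketch below takes a weighted sum of inputs that are assumed to be pre-normalized to the range 0 to 1. The weights and the normalization are hypothetical; the page does not state the actual formula.

```python
# Hypothetical weights over the signals named above; they sum to 1.0.
FRUSTRATION_WEIGHTS = {
    "retries": 0.25,
    "failed_challenges": 0.20,
    "near_misses": 0.15,
    "idle_time": 0.10,
    "fatigue": 0.10,
    "power_gap": 0.20,
}


def derive_frustration(signals: dict) -> float:
    """Weighted sum of normalized telemetry signals, clamped to [0, 1].

    Missing signals count as 0.0; values outside [0, 1] are clipped first.
    """
    total = 0.0
    for name, weight in FRUSTRATION_WEIGHTS.items():
        value = min(max(float(signals.get(name, 0.0)), 0.0), 1.0)
        total += weight * value
    return min(max(total, 0.0), 1.0)
```

A linear form like this keeps each signal's contribution easy to inspect, which suits a dashboard that explains why a metric changed.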