LiveOps Arena RL Policy Engine

Interactive LiveOps sandbox with one RL decision agent, deterministic guardrails, cinematic replay, and benchmark charts.
The RL policy chooses interventions. Safety rules constrain serving. The replay visualizes simulator events; it is not a second decision agent.
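
The split between the RL policy and the safety rules can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the action names, the scoring rule standing in for the learned policy, and the guardrail thresholds are hypothetical and are not taken from the app.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    win_probability: float
    frustration: float
    churn_risk: float


def policy_recommend(state: PlayerState) -> str:
    """Stand-in for the RL policy: pick the action with the highest made-up score."""
    scores = {
        "no_op": 0.2,
        "ease_difficulty": state.frustration,
        "grant_boost": 1.0 - state.win_probability,
        "offer_discount": state.churn_risk,
    }
    return max(scores, key=scores.get)


def safety_gate(state: PlayerState, recommended: str) -> tuple[str, str]:
    """Deterministic guardrail: return (served_action, reason).

    The thresholds are hypothetical. The gate never invents a new
    intervention; it either passes the recommendation through or
    replaces it with the neutral "no_op" action.
    """
    if recommended == "offer_discount" and state.churn_risk < 0.5:
        return "no_op", "blocked: discount reserved for high churn risk"
    if recommended == "grant_boost" and state.win_probability > 0.7:
        return "no_op", "blocked: player is already favored"
    return recommended, "allowed"


def serve(state: PlayerState) -> dict:
    """One decision: the policy recommends, the gate decides what is served."""
    recommended = policy_recommend(state)
    served, reason = safety_gate(state, recommended)
    return {"recommended": recommended, "served": served, "reason": reason}
```

In this sketch the recommendation and the served action are reported separately, so a blocked recommendation stays visible for explanation even though it is never applied.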

Runtime mode

Deterministic fallback uses bundled repo files. Cloud auto can use BigQuery/Gemini only when configured.
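
One way the mode selection could work is sketched below. The environment variable names (`ARENA_RUNTIME`, `ARENA_BQ_PROJECT`, `ARENA_GEMINI_API_KEY`) are hypothetical placeholders, not the app's actual settings; the only behavior taken from the description above is that cloud services are used only when configured and the bundled deterministic path is the fallback.

```python
import os


def resolve_runtime_mode(env=None) -> str:
    """Return "cloud" only when every cloud setting is present, else "deterministic"."""
    env = os.environ if env is None else env
    # Hypothetical setting names, used here for illustration only.
    required = ("ARENA_BQ_PROJECT", "ARENA_GEMINI_API_KEY")
    # An explicit override forces the bundled, deterministic path.
    if env.get("ARENA_RUNTIME", "auto") == "deterministic":
        return "deterministic"
    # "auto": use cloud services only if all of them are configured.
    if all(env.get(key) for key in required):
        return "cloud"
    return "deterministic"
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise, for example `resolve_runtime_mode({})` returns `"deterministic"`.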
Runtime: checking
Chat: checking

Ask the RL agent explanation console

Ask about the current RL recommendation, the red safety gate, OPE, rollout evidence, or what to do next.

Scenario controls

How to test a scenario
1. Load a preset below, or move sliders to create a custom player state. Slider changes recalculate the RL recommendation automatically.
2. Click Recalculate recommendation if you want an explicit refresh after edits.
3. Click Apply served action to mutate the simulator state using the RL action after the red safety gate.
4. Click Play next match to freeze the current state and run the battle replay.
Ready: choose a preset or move sliders. Recommendation updates automatically.

Preset scenarios

Manual controls

Manual mode lets you inspect one recommendation or play one match at a time. Auto RL Plan runs several match-policy cycles automatically and shows how the fixed policy adapts after each simulated outcome.

Live metrics

Win probability: --
Frustration: --
Churn risk: --
Power gap: --
Cold start: --
History confidence: --
Control Preview: Estimated Outcomes (control-change history)
Each dot is a recalculation after a preset load or slider change. These are estimated outcomes before a simulated match is run.
X-axis: control update sequence, not match time
Legend: win, frustration, churn

Cinematic battle replay

state snapshot → replay events → updated telemetry
Player: -- power
Boss: -- power
Change a slider, apply an action, or play a match.
Progress: 0%
Player HP: 100
Boss HP: 100
Last action: none
Actual match telemetry (two points per match: before / after)
Actual telemetry updates only after completed simulated matches. Separators indicate different matches. Each match adds a before point and an after point.
Legend: win probability, frustration, churn
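
The before/after structure of this chart can be sketched as a small data model. The class and field names are hypothetical; the sketch only mirrors the behavior described above: each completed match appends exactly two points, and a separator falls wherever the match changes.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryPoint:
    match_id: int
    phase: str  # "before" or "after"
    win_probability: float
    frustration: float
    churn_risk: float


@dataclass
class MatchTelemetry:
    points: list = field(default_factory=list)

    def record_match(self, match_id: int, before: dict, after: dict) -> None:
        """Append the two points for one completed match, in chart order."""
        for phase, metrics in (("before", before), ("after", after)):
            self.points.append(TelemetryPoint(match_id, phase, **metrics))

    def separators(self) -> list:
        """Indices at which a new match starts, i.e. where a separator is drawn."""
        return [
            i
            for i in range(1, len(self.points))
            if self.points[i].match_id != self.points[i - 1].match_id
        ]
```

Because points are only added by `record_match`, slider changes and preset loads never touch this series; in this sketch they would belong to a separate control-preview history.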

Autonomous RL Rollout

Run a 10-match policy loop to show: state -> RL recommendation -> applied action -> simulated match -> updated telemetry -> next recommendation.
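
The loop above can be sketched as follows. This is a toy under stated assumptions: the hand-written `recommend` rule stands in for the fixed RL policy, the action effects and frustration deltas are invented numbers, and the safety gate is omitted for brevity. None of it reflects the app's actual simulator.

```python
import random


def recommend(state: dict) -> str:
    """Stand-in for the fixed policy: a hand-written rule, not a trained model."""
    if state["frustration"] > 0.6:
        return "ease_difficulty"
    if state["win_probability"] < 0.4:
        return "grant_boost"
    return "no_op"


def apply_action(state: dict, action: str) -> dict:
    """Return a copy of the state with the (hypothetical) action effect applied."""
    new = dict(state)
    if action == "ease_difficulty":
        new["win_probability"] = min(1.0, new["win_probability"] + 0.10)
    elif action == "grant_boost":
        new["win_probability"] = min(1.0, new["win_probability"] + 0.15)
    return new


def simulate_match(state: dict, rng: random.Random):
    """Play one toy match: a win lowers frustration, a loss raises it."""
    won = rng.random() < state["win_probability"]
    new = dict(state)
    delta = -0.10 if won else 0.15
    new["frustration"] = min(1.0, max(0.0, new["frustration"] + delta))
    return new, won


def run_rollout(state: dict, matches: int = 10, seed: int = 0) -> list:
    """state -> recommendation -> applied action -> match -> telemetry, repeated."""
    rng = random.Random(seed)  # seeded so the same rollout can be replayed
    log = []
    for match in range(1, matches + 1):
        action = recommend(state)
        state = apply_action(state, action)
        state, won = simulate_match(state, rng)
        log.append({"match": match, "action": action, "won": won, **state})
    return log
```

The policy itself never changes inside the loop; only its input does, which is why the recommended action can differ from one match to the next.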

No automatic rollout has run yet.

Benchmark

Policy comparison from /arena/benchmark

Agent operations, health checks, and 7-day progress

This restores the original operations console below the arena. It shows health checks, policy metrics, audit/explanation tools, OPE, recent logs, manual recommendation tests, and day-by-day simulation cards.

The console is framed as one RL Decision Agent with an internal red safety/risk gate; the explanation console only answers questions about what the agent did.