Training Notes
This page summarizes how models are trained and stored, including the RL setup used in cage generation experiments.
Supervised Tasks
The degree and min-cycle trainers share a common model-family interface: each takes a --model architecture flag and a --name under which the resulting artifact is stored (a sketch of the dispatch follows the commands below).
uv run python -m ai.degree.train --model gin --name v1 --epochs 5000
uv run python -m ai.degree.train --model loopy --name r3_v1 --r 3 --epochs 5000
uv run python -m ai.min_cycle.train --model gcn --name v1 --epochs 5000
uv run python -m ai.min_cycle.train --model sage --name v1 --epochs 5000
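The exact factory behind that interface is project-specific; below is a minimal sketch of what the shared model-family dispatch might look like, assuming torch_geometric is available (build_model and the flag-to-class mapping are illustrative assumptions, not the repository's actual API):

from torch_geometric.nn.models import GCN, GIN, GraphSAGE

# Illustrative mapping from --model flags to constructors; the project's
# "loopy" architecture would be registered here as well.
MODEL_FAMILIES = {"gin": GIN, "gcn": GCN, "sage": GraphSAGE}

def build_model(model: str, in_channels: int, hidden_channels: int, num_layers: int):
    # Hypothetical helper: look up the architecture class and construct it
    # with the hyperparameters shared across tasks.
    cls = MODEL_FAMILIES[model]
    return cls(in_channels=in_channels, hidden_channels=hidden_channels,
               num_layers=num_layers)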
Model Artifact Structure
ai/trained/
  degree/
    gin_v1/
      info.json
      weights.pt
  min_cycle/
    loopy_r3_v1/
      info.json
      weights.pt
  cage/
    <model_id>/
      info.json
      weights.pt
The registry reads metrics and metadata from info.json and loads weights from weights.pt.
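A minimal sketch of that load path, assuming the layout above (load_artifact and its return shape are illustrative, not the registry's real API):

import json
from pathlib import Path

import torch

def load_artifact(task: str, model_id: str, root: str = "ai/trained"):
    # Hypothetical helper: metadata and metrics come from info.json,
    # the state dict from weights.pt.
    artifact_dir = Path(root) / task / model_id
    info = json.loads((artifact_dir / "info.json").read_text())
    state_dict = torch.load(artifact_dir / "weights.pt", map_location="cpu")
    return info, state_dict

For example, load_artifact("degree", "gin_v1") would return the gin_v1 metadata alongside its weights.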
PPO / Cage RL
PPO training builds an actor-critic network and collects rollouts using valid-action masks supplied by the cage environment (masking is sketched after the command below).
uv run python -m ai.cage.rl.train \
--model gin \
--name ppo_exp \
--steps 100000 \
--update-interval 2048 \
--lr 3e-4
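A minimal sketch of excluding invalid actions before sampling, assuming valid_mask is a bool tensor with True for allowed actions (sample_masked_action is an illustrative name, not the trainer's internals):

import torch
from torch.distributions import Categorical

def sample_masked_action(logits: torch.Tensor, valid_mask: torch.Tensor):
    # Drive invalid-action logits to -inf so they receive zero probability,
    # then sample only among the environment's valid actions.
    masked = logits.masked_fill(~valid_mask, float("-inf"))
    dist = Categorical(logits=masked)
    action = dist.sample()
    return action, dist.log_prob(action)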
During inference, generation requests run in isolated sessions and are polled by session ID until complete or stopped.
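From a client's perspective, that loop might look like the following sketch (start_generation, get_session, and the status fields are hypothetical names, not the actual service API):

import time

def generate_and_wait(client, params, poll_interval: float = 0.5):
    # Hypothetical client calls: start an isolated session, then poll by
    # session ID until it reports a terminal state.
    session_id = client.start_generation(params)
    while True:
        status = client.get_session(session_id)
        if status["state"] in ("complete", "stopped"):
            return status
        time.sleep(poll_interval)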