Training Notes

This page summarizes how models are trained and stored, including the RL setup used in cage generation experiments.

Supervised Tasks

The degree and min-cycle tasks share a common model-family interface, so both are trained with the same CLI pattern (a sketch of the interface follows the commands below).

uv run python -m ai.degree.train --model gin --name v1 --epochs 5000
uv run python -m ai.degree.train --model loopy --name r3_v1 --r 3 --epochs 5000

uv run python -m ai.min_cycle.train --model gcn --name v1 --epochs 5000
uv run python -m ai.min_cycle.train --model sage --name v1 --epochs 5000
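
As a rough illustration of what a shared model-family interface might look like, the sketch below registers families by name and builds them the same way for either task. The names (build_model, MODEL_FAMILIES, GINStub) are assumptions for illustration, not the project's actual modules.

# Hypothetical sketch of a shared model-family registry; names are illustrative.
import torch
import torch.nn as nn

class GINStub(nn.Module):
    """Placeholder standing in for the project's GIN implementation."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

# Other families ("gcn", "sage", "loopy") would register here the same way.
MODEL_FAMILIES = {"gin": GINStub}

def build_model(name: str, **kwargs) -> nn.Module:
    """Look up a model family by its CLI name and construct it."""
    return MODEL_FAMILIES[name](**kwargs)

Under this assumption, both train entry points would call build_model(args.model, ...) and differ only in their dataset and loss.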

Model Artifact Structure

ai/trained/
  degree/
    gin_v1/
      info.json
      weights.pt
  min_cycle/
    loopy_r3_v1/
      info.json
      weights.pt
  cage/
    <model_id>/
      info.json
      weights.pt

The registry reads metrics and metadata from info.json and loads weights from weights.pt.
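
A minimal sketch of how a registry might read one artifact directory. The info.json keys and the load_artifact helper are assumptions; only the file layout above is taken from the project.

import json
from pathlib import Path
import torch

def load_artifact(artifact_dir: str):
    """Read metadata from info.json and weights from weights.pt in one artifact directory."""
    root = Path(artifact_dir)
    info = json.loads((root / "info.json").read_text())          # metrics + metadata
    state = torch.load(root / "weights.pt", map_location="cpu")  # model state_dict
    return info, state

# Example (path from the tree above):
# info, state = load_artifact("ai/trained/degree/gin_v1")
# model.load_state_dict(state)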

PPO / Cage RL

PPO training builds an actor-critic network and collects rollouts using valid-action masks supplied by the cage environment (a masking sketch follows the command below).

uv run python -m ai.cage.rl.train \
  --model gin \
  --name ppo_exp \
  --steps 100000 \
  --update-interval 2048 \
  --lr 3e-4
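
The exact network is task-specific; the following is a minimal sketch of how valid-action masks are commonly applied to policy logits before sampling. The mask shape and the -inf fill are assumptions about this codebase, not its confirmed implementation.

import torch
from torch.distributions import Categorical

def masked_action(logits: torch.Tensor, valid_mask: torch.Tensor):
    """Sample an action only from entries where valid_mask is True."""
    # Invalid actions get -inf so their probability is zero after softmax.
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    dist = Categorical(logits=masked_logits)
    action = dist.sample()
    return action, dist.log_prob(action)

# Example: 10 candidate actions, of which the environment marks 6 as valid.
logits = torch.randn(10)
valid = torch.tensor([True] * 6 + [False] * 4)
action, logp = masked_action(logits, valid)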

During inference, generation requests run in isolated sessions and are polled by session ID until complete or stopped.
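
A sketch of the client-side polling pattern described above, assuming a hypothetical get_session_status callable and "complete"/"stopped" status strings; neither is confirmed by the project.

import time

def wait_for(session_id: str, get_session_status, interval: float = 1.0, timeout: float = 300.0) -> dict:
    """Poll a generation session by ID until it completes, is stopped, or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_session_status(session_id)   # e.g. {"status": "running", "result": None}
        if state.get("status") in ("complete", "stopped"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"session {session_id} did not finish within {timeout}s")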

Training is compute-heavy; in deployment, keep training scripts out of the public-serving process.