Module 01: Degree Prediction

Goal: for each node, predict its degree based only on the graph structure and node features generated at training time.
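The target itself is simple to state. As a minimal sketch (a hypothetical `degree_targets` helper, not the project's actual data pipeline), per-node degree can be read directly off an edge list:

```python
from collections import Counter

def degree_targets(num_nodes, edges):
    """Count undirected degree per node from an edge list.

    `edges` is a list of (u, v) pairs; each undirected edge
    contributes 1 to the degree of both endpoints.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[i] for i in range(num_nodes)]

# A 4-node path graph 0-1-2-3: endpoints have degree 1, interior nodes 2.
print(degree_targets(4, [(0, 1), (1, 2), (2, 3)]))  # → [1, 2, 2, 1]
```

Because the label is a deterministic function of the adjacency structure, any failure to learn it cleanly points at the model or training setup rather than label noise.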

Why This Module Is Important

Degree is a foundational structural property. If a GNN cannot predict degree reliably, it is unlikely to be trustworthy for more demanding graph reasoning tasks. This module is the baseline checkpoint for the rest of the project: it tests whether the model family and data generation pipeline are internally consistent before moving to harder objectives.

Practically, this module also exposes calibration behavior. Exact-match degree accuracy and absolute error directly show whether the model is learning stable local counting behavior or just approximate trends.
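As a sketch of what those metrics mean here (assuming real-valued model outputs compared against integer degree labels; `degree_metrics` is a hypothetical helper, not the project's evaluation code):

```python
def degree_metrics(preds, targets):
    """Exact-match accuracy on rounded predictions, plus MAE and MSE.

    `preds` are real-valued model outputs; `targets` are integer degrees.
    Accuracy is strict: a prediction counts only if it rounds to the label.
    """
    n = len(preds)
    rounded = [round(p) for p in preds]
    acc = 100.0 * sum(r == t for r, t in zip(rounded, targets)) / n
    mae = sum(abs(p - t) for p, t in zip(preds, targets)) / n
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    return acc, mae, mse
```

Exact-match accuracy can be 100% while MAE is still nonzero (e.g. predictions of 1.9 and 3.1 for labels 2 and 3), which is why the table below reports both.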

How It Was Trained

uv run python -m ai.degree.train --model gcn --name v1 --epochs 5000
uv run python -m ai.degree.train --model sage --name v1 --epochs 5000
uv run python -m ai.degree.train --model gin --name v1 --epochs 5000
uv run python -m ai.degree.train --model loopy --name r3_v1 --r 3 --epochs 5000

Saved Results

Source: ai/trained/degree/*/info.json.

Model         Accuracy (%)      MAE       MSE   Best Epoch
gin_v1              100.00   0.0000    0.0000          450
sage_v1             100.00   0.0000    0.0000          450
gcn_v1               23.76   1.8810    3.5382          150
loopy_r3_v1          25.89   6.7847   46.0323          250

Why Models Behave Differently Here

Degree is highly local, so architectures with strong neighborhood aggregation can excel quickly. In the current runs, GIN and SAGE hit perfect exact-match accuracy on rounded degree labels, while GCN and Loopy underperform. That pattern suggests this specific training distribution favors local aggregation fidelity and is less forgiving of architectural settings that were tuned for other structural objectives.
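The gap between sum-style and mean-style aggregators is easy to see concretely. With constant node features, one round of sum aggregation recovers degree exactly, while mean aggregation collapses it (a toy illustration of the aggregator bias, not the actual GIN/GCN layer math, which also involves learned weights and normalization):

```python
def aggregate(node, adj, feats, how):
    """One round of neighborhood aggregation over adjacency lists."""
    msgs = [feats[n] for n in adj[node]]
    return sum(msgs) if how == "sum" else sum(msgs) / len(msgs)

# Star graph: node 0 is connected to nodes 1, 2, 3; all features are 1.0.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
feats = {n: 1.0 for n in adj}

print(aggregate(0, adj, feats, "sum"))   # 3.0 — recovers degree exactly
print(aggregate(0, adj, feats, "mean"))  # 1.0 — degree information is lost
```

A mean-based aggregator has to recover degree indirectly, which is consistent with the weaker GCN numbers above.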

The large MAE on loopy_r3_v1 indicates a mismatch between its inductive bias and this exact target, not necessarily a universally weaker model. It can still dominate on cycle-sensitive tasks, which is why cross-task comparison matters more than single-task ranking.

Meaning of the results: in this training setup, GIN and SAGE perfectly match rounded degree labels on evaluation graphs, while GCN and Loopy are far less stable. This points to strong sensitivity to the pairing of model architecture and training distribution on this task.