Module 01: Degree Prediction

Goal: for each node, predict its degree based only on the graph structure and node features generated at training time.
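The target itself is simple to state. As a minimal sketch (a hypothetical `degree_targets` helper, not the project's actual data pipeline), per-node degree can be read directly off an edge list:

```python
from collections import Counter

def degree_targets(num_nodes, edges):
    """Count undirected degree per node from an edge list.

    `edges` is a list of (u, v) pairs; each undirected edge
    contributes 1 to the degree of both endpoints.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[i] for i in range(num_nodes)]

# A 4-node path graph 0-1-2-3: endpoints have degree 1, interior nodes 2.
print(degree_targets(4, [(0, 1), (1, 2), (2, 3)]))  # → [1, 2, 2, 1]
```

Because the label is a deterministic function of the adjacency structure, any failure to learn it cleanly points at the model or training setup rather than label noise.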

Why This Module Is Important

Degree is a foundational structural property. If a GNN cannot predict degree reliably, it is unlikely to be trustworthy for more demanding graph reasoning tasks. This module is the baseline checkpoint for the rest of the project: it tests whether the model family and data generation pipeline are internally consistent before moving to harder objectives.

Practically, this module also exposes calibration behavior. Exact-match degree accuracy and absolute error directly show whether the model is learning stable local counting behavior or just approximate trends.
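As a sketch of what those metrics mean here (assuming real-valued model outputs compared against integer degree labels; `degree_metrics` is a hypothetical helper, not the project's evaluation code):

```python
def degree_metrics(preds, targets):
    """Exact-match accuracy on rounded predictions, plus MAE and MSE.

    `preds` are real-valued model outputs; `targets` are integer degrees.
    Accuracy is strict: a prediction counts only if it rounds to the label.
    """
    n = len(preds)
    rounded = [round(p) for p in preds]
    acc = 100.0 * sum(r == t for r, t in zip(rounded, targets)) / n
    mae = sum(abs(p - t) for p, t in zip(preds, targets)) / n
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    return acc, mae, mse
```

Exact-match accuracy can be 100% while MAE is still nonzero (e.g. predictions of 1.9 and 3.1 for labels 2 and 3), which is why the table below reports both.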

How It Was Trained

uv run python -m ai.degree.train --model gcn --name v1 --epochs 5000
uv run python -m ai.degree.train --model sage --name v1 --epochs 5000
uv run python -m ai.degree.train --model gin --name v1 --epochs 5000
uv run python -m ai.degree.train --model loopy --name r3_v1 --r 3 --epochs 5000

Saved Results

Source: ai/trained/degree/*/info.json.

Model         Accuracy (%)      MAE       MSE   Best Epoch
gin_v1              100.00   0.0000    0.0000          450
sage_v1             100.00   0.0000    0.0000          450
gcn_v1               23.76   1.8810    3.5382          150
loopy_r3_v1          25.89   6.7847   46.0323          250

Why Models Behave Differently Here

Degree is highly local, so architectures with strong neighborhood aggregation can excel quickly. In the current runs, GIN and SAGE hit perfect exact-match accuracy on rounded degree labels, while GCN and Loopy underperform. That pattern suggests this specific training distribution favors local aggregation fidelity and is less forgiving of architectural settings that were tuned for other structural objectives.
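The gap between sum-style and mean-style aggregators is easy to see concretely. With constant node features, one round of sum aggregation recovers degree exactly, while mean aggregation collapses it (a toy illustration of the aggregator bias, not the actual GIN/GCN layer math, which also involves learned weights and normalization):

```python
def aggregate(node, adj, feats, how):
    """One round of neighborhood aggregation over adjacency lists."""
    msgs = [feats[n] for n in adj[node]]
    return sum(msgs) if how == "sum" else sum(msgs) / len(msgs)

# Star graph: node 0 is connected to nodes 1, 2, 3; all features are 1.0.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
feats = {n: 1.0 for n in adj}

print(aggregate(0, adj, feats, "sum"))   # 3.0 — recovers degree exactly
print(aggregate(0, adj, feats, "mean"))  # 1.0 — degree information is lost
```

A mean-based aggregator has to recover degree indirectly, which is consistent with the weaker GCN numbers above.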

The large MAE on loopy_r3_v1 indicates a mismatch between its inductive bias and this exact target, not necessarily a universally weaker model. It can still dominate on cycle-sensitive tasks, which is why cross-task comparison matters more than single-task ranking.

Meaning of the results: in this training setup, GIN and SAGE perfectly match rounded degree labels on evaluation graphs, while GCN and Loopy are far less stable. This points to strong sensitivity to the pairing of model architecture and training distribution on this task.