ilgyu-yi

AlphaStar Architecture

2025-08-18

TL;DR
AlphaStar’s recipe is pragmatic:
(1) structure the state according to what it really is (scalars, entities, spatial maps).
(2) narrow the combinatorial action space with an autoregressive head stack.
(3) keep rich side-channels (pointer keys, map skips) so heads don’t have to squeeze everything through a single vector.


Relational Inductive Bias

Fully-connected nets are expressive, but when locality or relations dominate, you win by choosing structure that matches the world. Images reward convolution; multi-entity worlds reward set reasoning and spatial fusion. The key claim here: the architecture itself acts like a constraint, nudging the model to extract the right relations instead of memorizing spurious ones.
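
To make the "set reasoning" point concrete, here is a minimal NumPy sketch of single-head self-attention over a set of entities. It is illustrative only: the sizes, the random weights, and the helper names (`softmax`, `self_attention`) are assumptions, not AlphaStar's actual code. The property it checks is the inductive bias itself: reordering the entities only reorders the outputs, so the layer cannot latch onto list position.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(entities, w_q, w_k, w_v):
    """Single-head self-attention over a set of entity feature rows."""
    q, k, v = entities @ w_q, entities @ w_k, entities @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_entities, d_in, d_model = 5, 8, 4          # illustrative sizes (assumed)
entities = rng.normal(size=(n_entities, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_model)) for _ in range(3))

perm = rng.permutation(n_entities)
out = self_attention(entities, w_q, w_k, w_v)
out_perm = self_attention(entities[perm], w_q, w_k, w_v)

# Permuting the input rows permutes the output rows in the same way.
is_equivariant = np.allclose(out[perm], out_perm)
```

A fully-connected layer applied to the flattened entity list would not pass this check, since its weights are tied to slot positions.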


Encoders: three views of the world

1) Scalar Encoder (Dense, with context gating)

2) Entity Encoder (Multi-head self-attention)

3) Spatial Encoder (CNN + ResBlocks, with Scatter Connections)
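
As a rough illustration of a scatter connection, the sketch below places each entity's embedding at that entity's map cell and concatenates the result with the map channels before the CNN. The function name `scatter_entities`, the tensor layout (H, W, C), and the choice to sum embeddings when entities share a cell are all assumptions made for this example.

```python
import numpy as np

def scatter_entities(entity_emb, xy, height, width):
    """Place each entity's embedding at its (x, y) map cell.

    entity_emb: (n_entities, channels); xy: (n_entities, 2) integer coords.
    Entities that share a cell are summed (an assumption of this sketch).
    """
    channels = entity_emb.shape[1]
    grid = np.zeros((height, width, channels), dtype=entity_emb.dtype)
    # np.add.at accumulates correctly even when indices repeat.
    np.add.at(grid, (xy[:, 1], xy[:, 0]), entity_emb)  # index as [row=y, col=x]
    return grid

entity_emb = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
xy = np.array([[0, 0], [2, 1], [2, 1]])      # the last two entities share a cell
map_features = np.zeros((3, 4, 5))           # placeholder (H, W, C_map) map layers

scattered = scatter_entities(entity_emb, xy, height=3, width=4)
spatial_input = np.concatenate([map_features, scattered], axis=-1)
```

The point of the extra channels is that the spatial encoder sees where entity information lives on the map, rather than having to recover it from a pooled vector.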


Core (LSTM): temporal glue

Concatenate the embedded scalar, entity, and spatial features and feed them to the LSTM together with its previous hidden state. The output is the Embedded State: the base autoregressive (AR) embedding for the head stack.
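
A minimal sketch of the core step, assuming each encoder has already been reduced to a single vector (for entities, a pooled summary). The sizes, the random weights, and the `lstm_step` helper are hypothetical; the real core is larger and is not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, w, b):
    """One LSTM step. w: (len(x) + len(h_prev), 4 * hidden), b: (4 * hidden,)."""
    z = np.concatenate([x, h_prev]) @ w + b
    i, f, g, o = np.split(z, 4)              # input, forget, candidate, output
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
emb_scalar = rng.normal(size=6)              # assumed encoder output sizes
emb_entity = rng.normal(size=10)             # pooled entity summary (assumed)
emb_spatial = rng.normal(size=8)
hidden = 16

core_input = np.concatenate([emb_scalar, emb_entity, emb_spatial])
w = rng.normal(scale=0.1, size=(core_input.size + hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for _ in range(3):                           # three game steps, same observation for brevity
    h, c = lstm_step(core_input, h, c, w, b)

embedded_state = h                           # handed to the head stack
```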


Heads: an autoregressive control surface

A monolithic action prediction explodes combinatorially. Instead, predict a sequence and update context after each choice:

  1. Action Type (ResNet/MLP + GLU gate)
    Uses Scalar Context to gate the top-level decision (build/move/attack/cast…).

  2. Delay / Queue (MLP)
    Delay is often discretized even though it is a scalar: this avoids committing to a specific continuous distribution early and tends to train more cleanly. The queue flag chooses between executing the action now and appending it to the units’ existing orders.

  3. Selected Units (Pointer Network)
    Query = current AR embedding (optionally extended with a small LSTM when longer selection sequences help).
    Keys = Entity Embeddings.
    Samples which controllable units will execute the action.

  4. Target Unit (Pointer Network)
    Same mechanism, different role—choose the target unit for attack/heal, etc. Skipped for action types without a unit target.

  5. Location (Deconv/ResNet with FiLM-like modulation)
    Consumes the Map Skips directly; masks invalid coordinates based on the chosen action type. Feature-wise modulation helps combine multi-scale spatial evidence with the current AR context.
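
The pointer heads in steps 3 and 4 can be sketched as a masked dot-product between a query and the entity keys. This is a simplified illustration: `pointer_logits`, `sample_index`, the scaled dot-product scoring, and the example mask are assumptions for the sketch, not the exact AlphaStar layers.

```python
import numpy as np

def pointer_logits(query, keys, mask):
    """Score each entity against the query; disallowed entities get -inf."""
    logits = keys @ query / np.sqrt(query.size)
    return np.where(mask, logits, -np.inf)

def sample_index(logits, rng):
    """Sample an index from softmax(logits); -inf entries get probability 0."""
    finite_max = logits[np.isfinite(logits)].max()
    probs = np.exp(logits - finite_max)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

rng = np.random.default_rng(0)
d = 8
entity_keys = rng.normal(size=(6, d))        # one key per entity embedding
ar_embedding = rng.normal(size=d)            # query from the current AR context
selectable = np.array([True, True, False, True, False, False])

logits = pointer_logits(ar_embedding, entity_keys, selectable)
unit, probs = sample_index(logits, rng)
```

The same masking idea carries over to the location head, where the mask covers map coordinates that are invalid for the chosen action type.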

Critical loop. After each head samples, fold the choice back into the AR embedding before moving on. That way, later heads “know” what earlier heads picked even without immediate environment feedback.
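
The loop can be sketched as follows. Everything here is hypothetical scaffolding: the head names and sizes, the linear logit layers, and the additive update (adding a learned embedding of the sampled choice to the AR embedding) are one plausible way to fold a choice back in, not a confirmed reproduction of the original.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
HEADS = {"action_type": 4, "delay": 8, "queue": 2}   # hypothetical head sizes

# Hypothetical parameters: a logit matrix and a choice-embedding table per head.
params = {
    name: {"w_logits": rng.normal(scale=0.1, size=(d, n)),
           "choice_emb": rng.normal(scale=0.1, size=(n, d))}
    for name, n in HEADS.items()
}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_heads(embedded_state, params, rng):
    """Sample each head in order, folding every choice into the AR embedding."""
    ar = embedded_state.copy()
    action = {}
    for name, p in params.items():
        probs = softmax(ar @ p["w_logits"])  # conditioned on all earlier choices
        choice = int(rng.choice(len(probs), p=probs))
        action[name] = choice
        ar = ar + p["choice_emb"][choice]    # fold the choice back in
    return action, ar

embedded_state = rng.normal(size=d)
action, final_ar = run_heads(embedded_state, params, rng)
```

Because each head reads the updated embedding, the joint action distribution factorizes into a product of per-head conditionals instead of one softmax over every combination.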


Scatter Connections — deeper dive


Practical patterns that transfer


Closing

AlphaStar’s architecture is less about exotic tricks and more about matching representation to reality, then letting an AR head stack express complex actions step by step. If your environment is multi-entity and partially observed, this template travels with minimal ceremony.
