ilgyu-yi

Generalizable Agents

2025-06-21

Modular RL Agent Architecture for Adaptive Environments

As environmental contexts in architectural design automation continuously evolve, our reinforcement learning (RL) agents must remain flexible and easy to update. To accommodate frequent changes, such as adding new entity types, modifying observation formats, or replacing sub-policies, I adopted a component-based model composition strategy.

Modular Component Design

The RL agent is decomposed into interchangeable components at multiple levels, such as encoders, action heads, sub-policies, and a shared RL backbone.

These components are defined separately, and the overall agent architecture is constructed from a simple configuration file (e.g., YAML or JSON). Users can swap, extend, or modify components without rewriting core code, which allows rapid adaptation to new tasks or environmental updates.
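The config-driven composition described above can be sketched roughly as follows. This is a minimal illustration, not the unirl implementation: the registry, the `build_agent` helper, the component names (`mean_encoder`, `max_encoder`, `threshold_head`), and the config schema are all hypothetical and chosen only to show the idea.

```python
import json

# Hypothetical registry mapping config names to component classes.
REGISTRY = {}

def register(name):
    """Class decorator that makes a component addressable from a config file."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator

@register("mean_encoder")
class MeanEncoder:
    def __init__(self, scale=1.0):
        self.scale = scale
    def __call__(self, observation):
        return self.scale * sum(observation) / len(observation)

@register("max_encoder")
class MaxEncoder:
    def __init__(self, scale=1.0):
        self.scale = scale
    def __call__(self, observation):
        return self.scale * max(observation)

@register("threshold_head")
class ThresholdHead:
    def __init__(self, threshold=0.0):
        self.threshold = threshold
    def __call__(self, feature):
        # Toy action head: binary action from a scalar feature.
        return 1 if feature > self.threshold else 0

class Agent:
    def __init__(self, encoder, head):
        self.encoder = encoder
        self.head = head
    def act(self, observation):
        return self.head(self.encoder(observation))

def build(spec):
    """Instantiate one component from {"type": ..., "kwargs": {...}}."""
    return REGISTRY[spec["type"]](**spec.get("kwargs", {}))

def build_agent(config):
    return Agent(encoder=build(config["encoder"]), head=build(config["head"]))

# Assumed config schema; a YAML file would carry the same structure.
CONFIG_JSON = """
{
  "encoder": {"type": "mean_encoder", "kwargs": {"scale": 2.0}},
  "head": {"type": "threshold_head", "kwargs": {"threshold": 1.0}}
}
"""

config = json.loads(CONFIG_JSON)
agent = build_agent(config)

# Swapping the encoder is a config edit only; no agent code changes.
swapped_config = dict(config, encoder={"type": "max_encoder", "kwargs": {"scale": 2.0}})
swapped_agent = build_agent(swapped_config)
```

In this sketch, replacing `mean_encoder` with `max_encoder` changes the agent's behavior on the same observation while `Agent` and `build_agent` stay untouched, which is the property the configuration-file approach is meant to provide.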

Benefits of the Modular Approach

  1. Scalability
    • New action heads or encoders are easy to add as new entity types or design tasks emerge.
  2. Maintainability
    • Individual modules can be debugged, optimized, or unit-tested in isolation.
  3. Reusability
    • Shared components (e.g., a common encoder or RL backbone) can be reused across different designs or projects.
  4. Configurability
    • Researchers and non-developers can prototype new architectures by editing a config file, without diving into the codebase.
  5. Robustness to Change
    • When the environment's API or observation space changes, only the relevant module needs adjustment, not the whole agent.
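To make the last point concrete, here is a small sketch of isolating an observation-format change behind one module. The two observation formats, the adapter classes, and `AreaPolicy` are invented for illustration and are not taken from unirl.

```python
class ObservationAdapterV1:
    """Assumed old format: {"width": w, "height": h}."""
    def __call__(self, raw):
        return [raw["width"], raw["height"]]

class ObservationAdapterV2:
    """Assumed new format: {"size": (w, h)}."""
    def __call__(self, raw):
        width, height = raw["size"]
        return [width, height]

class AreaPolicy:
    """Toy policy that only ever sees the adapted feature vector."""
    def __call__(self, features):
        width, height = features
        return "split" if width * height > 50 else "keep"

class Agent:
    def __init__(self, adapter, policy):
        self.adapter = adapter
        self.policy = policy
    def act(self, raw_observation):
        return self.policy(self.adapter(raw_observation))

policy = AreaPolicy()  # reused unchanged across both formats
old_agent = Agent(ObservationAdapterV1(), policy)
new_agent = Agent(ObservationAdapterV2(), policy)

old_action = old_agent.act({"width": 10, "height": 8})
new_action = new_agent.act({"size": (10, 8)})
```

Because the policy depends only on the adapter's output, the format change is absorbed by swapping one adapter, and each adapter can be unit-tested on its own.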

Implementation Highlights from the unirl Repo

Overall architecture

[Figure: overall architecture of the unirl framework]

This diagram illustrates the overall architecture and dependency flow of the unirl framework.

Overall, the diagram shows how configuration, model components, preprocessing, environment, and learning logic are interconnected to create a modular and easily reconfigurable reinforcement learning system.

Information flow

[Figure: information flow between the Rollout and Train phases]

This diagram shows the interaction between the Rollout and Train phases in the unirl framework.

Overall, the diagram captures the information flow loop between environment interaction (rollout) and parameter optimization (train), showing which variables are passed between components.
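The rollout/train loop can be sketched as below. This is a toy stand-in under stated assumptions, not unirl's code: `ToyEnv`, `Policy`, `Batch`, and the simple running-average value update are hypothetical, and the update replaces whatever optimizer the real framework uses. The point is only to show which variables cross the boundary between the two phases.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Batch:
    """Variables handed from the rollout phase to the train phase."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

class ToyEnv:
    """Hypothetical one-step environment: action 1 pays 1.0, action 0 pays 0.0."""
    def reset(self):
        return 0  # single dummy observation
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0, reward, True  # next_obs, reward, done

class Policy:
    """Epsilon-greedy policy over per-action value estimates (the 'parameters')."""
    def __init__(self, n_actions=2, epsilon=0.2):
        self.q = [0.0] * n_actions
        self.epsilon = epsilon
    def act(self, obs, rng):
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

def rollout(env, policy, n_steps, rng):
    """Interact with the environment and record what the trainer needs."""
    batch = Batch()
    obs = env.reset()
    for _ in range(n_steps):
        action = policy.act(obs, rng)
        next_obs, reward, done = env.step(action)
        batch.observations.append(obs)
        batch.actions.append(action)
        batch.rewards.append(reward)
        obs = env.reset() if done else next_obs
    return batch

def train(policy, batch, lr=0.5):
    """Move each action's value estimate toward the rewards observed for it."""
    for action, reward in zip(batch.actions, batch.rewards):
        policy.q[action] += lr * (reward - policy.q[action])
    return policy

rng = random.Random(0)
env, policy = ToyEnv(), Policy()
for _ in range(5):
    batch = rollout(env, policy, n_steps=20, rng=rng)  # rollout -> batch
    policy = train(policy, batch)                      # batch -> updated parameters
```

The loop closes when the updated `policy` is passed back into the next `rollout` call, mirroring the cycle the diagram describes.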

Distributed system

[Figure: distributed training loop of unirl]

This diagram shows unirl's distributed training loop with a Master–Slave setup.

It separates data collection (many runners on both master and slave) from parameter authority (a single master), enabling scalable throughput while keeping a single source of truth for updates.
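A rough single-process sketch of this separation is shown below. Threads stand in for runners on the master and slave machines; the `Master` class, `run_runner` function, parameter layout, and toy update rule are all assumptions made for illustration and do not reflect unirl's actual networking or update logic.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_runner(runner_id, params, n_steps):
    """Collect samples using a read-only snapshot of the master's parameters."""
    rng = random.Random(runner_id)
    # Toy "experience": noisy observations of a target value near 3.0.
    return [
        {"runner": runner_id,
         "version": params["version"],
         "target": 3.0 + rng.uniform(-0.1, 0.1)}
        for _ in range(n_steps)
    ]

class Master:
    """Single parameter authority: the only place where parameters change."""
    def __init__(self):
        self.params = {"version": 0, "weight": 0.0}
    def snapshot(self):
        # Runners receive a copy, so they cannot mutate the source of truth.
        return dict(self.params)
    def update(self, samples, lr=0.5):
        mean_target = sum(s["target"] for s in samples) / len(samples)
        self.params["weight"] += lr * (mean_target - self.params["weight"])
        self.params["version"] += 1

master = Master()
runner_ids = [0, 1, 2, 3]  # e.g. two runners on the master host, two on a slave host

with ThreadPoolExecutor(max_workers=len(runner_ids)) as pool:
    for _ in range(10):
        snap = master.snapshot()                                   # broadcast parameters
        chunks = pool.map(lambda rid: run_runner(rid, snap, 8), runner_ids)
        samples = [s for chunk in chunks for s in chunk]           # gather experience
        master.update(samples)                                     # single update point
```

Adding runners raises the number of samples gathered per iteration, while the parameter version only ever advances inside `Master.update`.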

Summary

The unirl codebase exemplifies a composable RL agent framework that is well suited to the evolving nature of architectural automation.
