authors: Jason Lowe-Power
last edited: 2024-09-02 21:16:16 +0000

More details of the gem5 Ruby Interconnection Network are here.

Garnet2.0: An On-Chip Network Model for Heterogeneous SoCs

Garnet2.0 is a detailed interconnection network model inside gem5. It is in active development, and patches with more features will be periodically pushed into gem5. Additional garnet-related patches and tool support under development (not part of the repo) can be found at the Garnet page at Georgia Tech.

Garnet2.0 builds upon the original Garnet model which was published in 2009.

If your use of Garnet contributes to a published paper, please cite the following paper:

    @inproceedings{garnet,
      title={GARNET: A detailed on-chip network model inside a full-system simulator},
      author={Agarwal, Niket and Krishna, Tushar and Peh, Li-Shiuan and Jha, Niraj K},
      booktitle={Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on},
      pages={33--42},
      year={2009},
      organization={IEEE}
    }

Garnet2.0 provides a cycle-accurate micro-architectural implementation of an on-chip network router. It leverages the Topology and Routing infrastructure provided by gem5’s ruby memory system model. The default router is a state-of-the-art 1-cycle pipeline. Additional delay of any number of cycles can be added to any router by specifying it in the topology.

Garnet2.0 can also be used to model an off-chip interconnection network by setting appropriate delays in the routers and links.

Invocation

The garnet network can be enabled by adding --network=garnet2.0 to the command line.

Configuration

Garnet2.0 uses the generic network parameters in Network.py:

Additional parameters are specified in garnet2.0/GarnetNetwork.py:

Topology

Garnet2.0 leverages the Topology infrastructure provided by gem5’s ruby memory system model. Any heterogeneous topology can be modeled. Each router in the topology file can be given an independent latency, which overrides the default. In addition, each link has two optional parameters, src_outport and dst_inport, which are strings naming the output port of the link's source router and the input port of its destination router. These names can be used inside garnet2.0 to implement custom routing algorithms, as described next. For instance, in a Mesh, a link that leaves one router through its west output port and enters the neighboring router through its east input port has src_outport set to “west” and dst_inport set to “east”.
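As a rough illustration of per-router latencies and named link ports, the sketch below builds a single row of routers in plain Python. It does not use the gem5 API: the Router and Link classes and the build_row() helper are hypothetical stand-ins, and the lowercase port names simply follow the naming used in the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DEFAULT_ROUTER_LATENCY = 1  # cycles; matches the default 1-cycle router


@dataclass
class Router:
    """Hypothetical stand-in for a topology router (not the gem5 class)."""
    router_id: int
    latency: int = DEFAULT_ROUTER_LATENCY


@dataclass
class Link:
    """Hypothetical stand-in for a unidirectional router-to-router link."""
    src: int
    dst: int
    src_outport: Optional[str] = None  # optional port name at the source
    dst_inport: Optional[str] = None   # optional port name at the destination


def build_row(num_routers: int,
              latency_overrides: Optional[Dict[int, int]] = None
              ) -> Tuple[List[Router], List[Link]]:
    """Build a 1 x num_routers row with named ports on every link."""
    latency_overrides = latency_overrides or {}
    routers = [
        Router(i, latency_overrides.get(i, DEFAULT_ROUTER_LATENCY))
        for i in range(num_routers)
    ]
    links = []
    for i in range(num_routers - 1):
        # Leaves router i on its east side, enters router i+1 on its west side.
        links.append(Link(i, i + 1, src_outport="east", dst_inport="west"))
        # Leaves router i+1 on its west side, enters router i on its east side.
        links.append(Link(i + 1, i, src_outport="west", dst_inport="east"))
    return routers, links


# Router 1 is modeled as a slower, 4-cycle router; the others use the default.
routers, links = build_row(3, latency_overrides={1: 4})
```

A routing function can then look only at the port names on a router's outgoing links, without knowing how the topology was constructed.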

Routing

Garnet2.0 leverages the Routing infrastructure provided by gem5’s ruby memory system model. The default routing algorithm is a deterministic table-based routing algorithm with shortest paths. Link weights can be used to prioritize certain links over others. See src/mem/ruby/network/Topology.cc for details about how the routing table is populated.

Custom Routing: To model custom routing algorithms (adaptive routing, for example), we provide a framework to name each link with a src_outport and dst_inport direction, and to use these names inside garnet to implement routing algorithms. For instance, in a Mesh, West-first routing can be implemented by sending a flit along the “west” outport link until the flit has no X- hops remaining, and then choosing one of the remaining links randomly (or based on VC availability at the next router). See how outportComputeXY() is implemented in src/mem/ruby/network/garnet2.0/RoutingUnit.cc. outportComputeCustom() can be implemented similarly, and invoked by adding --routing-algorithm=2 to the command line.
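The two mesh routing policies mentioned above can be sketched in Python as follows. This is not the code in RoutingUnit.cc. It assumes routers are numbered row by row (router id = row * num_cols + col), that higher rows lie to the “north”, and that the port names are lowercase; the function names and the “local” return value for a flit already at its destination are illustrative choices.

```python
import random


def _coords(router_id, num_cols):
    """Return (x, y) of a router, assuming row-major numbering."""
    return router_id % num_cols, router_id // num_cols


def outport_xy(my_id, dest_id, num_cols):
    """Deterministic XY routing: finish all X hops, then take Y hops."""
    my_x, my_y = _coords(my_id, num_cols)
    dest_x, dest_y = _coords(dest_id, num_cols)
    if dest_x > my_x:
        return "east"
    if dest_x < my_x:
        return "west"
    if dest_y > my_y:
        return "north"
    if dest_y < my_y:
        return "south"
    return "local"  # already at the destination router


def outport_west_first(my_id, dest_id, num_cols, rng=random):
    """West-first routing: take every westward hop first, then pick
    randomly among the remaining productive directions."""
    my_x, my_y = _coords(my_id, num_cols)
    dest_x, dest_y = _coords(dest_id, num_cols)
    if dest_x < my_x:
        return "west"  # X- hops remain, so the flit must go west
    candidates = []
    if dest_x > my_x:
        candidates.append("east")
    if dest_y > my_y:
        candidates.append("north")
    if dest_y < my_y:
        candidates.append("south")
    if not candidates:
        return "local"
    return rng.choice(candidates)
```

A choice based on VC availability at the next router would replace rng.choice() with a lookup of per-outport credit state, which this sketch does not model.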

Multicast messages: The network modeled does not have hardware multi-cast support within the network. A multi-cast message gets broken into multiple uni-cast messages at the Network Interface.

Flow Control

Virtual Channel Flow Control is used in the design. Each VC is allocated to one packet at a time. There are two kinds of VCs in the design: control and data. The buffer depth of each kind can be set independently in GarnetNetwork.py. By default, control VCs are 1 flit deep and data VCs are 4 flits deep. By default, control packets are 1 flit long and data packets are 5 flits long.
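The packet lengths quoted above follow from dividing the message size by the flit size and rounding up. The sketch below works through that arithmetic. The byte sizes are assumptions chosen to reproduce the 1-flit and 5-flit defaults (a 16-byte flit, an 8-byte control message, and a 72-byte data message made of an 8-byte header plus a 64-byte cache line); check your own configuration for the actual values.

```python
def flits_per_packet(message_bytes, flit_bytes):
    """Number of flits needed to carry a message (ceiling division)."""
    if message_bytes <= 0 or flit_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-message_bytes // flit_bytes)


# Assumed sizes, for illustration only.
FLIT_BYTES = 16
CONTROL_MESSAGE_BYTES = 8
DATA_MESSAGE_BYTES = 72

control_flits = flits_per_packet(CONTROL_MESSAGE_BYTES, FLIT_BYTES)
data_flits = flits_per_packet(DATA_MESSAGE_BYTES, FLIT_BYTES)
```

Under these assumptions a data packet is longer than the 4-flit default depth of a data VC, so the packet's flits can be spread over more than one router while they are in flight.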

Router Microarchitecture

The garnet2.0 router performs the following actions:

  1. Buffer Write (BW): The incoming flit gets buffered in its VC.
  2. Route Compute (RC): The buffered flit computes its output port, and this information is stored in its VC.
  3. Switch Allocation (SA): All buffered flits try to reserve the switch ports for the next cycle. The allocation is separable: first, each input port uses its input arbiter to choose one input VC, which places a switch request; then, each output port resolves conflicts using its output arbiter. All arbiters in ordered virtual networks are queueing arbiters, to maintain point-to-point ordering. All other arbiters are round-robin.
  4. VC Selection (VS): The winner of SA selects a free VC (if HEAD/HEAD_TAIL flit) from its output port.
  5. Switch Traversal (ST): Flits that won SA traverse the crossbar switch.
  6. Link Traversal (LT): Flits from the crossbar traverse links to reach the next routers.

In the default design, BW, RC, SA, VS, and ST all happen in one cycle. LT happens in the next cycle.
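The separable allocation described in the Switch Allocation step can be sketched as two rounds of round-robin arbitration. This is a simplified model, not Garnet's SwitchAllocator: the class and function names are hypothetical, only round-robin arbiters are shown (not the queueing arbiters used for ordered virtual networks), and checks a real allocator would need, such as downstream buffer availability, are left out.

```python
class RoundRobinArbiter:
    """Grants one requester per call, rotating priority after each grant."""

    def __init__(self, num_requesters):
        self.num_requesters = num_requesters
        self.next_priority = 0  # requester index checked first

    def grant(self, requesters):
        """Return the winning index from the set `requesters`, or None."""
        for offset in range(self.num_requesters):
            candidate = (self.next_priority + offset) % self.num_requesters
            if candidate in requesters:
                self.next_priority = (candidate + 1) % self.num_requesters
                return candidate
        return None


def switch_allocate(requests, input_arbiters, output_arbiters):
    """Separable switch allocation for one cycle.

    requests[inport][vc] is the outport wanted by the flit at the head
    of that VC. Returns {outport: (inport, vc)} for the winners.
    """
    # Stage 1: each input port picks one of its requesting VCs.
    contenders = {}  # outport -> {inport: vc}
    for inport, vc_requests in requests.items():
        vc = input_arbiters[inport].grant(set(vc_requests))
        if vc is not None:
            contenders.setdefault(vc_requests[vc], {})[inport] = vc
    # Stage 2: each output port picks one of the competing input ports.
    grants = {}
    for outport, by_inport in contenders.items():
        winner = output_arbiters[outport].grant(set(by_inport))
        grants[outport] = (winner, by_inport[winner])
    return grants


# Two input ports with two VCs each, and two output ports.
input_arbiters = [RoundRobinArbiter(2), RoundRobinArbiter(2)]
output_arbiters = [RoundRobinArbiter(2), RoundRobinArbiter(2)]
requests = {0: {0: 1, 1: 0}, 1: {0: 1}}
first_cycle = switch_allocate(requests, input_arbiters, output_arbiters)
second_cycle = switch_allocate(requests, input_arbiters, output_arbiters)
```

With the same requests presented twice, the rotating priorities let a different VC and input port win in the second cycle, which is the fairness property round-robin arbitration is meant to provide.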

Multi-cycle Router: Multi-cycle routers can be modeled by specifying a per-router latency in the topology file, or changing the default router latency in src/mem/ruby/network/BasicRouter.py. This is implemented by making a buffered flit wait in the router for (latency-1) cycles before becoming eligible for SA.
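The effect of the router latency on timing can be written as simple cycle arithmetic. The helper names below are hypothetical, and the sketch assumes no contention, so the flit wins SA in the first cycle in which it is eligible.

```python
def sa_eligible_cycle(arrival_cycle, router_latency):
    """First cycle in which a flit buffered at arrival_cycle may enter SA.

    The flit waits (router_latency - 1) cycles after being buffered.
    """
    if router_latency < 1:
        raise ValueError("router latency must be at least 1 cycle")
    return arrival_cycle + (router_latency - 1)


def earliest_link_traversal_cycle(arrival_cycle, router_latency):
    """Cycle in which LT starts if the flit wins SA as soon as it is eligible.

    SA, VS, and ST share a cycle; LT happens in the following cycle.
    """
    return sa_eligible_cycle(arrival_cycle, router_latency) + 1
```

For a default 1-cycle router, a flit arriving in cycle 10 can compete in SA in cycle 10 and start link traversal in cycle 11. With a 3-cycle router the same flit becomes eligible in cycle 12 and starts link traversal in cycle 13.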

Buffer Management

Each router input port has number_of_virtual_networks Vnets, each with vcs_per_vnet VCs. VCs in control Vnets have a depth of buffers_per_ctrl_vc (default = 1) and VCs in data Vnets have a depth of buffers_per_data_vc (default = 4). Credits are used to relay information about free VCs and the number of free buffers within each VC.
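The buffer organization and credit bookkeeping described above can be sketched as follows. The parameter names mirror those in the text, but the function, the OutputVCState class, and the example of two control Vnets plus one data Vnet with 4 VCs each are hypothetical and are not taken from the Garnet source.

```python
def flit_buffers_per_input_port(vnet_is_data, vcs_per_vnet,
                                buffers_per_ctrl_vc=1,
                                buffers_per_data_vc=4):
    """Total flit buffers at one input port.

    vnet_is_data holds one boolean per virtual network:
    True for a data Vnet, False for a control Vnet.
    """
    total = 0
    for is_data in vnet_is_data:
        depth = buffers_per_data_vc if is_data else buffers_per_ctrl_vc
        total += vcs_per_vnet * depth
    return total


class OutputVCState:
    """Upstream router's view of one VC at the downstream input port."""

    def __init__(self, depth):
        self.credits = depth  # free buffers believed to be available
        self.idle = True      # True when the VC holds no packet

    def has_credit(self):
        return self.credits > 0

    def send_flit(self):
        """Account for a flit sent into this VC."""
        if not self.has_credit():
            raise RuntimeError("no free buffer in the downstream VC")
        self.credits -= 1
        self.idle = False

    def receive_credit(self, frees_vc):
        """A credit returns one buffer; it may also signal that the VC is free."""
        self.credits += 1
        if frees_vc:
            self.idle = True


# Two control Vnets and one data Vnet, 4 VCs per Vnet (illustrative values).
total_buffers = flit_buffers_per_input_port([False, False, True], vcs_per_vnet=4)
```

In this example the port has 2 * 4 * 1 = 8 control buffers and 1 * 4 * 4 = 16 data buffers, for 24 flit buffers in total.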

Lifecycle of a Network Traversal

Running Garnet2.0 with Synthetic Traffic

Garnet2.0 can be run in a standalone manner and fed with synthetic traffic. The details are described here: Garnet Synthetic Traffic