Automatic Generation of Parallel Programs with Dynamic Scheduling on a Network-on-Chip

From Anita Borg Institute Wiki

Presenter: Jungsook Yang (University of California, Irvine)

As the billion-transistor era approaches, the growing complexity of embedded systems design prevents efficient utilization of resources. Network-on-Chip (NoC) has been introduced to interconnect multiple IP cores by packet switching, providing modularity, regularity and scalability. In this work, we propose an application mapping framework and run-time load balancing strategies to fully exploit the power of parallel processors that use a NoC for communication.


Liz Kiewiet, GHC 2009 Live Notetaker. I also blog on the official Grace Hopper blog at http://ghcbloggers.blogspot.com

Processors need ever more computing power. In response to semiconductor challenges, manufacturers are now switching to chip multiprocessors, which bring performance demands of their own.

On-chip communication:

  • Bus: faster, because cores use a dedicated bus
  • NoC: more scalable and modular; appropriate for future on-chip interconnects

Solutions:

  • Improve the communication scheme for network-on-chip
  • Suggest a runtime load balancing strategy on network-on-chip
  • Develop a multicore network-on-chip simulator

NoC

  • Scalable, general purpose, multi-hop network. Interconnects multiple processing elements. On the chip there are routers that are connected to each other. Adopts packet switching for communication.
  • Router has to forward packets. It looks at the header flow control unit (flit) and makes the routing decision based on it.
  • Three different memory structures exist on the NoC. Core processors have access to all three memories.
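
As a rough illustration of the router behavior noted above, the sketch below makes a routing decision from the header flit alone. The notes do not say which topology or routing algorithm the presenter uses; the 2D mesh, the dimension-ordered (XY) routing policy, and all names (`HeaderFlit`, `route`, `path`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderFlit:
    """Hypothetical header flit: carries only the destination router coordinates."""
    dest_x: int
    dest_y: int


def route(here_x: int, here_y: int, flit: HeaderFlit) -> str:
    """Pick the output port for a packet by looking only at its header flit.

    Assumed policy: XY routing on a 2D mesh (correct X first, then Y).
    """
    if flit.dest_x > here_x:
        return "EAST"
    if flit.dest_x < here_x:
        return "WEST"
    if flit.dest_y > here_y:
        return "NORTH"
    if flit.dest_y < here_y:
        return "SOUTH"
    return "LOCAL"  # arrived: hand the packet to the attached processing element


def path(src, dst):
    """Apply the routing decision hop by hop and list the routers visited."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    x, y = src
    flit = HeaderFlit(*dst)
    hops = [(x, y)]
    while (port := route(x, y, flit)) != "LOCAL":
        dx, dy = step[port]
        x, y = x + dx, y + dy
        hops.append((x, y))
    return hops
```

Under these assumptions, `path((0, 0), (2, 1))` visits the routers at (0, 0), (1, 0), (2, 0) and (2, 1): each router forwards the packet one hop, which is what makes the network multi-hop.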

Proposed Solution

  • Efficient NoC Communication

- Use a MIMD style of NoC (explicit message passing is needed for communication).
- Process multiple read and send requests at the same time.
- Keep read-request and write-request memory inside the NoC. When a processor wants to read a packet, the packet's ID is removed from that memory, which allows multiple requests to be handled out of order.
- Send requests are read in first-in, first-out order.
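
The request handling described above can be sketched roughly as follows. This is not the presenter's design: the class `NetworkInterface`, its method names, and the use of a dictionary for the ID-indexed memory are all hypothetical, chosen only to show reads completing out of order while sends stay in first-in, first-out order.

```python
from collections import deque


class NetworkInterface:
    """Hypothetical model: ID-indexed receive memory plus a FIFO send queue."""

    def __init__(self):
        self.received = {}         # packet ID -> payload, filled as packets arrive
        self.send_queue = deque()  # pending send requests, oldest first

    def deliver(self, packet_id, payload):
        """Called when a packet arrives from the network."""
        self.received[packet_id] = payload

    def read(self, packet_id):
        """Processor read request: remove and return the packet with this ID.

        Returns None if that packet has not arrived yet. Other IDs can still
        be read in the meantime, so reads may complete out of order.
        """
        return self.received.pop(packet_id, None)

    def send(self, dest, payload):
        """Processor send request: queue it behind all earlier sends."""
        self.send_queue.append((dest, payload))

    def next_to_inject(self):
        """Take the oldest pending send request (first in, first out)."""
        return self.send_queue.popleft() if self.send_queue else None
```

For example, if packets 7 and 3 arrive in that order, the processor can still call `read(3)` before `read(7)`, whereas two queued sends always leave in the order they were issued.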

  • Runtime load balancing

- Problem: an unbalanced workload results in poor performance.
- Solution: redistribute the workload at runtime, keeping workload information in the NI register, so that an underloaded processing element (PE) handles the overload of its neighbors. Four steps:

  1. The underloaded PE sends out workload probes to its neighbors.
  2. When a neighbor receives a probe, it sends a response packet with its workload information to the requesting PE.
  3. Based on the workloads of its neighbors, the requesting PE calculates the workload overhead.
  4. The requesting PE sends out the updated workload to its neighbors.
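
The four steps above can be sketched as a single function run by the underloaded PE. The notes do not say how the redistribution is calculated, so the policy used here (move work toward the average load of the PE and its neighbors) is an assumption, as are the function name `rebalance` and the neighbor names in the example.

```python
def rebalance(requester_load: int, neighbor_loads: dict) -> tuple:
    """One round of the probe/response exchange, seen from the underloaded PE.

    neighbor_loads maps a neighbor's name to the workload it reported in its
    response packet, so steps 1 and 2 are represented by this dictionary.
    Returns (new requester load, updated neighbor loads).
    """
    # Step 3 (assumed policy): aim for the average load of the local group.
    total = requester_load + sum(neighbor_loads.values())
    target = total // (len(neighbor_loads) + 1)

    new_load = requester_load
    updated = {}
    for name, load in neighbor_loads.items():
        # Take only a neighbor's excess above the target, and stop taking
        # once the requester itself has reached the target.
        take = max(0, min(load - target, target - new_load))
        new_load += take
        updated[name] = load - take

    # Step 4: these updated loads are what the requester sends back out.
    return new_load, updated
```

With a requester load of 2 and neighbors reporting `{"north": 10, "east": 6, "south": 2}`, the target is 5, so the requester takes 3 units from `north` and leaves the others untouched. The total amount of work is unchanged; only its placement moves.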

NoC Simulator

  • Objectives: measure the performance of the NoC, execute target parallel applications on the simulator, and help develop HW/SW optimization strategies.
  • Three different versions of the simulator: an RTL model in Verilog HDL, an RTL model in SystemC, and a transaction-level model in SystemC

Questions
