Automatic Generation of Parallel Programs with Dynamic Scheduling on a Network-on-Chip
From Anita Borg Institute Wiki
Presenter: Jungsook Yang (University of California, Irvine)
As the billion-transistor era approaches, the growing complexity of embedded systems design makes it hard to use resources efficiently. Network-on-Chip (NoC) has been introduced to interconnect multiple IP cores via packet switching, providing modularity, regularity, and scalability. In this work, we propose an application mapping framework and run-time load balancing strategies to fully exploit the power of parallel processors that use a NoC for communication.
Liz Kiewiet, GHC 2009 Live Notetaker. I also blog on the official Grace Hopper blog at http://ghcbloggers.blogspot.com
Processors are required to deliver ever more computing power. In response to semiconductor challenges, manufacturers are now switching to chip multiprocessors, which bring their own performance demands.
On-chip communication:
- Bus: faster, because communication goes over a dedicated bus
- NoC: more scalable and modular; appropriate for future on-chip interconnects
Solutions:
- improve communication scheme for network-on-chip
- propose a runtime load balancing strategy for network-on-chip
- develop multicore network-on-chip simulator
NoC
- Scalable, general-purpose, multi-hop network that interconnects multiple processing elements. The chip contains routers connected to one another, and communication uses packet switching.
- Each router forwards packets: it looks at the header flow control unit (flit) and makes the routing decision based on it.
- Three different memory structures exist on the NoC. Core processors have access to all three memories.
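The talk did not name a routing algorithm. As an illustration only, assuming a 2D mesh with dimension-ordered (XY) routing (a common NoC choice), the per-router decision driven by the header flit can be sketched as below. `HeaderFlit`, `route`, and `path` are hypothetical names, and the header is reduced to its destination coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HeaderFlit:
    """Hypothetical header flit: only the destination coordinates are modeled."""
    dest_x: int
    dest_y: int


def route(router_x: int, router_y: int, header: HeaderFlit) -> str:
    """Pick an output port using dimension-ordered (XY) routing (assumed).

    The packet first travels along X until the column matches, then along Y;
    'LOCAL' means the packet is delivered to the core attached to this router.
    """
    if header.dest_x > router_x:
        return "EAST"
    if header.dest_x < router_x:
        return "WEST"
    if header.dest_y > router_y:
        return "NORTH"
    if header.dest_y < router_y:
        return "SOUTH"
    return "LOCAL"


def path(src: Tuple[int, int], header: HeaderFlit) -> List[str]:
    """Follow the per-router decisions hop by hop until the packet is delivered."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    x, y = src
    ports: List[str] = []
    while (port := route(x, y, header)) != "LOCAL":
        ports.append(port)
        dx, dy = step[port]
        x, y = x + dx, y + dy
    return ports


if __name__ == "__main__":
    print(path((0, 0), HeaderFlit(2, 1)))  # ['EAST', 'EAST', 'NORTH']
```

Each router only inspects the header, so the body and tail flits simply follow the port the header reserved; XY routing is also deadlock-free on a mesh, which is one reason it is a popular default.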
Proposed Solution
- Efficient NoC Communication
- use an MIMD-style NoC (explicit message passing is needed for communication)
- process multiple read and send requests at the same time
- keep read-request and write-request memories inside the NoC: when a processor wants to read a packet, its ID is removed from the memory, which allows multiple requests to be handled out of order
- send requests are read in first-in, first-out (FIFO) order
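A minimal Python sketch of the request handling described above, assuming one network interface per core. Incoming packets are stored by message ID so reads can be served in any order, while send requests leave in FIFO order. The class and method names (`NetworkInterface`, `request_send`, `request_read`, and so on) are hypothetical; the talk did not describe the hardware at this level.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

Send = Tuple[int, int, bytes]  # (destination PE, message ID, payload)


class NetworkInterface:
    """Hypothetical NI model: FIFO send queue plus an ID-indexed receive memory."""

    def __init__(self) -> None:
        self.send_queue: Deque[Send] = deque()  # send requests, served first-in, first-out
        self.received: Dict[int, bytes] = {}    # arrived packets, indexed by message ID
        self.pending_reads: Set[int] = set()    # read requests still waiting for data

    def request_send(self, dest: int, msg_id: int, payload: bytes) -> None:
        """The processor posts a send request; it is queued in arrival order."""
        self.send_queue.append((dest, msg_id, payload))

    def next_outgoing(self) -> Optional[Send]:
        """Inject the oldest send request into the network (FIFO), if any."""
        return self.send_queue.popleft() if self.send_queue else None

    def on_packet_arrival(self, msg_id: int, payload: bytes) -> None:
        """Store an incoming packet under its message ID."""
        self.received[msg_id] = payload

    def request_read(self, msg_id: int) -> Optional[bytes]:
        """The processor asks for one specific message by ID.

        If it has arrived, its ID is removed from the receive memory and the
        payload is returned, regardless of arrival order. Otherwise the ID is
        recorded as a pending read and None is returned (the caller retries).
        """
        if msg_id in self.received:
            self.pending_reads.discard(msg_id)
            return self.received.pop(msg_id)
        self.pending_reads.add(msg_id)
        return None
```

For example, if message 7 arrives before message 3, a processor can still call `request_read(3)` first and get its payload; only the send side is forced into strict order.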
- Runtime load balancing
- Problem: an unbalanced workload results in poor performance.
- Solution: redistribute the workload at runtime, keeping workload information in a network interface (NI) register, so that an underloaded processing element (PE) takes over the excess work of its overloaded neighbors. Four steps:
- underloaded PE sends out workload probes to neighbors.
- when neighbor receives probe, it sends a response packet with its information to the requesting PE.
- based on workloads of neighbors, the requesting PE calculates the workload overhead.
- the requesting PE sends out the updated workload to its neighbors.
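The four steps can be sketched in Python as one balancing round run by an underloaded PE on a 2D mesh. The talk did not give the exact overhead calculation, so this sketch assumes the PE compares each neighbor against the integer average of its neighborhood and pulls work from neighbors above that average. Probes and responses are modeled as dictionary lookups standing in for the NI register reads; `neighbors` and `balance_step` are hypothetical names.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def neighbors(pe: Coord, width: int, height: int) -> List[Coord]:
    """Step 1 targets: the east/west/north/south neighbors of a PE in the mesh."""
    x, y = pe
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width and 0 <= cy < height]


def balance_step(load: Dict[Coord, int], pe: Coord, width: int, height: int) -> Dict[Coord, int]:
    """One round of the assumed four-step scheme, run by PE `pe`.

    Returns a new load map; the input map is not modified.
    """
    # Steps 1-2: probe each neighbor; each response carries that neighbor's workload.
    responses = {n: load[n] for n in neighbors(pe, width, height)}
    # Step 3: compute the neighborhood average (assumed target) and each
    # neighbor's excess over it, then take over as much excess as this PE lacks.
    total = load[pe] + sum(responses.values())
    target = total // (len(responses) + 1)
    updated = dict(load)
    for n, w in responses.items():
        deficit = target - updated[pe]
        if deficit <= 0:
            break  # this PE is no longer underloaded
        moved = min(w - target, deficit)
        if moved > 0:  # only overloaded neighbors give up work
            updated[n] -= moved
            updated[pe] += moved
    # Step 4: these new values are what the PE would send back to its neighbors.
    return updated


if __name__ == "__main__":
    mesh = {(0, 0): 9, (1, 0): 0, (2, 0): 3}  # a 3x1 mesh; (1, 0) is idle
    print(balance_step(mesh, (1, 0), width=3, height=1))
    # {(0, 0): 5, (1, 0): 4, (2, 0): 3}
```

Using the neighborhood average keeps each decision local (only one-hop probes), and total work is conserved because every unit taken from a neighbor is added to the requesting PE.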
NoC Simulator
- Objectives: measure the performance of the NoC, execute target parallel applications on the simulator, and help develop a HW/SW optimization strategy.
- Three different versions of the simulator: an RTL model in Verilog HDL, an RTL model in SystemC, and a transaction-level model in SystemC
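To illustrate the difference in abstraction, a transaction-level model can treat a whole packet as one transaction and assign it a latency, rather than stepping every router pipeline stage cycle by cycle as an RTL model does. The sketch below is not the presenters' simulator: it assumes a 2D mesh with minimal routing and no contention, and uses the standard first-order zero-load estimate (hops × per-hop delay + packet length / link bandwidth). The function name and the default parameters (`flit_bits=32`, `cycles_per_hop=3`) are hypothetical.

```python
import math
from typing import Tuple


def zero_load_latency(
    src: Tuple[int, int],
    dst: Tuple[int, int],
    packet_bits: int,
    flit_bits: int = 32,      # assumed link width: one flit per cycle
    cycles_per_hop: int = 3,  # assumed router + link delay for the head flit
) -> int:
    """Contention-free latency estimate, in cycles, for a single packet."""
    hops = abs(dst[0] - src[0]) + abs(dst[1] - src[1])  # minimal (e.g. XY) path length
    flits = math.ceil(packet_bits / flit_bits)           # serialization: L / b cycles
    return hops * cycles_per_hop + flits                  # head delay + serialization delay


if __name__ == "__main__":
    print(zero_load_latency((0, 0), (2, 1), packet_bits=128))  # 3 hops * 3 + 4 flits = 13
```

An estimate like this runs far faster than RTL simulation, at the cost of ignoring contention; cycle-accurate RTL versions remain the reference for validating it.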
Questions