Invited Technical Talk: Next Generation Supercomputers
From Anita Borg Institute Wiki
Next generation supercomputers: Exploiting innovative massively parallel system architecture to facilitate breakthrough scientific discoveries
Presenter:
Valentina Salapura, Research Staff Member, Exploratory Server Systems group, IBM T. J. Watson Research Center
In her own words (from the article at http://www.hpcwire.com/hpc/1829825.html):
“I feel greatly honored by this invitation to give a plenary speech. I'll be talking about my technical field, computer architecture. As you may know, computer architecture is going through a major industry-wide revolution right now. From faster and faster single processors we're shifting to multiprocessor systems, where multiple processors share in the work. I have worked on multiprocessors for a long time, so this shift is like a personal victory.
Specifically, I'll be talking about the next generation of the Blue Gene system. As you may know, Blue Gene is IBM's top ranked supercomputer and has been the world's fastest system for several years. This new system means a lot to me personally, since I contributed to virtually every part of the system. I served as the lead for the memory coherence architecture with snoop filters, which ensures that the processors can share data correctly with low overhead, and was also the lead for the performance monitoring unit, which keeps track of how well the computer is working. In addition, I led the bring up work to assemble the initial system prototype once our chip came back from manufacturing, coordinating a large cross site team.”
Links:
About Dr Salapura and further reading (papers, tutorials, articles) - http://domino.research.ibm.com/comm/research_people.nsf/pages/salapura.index.html
Dr Salapura’s publications - http://domino.research.ibm.com/comm/research_people.nsf/pages/salapura.pubs.html
Dr Salapura’s thoughts on GHC - http://www.hpcwire.com/hpc/1829825.html
Link to the PowerPoint presentation:
http://community.anitaborg.org/wiki/images/9/92/GHC07-BlueGene_salapura.pdf
Notes from the PowerPoint presentation:
Please refer to Dr Salapura’s website for details and diagrams of the system and node designs.
Supercomputers:
Supercomputers are computers that run many times faster than ordinary machines. They are the brains behind science and can enable breakthroughs in areas such as security, bioscience, and astronomy.
- CMOS Scaling:
  o Dennard’s scaling theory
  o Increases compute density
  o Enables higher compute speed
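The scaling bullets above can be made concrete with a small calculation. The sketch below assumes ideal constant-field (Dennard) scaling, where linear dimensions and supply voltage both shrink by a factor 1/kappa, and uses the usual dynamic power relation P ~ C * V^2 * f. The function name `dennard_scale` and the dictionary keys are illustrative and do not come from the talk.

```python
def dennard_scale(kappa: float) -> dict:
    """Relative per-transistor quantities after one ideal scaling step.

    Assumption: classic constant-field scaling, where linear dimensions
    and supply voltage both shrink by 1/kappa (kappa > 1).
    """
    voltage = 1.0 / kappa
    capacitance = 1.0 / kappa      # capacitance tracks the linear dimension
    frequency = kappa              # shorter gate delay allows a faster clock
    density = kappa ** 2           # transistors per unit area
    power = capacitance * voltage ** 2 * frequency  # dynamic power ~ C * V^2 * f
    return {
        "frequency": frequency,
        "density": density,
        "power_per_transistor": power,
        "power_density": power * density,
    }


if __name__ == "__main__":
    print(dennard_scale(2.0))
```

For kappa = 2 this gives four times the density and twice the clock frequency while power density stays at 1.0, which is why scaling could deliver both higher compute density and higher speed for as long as it held.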
- The end of scaling: atoms don’t scale, so scale systems rather than atoms, and start optimizing at the system level
- Concentrate on holistic design: innovation across the entire value stack
  o Core architecture
  o Chip design
  o Devices
  o System architecture
- Understanding design leverage points is important
  o Achieve design efficiency
  o Optimize BlueGene for power/performance: high integration, memory hierarchy, SIMD processing
- Low power consumption is important not only at the chip level but also at the data center level: systems are power hungry, and cooling is important
- Thermal and power issues at the data center level
  o Data center limits determine which systems can be deployed
  o At the data center level, only half of the electricity bill goes to the systems
  o Air cooling capacity is limited
- Supercomputer challenge
  o More performance means more power
  o Scaling up single-core performance
- BlueGene concepts
  o Usage of many low-power processors
  o Parallelism can deliver higher aggregate performance
  o Data-level parallelism with SIMD
  o Thread-level parallelism with a multi-core design
  o Improve efficiency of massively parallel systems
  o High-performance networks for synchronization and communication
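The trade-off behind using many low-power processors can be sketched with a rough calculation. The model below is a simplification, not data from the talk: dynamic power per core is taken as proportional to V^2 * f, supply voltage is assumed to be lowered in step with frequency, and the workload is assumed to parallelize perfectly. The function names and the example figures are hypothetical.

```python
def relative_core_power(freq_ratio: float, volt_ratio: float) -> float:
    """Dynamic power of one core relative to a baseline core (P ~ V^2 * f)."""
    return volt_ratio ** 2 * freq_ratio


def aggregate(n_cores: int, freq_ratio: float, volt_ratio: float):
    """Return (throughput, power) relative to one baseline core.

    Assumption: throughput scales linearly with core count and frequency,
    i.e. the workload parallelizes perfectly.
    """
    throughput = n_cores * freq_ratio
    power = n_cores * relative_core_power(freq_ratio, volt_ratio)
    return throughput, power


if __name__ == "__main__":
    # Eight cores, each at half the baseline frequency and half the voltage.
    print(aggregate(8, 0.5, 0.5))
```

Under these idealized assumptions, eight half-speed cores fit in the power budget of one baseline core and deliver four times its throughput. Real gains depend on how well the application scales and on how far the voltage can actually be lowered.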
- BlueGene design philosophy
  o Reduce the number of components that must be designed
  o Build on standards
  o Mature and familiar software environment
- Designing BlueGene/P
  o Emphasis on modular design and component reuse
  o Reuse of BlueGene/L design components when feasible
  o Add new capabilities when profitable, e.g. a new data-moving engine and a new performance monitoring unit
  o Keep the existing architecture as much as possible for cost benefits in terms of resources and team bandwidth
- Exploiting data-level parallelism: SIMD floating point unit
  o Two replicas of a standard single-pipe PowerPC FPU
  o A single instruction operates on multiple data elements
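To illustrate the idea of one instruction driving two FPU pipes, the sketch below emulates a two-lane SIMD unit in plain Python and counts how many instructions are issued for the same work. It is a software model only: the operation chosen (alpha * x + y per element) and the names `axpy_scalar` and `axpy_simd2` are illustrative assumptions, not the BlueGene instruction set.

```python
from typing import List, Tuple


def axpy_scalar(alpha: float, x: List[float], y: List[float]) -> Tuple[List[float], int]:
    """One element per issued instruction."""
    out, issued = [], 0
    for xi, yi in zip(x, y):
        out.append(alpha * xi + yi)
        issued += 1
    return out, issued


def axpy_simd2(alpha: float, x: List[float], y: List[float]) -> Tuple[List[float], int]:
    """Two elements per issued instruction, one per FPU lane."""
    if len(x) != len(y) or len(x) % 2:
        raise ValueError("this sketch needs equal, even-length inputs")
    out, issued = [], 0
    for i in range(0, len(x), 2):
        lane0 = alpha * x[i] + y[i]          # first FPU pipe
        lane1 = alpha * x[i + 1] + y[i + 1]  # second FPU pipe, same instruction
        out.extend((lane0, lane1))
        issued += 1
    return out, issued


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
    print(axpy_scalar(2.0, x, y))
    print(axpy_simd2(2.0, x, y))
```

Both versions produce the same results, but the two-lane version issues half as many instructions for this input.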
- Thread level Parallelism
- Cost of maintaining coherence
  o Incurred every time any processor requests data
  o High overhead cost, e.g. the cache is busy for a significant fraction of cycles
  o The penalty increases as more processors are added
  o Snoop filtering, which removes unnecessary lookups, is a solution
- Snoop filters should be small and power efficient, and they must be functionally correct. BlueGene/P implements multiple snoop filters
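The effect of snoop filtering on cache lookups can be sketched with a toy model. The code below is a deliberately idealized illustration, not the BlueGene/P design: each filter is modeled as an exact set of the lines its cache holds, whereas real hardware filters are small approximate structures. The class and function names, the line addresses, and the workload are all hypothetical.

```python
class Cache:
    """Toy cache with an optional snoop filter in front of its tag array."""

    def __init__(self, use_filter: bool):
        self.lines = set()           # lines actually held by this cache
        self.maybe_present = set()   # filter state (idealized: exact)
        self.use_filter = use_filter
        self.tag_lookups = 0         # costly lookups that reached the cache

    def load(self, line: int) -> None:
        self.lines.add(line)
        self.maybe_present.add(line)

    def snoop_invalidate(self, line: int) -> None:
        # Correctness rule: only drop a snoop if the line is certainly absent.
        if self.use_filter and line not in self.maybe_present:
            return
        self.tag_lookups += 1
        self.lines.discard(line)
        self.maybe_present.discard(line)


def simulate(use_filter: bool, n_procs: int = 4, private_lines: int = 8,
             shared_line: int = 999) -> int:
    caches = [Cache(use_filter) for _ in range(n_procs)]
    for p, cache in enumerate(caches):
        for i in range(private_lines):
            cache.load(100 * p + i)  # lines private to processor p
        cache.load(shared_line)      # one line shared by everyone
    # Processor 0 writes all of its lines; every write snoops the other caches.
    for line in list(range(private_lines)) + [shared_line]:
        for other in caches[1:]:
            other.snoop_invalidate(line)
    # No stale copy of the shared line may survive in the other caches.
    assert all(shared_line not in c.lines for c in caches[1:])
    return sum(c.tag_lookups for c in caches)


if __name__ == "__main__":
    print("lookups without filter:", simulate(False))
    print("lookups with filter:   ", simulate(True))
```

In this workload most snoops target lines that only the writer holds, so the filter removes all but the lookups for the genuinely shared line while still invalidating every stale copy.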
- Performance monitoring unit: helps users understand the behavior of their programs; uses a hybrid implementation based on SRAM arrays
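The notes do not spell out what the hybrid implementation looks like. One plausible reading, sketched below purely as an assumption, is a counter whose low-order bits live in a fast register while the high-order bits are kept in a denser SRAM word that is only updated when the register overflows. The class name `HybridCounter` and the chosen bit widths are hypothetical.

```python
class HybridCounter:
    """Event counter split between a small register and an SRAM word.

    Assumption: low-order bits are counted in a fast register; the
    high-order bits sit in SRAM and are touched only on overflow.
    """

    def __init__(self, low_bits: int = 12):
        self.low_bits = low_bits
        self.low = 0           # fast register part
        self.sram_high = 0     # high-order part stored in SRAM
        self.sram_writes = 0   # how often the SRAM had to be updated

    def count(self, events: int = 1) -> None:
        for _ in range(events):
            self.low += 1
            if self.low == 1 << self.low_bits:  # register overflow
                self.low = 0
                self.sram_high += 1
                self.sram_writes += 1

    def value(self) -> int:
        return (self.sram_high << self.low_bits) | self.low


if __name__ == "__main__":
    c = HybridCounter(low_bits=4)
    c.count(100)
    print(c.value(), c.sram_writes)
```

With a 4-bit register, counting 100 events requires only 6 SRAM updates, which suggests how such a split could let many wide counters share one SRAM array.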