Efficient Mapping of Applications for Future Chip-Multiprocessors in Dark Silicon Era
The failure of Dennard scaling has led to the utilization wall, which is the source of dark silicon and limits the percentage of a chip that can actively switch within a given power budget. To address this issue, a structure is needed that respects the limited power budget while providing sufficient flexibility and performance for applications with varied communication requirements. In this article, we present a general-purpose platform for future many-core Chip-Multiprocessors (CMPs) that benefits from clustering, Network-on-Chip (NoC) resource sharing among cores, and power gating of the unused components of clusters. We also propose two task mapping methods for the proposed platform in which active and dark cores are dispersed appropriately, so that an excess of power budget can be obtained. Our evaluations reveal that the first and second proposed mapping mechanisms reduce the execution time by up to 28.6% and 39.2%, respectively, reduce the NoC power consumption by up to 11.1% and 10%, and gain an excess power budget of up to 7.6% and 13.4% over the baseline architecture.
Authors
Mohaddeseh Hoveida, Fatemeh Aghaaliakbari, Ramin Bashizade, Mohammad Arjomand, Hamid Sarbazi-Azad
Journal
Transactions on Design Automation of Electronic Systems (TODAES)
Publisher
ACM
Pages
1-26
Publication date
2017/6/15
Issue
4
Volume
22
Authors
Fatemeh Aghaaliakbari, Mohaddeseh Hoveida, Mohammad Arjomand, Majid Jalili, Hamid Sarbazi-Azad
Publication date
2016/10/2
Pages
336-343
Publisher
IEEE
Conference
34th International Conference on Computer Design (ICCD)
Efficient Processor Allocation in a Reconfigurable CMP Architecture for Dark Silicon Era
The continuance of Moore's law and the failure of Dennard scaling force future chip multiprocessors (CMPs) to have considerable dark regions. How to make use of the available dark resources is an important concern for computer architects. In line with these changes, we must revise processor allocation schemes, which strongly affect the performance of a parallel on-chip system. A suitable allocation algorithm should reduce runtime and increase power efficiency, with proper thermal distribution to avoid hotspots. With this motivation, this paper proposes a power-efficient, high-performance, general-purpose infrastructure, together with a Dark Silicon Aware Processor Allocation (DSAPA) scheme that targets future many-core systems. To obtain high performance, we suggest a tunable-clustered mesh with the capability of sharing NoC resources in each cluster. We also employ a buffer-level power gating technique to improve power efficiency. Evaluation results reveal that the maximum achieved performance and power consumption improvements are 38.7% and 29.4%, respectively, for multi-threaded workloads over the equivalent conventional design.
Authors
Mohaddeseh Hoveida, Fatemeh Aghaaliakbari, Majid Jalili, Ramin Bashizade, Mohammad Arjomand, Hamid Sarbazi-Azad
Publication date
2018/1/1
Pages
35-81
Publisher
Elsevier
Book
Advances in Computers
Volume
110
Revisiting Processor Allocation and Application Mapping in Future CMPs in Dark Silicon Era
With technology advances and the emergence of new fabrication and VLSI technologies, current and future chip multiprocessors (CMPs) are expected to have tens to hundreds of processing elements and gigabytes of on-chip caches, connected by a high-bandwidth network-on-chip (NoC). Unfortunately, due to the limited power budget of a computing system, especially for its processing element(s), it is impossible to keep all cores, caches, and network elements working at the highest voltage level. This has resulted in the dark silicon computing era, in which system-level or architecture-level techniques keep a large portion of a CMP's elements OFF (or in dim mode) to meet the power budget of the system while the system still delivers high-performance computation.
In this work, we first describe the importance of NoC design and management in delivering high-performance computation in a dark silicon-based CMP platform. We propose a novel, highly scalable NoC architecture and its required management policies in order to support turning some routers/links/buffers OFF while guaranteeing that the network still delivers the bandwidth needs of the running application(s). Then, by employing the introduced NoC architecture, we propose to revisit the processor allocation strategy and application-to-core mapping algorithm in order to make maximum use of the provided NoC bandwidth and capability while meeting the power and performance goals of the hardware platform and application, respectively. Our extensive simulation results of a 64-core CMP model show that the proposed algorithms are able to improve the system performance by 10%–50% when running multithreaded applications.