Efficient Mapping of Applications for Future Chip-Multiprocessors in Dark Silicon Era

The failure of Dennard scaling has led to the utilization wall, which gives rise to dark silicon and limits the fraction of a chip that can actively switch within a given power budget. To address this issue, an architecture is needed that guarantees operation within the limited power budget while providing sufficient flexibility and performance for applications with diverse communication requirements. In this article, we present a general-purpose platform for future many-core Chip-Multiprocessors (CMPs) that benefits from clustering, sharing of Network-on-Chip (NoC) resources among cores, and power gating of the unused components of clusters. We also propose two task mapping methods for this platform that disperse active and dark cores appropriately, so that an excess power budget can be obtained. Our evaluations show that the first and second proposed mapping mechanisms reduce execution time by up to 28.6% and 39.2%, reduce NoC power consumption by up to 11.1% and 10%, and gain an excess power budget of up to 7.6% and 13.4%, respectively, over the baseline architecture.

Authors

Mohaddeseh Hoveida, Fatemeh Aghaaliakbari, Ramin Bashizade, Mohammad Arjomand, Hamid Sarbazi-Azad

Journal

Transactions on Design Automation of Electronic Systems (TODAES)

Publisher

ACM

Pages

1-26

Publication date

2017/6/15

Issue

4

Volume

22
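The abstract above states only that active and dark cores are "dispersed appropriately"; it does not describe how. As a rough illustration of the general idea, and not the authors' two mapping methods, the sketch below assumes a fixed per-core power, derives how many cores a power budget allows to be active, and spreads those cores across a 2D mesh by greedy farthest-point selection on Manhattan distance, leaving the rest dark. The function names, the uniform per-core power, and the example wattages are all hypothetical, and the sketch ignores the clustered, NoC-sharing platform the abstract describes.

```python
from itertools import product


def max_active_cores(power_budget_w, core_power_w):
    """Hypothetical model: every active core draws the same power."""
    return int(power_budget_w // core_power_w)


def disperse_active_cores(rows, cols, n_active):
    """Greedily choose n_active cores on a rows x cols mesh.

    Each new active core is the one whose minimum Manhattan distance to the
    already-active cores is largest; ties go to the first core in row-major
    order. All cores not returned are left dark.
    """
    if not 0 <= n_active <= rows * cols:
        raise ValueError("n_active must fit on the mesh")
    cores = list(product(range(rows), range(cols)))  # (row, col) pairs
    active = []
    for _ in range(n_active):
        best = max(
            (c for c in cores if c not in active),
            key=lambda c: min(
                (abs(c[0] - a[0]) + abs(c[1] - a[1]) for a in active),
                default=0,  # first pick: every core scores 0
            ),
        )
        active.append(best)
    return active


if __name__ == "__main__":
    # Hypothetical numbers for an 8x8 mesh.
    n = max_active_cores(power_budget_w=80.0, core_power_w=2.5)
    active = disperse_active_cores(8, 8, n)
    print(f"{n} of 64 cores active:", sorted(active))
```

Spreading active cores apart is a common heuristic for avoiding thermal hotspots; a real mapper would also weigh communication distance between the tasks placed on those cores, which this sketch does not model.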

Efficient Processor Allocation in a Reconfigurable CMP Architecture for Dark Silicon Era

The continuation of Moore's law and the failure of Dennard scaling force future chip multiprocessors (CMPs) to have considerable dark regions. How to make use of the available dark resources is an important concern for computer architects. In line with these changes, processor allocation schemes, which strongly affect the performance of a parallel on-chip system, must be revisited. A suitable allocation algorithm should reduce runtime and increase power efficiency, with a proper thermal distribution that avoids hotspots. With this motivation, this paper proposes a power-efficient, high-performance, general-purpose infrastructure targeting future many-core systems, together with a Dark Silicon Aware Processor Allocation (DSAPA) scheme for it. To obtain high performance, we suggest a tunable-clustered mesh capable of sharing NoC resources within each cluster. We also employ a buffer-level power-gating technique to improve power efficiency. Evaluation results show maximum improvements of 38.7% in performance and 29.4% in power consumption for multi-threaded workloads over an equivalent conventional design.

Authors

Fatemeh Aghaaliakbari, Mohaddeseh Hoveida, Mohammad Arjomand, Majid Jalili, Hamid Sarbazi-Azad

Publication date

2016/10/2

Pages

336-343

Publisher

IEEE

Conference

34th International Conference on Computer Design (ICCD)

Revisiting Processor Allocation and Application Mapping in Future CMPs in Dark Silicon Era

With technology advances and the emergence of new fabrication and VLSI technologies, current and future chip multiprocessors (CMPs) are expected to have tens to hundreds of processing elements and gigabytes of on-chip cache, connected by a high-bandwidth network-on-chip (NoC). Unfortunately, because of the limited power budget of a computing system, especially for its processing element(s), it is impossible to keep all cores, caches, and network elements working at the highest voltage level. This has led to the dark silicon computing era, in which system-level or architecture-level techniques keep a large portion of a CMP's elements OFF (or in a dim mode) to meet the system's power budget while the system still delivers high-performance computation.

In this work, we first describe the importance of NoC design and management in delivering high-performance computation on a dark silicon-based CMP platform. We propose a novel, highly scalable NoC architecture and the management policies it requires to support turning some routers, links, and buffers OFF while still meeting the bandwidth needs of the running application(s). Then, using this NoC architecture, we revisit the processor allocation strategy and the application-to-core mapping algorithm to make maximum use of the provided NoC bandwidth and capabilities while meeting the power goals of the hardware platform and the performance goals of the application. Extensive simulation results for a 64-core CMP model show that the proposed algorithms improve system performance by 10%–50% when running multithreaded applications.

Authors

Mohaddeseh Hoveida, Fatemeh Aghaaliakbari, Majid Jalili, Ramin Bashizade, Mohammad Arjomand, Hamid Sarbazi-Azad

Publication date

2018/1/1

Pages

35-81

Publisher

Elsevier

Book

Advances in Computers

Volume

110