UNILOGIC

Author:

Ioannou Aggelos D. (1), Georgopoulos Konstantinos (2), Malakonakis Pavlos (3), Pnevmatikatos Dionisios N. (4), Papaefstathiou Vassilis D. (5), Papaefstathiou Ioannis (6), Mavroidis Iakovos (7)

Affiliation:

1. School of ECE, Technical University of Crete, Greece; Telecommunication Systems Institute, Greece; and Foundation for Research & Technology-Hellas (FORTH), Heraklion, Crete, Greece

2. Telecommunication Systems Institute, Chania, Greece

3. Telecommunication Systems Institute, Greece and Synelixis Solutions, Chalkida, Greece

4. Telecommunication Systems Institute, Greece and School of ECE, National Technical University of Athens, Athens, Greece

5. Chalmers University of Technology, Gothenburg, Sweden

6. School of ECE, Aristotle University of Thessaloniki, Thessaloniki, Greece

7. Telecommunication Systems Institute, Greece and Foundation for Research & Technology-Hellas (FORTH), Heraklion, Crete, Greece

Abstract

High-performance Computing (HPC) applications are becoming increasingly demanding in both performance and power, pushing HPC systems to their limits. Existing HPC systems have not yet reached exascale performance, mainly due to power limitations. Extrapolating from today’s top HPC systems, about 100–200 MWatts would be required to sustain exaflop-level performance. A promising solution for tackling power limitations is the deployment of energy-efficient reconfigurable resources, in the form of Field-programmable Gate Arrays (FPGAs), tightly integrated with conventional CPUs. However, current FPGA tools and programming environments are optimized for accelerating a single application, or even a single task, on a single FPGA device. In this work, we present UNILOGIC (Unified Logic), a novel HPC-tailored parallel architecture that efficiently incorporates FPGAs. UNILOGIC adopts the Partitioned Global Address Space (PGAS) model and extends it to include hardware accelerators, i.e., tasks implemented on the reconfigurable resources. The main advantages of UNILOGIC are that (i) the hardware accelerators can be accessed directly by any processor in the system, and (ii) the hardware accelerators can access any memory location in the system. In this way, the proposed architecture offers a unified environment where all the reconfigurable resources can be seamlessly used by any processor/operating system. The UNILOGIC architecture also provides hardware virtualization of the reconfigurable logic so that the hardware accelerators can be shared among multiple applications or tasks. The FPGA layer of the architecture is implemented by splitting its reconfigurable resources into (i) a static partition, which provides the PGAS-related communication infrastructure, and (ii) fixed-size, dynamically reconfigurable slots that can be programmed and accessed independently or combined to support both fine- and coarse-grain reconfiguration. Finally, the UNILOGIC architecture has been evaluated on a custom prototype that consists of two 1U chassis, each of which includes eight interconnected daughter boards, called Quad-FPGA Daughter Boards (QFDBs); each QFDB supports four tightly coupled Xilinx Zynq UltraScale+ MPSoCs as well as 64 Gigabytes of DDR4 memory, and thus the prototype features a total of 64 Zynq MPSoCs and 1 Terabyte of memory. We tuned and evaluated the UNILOGIC prototype using both low-level (baremetal) performance tests and two popular real-world HPC applications, one compute-intensive and one data-intensive. Our evaluation shows that UNILOGIC is 2.5 to 400 times faster and 46 to 300 times more energy efficient than conventional parallel systems utilizing only high-end CPUs, while it also outperforms GPUs by a factor of 3 to 6 in terms of time to solution and by a factor of 10 to 20 in terms of energy to solution.
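
The prototype totals quoted in the abstract follow directly from the per-board figures. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative assumptions, not identifiers from the paper, and only the four input numbers come from the abstract.

```python
# Per-unit figures as stated in the abstract.
CHASSIS = 2               # two 1U chassis
QFDBS_PER_CHASSIS = 8     # eight QFDBs per chassis
MPSOCS_PER_QFDB = 4       # four Zynq MPSoCs per QFDB
DDR4_GB_PER_QFDB = 64     # 64 Gigabytes of DDR4 per QFDB


def prototype_totals(chassis=CHASSIS,
                     qfdbs_per_chassis=QFDBS_PER_CHASSIS,
                     mpsocs_per_qfdb=MPSOCS_PER_QFDB,
                     ddr4_gb_per_qfdb=DDR4_GB_PER_QFDB):
    """Return aggregate board, device, and memory counts (hypothetical helper)."""
    qfdbs = chassis * qfdbs_per_chassis
    return {
        "qfdbs": qfdbs,
        "mpsocs": qfdbs * mpsocs_per_qfdb,
        "memory_gb": qfdbs * ddr4_gb_per_qfdb,
    }


totals = prototype_totals()
print(totals)
```

With the stated inputs this gives 16 QFDBs, 64 MPSoCs, and 1024 Gigabytes of DDR4, which matches the "64 Zynq MPSoCs and 1 Terabyte of memory" in the abstract when 1 Terabyte is read as 1024 Gigabytes.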

Funder

European Commission under the H2020 Programme and the ECOSCALE project

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 4 articles.
