Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization

Taisuke Boku; Masatake Sugita; Ryohei Kobayashi; Shinnosuke Furuya; Takuya Fujie; Masahito Ohue; Yutaka Akiyama

doi:10.1145/3673038.3673097

Publication Information

Title

English:	Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization

Authors

Japanese:	Taisuke Boku, 杉田昌岳, Ryohei Kobayashi, Shinnosuke Furuya, 藤江拓哉, 大上雅史, 秋山泰.
English:	Taisuke Boku, Masatake Sugita, Ryohei Kobayashi, Shinnosuke Furuya, Takuya Fujie, Masahito Ohue, Yutaka Akiyama.

Language

English

Journal/Book Title

English:	Proceedings of the 53rd International Conference on Parallel Processing (ICPP2024)

Volume, Issue, Pages

Pages 1082-1091

Publication Date

August 12, 2024

Publisher

English:	Association for Computing Machinery

Conference Name

English:	53rd International Conference on Parallel Processing (ICPP2024)

Venue

English:	Gotland

Official Link

https://dl.acm.org/doi/10.1145/3673038.3673097

DOI

https://doi.org/10.1145/3673038.3673097

Abstract

GPUs are the dominant accelerators in high-performance computing systems, but their performance depends on how effectively the numerous cores on each device are used in parallel. Typically, a loop is offloaded to a device and its iterations are mapped onto the cores, so the loop must have enough iterations to fill the thousands of cores in a high-end GPU. Advanced GPUs such as the NVIDIA H100 provide techniques such as Multi-Process Service (MPS) and Multi-Instance GPU (MIG), which let the GPU cores be shared among multiple user processes, to improve core utilization even when the degree of parallelism is small. We apply MPS to a practical Molecular Dynamics (MD) simulation with the AMBER software to improve the efficiency of GPU core utilization and thereby save computational resources. The critical issue is to analyze core utilization and overhead when running multiple processes on a single GPU, together with multi-GPU and multi-node parallel execution, to improve overall performance. In this paper, we introduce a method of applying MPS to AMBER to simulate the membrane permeation process of a drug-candidate peptide with a two-dimensional replica-exchange method on an advanced supercomputer equipped with NVIDIA H100 GPUs. We applied several parameter-setting optimizations on NVIDIA H100 and V100 GPUs and investigated their performance behavior. Finally, we found that GPU core utilization improves by up to a factor of two compared with a simple process assignment method, maximizing GPU utilization efficiency.
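As a rough illustration of the idea of running several replica processes on one GPU under MPS instead of one process per GPU, the sketch below packs a two-dimensional replica grid onto GPUs with a configurable number of processes per GPU. It is a minimal sketch under stated assumptions, not the authors' method: the grid shape, the row-major ordering, the consecutive-rank packing, and the names `Placement`, `place_replicas`, and `nodes_needed` are all hypothetical. Setting `procs_per_gpu=1` corresponds to a simple one-process-per-GPU assignment.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where one replica process runs."""
    replica: tuple[int, int]  # (index along dimension 1, index along dimension 2)
    node: int                 # node index
    gpu: int                  # local GPU index on that node


def place_replicas(dim1: int, dim2: int, gpus_per_node: int,
                   procs_per_gpu: int) -> list[Placement]:
    """Pack a dim1 x dim2 replica grid onto GPUs.

    Assumption: consecutive ranks share a GPU, so procs_per_gpu replica
    processes run concurrently on each GPU (as MPS would allow).
    """
    if min(dim1, dim2, gpus_per_node, procs_per_gpu) < 1:
        raise ValueError("all arguments must be positive")
    placements = []
    for rank in range(dim1 * dim2):
        i, j = divmod(rank, dim2)             # row-major position in the 2D grid
        global_gpu = rank // procs_per_gpu    # which GPU (cluster-wide) hosts this rank
        node, gpu = divmod(global_gpu, gpus_per_node)
        placements.append(Placement((i, j), node, gpu))
    return placements


def nodes_needed(n_replicas: int, gpus_per_node: int, procs_per_gpu: int) -> int:
    """Number of nodes required to host every replica process."""
    return ceil(n_replicas / (gpus_per_node * procs_per_gpu))
```

For a hypothetical 8 x 4 grid (32 replicas) on nodes with 4 GPUs each, `nodes_needed(32, 4, 1)` returns 8, while `nodes_needed(32, 4, 4)` returns 2. Whether the packed layout actually finishes faster per replica depends on how well each process fills the GPU and on the sharing overhead, which is the trade-off the abstract describes measuring.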
