We'll present our experience using OpenACC to port GTC-P, a real-world plasma turbulence simulation, to the NVIDIA P100 GPU and the SW26010, China's home-grown many-core processor. We also implemented GTC-P with the native programming approach on the Sunway TaihuLight supercomputer, so that we can analyze the performance gap between OpenACC and the native approach on both the P100 GPU and the SW26010. Our experimental results show that, with the PGI compiler, the performance gap between OpenACC and CUDA on the P100 GPU is less than 10%. On the SW26010, however, the gap is more than 50%, because register-level communication (RLC), which is supported only by the native approach, avoids inefficient main-memory accesses. Our case study demonstrates that OpenACC delivers impressive performance portability on the P100 GPU, but that the absence of an RLC-based software cache in the OpenACC compiler for the SW26010 results in a large performance gap between OpenACC and the native approach.