Why does the exact same simulation run so much slower on an A100 GPU than on my local 2 GB MX250?

This document mentions that increasing the number of CUDA cores as well as the clock rate should increase the speed of the simulation.

The metric I am using to measure speed is the wall-clock time (in s) taken to simulate one second of simulation time. The issue I am facing is that the A100 takes significantly longer than my local desktop system, which has a GeForce MX250 with 2 GB of memory.

For comparison, simulating one second of physical time takes ~1150 s (~19 minutes) on my local system, while it currently takes ~7657 s (~127 minutes) on the A100 GPU on HPRC.
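
The slowdown implied by these two timings can be worked out directly. A minimal Python sketch (the function name `slowdown` is only illustrative, not part of DualSPHysics):

```python
def slowdown(local_seconds, remote_seconds):
    """Return how many times longer the remote run takes than the local run."""
    return remote_seconds / local_seconds

# Wall-clock seconds per simulated second, as reported in this post.
local_mx250 = 1150.0
hprc_a100 = 7657.0

factor = slowdown(local_mx250, hprc_a100)
print(f"A100 run is about {factor:.1f}x slower than the MX250 run")
```

So the A100 job takes between six and seven times as long per simulated second.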

I am running a basic 2D wave generation simulation along with Chrono and Mooring. The total number of particles is ~62k, and this is the GPU information pulled from the run.out file in both cases:

What could be the issue here? Why is the A100 taking so much more time when it has almost 20 times as many cores?

I have attached my Def file and both the run files of the partially completed jobs.

Also, this seems to happen only when I run Chrono; otherwise it is fine. So if you could help me figure it out, that would be great.

@Asalih3d @Alex

Comments

  • The simplest possible explanation would be a difference in CPU, since Chrono runs on the CPU only. If Chrono is the "heavy part" of your simulation, and for some reason your local machine's CPU is stronger than the CPU in the A100 machine, that would account for the slowdown.

    If you can rule this out, then I am not sure why it would be so much slower.

    Kind regards
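
The reasoning in the comment above can be sketched with a simple two-part runtime model. This is only an illustration under stated assumptions: the GPU (SPH) and CPU (Chrono) phases are assumed to run one after the other without overlap, and the 100 s / 900 s split is hypothetical, not measured from this case.

```python
def estimated_runtime(gpu_seconds, cpu_seconds, gpu_speedup=1.0, cpu_speedup=1.0):
    """Estimate total runtime when the GPU and CPU parts run back to back."""
    return gpu_seconds / gpu_speedup + cpu_seconds / cpu_speedup

# Hypothetical split of a 1000 s run: 100 s on the GPU, 900 s in CPU-only code.
baseline = estimated_runtime(100.0, 900.0)
# Same run with a GPU assumed to be 20x faster: the CPU part is unchanged.
faster_gpu = estimated_runtime(100.0, 900.0, gpu_speedup=20.0)
# Same 20x GPU, but paired with a CPU assumed to be half as fast.
slower_cpu = estimated_runtime(100.0, 900.0, gpu_speedup=20.0, cpu_speedup=0.5)

print(baseline, faster_gpu, slower_cpu)
```

Under these assumed numbers, a 20x faster GPU only reduces the total from 1000 s to 905 s, while a CPU running at half speed pushes it to 1805 s. When the CPU-only part dominates, the CPU sets the overall pace regardless of the GPU.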

  • Hi Asalih3d,

    Unfortunately, I believe the CPU is the likely cause. The CPU has a speed of 3 GHz with no possibility of overclocking. I just wanted to know whether I am indeed stuck with an unsolvable bottleneck, or whether there is any way to get around it.
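
Clock speed alone does not fully describe single-core performance, so one way to check the CPU hypothesis is to time the same fixed single-threaded workload on both machines. A rough sketch, assuming Python is available on both systems; the workload is an arbitrary stand-in and not representative of Chrono itself:

```python
import time

def time_single_thread_work(iterations=2_000_000):
    """Time a fixed floating-point loop on one core; return elapsed seconds."""
    start = time.perf_counter()
    acc = 0.0
    for i in range(iterations):
        acc += (i % 7) * 0.5
    return time.perf_counter() - start

def best_of(runs=5, iterations=2_000_000):
    """Repeat the timing and keep the fastest run to reduce noise."""
    return min(time_single_thread_work(iterations) for _ in range(runs))
```

Running `best_of()` on the local machine and on an HPRC compute node and comparing the two numbers gives a crude indication of which CPU is faster for serial work.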

  • I see. Perhaps it might help if you are able to use a version of DualSPHysics with multi-CPU Chrono? I have read that the newest version (on GitHub) only has single-CPU Chrono available, due to issues in the algorithm etc. @Alex might remember the post I am talking about.

    Otherwise, the only solution would be to play with the parameters in Chrono to reduce complexity, or to see if you can get around using it at all. Unfortunately, I do not think any GPU version of Chrono exists.

    Kind regards
