Extremely slow cudaMemcpy after my code modification
Hi, forums
I have made some modifications to Interactionforces. After that, the code feels very slow, so I used nvprof to check. The result is below (just 5 steps).
We can see that cudaMemcpy costs a lot, which is weird. I tried to debug it and found that, in each loop, the first cudaMemcpy executed after Interactionforce (in ReduMaxFloat, copying the max value of ViscDt from device to host, just one float) takes far too long. After that first copy, the following ones are normal.
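One thing that may be worth ruling out: CUDA kernel launches are asynchronous with respect to the host, while a plain cudaMemcpy blocks until the work queued before it in the same stream has finished. So the first copy after a heavy kernel can appear to absorb that kernel's run time. The sketch below is only a way to test this idea, not the solver's code: `BusyKernel` is a hypothetical stand-in for a slow kernel such as Interactionforces, and `dval` stands in for the single float read back in ReduMaxFloat. It times the same one-float copy with and without a cudaDeviceSynchronize() in front of it.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a slow kernel (not the real Interactionforces).
__global__ void BusyKernel(float *out, int iters) {
  float v = 0.f;
  for (int i = 0; i < iters; i++) v = v * 0.999f + 1.0f;  // dependent chain, stays bounded
  *out = v;
}

// Host wall-clock time in milliseconds.
static double NowMs() {
  using namespace std::chrono;
  return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}

int main() {
  float *dval = nullptr;  // single device float, like the reduced max value
  float hval = 0.f;
  if (cudaMalloc(&dval, sizeof(float)) != cudaSuccess) {
    printf("cudaMalloc failed\n");
    return 1;
  }

  // Warm-up so context creation is not included in the timings.
  BusyKernel<<<1, 1>>>(dval, 1);
  cudaDeviceSynchronize();

  // Case 1: no explicit synchronization before the copy.
  double t0 = NowMs();
  BusyKernel<<<1, 1>>>(dval, 1 << 24);  // launch returns before the kernel finishes
  double t1 = NowMs();
  cudaMemcpy(&hval, dval, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
  double t2 = NowMs();
  printf("no sync  : launch %.3f ms, memcpy %.3f ms\n", t1 - t0, t2 - t1);

  // Case 2: synchronize first, so the kernel time is measured on its own.
  t0 = NowMs();
  BusyKernel<<<1, 1>>>(dval, 1 << 24);
  cudaDeviceSynchronize();  // host waits here for the kernel
  t1 = NowMs();
  cudaMemcpy(&hval, dval, sizeof(float), cudaMemcpyDeviceToHost);
  t2 = NowMs();
  printf("with sync: kernel %.3f ms, memcpy %.3f ms\n", t1 - t0, t2 - t1);

  cudaFree(dval);
  return 0;
}
```

If the memcpy time collapses in the second case, the cost belongs to the kernel and not to the copy itself. In that situation the "GPU activities" part of the nvprof output, which lists kernel times separately from the API calls, would be the place to look.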
I also checked the original code, ver. 4.4.09 (two-phase dam break, one step).
It has the same problem, but there the cost is acceptable.
As far as I can tell, I only changed the computation inside Interactionforces and nothing else.
Thanks a lot for any advice.
Comments
Sorry, I have no experience with CUDA as such and probably will not be able to provide any useful feedback :-)
I just wonder why you have to copy the data if you already know where it is - but again I know next to nothing about this, so I will not speculate
Kind regards