Parallel post-processing

Of all the post-processing tools (partvtk, computeforces, isosurface, etc.), which ones can be executed in parallel?


For example, can we run partvtk in parallel so that it will process each of the part*bi4 files using one CPU core each (so if you have a system with n cores, it will be able to simultaneously process n files)?


I think that the Isosurface tool can run in parallel, but would it be better to process several files at once with fewer cores each, or to process one file at a time with more cores?


My current run has about 25 million particles. It is only about 25% done, and the simulation has already generated about 500 GB of data.


I am checking the run by post-processing the data that I have now, but with partvtk running on only a single CPU core, it takes my system almost 10 hours to extract the bound part, so I am interested in ways or strategies to speed up post-processing.


(And with the total volume of data expected to quadruple, I don't have a solid state hard drive large enough nor fast enough to be able to speed up post-processing in that regard, but I know that makes a huge difference.)


Thank you.

Comments

  • If you read the help of any post-processing tool, you can see that there is an option to choose the number of CPU cores used to execute that code.

    However, by default the binaries will use the maximum number of cores of your machine.

    Regards

  • So if partvtk doesn't use more than one core, despite the fact that my system has 8 cores, does that mean that it can't process multiple part&.bi4 files at the same time?


    The help file says that parallelization is used for interpolation.


    "-threads:<int>  Indicates the number of threads for parallel execution of the interpolation, it takes the number of cores of the device by default (or uses zero value)."


    Since partvtk4&win64.exe isn't interpolating, does that mean that it can't use multiple cores/multiple processors?


    This is where I am confused.


    (For example, I have almost one thousand part&.bi4 files. My computer has 8 cores. So my question is can it or is it possible to start multiple instances of partvtk4&win64.exe so that it will be able to convert those files faster automatically, or will I have to start the multiple instances of partvtk4&win64.exe manually, using separate command prompt windows to do the same thing?)


    (P.S. Due to the formatting syntax of the forum software, I had to change partvtk4_win64.exe to partvtk4&win64.exe and also same thing with the asterisk wildcard to denote that I have lots of part&.bi4 files.)


    Thank you.

  • They will process the output files one after the other, but the tasks will be parallelised.

    For example, PartVTK can also be used to compute "Pressure", "Ace" or "Vorticity" of the particles, and that is done in parallel using the cores of your CPU. Some more interpolation is carried out in MeasureTool, for example.

    Regards

  • But if I want to run for example:


    partvtk4_win64.exe -savevtk Boundpart.vtk -onlytype:-all,bound


    what I am seeing is that it will only use 1 core (out of 8 available) to execute that task.


    Maybe this might provide some background context in regards to my question (about whether or not this task can be parallelised or sped-up in any way).


    Thank you.

  • Perhaps @jmdalonso can explain what tasks are parallelised in PartVTK and the other post-processing tools

    Regards

  • From my experience, it would appear that the IsoSurface tool is parallelised.


    But computeforces and partvtk do not appear to be. So, I'm just running multiple single-threaded sessions of those (up to n sessions, one per core on my computer), and as long as I am using a storage device fast enough (e.g. NVMe SSD) to serve up the data as fast as those tools can process it, it works.

  • edited December 2020

    And, seconding this post after some time, it would be handy if the user could control the number of threads dedicated to a post-processing job. (Linux users will recognise this from the -j flag of Make.)

    For example: if I have 16 threads available, I might want to use only 8 of them, so that the application does not monopolize the machine and I have room to manage other tasks.

  • All post-processing tools are parallelised with OpenMP

    -threads:<int>  Indicates the number of threads for parallel execution of the interpolation, it takes the number of cores of the device by default

    Regards

  • @Alex Thanks. I had missed it, or it slipped out of my working memory, clearly to the point of thinking it was not there. I will use it.
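
The workaround described in the thread (several single-threaded instances running side by side, each converting its own share of the part files) can be automated rather than started by hand in separate command prompt windows. The following is only a minimal sketch, not an official DualSPHysics script. The flags -savevtk, -onlytype and -threads are quoted in the thread; the -first and -last flags used to restrict each instance to a range of part files, and the per-chunk output names, are assumptions that should be checked against the tool's own help output before use.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_ranges(first, last, workers):
    """Split the inclusive range [first, last] into at most `workers`
    contiguous chunks of nearly equal size."""
    total = last - first + 1
    if total <= 0 or workers <= 0:
        return []
    workers = min(workers, total)
    base, extra = divmod(total, workers)
    ranges = []
    start = first
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_command(exe, chunk, out_prefix):
    """Build one command line for one chunk of part files.
    ASSUMPTION: -first:/-last: select the range of part files; verify in the help.
    ASSUMPTION: the per-chunk output name is purely illustrative."""
    lo, hi = chunk
    return [
        exe,
        "-savevtk", f"{out_prefix}_{lo}_{hi}.vtk",
        "-onlytype:-all,bound",
        "-threads:1",      # one thread per instance; the instances provide the parallelism
        f"-first:{lo}",
        f"-last:{hi}",
    ]


def run_all(commands, workers):
    """Run the commands concurrently, at most `workers` at a time,
    and return their exit codes in order."""
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    workers = os.cpu_count() or 1
    # Example: roughly one thousand part files, numbered 0..999.
    chunks = split_ranges(0, 999, workers)
    commands = [build_command("partvtk4_win64.exe", c, "Boundpart") for c in chunks]
    for cmd in commands:
        print(" ".join(cmd))  # dry run: print only, nothing is executed
```

As written, the script only prints the command lines. Calling run_all(commands, workers) would actually launch them. Lowering workers below the core count leaves room for other tasks on the machine, and, as noted in the thread, the speed-up only materialises if the storage device can feed all instances at once.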
