Number of GPU Cores used

edited February 2016 in Old versions
Hi

Maybe a very stupid question, but how can I change the number of GPU cores used within DualSPHysics?
I use a GeForce GTX 960, which has 1024 CUDA cores according to the GPU driver, but DualSPHysics uses 8 cores. Running casedambreak without any changes takes roughly 1700 s on the CPU only and 1600 s with the GPU version. No big advantage, so I think I am doing something wrong.
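
To put a number on "no big advantage", the two timings quoted above can be turned into a speedup factor. This is only a minimal arithmetic sketch; the function name `speedup` is made up for illustration and is not part of DualSPHysics.

```python
def speedup(t_cpu, t_gpu):
    """Return how many times faster the GPU run is than the CPU run."""
    return t_cpu / t_gpu

# Timings quoted in the post, in seconds.
print(f"{speedup(1700, 1600):.2f}x")  # 1.06x
```

A factor this close to 1 means the GPU run is only about 6% faster than the CPU run.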

Comments

  • In the file Run.out you can check whether you are using the GPU or the CPU.
    DSPH automatically uses all the available cores of your GPU or CPU.

    When running a simulation with a low number of particles, the difference is not big. Only for large simulations does the speedup increase drastically when using the GPU.

    Regards
  • edited February 2016
    Thank you for the fast answer!
    Run.out says I'm using 8 cores. Is there a difference if I compile DSPH on my own, or does the precompiled version use the available number of cores, too?
  • There is no difference.

    In Run.out you can also read which device is being used as the execution device.
    Please paste the first part of Run.out here, everything before Part_0001 appears, and we will explain the information that appears there.

    Regards
  • DualSPHysics v3 (17-12-2013)
    =============================
    [Select CUDA Device]
    Device 0: "GeForce GTX 960"
    Compute capbility: 5.2
    Multiprocessors: 8 (-8 cores)
    Memory global: 2047 MB
    Clock rate: 1.18 GHz
    Run time limit on kernels: Yes
    ECC support enabled: No

    [GPU Hardware]
    Device default: 0 "GeForce GTX 960"
    Compute capbility: 5.2
    Memory global: 2047 MB
    Memory shared: 49152 Bytes
    [Initialising JSphGpuSingle v3.00 27-02-2016 08:21:22]
    **Case configuration is loaded
    Loading initial state of particles...
    Loaded particles: 600056 (600056+0)
    MapPos(border)=(-2.000031,0.007969,0.003969)-(1.996031,2.006031,0.994031)
    MapPos=(-2.000031,0.007969,0.003969)-(1.996031,2.006031,1.013832)
    **Initial state of particles is loaded
    **3D-Simulation parameters:
    CaseName="CaseHani"
    RunName="CaseHani"
    SvTimers=True
    StepAlgorithm="Verlet"
    VerletSteps=40
    Kernel="Cubic"
    Viscosity="Artificial"
    Visco=0.100000
    ShepardSteps=0
    DeltaSph="DBC"
    DeltaSphValue=0.100000
    CaseNp=600056
    CaseNbound=66252
    CaseNfixed=66252
    CaseNmoving=0
    CaseNfloat=0
    PeriodicActive=0
    Dx=0.018000
    H=0.031177
    CteB=433882.281250
    Gamma=7.000000
    Rhop0=1000.000000
    Eps=0.500000
    Cs0=55.110580
    CFLnumber=0.200000
    DtIni=0.000100
    DtMin=0.000010
    MassFluid=0.005832
    MassBound=0.005832
    CubicCte.a1=0.318310
    CubicCte.aa=336912.812500
    CubicCte.a24=2625.975586
    CubicCte.c1=-1010738.437500
    CubicCte.c2=-252684.609375
    CubicCte.d1=758053.812500
    CubicCte.od_wdeltap=0.000171
    TimeMax=5.000000
    TimePart=0.010000
    Gravity=(0.000000,0.000000,-9.810000)
    PartOutMax=533804
    RhopOut=False
    CellOrder="XYZ"
    **Requested gpu memory for 600056 particles: 66.4 MB.
    **CellDiv: Requested GPU memory for 600056 particles: 4.6 MB.
    **CellDiv: Requested gpu memory for 276705 cells (CellMode=H): 4.2 MB.
    CellMode="H"
    Hdiv=2
    MapCells=(129,65,33)

    PtxasFile="../../EXECS/DualSPHysics_linux64_ptxasinfo"
    Using code for compute capability 2.0 on hardware 3.0
    BsForcesBound=256 (35 regs)
    BsForcesFluid=128 (56 regs)

    RunMode="Single-Gpu, HostName:pc-stephan"
    Allocated memory in CPU: 20465904 (19.52 MB)
    Allocated memory in GPU: 78891776 (75.24 MB)
    Part_0000 600056 particles successfully stored
  • The GPU you have is not a good one for simulations. Note that your GPU has 8 cores, while common ones such as Teslas have thousands.

    Regards
  • edited February 2016
    Well... Nvidia 960 GTX. Just a home computer.
  • I have checked the specifications of your GPU again, and it has 1024 cores (not 8...).
    So it is not a bad one...

    Can you please run one of the test cases with an increased number of particles (decreasing dp) on both CPU and GPU to compare the results? Maybe your GPU becomes very efficient starting from 10^4 particles... not sure.

    Regards
  • Hi Stephan, I have the same issue with my 970. It is an issue with how the NVIDIA cores are checked internally. In my case I get:

    [Select CUDA Device]
    Device 0: "GeForce GTX 970"
    Compute capability: 5.2
    Multiprocessors: 13 (-13 cores)
    Memory global: 4096 MB
    Clock rate: 1.25 GHz
    Run time limit on kernels: Yes
    ECC support enabled: No


    This happens because both cards are "downgrades" of the 980.

    My card runs perfectly, uses the whole memory, and is quite fast.

    So you don't have to worry about this issue.

    Regards

    Anxo

    DualSPHysics Developer Team
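
The `Multiprocessors: 8` line in the pasted header counts streaming multiprocessors (SMs), not CUDA cores. On Maxwell cards (compute capability 5.x) each SM holds 128 CUDA cores, so 8 SMs give the 1024 cores quoted for the GTX 960 and 13 SMs give 1664 for the GTX 970. The negative value in `(-8 cores)` may simply mean that this 2013 release has no cores-per-SM entry for compute capability 5.2; that is an assumption, and nothing in the thread confirms it. The sketch below reads the device header and redoes the arithmetic. The helper names and the small lookup table are illustrative only and are not taken from the DualSPHysics source.

```python
import re

# CUDA cores per streaming multiprocessor (SM), keyed by compute capability.
# Illustrative and deliberately incomplete; values follow NVIDIA's published
# architecture figures.
CORES_PER_SM = {
    (2, 0): 32,   # Fermi
    (2, 1): 48,   # Fermi
    (3, 0): 192,  # Kepler
    (3, 5): 192,  # Kepler
    (5, 0): 128,  # Maxwell
    (5, 2): 128,  # Maxwell (GTX 960, GTX 970)
}

def parse_device_header(text):
    """Extract device name, compute capability and SM count from a Run.out header."""
    name = re.search(r'Device \d+: "([^"]+)"', text)
    # The pasted logs spell the label both "capbility" and "capability",
    # so the pattern accepts either.
    cap = re.search(r'Compute cap\w*: (\d+)\.(\d+)', text)
    sms = re.search(r'Multiprocessors: (\d+)', text)
    if not (name and cap and sms):
        raise ValueError("device header not found")
    capability = (int(cap.group(1)), int(cap.group(2)))
    return name.group(1), capability, int(sms.group(1))

def total_cuda_cores(capability, multiprocessors):
    """Return SM count times cores per SM, or None for an unknown architecture."""
    per_sm = CORES_PER_SM.get(capability)
    return None if per_sm is None else per_sm * multiprocessors

# First lines of the header pasted earlier in the thread.
header = '''[Select CUDA Device]
Device 0: "GeForce GTX 960"
Compute capbility: 5.2
Multiprocessors: 8 (-8 cores)
'''

name, cap, sms = parse_device_header(header)
print(name, total_cuda_cores(cap, sms))  # GeForce GTX 960 1024
print(total_cuda_cores((5, 2), 13))      # 1664 (GTX 970)
```

Returning `None` for an unknown architecture, rather than a negative placeholder, keeps a missing table entry from being mistaken for a real core count.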