131 Commits

Author SHA1 Message Date
Martin Bauernfeind
fe56c50efd Added one more output line to output the force kernel throughput 2022-07-20 22:43:57 +02:00
Martin Bauernfeind
7a61cbbabf Instrumented the reneighbor function in order to obtain runtimes of its components 2022-07-19 20:38:11 +02:00
Martin Bauernfeind
176de0525b Instrumented the reneighbor function with timers (via getTimestamp()) to measure the runtime of its different components/methods 2022-07-17 18:34:17 +02:00
Martin Bauernfeind
7bad7e84b6 Fixed compiler errors 2022-07-13 14:52:37 +02:00
Martin Bauernfeind
fb304f240b Small changes in buildNeighbor to initialize the bincount list and other arrays only once 2022-07-13 14:42:34 +02:00
Martin Bauernfeind
5a6d1851ed Ported updateAtomsPbc to cuda and changed the code to use the cuda version from now on 2022-07-13 14:07:19 +02:00
Martin Bauernfeind
f61f59ba3f Fixed a compiler error and removed an unnecessary memcpy (from device to host) - performance seems to have crossed the 300M updates/second mark for the A100 2022-07-11 00:55:42 +02:00
Martin Bauernfeind
d1c2249b55 Added code to sort the contents of all bins to make it comparable to the CPU version 2022-07-11 00:24:48 +02:00
Martin Bauernfeind
c9db6e45fa Fixed compiler errors 2022-07-10 21:13:37 +02:00
Martin Bauernfeind
0967e8f671 The program now does the binning on the GPU via the binatoms_cuda method 2022-07-10 18:05:06 +02:00
Martin Bauernfeind
fa409c016c Added a struct to contain binning information such as the pointer to bincount and bins - not used yet 2022-07-08 13:52:45 +02:00
Martin Bauernfeind
b65199308d Ported the binatoms method to cuda - not used in the program yet 2022-07-06 01:09:11 +02:00
Martin Bauernfeind
71798f5ec5 🐛 Fixed aforementioned correctness issue by deleting a superfluous cudaMemcpy in computeForce() that was overwriting correct data with incorrect data 2022-07-05 00:54:11 +02:00
Martin Bauernfeind
4f0403d3ea Fixed a correctness issue by conservatively copying over data from and to the GPU 2022-07-05 00:33:12 +02:00
Martin Bauernfeind
fa86e44f90 Fixed wrong number of thread blocks being launched 2022-07-04 19:36:09 +02:00
Martin Bauernfeind
7e8fd96fa4 Fixed some compiler errors - the simulation seems to be off regarding how many ghost atoms are used -> some bugfixing might be needed 2022-07-03 21:14:33 +02:00
Martin Bauernfeind
463de5b1ed Ported the updatePbc method to cuda 2022-07-03 19:53:33 +02:00
Martin Bauernfeind
4a32a62a98 🐛 Fixed some bugs - neighborhood computation now seems to be quite fast 2022-06-26 20:19:59 +02:00
Martin Bauernfeind
16e8b76012 Added debug output to find memory leak 2022-06-26 19:43:10 +02:00
Martin Bauernfeind
60ed524dd8 Fixed various compiler errors - now there's probably a memory leak remaining 2022-06-26 18:37:09 +02:00
Martin Bauernfeind
45f83c7607 Fixed some struct declaration mistakes 2022-06-26 17:52:09 +02:00
Martin Bauernfeind
c49278cb21 First crude attempt at parallelizing neighborhood computation (only the part after binning the atoms is parallelized with cuda) 2022-06-26 16:25:59 +02:00
Martin Bauernfeind
757d4329f3 Added a rough sketch for the next steps of porting neighborhood computation to cuda 2022-06-23 23:58:15 +02:00
Martin Bauernfeind
67f9c769ef Fixing errors - hopefully it works this time 2022-06-23 22:25:55 +02:00
Martin Bauernfeind
b5b4d23c0c 🐛 further refactoring fixing 2022-06-23 19:46:29 +02:00
Martin Bauernfeind
fea1e41daa 🐛 further refactoring step fixing 2022-06-23 19:43:36 +02:00
Martin Bauernfeind
f1998b7acc 🐛 further refactor step fixing 2022-06-23 19:39:36 +02:00
Martin Bauernfeind
2fe3cd80a0 🐛 further refactor step fixing 2022-06-23 19:36:59 +02:00
Martin Bauernfeind
f4313f64e5 ♻️ further refactoring step fixing 2022-06-23 19:34:16 +02:00
Martin Bauernfeind
7f068a6959 ♻️ Fixing refactoring step 2022-06-23 19:32:09 +02:00
Martin Bauernfeind
62cfc22856 ♻️ Refactoring: pulled definition of the GPU atom and neighbor representation from force.cu and put it into main 2022-06-23 18:54:56 +02:00
Maximilian Gaul
696e6da01d Implement Neighbour list AoS memory layout + performance measurement 2022-01-31 20:27:59 +01:00
Maximilian Gaul
b2a6574426 Remove unnecessary atom force backcopy in computeForce 2022-01-24 18:09:27 +01:00
Maximilian Gaul
c4080e866e Make integrate kernels aware of neighbour list update 2022-01-24 18:04:50 +01:00
Maximilian Gaul
dc4d5f1a9c Porting atom velocity memory layout to AoS, porting velocity integration to CUDA, adding measurements + logbook update 2022-01-01 18:18:12 +01:00
Maximilian Gaul
50007216ed Implemented atom force AoS memory layout, added performance measurements + logbook update 2022-01-01 16:09:21 +01:00
Maximilian Gaul
72e4599acc Copy neighbour lists only when reneighbouring happens, added measurements + logbook update 2022-01-01 12:56:42 +01:00
Maximilian Gaul
8fa03733e9 Copy parameters & cutforces threshold only once at the start + measurements 2021-12-28 16:48:26 +01:00
Maximilian Gaul
bf1ae3d013 Removed debug prints, only zero atom forces and not copy them, added measurements 2021-12-28 16:32:54 +01:00
Maximilian Gaul
8009b54113 Trying to debug segfault if cudaMemcpy is limited to neighbour list update 2021-12-25 15:36:08 +01:00
Maximilian Gaul
0ea0587442 Only malloc once at the beginning plus measurement csv 2021-12-25 13:52:33 +01:00
Maximilian Gaul
134e3f4b78 Also pinned neighbor-struct memory, added additional performance measurements, added nvprof result to logbook 2021-12-18 15:58:56 +01:00
Maximilian Gaul
c2bfa3ca3f Add scripts for perf measurement, made atom-memory allocation pinned using 'cudaMallocHost', added measurements for atom pinned memory 2021-12-18 13:02:04 +01:00
Maximilian Gaul
2a099da5b7 Started cuda profiling, added first result to logbook 2021-12-03 08:13:43 +01:00
Maximilian Gaul
7691b23d67 Measure memory transfer of CPU to GPU, add explanation how to distribute calculation among multiple GPUs 2021-12-01 17:16:32 +01:00
Maximilian Gaul
da90466f98 Added first performance measurements with threads per block from 1 to 32 2021-11-25 08:09:20 +01:00
Maximilian Gaul
8f723c1299 Added command line description of MD-Bench, added memory transfer rate from CPU to GPU to force.cu 2021-11-23 15:55:23 +01:00
Maximilian Gaul
0586ef150a Fix num of threads instead of num of blocks, add logbook template 2021-11-15 19:39:09 +01:00
Maximilian Gaul
2e5d973f7d Rough rewrite to execute outer loop of force calculation in parallel, not inner loop 2021-11-14 10:02:23 +01:00
Maximilian Gaul
e2fd1a0476 Fixed bug, results are now equal to master branch (but still slow) 2021-11-11 21:00:30 +01:00