Martin Bauernfeind
|
b32254b03f
|
Changed data types in currently unused sort method to also work with single-precision floating-point numbers
|
2022-07-22 13:55:27 +02:00 |
|
Martin Bauernfeind
|
4dac820784
|
Added newline in output to improve formatting
|
2022-07-20 23:04:22 +02:00 |
|
Martin Bauernfeind
|
fe56c50efd
|
Added one more output line to output the force kernel throughput
|
2022-07-20 22:43:57 +02:00 |
|
Martin Bauernfeind
|
7a61cbbabf
|
Instrumented the reneighbor function in order to obtain runtimes of its components
|
2022-07-19 20:38:11 +02:00 |
|
Martin Bauernfeind
|
176de0525b
|
Instrumented the reneighbor function with timers (via getTimestamp()) to measure the runtime of its different components/methods
|
2022-07-17 18:34:17 +02:00 |
|
Martin Bauernfeind
|
7bad7e84b6
|
Fixed compiler errors
|
2022-07-13 14:52:37 +02:00 |
|
Martin Bauernfeind
|
fb304f240b
|
Small changes in buildNeighbor to initialize the bincount list and other arrays only once
|
2022-07-13 14:42:34 +02:00 |
|
Martin Bauernfeind
|
5a6d1851ed
|
Ported updateAtomsPbc to cuda and changed the code to use the cuda version from now on
|
2022-07-13 14:07:19 +02:00 |
|
Martin Bauernfeind
|
f61f59ba3f
|
Fixed a compiler error and removed an unnecessary memcpy (from device to host) - performance seems to have crossed the 300M updates/second mark for the A100
|
2022-07-11 00:55:42 +02:00 |
|
Martin Bauernfeind
|
d1c2249b55
|
Added code to sort the contents of all bins to make it comparable to the CPU version
|
2022-07-11 00:24:48 +02:00 |
|
Martin Bauernfeind
|
c9db6e45fa
|
Fixed compiler errors
|
2022-07-10 21:13:37 +02:00 |
|
Martin Bauernfeind
|
0967e8f671
|
The program now does the binning on the GPU via the binatoms_cuda method
|
2022-07-10 18:05:06 +02:00 |
|
Martin Bauernfeind
|
fa409c016c
|
Added a struct to contain binning information such as the pointer to bincount and bins - not used yet
|
2022-07-08 13:52:45 +02:00 |
|
Martin Bauernfeind
|
b65199308d
|
Ported the binatoms method to cuda - not used in the program yet
|
2022-07-06 01:09:11 +02:00 |
|
Martin Bauernfeind
|
71798f5ec5
|
🐛 Fixed aforementioned correctness issue by deleting a superfluous cudaMemcpy in computeForce() that was overwriting correct data with incorrect data
|
2022-07-05 00:54:11 +02:00 |
|
Martin Bauernfeind
|
4f0403d3ea
|
Fixed a correctness issue by conservatively copying over data from and to the GPU
|
2022-07-05 00:33:12 +02:00 |
|
Martin Bauernfeind
|
fa86e44f90
|
Fixed wrong number of thread blocks being launched
|
2022-07-04 19:36:09 +02:00 |
|
Martin Bauernfeind
|
7e8fd96fa4
|
Fixed some compiler errors - the simulation seems to be off regarding how many ghost atoms are used -> some bugfixing might be needed
|
2022-07-03 21:14:33 +02:00 |
|
Martin Bauernfeind
|
463de5b1ed
|
Ported the updatePbc method to cuda
|
2022-07-03 19:53:33 +02:00 |
|
Martin Bauernfeind
|
4a32a62a98
|
🐛 Fixed some bugs - ✨ neighborhood computation now seems to be quite fast
|
2022-06-26 20:19:59 +02:00 |
|
Martin Bauernfeind
|
16e8b76012
|
Added debug output to find memory leak
|
2022-06-26 19:43:10 +02:00 |
|
Martin Bauernfeind
|
60ed524dd8
|
Fixed various compiler errors - now there's probably a memory leak remaining
|
2022-06-26 18:37:09 +02:00 |
|
Martin Bauernfeind
|
45f83c7607
|
Fixed some struct declaration mistakes
|
2022-06-26 17:52:09 +02:00 |
|
Martin Bauernfeind
|
c49278cb21
|
First crude attempt at parallelizing neighborhood computation (only the part after binning the atoms is parallelized with cuda)
|
2022-06-26 16:25:59 +02:00 |
|
Martin Bauernfeind
|
757d4329f3
|
Added a rough sketch for the next steps of porting neighborhood computation to cuda
|
2022-06-23 23:58:15 +02:00 |
|
Martin Bauernfeind
|
67f9c769ef
|
Fixing errors - hopefully it works this time
|
2022-06-23 22:25:55 +02:00 |
|
Martin Bauernfeind
|
b5b4d23c0c
|
🐛 further refactoring fixing
|
2022-06-23 19:46:29 +02:00 |
|
Martin Bauernfeind
|
fea1e41daa
|
🐛 further refactoring step fixing
|
2022-06-23 19:43:36 +02:00 |
|
Martin Bauernfeind
|
f1998b7acc
|
🐛 further refactor step fixing
|
2022-06-23 19:39:36 +02:00 |
|
Martin Bauernfeind
|
2fe3cd80a0
|
🐛 further refactor step fixing
|
2022-06-23 19:36:59 +02:00 |
|
Martin Bauernfeind
|
f4313f64e5
|
♻️ further refactoring step fixing
|
2022-06-23 19:34:16 +02:00 |
|
Martin Bauernfeind
|
7f068a6959
|
♻️ Fixing refactoring step
|
2022-06-23 19:32:09 +02:00 |
|
Martin Bauernfeind
|
62cfc22856
|
♻️ Refactoring: pulled definition of the GPU atom and neighbor representation from force.cu and put it into main
|
2022-06-23 18:54:56 +02:00 |
|
Maximilian Gaul
|
b024adaf5b
|
Re-measure for 2000 time steps
|
2022-02-05 14:13:36 +01:00 |
|
Maximilian Gaul
|
696e6da01d
|
Implement Neighbour list AoS memory layout + performance measurement
|
2022-01-31 20:27:59 +01:00 |
|
Maximilian Gaul
|
b2a6574426
|
Remove unnecessary atom force backcopy in computeForce
|
2022-01-24 18:09:27 +01:00 |
|
Maximilian Gaul
|
c4080e866e
|
Make integrate kernels aware of neighbour list update
|
2022-01-24 18:04:50 +01:00 |
|
Maximilian Gaul
|
7b592b5fc7
|
Moved presentation resources to second presentation
|
2022-01-05 12:48:37 +01:00 |
|
Maximilian Gaul
|
4690542db5
|
Added CPU metrics {Cache, FLOPS, L2, L3}, restructured resource folders
|
2022-01-05 12:31:47 +01:00 |
|
Maximilian Gaul
|
8c131a7699
|
Reminder for likwid perf measurements
|
2022-01-04 13:51:53 +01:00 |
|
Maximilian Gaul
|
dc4d5f1a9c
|
Porting atom velocity memory layout to AoS, porting velocity integration to CUDA, adding measurements + logbook update
|
2022-01-01 18:18:12 +01:00 |
|
Maximilian Gaul
|
50007216ed
|
Implemented atom force AoS memory layout, added performance measurements + logbook update
|
2022-01-01 16:09:21 +01:00 |
|
Maximilian Gaul
|
72e4599acc
|
Copy neighbour lists only when reneighbouring happens, added measurements + logbook update
|
2022-01-01 12:56:42 +01:00 |
|
Maximilian Gaul
|
8fa03733e9
|
Copy parameters & cutforces threshold only once at the start + measurements
|
2021-12-28 16:48:26 +01:00 |
|
Maximilian Gaul
|
bf1ae3d013
|
Removed debug prints, only zero atom forces and not copy them, added measurements
|
2021-12-28 16:32:54 +01:00 |
|
Maximilian Gaul
|
8009b54113
|
Trying to debug segfault if cudaMemcpy is limited to neighbour list update
|
2021-12-25 15:36:08 +01:00 |
|
Maximilian Gaul
|
0ea0587442
|
Only malloc once at the beginning plus measurement csv
|
2021-12-25 13:52:33 +01:00 |
|
Maximilian Gaul
|
134e3f4b78
|
Also pinned neighbor-struct memory, added additional performance measurements, added nvprof result to logbook
|
2021-12-18 15:58:56 +01:00 |
|
Maximilian Gaul
|
c2bfa3ca3f
|
Add scripts for perf measurement, made atom-memory allocation pinned using 'cudaMallocHost', added measurements for atom pinned memory
|
2021-12-18 13:02:04 +01:00 |
|
Maximilian Gaul
|
2a099da5b7
|
Started cuda profiling, added first result to logbook
|
2021-12-03 08:13:43 +01:00 |
|