Commit Graph

  • 0e742766b7 Add working version of Simd4xn kernel with half neighbor lists Rafael Ravedutti 2022-03-23 15:54:18 +0100
  • e72323ab6a Fix Simd2xnn Kernel with half neighbor lists and add AVX512 intrinsics with double Rafael Ravedutti 2022-03-23 15:21:07 +0100
  • 94521f03b3 Fix reference version with half neighbor lists Rafael Ravedutti 2022-03-23 14:31:47 +0100
  • 8709bc2a06 Add first version for half neighbor lists in GROMACS variant Rafael Ravedutti 2022-03-22 23:47:05 +0100
  • 2a555a7deb Add simd reduction pragma to vectorize innermost loop on half-neighbor variant Rafael Ravedutti 2022-03-21 17:02:09 +0100
  • 719330807b Change data layout for force arrays according to position Rafael Ravedutti 2022-03-18 01:40:51 +0100
  • e7737e9151 Refactor half neighbor lists code Rafael Ravedutti 2022-03-18 01:28:11 +0100
  • 5df544637f Fix force calculation time in LAMMPS variant Rafael Ravedutti 2022-03-17 02:53:58 +0100
  • 887f41871c Add parameter reading for LAMMPS variant Rafael Ravedutti 2022-03-17 02:44:34 +0100
  • d4b34e1fa4 Fix intrinsics for AVX2 Rafael Ravedutti 2022-03-17 00:35:21 +0100
  • 4090f43095 Optimize partial forces reduction for compute_4xn kernel Rafael Ravedutti 2022-03-16 17:54:52 +0100
  • f3263a2d48 Separate simd file into multiple files Rafael Ravedutti 2022-03-16 14:52:55 +0100
  • 459853dc25 Merge pull request #4 from RRZE-HPC/gromacs_sp rafaelravedutti 2022-03-15 20:31:42 +0100
  • d47173d7a2 Fix Simd2xNN kernel Rafael Ravedutti 2022-03-15 19:59:10 +0100
  • d61576699d Add first compilable version of Gromacs with SP Rafael Ravedutti 2022-03-15 02:40:56 +0100
  • 8669f2f6d7 Fix LJ Simd4xN kernel Rafael Ravedutti 2022-03-11 01:12:59 +0100
  • d79c3c2a1d Add first working version with 4x8 config (ref kernel) Rafael Ravedutti 2022-03-10 22:33:41 +0100
  • c2fcd50773 Initial version of lammps halfneighbor list Jan Eitzinger 2022-03-10 17:06:45 +0100
  • ba3a0524f6 Merge branch 'master' of github.com:RRZE-HPC/MD-Bench Jan Eitzinger 2022-03-10 16:30:40 +0100
  • 6203cb12b6 Start to introduce halfneigh version Jan Eitzinger 2022-03-10 16:30:37 +0100
  • 22d0f0b958 Commit version that works for M=N Rafael Ravedutti 2022-03-10 01:31:50 +0100
  • 2b441e691e Make code compilable Rafael Ravedutti 2022-03-09 17:23:49 +0100
  • c7360305c8 Add first draft version of GROMACS method separating i-clusters and j-clusters Rafael Ravedutti 2022-03-09 02:25:39 +0100
  • cecb31d6a9 Update params for argon_1000 test case Rafael Ravedutti 2022-03-07 14:49:38 +0100
  • ba6785a865 Allow parameter reading from files and update data Rafael Ravedutti 2022-03-05 03:21:52 +0100
  • aae29a5b5a Add code to read GRO files Rafael Ravedutti 2022-03-03 20:03:33 +0100
  • af92800c64 Add SIMD version with AVX (no AVX2) and XTC output Rafael Ravedutti 2022-03-02 23:12:04 +0100
  • 022aa75c75 Add cutoff radius and skin as parameters of simulation Rafael Ravedutti 2022-02-28 22:34:42 +0100
  • 1389f89fb7 Add pruning kernel Rafael Ravedutti 2022-02-28 17:20:39 +0100
  • c62e4ea4ad Add clusters efficiency on stats Rafael Ravedutti 2022-02-28 16:10:09 +0100
  • ed2929c813 Add percentage of atoms within cutoff radius when using LAMMPS reference version Rafael Ravedutti 2022-02-25 14:40:33 +0100
  • e637a26844 Add percentage of atoms within cutoff radius when using GROMACS reference version Rafael Ravedutti 2022-02-25 14:19:48 +0100
  • fdd18df816 Fix argon simulation Rafael Ravedutti 2022-02-24 16:42:58 +0100
  • 1a708f2d3b Add PDB reading functions to lammps variant Rafael Ravedutti 2022-02-24 15:17:51 +0100
  • d0ec9520f2 Write function to read PDB files and include data for Argon simulation Rafael Ravedutti 2022-02-24 02:36:17 +0100
  • ca7775a62a Add average atoms per cluster on stats Rafael Ravedutti 2022-02-09 17:50:54 +0100
  • 769bab0faa Separate local and ghost cluster edges on VTK output Rafael Ravedutti 2022-02-08 16:12:22 +0100
  • 6a35a7a482 Update stats for cluster version Rafael Ravedutti 2022-02-08 00:55:27 +0100
  • 8deee3d954 Add cluster edges in VTK output Rafael Ravedutti 2022-02-08 00:11:10 +0100
  • cd15911a97 When building neighbor lists, skip first iterations until z is in range Rafael Ravedutti 2022-02-07 18:28:53 +0100
  • 0eacb2453e Inline getBoundingBoxDistanceSq and avoid redundant loads from bbminz and bbmaxz Rafael Ravedutti 2022-02-07 18:00:21 +0100
  • b024adaf5b Re-measure for 2000 time steps mucosim_cuda Maximilian Gaul 2022-02-04 17:40:12 +0100
  • cdb1d5b9f1 Add version with AVX2 intrinsics for gromacs scheme Rafael Ravedutti 2022-02-04 17:52:48 +0100
  • 34ce407f18 Update stats for gromacs scheme Rafael Ravedutti 2022-02-04 14:47:37 +0100
  • 6e6a3f6502 Use aligned loads when gathering j atoms Rafael Ravedutti 2022-02-04 14:29:32 +0100
  • 7b90800a2b Setting forces to zero before calculation is not required Rafael Ravedutti 2022-02-04 14:05:04 +0100
  • 9daf9e5f4d Fix exclusion masks and add SIMD debug tools Rafael Ravedutti 2022-02-02 21:54:18 +0100
  • 4c5f013bf4 Assign masked adds results to forces Rafael Ravedutti 2022-02-02 18:07:56 +0100
  • 6ad1e58a3e Add first kernel using SIMD intrinsics for 4xn cases Rafael Ravedutti 2022-02-02 18:00:44 +0100
  • 5fd2d422ee Adjust kernels to work with MxN loops Rafael Ravedutti 2022-02-02 00:49:55 +0100
  • 85e7954932 Check all clusters in cell when building neighbor lists because ghost clusters may not be sorted Rafael Ravedutti 2022-02-01 20:16:04 +0100
  • 4a5216a177 Remove bb z-check on while loop when building neighbor lists Rafael Ravedutti 2022-02-01 00:46:12 +0100
  • e64c3345bc Fix a few more bugs on gromacs variant Rafael Ravedutti 2022-01-31 23:46:20 +0100
  • 696e6da01d Implement Neighbour list AoS memory layout + performance measurement Maximilian Gaul 2022-01-31 20:27:59 +0100
  • e0e6b6a68c Perform a few fixes for gromacs variant Rafael Ravedutti 2022-01-31 17:49:22 +0100
  • 6691803910 Add first version of force calculation with cluster scheme Rafael Ravedutti 2022-01-28 18:07:41 +0100
  • eedcc97e4a Remove segfaults Rafael Ravedutti 2022-01-28 15:18:54 +0100
  • a119fcdfdd Fix some segfaults and add function to update single atoms Rafael Ravedutti 2022-01-27 03:07:31 +0100
  • aa0f4048d0 Rename default directory to lammps and reorganize gromacs variant steps Rafael Ravedutti 2022-01-25 21:00:11 +0100
  • cbe42b8149 Fix errors to make gromacs approach compilable so far Rafael Ravedutti 2022-01-25 12:19:28 +0100
  • 6291709ae7 Add first draft code with GROMACS approach Rafael Ravedutti 2022-01-25 00:43:10 +0100
  • b2a6574426 Remove unnecessary atom force backcopy in computeForce Maximilian Gaul 2022-01-24 18:09:27 +0100
  • c4080e866e Make integrate kernels aware of neighbour list update Maximilian Gaul 2022-01-24 18:04:50 +0100
  • 72730bc27b Update Makefile and config.mk Rafael Ravedutti 2022-01-17 14:16:39 +0100
  • df09c2861e Add first version with more than one optimization scheme Rafael Ravedutti 2022-01-17 14:15:02 +0100
  • 489e7ee9d3 Update .gitignore Rafael Ravedutti 2022-01-17 11:46:57 +0100
  • 165335cea0 Update compilation flags for all available compilers Rafael Ravedutti 2022-01-17 11:40:44 +0100
  • 7b592b5fc7 Moved presentation resources to second presentation Maximilian Gaul 2022-01-05 12:48:37 +0100
  • 4690542db5 Added CPU metrics {Cache, FLOPS, L2, L3}, restructured resource folders Maximilian Gaul 2022-01-05 12:31:47 +0100
  • 8c131a7699 Reminder for likwid perf measurements Maximilian Gaul 2022-01-04 13:51:53 +0100
  • dc4d5f1a9c Porting atom velocity memory layout to AoS, porting velocity integration to CUDA, adding measurements + logbook update Maximilian Gaul 2022-01-01 18:18:12 +0100
  • 50007216ed Implemented atom force AoS memory layout, added performance measurements + logbook Update Maximilian Gaul 2022-01-01 16:09:21 +0100
  • 72e4599acc Copy neighbour lists only when reneighbouring happens, added measurements + logbook update Maximilian Gaul 2022-01-01 12:56:42 +0100
  • 8fa03733e9 Copy parameters & cutforces threshold only once at the start + measurements Maximilian Gaul 2021-12-28 16:48:26 +0100
  • bf1ae3d013 Removed debug prints, only zero atom forces and not copy them, added measurements Maximilian Gaul 2021-12-28 16:32:54 +0100
  • 8009b54113 Trying to debug segfault if cudaMemcpy is limited to neighbour list update Maximilian Gaul 2021-12-25 15:36:08 +0100
  • 0ea0587442 Only malloc once at the beginning plus measurement csv Maximilian Gaul 2021-12-25 13:52:33 +0100
  • 134e3f4b78 Also pinned neighbor-struct memory, added additional performance measurements, added nvprof result to logbook Maximilian Gaul 2021-12-18 15:58:56 +0100
  • c2bfa3ca3f Add scripts for perf measurement, made atom-memory allocation pinned using 'cudaMallocHost', added measurements for atom pinned memory Maximilian Gaul 2021-12-18 13:02:04 +0100
  • 2a099da5b7 Started cuda profiling, added first result to logbook Maximilian Gaul 2021-12-03 08:13:43 +0100
  • 7691b23d67 Measure memory transfer of CPU to GPU, add explanation how to distribute calculation among multiple GPUs Maximilian Gaul 2021-12-01 17:16:32 +0100
  • 35c110155e Separate tracing from force computation and fix stubbed version Rafael Ravedutti 2021-12-01 00:07:45 +0100
  • bb21a885a1 Add new setups for Copper melting with LJ and EAM Rafael Ravedutti 2021-11-30 01:33:55 +0100
  • da90466f98 Added first performance measurements with threads per block from 1 to 32 Maximilian Gaul 2021-11-25 08:09:20 +0100
  • 8f723c1299 Added command line description of MD-Bench, added memory transfer rate from CPU to GPU to force.cu Maximilian Gaul 2021-11-23 15:55:23 +0100
  • 0586ef150a Fix num of threads instead of num of blocks, add logbook template Maximilian Gaul 2021-11-15 19:39:09 +0100
  • 2e5d973f7d Rough rewrite to execute outer loop of force calculation in parallel, not inner loop Maximilian Gaul 2021-11-14 10:02:23 +0100
  • e2fd1a0476 Fixed bug, results are now equal to master branch (but still slow) Maximilian Gaul 2021-11-11 21:00:30 +0100
  • 4105c844c6 Runs fine (but slow), results seem to be slightly off from original Maximilian Gaul 2021-11-11 20:47:06 +0100
  • 1f5c9c4b23 Fixed segfault error, added more cudaErrorChecks, added cudaFree to avoid memory leak Maximilian Gaul 2021-11-11 20:29:14 +0100
  • 29e115464b Fixed cudaMemcpy for AOS data layout, added debug outputs, added cudaErrorChecks Maximilian Gaul 2021-11-11 20:14:30 +0100
  • 1a54314c8b First run but segfault at the moment after a few seconds Maximilian Gaul 2021-11-11 15:23:46 +0100
  • 280f595b7f Fixed linker error by putting includes and cuda function in extern 'C' Maximilian Gaul 2021-11-11 14:49:29 +0100
  • 3428974730 getTimeStamp() couldn't get linked Maximilian Gaul 2021-11-11 08:03:56 +0100
  • b54842f764 Added Makefile instructions for .cu files Maximilian Gaul 2021-11-11 07:27:12 +0100
  • 9730164e6f Rename force.c to force.cu because of cuda build errors Maximilian Gaul 2021-11-10 16:20:04 +0100
  • 0f5fdd3708 Sum results after cuda function executed Maximilian Gaul 2021-11-10 16:02:05 +0100
  • f7010113bf Include commented timestamping on asm Rafael Ravedutti 2021-11-10 14:39:44 +0100
  • 841dfb9490 Fix data types for rdr and rdrho Rafael Ravedutti 2021-11-09 20:36:23 +0100
  • 3f7fb7f22a cudaMemcpy of Atom and other properties, first draft implementation of CUDA kernel Maximilian Gaul 2021-11-09 16:40:25 +0100