Rafael Ravedutti b20e8c6986 Adjust script for GROMACS scheme
Signed-off-by: Rafael Ravedutti <rafaelravedutti@gmail.com>
2022-12-14 17:54:18 +01:00
asm Add first version with more than one optimization scheme 2022-01-17 14:15:02 +01:00
common Write debug_printf to avoid warnings and fix latency/cfd script 2022-12-14 16:17:28 +01:00
data Fix DEM setup 2022-07-19 04:13:06 +02:00
figures Update gather_bench figure 2022-09-29 17:44:35 +02:00
gather-bench@2f654cb043 Add gather-bench as submodule 2022-09-29 14:55:11 +02:00
gromacs Explicitly set half_neigh to zero on stubbed versions 2022-12-14 17:21:09 +01:00
lammps Explicitly set half_neigh to zero on stubbed versions 2022-12-14 17:21:09 +01:00
util Adjust script for GROMACS scheme 2022-12-14 17:54:18 +01:00
.gitignore Update build options for each compiler and include ICX 2022-12-13 01:06:59 +01:00
.gitmodules Add gather-bench as submodule 2022-09-29 14:55:11 +02:00
config.mk Update build options for each compiler and include ICX 2022-12-13 01:06:59 +01:00
include_CLANG.mk Update build options for each compiler and include ICX 2022-12-13 01:06:59 +01:00
include_GCC.mk Update build options for each compiler and include ICX 2022-12-13 01:06:59 +01:00
include_GROMACS.mk Add SIMD version with AVX (no AVX2) and XTC output 2022-03-02 23:12:04 +01:00
include_ICC.mk Update build options for each compiler and include ICX 2022-12-13 01:06:59 +01:00
include_ICX.mk Add script to automate latency and CFD evaluation 2022-12-13 15:35:41 +01:00
include_ISA.mk Add AVX_FMA ISA 2022-11-15 01:24:30 +01:00
include_LIKWID.mk Add LIKWID Option. Allow to overwrite with asm variant. 2021-06-11 09:38:34 +02:00
include_NVCC.mk Small fixes into GROMACS GPU code 2022-11-14 18:21:14 +01:00
include_ONEAPI.mk Add ONEAPI config. Remove omp simd for full neigh. 2022-04-01 15:57:54 +02:00
LICENSE
Makefile Add AVX_FMA ISA 2022-11-15 01:24:30 +01:00
README.md Remove AVX512 reciprocal usage in AVX2 file 2022-11-15 01:40:37 +01:00

MD-Bench


MD-Bench is a toolbox for the performance engineering of short-range force calculation kernels in molecular dynamics applications. It aims to cover the state-of-the-art algorithms used in community codes such as LAMMPS and GROMACS.

MD-Bench also offers tools to study and evaluate the performance of these kernels in depth on different hardware. Two examples are gather-bench, a standalone benchmark that mimics the data movement of MD kernels, and the stubbed force calculation cases, which isolate the contributions of memory latency and control flow divergence to overall performance.

[Figures: Verlet Lists | GROMACS MxN | Stubbed cases]

Build instructions

Configure the build by editing the config.mk file. The following options are available:

  • TAG: Compiler tag (available options: GCC, CLANG, ICC, ONEAPI, NVCC).
  • ISA: Instruction set (available options: SSE, AVX, AVX_FMA, AVX2, AVX512).
  • MASK_REGISTERS: Use AVX512 mask registers (always true when ISA is set to AVX512).
  • OPT_SCHEME: Optimization algorithm (available options: lammps, gromacs).
  • ENABLE_LIKWID: Enable likwid to make use of HPM counters.
  • DATA_TYPE: Floating-point precision (available options: SP, DP).
  • DATA_LAYOUT: Data layout for atom vector properties (available options: AOS, SOA).
  • ASM_SYNTAX: Assembly syntax to use when generating assembly files (available options: ATT, INTEL).
  • DEBUG: Toggle debug mode.
  • EXPLICIT_TYPES: Explicitly store and load atom types.
  • MEM_TRACER: Trace memory addresses for cache simulator.
  • INDEX_TRACER: Trace indexes and distances for gather-md.
  • COMPUTE_STATS: Compute statistics.

Configurations for LAMMPS Verlet Lists optimization scheme:

  • ENABLE_OMP_SIMD: Use omp simd pragma on half neighbor-lists kernels.
  • USE_SIMD_KERNEL: Compile kernel with explicit SIMD intrinsics.

Configurations for GROMACS MxN optimization scheme:

  • USE_REFERENCE_VERSION: Use reference version (only for correctness checks).
  • XTC_OUTPUT: Enable XTC output.
  • HALF_NEIGHBOR_LISTS_CHECK_CJ: Check if j-clusters are local when decreasing the reaction force.

Configurations for CUDA:

  • USE_CUDA_HOST_MEMORY: Use CUDA host memory to optimize host-device transfers.

When done, run make to compile the code. make clean removes intermediate build results, and make distclean removes all build results. Always run make clean before make after changing the build settings.
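The clean-then-build sequence described above can be wrapped in a small helper. This is only a sketch: the function name `rebuild` is hypothetical, and it assumes you run it from the repository root, where Makefile and config.mk live.

```shell
# Sketch: rebuild MD-Bench from a clean state after editing config.mk.
# Assumption: the current directory is the repository root.
# "rebuild" is a hypothetical helper name, not part of MD-Bench.
rebuild() {
    # Remove intermediate results first so stale objects built with the
    # previous settings are not reused, then build again.
    # Extra arguments are forwarded unchanged to the final make call.
    make clean && make "$@"
}
```

Call `rebuild` after each change to config.mk, or `rebuild -j4` to pass a parallel job count to make.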

Usage

Use the following command to run a simulation:

./MD-Bench-<TAG>-<OPT_SCHEME> [OPTION]...

Here, TAG and OPT_SCHEME correspond to the build options of the same name. Without any options, the program simulates a copper FCC lattice of 32x32x32 unit cells (131072 atoms) for 200 time steps using the Lennard-Jones potential (sigma=1.0, epsilon=1.0).
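As an illustration of the naming pattern and of the default system size, the sketch below assembles the binary name and recomputes the atom count. The values TAG=GCC and OPT_SCHEME=lammps are example choices, and the count assumes that 32x32x32 refers to FCC unit cells with 4 atoms each, which matches the 131072 atoms stated above.

```shell
# Hypothetical build selection; use the TAG and OPT_SCHEME from your config.mk.
TAG=GCC
OPT_SCHEME=lammps
BIN="./MD-Bench-${TAG}-${OPT_SCHEME}"

# Default system, assuming 32x32x32 counts FCC unit cells.
# An FCC unit cell contains 4 atoms.
NX=32
NY=32
NZ=32
ATOMS=$((4 * NX * NY * NZ))

echo "binary: ${BIN}"
echo "atoms:  ${ATOMS}"
```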

The default behavior and other options can be changed using the following parameters:

-p <string>:          file to read parameters from (can be specified more than once)
-f <string>:          force field (lj or eam), default lj
-i <string>:          input file with atom positions (dump)
-e <string>:          input file for EAM
-n / --nsteps <int>:  set number of timesteps for simulation
-nx/-ny/-nz <int>:    set linear dimension of systembox in x/y/z direction
-r / --radius <real>: set cutoff radius
-s / --skin <real>:   set skin (verlet buffer)
--freq <real>:        processor frequency (GHz)
--vtk <string>:       VTK file for visualization
--xtc <string>:       XTC file for visualization
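A possible invocation that combines several of these parameters is sketched below. The binary name assumes a build with TAG=GCC and OPT_SCHEME=lammps, and the numeric values and the output file name are illustrative choices, not recommended settings. The script only prints the command if the binary is not present.

```shell
# Assumed binary name (TAG=GCC, OPT_SCHEME=lammps); adjust to your build.
BIN="./MD-Bench-GCC-lammps"

# Illustrative values: Lennard-Jones force field, 500 time steps,
# a 16x16x16 system box, cutoff radius 2.5, skin 0.3, VTK output.
CMD="$BIN -f lj -n 500 -nx 16 -ny 16 -nz 16 -r 2.5 -s 0.3 --vtk out.vtk"

if [ -x "$BIN" ]; then
    # Word splitting is intended here: CMD holds the program and its arguments.
    $CMD
else
    echo "would run: $CMD"
fi
```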

Examples

TBD

Citations

Rafael Ravedutti Lucio Machado, Jan Eitzinger, Harald Köstler, and Gerhard Wellein: MD-Bench: A generic proxy-app toolbox for state-of-the-art molecular dynamics algorithms. Accepted for PPAM 2022, the 14th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 11-14, 2022. PPAM 2022 Best Paper Award. Preprint: arXiv:2207.13094

Credits

MD-Bench is developed by the Erlangen National High Performance Computing Center (NHR@FAU) at the University of Erlangen-Nürnberg.

License

LGPL-3.0