
Rafael Ravedutti 1ad981a059 Add static analysis for gromacs-avx2-dp on Zen3

Signed-off-by: Rafael Ravedutti <rafaelravedutti@gmail.com>

2023-04-09 00:07:04 +02:00

asm

Small fixes

2023-02-12 01:44:48 +01:00

common

Fix GROMACS AVX2 code

2023-04-07 21:54:07 +02:00

data

Increase cutoff for Argon case

2023-04-03 15:06:32 +02:00

figures

Update gather_bench figure

2022-09-29 17:44:35 +02:00

gather-bench @ 2f654cb043

Add gather-bench as submodule

2022-09-29 14:55:11 +02:00

gromacs

Fix GROMACS AVX2 code

2023-04-07 21:54:07 +02:00

lammps

Fix stubbed versions and debug messages

2023-03-30 03:49:57 +02:00

likwid-outputs

added static analysis and likwid files

2023-02-09 17:33:22 +01:00

static_analysis

Add static analysis for gromacs-avx2-dp on Zen3

2023-04-09 00:07:04 +02:00

util

Update scripts with division factor

2023-04-05 23:56:35 +02:00

.gitignore

Change ICX flags based on ISA

2023-01-16 23:13:40 +01:00

.gitmodules

Add gather-bench as submodule

2022-09-29 14:55:11 +02:00

config.mk

Update build options for each compiler and include ICX

2022-12-13 01:06:59 +01:00

include_CLANG.mk

Add flags with -march=core-avx2 for Milan

2023-01-11 15:30:26 +01:00

include_GCC.mk

Use ISA for GCC flags and change binary and build directory names

2023-01-16 23:05:21 +01:00

include_GROMACS.mk

Add SIMD version with AVX (no AVX2) and XTC output

2022-03-02 23:12:04 +01:00

include_ICC.mk

Use ISA for GCC flags and change binary and build directory names

2023-01-16 23:05:21 +01:00

include_ICX.mk

Add -xHost option for AVX2

2023-01-18 16:39:19 +01:00

include_ISA.mk

Fix compilation for gromacs-avx512-sp

2022-12-21 16:19:00 +01:00

include_LIKWID.mk

Add LIKWID Option. Allow to overwrite with asm variant.

2021-06-11 09:38:34 +02:00

include_NVCC.mk

Small fixes into GROMACS GPU code

2022-11-14 18:21:14 +01:00

include_ONEAPI.mk

Add ONEAPI config. Remove omp simd for full neigh.

2022-04-01 15:57:54 +02:00

LICENSE

Switch License to LGPL3

2020-08-19 10:47:40 +02:00

Makefile

Fix stubbed versions and debug messages

2023-03-30 03:49:57 +02:00

README.md

Remove AVX512 reciprocal usage in AVX2 file

2022-11-15 01:40:37 +01:00

README.md

MD-Bench

MD-Bench is a toolbox for the performance engineering of short-range force calculation kernels in molecular dynamics applications. It aims to cover all available state-of-the-art algorithms from community codes such as LAMMPS and GROMACS.

In addition, it offers tools to study and evaluate the in-depth performance of such kernels on different hardware. These include gather-bench, a standalone benchmark that mimics the data movement of MD kernels, and the stubbed force calculation cases, which isolate the contributions of memory latency and control flow divergence to overall performance.

[Figures: Verlet Lists | GROMACS MxN | Stubbed cases]

Build instructions

Configure the build by editing the config.mk file. The following options are available:

TAG: Compiler tag (available options: GCC, CLANG, ICC, ONEAPI, NVCC).
ISA: Instruction set (available options: SSE, AVX, AVX_FMA, AVX2, AVX512).
MASK_REGISTERS: Use AVX512 mask registers (always true when ISA is set to AVX512).
OPT_SCHEME: Optimization algorithm (available options: lammps, gromacs).
ENABLE_LIKWID: Enable LIKWID to use hardware performance monitoring (HPM) counters.
DATA_TYPE: Floating-point precision (available options: SP, DP).
DATA_LAYOUT: Data layout for atom vector properties (available options: AOS, SOA).
ASM_SYNTAX: Assembly syntax to use when generating assembly files (available options: ATT, INTEL).
DEBUG: Toggle debug mode.
EXPLICIT_TYPES: Explicitly store and load atom types.
MEM_TRACER: Trace memory addresses for cache simulator.
INDEX_TRACER: Trace indexes and distances for gather-md.
COMPUTE_STATS: Compute statistics.

Configurations for LAMMPS Verlet Lists optimization scheme:

ENABLE_OMP_SIMD: Use the omp simd pragma on half neighbor-list kernels.
USE_SIMD_KERNEL: Compile kernel with explicit SIMD intrinsics.

Configurations for GROMACS MxN optimization scheme:

USE_REFERENCE_VERSION: Use reference version (only for correctness checks).
XTC_OUTPUT: Enable XTC output.
HALF_NEIGHBOR_LISTS_CHECK_CJ: Check if j-clusters are local when decreasing the reaction force.

Configurations for CUDA:

USE_CUDA_HOST_MEMORY: Use CUDA host memory to optimize host-device transfers.

When done, run make to compile the code. You can remove intermediate build results with make clean, and all build results with make distclean. Run make clean before make whenever you change the build settings.
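The build steps above can be sketched as a small shell helper. This is only an illustration: the function names `show_settings` and `rebuild` are hypothetical, and the pattern assumes that config.mk assigns options with `=`, `:=` or `?=`, which the README does not state.

```shell
#!/bin/sh
# Hypothetical helpers; not part of MD-Bench.

# Print a few build options from a config.mk-style file.
# Assumption: options are written as "NAME = value", "NAME := value" or "NAME ?= value".
show_settings() {
    grep -E '^(TAG|ISA|OPT_SCHEME|DATA_TYPE|DATA_LAYOUT)[[:space:]]*[?:]?=' "$1"
}

# Show the current settings, then rebuild from scratch.
# "make clean" comes first because the build settings may have changed.
rebuild() {
    if [ ! -f config.mk ]; then
        echo "config.mk not found; run this from the MD-Bench root" >&2
        return 1
    fi
    show_settings config.mk
    make clean && make
}
```

After editing config.mk, calling `rebuild` from the repository root prints the selected options and recompiles.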

Usage

Use the following command to run a simulation:

./MD-Bench-<TAG>-<OPT_SCHEME> [OPTION]...

Here, TAG and OPT_SCHEME correspond to the build options of the same name. Without any options, MD-Bench simulates a copper FCC lattice system of size 32x32x32 (131072 atoms) for 200 time steps using the Lennard-Jones potential (sigma=1.0, epsilon=1.0).
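As a quick check of the naming scheme and the default system size, the sketch below builds the binary name from TAG and OPT_SCHEME and recomputes the atom count. It assumes the 32x32x32 size counts FCC unit cells, each holding 4 atoms, which matches the quoted 131072 atoms. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; not part of MD-Bench.

# Binary name following the pattern MD-Bench-<TAG>-<OPT_SCHEME>.
md_bench_binary() {
    printf './MD-Bench-%s-%s\n' "$1" "$2"
}

# Atoms in an FCC lattice of nx x ny x nz unit cells
# (4 atoms per conventional FCC unit cell).
fcc_atoms() {
    echo $(( 4 * $1 * $2 * $3 ))
}

md_bench_binary GCC lammps   # prints ./MD-Bench-GCC-lammps
fcc_atoms 32 32 32           # prints 131072
```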

The default behavior and other options can be changed using the following parameters:

-p <string>:          file to read parameters from (can be specified more than once)
-f <string>:          force field (lj or eam), default lj
-i <string>:          input file with atom positions (dump)
-e <string>:          input file for EAM
-n / --nsteps <int>:  set number of timesteps for simulation
-nx/-ny/-nz <int>:    set linear dimension of systembox in x/y/z direction
-r / --radius <real>: set cutoff radius
-s / --skin <real>:   set skin (verlet buffer)
--freq <real>:        processor frequency (GHz)
--vtk <string>:       VTK file for visualization
--xtc <string>:       XTC file for visualization
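
The options above can be combined in a single call. The sketch below assumes a build with TAG=GCC and OPT_SCHEME=lammps. The step count, box size, cutoff radius and skin are illustrative values, not recommended settings, and `run_example` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical example run; all option values are illustrative only.
run_example() {
    bin=./MD-Bench-GCC-lammps
    # 100 time steps on a 16x16x16 box with an explicit cutoff radius and skin.
    set -- -n 100 -nx 16 -ny 16 -nz 16 -r 2.5 -s 0.3
    echo "$bin $*"
    if [ -x "$bin" ]; then
        "$bin" "$@"
    else
        echo "binary not found; build MD-Bench first" >&2
    fi
}

run_example
```

The function prints the command line before running it, so the chosen options stay visible in job logs.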

Examples

TBD

Citations

Rafael Ravedutti Lucio Machado, Jan Eitzinger, Harald Köstler, and Gerhard Wellein: MD-Bench: A generic proxy-app toolbox for state-of-the-art molecular dynamics algorithms. Accepted for PPAM 2022, the 14th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 11-14, 2022. PPAM 2022 Best Paper Award. Preprint: arXiv:2207.13094

Credits

MD-Bench is developed by the Erlangen National High Performance Computing Center (NHR@FAU) at the University of Erlangen-Nürnberg.

License

LGPL-3.0