From 9bdd6083612c9625cf109e7b26578eb7fac86422 Mon Sep 17 00:00:00 2001
From: Jan Eitzinger
Date: Wed, 14 Jun 2023 07:31:29 +0200
Subject: [PATCH 1/2] Extend docs

---
 README.md           |  5 +--
 docs/Job-Archive.md | 78 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c8cd60b..0bd3364 100644
--- a/README.md
+++ b/README.md
@@ -41,8 +41,9 @@ versions of third party packages.
 
 ## Demo Setup
 
-We provide a shell skript that downloads demo data and automatically builds and starts cc-backend.
-You need `wget`, `go`, `node`, `rollup` and `yarn` in your path to start the demo. The demo will download 32MB of data (223MB on disk).
+We provide a shell script that downloads demo data and automatically builds and
+starts cc-backend. You need `wget`, `go`, `node` and `npm` in your path to start
+the demo. The demo will download 32MB of data (223MB on disk).
 
 ```sh
 git clone https://github.com/ClusterCockpit/cc-backend.git
diff --git a/docs/Job-Archive.md b/docs/Job-Archive.md
index e69de29..601f32d 100644
--- a/docs/Job-Archive.md
+++ b/docs/Job-Archive.md
@@ -0,0 +1,78 @@
+The job archive specifies an exchange format for job meta and performance metric
+data. It consists of two parts:
+* a [SQLite database schema](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#sqlite-database-schema) for job meta data and performance statistics
+* a [Json file format](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#json-file-format) together with a [Directory hierarchy specification](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#directory-hierarchy-specification)
+
+By using an open, portable and simple specification based on files it is
+possible to exchange job performance data for research and analysis purposes as
+well as use it as a robust way for archiving job performance data to disk.
+
+# SQLite database schema
+## Introduction
+
+A SQLite 3 database schema is provided to standardize the job meta data
+information in a portable way. The schema also includes optional columns for job
+performance statistics (called a job performance footprint). The database acts
+as a front end to filter and select subsets of job IDs, which are the keys to get
+the full job performance data in the job performance tree hierarchy.
+
+## Database schema
+
+The schema includes 3 tables: the job table, a tag table and a jobtag table
+representing the MANY-TO-MANY relation between jobs and tags. The SQL schema is
+specified
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/schemas/jobs-sqlite.sql).
+An explanation of the various columns including the JSON datatypes is documented
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/job-meta.schema.json).
+
+# Directory hierarchy specification
+
+## Specification
+
+To manage the number of directories within a single directory a tree approach is
+used splitting the integer job ID. The job ID is split in chunks of 1000 each.
+Usually 2 layers of directories are sufficient but the concept can be used for an
+arbitrary number of layers.
+
+For a 2 layer schema this can be achieved with (code example in Perl):
+``` perl
+$level1 = $jobID/1000;
+$level2 = $jobID%1000;
+$dstPath = sprintf("%s/%s/%d/%03d", $trunk, $destdir, $level1, $level2);
+```
+
+## Example
+
+For the job ID 1034871 the directory path is `./1034/871/`.
+
+# Json file format
+## Overview
+
+Every cluster must be configured in a `cluster.json` file.
+
+The job data consists of two files:
+* `meta.json`: Contains job meta information and job statistics.
+* `data.json`: Contains complete job data with time series.
+
+The description of the json format specification is available as a [json
+schema](https://json-schema.org/) format file. The latest version of the json
+schema is part of the `cc-backend` source tree. For external reference it is
+also available in a separate repository.
+
+## Specification `cluster.json`
+
+The json schema specification is available
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/cluster.schema.json).
+
+## Specification `meta.json`
+
+The json schema specification is available
+[here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-meta.schema.json).
+
+## Specification `data.json`
+
+The json schema specification is available
+[here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-data.schema.json).
+Metric time series data is stored for a fixed time step. The time step is set
+per metric. If no value is available for a timestamp in a metric time series,
+`null` is entered.

From 93396aa0ea0f75903a315e2ccd8e92d2e4dafe7b Mon Sep 17 00:00:00 2001
From: Jan Eitzinger
Date: Wed, 14 Jun 2023 07:31:43 +0200
Subject: [PATCH 2/2] Fix bug in compression service

---
 pkg/archive/fsBackend.go | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pkg/archive/fsBackend.go b/pkg/archive/fsBackend.go
index f0a0477..0a9c224 100644
--- a/pkg/archive/fsBackend.go
+++ b/pkg/archive/fsBackend.go
@@ -363,6 +363,7 @@ func (fsa *FsArchive) CompressLast(starttime int64) int64 {
 	b, err := os.ReadFile(filename)
 	if err != nil {
 		log.Errorf("fsBackend Compress - %v", err)
+		os.WriteFile(filename, []byte(fmt.Sprintf("%d", starttime)), 0644)
 		return starttime
 	}
 	last, err := strconv.ParseInt(strings.TrimSuffix(string(b), "\n"), 10, 64)
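
As a companion to the Perl snippet in `docs/Job-Archive.md` (patch 1/2), the two-layer directory scheme can also be sketched in Go, the language of `cc-backend` itself. This is a minimal sketch under assumptions, not code taken from the repository: the function name `jobDir` and the `archive` root directory are hypothetical, and job IDs are assumed to be non-negative integers.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// jobDir is a hypothetical helper (not part of cc-backend) that maps a
// job ID to its two-layer archive directory below root.
// The job ID is split into chunks of 1000, as in the Perl example.
func jobDir(root string, jobID int64) string {
	level1 := jobID / 1000 // integer division: leading digits of the job ID
	level2 := jobID % 1000 // remainder: the last three digits
	return filepath.Join(
		root,
		fmt.Sprintf("%d", level1),
		fmt.Sprintf("%03d", level2), // zero-padded to three digits
	)
}

func main() {
	fmt.Println(jobDir("archive", 1034871))
}
```

On a Unix-like system, job ID 1034871 maps to `archive/1034/871`, which matches the example path given in the documentation. Zero-padding the second level keeps directory names at a fixed width, so a job ID such as 5 ends up in `archive/0/005`.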