Extend docs

2026-06-14 07:37:30 +02:00 · 2023-06-14 07:31:29 +02:00
parent 27d7d57225
commit 9bdd608361
2 changed files with 81 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -41,8 +41,9 @@ versions of third party packages.
 ## Demo Setup
-We provide a shell script that downloads demo data and automatically builds and starts cc-backend.
-You need `wget`, `go`, `node`, `rollup` and `yarn` in your path to start the demo. The demo will download 32MB of data (223MB on disk).
+We provide a shell script that downloads demo data and automatically builds and
+starts cc-backend. You need `wget`, `go`, `node`, `npm` in your path to start
+the demo. The demo will download 32MB of data (223MB on disk).
 ```sh
 git clone https://github.com/ClusterCockpit/cc-backend.git
--- a/docs/Job-Archive.md
+++ b/docs/Job-Archive.md
@@ -0,0 +1,78 @@
 The job archive specifies an exchange format for job metadata and performance
 metric data. It consists of two parts:
 * a [SQLite database schema](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#sqlite-database-schema) for job metadata and performance statistics
 * a [JSON file format](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#json-file-format) together with a [directory hierarchy specification](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#directory-hierarchy-specification)
 Because the specification is open, portable, and based on plain files, job
 performance data can be exchanged for research and analysis purposes and
 archived robustly to disk.
 # SQLite database schema
 ## Introduction
 A SQLite 3 database schema is provided to standardize the job metadata in a
 portable way. The schema also includes optional columns for job performance
 statistics (called a job performance footprint). The database acts as a front
 end for filtering and selecting subsets of job IDs, which are the keys to the
 full job performance data in the directory hierarchy.
 ## Database schema
 The schema includes three tables: the job table, a tag table, and a jobtag
 table representing the many-to-many relation between jobs and tags. The SQL
 schema is specified
 [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/schemas/jobs-sqlite.sql).
 The columns, including the JSON data types, are documented
 [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/job-meta.schema.json).
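The many-to-many relation can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table and column names and the sample rows below are simplified assumptions for illustration only; the authoritative definitions are in the SQL schema linked above.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real schema has more columns
# and may use different column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job    (id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL);
    CREATE TABLE tag    (id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL);
    CREATE TABLE jobtag (
        job_id INTEGER REFERENCES job(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (job_id, tag_id)
    );
    -- Made-up sample rows
    INSERT INTO job VALUES (1, 1034871), (2, 1034872);
    INSERT INTO tag VALUES (1, 'example-tag');
    INSERT INTO jobtag VALUES (1, 1);
""")


def job_ids_with_tag(conn, tag_name):
    """Return the job IDs that carry the given tag, via the jobtag link table."""
    rows = conn.execute(
        """
        SELECT job.job_id
        FROM job
        JOIN jobtag ON jobtag.job_id = job.id
        JOIN tag    ON tag.id = jobtag.tag_id
        WHERE tag.tag_name = ?
        ORDER BY job.job_id
        """,
        (tag_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(job_ids_with_tag(conn, "example-tag"))  # [1034871]
```

The returned job IDs are then the keys for locating the full performance data on disk.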
 # Directory hierarchy specification
 ## Specification
 To limit the number of entries within a single directory, a tree approach is
 used that splits the integer job ID into chunks of 1000. Usually two layers of
 directories are sufficient, but the concept can be extended to an arbitrary
 number of layers.
 For a two-layer schema this can be achieved with (code example in Perl):
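 ``` perl
 # $trunk and $destdir are assumed to be set by the caller (base path parts)
 $level1 = int($jobID / 1000);  # integer division: leading digits
 $level2 = $jobID % 1000;       # remainder: last three digits
 $dstPath = sprintf("%s/%s/%d/%03d", $trunk, $destdir, $level1, $level2);
 ```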
 ## Example
 For the job ID 1034871 the directory path is `./1034/871/`.
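The same two-layer split can be sketched in Python. The function name `job_dir` and the `root` parameter are hypothetical; the zero padding of the second level follows the `%03d` format used in the Perl example.

```python
def job_dir(root: str, job_id: int) -> str:
    """Build the two-layer directory path for a job ID.

    Level 1 is the job ID divided by 1000 (integer division),
    level 2 is the remainder, zero-padded to three digits.
    """
    level1, level2 = divmod(job_id, 1000)
    return f"{root}/{level1}/{level2:03d}"


print(job_dir(".", 1034871))  # ./1034/871
print(job_dir(".", 1034005))  # ./1034/005
```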
 # JSON file format
 ## Overview
 Every cluster must be configured in a `cluster.json` file.
 The job data consists of two files:
 * `meta.json`: Contains job meta information and job statistics.
 * `data.json`: Contains complete job data with time series.
 The JSON format is specified as a [JSON Schema](https://json-schema.org/)
 file. The latest version of the schema is part of the `cc-backend` source
 tree. For external reference it is also available in a separate repository.
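Reading the two job files needs nothing beyond a JSON parser. The following is a minimal sketch, assuming both files sit directly in the job's directory; the helper name `load_job` is hypothetical and no schema validation is performed.

```python
import json
from pathlib import Path


def load_job(job_dir):
    """Read meta.json and data.json from one job directory.

    Returns the two parsed documents as a (meta, data) tuple.
    """
    job_dir = Path(job_dir)
    with open(job_dir / "meta.json", encoding="utf-8") as f:
        meta = json.load(f)
    with open(job_dir / "data.json", encoding="utf-8") as f:
        data = json.load(f)
    return meta, data
```

Since `meta.json` contains the job meta information and statistics, a tool that only needs those can skip parsing the larger `data.json`.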
 ## Specification `cluster.json`
 The JSON schema specification is available
 [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/cluster.schema.json).
 ## Specification `meta.json`
 The JSON schema specification is available
 [here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-meta.schema.json).
 ## Specification `data.json`
 The JSON schema specification is available
 [here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-data.schema.json).
 Metric time series data is stored with a fixed time step, which is set per
 metric. If no value is available for a timestamp in a metric time series,
 `null` is entered.
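Consumers therefore have to handle missing samples explicitly. The sketch below shows one way to do this in Python, where a JSON `null` is parsed as `None`. The field names `timestep` and `series`, the sample values, and the start time of 0 are hypothetical and not taken from the schema.

```python
import json

# Hypothetical fragment; field names and values are illustrative only.
raw = '{"timestep": 60, "series": [1.5, null, 2.5, 3.0]}'
metric = json.loads(raw)


def with_timestamps(start, timestep, values):
    """Pair each sample with its timestamp: start + index * timestep."""
    return [(start + i * timestep, v) for i, v in enumerate(values)]


def mean_ignoring_missing(values):
    """Average over the available samples; None marks a missing value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


samples = with_timestamps(0, metric["timestep"], metric["series"])
print(samples)  # [(0, 1.5), (60, None), (120, 2.5), (180, 3.0)]
print(mean_ignoring_missing(metric["series"]))
```

Because the time step is fixed, timestamps do not need to be stored per sample; the position in the series is sufficient to reconstruct them.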