From 9bdd6083612c9625cf109e7b26578eb7fac86422 Mon Sep 17 00:00:00 2001
From: Jan Eitzinger <jan@moebiusband.org>
Date: Wed, 14 Jun 2023 07:31:29 +0200
Subject: [PATCH 1/2] Extend docs

---
 README.md           |  5 +--
 docs/Job-Archive.md | 78 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c8cd60b..0bd3364 100644
--- a/README.md
+++ b/README.md
@@ -41,8 +41,9 @@ versions of third party packages.
 
 ## Demo Setup
 
-We provide a shell skript that downloads demo data and automatically builds and starts cc-backend.
-You need `wget`, `go`, `node`, `rollup` and `yarn` in your path to start the demo. The demo will download 32MB of data (223MB on disk).
+We provide a shell script that downloads demo data and automatically builds and
+starts cc-backend. You need `wget`, `go`, `node` and `npm` in your path to start
+the demo. The demo will download 32MB of data (223MB on disk).
 
 ```sh
 git clone https://github.com/ClusterCockpit/cc-backend.git
diff --git a/docs/Job-Archive.md b/docs/Job-Archive.md
index e69de29..601f32d 100644
--- a/docs/Job-Archive.md
+++ b/docs/Job-Archive.md
@@ -0,0 +1,78 @@
+The job archive specifies an exchange format for job metadata and performance
+metric data. It consists of two parts:
+* a [SQLite database schema](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#sqlite-database-schema) for job metadata and performance statistics
+* a [JSON file format](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#json-file-format) together with a [directory hierarchy specification](https://github.com/ClusterCockpit/cc-backend/wiki/Job-Archive#directory-hierarchy-specification)
+
+Because the specification is open, portable, simple and based on files, it can
+be used to exchange job performance data for research and analysis purposes as
+well as to archive job performance data to disk in a robust way.
+
+# SQLite database schema
+## Introduction
+
+A SQLite 3 database schema is provided to standardize job metadata in a
+portable way. The schema also includes optional columns for job performance
+statistics (called a job performance footprint). The database acts as a front
+end to filter and select subsets of job IDs, which serve as keys to retrieve
+the full job performance data from the job archive directory tree.
+
+## Database schema
+
+The schema includes three tables: a job table, a tag table and a jobtag table
+representing the many-to-many relation between jobs and tags. The SQL schema is
+specified
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/schemas/jobs-sqlite.sql).
+The columns, including their JSON data types, are documented
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/job-meta.schema.json).
+
+# Directory hierarchy specification
+
+## Specification
+
+To limit the number of entries within a single directory, the integer job ID
+is split into a tree of directories using chunks of 1000. Usually two layers
+of directories are sufficient, but the concept extends to an arbitrary number
+of layers.
+
+For a two-layer schema this can be achieved as follows (code example in Perl):
+```perl
+my $level1 = int($jobID / 1000);  # upper part of the job ID
+my $level2 = $jobID % 1000;       # lower three digits
+my $dstPath = sprintf("%s/%s/%d/%03d", $trunk, $destdir, $level1, $level2);
+```
+
+## Example
+
+For the job ID 1034871 the directory path is `./1034/871/`.
+
+# JSON file format
+## Overview
+
+Every cluster must be configured in a `cluster.json` file.
+
+The job data consists of two files:
+* `meta.json`: Contains job metadata and job statistics.
+* `data.json`: Contains the complete job data including metric time series.
+
+The JSON file formats are specified as
+[JSON Schema](https://json-schema.org/) files. The latest version of these
+schemas is part of the `cc-backend` source tree. For external reference they
+are also available in a separate repository.
+
+## Specification `cluster.json`
+
+The JSON schema specification is available
+[here](https://github.com/ClusterCockpit/cc-specifications/blob/master/datastructures/cluster.schema.json).
+
+## Specification `meta.json`
+
+The JSON schema specification is available
+[here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-meta.schema.json).
+
+## Specification `data.json`
+
+The JSON schema specification is available
+[here](https://github.com/RRZE-HPC/HPCJobDatabase/blob/master/json-schema/job-data.schema.json).
+Metric time series data is stored at a fixed time step, which is set per
+metric. If no value is available for a timestamp of a metric time series,
+`null` is stored in its place.
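Missing samples are stored as `null`, so a reader has to keep them distinct from real values. Below is a minimal Go sketch of one way to do that with `encoding/json`. It assumes a bare JSON array of samples; the actual layout of `data.json` is defined by the job-data schema linked in the document. The helper `decodeSeries` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeSeries (hypothetical helper) parses a JSON array of samples in which
// missing values are written as null. Decoding into []*float64 keeps that
// distinction: a null element becomes a nil pointer, while a real 0 stays a
// non-nil pointer to 0.
func decodeSeries(raw []byte) ([]*float64, error) {
	var series []*float64
	if err := json.Unmarshal(raw, &series); err != nil {
		return nil, err
	}
	return series, nil
}

func main() {
	series, err := decodeSeries([]byte(`[1.5, null, 0, 3.25]`))
	if err != nil {
		panic(err)
	}
	for i, v := range series {
		if v == nil {
			fmt.Printf("sample %d: missing\n", i)
			continue
		}
		fmt.Printf("sample %d: %g\n", i, *v)
	}
}
```

A nil pointer keeps "no sample" separate from a measured zero. Mapping nil to `math.NaN()` is an alternative when the values feed straight into numeric code.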

From 93396aa0ea0f75903a315e2ccd8e92d2e4dafe7b Mon Sep 17 00:00:00 2001
From: Jan Eitzinger <jan@moebiusband.org>
Date: Wed, 14 Jun 2023 07:31:43 +0200
Subject: [PATCH 2/2] Fix bug in compression service

---
 pkg/archive/fsBackend.go | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pkg/archive/fsBackend.go b/pkg/archive/fsBackend.go
index f0a0477..0a9c224 100644
--- a/pkg/archive/fsBackend.go
+++ b/pkg/archive/fsBackend.go
@@ -363,6 +363,7 @@ func (fsa *FsArchive) CompressLast(starttime int64) int64 {
 	b, err := os.ReadFile(filename)
 	if err != nil {
 		log.Errorf("fsBackend Compress - %v", err)
+		os.WriteFile(filename, []byte(fmt.Sprintf("%d", starttime)), 0644)
 		return starttime
 	}
 	last, err := strconv.ParseInt(strings.TrimSuffix(string(b), "\n"), 10, 64)