Merge branch 'master' into add-influxdb2-client

Christoph Kluge 2022-03-22 11:10:32 +01:00
commit 2811136fd4
33 changed files with 2194 additions and 955 deletions

View File

@ -14,5 +14,4 @@ jobs:
run: |
go build ./...
go vet ./...
go test .
env BASEPATH="../" go test ./repository
go test ./...

View File

@ -1,10 +1,38 @@
# ClusterCockpit with a Golang backend
# ClusterCockpit REST and GraphQL API backend
[![Build](https://github.com/ClusterCockpit/cc-backend/actions/workflows/test.yml/badge.svg)](https://github.com/ClusterCockpit/cc-backend/actions/workflows/test.yml)
Create your job-archive accoring to [this specification](https://github.com/ClusterCockpit/cc-specifications). At least one cluster with a valid `cluster.json` file is required. Having no jobs in the job-archive at all is fine. You may use the sample job-archive available for download [in cc-docker/develop](https://github.com/ClusterCockpit/cc-docker/tree/develop).
This is a Golang backend implementation for a REST and GraphQL API according to the [ClusterCockpit specifications](https://github.com/ClusterCockpit/cc-specifications).
It also includes a web interface for ClusterCockpit based on the components implemented in
[cc-frontend](https://github.com/ClusterCockpit/cc-frontend), which is included as a git submodule.
This implementation replaces the previous PHP Symfony based ClusterCockpit web-interface.
### Run server
## Overview
This is a Golang web backend for the ClusterCockpit job-specific performance monitoring framework.
It provides a REST API for integrating ClusterCockpit with an HPC cluster batch system and with external analysis scripts.
Data exchange between the web frontend and backend is based on a GraphQL API.
The web frontend is also served by the backend using [Svelte](https://svelte.dev/) components implemented in [cc-frontend](https://github.com/ClusterCockpit/cc-frontend).
Layout and styling is based on [Bootstrap 5](https://getbootstrap.com/) using [Bootstrap Icons](https://icons.getbootstrap.com/).
The backend uses [SQLite 3](https://sqlite.org/) as its relational database by default. It can optionally use a MySQL/MariaDB database server.
Finished batch jobs are stored in a so-called job archive following [this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
The backend supports authentication using local accounts or an external LDAP directory.
Authorization for APIs is implemented using [JWT](https://jwt.io/) tokens signed with a public/private key pair.
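For illustration, here is a minimal sketch (an assumption for this README, not code from this repository) of how an Ed25519 key pair can sign and verify such a token with the `github.com/golang-jwt/jwt/v4` module the project depends on:
```go
package main

import (
	"crypto/ed25519"
	"fmt"
	"time"

	"github.com/golang-jwt/jwt/v4"
)

func main() {
	// cc-backend reads its key pair from .env; we generate one ad hoc here.
	pub, priv, _ := ed25519.GenerateKey(nil)

	// Sign a token with hypothetical claims.
	token := jwt.NewWithClaims(jwt.SigningMethodEdDSA, jwt.MapClaims{
		"sub": "demo",
		"exp": time.Now().Add(time.Hour).Unix(),
	})
	signed, err := token.SignedString(priv)
	if err != nil {
		panic(err)
	}

	// Verify it with the matching public key.
	parsed, err := jwt.Parse(signed, func(t *jwt.Token) (interface{}, error) {
		return pub, nil
	})
	fmt.Println("token valid:", parsed.Valid, "err:", err)
}
```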
## Demo Setup
We provide a shell script that downloads demo data and automatically builds and starts cc-backend.
You need `wget`, `go`, and `yarn` in your `PATH` to start the demo. The demo downloads 32 MB of data (223 MB on disk).
```sh
# The frontend is a submodule, so use `--recursive`
git clone --recursive git@github.com:ClusterCockpit/cc-backend.git
./startDemo.sh
```
You can access the web interface at http://localhost:8080. Credentials for login: `demo:AdminDev`. Please note that some views do not work without a metric backend (e.g., the Systems view).
## Howto Build and Run
```sh
# The frontend is a submodule, so use `--recursive`
@ -41,18 +69,35 @@ vim ./.env
# Show other options:
./cc-backend --help
```
### Run as systemd daemon
In order to run this program as a deamon, look at [utils/systemd/README.md](./utils/systemd/README.md) where a systemd unit file and more explanation is provided.
In order to run this program as a daemon, look at [utils/systemd/README.md](./utils/systemd/README.md) where a systemd unit file and more explanation is provided.
## Configuration and Setup
cc-backend can be used as a local web interface for an existing job archive or
as a general web interface server for a live ClusterCockpit monitoring
framework.
Create your job-archive according to [this specification](https://github.com/ClusterCockpit/cc-specifications). At least
one cluster with a valid `cluster.json` file is required. Having no jobs in the
job-archive at all is fine. You may use the sample job-archive available for
download [in cc-docker/develop](https://github.com/ClusterCockpit/cc-docker/tree/develop).
### Configuration
A config file in the JSON format can be provided using `--config` to override the defaults. Look at the beginning of `server.go` for the defaults and consequently the format of the configuration file.
A config file in the JSON format can be provided using `--config` to override the defaults.
Look at the beginning of `server.go` for the defaults and consequently the format of the configuration file.
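Purely as an illustration, such a file could look like the snippet below; the field names are assumptions made for this example, so treat `server.go` as the authoritative reference:
```json
{
  "addr": "0.0.0.0:8080",
  "job-archive": "./var/job-archive",
  "static-files": "./frontend/public"
}
```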
### Update GraphQL schema
This project uses [gqlgen](https://github.com/99designs/gqlgen) for the GraphQL API. The schema can be found in `./graph/schema.graphqls`. After changing it, you need to run `go run github.com/99designs/gqlgen` which will update `graph/model`. In case new resolvers are needed, they will be inserted into `graph/schema.resolvers.go`, where you will need to implement them.
This project uses [gqlgen](https://github.com/99designs/gqlgen) for the GraphQL
API. The schema can be found in `./graph/schema.graphqls`. After changing it,
you need to run `go run github.com/99designs/gqlgen` which will update
`graph/model`. In case new resolvers are needed, they will be inserted into
`graph/schema.resolvers.go`, where you will need to implement them.
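As a usage sketch of that workflow (the paths are those named above):
```sh
# Edit the schema first, then regenerate the resolver stubs and models:
vim ./graph/schema.graphqls
go run github.com/99designs/gqlgen
```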
### Project Structure
## Project Structure
- `api/` contains the REST API. The routes defined there should be called whenever a job starts/stops. The API is documented in the OpenAPI 3.0 format in [./api/openapi.yaml](./api/openapi.yaml).
- `auth/` is where the (optional) authentication middleware can be found, which adds the currently authenticated user to the request context. The `user` table is created and managed here as well.
@ -68,24 +113,15 @@ This project uses [gqlgen](https://github.com/99designs/gqlgen) for the GraphQL
- `metricdata/archive.go` provides functions for fetching metrics from the job-archive and archiving a job to the job-archive.
- `metricdata/cc-metric-store.go` contains an implementation of the `MetricDataRepository` interface which can fetch data from a [cc-metric-store](https://github.com/ClusterCockpit/cc-metric-store).
- `metricdata/influxdb-v2` contains an implementation of the `MetricDataRepository` interface which can fetch data from an InfluxDBv2 database. It is currently disabled and out of date and cannot be used as of this writing.
- `repository/` contains all SQL-related code.
- `repository/init.go` initializes the `job` (and `tag` and `jobtag`) tables if the `--init-db` flag is provided. Not only are the tables created with the correct schema, but the job-archive is traversed as well.
- `schema/` contains type definitions used all over this project, extracted into this package because Go disallows cyclic dependencies between packages.
- `schema/float.go` contains a custom `float64` type which overrides JSON and GraphQL marshaling/unmarshaling (see the sketch after this list). This is needed because a regular optional `Float` in GraphQL will map to `*float64` types in Go. Wrapping every single metric value in an allocation would be a lot of overhead.
- `schema/job.go` provides the types representing a job and its resources. Those can be used as type for a `meta.json` file and/or a row in the `job` table.
- `templates/` is mostly full of HTML templates and a small Go helper module.
- `utils/systemd` describes how to deploy/install this as a systemd service
- `utils/` is mostly outdated. Look at the [cc-util repo](https://github.com/ClusterCockpit/cc-util) for more up-to-date scripts.
- `test/` rudimentary tests.
- `utils/`
- `.env` *must* be changed before you deploy this. It contains a Base64-encoded [Ed25519](https://en.wikipedia.org/wiki/EdDSA) key pair, the secret used for sessions, and the password for the LDAP server if LDAP authentication is enabled.
- `gqlgen.yml` configures the behaviour and generation of [gqlgen](https://github.com/99designs/gqlgen).
- `init-db.go` initializes the `job` (and `tag` and `jobtag`) table if the `--init-db` flag is provided. Not only is the table created in the correct schema, but the job-archive is traversed as well.
- `server.go` contains the main function and starts the actual HTTP server.
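As referenced in the `schema/float.go` bullet above, here is a minimal sketch (an assumption, not the repository's actual code) of such a custom float type: NaN serves as the in-memory marker for a missing value and is (un)marshaled as JSON `null`, avoiding a `*float64` allocation per metric value.
```go
package schema

import (
	"math"
	"strconv"
)

// Float is a float64 whose NaN value round-trips as JSON null.
type Float float64

// MarshalJSON renders NaN as null instead of failing like encoding/json would.
func (f Float) MarshalJSON() ([]byte, error) {
	if math.IsNaN(float64(f)) {
		return []byte("null"), nil
	}
	return strconv.AppendFloat(nil, float64(f), 'f', 3, 64), nil
}

// UnmarshalJSON maps null back to NaN.
func (f *Float) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		*f = Float(math.NaN())
		return nil
	}
	v, err := strconv.ParseFloat(string(data), 64)
	if err != nil {
		return err
	}
	*f = Float(v)
	return nil
}
```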
### TODO
- [ ] write (unit) tests
- [ ] fix `LoadNodeData` in the cc-metric-store MetricDataRepository. It currently does not work for non-node-scoped metrics because the partition is unknown for a node
- [ ] make tokens and sessions (currently based on cookies) expire after some configurable time
- [ ] when authenticating using a JWT, check if that user still exists
- [ ] fix InfluxDB MetricDataRepository (new or old line-protocol format? Support node-level metrics only?)
- [ ] documentation, comments in the code base
- [ ] write more TODOs
- [ ] use more prepared statements and [sqrl](https://github.com/elgris/sqrl) instead of *squirrel*

View File

@ -38,6 +38,9 @@ paths:
- name: items-per-page
in: query
schema: { type: integer }
- name: with-metadata
in: query
schema: { type: boolean }
responses:
200:
description: 'Array of jobs'
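A hedged usage sketch for this new parameter: assuming the jobs listing is mounted at `/api/jobs/` (path, port, and token are placeholders for this example), a client could request job metadata inline like so:
```sh
curl -H "Authorization: Bearer $JWT" \
  "http://localhost:8080/api/jobs/?with-metadata=true&items-per-page=10"
```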

View File

@ -108,6 +108,7 @@ type TagJobApiRequest []*struct {
// Return a list of jobs
func (api *RestApi) getJobs(rw http.ResponseWriter, r *http.Request) {
withMetadata := false
filter := &model.JobFilter{}
page := &model.PageRequest{ItemsPerPage: -1, Page: 1}
order := &model.OrderByInput{Field: "startTime", Order: model.SortDirectionEnumDesc}
@ -156,6 +157,8 @@ func (api *RestApi) getJobs(rw http.ResponseWriter, r *http.Request) {
return
}
page.ItemsPerPage = x
case "with-metadata":
withMetadata = true
default:
http.Error(rw, "invalid query parameter: "+key, http.StatusBadRequest)
return
@ -170,6 +173,13 @@ func (api *RestApi) getJobs(rw http.ResponseWriter, r *http.Request) {
results := make([]*schema.JobMeta, 0, len(jobs))
for _, job := range jobs {
if withMetadata {
if _, err := api.JobRepository.FetchMetadata(job); err != nil {
http.Error(rw, err.Error(), http.StatusInternalServerError)
return
}
}
res := &schema.JobMeta{
ID: &job.ID,
BaseJob: job.BaseJob,
@ -274,16 +284,16 @@ func (api *RestApi) startJob(rw http.ResponseWriter, r *http.Request) {
}
// Check if combination of (job_id, cluster_id, start_time) already exists:
job, err := api.JobRepository.Find(&req.JobID, &req.Cluster, &req.StartTime)
job, err := api.JobRepository.Find(&req.JobID, &req.Cluster, nil)
if err != nil && err != sql.ErrNoRows {
handleError(fmt.Errorf("checking for duplicate failed: %w", err), http.StatusInternalServerError, rw)
return
}
if err != sql.ErrNoRows {
} else if err == nil {
if (req.StartTime - job.StartTimeUnix) < 86400 {
handleError(fmt.Errorf("a job with that jobId, cluster and startTime already exists: dbid: %d", job.ID), http.StatusUnprocessableEntity, rw)
return
}
}
id, err := api.JobRepository.Start(&req)
if err != nil {

View File

@ -14,6 +14,7 @@ import (
"strings"
"time"
"github.com/ClusterCockpit/cc-backend/graph/model"
"github.com/ClusterCockpit/cc-backend/log"
sq "github.com/Masterminds/squirrel"
"github.com/golang-jwt/jwt/v4"
@ -71,11 +72,11 @@ func (auth *Authentication) Init(db *sqlx.DB, ldapConfig *LdapConfig) error {
auth.db = db
_, err := db.Exec(`
CREATE TABLE IF NOT EXISTS user (
username varchar(255) PRIMARY KEY,
username varchar(255) PRIMARY KEY NOT NULL,
password varchar(255) DEFAULT NULL,
ldap tinyint DEFAULT 0,
ldap tinyint NOT NULL DEFAULT 0,
name varchar(255) DEFAULT NULL,
roles varchar(255) DEFAULT NULL,
roles varchar(255) NOT NULL DEFAULT "[]",
email varchar(255) DEFAULT NULL);`)
if err != nil {
return err
@ -233,6 +234,28 @@ func (auth *Authentication) FetchUser(username string) (*User, error) {
return user, nil
}
func FetchUser(ctx context.Context, db *sqlx.DB, username string) (*model.User, error) {
me := GetUser(ctx)
if me != nil && !me.HasRole(RoleAdmin) && me.Username != username {
return nil, errors.New("forbidden")
}
user := &model.User{Username: username}
var name, email sql.NullString
if err := sq.Select("name", "email").From("user").Where("user.username = ?", username).
RunWith(db).QueryRow().Scan(&name, &email); err != nil {
if err == sql.ErrNoRows {
return nil, nil
}
return nil, err
}
user.Name = name.String
user.Email = email.String
return user, nil
}
// Handle a POST request that should log the user in, starting a new session.
func (auth *Authentication) Login(onsuccess http.Handler, onfailure func(rw http.ResponseWriter, r *http.Request, loginErr error)) http.Handler {
return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {

View File

@ -20,10 +20,14 @@ import (
var db *sqlx.DB
var lookupConfigStmt *sqlx.Stmt
var lock sync.RWMutex
var uiDefaults map[string]interface{}
var cache *lrucache.Cache = lrucache.New(1024)
var Clusters []*model.Cluster
var nodeLists map[string]map[string]NodeList
func Init(usersdb *sqlx.DB, authEnabled bool, uiConfig map[string]interface{}, jobArchive string) error {
db = usersdb
@ -34,6 +38,7 @@ func Init(usersdb *sqlx.DB, authEnabled bool, uiConfig map[string]interface{}, j
}
Clusters = []*model.Cluster{}
nodeLists = map[string]map[string]NodeList{}
for _, de := range entries {
raw, err := os.ReadFile(filepath.Join(jobArchive, de.Name(), "cluster.json"))
if err != nil {
@ -53,8 +58,8 @@ func Init(usersdb *sqlx.DB, authEnabled bool, uiConfig map[string]interface{}, j
return err
}
if len(cluster.Name) == 0 || len(cluster.MetricConfig) == 0 || len(cluster.Partitions) == 0 {
return errors.New("cluster.name, cluster.metricConfig and cluster.Partitions should not be empty")
if len(cluster.Name) == 0 || len(cluster.MetricConfig) == 0 || len(cluster.SubClusters) == 0 {
return errors.New("cluster.name, cluster.metricConfig and cluster.SubClusters should not be empty")
}
for _, mc := range cluster.MetricConfig {
@ -83,6 +88,19 @@ func Init(usersdb *sqlx.DB, authEnabled bool, uiConfig map[string]interface{}, j
}
Clusters = append(Clusters, &cluster)
nodeLists[cluster.Name] = make(map[string]NodeList)
for _, sc := range cluster.SubClusters {
if sc.Nodes == "" {
continue
}
nl, err := ParseNodeList(sc.Nodes)
if err != nil {
return fmt.Errorf("in %s/cluster.json: %w", cluster.Name, err)
}
nodeLists[cluster.Name][sc.Name] = nl
}
}
if authEnabled {
@ -188,7 +206,7 @@ func UpdateConfig(key, value string, ctx context.Context) error {
return nil
}
func GetClusterConfig(cluster string) *model.Cluster {
func GetCluster(cluster string) *model.Cluster {
for _, c := range Clusters {
if c.Name == cluster {
return c
@ -197,11 +215,11 @@ func GetClusterConfig(cluster string) *model.Cluster {
return nil
}
func GetPartition(cluster, partition string) *model.Partition {
func GetSubCluster(cluster, subcluster string) *model.SubCluster {
for _, c := range Clusters {
if c.Name == cluster {
for _, p := range c.Partitions {
if p.Name == partition {
for _, p := range c.SubClusters {
if p.Name == subcluster {
return p
}
}
@ -222,3 +240,40 @@ func GetMetricConfig(cluster, metric string) *model.MetricConfig {
}
return nil
}
// AssignSubCluster sets the `job.subcluster` property of the job based
// on its cluster and resources.
func AssignSubCluster(job *schema.BaseJob) error {
cluster := GetCluster(job.Cluster)
if cluster == nil {
return fmt.Errorf("unknown cluster: %#v", job.Cluster)
}
if job.SubCluster != "" {
for _, sc := range cluster.SubClusters {
if sc.Name == job.SubCluster {
return nil
}
}
return fmt.Errorf("already assigned subcluster %#v unknown (cluster: %#v)", job.SubCluster, job.Cluster)
}
if len(job.Resources) == 0 {
return fmt.Errorf("job without any resources/hosts")
}
host0 := job.Resources[0].Hostname
for sc, nl := range nodeLists[job.Cluster] {
if nl != nil && nl.Contains(host0) {
job.SubCluster = sc
return nil
}
}
if cluster.SubClusters[0].Nodes == "" {
job.SubCluster = cluster.SubClusters[0].Name
return nil
}
return fmt.Errorf("no subcluster found for cluster %#v and host %#v", job.Cluster, host0)
}

config/nodelist.go (new file, 136 lines)
View File

@ -0,0 +1,136 @@
package config
import (
"fmt"
"strconv"
"strings"
"github.com/ClusterCockpit/cc-backend/log"
)
type NLExprString string
func (nle NLExprString) consume(input string) (next string, ok bool) {
str := string(nle)
if strings.HasPrefix(input, str) {
return strings.TrimPrefix(input, str), true
}
return "", false
}
type NLExprIntRange struct {
start, end int64
zeroPadded bool
digits int
}
func (nle NLExprIntRange) consume(input string) (next string, ok bool) {
if !nle.zeroPadded || nle.digits < 1 {
log.Error("node list: only zero-padded ranges are allowed")
return "", false
}
if len(input) < nle.digits {
return "", false
}
numerals, rest := input[:nle.digits], input[nle.digits:]
for len(numerals) > 1 && numerals[0] == '0' {
numerals = numerals[1:]
}
x, err := strconv.ParseInt(numerals, 10, 32)
if err != nil {
return "", false
}
if nle.start <= x && x <= nle.end {
return rest, true
}
return "", false
}
type NodeList [][]interface {
consume(input string) (next string, ok bool)
}
func (nl *NodeList) Contains(name string) bool {
var ok bool
for _, term := range *nl {
str := name
for _, expr := range term {
str, ok = expr.consume(str)
if !ok {
break
}
}
if ok && str == "" {
return true
}
}
return false
}
func ParseNodeList(raw string) (NodeList, error) {
nl := NodeList{}
isLetter := func(r byte) bool { return ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z') }
isDigit := func(r byte) bool { return '0' <= r && r <= '9' }
for _, rawterm := range strings.Split(raw, ",") {
exprs := []interface {
consume(input string) (next string, ok bool)
}{}
for i := 0; i < len(rawterm); i++ {
c := rawterm[i]
if isLetter(c) || isDigit(c) {
j := i
for j < len(rawterm) && (isLetter(rawterm[j]) || isDigit(rawterm[j])) {
j++
}
exprs = append(exprs, NLExprString(rawterm[i:j]))
i = j - 1
} else if c == '[' {
end := strings.Index(rawterm[i:], "]")
if end == -1 {
return nil, fmt.Errorf("node list: unclosed '['")
}
minus := strings.Index(rawterm[i:i+end], "-")
if minus == -1 {
return nil, fmt.Errorf("node list: no '-' found inside '[...]'")
}
s1, s2 := rawterm[i+1:i+minus], rawterm[i+minus+1:i+end]
if len(s1) != len(s2) || len(s1) == 0 {
return nil, fmt.Errorf("node list: %#v and %#v are not of equal length or of length zero", s1, s2)
}
x1, err := strconv.ParseInt(s1, 10, 32)
if err != nil {
return nil, fmt.Errorf("node list: %w", err)
}
x2, err := strconv.ParseInt(s2, 10, 32)
if err != nil {
return nil, fmt.Errorf("node list: %w", err)
}
exprs = append(exprs, NLExprIntRange{
start: x1,
end: x2,
digits: len(s1),
zeroPadded: true,
})
i += end
} else {
return nil, fmt.Errorf("node list: invalid character: %#v", rune(c))
}
}
nl = append(nl, exprs)
}
return nl, nil
}

config/nodelist_test.go (new file, 37 lines)
View File

@ -0,0 +1,37 @@
package config
import (
"testing"
)
func TestNodeList(t *testing.T) {
nl, err := ParseNodeList("hallo,wel123t,emmy[01-99],fritz[005-500],woody[100-200]")
if err != nil {
t.Fatal(err)
}
// fmt.Printf("terms\n")
// for i, term := range nl.terms {
// fmt.Printf("term %d: %#v\n", i, term)
// }
if nl.Contains("hello") || nl.Contains("woody") {
t.Fail()
}
if nl.Contains("fritz1") || nl.Contains("fritz9") || nl.Contains("fritz004") || nl.Contains("woody201") {
t.Fail()
}
if !nl.Contains("hallo") || !nl.Contains("wel123t") {
t.Fail()
}
if !nl.Contains("emmy01") || !nl.Contains("emmy42") || !nl.Contains("emmy99") {
t.Fail()
}
if !nl.Contains("woody100") || !nl.Contains("woody199") {
t.Fail()
}
}

go.mod (4 changed lines)
View File

@ -8,6 +8,7 @@ require (
github.com/go-ldap/ldap/v3 v3.4.1
github.com/go-sql-driver/mysql v1.5.0
github.com/golang-jwt/jwt/v4 v4.1.0
github.com/google/gops v0.3.22
github.com/gorilla/handlers v1.5.1
github.com/gorilla/mux v1.8.0
github.com/gorilla/sessions v1.2.1
@ -18,6 +19,7 @@ require (
github.com/vektah/gqlparser/v2 v2.1.0
golang.org/x/crypto v0.0.0-20211117183948-ae814b36b871
)
<<<<<<< HEAD
require (
github.com/Azure/go-ntlmssp v0.0.0-20200615164410-66371956d46c // indirect
@ -36,3 +38,5 @@ require (
golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2 // indirect
gopkg.in/yaml.v2 v2.3.0 // indirect
)
=======
>>>>>>> master

go.sum (33 changed lines)
View File

@ -5,6 +5,7 @@ github.com/Azure/go-ntlmssp v0.0.0-20200615164410-66371956d46c/go.mod h1:chxPXzS
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/Masterminds/squirrel v1.5.1 h1:kWAKlLLJFxZG7N2E0mBMNWVp5AuUX+JUrnhFN74Eg+w=
github.com/Masterminds/squirrel v1.5.1/go.mod h1:NNaOrjSoIDfDA40n7sr2tPNZRfjzjA400rg+riTZj10=
github.com/StackExchange/wmi v1.2.1/go.mod h1:rcmrprowKIVzvc+NUiLncP2uuArMWLCbu9SBzvHz7e8=
github.com/agnivade/levenshtein v1.0.1/go.mod h1:CURSv5d9Uaml+FovSIICkLbAUZ9S4RqaHDIsdSBg7lM=
github.com/agnivade/levenshtein v1.0.3 h1:M5ZnqLOoZR8ygVq0FfkXsNOKzMCk0xRiow0R5+5VkQ0=
github.com/agnivade/levenshtein v1.0.3/go.mod h1:4SFRZbbXWLF4MU1T9Qg0pGgH3Pjs+t6ie5efyrwRJXs=
@ -32,14 +33,24 @@ github.com/go-chi/chi v3.3.2+incompatible/go.mod h1:eB3wogJHnLi3x/kFX2A+IbTBlXxm
github.com/go-chi/chi/v5 v5.0.0/go.mod h1:BBug9lr0cqtdAhsu6R4AAdvufI0/XBzAQSsUqJpoZOs=
github.com/go-ldap/ldap/v3 v3.4.1 h1:fU/0xli6HY02ocbMuozHAYsaHLcnkLjvho2r5a34BUU=
github.com/go-ldap/ldap/v3 v3.4.1/go.mod h1:iYS1MdmrmceOJ1QOTnRXrIs7i3kloqtmGQjRvjKpyMg=
<<<<<<< HEAD
github.com/go-openapi/jsonpointer v0.19.5/go.mod h1:Pl9vOtqEWErmShwVjC8pYs9cog34VGT37dQOVbmoatg=
github.com/go-openapi/swag v0.19.5/go.mod h1:POnQmlKehdgb5mhVOsnJFsivZCEZ/vjK9gh66Z9tfKk=
=======
github.com/go-ole/go-ole v1.2.5/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
github.com/go-ole/go-ole v1.2.6-0.20210915003542-8b1f7f90f6b1/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
>>>>>>> master
github.com/go-sql-driver/mysql v1.5.0 h1:ozyZYNQW3x3HtqT1jira07DN2PArx2v7/mN66gGcHOs=
github.com/go-sql-driver/mysql v1.5.0/go.mod h1:DCzpHaOWr8IXmIStZouvnhqoel9Qv2LBy8hT2VhHyBg=
github.com/gogo/protobuf v1.0.0/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/golang-jwt/jwt/v4 v4.1.0 h1:XUgk2Ex5veyVFVeLm0xhusUTQybEbexJXrvPNOKkSY0=
github.com/golang-jwt/jwt/v4 v4.1.0/go.mod h1:/xlHOz8bRuivTWchD4jCa+NbatV+wEUSzwAxVc6locg=
<<<<<<< HEAD
github.com/golangci/lint-1 v0.0.0-20181222135242-d2cdd8c08219/go.mod h1:/X8TswGSh1pIozq4ZwCfxS0WA5JGXguxk94ar/4c87Y=
=======
github.com/google/gops v0.3.22 h1:lyvhDxfPLHAOR2xIYwjPhN387qHxyU21Sk9sz/GhmhQ=
github.com/google/gops v0.3.22/go.mod h1:7diIdLsqpCihPSX3fQagksT/Ku/y4RL9LHTlKyEUDl8=
>>>>>>> master
github.com/gorilla/context v0.0.0-20160226214623-1ea25387ff6f/go.mod h1:kBGZzfjB9CEq2AlWe17Uuf7NDRt0dE0s8S51q0aT7Yg=
github.com/gorilla/handlers v1.5.1 h1:9lRY6j8DEeeBT10CvO9hGW0gmky0BprnvDI5vfhUHH4=
github.com/gorilla/handlers v1.5.1/go.mod h1:t8XrUpc4KVXb7HGyJ4/cEnwQiaxrX/hz1Zv/4g96P1Q=
@ -62,6 +73,7 @@ github.com/influxdata/line-protocol v0.0.0-20200327222509-2487e7298839 h1:W9WBk7
github.com/influxdata/line-protocol v0.0.0-20200327222509-2487e7298839/go.mod h1:xaLFMmpvUxqXtVkUJfg9QmT88cDaCJ3ZKgdZ78oO8Qo=
github.com/jmoiron/sqlx v1.3.1 h1:aLN7YINNZ7cYOPK3QC83dbM6KT0NMqVMw961TqrejlE=
github.com/jmoiron/sqlx v1.3.1/go.mod h1:2BljVx/86SuTyjE+aPYlHCTNvZrnJXghYGpNiXLBMCQ=
github.com/keybase/go-ps v0.0.0-20190827175125-91aafc93ba19/go.mod h1:hY+WOq6m2FpbvyrI93sMaypsttvaIL5nhVR92dTMUcQ=
github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI=
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
@ -102,6 +114,7 @@ github.com/rs/cors v1.6.0/go.mod h1:gFx+x8UowdsKA9AchylcLynDq+nNFfI8FkUZdN/jGCU=
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/sergi/go-diff v1.1.0 h1:we8PVUC3FE2uYfodKH/nBHMSetSfHDR6scGdBi+erh0=
github.com/sergi/go-diff v1.1.0/go.mod h1:STckp+ISIX8hZLjrqAeVduY0gWCT9IjLuqbuNXdaHfM=
github.com/shirou/gopsutil/v3 v3.21.9/go.mod h1:YWp/H8Qs5fVmf17v7JNZzA0mPJ+mS2e9JdiUF9LlKzQ=
github.com/shurcooL/httpfs v0.0.0-20171119174359-809beceb2371/go.mod h1:ZY1cvUeJuFPAdZ/B6v7RHavJWZn2YPVFQ1OSXhCGOkg=
github.com/shurcooL/sanitized_anchor_name v1.0.0/go.mod h1:1NzhyTcUVG4SuEtjjoZeVRXNmyL/1OwPU0+IJeTBvfc=
github.com/shurcooL/vfsgen v0.0.0-20180121065927-ffb13db8def0/go.mod h1:TrYk7fJVaAttu97ZZKrO9UbRa8izdowaMIZcxYMbVaw=
@ -112,6 +125,13 @@ github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UV
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/stretchr/testify v1.5.1 h1:nOGnQDM7FYENwehXlg/kFVnos3rEvtKTjRvOWSzb6H4=
github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
<<<<<<< HEAD
=======
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/tklauser/go-sysconf v0.3.9/go.mod h1:11DU/5sG7UexIrp/O6g35hrWzu0JxlwQ3LSFUzyeuhs=
github.com/tklauser/numcpus v0.3.0/go.mod h1:yFGUr7TUHQRAhyqBcEg0Ge34zDBAsIvJJcyE6boqnA8=
github.com/urfave/cli/v2 v2.1.1 h1:Qt8FeAtxE/vfdrLmR3rxR6JRE0RoVmbXu8+6kZtYU4k=
>>>>>>> master
github.com/urfave/cli/v2 v2.1.1/go.mod h1:SE9GqnLQmjVa0iPEY0f1w3ygNIYcIJ0OKPMoW2caLfQ=
github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc=
github.com/valyala/fasttemplate v1.0.1/go.mod h1:UQGH1tvbgY+Nz5t2n7tXsz52dQxojPUpymEIMZ47gx8=
@ -119,6 +139,7 @@ github.com/valyala/fasttemplate v1.2.1/go.mod h1:KHLXt3tVN2HBp8eijSv/kGJopbvo7S+
github.com/vektah/dataloaden v0.2.1-0.20190515034641-a19b9a6e7c9e/go.mod h1:/HUdMve7rvxZma+2ZELQeNh88+003LL7Pf/CZ089j8U=
github.com/vektah/gqlparser/v2 v2.1.0 h1:uiKJ+T5HMGGQM2kRKQ8Pxw8+Zq9qhhZhz/lieYvCMns=
github.com/vektah/gqlparser/v2 v2.1.0/go.mod h1:SyUiHgLATUR8BiYURfTirrTcGpcE+4XkV2se04Px1Ms=
github.com/xlab/treeprint v1.1.0/go.mod h1:gj5Gd3gPdKtR1ikdDK6fnFLdmIS0X30kTTuNd/WEJu0=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20200604202706-70a84ac30bf9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
@ -139,8 +160,12 @@ golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190222072716-a9d3bda3a223/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
<<<<<<< HEAD
golang.org/x/sys v0.0.0-20190813064441-fde4db37ae7a/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
=======
golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
>>>>>>> master
golang.org/x/sys v0.0.0-20200116001909-b77594299b42/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
@ -149,7 +174,13 @@ golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
<<<<<<< HEAD
golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw=
=======
golang.org/x/sys v0.0.0-20210816074244-15123e1e1f71/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210902050250-f475640dd07b h1:S7hKs0Flbq0bbc9xgYt4stIEG1zNDFqyrPwAX2Wj/sE=
golang.org/x/sys v0.0.0-20210902050250-f475640dd07b/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
>>>>>>> master
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
@ -172,5 +203,7 @@ gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.3.0 h1:clyUAQHOM3G0M3f5vQj7LuJrETvjVot3Z5el9nffUtU=
gopkg.in/yaml.v2 v2.3.0/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
rsc.io/goversion v1.2.0/go.mod h1:Eih9y/uIBS3ulggl7KNJ09xGSLcuNaLgmvvqa07sgfo=
sourcegraph.com/sourcegraph/appdash v0.0.0-20180110180208-2cc67fd64755/go.mod h1:hI742Nqp5OhwiqlzhgfbWU4mW4yO10fP+LoT9WOswdU=
sourcegraph.com/sourcegraph/appdash-data v0.0.0-20151005221446-73f23eafcf67/go.mod h1:L5q+DGLGOQFpo1snNEkLOJT2d1YTW66rWNzatr3He1k=

View File

@ -61,6 +61,10 @@ models:
resolver: true
metaData:
resolver: true
Cluster:
fields:
partitions:
resolver: true
NullableFloat: { model: "github.com/ClusterCockpit/cc-backend/schema.Float" }
MetricScope: { model: "github.com/ClusterCockpit/cc-backend/schema.MetricScope" }
JobStatistics: { model: "github.com/ClusterCockpit/cc-backend/schema.JobStatistics" }

File diff suppressed because it is too large

View File

@ -9,7 +9,7 @@ type Cluster struct {
Name string `json:"name"`
MetricConfig []*MetricConfig `json:"metricConfig"`
FilterRanges *FilterRanges `json:"filterRanges"`
Partitions []*Partition `json:"partitions"`
SubClusters []*SubCluster `json:"subClusters"`
// NOT part of the GraphQL API. This has to be a JSON object with a field `"kind"`.
// All other fields depend on that kind (e.g. "cc-metric-store", "influxdb-v2").

View File

@ -33,6 +33,11 @@ type FloatRange struct {
To float64 `json:"to"`
}
type Footprints struct {
Nodehours []schema.Float `json:"nodehours"`
Metrics []*MetricFootprints `json:"metrics"`
}
type HistoPoint struct {
Count int `json:"count"`
Value int `json:"value"`
@ -95,6 +100,7 @@ type MetricConfig struct {
Name string `json:"name"`
Unit string `json:"unit"`
Scope schema.MetricScope `json:"scope"`
Aggregation *string `json:"aggregation"`
Timestep int `json:"timestep"`
Peak float64 `json:"peak"`
Normal float64 `json:"normal"`
@ -103,8 +109,8 @@ type MetricConfig struct {
}
type MetricFootprints struct {
Name string `json:"name"`
Footprints []schema.Float `json:"footprints"`
Metric string `json:"metric"`
Data []schema.Float `json:"data"`
}
type NodeMetrics struct {
@ -122,8 +128,16 @@ type PageRequest struct {
Page int `json:"page"`
}
type Partition struct {
type StringInput struct {
Eq *string `json:"eq"`
Contains *string `json:"contains"`
StartsWith *string `json:"startsWith"`
EndsWith *string `json:"endsWith"`
}
type SubCluster struct {
Name string `json:"name"`
Nodes string `json:"nodes"`
ProcessorType string `json:"processorType"`
SocketsPerNode int `json:"socketsPerNode"`
CoresPerSocket int `json:"coresPerSocket"`
@ -134,13 +148,6 @@ type Partition struct {
Topology *Topology `json:"topology"`
}
type StringInput struct {
Eq *string `json:"eq"`
Contains *string `json:"contains"`
StartsWith *string `json:"startsWith"`
EndsWith *string `json:"endsWith"`
}
type TimeRange struct {
From *time.Time `json:"from"`
To *time.Time `json:"to"`
@ -160,6 +167,12 @@ type Topology struct {
Accelerators []*Accelerator `json:"accelerators"`
}
type User struct {
Username string `json:"username"`
Name string `json:"name"`
Email string `json:"email"`
}
type Aggregate string
const (

View File

@ -11,8 +11,10 @@ type Job {
user: String!
project: String!
cluster: String!
subCluster: String!
startTime: Time!
duration: Int!
walltime: Int!
numNodes: Int!
numHWThreads: Int!
numAcc: Int!
@ -22,20 +24,24 @@ type Job {
arrayJobId: Int!
monitoringStatus: Int!
state: JobState!
metaData: Any
tags: [Tag!]!
resources: [Resource!]!
metaData: Any
userData: User
}
type Cluster {
name: String!
partitions: [String!]! # Slurm partitions
metricConfig: [MetricConfig!]!
filterRanges: FilterRanges!
partitions: [Partition!]!
subClusters: [SubCluster!]! # Hardware partitions/subclusters
}
type Partition {
type SubCluster {
name: String!
nodes: String!
processorType: String!
socketsPerNode: Int!
coresPerSocket: Int!
@ -65,6 +71,7 @@ type MetricConfig {
name: String!
unit: String!
scope: MetricScope!
aggregation: String
timestep: Int!
peak: Float!
normal: Float!
@ -118,8 +125,13 @@ type StatsSeries {
}
type MetricFootprints {
name: String!
footprints: [NullableFloat!]!
metric: String!
data: [NullableFloat!]!
}
type Footprints {
nodehours: [NullableFloat!]!
metrics: [MetricFootprints!]!
}
enum Aggregate { USER, PROJECT, CLUSTER }
@ -134,13 +146,21 @@ type Count {
count: Int!
}
type User {
username: String!
name: String!
email: String!
}
type Query {
clusters: [Cluster!]! # List of all clusters
tags: [Tag!]! # List of all tags
user(username: String!): User
job(id: ID!): Job
jobMetrics(id: ID!, metrics: [String!], scopes: [MetricScope!]): [JobMetricWithName!]!
jobsFootprints(filter: [JobFilter!], metrics: [String!]!): [MetricFootprints]!
jobsFootprints(filter: [JobFilter!], metrics: [String!]!): Footprints
jobs(filter: [JobFilter!], page: PageRequest, order: OrderByInput): JobResultList!
jobsStatistics(filter: [JobFilter!], groupBy: Aggregate): [JobsStatistics!]!
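To make the footprints change concrete, a query against the updated schema (a sketch derived only from the types above) now selects the shared node hours once plus one data array per requested metric:
```graphql
query {
  jobsFootprints(metrics: ["flops_any", "mem_bw"]) {
    nodehours
    metrics {
      metric
      data
    }
  }
}
```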

View File

@ -18,14 +18,22 @@ import (
"github.com/ClusterCockpit/cc-backend/schema"
)
func (r *jobResolver) MetaData(ctx context.Context, obj *schema.Job) (interface{}, error) {
return r.Repo.FetchMetadata(obj)
func (r *clusterResolver) Partitions(ctx context.Context, obj *model.Cluster) ([]string, error) {
return r.Repo.Partitions(obj.Name)
}
func (r *jobResolver) Tags(ctx context.Context, obj *schema.Job) ([]*schema.Tag, error) {
return r.Repo.GetTags(&obj.ID)
}
func (r *jobResolver) MetaData(ctx context.Context, obj *schema.Job) (interface{}, error) {
return r.Repo.FetchMetadata(obj)
}
func (r *jobResolver) UserData(ctx context.Context, obj *schema.Job) (*model.User, error) {
return auth.FetchUser(ctx, r.DB, obj.User)
}
func (r *mutationResolver) CreateTag(ctx context.Context, typeArg string, name string) (*schema.Tag, error) {
id, err := r.Repo.CreateTag(typeArg, name)
if err != nil {
@ -98,6 +106,10 @@ func (r *queryResolver) Tags(ctx context.Context) ([]*schema.Tag, error) {
return r.Repo.GetTags(nil)
}
func (r *queryResolver) User(ctx context.Context, username string) (*model.User, error) {
return auth.FetchUser(ctx, r.DB, username)
}
func (r *queryResolver) Job(ctx context.Context, id string) (*schema.Job, error) {
numericId, err := strconv.ParseInt(id, 10, 64)
if err != nil {
@ -144,7 +156,7 @@ func (r *queryResolver) JobMetrics(ctx context.Context, id string, metrics []str
return res, err
}
func (r *queryResolver) JobsFootprints(ctx context.Context, filter []*model.JobFilter, metrics []string) ([]*model.MetricFootprints, error) {
func (r *queryResolver) JobsFootprints(ctx context.Context, filter []*model.JobFilter, metrics []string) (*model.Footprints, error) {
return r.jobsFootprints(ctx, filter, metrics)
}
@ -204,7 +216,7 @@ func (r *queryResolver) NodeMetrics(ctx context.Context, cluster string, partiti
}
if metrics == nil {
for _, mc := range config.GetClusterConfig(cluster).MetricConfig {
for _, mc := range config.GetCluster(cluster).MetricConfig {
metrics = append(metrics, mc.Name)
}
}
@ -236,6 +248,9 @@ func (r *queryResolver) NodeMetrics(ctx context.Context, cluster string, partiti
return nodeMetrics, nil
}
// Cluster returns generated.ClusterResolver implementation.
func (r *Resolver) Cluster() generated.ClusterResolver { return &clusterResolver{r} }
// Job returns generated.JobResolver implementation.
func (r *Resolver) Job() generated.JobResolver { return &jobResolver{r} }
@ -245,6 +260,7 @@ func (r *Resolver) Mutation() generated.MutationResolver { return &mutationResol
// Query returns generated.QueryResolver implementation.
func (r *Resolver) Query() generated.QueryResolver { return &queryResolver{r} }
type clusterResolver struct{ *Resolver }
type jobResolver struct{ *Resolver }
type mutationResolver struct{ *Resolver }
type queryResolver struct{ *Resolver }

View File

@ -32,8 +32,8 @@ func (r *queryResolver) jobsStatistics(ctx context.Context, filter []*model.JobF
// `socketsPerNode` and `coresPerSocket` can differ from cluster to cluster, so we need to explicitly loop over those.
for _, cluster := range config.Clusters {
for _, partition := range cluster.Partitions {
corehoursCol := fmt.Sprintf("CAST(ROUND(SUM(job.duration * job.num_nodes * %d * %d) / 3600) as int)", partition.SocketsPerNode, partition.CoresPerSocket)
for _, subcluster := range cluster.SubClusters {
corehoursCol := fmt.Sprintf("CAST(ROUND(SUM(job.duration * job.num_nodes * %d * %d) / 3600) as int)", subcluster.SocketsPerNode, subcluster.CoresPerSocket)
var query sq.SelectBuilder
if groupBy == nil {
query = sq.Select(
@ -54,7 +54,7 @@ func (r *queryResolver) jobsStatistics(ctx context.Context, filter []*model.JobF
query = query.
Where("job.cluster = ?", cluster.Name).
Where("job.partition = ?", partition.Name)
Where("job.subcluster = ?", subcluster.Name)
query = repository.SecurityCheck(ctx, query)
for _, f := range filter {
@ -254,7 +254,7 @@ func (r *Resolver) rooflineHeatmap(ctx context.Context, filter []*model.JobFilte
}
// Helper function for the jobsFootprints GraphQL query placed here so that schema.resolvers.go is not too full.
func (r *queryResolver) jobsFootprints(ctx context.Context, filter []*model.JobFilter, metrics []string) ([]*model.MetricFootprints, error) {
func (r *queryResolver) jobsFootprints(ctx context.Context, filter []*model.JobFilter, metrics []string) (*model.Footprints, error) {
jobs, err := r.Repo.QueryJobs(ctx, filter, &model.PageRequest{Page: 1, ItemsPerPage: MAX_JOBS_FOR_ANALYSIS + 1}, nil)
if err != nil {
return nil, err
@ -268,19 +268,25 @@ func (r *queryResolver) jobsFootprints(ctx context.Context, filter []*model.JobF
avgs[i] = make([]schema.Float, 0, len(jobs))
}
nodehours := make([]schema.Float, 0, len(jobs))
for _, job := range jobs {
if err := metricdata.LoadAverages(job, metrics, avgs, ctx); err != nil {
return nil, err
}
nodehours = append(nodehours, schema.Float(float64(job.Duration)/60.0*float64(job.NumNodes)))
}
res := make([]*model.MetricFootprints, len(avgs))
for i, arr := range avgs {
res[i] = &model.MetricFootprints{
Name: metrics[i],
Footprints: arr,
Metric: metrics[i],
Data: arr,
}
}
return res, nil
return &model.Footprints{
Nodehours: nodehours,
Metrics: res,
}, nil
}

View File

@ -157,14 +157,14 @@ func GetStatistics(job *schema.Job) (map[string]schema.JobStatistics, error) {
// Writes a running job to the job-archive
func ArchiveJob(job *schema.Job, ctx context.Context) (*schema.JobMeta, error) {
allMetrics := make([]string, 0)
metricConfigs := config.GetClusterConfig(job.Cluster).MetricConfig
metricConfigs := config.GetCluster(job.Cluster).MetricConfig
for _, mc := range metricConfigs {
allMetrics = append(allMetrics, mc.Name)
}
// TODO: For now: Only single-node-jobs get archived in full resolution
// TODO: Talk about this! What resolutions to store data at...
scopes := []schema.MetricScope{schema.MetricScopeNode}
if job.NumNodes == 1 {
if job.NumNodes <= 8 {
scopes = append(scopes, schema.MetricScopeCore)
}

View File

@ -243,7 +243,7 @@ var (
func (ccms *CCMetricStore) buildQueries(job *schema.Job, metrics []string, scopes []schema.MetricScope) ([]ApiQuery, []schema.MetricScope, error) {
queries := make([]ApiQuery, 0, len(metrics)*len(scopes)*len(job.Resources))
topology := config.GetPartition(job.Cluster, job.Partition).Topology
topology := config.GetSubCluster(job.Cluster, job.SubCluster).Topology
assignedScope := []schema.MetricScope{}
for _, metric := range metrics {

View File

@ -72,7 +72,7 @@ var cache *lrucache.Cache = lrucache.New(512 * 1024 * 1024)
// Fetches the metric data for a job.
func LoadData(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context) (schema.JobData, error) {
data := cache.Get(cacheKey(job, metrics, scopes), func() (interface{}, time.Duration, int) {
data := cache.Get(cacheKey(job, metrics, scopes), func() (_ interface{}, ttl time.Duration, size int) {
var jd schema.JobData
var err error
if job.State == schema.JobStateRunning ||
@ -88,7 +88,7 @@ func LoadData(job *schema.Job, metrics []string, scopes []schema.MetricScope, ct
}
if metrics == nil {
cluster := config.GetClusterConfig(job.Cluster)
cluster := config.GetCluster(job.Cluster)
for _, mc := range cluster.MetricConfig {
metrics = append(metrics, mc.Name)
}
@ -102,30 +102,43 @@ func LoadData(job *schema.Job, metrics []string, scopes []schema.MetricScope, ct
return err, 0, 0
}
}
size = jd.Size()
} else {
jd, err = loadFromArchive(job)
if err != nil {
return err, 0, 0
}
// Avoid sending unrequested data to the client:
if metrics != nil {
res := schema.JobData{}
for _, metric := range metrics {
if metricdata, ok := jd[metric]; ok {
res[metric] = metricdata
if perscope, ok := jd[metric]; ok {
if len(scopes) > 1 {
subset := make(map[schema.MetricScope]*schema.JobMetric)
for _, scope := range scopes {
if jm, ok := perscope[scope]; ok {
subset[scope] = jm
}
}
perscope = subset
}
res[metric] = perscope
}
}
jd = res
}
size = 1 // loadFromArchive() caches in the same cache.
}
ttl := 5 * time.Hour
ttl = 5 * time.Hour
if job.State == schema.JobStateRunning {
ttl = 2 * time.Minute
}
prepareJobData(job, jd, scopes)
return jd, ttl, jd.Size()
return jd, ttl, size
})
if err, ok := data.(error); ok {
@ -176,7 +189,7 @@ func LoadNodeData(cluster, partition string, metrics, nodes []string, scopes []s
}
if metrics == nil {
for _, m := range config.GetClusterConfig(cluster).MetricConfig {
for _, m := range config.GetCluster(cluster).MetricConfig {
metrics = append(metrics, m.Name)
}
}

View File

@ -16,12 +16,12 @@ import (
)
const NamedJobInsert string = `INSERT INTO job (
job_id, user, project, cluster, ` + "`partition`" + `, array_job_id, num_nodes, num_hwthreads, num_acc,
exclusive, monitoring_status, smt, job_state, start_time, duration, resources, meta_data,
job_id, user, project, cluster, subcluster, ` + "`partition`" + `, array_job_id, num_nodes, num_hwthreads, num_acc,
exclusive, monitoring_status, smt, job_state, start_time, duration, walltime, resources, meta_data,
mem_used_max, flops_any_avg, mem_bw_avg, load_avg, net_bw_avg, net_data_vol_total, file_bw_avg, file_data_vol_total
) VALUES (
:job_id, :user, :project, :cluster, :partition, :array_job_id, :num_nodes, :num_hwthreads, :num_acc,
:exclusive, :monitoring_status, :smt, :job_state, :start_time, :duration, :resources, :meta_data,
:job_id, :user, :project, :cluster, :subcluster, :partition, :array_job_id, :num_nodes, :num_hwthreads, :num_acc,
:exclusive, :monitoring_status, :smt, :job_state, :start_time, :duration, :walltime, :resources, :meta_data,
:mem_used_max, :flops_any_avg, :mem_bw_avg, :load_avg, :net_bw_avg, :net_data_vol_total, :file_bw_avg, :file_data_vol_total
);`
@ -122,12 +122,13 @@ func (r *JobRepository) ImportJob(jobMeta *schema.JobMeta, jobData *schema.JobDa
return nil
}
// This function also sets the subcluster if necessary!
func SanityChecks(job *schema.BaseJob) error {
if c := config.GetClusterConfig(job.Cluster); c == nil {
if c := config.GetCluster(job.Cluster); c == nil {
return fmt.Errorf("no such cluster: %#v", job.Cluster)
}
if p := config.GetPartition(job.Cluster, job.Partition); p == nil {
return fmt.Errorf("no such partition: %#v (on cluster %#v)", job.Partition, job.Cluster)
if err := config.AssignSubCluster(job); err != nil {
return err
}
if !job.State.Valid() {
return fmt.Errorf("not a valid job state: %#v", job.State)

View File

@ -1,4 +1,4 @@
package main
package repository
import (
"bufio"
@ -9,14 +9,13 @@ import (
"time"
"github.com/ClusterCockpit/cc-backend/log"
"github.com/ClusterCockpit/cc-backend/repository"
"github.com/ClusterCockpit/cc-backend/schema"
"github.com/jmoiron/sqlx"
)
// `AUTO_INCREMENT` is in a comment because of this hack:
// https://stackoverflow.com/a/41028314 (SQLite creates unique IDs automatically)
const JOBS_DB_SCHEMA string = `
const JobsDBSchema string = `
DROP TABLE IF EXISTS jobtag;
DROP TABLE IF EXISTS job;
DROP TABLE IF EXISTS tag;
@ -25,13 +24,15 @@ const JOBS_DB_SCHEMA string = `
id INTEGER PRIMARY KEY /*!40101 AUTO_INCREMENT */,
job_id BIGINT NOT NULL,
cluster VARCHAR(255) NOT NULL,
subcluster VARCHAR(255) NOT NULL,
start_time BIGINT NOT NULL, -- Unix timestamp
user VARCHAR(255) NOT NULL,
project VARCHAR(255) NOT NULL,
` + "`partition`" + ` VARCHAR(255) NOT NULL, -- partition is a keyword in mysql -.-
array_job_id BIGINT NOT NULL,
duration INT,
duration INT NOT NULL DEFAULT 0,
walltime INT NOT NULL DEFAULT 0,
job_state VARCHAR(255) NOT NULL CHECK(job_state IN ('running', 'completed', 'failed', 'cancelled', 'stopped', 'timeout', 'preempted', 'out_of_memory')),
meta_data TEXT, -- JSON
resources TEXT NOT NULL, -- JSON
@ -66,7 +67,8 @@ const JOBS_DB_SCHEMA string = `
FOREIGN KEY (tag_id) REFERENCES tag (id) ON DELETE CASCADE);
`
const JOBS_DB_INDEXES string = `
// Indexes are created after the job-archive is traversed for faster inserts.
const JobsDbIndexes string = `
CREATE INDEX job_by_user ON job (user);
CREATE INDEX job_by_starttime ON job (start_time);
CREATE INDEX job_by_job_id ON job (job_id);
@ -75,12 +77,12 @@ const JOBS_DB_INDEXES string = `
// Delete the tables "job", "tag" and "jobtag" from the database and
// repopulate them using the jobs found in `archive`.
func initDB(db *sqlx.DB, archive string) error {
func InitDB(db *sqlx.DB, archive string) error {
starttime := time.Now()
log.Print("Building job table...")
// Basic database structure:
_, err := db.Exec(JOBS_DB_SCHEMA)
_, err := db.Exec(JobsDBSchema)
if err != nil {
return err
}
@ -94,16 +96,21 @@ func initDB(db *sqlx.DB, archive string) error {
return err
}
// Inserts are bundled into transactions because in SQLite,
// that speeds up inserts A LOT.
tx, err := db.Beginx()
if err != nil {
return err
}
stmt, err := tx.PrepareNamed(repository.NamedJobInsert)
stmt, err := tx.PrepareNamed(NamedJobInsert)
if err != nil {
return err
}
// Not using log.Print because we want the line to end with `\r` and
// this function is only ever called when a special command line flag
// is passed anyway.
fmt.Printf("%d jobs inserted...\r", 0)
i := 0
tags := make(map[string]int64)
@ -157,6 +164,8 @@ func initDB(db *sqlx.DB, archive string) error {
return err
}
// For compatibility with the old job-archive directory structure where
// there was no start time directory.
for _, startTimeDir := range startTimeDirs {
if startTimeDir.Type().IsRegular() && startTimeDir.Name() == "meta.json" {
if err := handleDirectory(dirpath); err != nil {
@ -178,7 +187,7 @@ func initDB(db *sqlx.DB, archive string) error {
// Create indexes after inserts so that they do not
// need to be continually updated.
if _, err := db.Exec(JOBS_DB_INDEXES); err != nil {
if _, err := db.Exec(JobsDbIndexes); err != nil {
return err
}
@ -224,7 +233,7 @@ func loadJob(tx *sqlx.Tx, stmt *sqlx.NamedStmt, tags map[string]int64, path stri
return err
}
if err := repository.SanityChecks(&job.BaseJob); err != nil {
if err := SanityChecks(&job.BaseJob); err != nil {
return err
}
@ -260,11 +269,3 @@ func loadJob(tx *sqlx.Tx, stmt *sqlx.NamedStmt, tags map[string]int64, path stri
return nil
}
func loadJobStat(job *schema.JobMeta, metric string) float64 {
if stats, ok := job.Statistics[metric]; ok {
return stats.Avg
}
return 0.0
}

View File

@ -13,6 +13,7 @@ import (
"github.com/ClusterCockpit/cc-backend/graph/model"
"github.com/ClusterCockpit/cc-backend/schema"
sq "github.com/Masterminds/squirrel"
"github.com/iamlouk/lrucache"
"github.com/jmoiron/sqlx"
)
@ -20,25 +21,27 @@ type JobRepository struct {
DB *sqlx.DB
stmtCache *sq.StmtCache
cache *lrucache.Cache
}
func (r *JobRepository) Init() error {
r.stmtCache = sq.NewStmtCache(r.DB)
r.cache = lrucache.New(1024 * 1024)
return nil
}
var jobColumns []string = []string{
"job.id", "job.job_id", "job.user", "job.project", "job.cluster", "job.start_time", "job.partition", "job.array_job_id",
"job.id", "job.job_id", "job.user", "job.project", "job.cluster", "job.subcluster", "job.start_time", "job.partition", "job.array_job_id",
"job.num_nodes", "job.num_hwthreads", "job.num_acc", "job.exclusive", "job.monitoring_status", "job.smt", "job.job_state",
"job.duration", "job.resources", // "job.meta_data",
"job.duration", "job.walltime", "job.resources", // "job.meta_data",
}
func scanJob(row interface{ Scan(...interface{}) error }) (*schema.Job, error) {
job := &schema.Job{}
if err := row.Scan(
&job.ID, &job.JobID, &job.User, &job.Project, &job.Cluster, &job.StartTimeUnix, &job.Partition, &job.ArrayJobId,
&job.ID, &job.JobID, &job.User, &job.Project, &job.Cluster, &job.SubCluster, &job.StartTimeUnix, &job.Partition, &job.ArrayJobId,
&job.NumNodes, &job.NumHWThreads, &job.NumAcc, &job.Exclusive, &job.MonitoringStatus, &job.SMT, &job.State,
&job.Duration, &job.RawResources /*&job.MetaData*/); err != nil {
&job.Duration, &job.Walltime, &job.RawResources /*&job.MetaData*/); err != nil {
return nil, err
}
@ -56,6 +59,12 @@ func scanJob(row interface{ Scan(...interface{}) error }) (*schema.Job, error) {
}
func (r *JobRepository) FetchMetadata(job *schema.Job) (map[string]string, error) {
cachekey := fmt.Sprintf("metadata:%d", job.ID)
if cached := r.cache.Get(cachekey, nil); cached != nil {
job.MetaData = cached.(map[string]string)
return job.MetaData, nil
}
if err := sq.Select("job.meta_data").From("job").Where("job.id = ?", job.ID).
RunWith(r.stmtCache).QueryRow().Scan(&job.RawMetaData); err != nil {
return nil, err
@ -69,9 +78,42 @@ func (r *JobRepository) FetchMetadata(job *schema.Job) (map[string]string, error
return nil, err
}
r.cache.Put(cachekey, job.MetaData, len(job.RawMetaData), 24*time.Hour)
return job.MetaData, nil
}
func (r *JobRepository) UpdateMetadata(job *schema.Job, key, val string) (err error) {
cachekey := fmt.Sprintf("metadata:%d", job.ID)
r.cache.Del(cachekey)
if job.MetaData == nil {
if _, err = r.FetchMetadata(job); err != nil {
return err
}
}
if job.MetaData != nil {
cpy := make(map[string]string, len(job.MetaData)+1)
for k, v := range job.MetaData {
cpy[k] = v
}
cpy[key] = val
job.MetaData = cpy
} else {
job.MetaData = map[string]string{key: val}
}
if job.RawMetaData, err = json.Marshal(job.MetaData); err != nil {
return err
}
if _, err = sq.Update("job").Set("meta_data", job.RawMetaData).Where("job.id = ?", job.ID).RunWith(r.stmtCache).Exec(); err != nil {
return err
}
r.cache.Put(cachekey, job.MetaData, len(job.RawMetaData), 24*time.Hour)
return nil
}
// Find executes a SQL query to find a specific batch job.
// The job is queried using the batch job id, the cluster name,
// and the start time of the job in UNIX epoch time seconds.
@ -120,11 +162,11 @@ func (r *JobRepository) Start(job *schema.JobMeta) (id int64, err error) {
}
res, err := r.DB.NamedExec(`INSERT INTO job (
job_id, user, project, cluster, `+"`partition`"+`, array_job_id, num_nodes, num_hwthreads, num_acc,
exclusive, monitoring_status, smt, job_state, start_time, duration, resources, meta_data
job_id, user, project, cluster, subcluster, `+"`partition`"+`, array_job_id, num_nodes, num_hwthreads, num_acc,
exclusive, monitoring_status, smt, job_state, start_time, duration, walltime, resources, meta_data
) VALUES (
:job_id, :user, :project, :cluster, :partition, :array_job_id, :num_nodes, :num_hwthreads, :num_acc,
:exclusive, :monitoring_status, :smt, :job_state, :start_time, :duration, :resources, :meta_data
:job_id, :user, :project, :cluster, :subcluster, :partition, :array_job_id, :num_nodes, :num_hwthreads, :num_acc,
:exclusive, :monitoring_status, :smt, :job_state, :start_time, :duration, :walltime, :resources, :meta_data
);`, job)
if err != nil {
return -1, err
@ -260,3 +302,19 @@ func (r *JobRepository) FindJobOrUser(ctx context.Context, searchterm string) (j
return 0, "", ErrNotFound
}
func (r *JobRepository) Partitions(cluster string) ([]string, error) {
var err error
partitions := r.cache.Get("partitions:"+cluster, func() (interface{}, time.Duration, int) {
parts := []string{}
if err = r.DB.Select(&parts, `SELECT DISTINCT job.partition FROM job WHERE job.cluster = ?;`, cluster); err != nil {
return nil, 0, 1000
}
return parts, 1 * time.Hour, 1
})
if err != nil {
return nil, err
}
return partitions.([]string), nil
}

View File

@ -5,14 +5,17 @@ import (
"testing"
"github.com/jmoiron/sqlx"
"github.com/ClusterCockpit/cc-backend/test"
_ "github.com/mattn/go-sqlite3"
)
var db *sqlx.DB
func init() {
db = test.InitDB()
var err error
db, err = sqlx.Open("sqlite3", "../test/test.db")
if err != nil {
fmt.Println(err)
}
}
func setup(t *testing.T) *JobRepository {

routes.go (157 changed lines)
View File

@ -9,6 +9,9 @@ import (
"github.com/ClusterCockpit/cc-backend/auth"
"github.com/ClusterCockpit/cc-backend/config"
"github.com/ClusterCockpit/cc-backend/graph"
"github.com/ClusterCockpit/cc-backend/graph/model"
"github.com/ClusterCockpit/cc-backend/log"
"github.com/ClusterCockpit/cc-backend/schema"
"github.com/ClusterCockpit/cc-backend/templates"
"github.com/gorilla/mux"
@ -24,6 +27,136 @@ type Route struct {
Setup func(i InfoType, r *http.Request) InfoType
}
var routes []Route = []Route{
{"/", "home.tmpl", "ClusterCockpit", false, setupHomeRoute},
{"/config", "config.tmpl", "Settings", false, func(i InfoType, r *http.Request) InfoType { return i }},
{"/monitoring/jobs/", "monitoring/jobs.tmpl", "Jobs - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { return i }},
{"/monitoring/job/{id:[0-9]+}", "monitoring/job.tmpl", "Job <ID> - ClusterCockpit", false, setupJobRoute},
{"/monitoring/users/", "monitoring/list.tmpl", "Users - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { i["listType"] = "USER"; return i }},
{"/monitoring/projects/", "monitoring/list.tmpl", "Projects - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { i["listType"] = "PROJECT"; return i }},
{"/monitoring/tags/", "monitoring/taglist.tmpl", "Tags - ClusterCockpit", false, setupTaglistRoute},
{"/monitoring/user/{id}", "monitoring/user.tmpl", "User <ID> - ClusterCockpit", true, setupUserRoute},
{"/monitoring/systems/{cluster}", "monitoring/systems.tmpl", "Cluster <ID> - ClusterCockpit", false, setupClusterRoute},
{"/monitoring/node/{cluster}/{hostname}", "monitoring/node.tmpl", "Node <ID> - ClusterCockpit", false, setupNodeRoute},
{"/monitoring/analysis/{cluster}", "monitoring/analysis.tmpl", "Analysis - ClusterCockpit", true, setupAnalysisRoute},
}
func setupHomeRoute(i InfoType, r *http.Request) InfoType {
type cluster struct {
Name string
RunningJobs int
TotalJobs int
RecentShortJobs int
}
runningJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, []*model.JobFilter{{
State: []schema.JobState{schema.JobStateRunning},
}}, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
runningJobs = map[string]int{}
}
totalJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, nil, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
totalJobs = map[string]int{}
}
from := time.Now().Add(-24 * time.Hour)
recentShortJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, []*model.JobFilter{{
StartTime: &model.TimeRange{From: &from, To: nil},
Duration: &model.IntRange{From: 0, To: graph.ShortJobDuration},
}}, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
recentShortJobs = map[string]int{}
}
clusters := make([]cluster, 0)
for _, c := range config.Clusters {
clusters = append(clusters, cluster{
Name: c.Name,
RunningJobs: runningJobs[c.Name],
TotalJobs: totalJobs[c.Name],
RecentShortJobs: recentShortJobs[c.Name],
})
}
i["clusters"] = clusters
return i
}
func setupJobRoute(i InfoType, r *http.Request) InfoType {
i["id"] = mux.Vars(r)["id"]
return i
}
func setupUserRoute(i InfoType, r *http.Request) InfoType {
username := mux.Vars(r)["id"]
i["id"] = username
i["username"] = username
if user, _ := auth.FetchUser(r.Context(), jobRepo.DB, username); user != nil {
i["name"] = user.Name
i["email"] = user.Email
}
return i
}
func setupClusterRoute(i InfoType, r *http.Request) InfoType {
vars := mux.Vars(r)
i["id"] = vars["cluster"]
i["cluster"] = vars["cluster"]
from, to := r.URL.Query().Get("from"), r.URL.Query().Get("to")
if from != "" || to != "" {
i["from"] = from
i["to"] = to
}
return i
}
func setupNodeRoute(i InfoType, r *http.Request) InfoType {
vars := mux.Vars(r)
i["cluster"] = vars["cluster"]
i["hostname"] = vars["hostname"]
from, to := r.URL.Query().Get("from"), r.URL.Query().Get("to")
if from != "" || to != "" {
i["from"] = from
i["to"] = to
}
return i
}
func setupAnalysisRoute(i InfoType, r *http.Request) InfoType {
i["cluster"] = mux.Vars(r)["cluster"]
return i
}
func setupTaglistRoute(i InfoType, r *http.Request) InfoType {
var username *string = nil
if user := auth.GetUser(r.Context()); user != nil && !user.HasRole(auth.RoleAdmin) {
username = &user.Username
}
tags, counts, err := jobRepo.CountTags(username)
tagMap := make(map[string][]map[string]interface{})
if err != nil {
log.Errorf("GetTags failed: %s", err.Error())
i["tagmap"] = tagMap
return i
}
for _, tag := range tags {
tagItem := map[string]interface{}{
"id": tag.ID,
"name": tag.Name,
"count": counts[tag.Name],
}
tagMap[tag.Type] = append(tagMap[tag.Type], tagItem)
}
i["tagmap"] = tagMap
return i
}
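
`setupTaglistRoute` groups tags by their type for the template; the resulting structure looks roughly like this (type, name, and count values made up, borrowing the names used in the test file later in this diff):

```go
// Illustrative only: the tagmap structure built above, with made-up values.
tagMap := map[string][]map[string]interface{}{
	"testTagType": {
		{"id": int64(1), "name": "testTagName", "count": 2},
	},
}
_ = tagMap
```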
func buildFilterPresets(query url.Values) map[string]interface{} {
filterPresets := map[string]interface{}{}
@ -41,8 +174,8 @@ func buildFilterPresets(query url.Values) map[string]interface{} {
filterPresets["user"] = query.Get("user")
filterPresets["userMatch"] = "eq"
}
if query.Get("state") != "" && schema.JobState(query.Get("state")).Valid() {
filterPresets["state"] = query.Get("state")
if len(query["state"]) != 0 {
filterPresets["state"] = query["state"]
}
if rawtags, ok := query["tag"]; ok {
tags := make([]int, len(rawtags))
@ -55,6 +188,16 @@ func buildFilterPresets(query url.Values) map[string]interface{} {
}
filterPresets["tags"] = tags
}
if query.Get("duration") != "" {
parts := strings.Split(query.Get("duration"), "-")
if len(parts) == 2 {
a, e1 := strconv.Atoi(parts[0])
b, e2 := strconv.Atoi(parts[1])
if e1 == nil && e2 == nil {
filterPresets["duration"] = map[string]int{"from": a, "to": b}
}
}
}
if query.Get("numNodes") != "" {
parts := strings.Split(query.Get("numNodes"), "-")
if len(parts) == 2 {
@ -65,6 +208,16 @@ func buildFilterPresets(query url.Values) map[string]interface{} {
}
}
}
if query.Get("numAccelerators") != "" {
parts := strings.Split(query.Get("numAccelerators"), "-")
if len(parts) == 2 {
a, e1 := strconv.Atoi(parts[0])
b, e2 := strconv.Atoi(parts[1])
if e1 == nil && e2 == nil {
filterPresets["numAccelerators"] = map[string]int{"from": a, "to": b}
}
}
}
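
The `duration`, `numNodes`, and `numAccelerators` presets above all parse the same `<from>-<to>` query format; a helper like the following could factor that out (a sketch, not part of this commit):

```go
// Hypothetical helper factoring out the repeated "<from>-<to>" parsing above.
// Returns ok == false for any malformed input, mirroring the silent skip above.
func parseRangePreset(raw string) (map[string]int, bool) {
	parts := strings.Split(raw, "-")
	if len(parts) != 2 {
		return nil, false
	}
	from, err1 := strconv.Atoi(parts[0])
	to, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return nil, false
	}
	return map[string]int{"from": from, "to": to}, true
}
```

Each preset block would then shrink to `if r, ok := parseRangePreset(query.Get("duration")); ok { filterPresets["duration"] = r }`.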
if query.Get("jobId") != "" {
filterPresets["jobId"] = query.Get("jobId")
}

View File

@ -12,6 +12,9 @@ import (
"syscall"
)
// Very simple and limited .env file reader.
// All variable definitions found are directly
// added to the process's environment.
func loadEnv(file string) error {
f, err := os.Open(file)
if err != nil {
@ -74,6 +77,10 @@ func loadEnv(file string) error {
return s.Err()
}
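
Most of `loadEnv`'s body is elided by this diff; the behavior the comment describes boils down to the following pattern (a simplified sketch assuming the usual `bufio`/`fmt`/`os`/`strings` imports — the real reader may handle additional syntax such as quoting):

```go
// Simplified sketch of a line-based .env reader; the actual loadEnv
// shown (partially) above may differ in detail.
func loadEnvSketch(file string) error {
	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		line := strings.TrimSpace(s.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			return fmt.Errorf("malformed line: %q", line)
		}
		// Definitions go straight into the process environment.
		if err := os.Setenv(parts[0], parts[1]); err != nil {
			return err
		}
	}
	return s.Err()
}
```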
// Changes the process's user and group to those
// specified in the config.json. The Go runtime
// takes care of applying the underlying system call
// to all threads, not only the calling one.
func dropPrivileges() error {
if programConfig.Group != "" {
g, err := user.LookupGroup(programConfig.Group)
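The diff truncates `dropPrivileges` after the group lookup; the pattern the comment describes typically continues by converting the looked-up IDs and issuing the raw syscalls. A self-contained sketch (using `os/user`, `strconv`, and `syscall`; function and parameter names are placeholders, and a `User` config field is assumed), dropping the group before the user since a process that has already dropped its uid may no longer change its gid:

```go
// Sketch of the privilege-dropping pattern; names are placeholders.
// Since Go 1.16, syscall.Setgid/Setuid on Linux are applied to all
// runtime threads, as the comment above notes.
func dropTo(username, groupname string) error {
	if groupname != "" {
		g, err := user.LookupGroup(groupname)
		if err != nil {
			return err
		}
		gid, err := strconv.Atoi(g.Gid)
		if err != nil {
			return err
		}
		if err := syscall.Setgid(gid); err != nil { // group before user
			return err
		}
	}
	if username != "" {
		u, err := user.Lookup(username)
		if err != nil {
			return err
		}
		uid, err := strconv.Atoi(u.Uid)
		if err != nil {
			return err
		}
		if err := syscall.Setuid(uid); err != nil {
			return err
		}
	}
	return nil
}
```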

View File

@ -14,6 +14,7 @@ type BaseJob struct {
User string `json:"user" db:"user"`
Project string `json:"project" db:"project"`
Cluster string `json:"cluster" db:"cluster"`
SubCluster string `json:"subCluster" db:"subcluster"`
Partition string `json:"partition" db:"partition"`
ArrayJobId int32 `json:"arrayJobId" db:"array_job_id"`
NumNodes int32 `json:"numNodes" db:"num_nodes"`
@ -24,6 +25,7 @@ type BaseJob struct {
SMT int32 `json:"smt" db:"smt"`
State JobState `json:"jobState" db:"job_state"`
Duration int32 `json:"duration" db:"duration"`
Walltime int64 `json:"walltime" db:"walltime"`
Tags []*Tag `json:"tags"`
RawResources []byte `json:"-" db:"resources"`
Resources []*Resource `json:"resources"`
@ -54,7 +56,6 @@ type Job struct {
type JobMeta struct {
ID *int64 `json:"id,omitempty"` // never used in the job-archive, only available via REST-API
BaseJob
Walltime int64 `json:"walltime"` // TODO: Missing in DB
StartTime int64 `json:"startTime" db:"start_time"`
Statistics map[string]JobStatistics `json:"statistics,omitempty"`
}
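
Given the struct tags above, a job-archive meta record decodes directly into `JobMeta` — an illustrative fragment using only fields visible in this diff (values made up):

```go
// Illustrative only: decoding a minimal, made-up meta.json fragment
// into schema.JobMeta via the struct tags shown above.
raw := []byte(`{
	"user":       "testuser",
	"cluster":    "testcluster",
	"subCluster": "sc1",
	"numNodes":   1,
	"walltime":   86400,
	"startTime":  123456789
}`)
var meta schema.JobMeta
if err := json.Unmarshal(raw, &meta); err != nil {
	log.Fatal(err)
}
fmt.Println(meta.Cluster, meta.StartTime) // testcluster 123456789
```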

224
server.go
View File

@ -25,12 +25,11 @@ import (
"github.com/ClusterCockpit/cc-backend/config"
"github.com/ClusterCockpit/cc-backend/graph"
"github.com/ClusterCockpit/cc-backend/graph/generated"
"github.com/ClusterCockpit/cc-backend/graph/model"
"github.com/ClusterCockpit/cc-backend/log"
"github.com/ClusterCockpit/cc-backend/metricdata"
"github.com/ClusterCockpit/cc-backend/repository"
"github.com/ClusterCockpit/cc-backend/schema"
"github.com/ClusterCockpit/cc-backend/templates"
"github.com/google/gops/agent"
"github.com/gorilla/handlers"
"github.com/gorilla/mux"
"github.com/jmoiron/sqlx"
@ -39,7 +38,6 @@ import (
_ "github.com/mattn/go-sqlite3"
)
var db *sqlx.DB
var jobRepo *repository.JobRepository
// Format of the configuration (file). See below for the defaults.
@ -83,6 +81,10 @@ type ProgramConfig struct {
HttpsCertFile string `json:"https-cert-file"`
HttpsKeyFile string `json:"https-key-file"`
// If not the empty string and `addr` does not end in ":80",
// redirect every request arriving at port 80 to that URL.
RedirectHttpTo string `json:"redirect-http-to"`
// If overwritten, at least all the options in the defaults below must
// be provided! Most options here can be overwritten by the user.
UiDefaults map[string]interface{} `json:"ui-defaults"`
@ -102,8 +104,6 @@ var programConfig ProgramConfig = ProgramConfig{
LdapConfig: nil,
SessionMaxAge: "168h",
JwtMaxAge: "0",
HttpsCertFile: "",
HttpsKeyFile: "",
UiDefaults: map[string]interface{}{
"analysis_view_histogramMetrics": []string{"flops_any", "mem_bw", "mem_used"},
"analysis_view_scatterPlotMetrics": [][]string{{"flops_any", "mem_bw"}, {"flops_any", "cpu_load"}, {"cpu_load", "mem_bw"}},
@ -112,7 +112,7 @@ var programConfig ProgramConfig = ProgramConfig{
"job_view_selectedMetrics": []string{"flops_any", "mem_bw", "mem_used"},
"plot_general_colorBackground": true,
"plot_general_colorscheme": []string{"#00bfff", "#0000ff", "#ff00ff", "#ff0000", "#ff8000", "#ffff00", "#80ff00"},
"plot_general_lineWidth": 1,
"plot_general_lineWidth": 3,
"plot_list_hideShortRunningJobs": 5 * 60,
"plot_list_jobsPerPage": 10,
"plot_list_selectedMetrics": []string{"cpu_load", "ipc", "mem_used", "flops_any", "mem_bw"},
@ -124,146 +124,28 @@ var programConfig ProgramConfig = ProgramConfig{
},
}
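
A minimal `config.json` overriding a few of these defaults might look like the following; values are made up, `"https-cert-file"`, `"https-key-file"`, and `"redirect-http-to"` come from the struct tags in this diff, while the `"addr"` option name is an assumption:

```go
// Illustrative config.json fragment as a Go raw string (values made up).
const exampleConfig = `{
	"addr": "0.0.0.0:443",
	"https-cert-file": "/etc/letsencrypt/live/example.com/fullchain.pem",
	"https-key-file": "/etc/letsencrypt/live/example.com/privkey.pem",
	"redirect-http-to": "https://example.com"
}`
```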
func setupHomeRoute(i InfoType, r *http.Request) InfoType {
type cluster struct {
Name string
RunningJobs int
TotalJobs int
RecentShortJobs int
}
runningJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, []*model.JobFilter{{
State: []schema.JobState{schema.JobStateRunning},
}}, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
runningJobs = map[string]int{}
}
totalJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, nil, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
totalJobs = map[string]int{}
}
from := time.Now().Add(-24 * time.Hour)
recentShortJobs, err := jobRepo.CountGroupedJobs(r.Context(), model.AggregateCluster, []*model.JobFilter{{
StartTime: &model.TimeRange{From: &from, To: nil},
Duration: &model.IntRange{From: 0, To: graph.ShortJobDuration},
}}, nil)
if err != nil {
log.Errorf("failed to count jobs: %s", err.Error())
recentShortJobs = map[string]int{}
}
clusters := make([]cluster, 0)
for _, c := range config.Clusters {
clusters = append(clusters, cluster{
Name: c.Name,
RunningJobs: runningJobs[c.Name],
TotalJobs: totalJobs[c.Name],
RecentShortJobs: recentShortJobs[c.Name],
})
}
i["clusters"] = clusters
return i
}
func setupJobRoute(i InfoType, r *http.Request) InfoType {
i["id"] = mux.Vars(r)["id"]
return i
}
func setupUserRoute(i InfoType, r *http.Request) InfoType {
i["id"] = mux.Vars(r)["id"]
i["username"] = mux.Vars(r)["id"]
return i
}
func setupClusterRoute(i InfoType, r *http.Request) InfoType {
vars := mux.Vars(r)
i["id"] = vars["cluster"]
i["cluster"] = vars["cluster"]
from, to := r.URL.Query().Get("from"), r.URL.Query().Get("to")
if from != "" || to != "" {
i["from"] = from
i["to"] = to
}
return i
}
func setupNodeRoute(i InfoType, r *http.Request) InfoType {
vars := mux.Vars(r)
i["cluster"] = vars["cluster"]
i["hostname"] = vars["hostname"]
from, to := r.URL.Query().Get("from"), r.URL.Query().Get("to")
if from != "" || to != "" {
i["from"] = from
i["to"] = to
}
return i
}
func setupAnalysisRoute(i InfoType, r *http.Request) InfoType {
i["cluster"] = mux.Vars(r)["cluster"]
return i
}
func setupTaglistRoute(i InfoType, r *http.Request) InfoType {
var username *string = nil
if user := auth.GetUser(r.Context()); user != nil && !user.HasRole(auth.RoleAdmin) {
username = &user.Username
}
tags, counts, err := jobRepo.CountTags(username)
tagMap := make(map[string][]map[string]interface{})
if err != nil {
log.Errorf("GetTags failed: %s", err.Error())
i["tagmap"] = tagMap
return i
}
for _, tag := range tags {
tagItem := map[string]interface{}{
"id": tag.ID,
"name": tag.Name,
"count": counts[tag.Name],
}
tagMap[tag.Type] = append(tagMap[tag.Type], tagItem)
}
log.Infof("TAGS %+v", tags)
i["tagmap"] = tagMap
return i
}
var routes []Route = []Route{
{"/", "home.tmpl", "ClusterCockpit", false, setupHomeRoute},
{"/config", "config.tmpl", "Settings", false, func(i InfoType, r *http.Request) InfoType { return i }},
{"/monitoring/jobs/", "monitoring/jobs.tmpl", "Jobs - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { return i }},
{"/monitoring/job/{id:[0-9]+}", "monitoring/job.tmpl", "Job <ID> - ClusterCockpit", false, setupJobRoute},
{"/monitoring/users/", "monitoring/list.tmpl", "Users - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { i["listType"] = "USER"; return i }},
{"/monitoring/projects/", "monitoring/list.tmpl", "Projects - ClusterCockpit", true, func(i InfoType, r *http.Request) InfoType { i["listType"] = "PROJECT"; return i }},
{"/monitoring/tags/", "monitoring/taglist.tmpl", "Tags - ClusterCockpit", false, setupTaglistRoute},
{"/monitoring/user/{id}", "monitoring/user.tmpl", "User <ID> - ClusterCockpit", true, setupUserRoute},
{"/monitoring/systems/{cluster}", "monitoring/systems.tmpl", "Cluster <ID> - ClusterCockpit", false, setupClusterRoute},
{"/monitoring/node/{cluster}/{hostname}", "monitoring/node.tmpl", "Node <ID> - ClusterCockpit", false, setupNodeRoute},
{"/monitoring/analysis/{cluster}", "monitoring/analysis.tmpl", "Analaysis - ClusterCockpit", true, setupAnalysisRoute},
}
func main() {
var flagReinitDB, flagStopImmediately, flagSyncLDAP bool
var flagReinitDB, flagStopImmediately, flagSyncLDAP, flagGops bool
var flagConfigFile, flagImportJob string
var flagNewUser, flagDelUser, flagGenJWT string
flag.BoolVar(&flagReinitDB, "init-db", false, "Go through job-archive and re-initialize `job`, `tag`, and `jobtag` tables")
flag.BoolVar(&flagSyncLDAP, "sync-ldap", false, "Sync the `user` table with ldap")
flag.BoolVar(&flagReinitDB, "init-db", false, "Go through job-archive and re-initialize the 'job', 'tag', and 'jobtag' tables (all running jobs will be lost!)")
flag.BoolVar(&flagSyncLDAP, "sync-ldap", false, "Sync the 'user' table with ldap")
flag.BoolVar(&flagStopImmediately, "no-server", false, "Do not start a server, stop right after initialization and argument handling")
flag.StringVar(&flagConfigFile, "config", "", "Location of the config file for this server (overwrites the defaults)")
flag.BoolVar(&flagGops, "gops", false, "Listen via github.com/google/gops/agent (for debugging)")
flag.StringVar(&flagConfigFile, "config", "", "Overwrite the global config options by those specified in `config.json`")
flag.StringVar(&flagNewUser, "add-user", "", "Add a new user. Argument format: `<username>:[admin,api,user]:<password>`")
flag.StringVar(&flagDelUser, "del-user", "", "Remove user by username")
flag.StringVar(&flagGenJWT, "jwt", "", "Generate and print a JWT for the user specified by the username")
flag.StringVar(&flagDelUser, "del-user", "", "Remove user by `username`")
flag.StringVar(&flagGenJWT, "jwt", "", "Generate and print a JWT for the user specified by its `username`")
flag.StringVar(&flagImportJob, "import-job", "", "Import a job. Argument format: `<path-to-meta.json>:<path-to-data.json>,...`")
flag.Parse()
// See https://github.com/google/gops (Runtime overhead is almost zero)
if flagGops {
if err := agent.Listen(agent.Options{}); err != nil {
log.Fatalf("gops/agent.Listen failed: %s", err.Error())
}
}
if err := loadEnv("./.env"); err != nil && !os.IsNotExist(err) {
log.Fatalf("parsing './.env' file failed: %s", err.Error())
}
@ -281,18 +163,24 @@ func main() {
}
}
	// As a special case for `db`, allow using an environment variable instead of the value
	// stored in the config. This is useful for users with security concerns about storing
	// the password for their MySQL database in the config.json.
if strings.HasPrefix(programConfig.DB, "env:") {
envvar := strings.TrimPrefix(programConfig.DB, "env:")
programConfig.DB = os.Getenv(envvar)
}
var err error
var db *sqlx.DB
if programConfig.DBDriver == "sqlite3" {
db, err = sqlx.Open("sqlite3", fmt.Sprintf("%s?_foreign_keys=on", programConfig.DB))
if err != nil {
log.Fatal(err)
}
		// SQLite does not handle concurrent writes. Having more than one connection
		// open would just mean waiting for locks.
db.SetMaxOpenConns(1)
} else if programConfig.DBDriver == "mysql" {
db, err = sqlx.Open("mysql", fmt.Sprintf("%s?multiStatements=true", programConfig.DB))
@ -307,7 +195,9 @@ func main() {
log.Fatalf("unsupported database driver: %s", programConfig.DBDriver)
}
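
For reference, the two driver branches expect differently shaped DSNs — illustrative values, not taken from this commit:

```go
// Illustrative DSNs for the two supported drivers (values made up):
// sqlite3: a file path; the branch above appends "?_foreign_keys=on".
dbPathExample := "./var/job.db"
// mysql: a go-sql-driver DSN; "?multiStatements=true" is appended above.
mysqlDSNExample := "ccuser:secret@tcp(localhost:3306)/clustercockpit"
_, _ = dbPathExample, mysqlDSNExample
```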
// Initialize sub-modules...
// Initialize sub-modules and handle all command line flags.
// The order here is important! For example, the metricdata package
// depends on the config package.
var authentication *auth.Authentication
if !programConfig.DisableAuthentication {
@ -370,7 +260,7 @@ func main() {
}
if flagReinitDB {
if err := initDB(db, programConfig.JobArchive); err != nil {
if err := repository.InitDB(db, programConfig.JobArchive); err != nil {
log.Fatal(err)
}
}
@ -390,11 +280,13 @@ func main() {
return
}
// Build routes...
// Setup the http.Handler/Router used by the server
resolver := &graph.Resolver{DB: db, Repo: jobRepo}
graphQLEndpoint := handler.NewDefaultServer(generated.NewExecutableSchema(generated.Config{Resolvers: resolver}))
if os.Getenv("DEBUG") != "1" {
		// Having this handler means that an error message is returned via GraphQL instead of the connection simply being closed.
		// The downside is that no stack trace is printed to stderr anymore.
graphQLEndpoint.SetRecoverFunc(func(ctx context.Context, err interface{}) error {
switch e := err.(type) {
case string:
@ -407,7 +299,6 @@ func main() {
})
}
graphQLPlayground := playground.Handler("GraphQL playground", "/query")
api := &api.RestApi{
JobRepository: jobRepo,
Resolver: resolver,
@ -415,33 +306,21 @@ func main() {
Authentication: authentication,
}
handleGetLogin := func(rw http.ResponseWriter, r *http.Request) {
templates.Render(rw, r, "login.tmpl", &templates.Page{
Title: "Login",
})
}
r := mux.NewRouter()
r.NotFoundHandler = http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
templates.Render(rw, r, "404.tmpl", &templates.Page{
Title: "Not found",
})
})
r.Handle("/playground", graphQLPlayground)
r.HandleFunc("/login", handleGetLogin).Methods(http.MethodGet)
r.HandleFunc("/login", func(rw http.ResponseWriter, r *http.Request) {
templates.Render(rw, r, "login.tmpl", &templates.Page{Title: "Login"})
}).Methods(http.MethodGet)
r.HandleFunc("/imprint", func(rw http.ResponseWriter, r *http.Request) {
templates.Render(rw, r, "imprint.tmpl", &templates.Page{
Title: "Imprint",
})
templates.Render(rw, r, "imprint.tmpl", &templates.Page{Title: "Imprint"})
})
r.HandleFunc("/privacy", func(rw http.ResponseWriter, r *http.Request) {
templates.Render(rw, r, "privacy.tmpl", &templates.Page{
Title: "Privacy",
})
templates.Render(rw, r, "privacy.tmpl", &templates.Page{Title: "Privacy"})
})
	// Some routes, such as /login or /query, should only be accessible to a user that is logged in.
	// Those should be mounted to this subrouter. If authentication is enabled, a middleware will prevent
	// unauthenticated access.
secured := r.PathPrefix("/").Subrouter()
if !programConfig.DisableAuthentication {
r.Handle("/login", authentication.Login(
@ -480,8 +359,11 @@ func main() {
})
})
}
r.Handle("/playground", playground.Handler("GraphQL playground", "/query"))
secured.Handle("/query", graphQLEndpoint)
	// Receive a searchId query parameter and reply with a redirect to a matching user or job.
secured.HandleFunc("/search", func(rw http.ResponseWriter, r *http.Request) {
if search := r.URL.Query().Get("searchId"); search != "" {
job, username, err := api.JobRepository.FindJobOrUser(r.Context(), search)
@ -505,6 +387,7 @@ func main() {
}
})
// Mount all /monitoring/... and /api/... routes.
setupRoutes(secured, routes)
api.MountRoutes(secured)
@ -515,11 +398,18 @@ func main() {
handlers.AllowedHeaders([]string{"X-Requested-With", "Content-Type", "Authorization"}),
handlers.AllowedMethods([]string{"GET", "POST", "HEAD", "OPTIONS"}),
handlers.AllowedOrigins([]string{"*"})))
handler := handlers.CustomLoggingHandler(log.InfoWriter, r, func(w io.Writer, params handlers.LogFormatterParams) {
log.Finfof(w, "%s %s (%d, %.02fkb, %dms)",
handler := handlers.CustomLoggingHandler(io.Discard, r, func(_ io.Writer, params handlers.LogFormatterParams) {
if strings.HasPrefix(params.Request.RequestURI, "/api/") {
log.Infof("%s %s (%d, %.02fkb, %dms)",
params.Request.Method, params.URL.RequestURI(),
params.StatusCode, float32(params.Size)/1024,
time.Since(params.TimeStamp).Milliseconds())
} else {
log.Debugf("%s %s (%d, %.02fkb, %dms)",
params.Request.Method, params.URL.RequestURI(),
params.StatusCode, float32(params.Size)/1024,
time.Since(params.TimeStamp).Milliseconds())
}
})
var wg sync.WaitGroup
@ -537,6 +427,12 @@ func main() {
log.Fatal(err)
}
if !strings.HasSuffix(programConfig.Addr, ":80") && programConfig.RedirectHttpTo != "" {
go func() {
http.ListenAndServe(":80", http.RedirectHandler(programConfig.RedirectHttpTo, http.StatusMovedPermanently))
}()
}
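
Note that `http.RedirectHandler` sends every port-80 request to the single `RedirectHttpTo` URL, dropping the original path. If path-preserving redirects were wanted, a variant like this sketch would do it (not what this commit implements; it assumes `RedirectHttpTo` is a bare origin such as `https://example.com`):

```go
// Alternative sketch: preserve the request path when redirecting.
go func() {
	err := http.ListenAndServe(":80", http.HandlerFunc(
		func(rw http.ResponseWriter, r *http.Request) {
			target := programConfig.RedirectHttpTo + r.URL.RequestURI()
			http.Redirect(rw, r, target, http.StatusMovedPermanently)
		}))
	if err != nil {
		log.Errorf("http redirect listener failed: %s", err.Error())
	}
}()
```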
if programConfig.HttpsCertFile != "" && programConfig.HttpsKeyFile != "" {
cert, err := tls.LoadX509KeyPair(programConfig.HttpsCertFile, programConfig.HttpsKeyFile)
if err != nil {

20
startDemo.sh Executable file
View File

@ -0,0 +1,20 @@
#!/bin/sh
mkdir ./var
cd ./var
wget https://hpc-mover.rrze.uni-erlangen.de/HPC-Data/0x7b58aefb/eig7ahyo6fo2bais0ephuf2aitohv1ai/job-archive.tar.xz
tar xJf job-archive.tar.xz
rm ./job-archive.tar.xz
touch ./job.db
cd ../frontend
yarn install
yarn build
cd ..
go get
go build
./cc-backend --init-db --add-user demo:admin:AdminDev --no-server
./cc-backend

View File

@ -79,9 +79,7 @@
listElement.querySelectorAll('button.get-jwt').forEach(e => e.addEventListener('click', event => {
let row = event.target.parentElement.parentElement
let username = row.children[0].innerText
let formData = new FormData()
formData.append('username', username)
fetch('/api/jwt/', { method: 'POST', body: formData })
fetch(`/api/jwt/?username=${username}`)
.then(res => res.text())
.then(text => alert(text))
}))

View File

@ -1,4 +1,4 @@
package main
package test
import (
"bytes"
@ -11,6 +11,7 @@ import (
"path/filepath"
"reflect"
"strconv"
"strings"
"testing"
"github.com/ClusterCockpit/cc-backend/api"
@ -21,18 +22,21 @@ import (
"github.com/ClusterCockpit/cc-backend/schema"
"github.com/gorilla/mux"
"github.com/jmoiron/sqlx"
_ "github.com/mattn/go-sqlite3"
)
func setup(t *testing.T) *api.RestApi {
if db != nil {
panic("prefer using sub-tests (`t.Run`) or implement `cleanup` before calling setup twice.")
}
const testclusterJson = `{
"name": "testcluster",
"partitions": [
"subClusters": [
{
"name": "default",
"name": "sc0",
"nodes": "host120,host121,host122"
},
{
"name": "sc1",
"nodes": "host123,host124,host125",
"processorType": "Intel Core i7-4770",
"socketsPerNode": 1,
"coresPerSocket": 4,
@ -91,17 +95,17 @@ func setup(t *testing.T) *api.RestApi {
}
f.Close()
db, err = sqlx.Open("sqlite3", fmt.Sprintf("%s?_foreign_keys=on", dbfilepath))
db, err := sqlx.Open("sqlite3", fmt.Sprintf("%s?_foreign_keys=on", dbfilepath))
if err != nil {
t.Fatal(err)
}
db.SetMaxOpenConns(1)
if _, err := db.Exec(JOBS_DB_SCHEMA); err != nil {
if _, err := db.Exec(repository.JobsDBSchema); err != nil {
t.Fatal(err)
}
if err := config.Init(db, false, programConfig.UiDefaults, jobarchive); err != nil {
if err := config.Init(db, false, map[string]interface{}{}, jobarchive); err != nil {
t.Fatal(err)
}
@ -141,7 +145,7 @@ func TestRestApi(t *testing.T) {
Timestep: 60,
Series: []schema.Series{
{
Hostname: "testhost",
Hostname: "host123",
Statistics: &schema.MetricStatistics{Min: 0.1, Avg: 0.2, Max: 0.3},
Data: []schema.Float{0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3},
},
@ -173,7 +177,7 @@ func TestRestApi(t *testing.T) {
"tags": [{ "type": "testTagType", "name": "testTagName" }],
"resources": [
{
"hostname": "testhost",
"hostname": "host123",
"hwthreads": [0, 1, 2, 3, 4, 5, 6, 7]
}
],
@ -211,6 +215,7 @@ func TestRestApi(t *testing.T) {
job.User != "testuser" ||
job.Project != "testproj" ||
job.Cluster != "testcluster" ||
job.SubCluster != "sc1" ||
job.Partition != "default" ||
job.ArrayJobId != 0 ||
job.NumNodes != 1 ||
@ -219,7 +224,7 @@ func TestRestApi(t *testing.T) {
job.Exclusive != 1 ||
job.MonitoringStatus != 1 ||
job.SMT != 1 ||
!reflect.DeepEqual(job.Resources, []*schema.Resource{{Hostname: "testhost", HWThreads: []int{0, 1, 2, 3, 4, 5, 6, 7}}}) ||
!reflect.DeepEqual(job.Resources, []*schema.Resource{{Hostname: "host123", HWThreads: []int{0, 1, 2, 3, 4, 5, 6, 7}}}) ||
job.StartTime.Unix() != 123456789 {
t.Fatalf("unexpected job properties: %#v", job)
}
@ -291,4 +296,18 @@ func TestRestApi(t *testing.T) {
t.Fatal("unexpected data fetched from archive")
}
})
t.Run("CheckDoubleStart", func(t *testing.T) {
		// Starting a job with the same jobId and cluster should only be allowed if the startTimes are far apart!
body := strings.Replace(startJobBody, `"startTime": 123456789`, `"startTime": 123456790`, -1)
req := httptest.NewRequest(http.MethodPost, "/api/jobs/start_job/", bytes.NewBuffer([]byte(body)))
recorder := httptest.NewRecorder()
r.ServeHTTP(recorder, req)
response := recorder.Result()
if response.StatusCode != http.StatusUnprocessableEntity {
t.Fatal(response.Status, recorder.Body.String())
}
})
}

View File

@ -1,26 +0,0 @@
package test
import (
"fmt"
"os"
"github.com/jmoiron/sqlx"
_ "github.com/mattn/go-sqlite3"
)
func InitDB() *sqlx.DB {
bp := "./"
ebp := os.Getenv("BASEPATH")
if ebp != "" {
bp = ebp + "test/"
}
db, err := sqlx.Open("sqlite3", bp+"test.db")
if err != nil {
fmt.Println(err)
}
return db
}

Binary file not shown.