read .env automatically, support systemd

This commit is contained in:
Lou Knauer 2022-01-12 11:13:25 +01:00
parent ff24d946fd
commit f185d12078
6 changed files with 303 additions and 20 deletions

View File

@ -28,15 +28,24 @@ ln -s <your-existing-job-archive> ./var/job-archive
# Create empty job.db (Will be initialized as SQLite3 database)
touch ./var/job.db
# EDIT THE .env FILE BEFORE YOU DEPLOY (Change the secrets)!
# If authentication is disabled, it can be empty.
source .env
# This will first initialize the job.db database by traversing all
# `meta.json` files in the job-archive. After that, a HTTP server on
# the port 8080 will be running. The `--init-db` is only needed the first time.
./cc-jobarchive --init-db --add-user <your-username>:admin:<your-password>
# `meta.json` files in the job-archive and add a new user. `--no-server` will cause the
# executable to stop once it has done that instead of starting a server.
./cc-jobarchive --init-db --add-user <your-username>:admin:<your-password> --no-server
# Start a HTTP server (HTTPS can be enabled, the default port is 8080):
./cc-jobarchive
# Show other options:
./cc-jobarchive --help
```
In order to run this program as a deamon, look at [utils/systemd/README.md](./utils/systemd/README.md) where a systemd unit file and more explanation is provided.
### Configuration
A config file in the JSON format can be provided using `--config` to override the defaults. Look at the beginning of `server.go` for the defaults and consequently the format of the configuration file.
@ -45,9 +54,42 @@ A config file in the JSON format can be provided using `--config` to override th
This project uses [gqlgen](https://github.com/99designs/gqlgen) for the GraphQL API. The schema can be found in `./graph/schema.graphqls`. After changing it, you need to run `go run github.com/99designs/gqlgen` which will update `graph/model`. In case new resolvers are needed, they will be inserted into `graph/schema.resolvers.go`, where you will need to implement them.
### Project Structure
- `api/` contains the REST API. The routes defined there should be called whenever a job starts/stops. The API is documented in the OpenAPI 3.0 format in [./api/openapi.yaml](./api/openapi.yaml).
- `auth/` is where the (optional) authentication middleware can be found, which adds the currently authenticated user to the request context. The `user` table is created and managed here as well.
- `auth/ldap.go` contains everything to do with automatically syncing and authenticating users form an LDAP server.
- `config` handles the `cluster.json` files and the user-specific configurations (changeable via GraphQL) for the Web-UI such as the selected metrics etc.
- `frontend` is a submodule, this is where the Svelte based frontend resides.
- `graph/generated` should *not* be touched.
- `graph/model` contains all types defined in the GraphQL schema not manually defined in `schema/`. Manually defined types have to be listed in `gqlgen.yml`.
- `graph/schema.graphqls` contains the GraphQL schema. Whenever you change it, you should call `go run github.com/99designs/gqlgen`.
- `graph/` contains the resolvers and handlers for the GraphQL API. Function signatures in `graph/schema.resolvers.go` are automatically generated.
- `metricdata/` handles getting and archiving the metrics associated with a job.
- `metricdata/metricdata.go` defines the interface `MetricDataRepository` and provides functions to the GraphQL and REST API for accessing a jobs metrics which automatically take care of selecting the source for the metrics (the archive or one of the metric data repositories).
- `metricdata/archive.go` provides functions for fetching metrics from the job-archive and archiving a job to the job-archive.
- `metricdata/cc-metric-store.go` contains an implementation of the `MetricDataRepository` interface which can fetch data from an [cc-metric-store](https://github.com/ClusterCockpit/cc-metric-store)
- `metricdata/influxdb-v2` contains an implementation of the `MetricDataRepository` interface which can fetch data from an InfluxDBv2 database. It is currently disabled and out of date and can not be used as of writing.
- `schema/` contains type definitions used all over this project extracted in this package as Go disallows cyclic dependencies between packages.
- `schema/float.go` contains a custom `float64` type which overwrites JSON and GraphQL Marshaling/Unmarshalling. This is needed because a regular optional `Float` in GraphQL will map to `*float64` types in Go. Wrapping every single metric value in an allocation would be a lot of overhead.
- `schema/job.go` provides the types representing a job and its resources. Those can be used as type for a `meta.json` file and/or a row in the `job` table.
- `templates/` is mostly full of HTML templates and a small helper go module.
- `utils/systemd` describes how to deploy/install this as a systemd service
- `utils/` is mostly outdated. Look at the [cc-util repo](https://github.com/ClusterCockpit/cc-util) for more up-to-date scripts.
- `.env` *must* be changed before you deploy this. It contains a Base64 encoded [Ed25519](https://en.wikipedia.org/wiki/EdDSA) key-pair, the secret used for sessions and the password to the LDAP server if LDAP authentication is enabled.
- `gqlgen.yml` configures the behaviour and generation of [gqlgen](https://github.com/99designs/gqlgen).
- `init-db.go` initializes the `job` (and `tag` and `jobtag`) table if the `--init-db` flag is provided. Not only is the table created in the correct schema, but the job-archive is traversed as well.
- `server.go` contains the main function and starts the actual http server.
### TODO
- [ ] Documentation
- [ ] Write more TODOs
- [ ] Caching
- [ ] Generate JWTs based on the provided keys
- [ ] fix frontend
- [ ] write (unit) tests
- [ ] make tokens and sessions (currently based on cookies) expire after some configurable time
- [ ] when authenticating using a JWT, check if that user still exists
- [ ] allow mysql as database and passing the database uri as environment variable
- [ ] fix InfluxDB MetricDataRepository (new or old line-protocol format? Support node-level metrics only?)
- [ ] support all metric scopes
- [ ] documentation, comments in the code base
- [ ] write more TODOs
- [ ] caching

View File

@ -9,6 +9,7 @@ import (
"net/http"
"os"
"path/filepath"
"sync"
"github.com/ClusterCockpit/cc-jobarchive/config"
"github.com/ClusterCockpit/cc-jobarchive/graph"
@ -20,10 +21,11 @@ import (
)
type RestApi struct {
DB *sqlx.DB
Resolver *graph.Resolver
AsyncArchiving bool
MachineStateDir string
DB *sqlx.DB
Resolver *graph.Resolver
AsyncArchiving bool
MachineStateDir string
OngoingArchivings sync.WaitGroup
}
func (api *RestApi) MountRoutes(r *mux.Router) {
@ -233,6 +235,9 @@ func (api *RestApi) stopJob(rw http.ResponseWriter, r *http.Request) {
}
doArchiving := func(job *schema.Job, ctx context.Context) error {
api.OngoingArchivings.Add(1)
defer api.OngoingArchivings.Done()
job.Duration = int32(req.StopTime - job.StartTime.Unix())
jobMeta, err := metricdata.ArchiveJob(job, ctx)
if err != nil {

201
server.go
View File

@ -1,14 +1,26 @@
package main
import (
"bufio"
"context"
"crypto/tls"
"encoding/json"
"errors"
"flag"
"fmt"
"log"
"net"
"net/http"
"net/url"
"os"
"os/exec"
"os/signal"
"os/user"
"strconv"
"strings"
"sync"
"syscall"
"time"
"github.com/99designs/gqlgen/graphql/handler"
"github.com/99designs/gqlgen/graphql/playground"
@ -33,6 +45,10 @@ type ProgramConfig struct {
// Address where the http (or https) server will listen on (for example: 'localhost:80').
Addr string `json:"addr"`
// Drop root permissions once .env was read and the port was taken.
User string `json:"user"`
Group string `json:"group"`
// Disable authentication (for everything: API, Web-UI, ...)
DisableAuthentication bool `json:"disable-authentication"`
@ -68,7 +84,7 @@ type ProgramConfig struct {
}
var programConfig ProgramConfig = ProgramConfig{
Addr: "0.0.0.0:8080",
Addr: ":8080",
DisableAuthentication: false,
StaticFiles: "./frontend/public",
DB: "./var/job.db",
@ -116,6 +132,10 @@ func main() {
flag.StringVar(&flagGenJWT, "jwt", "", "Generate and print a JWT for the user specified by the username")
flag.Parse()
if err := loadEnv("./.env"); err != nil && !os.IsNotExist(err) {
log.Fatalf("parsing './.env' file failed: %s", err.Error())
}
if flagConfigFile != "" {
data, err := os.ReadFile(flagConfigFile)
if err != nil {
@ -280,15 +300,67 @@ func main() {
handlers.AllowedMethods([]string{"GET", "POST", "HEAD", "OPTIONS"}),
handlers.AllowedOrigins([]string{"*"}))(handlers.LoggingHandler(os.Stdout, handlers.CompressHandler(r)))
// Start http or https server
if programConfig.HttpsCertFile != "" && programConfig.HttpsKeyFile != "" {
log.Printf("HTTPS server running at %s...", programConfig.Addr)
err = http.ListenAndServeTLS(programConfig.Addr, programConfig.HttpsCertFile, programConfig.HttpsKeyFile, handler)
} else {
log.Printf("HTTP server running at %s...", programConfig.Addr)
err = http.ListenAndServe(programConfig.Addr, handler)
var wg sync.WaitGroup
server := http.Server{
ReadTimeout: 10 * time.Second,
WriteTimeout: 10 * time.Second,
Handler: handler,
Addr: programConfig.Addr,
}
log.Fatal(err)
// Start http or https server
listener, err := net.Listen("tcp", programConfig.Addr)
if err != nil {
log.Fatal(err)
}
if programConfig.HttpsCertFile != "" && programConfig.HttpsKeyFile != "" {
cert, err := tls.LoadX509KeyPair(programConfig.HttpsCertFile, programConfig.HttpsKeyFile)
if err != nil {
log.Fatal(err)
}
listener = tls.NewListener(listener, &tls.Config{
Certificates: []tls.Certificate{cert},
})
log.Printf("HTTPS server listening at %s...", programConfig.Addr)
} else {
log.Printf("HTTP server listening at %s...", programConfig.Addr)
}
// Because this program will want to bind to a privileged port (like 80), the listener must
// be established first, then the user can be changed, and after that,
// the actuall http server can be started.
if err := dropPrivileges(); err != nil {
log.Fatalf("error while changing user: %s", err.Error())
}
wg.Add(1)
go func() {
defer wg.Done()
if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
log.Fatal(err)
}
}()
wg.Add(1)
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
go func() {
defer wg.Done()
<-sigs
systemdNotifiy(false, "shutting down")
// First shut down the server gracefully (waiting for all ongoing requests)
server.Shutdown(context.Background())
// Then, wait for any async archivings still pending...
api.OngoingArchivings.Wait()
}()
systemdNotifiy(true, "running")
wg.Wait()
log.Print("Gracefull shutdown completed!")
}
func monitoringRoutes(router *mux.Router, resolver *graph.Resolver) {
@ -448,3 +520,114 @@ func monitoringRoutes(router *mux.Router, resolver *graph.Resolver) {
})
})
}
func loadEnv(file string) error {
f, err := os.Open(file)
if err != nil {
return err
}
defer f.Close()
s := bufio.NewScanner(bufio.NewReader(f))
for s.Scan() {
line := s.Text()
if strings.HasPrefix(line, "#") || len(line) == 0 {
continue
}
if strings.Contains(line, "#") {
return errors.New("'#' are only supported at the start of a line")
}
line = strings.TrimPrefix(line, "export ")
parts := strings.SplitN(line, "=", 2)
if len(parts) != 2 {
return fmt.Errorf("unsupported line: %#v", line)
}
key := strings.TrimSpace(parts[0])
val := strings.TrimSpace(parts[1])
if strings.HasPrefix(val, "\"") {
if !strings.HasSuffix(val, "\"") {
return fmt.Errorf("unsupported line: %#v", line)
}
runes := []rune(val[1 : len(val)-1])
sb := strings.Builder{}
for i := 0; i < len(runes); i++ {
if runes[i] == '\\' {
i++
switch runes[i] {
case 'n':
sb.WriteRune('\n')
case 'r':
sb.WriteRune('\r')
case 't':
sb.WriteRune('\t')
case '"':
sb.WriteRune('"')
default:
return fmt.Errorf("unsupprorted escape sequence in quoted string: backslash %#v", runes[i])
}
continue
}
sb.WriteRune(runes[i])
}
val = sb.String()
}
os.Setenv(key, val)
}
return s.Err()
}
func dropPrivileges() error {
if programConfig.Group != "" {
g, err := user.LookupGroup(programConfig.Group)
if err != nil {
return err
}
gid, _ := strconv.Atoi(g.Gid)
if err := syscall.Setgid(gid); err != nil {
return err
}
}
if programConfig.User != "" {
u, err := user.Lookup(programConfig.User)
if err != nil {
return err
}
uid, _ := strconv.Atoi(u.Uid)
if err := syscall.Setuid(uid); err != nil {
return err
}
}
return nil
}
// If started via systemd, inform systemd that we are running:
// https://www.freedesktop.org/software/systemd/man/sd_notify.html
func systemdNotifiy(ready bool, status string) {
if os.Getenv("NOTIFY_SOCKET") == "" {
// Not started using systemd
return
}
args := []string{fmt.Sprintf("--pid=%d", os.Getpid())}
if ready {
args = append(args, "--ready")
}
if status != "" {
args = append(args, fmt.Sprintf("--status=%s", status))
}
cmd := exec.Command("systemd-notify", args...)
cmd.Run() // errors ignored on purpose, there is not much to do anyways.
}

30
utils/systemd/README.md Normal file
View File

@ -0,0 +1,30 @@
# How to run this as a systemd deamon
The files in this directory assume that you install the Golang version of ClusterCockpit to `/var/clustercockpit`. If you do not like that, you can choose any other location, but make sure to replace all paths that begin with `/var/clustercockpit` in the `clustercockpit.service` file!
If you have not installed [yarn](https://yarnpkg.com/getting-started/install) and [go](https://go.dev/doc/install) already, do that (Golang is available in most package managers).
The `config.json` can have the optional fields *user* and *group*. If provided, the application will call [setuid](https://man7.org/linux/man-pages/man2/setuid.2.html) and [setgid](https://man7.org/linux/man-pages/man2/setgid.2.html) after having read the config file and having bound to a TCP port (so that it can take a privileged port), but before it starts accepting any connections. This is good for security, but means that the directories `frontend/public`, `var/` and `templates/` must be readable by that user and `var/` writable as well (All paths relative to the repos root). The `.env` and `config.json` files might contain secrets and should not be readable by that user. If those files are changed, the server has to be restarted.
```sh
# 1.: Clone this repository to /var/clustercockpit
git clone git@github.com:ClusterCockpit/cc-specifications.git /var/clustercockpit
# 2.: Install all dependencies and build everything
cd /var/clustercockpit
go get && go build && (cd ./frontend && yarn install && yarn build)
# 3.: Modify the `./config.json` file from the directory which contains this README.md to your liking and put it in the repo root
cp ./utils/systemd/config.json ./config.json
vim ./config.json # do your thing...
# 4.: Add the systemd service unit file
sudo ln -s /var/clustercockpit/utils/systemd/clustercockpit.service /etc/systemd/system/clustercockpit.service
# 5.: Enable and start the server
sudo systemctl enable clustercockpit.service # optional (if done, (re-)starts automatically)
sudo systemctl start clustercockpit.service
# Check whats going on:
sudo journalctl -u clustercockpit.service
```

View File

@ -0,0 +1,16 @@
[Unit]
Description=ClusterCockpit Web Server (Go edition)
Documentation=https://github.com/ClusterCockpit/cc-backend
Wants=network-online.target
After=network-online.target
[Service]
WorkingDirectory=/var/clustercockpit
Type=notify
NotifyAccess=all
Restart=on-failure
TimeoutStopSec=100
ExecStart=/var/clustercockpit/cc-jobarchive --config ./config.json
[Install]
WantedBy=multi-user.target

View File

@ -0,0 +1,7 @@
{
"addr": "0.0.0.0:443",
"https-cert-file": "/etc/letsencrypt/live/<...>/fullchain.pem",
"https-key-file": "/etc/letsencrypt/live/<...>/privkey.pem",
"user": "clustercockpit",
"group": "clustercockpit"
}