diff --git a/README.md b/README.md
index 792655f..c92c171 100644
--- a/README.md
+++ b/README.md
@@ -28,15 +28,24 @@ ln -s ./var/job-archive
 # Create empty job.db (Will be initialized as SQLite3 database)
 touch ./var/job.db
 
+# EDIT THE .env FILE BEFORE YOU DEPLOY (Change the secrets)!
+# If authentication is disabled, it can be empty.
+source .env
+
 # This will first initialize the job.db database by traversing all
-# `meta.json` files in the job-archive. After that, a HTTP server on
-# the port 8080 will be running. The `--init-db` is only needed the first time.
-./cc-jobarchive --init-db --add-user :admin:
+# `meta.json` files in the job-archive and add a new user. `--no-server` will cause the
+# executable to stop once it has done that instead of starting a server.
+./cc-jobarchive --init-db --add-user :admin: --no-server
+
+# Start an HTTP server (HTTPS can be enabled; the default port is 8080):
+./cc-jobarchive
 
 # Show other options:
 ./cc-jobarchive --help
 ```
 
+To run this program as a daemon, see [utils/systemd/README.md](./utils/systemd/README.md), where a systemd unit file and further explanations are provided.
+
 ### Configuration
 
 A config file in the JSON format can be provided using `--config` to override the defaults. Look at the beginning of `server.go` for the defaults and consequently the format of the configuration file.
@@ -45,9 +54,42 @@ A config file in the JSON format can be provided using `--config` to override th
 
 This project uses [gqlgen](https://github.com/99designs/gqlgen) for the GraphQL API. The schema can be found in `./graph/schema.graphqls`. After changing it, you need to run `go run github.com/99designs/gqlgen` which will update `graph/model`. In case new resolvers are needed, they will be inserted into `graph/schema.resolvers.go`, where you will need to implement them.
 
+### Project Structure
+
+- `api/` contains the REST API. The routes defined there should be called whenever a job starts/stops. The API is documented in the OpenAPI 3.0 format in [./api/openapi.yaml](./api/openapi.yaml).
+- `auth/` is where the (optional) authentication middleware can be found, which adds the currently authenticated user to the request context. The `user` table is created and managed here as well.
+  - `auth/ldap.go` contains everything to do with automatically syncing and authenticating users from an LDAP server.
+- `config/` handles the `cluster.json` files and the user-specific configurations (changeable via GraphQL) for the Web-UI, such as the selected metrics.
+- `frontend/` is a Git submodule containing the Svelte-based frontend.
+- `graph/generated` should *not* be touched.
+- `graph/model` contains all types defined in the GraphQL schema that are not manually defined in `schema/`. Manually defined types have to be listed in `gqlgen.yml`.
+- `graph/schema.graphqls` contains the GraphQL schema. Whenever you change it, you should call `go run github.com/99designs/gqlgen`.
+- `graph/` contains the resolvers and handlers for the GraphQL API. Function signatures in `graph/schema.resolvers.go` are automatically generated.
+- `metricdata/` handles getting and archiving the metrics associated with a job.
+  - `metricdata/metricdata.go` defines the interface `MetricDataRepository` and provides functions to the GraphQL and REST API for accessing a job's metrics, which automatically take care of selecting the source for the metrics (the archive or one of the metric data repositories).
+  - `metricdata/archive.go` provides functions for fetching metrics from the job-archive and archiving a job to the job-archive.
+  - `metricdata/cc-metric-store.go` contains an implementation of the `MetricDataRepository` interface which can fetch data from a [cc-metric-store](https://github.com/ClusterCockpit/cc-metric-store).
+  - `metricdata/influxdb-v2` contains an implementation of the `MetricDataRepository` interface which can fetch data from an InfluxDBv2 database. It is currently disabled, out of date, and cannot be used at the time of writing.
+- `schema/` contains type definitions used throughout this project; they live in a separate package because Go disallows cyclic dependencies between packages.
+  - `schema/float.go` contains a custom `float64` type which overrides JSON and GraphQL marshalling/unmarshalling. This is needed because a regular optional `Float` in GraphQL would map to `*float64` in Go, and wrapping every single metric value in an allocation would be a lot of overhead.
+  - `schema/job.go` provides the types representing a job and its resources. They can be used as the type for a `meta.json` file and/or a row in the `job` table.
+- `templates/` is mostly full of HTML templates and a small helper Go module.
+- `utils/systemd` describes how to deploy/install this as a systemd service.
+- `utils/` is mostly outdated. Look at the [cc-util repo](https://github.com/ClusterCockpit/cc-util) for more up-to-date scripts.
+- `.env` *must* be changed before you deploy this. It contains a Base64-encoded [Ed25519](https://en.wikipedia.org/wiki/EdDSA) key pair, the secret used for sessions, and the password for the LDAP server if LDAP authentication is enabled.
+- `gqlgen.yml` configures the behaviour and generation of [gqlgen](https://github.com/99designs/gqlgen).
+- `init-db.go` initializes the `job` (and `tag` and `jobtag`) tables if the `--init-db` flag is provided. Not only are the tables created with the correct schema, but the job-archive is traversed as well.
+- `server.go` contains the main function and starts the actual HTTP server.
+
 ### TODO
 
-- [ ] Documentation
-- [ ] Write more TODOs
-- [ ] Caching
-- [ ] Generate JWTs based on the provided keys
+- [ ] fix frontend
+- [ ] write (unit) tests
+- [ ] make tokens and sessions (currently based on cookies) expire after some configurable time
+- [ ] when authenticating using a JWT, check if that user still exists
+- [ ] allow MySQL as a database and passing the database URI as an environment variable
+- [ ] fix InfluxDB MetricDataRepository (new or old line-protocol format? Support node-level metrics only?)
+- [ ] support all metric scopes
+- [ ] documentation, comments in the code base
+- [ ] write more TODOs
+- [ ] caching
diff --git a/api/rest.go b/api/rest.go
index 018d25a..725aa84 100644
--- a/api/rest.go
+++ b/api/rest.go
@@ -9,6 +9,7 @@ import (
 	"net/http"
 	"os"
 	"path/filepath"
+	"sync"
 
 	"github.com/ClusterCockpit/cc-jobarchive/config"
 	"github.com/ClusterCockpit/cc-jobarchive/graph"
@@ -20,10 +21,11 @@ import (
 )
 
 type RestApi struct {
-	DB              *sqlx.DB
-	Resolver        *graph.Resolver
-	AsyncArchiving  bool
-	MachineStateDir string
+	DB                *sqlx.DB
+	Resolver          *graph.Resolver
+	AsyncArchiving    bool
+	MachineStateDir   string
+	OngoingArchivings sync.WaitGroup
 }
 
 func (api *RestApi) MountRoutes(r *mux.Router) {
@@ -233,6 +235,9 @@ func (api *RestApi) stopJob(rw http.ResponseWriter, r *http.Request) {
 	}
 
 	doArchiving := func(job *schema.Job, ctx context.Context) error {
+		api.OngoingArchivings.Add(1)
+		defer api.OngoingArchivings.Done()
+
 		job.Duration = int32(req.StopTime - job.StartTime.Unix())
 		jobMeta, err := metricdata.ArchiveJob(job, ctx)
 		if err != nil {
diff --git a/server.go b/server.go
index 00e5c96..288eb7a 100644
--- a/server.go
+++ b/server.go
@@ -1,14 +1,26 @@
 package main
 
 import (
+	"bufio"
+	"context"
+	"crypto/tls"
 	"encoding/json"
+	"errors"
 	"flag"
 	"fmt"
 	"log"
+	"net"
 	"net/http"
 	"net/url"
 	"os"
+	"os/exec"
+	"os/signal"
+	"os/user"
 	"strconv"
+	"strings"
+	"sync"
+	"syscall"
+	"time"
 
 	"github.com/99designs/gqlgen/graphql/handler"
 	"github.com/99designs/gqlgen/graphql/playground"
@@ -33,6 +45,10 @@ type ProgramConfig struct {
 	// Address where the http (or https) server will listen on (for example: 'localhost:80').
 	Addr string `json:"addr"`
 
+	// Drop root privileges once .env has been read and the port has been bound.
+	User  string `json:"user"`
+	Group string `json:"group"`
+
 	// Disable authentication (for everything: API, Web-UI, ...)
 	DisableAuthentication bool `json:"disable-authentication"`
 
@@ -68,7 +84,7 @@ type ProgramConfig struct {
 }
 
 var programConfig ProgramConfig = ProgramConfig{
-	Addr:                  "0.0.0.0:8080",
+	Addr:                  ":8080",
 	DisableAuthentication: false,
 	StaticFiles:           "./frontend/public",
 	DB:                    "./var/job.db",
@@ -116,6 +132,10 @@ func main() {
 	flag.StringVar(&flagGenJWT, "jwt", "", "Generate and print a JWT for the user specified by the username")
 	flag.Parse()
 
+	if err := loadEnv("./.env"); err != nil && !os.IsNotExist(err) {
+		log.Fatalf("parsing './.env' file failed: %s", err.Error())
+	}
+
 	if flagConfigFile != "" {
 		data, err := os.ReadFile(flagConfigFile)
 		if err != nil {
@@ -280,15 +300,67 @@ func main() {
 		handlers.AllowedMethods([]string{"GET", "POST", "HEAD", "OPTIONS"}),
 		handlers.AllowedOrigins([]string{"*"}))(handlers.LoggingHandler(os.Stdout, handlers.CompressHandler(r)))
 
-	// Start http or https server
-	if programConfig.HttpsCertFile != "" && programConfig.HttpsKeyFile != "" {
-		log.Printf("HTTPS server running at %s...", programConfig.Addr)
-		err = http.ListenAndServeTLS(programConfig.Addr, programConfig.HttpsCertFile, programConfig.HttpsKeyFile, handler)
-	} else {
-		log.Printf("HTTP server running at %s...", programConfig.Addr)
-		err = http.ListenAndServe(programConfig.Addr, handler)
+	var wg sync.WaitGroup
+	server := http.Server{
+		ReadTimeout:  10 * time.Second,
+		WriteTimeout: 10 * time.Second,
+		Handler:      handler,
+		Addr:         programConfig.Addr,
 	}
-	log.Fatal(err)
+
+	// Start http or https server
+
+	listener, err := net.Listen("tcp", programConfig.Addr)
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	if programConfig.HttpsCertFile != "" && programConfig.HttpsKeyFile != "" {
+		cert, err := tls.LoadX509KeyPair(programConfig.HttpsCertFile, programConfig.HttpsKeyFile)
+		if err != nil {
+			log.Fatal(err)
+		}
+		listener = tls.NewListener(listener, &tls.Config{
+			Certificates: []tls.Certificate{cert},
+		})
+		log.Printf("HTTPS server listening at %s...", programConfig.Addr)
+	} else {
+		log.Printf("HTTP server listening at %s...", programConfig.Addr)
+	}
+
+	// Because this program will want to bind to a privileged port (like 80), the listener must
+	// be established first, then the user can be changed, and after that,
+	// the actual HTTP server can be started.
+	if err := dropPrivileges(); err != nil {
+		log.Fatalf("error while changing user: %s", err.Error())
+	}
+
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
+			log.Fatal(err)
+		}
+	}()
+
+	wg.Add(1)
+	sigs := make(chan os.Signal, 1)
+	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
+	go func() {
+		defer wg.Done()
+		<-sigs
+		systemdNotifiy(false, "shutting down")
+
+		// First shut down the server gracefully (waiting for all ongoing requests)
+		server.Shutdown(context.Background())
+
+		// Then, wait for any async archivings still pending...
+		api.OngoingArchivings.Wait()
+	}()
+
+	systemdNotifiy(true, "running")
+	wg.Wait()
+	log.Print("Graceful shutdown completed!")
 }
 
 func monitoringRoutes(router *mux.Router, resolver *graph.Resolver) {
@@ -448,3 +520,114 @@ func monitoringRoutes(router *mux.Router, resolver *graph.Resolver) {
 		})
 	})
 }
+
+func loadEnv(file string) error {
+	f, err := os.Open(file)
+	if err != nil {
+		return err
+	}
+
+	defer f.Close()
+	s := bufio.NewScanner(bufio.NewReader(f))
+	for s.Scan() {
+		line := s.Text()
+		if strings.HasPrefix(line, "#") || len(line) == 0 {
+			continue
+		}
+
+		if strings.Contains(line, "#") {
+			return errors.New("'#' are only supported at the start of a line")
+		}
+
+		line = strings.TrimPrefix(line, "export ")
+		parts := strings.SplitN(line, "=", 2)
+		if len(parts) != 2 {
+			return fmt.Errorf("unsupported line: %#v", line)
+		}
+
+		key := strings.TrimSpace(parts[0])
+		val := strings.TrimSpace(parts[1])
+		if strings.HasPrefix(val, "\"") {
+			if !strings.HasSuffix(val, "\"") {
+				return fmt.Errorf("unsupported line: %#v", line)
+			}
+
+			runes := []rune(val[1 : len(val)-1])
+			sb := strings.Builder{}
+			for i := 0; i < len(runes); i++ {
+				if runes[i] == '\\' {
+					i++
+					switch runes[i] {
+					case 'n':
+						sb.WriteRune('\n')
+					case 'r':
+						sb.WriteRune('\r')
+					case 't':
+						sb.WriteRune('\t')
+					case '"':
+						sb.WriteRune('"')
+					default:
+						return fmt.Errorf("unsupported escape sequence in quoted string: backslash %#v", runes[i])
+					}
+					continue
+				}
+				sb.WriteRune(runes[i])
+			}
+
+			val = sb.String()
+		}
+
+		os.Setenv(key, val)
+	}
+
+	return s.Err()
+}
+
+func dropPrivileges() error {
+	if programConfig.Group != "" {
+		g, err := user.LookupGroup(programConfig.Group)
+		if err != nil {
+			return err
+		}
+
+		gid, _ := strconv.Atoi(g.Gid)
+		if err := syscall.Setgid(gid); err != nil {
+			return err
+		}
+	}
+
+	if programConfig.User != "" {
+		u, err := user.Lookup(programConfig.User)
+		if err != nil {
+			return err
+		}
+
+		uid, _ := strconv.Atoi(u.Uid)
+		if err := syscall.Setuid(uid); err != nil {
+			return err
+		}
+	}
+
+	return nil
+}
+
+// If started via systemd, inform systemd that we are running:
+// https://www.freedesktop.org/software/systemd/man/sd_notify.html
+func systemdNotifiy(ready bool, status string) {
+	if os.Getenv("NOTIFY_SOCKET") == "" {
+		// Not started using systemd
+		return
+	}
+
+	args := []string{fmt.Sprintf("--pid=%d", os.Getpid())}
+	if ready {
+		args = append(args, "--ready")
+	}
+
+	if status != "" {
+		args = append(args, fmt.Sprintf("--status=%s", status))
+	}
+
+	cmd := exec.Command("systemd-notify", args...)
+	cmd.Run() // Errors are ignored on purpose; there is not much to do anyway.
+}
diff --git a/utils/systemd/README.md b/utils/systemd/README.md
new file mode 100644
index 0000000..5b59d04
--- /dev/null
+++ b/utils/systemd/README.md
@@ -0,0 +1,30 @@
+# How to run this as a systemd daemon
+
+The files in this directory assume that you install the Golang version of ClusterCockpit to `/var/clustercockpit`. If you do not like that, you can choose any other location, but make sure to replace all paths that begin with `/var/clustercockpit` in the `clustercockpit.service` file!
+
+If you have not installed [yarn](https://yarnpkg.com/getting-started/install) and [go](https://go.dev/doc/install) already, do that first (Go is available in most package managers).
+
+The `config.json` can have the optional fields *user* and *group*. If provided, the application will call [setuid](https://man7.org/linux/man-pages/man2/setuid.2.html) and [setgid](https://man7.org/linux/man-pages/man2/setgid.2.html) after having read the config file and having bound to a TCP port (so that it can use a privileged port), but before it starts accepting any connections. This is good for security, but means that the directories `frontend/public`, `var/` and `templates/` must be readable by that user and `var/` writable as well (all paths relative to the repo's root). The `.env` and `config.json` files might contain secrets and should not be readable by that user. If those files are changed, the server has to be restarted.
+
+```sh
+# 1.: Clone this repository (including the frontend submodule) to /var/clustercockpit
+git clone --recursive git@github.com:ClusterCockpit/cc-backend.git /var/clustercockpit
+
+# 2.: Install all dependencies and build everything
+cd /var/clustercockpit
+go get && go build && (cd ./frontend && yarn install && yarn build)
+
+# 3.: Modify the `./config.json` file from the directory which contains this README.md to your liking and put it in the repo root
+cp ./utils/systemd/config.json ./config.json
+vim ./config.json # do your thing...
+
+# 4.: Add the systemd service unit file
+sudo ln -s /var/clustercockpit/utils/systemd/clustercockpit.service /etc/systemd/system/clustercockpit.service
+
+# 5.: Enable and start the server
+sudo systemctl enable clustercockpit.service # optional (if done, (re-)starts automatically)
+sudo systemctl start clustercockpit.service
+
+# Check what's going on:
+sudo journalctl -u clustercockpit.service
+```
diff --git a/utils/systemd/clustercockpit.service b/utils/systemd/clustercockpit.service
new file mode 100644
index 0000000..82199a5
--- /dev/null
+++ b/utils/systemd/clustercockpit.service
@@ -0,0 +1,16 @@
+[Unit]
+Description=ClusterCockpit Web Server (Go edition)
+Documentation=https://github.com/ClusterCockpit/cc-backend
+Wants=network-online.target
+After=network-online.target
+
+[Service]
+WorkingDirectory=/var/clustercockpit
+Type=notify
+NotifyAccess=all
+Restart=on-failure
+TimeoutStopSec=100
+ExecStart=/var/clustercockpit/cc-jobarchive --config ./config.json
+
+[Install]
+WantedBy=multi-user.target
diff --git a/utils/systemd/config.json b/utils/systemd/config.json
new file mode 100644
index 0000000..e20c19d
--- /dev/null
+++ b/utils/systemd/config.json
@@ -0,0 +1,7 @@
+{
+  "addr": "0.0.0.0:443",
+  "https-cert-file": "/etc/letsencrypt/live/<...>/fullchain.pem",
+  "https-key-file": "/etc/letsencrypt/live/<...>/privkey.pem",
+  "user": "clustercockpit",
+  "group": "clustercockpit"
+}