Introduce a sync mechanism for EPSS scores
# Introduction

The goal is to add EPSS support to the `package_metadata` flow in the GitLab backend. See [an overview of the flow](https://youtu.be/e48Zgl_9_x4).

## Notes

* Initially, delta mechanisms will not be used and all EPSS data will be uploaded daily to the PMDB bucket. As such, checkpoints may be redundant. See https://gitlab.com/gitlab-org/gitlab/-/issues/467672#note_1982236484.
* The only relevant file in the bucket will be `<bucket>/v2/epss/<timestamp>/000000000.ndjson`.

# Implementation

## Overview

The flow of `package_metadata` on the GitLab side is:

1. Cronjob executes the relevant data type **worker** (licenses, advisories, epss).
1. The **worker** runs the `SyncService` which handles the `package_metadata` flow for each purl type. Since EPSS is its own type, we need to consider how it may look different in this area.
1. `SyncService` retrieves a `SyncConfiguration` for the relevant data type.
1. `SyncService` uses the relevant **connector** (offline or GCP) to iterate over all new files (chunks) in the bucket since the last **checkpoint**.
1. `SyncService` executes `IngestionService` for the given data type.
1. The `IngestionService` runs a set of `IngestionTask`.
1. Each `IngestionTask` parses and upserts the given data.
1. The **checkpoint** is updated to reflect that we have progressed and data has been ingested.
1. Continue until all data has been inserted or a stop signal is received.

This issue focuses on the `SyncService` and `SyncConfiguration` which execute the ingestion.

## Tasks

### Sync

- [x] Add CVE Enrichment support to [`ee/app/models/package_metadata/sync_configuration.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/package_metadata/sync_configuration.rb).
  - [x] Add `cve_enrichment` to `configs_for`.
  - [x] Implement `self.cve_enrichment_configs` similar to `self.advisory_configs`.
  - [x] Add a `cve_enrichment?` function.
- [x] Add support for CVE Enrichment in [`ee/app/services/package_metadata/sync_service.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/services/package_metadata/sync_service.rb).
  - [x] Add a `cve_enrichment` flow under `ingest`
  - [ ] Following https://gitlab.com/gitlab-org/gitlab/-/issues/467672#note_1982236484, return with a nil checkpoint value from `checkpoint` to ingest all existing data.
- [x] Test! You may create a CVE Enrichment object in `ee/spec/factories/package_metadata` similarly to `ee/spec/factories/package_metadata/advisory_data_objects.rb`.
  - [x] Add a CVE Enrichment context to `ee/spec/models/package_metadata/sync_configuration_spec.rb`.
  - [x] Add CVE Enrichment flows to `ee/spec/services/package_metadata/sync_service_spec.rb`

### Execution

- [x] Create a [feature flag](https://docs.gitlab.com/ee/operations/feature_flags.html) for EPSS syncing.
- [x] Create `cve_enrichment_sync_worker.rb` under `ee/app/workers/package_metadata`, similarly to [`ee/app/workers/package_metadata/advisories_sync_worker.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/package_metadata/advisories_sync_worker.rb) to execute the `SyncService`.
  - [x] The worker should only run if the feature flag is enabled.
- [x] Test! You may create a CVE Enrichment object in `ee/spec/factories/package_metadata` similarly to `ee/spec/factories/package_metadata/advisory_data_objects.rb`.
  - [x] Implement `ee/spec/workers/package_metadata/cve_enrichment_sync_worker_spec.rb`
- [x] Add cronjob for CVE Enrichment sync worker to [`config/initializers/1_settings.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/initializers/1_settings.rb?ref_type=heads), similar to `package_metadata_advisories_sync_worker`. The cronjob specifies the worker to run every 5 minutes.
- [x] Regenerate `ee/app/workers/all_queues.yml` with new cronjob changes (see [sidekiq queues](https://docs.gitlab.com/ee/development/sidekiq/#sidekiq-queues))
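
For illustration, the flow described in the Overview (connector iteration, ingestion, checkpoint update, stop signal, feature-flag gate, and the nil-checkpoint behaviour) could be sketched in plain Ruby as below. This is a minimal sketch under stated assumptions, not the actual GitLab implementation: `SyncFile`, `Checkpoint`, `InMemoryConnector`, `InMemoryIngestion`, and `sync` are hypothetical stand-ins, the `cve` and `epss` JSON keys are an assumed record shape, and the sample data is made up.

```ruby
require 'json'

# Hypothetical stand-ins; none of these are the real GitLab classes.
SyncFile   = Struct.new(:sequence, :chunk, :ndjson_lines)
Checkpoint = Struct.new(:sequence, :chunk)

# Mimics a connector: returns files newer than the checkpoint, oldest first.
class InMemoryConnector
  def initialize(files)
    @files = files.sort_by { |f| [f.sequence, f.chunk] }
  end

  def files_after(checkpoint)
    # A nil checkpoint means "ingest all existing data".
    return @files if checkpoint.nil?

    @files.select do |f|
      ([f.sequence, f.chunk] <=> [checkpoint.sequence, checkpoint.chunk]) > 0
    end
  end
end

# Mimics an ingestion service: parses NDJSON lines and upserts by CVE ID.
class InMemoryIngestion
  attr_reader :rows

  def initialize
    @rows = {}
  end

  def execute(ndjson_lines)
    ndjson_lines.each do |line|
      record = JSON.parse(line)
      # Assumed record shape: {"cve": "...", "epss": <score>}
      @rows[record.fetch('cve')] = record.fetch('epss')
    end
  end
end

# Mimics the sync loop: ingest each new file, advance the checkpoint,
# and stop early when a stop signal is received.
def sync(connector:, ingestion:, checkpoint: nil, enabled: true, should_stop: -> { false })
  return checkpoint unless enabled # feature flag off: do nothing

  connector.files_after(checkpoint).each do |file|
    ingestion.execute(file.ndjson_lines)
    checkpoint = Checkpoint.new(file.sequence, file.chunk)
    break if should_stop.call
  end
  checkpoint
end

# Made-up sample data: two daily uploads, the second supersedes the first.
files = [
  SyncFile.new(1_718_000_000, 0, ['{"cve":"CVE-2024-0001","epss":0.12}']),
  SyncFile.new(1_718_086_400, 0, ['{"cve":"CVE-2024-0001","epss":0.15}',
                                  '{"cve":"CVE-2024-0002","epss":0.02}'])
]
ingestion = InMemoryIngestion.new
final = sync(connector: InMemoryConnector.new(files), ingestion: ingestion)
```

With a nil starting checkpoint, both files are ingested in order, so `ingestion.rows` ends with the later score (0.15) for `CVE-2024-0001` and `final` points at the second file. Because every upload contains the full data set, re-ingesting from a nil checkpoint is idempotent in this sketch, which is why the checkpoint may be redundant for EPSS.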