feat: pom fetcher (CM-1210)#4179

Draft
ulemons wants to merge 24 commits into main from feat/pom-fetcher

@ulemons ulemons commented Jun 8, 2026

Contributor

Summary

Changes

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Performance improvement
  • Chore / dependency update
  • Documentation

JIRA ticket

mbani01 and others added 24 commits June 4, 2026 16:19
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…ved to packages_worker)

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
…by run mode

Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
@ulemons ulemons self-assigned this Jun 8, 2026
Copilot AI review requested due to automatic review settings June 8, 2026 16:19
CLAassistant commented Jun 8, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ulemons
❌ mbani01
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI left a comment

Contributor

Pull request overview

This PR adds Maven POM–based enrichment to packages_worker, along with new OSS packages (osspckgs) Data Access Layer helpers to persist Maven package, version, maintainer, and repo metadata into the packages DB.

Changes:

  • Added a Maven Temporal workflow/activity pipeline plus scheduling, including a standalone backfill entrypoint.
  • Introduced new osspckgs DAL modules (packages, versions, maintainers, repos) with upsert + change-detection/audit helpers.
  • Added Maven parsing/normalization utilities (POM fetch + inheritance resolution, metadata version selection) and unit tests for key normalizers.
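
The prerelease detection and stable-version selection mentioned in the last bullet could look roughly like the sketch below. This is not the PR's code: the qualifier list, the names `isPrerelease` and `selectStableVersion`, and the assumption that the version list is ordered oldest to newest are all illustrative.

```typescript
// Hypothetical qualifier list; the PR's actual rules are not shown in this thread.
// Matches a trailing qualifier such as -SNAPSHOT, -RC1, -M2, -beta-3 or .alpha.1.
const PRERELEASE_PATTERN =
  /(?:^|[.\-_])(?:snapshot|alpha|beta|rc|cr|milestone|preview|m)[.\-_]?\d*$/i

// Returns true when the version string ends in a known prerelease qualifier.
function isPrerelease(version: string): boolean {
  return PRERELEASE_PATTERN.test(version.trim())
}

// Prefers the <release> value when it is stable; otherwise falls back to the
// last stable entry in the list (assumed to be ordered oldest to newest).
function selectStableVersion(versions: string[], release?: string | null): string | null {
  if (release && !isPrerelease(release)) return release
  for (const candidate of [...versions].reverse()) {
    if (!isPrerelease(candidate)) return candidate
  }
  return null
}
```

Under these assumptions, `selectStableVersion(['1.0.0', '1.1.0', '2.0.0-RC1'], '2.0.0-RC1')` skips the release candidate and falls back to `1.1.0`. A suffix check like this does no version ordering of its own, so it relies entirely on the order in which the registry lists versions.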

Reviewed changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/osspckgs/versions.ts | Bulk upsert versions via UNNEST, with change-field reporting. |
| services/libs/data-access-layer/src/osspckgs/types.ts | Introduces DB upsert shapes for osspckgs (packages, maintainers, versions, repos). |
| services/libs/data-access-layer/src/osspckgs/repos.ts | Adds repo and package↔repo upsert helpers with change-field reporting. |
| services/libs/data-access-layer/src/osspckgs/packages.ts | Adds Maven “to-sync” paging query, package touch, audit logging, and package upsert. |
| services/libs/data-access-layer/src/osspckgs/maintainers.ts | Adds maintainer upsert + package maintainer link replacement helper. |
| services/libs/data-access-layer/src/osspckgs/index.ts | Barrel exports for osspckgs DAL module. |
| services/libs/data-access-layer/src/index.ts | Re-exports osspckgs maintainer/version helpers from the DAL package entrypoint. |
| services/apps/packages_worker/src/workflows/index.ts | Exposes Maven workflows from the worker workflows index. |
| services/apps/packages_worker/src/maven/workflows.ts | Defines Temporal workflows for critical/non-critical Maven processing. |
| services/apps/packages_worker/src/maven/schedule.ts | Registers the maven-critical Temporal schedule (with recreate-on-exists behavior). |
| services/apps/packages_worker/src/maven/runMavenEnrichmentLoop.ts | Implements Maven batch processing, extraction, persistence, and audit logging. |
| services/apps/packages_worker/src/maven/normalize.ts | Adds prerelease detection and repo URL parsing. |
| services/apps/packages_worker/src/maven/metadata.ts | Fetches/parses maven-metadata.xml and selects stable release version. |
| services/apps/packages_worker/src/maven/extract.ts | Fetches/parses POMs with parent inheritance + in-process caching. |
| services/apps/packages_worker/src/maven/activities.ts | Wires Maven batch processing into Temporal activities. |
| services/apps/packages_worker/src/maven/tests/normalize.test.ts | Adds tests for prerelease detection, stable version selection, and SCM/repo URL normalization. |
| services/apps/packages_worker/src/config.ts | Adds getMavenConfig() env parsing. |
| services/apps/packages_worker/src/bin/packages-worker.ts | Registers the Maven-critical schedule in the main packages worker entrypoint. |
| services/apps/packages_worker/src/bin/maven-worker.ts | Adds Maven-only worker entrypoint for local dev. |
| services/apps/packages_worker/src/bin/maven-backfill.ts | Adds one-shot backfill runner for Maven critical queue. |
| services/apps/packages_worker/src/activities.ts | Re-exports Maven activities. |
| services/apps/packages_worker/package.json | Adds Maven scripts + dependencies (axios, fast-xml-parser) and adjusts local monitor script. |
| scripts/services/maven-worker.yaml | Adds docker-compose service definitions for running Maven worker/dev worker. |
| scripts/builders/packages-worker.env | Builder config update for packages-worker image/services. |
| pnpm-lock.yaml | Locks new dependencies (axios, fast-xml-parser) and transitive updates. |
| backend/.env.dist.local | Adds Maven-related env defaults and sets local osspckgs GCP env placeholders. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
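
For readers unfamiliar with the UNNEST-based bulk upsert noted for versions.ts, here is a minimal sketch of the general technique, assuming PostgreSQL. The table name, column names, conflict target, `VersionRow` shape, and `buildVersionUpsert` helper are all hypothetical and not taken from the PR; the function only builds the statement and parameters and does not talk to a database.

```typescript
// Hypothetical row shape; the real DAL types live in osspckgs/types.ts.
interface VersionRow {
  packageId: number
  version: string
  isPrerelease: boolean
}

// Builds a single statement that upserts any number of rows. Each column is
// passed as one array parameter and UNNEST zips the arrays back into rows,
// so the parameter count stays at three regardless of batch size.
function buildVersionUpsert(rows: VersionRow[]): { text: string; values: unknown[][] } {
  const text = `
    INSERT INTO versions (package_id, version, is_prerelease)
    SELECT * FROM UNNEST($1::bigint[], $2::text[], $3::boolean[])
    ON CONFLICT (package_id, version)
    DO UPDATE SET is_prerelease = EXCLUDED.is_prerelease
    RETURNING id, version`
  const values = [
    rows.map((r) => r.packageId),
    rows.map((r) => r.version),
    rows.map((r) => r.isPrerelease),
  ]
  return { text, values }
}
```

The returned `text` and `values` would then be handed to whatever query method the data access layer uses. Change-field reporting, which the PR describes, is left out of this sketch.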

"monitor:osspckgs:local": "bash -c 'set -a && . ../../../backend/.env.dist.local && . ../../../backend/.env.override.local && set +a && SERVICE=monitor tsx src/scripts/monitorOsspckgs.ts'",
"trigger-bootstrap": "SERVICE=deps-dev-ingest tsx src/scripts/triggerBootstrap.ts",
"trigger-bootstrap:local": "set -a && . ../../../backend/.env.dist.local && . ../../../backend/.env.override.local && set +a && SERVICE=deps-dev-ingest tsx src/scripts/triggerBootstrap.ts",
"monitor:osspckgs:local": "bash -c 'set -a && . ../../../backend/.env.dist.local && . ../../../backend/.env.override.local && set +a && node ../../../scripts/monitor-osspckgs.mjs'",
Comment on lines +284 to +305
const maintainerLinks: Array<{ maintainerId: number; role: 'author' | 'maintainer' }> = []
for (const person of allPeople) {
  const username = person.username ?? person.email ?? person.displayName
  if (!username) continue
  const emailHash = person.email
    ? crypto.createHash('sha256').update(person.email.toLowerCase().trim()).digest('hex')
    : null
  const { id: maintainerId, changedFields: mChanged } = await upsertMaintainer(t, {
    ecosystem: 'maven',
    username,
    displayName: person.displayName,
    url: person.url,
    emailHash,
  })
  mChanged.forEach((f) => changed.add(f))
  maintainerLinks.push({ maintainerId, role: person.role })
}

if (maintainerLinks.length > 0) {
  const pmChanged = await replacePackageMaintainers(t, packageId, maintainerLinks)
  pmChanged.forEach((f) => changed.add(f))
}
Comment on lines +8 to +10
* Returns null when the artifact is not found (404) or the metadata is
* malformed.
*/
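
The contract in this doc comment (null for a 404 or for malformed metadata) can be illustrated with the sketch below. It is an assumption-laden stand-in, not the code in metadata.ts: the names `parseMavenMetadata`, `fetchMavenMetadata`, and `HttpGet` are hypothetical, the HTTP client is injected rather than being axios, and a regex scan replaces fast-xml-parser to keep the example dependency-free.

```typescript
interface MavenMetadata {
  release: string | null
  versions: string[]
}

// Regex extraction stands in for a real XML parser in this sketch.
// Returns null when no <version> entries can be found (treated as malformed).
function parseMavenMetadata(xml: string): MavenMetadata | null {
  const versionTag = /<version>\s*([^<\s]+)\s*<\/version>/g
  const versions: string[] = []
  let match: RegExpExecArray | null
  while ((match = versionTag.exec(xml)) !== null) versions.push(match[1])
  if (versions.length === 0) return null
  const releaseMatch = /<release>\s*([^<\s]+)\s*<\/release>/.exec(xml)
  return { release: releaseMatch ? releaseMatch[1] : null, versions }
}

// Hypothetical minimal HTTP abstraction so the sketch does not depend on a client library.
type HttpGet = (url: string) => Promise<{ status: number; body: string }>

// Returns null for a missing artifact (404) or malformed metadata;
// any other non-2xx status is surfaced as an error so callers can retry.
async function fetchMavenMetadata(url: string, get: HttpGet): Promise<MavenMetadata | null> {
  const res = await get(url)
  if (res.status === 404) return null
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`Unexpected status ${res.status} for ${url}`)
  }
  return parseMavenMetadata(res.body)
}
```

One design question the sketch makes visible is that callers cannot tell "not found" from "malformed" when both map to null; whether that distinction matters for audit logging is for the PR author to decide.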
Comment on lines +402 to +417
if (!rootPom) {
  return {
    groupId,
    artifactId,
    version,
    purl,
    description: null,
    licenses: [],
    licensesRaw: null,
    scmUrl: null,
    homepageUrl: null,
    developers: [],
    contributors: [],
    parentHops: 0,
    error: `POM not found: ${pomUrl}`,
  }
4 participants