feat: add Tinybird datasources for packages-db tables (CM-1219) #4180
joanagmaia wants to merge 21 commits into
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Pull request overview
This PR adds a set of new Tinybird datasource definitions under services/libs/tinybird/datasources/ to replicate packages-db domain tables (packages, versions, repos, advisory graph, maintainers, and relationship tables) into ClickHouse/Tinybird for analytics.
Changes:
- Add 11 new Tinybird `.datasource` schemas for packages-db tables (advisories, packages, versions, repos, etc.).
- Use `ReplacingMergeTree` across the new datasources with partition/sort keys tuned per entity and a designated "version" column for deduplication.
- Align datasource columns to the latest packages-db schema fields (e.g. maintainer `email`, package dependent counts/impact).
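To make the deduplication bullet concrete, here is a minimal Python sketch of how a `ReplacingMergeTree` merge resolves rows that share a sorting key: the row with the highest version value survives, and a tie goes to the row inserted last. This is only a model of the documented ClickHouse behavior, not code from this PR; the `id` sorting key, the sample rows, and the `replacing_merge` helper are assumptions made up for illustration.

```python
from typing import Any

Row = dict[str, Any]


def replacing_merge(rows: list[Row], sorting_key: tuple[str, ...], version_col: str) -> list[Row]:
    """Collapse rows sharing a sorting key, keeping the highest version.

    Rows are assumed to be in insertion order, so `>=` lets the most
    recently inserted row win when two versions are equal.
    """
    survivors: dict[tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[col] for col in sorting_key)
        current = survivors.get(key)
        if current is None or row[version_col] >= current[version_col]:
            survivors[key] = row
    return list(survivors.values())


# Hypothetical replicated rows; ISO-8601 strings compare correctly as text.
rows = [
    {"id": "pkg-1", "name": "left-pad", "last_synced_at": "2024-05-01T00:00:00"},
    {"id": "pkg-2", "name": "lodash", "last_synced_at": "2024-05-01T00:00:00"},
    {"id": "pkg-1", "name": "left-pad", "last_synced_at": "2024-06-01T00:00:00"},
    # A stale event arriving late must not overwrite the newer row.
    {"id": "pkg-1", "name": "left-pad", "last_synced_at": "2024-04-01T00:00:00"},
]

merged = replacing_merge(rows, sorting_key=("id",), version_col="last_synced_at")
```

In ClickHouse itself the collapse happens during background merges, so a query can still see several versions of a row until a merge has run.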
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| services/libs/tinybird/datasources/advisories.datasource | Defines advisory records (OSV/CVE/GHSA metadata) for Tinybird replication. |
| services/libs/tinybird/datasources/advisoryAffectedRanges.datasource | Defines per-package affected version ranges for advisories. |
| services/libs/tinybird/datasources/advisoryPackages.datasource | Defines mapping between advisories and affected packages. |
| services/libs/tinybird/datasources/maintainers.datasource | Defines registry maintainer identities (incl. email) for analytics. |
| services/libs/tinybird/datasources/packageDependencies.datasource | Defines the package dependency graph edges for analytics queries. |
| services/libs/tinybird/datasources/packageMaintainers.datasource | Defines package↔maintainer relationship rows. |
| services/libs/tinybird/datasources/packageRepos.datasource | Defines package↔repo provenance mapping rows. |
| services/libs/tinybird/datasources/packages.datasource | Defines package-level metadata including criticality/dependency metrics. |
| services/libs/tinybird/datasources/repoScorecardChecks.datasource | Defines per-check OpenSSF Scorecard signals per repo. |
| services/libs/tinybird/datasources/repos.datasource | Defines repository metadata and enrichment signals for analytics. |
| services/libs/tinybird/datasources/versions.datasource | Defines per-version metadata (publish time, prerelease, licenses, etc.). |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit b9c833b.

Summary
- New datasources: `advisories`, `advisoryAffectedRanges`, `advisoryPackages`, `repos`, `repoScorecardChecks`, `maintainers`, `packages`, `packageMaintainers`, `packageDependencies`, `versions`, `packageRepos`.
- Each uses `ReplacingMergeTree` with `ENGINE_PARTITION_KEY` and `ENGINE_VER` aligned to the same timestamp column (the effective `updated_at` for each table).
- Columns follow the latest packages-db schema: `email` instead of `email_hash` on maintainers; `dependent_count`/`transitive_dependent_count`/`impact` on packages.

Notes
- Tables without `updated_at` use their semantic equivalent: `last_synced_at` for `packages`, `versions`, `repos`; `verified_at` for `package_repos`.
- `packages_universe` is intentionally excluded (pending deprecation).

🤖 Generated with Claude Code
Note
Medium Risk
Touches production packages-db logical replication and ranking SQL; incorrect watermark handling could leave analytics stale or over-publish CDC events.
Overview
Adds 11 Tinybird datasources for packages-db (packages, versions, dependencies, repos, advisories, maintainers, etc.) using `ReplacingMergeTree` with per-table watermark columns (`lastSyncedAt`, `updatedAt`, or `verifiedAt`).

A packages-db migration enables CDC for that pipeline: it creates `sequin_pub` over those tables with `publish_via_partition_root`, sets `REPLICA IDENTITY FULL` on roots and hash-partition leaves, and updates `rank_packages()` to bump `last_synced_at` whenever ranking/criticality fields change so Tinybird's `ENGINE_VER` stays correct.

Sync semantics are tightened in workers/DAL: deps.dev repo seeding and Maven `upsertRepo` no longer set `repos.last_synced_at` (the GitHub enricher owns it); OSV `has_critical_vulnerability` flips now bump `packages.last_synced_at`.

Reviewed by Cursor Bugbot for commit 49aa2ff.
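The watermark rule described in the overview (bump `last_synced_at` only when ranking or criticality fields actually change) can be sketched in Python as below. This is an illustrative model, not the SQL in `rank_packages()`: the `PackageRow` type, the `criticality_score` field, the field types, and the `apply_ranking` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class PackageRow:
    id: str
    criticality_score: float  # hypothetical ranking field
    impact: float             # column named in the PR; the float type is assumed
    last_synced_at: datetime  # watermark used as the version column downstream


def apply_ranking(row: PackageRow, criticality_score: float, impact: float,
                  now: datetime) -> PackageRow:
    """Return the row after a ranking pass.

    The watermark moves only when a ranking field changed, so unchanged rows
    do not produce a new version for the replication pipeline.
    """
    if (row.criticality_score, row.impact) == (criticality_score, impact):
        return row  # nothing changed: keep the old watermark
    return replace(row, criticality_score=criticality_score, impact=impact,
                   last_synced_at=now)


before = PackageRow("pkg-1", 0.40, 10.0, datetime(2024, 5, 1, tzinfo=timezone.utc))
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

unchanged = apply_ranking(before, 0.40, 10.0, now)  # same values: watermark kept
changed = apply_ranking(before, 0.55, 12.0, now)    # new values: watermark bumped
```

Keeping the bump conditional is one way to address both halves of the stated risk: changed rows get a newer version, while untouched rows emit no extra CDC events.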