Run RAT check only on files tracked by git #4499

Open
janhoy wants to merge 4 commits into apache:main from janhoy:RAT-sources-only-scm-files

janhoy (Contributor) commented Jun 5, 2026

Currently, if a file in your repo is not tracked by git, even one that is listed in .gitignore, the RAT source scan will still complain about a missing license header. This is annoying if, for example, an AI tool stores content in the repo folder that git ignores.

This PR filters the scan so that RAT only checks tracked files.

Comment thread gradle/validation/rat-sources.gradle Outdated
janhoy requested a review from dsmiley June 5, 2026 00:39
dsmiley (Contributor) left a comment

Thanks for addressing this problem -- it annoys me too.
I forgot if Lucene has moved onto a newer RAT release that addresses this natively... 50/50 chance I checked and saw, but Lucene's build has moved on and away from ours.

Comment thread gradle/validation/rat-sources.gradle Outdated
Comment thread gradle/validation/rat-sources.gradle Outdated
dsmiley requested a review from malliaridis June 5, 2026 00:55
- use JGit to find tracked files

janhoy (Contributor, Author) commented Jun 5, 2026

> I forgot if Lucene has moved onto a newer RAT release that addresses this natively.

Lucene moved away from RAT to a custom build-src Java class, see apache/lucene#15195. If anyone feels there is a reason for us to do something similar, let that be a separate effort.

Also, RAT has a newer version that handles tracked files as a native feature, but given the way we use it (we invoke the Ant task programmatically), we would not benefit from that automatically. If only RAT would publish a Gradle plugin, not just a Maven one. Update: There is a third-party RAT Gradle plugin we could also consider, but it does not natively support excluding non-tracked files either.

This PR now:

  • Replaces the shell-out with JGit: uses JGit's DirCache API (already a build dependency) instead of running git ls-files --cached to determine tracked files
  • Reads the git index once and caches it via rootProject.ext, avoiding redundant I/O across ~40 subproject rat tasks
  • Detects git worktrees (.git is a file) and emits a clear warning, handles non-git directories by disabling the filter (RAT scans everything, as before), and uses finally to prevent repository resource leaks
  • Normalizes path separators (\ → /) when computing subproject prefixes, since JGit always uses forward slashes
  • Replaces deprecated APIs in the rest of the file (exclude the last commit if you want to hide those changes)

- XmlParser
- project.buildDir
- task()
- @name[0]
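
To illustrate the path-normalization point in the list above, here is a minimal, self-contained Java sketch. It is not the PR's Gradle code. It assumes the tracked paths have already been read from the git index into a set of forward-slash paths relative to the repository root. The class name TrackedFileFilter, both method names, and the solr/core example paths are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of the PR.
final class TrackedFileFilter {

    // Turns a subproject path (relative to the repo root, possibly using
    // Windows separators) into a git-style prefix that ends with "/".
    // The root project (empty path) maps to the empty prefix.
    static String toGitPrefix(String relativeProjectPath) {
        String normalized = relativeProjectPath.replace('\\', '/');
        if (normalized.isEmpty()) {
            return "";
        }
        return normalized.endsWith("/") ? normalized : normalized + "/";
    }

    // Returns the tracked paths that live under the given subproject,
    // rewritten to be relative to that subproject.
    static Set<String> trackedUnder(Set<String> trackedPaths, String relativeProjectPath) {
        String prefix = toGitPrefix(relativeProjectPath);
        Set<String> result = new TreeSet<>();
        for (String path : trackedPaths) {
            if (path.startsWith(prefix)) {
                result.add(path.substring(prefix.length()));
            }
        }
        return result;
    }
}
```

The trailing slash in the prefix matters: without it, a subproject at solr/core would also match files under a sibling such as solr/core-extra.
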
epugh (Contributor) commented Jun 5, 2026

Will this then work in the cherry-pick script? That is where I get bitten by this a lot!

janhoy (Contributor, Author) commented Jun 5, 2026

> Will this then work in the cherry-pick script? That is where I get bitten by this a lot!

Yes, we explicitly enabled RAT in the cherry-pick script, and it requires a completely clean working area, which is stupid for the task it does. We have other checks that confirm that a git checkout is clean.

This should fix it.

Comment on lines 122 to 124
  // Don't check under the project's build folder.
- exclude project.buildDir.name
+ exclude project.layout.buildDirectory.get().asFile.name

Contributor

This should be a redundant check now.

Comment thread gradle/validation/rat-sources.gradle Outdated
Contributor

We should no longer have to exclude .idea, .muse, or .git.

Comment thread gradle/validation/rat-sources.gradle Outdated
Comment on lines 145 to 147
Contributor

I assume we can now omit the Eclipse config files (not git-tracked).

Comment thread gradle/validation/rat-sources.gradle Outdated
Contributor

.gradle isn't git-tracked

Comment thread gradle/validation/rat-sources.gradle Outdated
- // At the module scope we only check selected file patterns as folks have various .gitignore-d resources
- // generated by IDEs, etc.
+ // The git index filter above excludes untracked files. These patterns
+ // further narrow the scan to file types that should carry license headers.

Contributor

How does "include" accomplish narrowing? ;-)

3 participants