
feat: support custom Hugging Face endpoints via HF_ENDPOINT env var#31

Merged
pipe1os merged 4 commits into pipe1os:main from molloyzak13:fix/hf-endpoint-env
Jun 18, 2026

Conversation

@molloyzak13

@molloyzak13 molloyzak13 commented Jun 18, 2026

Contributor

The Hugging Face parser hardcodes https://huggingface.co in five separate URLs across fetch_huggingface_repo() and _fetch_safetensors_header(). This breaks for anyone behind a proxy or mirror, and there's no way to redirect requests without modifying the source.

Added a _get_hf_endpoint() helper that reads HF_ENDPOINT from the environment and falls back to https://huggingface.co when the variable is unset. It strips a trailing slash, so users can set either https://hf-mirror.com or https://hf-mirror.com/. Replaced all five hardcoded base URLs with calls to it: the API models endpoint, config.json, model.safetensors.index.json, the single model.safetensors file, and the safetensors header fetch inside _fetch_safetensors_header.

Tested with an unset HF_ENDPOINT (default behaviour preserved, fetches metadata as before) and with HF_ENDPOINT=https://hf-mirror.com (requests route to the mirror). No new dependencies, no API changes, no breaking changes.
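
For readers skimming the thread, the centralization described above can be sketched roughly as follows. This is a minimal illustration of the first iteration only (no validation yet), not the merged code: the names get_hf_endpoint and build_hf_urls are hypothetical, and the URL paths (api/models/..., .../resolve/<revision>/...) are assumptions about how the parser builds its requests.

```python
import os

DEFAULT_HF_ENDPOINT = "https://huggingface.co"


def get_hf_endpoint() -> str:
    """Hypothetical stand-in for _get_hf_endpoint() before validation was added."""
    # Fall back to the public hub when HF_ENDPOINT is unset, then drop any
    # trailing slash so both "https://host" and "https://host/" behave the same.
    return os.environ.get("HF_ENDPOINT", DEFAULT_HF_ENDPOINT).rstrip("/")


def build_hf_urls(repo_id: str, revision: str = "main") -> dict:
    """Build every base-dependent URL from one place (paths are assumed)."""
    base = get_hf_endpoint()
    resolve = f"{base}/{repo_id}/resolve/{revision}"
    return {
        "api": f"{base}/api/models/{repo_id}",
        "config": f"{resolve}/config.json",
        "index": f"{resolve}/model.safetensors.index.json",
        "weights": f"{resolve}/model.safetensors",
    }


if __name__ == "__main__":
    os.environ["HF_ENDPOINT"] = "https://hf-mirror.com/"
    for name, url in build_hf_urls("org/model").items():
        print(name, url)
```

The point of the design is that a mirror or proxy only needs to be named once, in the environment, rather than patched into each request site.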

Summary by CodeRabbit

  • New Features
    • Added support for a configurable Hugging Face endpoint via the HF_ENDPOINT environment variable, enabling use of custom Hugging Face instances.
    • Validates and normalizes the configured endpoint (rejects non-HTTPS, ensures it’s non-empty, and removes trailing slashes).
    • Updated Hugging Face model metadata and artifact request URLs to consistently use the configured endpoint.
  • Tests
    • Added unit tests covering default behavior and validation cases for HF_ENDPOINT.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026


Review Change Stack

Walkthrough

Adds a _get_hf_endpoint() helper function to huggingface.py that reads the HF_ENDPOINT environment variable (defaulting to https://huggingface.co), strips trailing slashes, and validates the protocol and hostname. All five hardcoded Hugging Face base URL constructions are replaced with calls to this helper. Test cases verify the helper's default behavior, environment override, and validation logic.

Changes

HF Endpoint Centralization

| Layer / File(s) | Summary |
| --- | --- |
| HF endpoint helper with validation (src/modelinfo/parsers/huggingface.py) | Introduces _get_hf_endpoint() that reads HF_ENDPOINT from the environment with default fallback to https://huggingface.co, trailing-slash strip, and protocol/hostname validation. |
| URL construction sites using endpoint helper (src/modelinfo/parsers/huggingface.py) | Updates _fetch_safetensors_header and fetch_huggingface_repo to call _get_hf_endpoint() for all URL construction: SafeTensors header resolution, api/models metadata, config.json, model.safetensors.index.json, and model.safetensors HEAD request. |
| Test validation for endpoint helper (tests/test_parsers.py) | Tests confirm that _get_hf_endpoint() accepts valid https:// endpoints, defaults to https://huggingface.co when HF_ENDPOINT is unset, and raises ValueError for http://, empty strings, and URLs lacking a hostname. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.64%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description covers the problem, solution, and testing approach, but lacks structured sections matching the template format required by the repository. | Reorganize the description to match the template sections (Summary, Motivation & Context, Type of Change, How Has This Been Tested, Checklist) for clarity and consistency. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main feature: adding support for custom Hugging Face endpoints via the HF_ENDPOINT environment variable. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Around line 9-10: The _get_hf_endpoint() function accepts any value from the
HF_ENDPOINT environment variable without validation, which can result in empty
strings or unsafe URL schemes like file:// that bypass HTTP security. Add
validation logic to ensure the endpoint is a non-empty string with only http or
https schemes, and raise an appropriate exception if validation fails. This
validation should happen in _get_hf_endpoint() itself to prevent invalid
endpoints from being used in subsequent request operations throughout the
codebase.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 549e9692-e5ed-4385-ab60-a75117c22fa1

📥 Commits

Reviewing files that changed from the base of the PR and between ade289a and b9b6055.

📒 Files selected for processing (1)
  • src/modelinfo/parsers/huggingface.py

Comment thread src/modelinfo/parsers/huggingface.py Outdated
Add validation to _get_hf_endpoint() to reject empty values
and non-http(s) schemes. The function now raises ValueError
when HF_ENDPOINT is set to an empty string or uses a scheme
other than http:// or https://.

How to test:
- Set HF_ENDPOINT='' and run modelinfo; should raise ValueError
- Set HF_ENDPOINT='file:///etc' and run modelinfo; should raise ValueError
- Set HF_ENDPOINT='https://huggingface-proxy.example.com' and run modelinfo; should work
- Run without HF_ENDPOINT set; should default to https://huggingface.co
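
The four manual checks above can also be run in one go. The sketch below is an illustration under assumptions: get_hf_endpoint_v2 is a hypothetical stand-in that mirrors the validation described in this commit message (empty values and non-http(s) schemes raise ValueError), and check is a small hypothetical harness that sets or unsets HF_ENDPOINT for a single call.

```python
import os


def get_hf_endpoint_v2() -> str:
    """Hypothetical stand-in for _get_hf_endpoint() as of this commit."""
    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
    if not endpoint:
        raise ValueError("HF_ENDPOINT is set but empty; expected a valid HTTP(S) URL")
    endpoint = endpoint.rstrip("/")
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError(
            f"HF_ENDPOINT must use http:// or https:// scheme, got: {endpoint}"
        )
    return endpoint


def check(value):
    """Set HF_ENDPOINT (or unset it when value is None) and report the outcome."""
    saved = os.environ.get("HF_ENDPOINT")
    try:
        if value is None:
            os.environ.pop("HF_ENDPOINT", None)
        else:
            os.environ["HF_ENDPOINT"] = value
        return get_hf_endpoint_v2()
    except ValueError as exc:
        return f"ValueError: {exc}"
    finally:
        # Restore the caller's environment so the checks do not leak state.
        if saved is None:
            os.environ.pop("HF_ENDPOINT", None)
        else:
            os.environ["HF_ENDPOINT"] = saved


if __name__ == "__main__":
    for value in ("", "file:///etc", "https://huggingface-proxy.example.com", None):
        print(repr(value), "->", check(value))
```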

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
src/modelinfo/parsers/huggingface.py (1)

10-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add .strip() and validate netloc for robust edge-case handling.

The current validation has two gaps:

  1. Whitespace: Input like " https://hf-mirror.com " fails the scheme check with a confusing error because the leading space prevents the startswith match.
  2. Missing netloc: Input like "http:///path" passes all checks (starts with http://) but produces malformed URLs like http:///path/repo/resolve/....

The past review suggested using urllib.parse.urlparse to validate both scheme and netloc, which catches these edge cases cleanly.

🛡️ Recommended fix using urlparse
+from urllib.parse import urlparse
+
 def _get_hf_endpoint() -> str:
-    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
+    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co").strip()
     if not endpoint:
-        raise ValueError("HF_ENDPOINT is set but empty; expected a valid HTTP(S) URL")
+        endpoint = "https://huggingface.co"
     endpoint = endpoint.rstrip("/")
-    if not endpoint.startswith(("http://", "https://")):
+    
+    parsed = urlparse(endpoint)
+    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
         raise ValueError(
-            f"HF_ENDPOINT must use http:// or https:// scheme, got: {endpoint}"
+            "HF_ENDPOINT must be an absolute http(s) URL, e.g. https://huggingface.co"
         )
     return endpoint
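
To see why the netloc check matters, it may help to look at what urllib.parse.urlparse reports for the kinds of input discussed here. The helper name is_absolute_http_url is hypothetical and exists only for this illustration; the whitespace case is left out because it is handled by .strip() before parsing.

```python
from urllib.parse import urlparse


def is_absolute_http_url(value: str) -> bool:
    """Return True only when both an http(s) scheme and a hostname are present."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in (
        "https://hf-mirror.com",  # scheme and netloc both present
        "http:///path",           # http scheme, but netloc is empty
        "file:///etc",            # wrong scheme and empty netloc
        "hf-mirror.com",          # no scheme, so everything lands in path
    ):
        parsed = urlparse(candidate)
        print(candidate, repr(parsed.scheme), repr(parsed.netloc),
              is_absolute_http_url(candidate))
```

A plain startswith("http://") test accepts the second input, which is the gap this comment points at.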
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/parsers/huggingface.py` around lines 10 - 17, The endpoint
validation logic has gaps in handling whitespace and missing netloc validation.
After the initial HF_ENDPOINT retrieval, add a `.strip()` call to remove leading
and trailing whitespace before the scheme check. Then replace the simple
`startswith` scheme validation with `urllib.parse.urlparse` to validate both the
scheme and netloc components, ensuring the URL is well-formed and has a valid
hostname. This approach catches edge cases like "  https://hf-mirror.com  "
(whitespace) and "http:///path" (missing netloc) cleanly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ddefd313-f896-4545-a066-a4789e4633e5

📥 Commits

Reviewing files that changed from the base of the PR and between b9b6055 and 0a4fec4.

📒 Files selected for processing (1)
  • src/modelinfo/parsers/huggingface.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Around line 15-18: The endpoint validation in _get_hf_endpoint() currently
allows both http:// and https:// schemes, which can expose HF tokens over
unencrypted connections when the Authorization header is attached in
_make_request(). Modify the startswith() condition to only allow https:// by
changing the tuple from ("http://", "https://") to check exclusively for
"https://", and update the error message to reflect that only https:// is
permitted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b475957-d445-460b-8e37-431a5e995fa2

📥 Commits

Reviewing files that changed from the base of the PR and between 0a4fec4 and e226cf3.

📒 Files selected for processing (1)
  • src/modelinfo/parsers/huggingface.py

Comment thread src/modelinfo/parsers/huggingface.py Outdated
@pipe1os pipe1os linked an issue Jun 18, 2026 that may be closed by this pull request

@pipe1os pipe1os left a comment

Owner


Thank you for the contribution!

Please address the two CodeRabbit review points before we merge:

  1. Update the scheme validation to check only for https://. Allowing http:// risks exposing Hugging Face tokens in plaintext when the script attaches the authorization header.
  2. Add unit tests for _get_hf_endpoint() to verify:
    • The default fallback works without the environment variable.
    • The function strips whitespace.
    • The code raises a ValueError for invalid endpoints (missing scheme or netloc).

@molloyzak13

Contributor Author


Good catch on the plaintext risk with bearer tokens. _get_hf_endpoint() now rejects http:// with a ValueError. I also added five tests covering valid https, the default endpoint, http rejection, empty string, and URLs without a hostname, and tightened up the scheme check and the trailing-slash normalization that the bot flagged.
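
For reference, a helper with the behaviour described in this reply might look roughly like the sketch below. It is not the merged code: the function name get_hf_endpoint_https_only and the error messages are hypothetical, and the .strip() call is an assumption taken from the review request rather than something this reply confirms.

```python
import os
from urllib.parse import urlparse


def get_hf_endpoint_https_only() -> str:
    """Hypothetical sketch of an https-only endpoint helper."""
    # Assumption: surrounding whitespace is stripped, as the review requested.
    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co").strip()
    if not endpoint:
        raise ValueError("HF_ENDPOINT is set but empty; expected an https:// URL")
    endpoint = endpoint.rstrip("/")
    parsed = urlparse(endpoint)
    # Require https so a bearer token is never sent over plaintext HTTP,
    # and require a hostname so the resulting URLs are well-formed.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("HF_ENDPOINT must be an absolute https:// URL")
    return endpoint


def outcome(value):
    """Return the endpoint for a given HF_ENDPOINT value, or 'ValueError'."""
    saved = os.environ.get("HF_ENDPOINT")
    try:
        if value is None:
            os.environ.pop("HF_ENDPOINT", None)
        else:
            os.environ["HF_ENDPOINT"] = value
        return get_hf_endpoint_https_only()
    except ValueError:
        return "ValueError"
    finally:
        # Restore whatever the caller had set before the check.
        if saved is None:
            os.environ.pop("HF_ENDPOINT", None)
        else:
            os.environ["HF_ENDPOINT"] = saved


if __name__ == "__main__":
    for value in ("https://hf-mirror.com/", None, "http://hf-mirror.com",
                  "", "https:///path"):
        print(repr(value), "->", outcome(value))
```

The five inputs in the usage loop correspond to the five cases listed above: valid https, the default, http rejection, empty string, and a URL without a hostname.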

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/test_parsers.py (1)

55-58: ⚡ Quick win

Consider adding a test for trailing slash normalization.

The PR objectives state that the helper "handles trailing slashes so users can set either https://hf-mirror.com or https://hf-mirror.com/ formats interchangeably." However, there is no test validating this behavior. Adding a test case would document this feature and prevent regressions.

📋 Suggested test case
def test_hf_endpoint_strips_trailing_slash(monkeypatch):
    """Trailing slash is stripped from endpoint."""
    monkeypatch.setenv("HF_ENDPOINT", "https://hf-mirror.com/")
    assert _get_hf_endpoint() == "https://hf-mirror.com"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_parsers.py` around lines 55 - 58, The test suite is missing a test
case for the trailing slash normalization feature mentioned in the PR
objectives. Add a new test function called
test_hf_endpoint_strips_trailing_slash that follows the same pattern as
test_hf_endpoint_valid_https, where you use monkeypatch to set the HF_ENDPOINT
environment variable to a URL with a trailing slash (e.g.,
"https://hf-mirror.com/") and assert that calling _get_hf_endpoint() returns the
URL without the trailing slash, documenting and validating this interchangeable
format handling behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c662530b-3d25-4aa6-b4b8-3c8e00c8555b

📥 Commits

Reviewing files that changed from the base of the PR and between e226cf3 and b47161c.

📒 Files selected for processing (2)
  • src/modelinfo/parsers/huggingface.py
  • tests/test_parsers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/modelinfo/parsers/huggingface.py

@pipe1os pipe1os left a comment

Owner


Thank you. I will proceed to merge this.

@pipe1os pipe1os merged commit 99a7a5f into pipe1os:main Jun 18, 2026
10 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature]: Support Custom Hugging Face Endpoints via HF_ENDPOINT

2 participants