feat: support custom Hugging Face endpoints via HF_ENDPOINT env var#31
Conversation
WalkthroughAdds a ChangesHF Endpoint Centralization
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Around line 9-10: The _get_hf_endpoint() function accepts any value from the
HF_ENDPOINT environment variable without validation, which can result in empty
strings or unsafe URL schemes like file:// that bypass HTTP security. Add
validation logic to ensure the endpoint is a non-empty string with only http or
https schemes, and raise an appropriate exception if validation fails. This
validation should happen in _get_hf_endpoint() itself to prevent invalid
endpoints from being used in subsequent request operations throughout the
codebase.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 549e9692-e5ed-4385-ab60-a75117c22fa1
📒 Files selected for processing (1)
src/modelinfo/parsers/huggingface.py
Add validation to _get_hf_endpoint() to reject empty values and non-http(s) schemes. The function now raises ValueError when HF_ENDPOINT is set to an empty string or uses a scheme other than http:// or https://. How to test: - Set HF_ENDPOINT='' and run modelinfo; should raise ValueError - Set HF_ENDPOINT='file:///etc' and run modelinfo; should raise ValueError - Set HF_ENDPOINT='https://huggingface-proxy.example.com' and run modelinfo; should work - Run without HF_ENDPOINT set; should default to https://huggingface.co
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/modelinfo/parsers/huggingface.py (1)
10-17:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd
.strip()and validatenetlocfor robust edge-case handling.The current validation has two gaps:
- Whitespace: Input like
" https://hf-mirror.com "fails the scheme check with a confusing error because the leading space prevents thestartswithmatch.- Missing netloc: Input like
"http:///path"passes all checks (starts withhttp://) but produces malformed URLs likehttp:///path/repo/resolve/....The past review suggested using
urllib.parse.urlparseto validate both scheme andnetloc, which catches these edge cases cleanly.🛡️ Recommended fix using
urlparse+from urllib.parse import urlparse + def _get_hf_endpoint() -> str: - endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co") + endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co").strip() if not endpoint: - raise ValueError("HF_ENDPOINT is set but empty; expected a valid HTTP(S) URL") + endpoint = "https://huggingface.co" endpoint = endpoint.rstrip("/") - if not endpoint.startswith(("http://", "https://")): + + parsed = urlparse(endpoint) + if parsed.scheme not in {"http", "https"} or not parsed.netloc: raise ValueError( - f"HF_ENDPOINT must use http:// or https:// scheme, got: {endpoint}" + "HF_ENDPOINT must be an absolute http(s) URL, e.g. https://huggingface.co" ) return endpoint🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/parsers/huggingface.py` around lines 10 - 17, The endpoint validation logic has gaps in handling whitespace and missing netloc validation. After the initial HF_ENDPOINT retrieval, add a `.strip()` call to remove leading and trailing whitespace before the scheme check. Then replace the simple `startswith` scheme validation with `urllib.parse.urlparse` to validate both the scheme and netloc components, ensuring the URL is well-formed and has a valid hostname. This approach catches edge cases like " https://hf-mirror.com " (whitespace) and "http:///path" (missing netloc) cleanly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Around line 10-17: The endpoint validation logic has gaps in handling
whitespace and missing netloc validation. After the initial HF_ENDPOINT
retrieval, add a `.strip()` call to remove leading and trailing whitespace
before the scheme check. Then replace the simple `startswith` scheme validation
with `urllib.parse.urlparse` to validate both the scheme and netloc components,
ensuring the URL is well-formed and has a valid hostname. This approach catches
edge cases like " https://hf-mirror.com " (whitespace) and "http:///path"
(missing netloc) cleanly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ddefd313-f896-4545-a066-a4789e4633e5
📒 Files selected for processing (1)
src/modelinfo/parsers/huggingface.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Around line 15-18: The endpoint validation in _get_hf_endpoint() currently
allows both http:// and https:// schemes, which can expose HF tokens over
unencrypted connections when the Authorization header is attached in
_make_request(). Modify the startswith() condition to only allow https:// by
changing the tuple from ("http://", "https://") to check exclusively for
"https://", and update the error message to reflect that only https:// is
permitted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 7b475957-d445-460b-8e37-431a5e995fa2
📒 Files selected for processing (1)
src/modelinfo/parsers/huggingface.py
pipe1os
left a comment
There was a problem hiding this comment.
Thank you for the contribution!
Please address the two CodeRabbit review points before we merge:
- Update the scheme validation to check only for
https://. Allowinghttp://risks exposing Hugging Face tokens in plaintext when the script attaches the authorization header. - Add unit tests for
_get_hf_endpoint()to verify:- The default fallback works without the environment variable.
- The function strips whitespace.
- The code raises a
ValueErrorfor invalid endpoints (missing scheme or netloc).
|
reply_target: #31 Good catch on the plaintext risk with bearer tokens — |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_parsers.py (1)
55-58: ⚡ Quick winConsider adding a test for trailing slash normalization.
The PR objectives state that the helper "handles trailing slashes so users can set either
https://hf-mirror.comorhttps://hf-mirror.com/formats interchangeably." However, there is no test validating this behavior. Adding a test case would document this feature and prevent regressions.📋 Suggested test case
def test_hf_endpoint_strips_trailing_slash(monkeypatch): """Trailing slash is stripped from endpoint.""" monkeypatch.setenv("HF_ENDPOINT", "https://hf-mirror.com/") assert _get_hf_endpoint() == "https://hf-mirror.com"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_parsers.py` around lines 55 - 58, The test suite is missing a test case for the trailing slash normalization feature mentioned in the PR objectives. Add a new test function called test_hf_endpoint_strips_trailing_slash that follows the same pattern as test_hf_endpoint_valid_https, where you use monkeypatch to set the HF_ENDPOINT environment variable to a URL with a trailing slash (e.g., "https://hf-mirror.com/") and assert that calling _get_hf_endpoint() returns the URL without the trailing slash, documenting and validating this interchangeable format handling behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_parsers.py`:
- Around line 55-58: The test suite is missing a test case for the trailing
slash normalization feature mentioned in the PR objectives. Add a new test
function called test_hf_endpoint_strips_trailing_slash that follows the same
pattern as test_hf_endpoint_valid_https, where you use monkeypatch to set the
HF_ENDPOINT environment variable to a URL with a trailing slash (e.g.,
"https://hf-mirror.com/") and assert that calling _get_hf_endpoint() returns the
URL without the trailing slash, documenting and validating this interchangeable
format handling behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: c662530b-3d25-4aa6-b4b8-3c8e00c8555b
📒 Files selected for processing (2)
src/modelinfo/parsers/huggingface.pytests/test_parsers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/modelinfo/parsers/huggingface.py
pipe1os
left a comment
There was a problem hiding this comment.
Thank you. I will proceed to merge this.
The Hugging Face parser hardcodes
https://huggingface.coin five separate URLs acrossfetch_huggingface_repo()and_fetch_safetensors_header(). This breaks for anyone behind a proxy or mirror, and there's no way to redirect requests without modifying the source.Added a
_get_hf_endpoint()helper that readsHF_ENDPOINTfrom the environment, falling back tohttps://huggingface.cowith a trailing-slash strip so users can set eitherhttps://hf-mirror.comorhttps://hf-mirror.com/. Replaced all five hardcoded base URLs with calls to it: the API models endpoint, config.json, model.safetensors.index.json, single model.safetensors, and the safetensors header fetch inside_fetch_safetensors_header.Tested with an unset
HF_ENDPOINT(default behaviour preserved, fetches metadata as before) and withHF_ENDPOINT=https://hf-mirror.com(requests route to the mirror). No new dependencies, no API changes, no breaking changes.Summary by CodeRabbit
HF_ENDPOINTenvironment variable, enabling use of custom Hugging Face instances.HF_ENDPOINT.