Skip to content

[Feature] : Add REST API endpoints for access through AI #1117

Open
pulk17 wants to merge 2 commits into
CCExtractor:masterfrom
pulk17:Feature/REST-API-Endpoints
Open

[Feature] : Add REST API endpoints for access through AI #1117
pulk17 wants to merge 2 commits into
CCExtractor:masterfrom
pulk17:Feature/REST-API-Endpoints

Conversation

@pulk17

@pulk17 pulk17 commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Feature: JSON REST API for the CCExtractor CI Platform (mod_api)

Summary

This PR introduces a complete, spec-driven JSON REST API for the CCExtractor Sample Platform, mounted at /api/v1. It replaces the need for scrapers or direct database access by exposing every CI data point (runs, samples, results, diffs, logs, errors, artifacts, queue status) through authenticated, validated, and rate-limited endpoints.

The API is fully described by the openapi-ci-api.yaml specification and validated against it using Schemathesis property-based testing. It integrates cleanly with the existing data model (mod_test, mod_regression, mod_sample, mod_auth) without modifying any of those modules.


Motivation

The existing platform serves server-rendered HTML pages and lacks a programmatic interface. This makes it impossible for:

  • External dashboards and monitoring tools to query CI status
  • Automated pipelines to trigger test runs or approve baselines
  • Frontend rewrites (e.g., a React or Vue SPA) to consume structured data
  • Third-party integrations to pull regression test results

This API solves all of the above by providing a standard REST interface with proper authentication, authorization, pagination, filtering, and error handling.


Architecture

The module is organized into five layers, each with a single responsibility:

mod_api/
├── __init__.py          # Blueprint registration + import ordering
├── utils.py             # Pagination helpers, sort utilities
├── models/
│   └── api_token.py     # ApiToken SQLAlchemy model (argon2 hashed)
├── middleware/
│   ├── auth.py          # Bearer token auth + scope/role decorators
│   ├── error_handler.py # Centralized JSON error responses
│   ├── rate_limit.py    # Per-client sliding window rate limiter
│   ├── security.py      # HSTS, CSP, X-Frame-Options headers
│   └── validation.py    # Body, pagination, path ID, date range validators
├── schemas/
│   ├── common.py        # Base schema config
│   ├── auth.py          # Token create/list schemas
│   ├── errors.py        # Error item/summary schemas
│   ├── results.py       # Baseline approval request schema
│   ├── runs.py          # Run list/create/progress schemas
│   ├── samples.py       # Sample history entry schema
│   └── system.py        # Health/queue response schemas
├── services/
│   ├── status.py        # Run/sample status derivation (single source of truth)
│   ├── diff_service.py  # Structured diff computation (JSON hunks)
│   ├── error_service.py # Error classification from TestProgress/TestResult
│   ├── log_service.py   # Build log reader with cursor pagination
│   └── storage.py       # GCS signed URL + local fallback resolution
└── routes/
    ├── auth.py          # Token create, list, revoke
    ├── runs.py          # Run list, create, detail, summary, progress, config, cancel
    ├── samples.py       # Run samples, media catalog, sample history, regression tests
    ├── results.py       # Expected/actual output, structured diffs, baseline approval
    ├── errors_logs.py   # Test errors, infra errors, error summary, build logs
    └── system.py        # Health check, queue status, artifact listing

Why this structure matters

  • Middleware runs as before_request/after_request hooks, so route handlers never contain auth, validation, or rate-limit boilerplate
  • Services encapsulate all business logic — route handlers are thin orchestrators that call services and return formatted responses
  • Schemas (Marshmallow) handle both input validation (load) and output serialization (dump), ensuring the API contract matches the OpenAPI spec
  • Status derivation is centralized in services/status.py to prevent inconsistent status logic across endpoints

Endpoints

Authentication (/auth/tokens)

Method Path Auth Description
POST /auth/tokens None (email/password in body) Authenticate and issue a scoped Bearer token
GET /auth/tokens Bearer + tokens:manage List your tokens (admins can pass ?all=true)
DELETE /auth/tokens/current Bearer Revoke the token in the current request
DELETE /auth/tokens/{id} Bearer Revoke a token by ID (admins can revoke anyone's)

Runs (/runs)

Method Path Scope Description
GET /runs runs:read List runs with filters: platform, branch, commit_sha, repository, status, created_after/before, sort
POST /runs runs:write Trigger a new test run for a commit + platform
GET /runs/{id} runs:read Get a single run's details
GET /runs/{id}/summary runs:read Pass/fail/skip/error counts
GET /runs/{id}/progress runs:read Timeline of CI worker progress events
GET /runs/{id}/config runs:read Run configuration and regression test matrix
POST /runs/{id}/cancel runs:write Cancel a queued or running test (idempotent)

Samples (/samples, /runs/{id}/samples, /regression-tests)

Method Path Scope Description
GET /runs/{id}/samples runs:read Per-sample regression test results for a run
GET /runs/{id}/samples/{sid} runs:read Single regression test result in a run
GET /samples runs:read Media sample catalog with filters: name, extension, tag, sha256, status
GET /samples/{id} runs:read Single media sample detail
GET /samples/{id}/history runs:read Cross-run history with failure signatures
GET /regression-tests runs:read Regression test definitions with active, category, tag, sample_id filters

Results (/runs/{id}/samples/{sid}/regression-tests/{rid}/outputs/{oid})

Method Path Scope Description
GET .../expected results:read Expected output file (text or base64)
GET .../actual results:read Actual output file (303 redirect if identical to expected)
GET .../diff results:read Structured JSON diff or unified diff between expected and actual
POST /runs/{id}/samples/{sid}/baseline-approval baselines:write + admin/contributor Approve actual output as the new expected baseline

Errors & Logs (/runs/{id}/errors, /runs/{id}/logs)

Method Path Scope Description
GET /runs/{id}/errors results:read Test-level errors for a run
GET /runs/{id}/infrastructure-errors system:read Infra errors (VM, build, worker failures)
GET /runs/{id}/error-summary results:read Grouped error counts by type/severity/sample
GET /runs/{id}/logs system:read Build log with cursor-based pagination and level, source, contains filters
GET /runs/{id}/samples/{sid}/logs system:read Reserved placeholder endpoint for per-sample logs (currently returns 404 as the CI worker doesn't support this yet)

System (/system)

Method Path Auth Description
GET /system/health None Health check with DB, local storage, and GCS dependency status
GET /system/queue system:read Queue depth, running count, and job listing
GET /runs/{id}/artifacts results:read List all artifacts for a run (GCS + local, with signed download URLs)

Security

Token Authentication

  • Tokens are prefixed with spci_ for easy identification in logs and secret scanners
  • Only the argon2id hash is stored in the database — the plaintext token is returned exactly once at creation time and never logged
  • Token lookup uses a prefix index for O(1) candidate narrowing, then verifies the full hash against each candidate
  • Expired and revoked tokens are rejected at the middleware layer before any route handler executes
  • Failed authentication always performs a dummy hash verification to prevent timing-based user enumeration

Scope-Based Authorization

Six scopes control access granularity:

Scope Grants
runs:read List/view runs, samples, regression tests
runs:write Trigger new runs, cancel existing runs
results:read View expected/actual output, diffs, errors
baselines:write Approve new baselines (admin/contributor only)
system:read View queue, infrastructure errors, build logs
tokens:manage List and revoke tokens

Non-admin users cannot request baselines:write. The scope check returns 403 with details about which scopes are missing and which scopes the token holds, making debugging straightforward.

Role Enforcement

Certain operations additionally require specific user roles:

  • POST /runs/{id}/cancel — admin, contributor, or tester
  • POST /runs/{id}/samples/{sid}/baseline-approval — admin or contributor
  • POST /runs (Triggering runs):
    • Scope: You ALWAYS need the runs:write scope, regardless of the repository.
    • Role: To trigger runs for the main repository, you must be an admin, contributor, or tester. To trigger runs for your own fork repository, any authenticated user (who owns the fork and has runs:write scope) can do so.

Rate Limiting

Per-client sliding window rate limits protect against abuse:

Endpoint Limit Window Key
POST /auth/tokens 5 requests 15 min Client IP
Mutating endpoints (POST/DELETE/PUT/PATCH) 20 requests 1 min Token ID
Read endpoints (GET) 120 requests 1 min Token ID

Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) are included on every response. A background eviction process prunes stale entries every 100 requests to bound memory usage.

Caveat: Rate limit state is stored in-memory (thread-safe dict). This works perfectly for single-process/single-server deployments and multi-threaded Gunicorn. However, if the platform is ever scaled to multiple Gunicorn workers or multiple servers behind a load balancer, the rate limiter should be migrated to Redis so that counters are shared across processes. This is a known limitation documented in the code.

Rate limiting is automatically disabled when app.config['TESTING'] is True to prevent interference with test suites. In production, the TESTING flag is never set.

Security Headers

Every API response includes:

  • Strict-Transport-Security: max-age=31536000; includeSubDomains
  • Content-Security-Policy: default-src 'none'; frame-ancestors 'none'
  • X-Content-Type-Options: nosniff
  • X-Frame-Options: DENY

Path Traversal Protection

All file-serving endpoints (/expected, /actual, /diff) use _safe_resolve() which calls os.path.realpath() and verifies the resolved path is within the TestResults base directory. File extensions from the database are sanitized to remove /, \, and .. sequences before being appended to filenames.


Input Validation

All validation happens at the middleware layer through reusable decorators:

  • @validate_body(Schema) — Validates JSON body against a Marshmallow schema. Returns 415 for non-JSON content types, 400 for malformed JSON, and 400 with field-level error details for schema violations
  • @validate_offset_pagination() — Extracts and validates limit (1–100) and offset (≥0, ≤2147483647). Rejects requests that mix cursor and offset pagination
  • @validate_cursor_pagination() — Extracts and validates limit and cursor for log endpoints
  • @validate_path_id('param') — Ensures URL path parameters are positive integers within int32 range
  • @validate_date_range — Parses created_after/created_before ISO 8601 params and rejects inverted ranges
  • @validate_sort() — Validates the sort query parameter against a whitelist (created_at, -created_at, run_id, -run_id)
  • LIKE wildcard escaping — The GET /samples name filter escapes % and _ to prevent SQL LIKE injection

Status Derivation

Status logic is centralized in services/status.py to prevent inconsistencies:

Run Status

The existing test.failed property only checks for TestStatus.canceled and is not reliable for determining whether regression tests passed. Instead, we derive status from the latest TestProgress row and, for completed runs, count actual failures from TestResult rows:

Derived Status Condition
queued No TestProgress rows
running Latest progress is preparation or testing
canceled Latest progress is canceled
pass Latest is completed + zero failures
fail Latest is completed + one or more failures
incomplete Any other raw status

Sample Status

Derived Status Condition
pass Exit code matches expected AND all outputs match
fail Exit code mismatch OR any output has a diff
missing_output Dummy sentinel row detected
not_started No TestResult row exists

Dummy Row Detection

The CI worker writes a sentinel TestResultFile row with regression_test_output_id = -1 and got = 'error' when a test produces no output. This row is filtered from all API responses and drives the missing_output status.

Deployment prerequisite: Before deploying to production, run this query to verify no old-format sentinel rows exist that would be missed:

SELECT COUNT(*) FROM test_result_file
WHERE (test_id = -1 OR regression_test_id = -1)
  AND NOT (regression_test_output_id = -1 AND got = 'error');

If the count is > 0, a data migration is needed first.


Pagination

Two pagination strategies are used:

Offset Pagination (most endpoints)

{
  "data": [...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "total": 142,
    "next_offset": 50
  }
}

Cursor Pagination (build logs)

{
  "data": [...],
  "pagination": {
    "limit": 100,
    "next_cursor": 3847
  }
}

Cursor pagination is used for logs because they are read from files, not from the database, and offset-based access would require reading and discarding lines on every request.


Error Response Format

All errors follow a consistent structure:

{
  "code": "validation_error",
  "message": "Request failed schema validation.",
  "details": {
    "fields": {
      "email": ["Missing data for required field."],
      "password": ["Shorter than minimum length 1."]
    }
  }
}

HTTP exceptions (404, 405, 500) are caught and reformatted into this structure so that API clients never receive HTML error pages.


Performance Considerations

Batch Status Derivation

batch_get_run_data() in services/status.py preloads all TestProgress, TestResult, and TestResultFile rows for a batch of runs in three queries (instead of N+1), then computes statuses in-memory. This is critical for the GET /runs list endpoint.

Status Filtering Caveat

When GET /runs?status=running is requested, the API must load up to 1,000 runs and compute their derived status in-memory (because status is not a database column). A truncated: true flag is included in the pagination response when the 1,000-run cap is hit. This is documented in the response schema.

Result File Preloading

Endpoints that iterate over TestResult rows (run summary, run samples, sample history) use defaultdict-based preloading of TestResultFile rows to avoid N+1 queries.

Diff Size Limits

The diff endpoint rejects files larger than 10 MiB and directs clients to the download URL instead. Output files are truncated at 1 MiB with a truncated: true flag and a GCS download link for the full file.


Database Changes

New Table: api_token

Column Type Notes
id INT PRIMARY KEY Auto-increment
user_id INT FK → user.id CASCADE on delete/update
token_name VARCHAR(50) Unique per user (uq_user_token_name)
token_hash VARCHAR(255) Argon2id hash
token_prefix VARCHAR(16) First 16 chars, indexed for fast lookup
scopes_json TEXT JSON array of scope strings
created_at DATETIME(tz) UTC timestamp
expires_at DATETIME(tz) UTC timestamp
revoked_at DATETIME(tz) NULL until revoked

No existing tables are modified.


Test Coverage

Test Organization (265 tests total)

tests/api/
├── test_middleware_auth.py           (13 tests)  — Token validation, public endpoints, scope/role checks
├── test_middleware_error_handler.py  (9 tests)   — JSON error formatting, HTTP exception handling
├── test_middleware_rate_limit.py     (8 tests)   — Sliding window, eviction, header injection
├── test_middleware_validation.py     (13 tests)  — Body, pagination, path ID, date range, sort validation
├── test_models_api_token.py          (3 tests)   — Token generation, hashing, expiry, revocation
├── test_routes_auth.py               (17 tests)  — Token lifecycle: create, list, revoke, edge cases
├── test_routes_errors_logs.py        (12 tests)  — Error listing, infra errors, error summary, log reading
├── test_routes_results.py            (12 tests)  — Expected/actual output, diff, baseline approval
├── test_routes_runs.py               (19 tests)  — Run CRUD, filtering, sorting, cancellation
├── test_routes_samples.py            (10 tests)  — Sample catalog, history, regression test listing
├── test_routes_system.py             (8 tests)   — Health check, queue status, artifact listing
├── test_schemathesis.py              (79 tests)  — OpenAPI contract fuzzing against every endpoint
├── test_services_diff_service.py     (10 tests)  — Structured diff, hunk parsing, edge cases
├── test_services_error_service.py    (8 tests)   — Error classification from TestProgress/TestResult
├── test_services_log_service.py      (10 tests)  — Log reading, cursor pagination, filtering
├── test_services_status.py           (14 tests)  — Run/sample status derivation, batch computation
├── test_services_storage.py          (8 tests)   — GCS signed URLs, local fallback, path resolution
└── test_utils.py                     (12 tests)  — Pagination helpers, sort column resolution

Schemathesis (Property-Based Testing)

The test_schemathesis.py module validates every endpoint against the OpenAPI specification using Schemathesis, which generates hundreds of edge-case inputs per endpoint. It covers:

  • Broad schema fuzzing across all endpoints
  • Per-endpoint targeted validation (runs, samples, auth, results, diffs)
  • Negative security testing (invalid auth, missing headers)
  • Response invariant checks
  • Boundary and edge-case coverage

Hypothesis is configured with max_examples=5 and deadline=None to keep CI runtime reasonable (~7 minutes locally) while still providing meaningful coverage. Rate limiting is bypassed during testing via app.config['TESTING'].

What the tests mock

  • config_parser.parse_config → returns an in-memory SQLite config
  • google.cloud.storage.Client.from_service_account_json → returns a mock GCS client
  • Individual service functions are mocked at the route level to isolate handler logic from DB state

New Files

Path Lines Purpose
mod_api/__init__.py 28 Blueprint setup, middleware/route registration
mod_api/utils.py 73 Pagination and sort helpers
mod_api/models/api_token.py 142 ApiToken model with argon2 hashing
mod_api/middleware/auth.py 132 Bearer auth + scope/role decorators
mod_api/middleware/error_handler.py ~110 Centralized error response formatting
mod_api/middleware/rate_limit.py ~130 Sliding window rate limiter
mod_api/middleware/security.py 12 Security response headers
mod_api/middleware/validation.py 321 Input validation decorators
mod_api/schemas/*.py ~400 Marshmallow schemas for all endpoints
mod_api/services/status.py 211 Status derivation logic
mod_api/services/diff_service.py 202 Structured diff computation
mod_api/services/error_service.py ~230 Error classification
mod_api/services/log_service.py ~100 Build log reader
mod_api/services/storage.py 65 GCS + local artifact resolution
mod_api/routes/*.py ~1,900 All endpoint handlers
tests/api/*.py ~3,000 265 tests across 19 test files
openapi-ci-api.yaml ~2,700 OpenAPI 3.0 specification

Total new code: ~5,500 lines (source) + ~3,000 lines (tests) + ~2,700 lines (OpenAPI spec)


Dependencies

New pip dependencies:

  • argon2-cffi — Token hashing (argon2id)
  • marshmallow — Schema validation and serialization
  • schemathesis — OpenAPI contract testing (test dependency only)
  • hypothesis — Property-based testing engine (test dependency only, pulled in by schemathesis)

All other dependencies (flask, sqlalchemy, passlib, google-cloud-storage) were already in requirements.txt.


How to Test Locally

# Activate virtual environment
source .venv/bin/activate  # or .\.venv\Scripts\activate on Windows

# Run all API tests
pytest tests/api/ -v

# Run only unit tests (skip schemathesis)
pytest tests/api/ -v --ignore=tests/api/test_schemathesis.py

# Run linters against API code
python run_checks.py mod_api tests/api --skip-nose

Known Limitations & Future Work

  1. Rate limit storage: In-memory only (see Security section above). Migrate to Redis if scaling to multiple workers.
  2. Status filtering: GET /runs?status=X requires in-memory filtering with a 1,000-run cap. A future improvement would be to materialize run status as a database column updated by a trigger or the CI worker.
  3. Per-sample logs: GET /runs/{id}/samples/{sid}/logs returns 404 because the CI worker doesn't produce per-sample log files. The endpoint is implemented and documented as a placeholder for when the worker adds this capability.
  4. Dummy row detection: See the deployment prerequisite SQL query in the Status Derivation section above.
  5. GCS blob existence: The artifact endpoint does not call blob.exists() before generating signed URLs (to avoid latency). Clients should handle 404s from the signed URL gracefully.

@pulk17 pulk17 marked this pull request as draft June 5, 2026 13:55
@pulk17 pulk17 changed the title [Feature] : Add REST API endpoints for access thorough AI (Pending !) [Feature] : Add REST API endpoints for access through AI (Pending !) Jun 5, 2026
@pulk17 pulk17 force-pushed the Feature/REST-API-Endpoints branch 2 times, most recently from ab72a13 to 1441393 Compare June 8, 2026 14:39
@pulk17 pulk17 changed the title [Feature] : Add REST API endpoints for access through AI (Pending !) [Feature] : Add REST API endpoints for access through AI (WIP !) Jun 11, 2026
@pulk17 pulk17 force-pushed the Feature/REST-API-Endpoints branch from 1f3c1d5 to 094f3af Compare June 17, 2026 07:24
@pulk17 pulk17 changed the title [Feature] : Add REST API endpoints for access through AI (WIP !) [Feature] : Add REST API endpoints for access through AI Jun 17, 2026
@pulk17 pulk17 marked this pull request as ready for review June 21, 2026 07:47
@sonarqubecloud

Copy link
Copy Markdown

@cfsmp3

cfsmp3 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

While @canihavesomecoffee has a bit of time to review, some notes:

  • This PR is huge. It should be split into easy to review and merge ones - start with one that does the basic scaffolding and then ones that add the different endpoints (in a way that they can be reviewed and merged in order).
  • Other than that it well organized so splitting it should be easy since not a lot of refactoring will be needed (or maybe nothing at all)

I asked claude to review it. Take a look:

CRITICAL

C1 — Missing-output detection is broken; failing runs report pass. (verified)
services/status.py:71 is_dummy_row() keys on a TestResultFile row with regression_test_output_id == -1, got == 'error'. That row is never written to the DB. The only persisted TestResultFile rows (mod_ci/controllers.py:2588,2670) use a real rto.id. The TestResultFile(-1,-1,-1,'',got) at mod_test/controllers.py:118 is an in-memory object for HTML rendering only — never db.add-ed. The real "missing output" condition is absence of TestResultFile rows while the regression test has a non-ignored RegressionTestOutput (see reference logic at mod_test/controllers.py:107-118).

Consequence in derive_sample_status (status.py:57-69): empty result_files → both loops skip → if exit code matches → returns pass. Cascades into run status, /summary counts, /errors, /error-summary. The whole point of this platform is catching regressions; this silently passes tests that produced no output. The "DEPLOYMENT PREREQUISITE" SQL in the docstring is also misleading — it checks for old rows that were likewise never persisted. Must rewrite missing-output derivation against expected RegressionTestOutputs.


HIGH

H1 — Privilege escalation: any user can revoke anyone's tokens. (verified) routes/auth.py:64 puts tokens:manage in allowed_scopes for every role (only baselines:write is admin-gated). revoke_specific_token (auth.py:175) has no @require_roles; its only cross-user guard is has_scope('tokens:manage') — which a plain user now holds. → low-priv user can revoke admins' tokens (lockout/DoS). Fix: gate tokens:manage to admins, or restrict non-admins to revoking their own tokens regardless of scope.

H2 — POST /runs ownership bypass → arbitrary-repo CI builds. routes/runs.py:116 _validate_run_permissions: is_staff (admin/tester/contributor) short-circuits the ownership check, and repository is only regex-validated, never confirmed to exist/be-owned. A contributor can enqueue a run against any owner/repo string, creating Fork rows and making build VMs clone/build attacker-chosen repos. SSRF/resource-abuse surface.

H3 — 'testuser' short-circuit poisons real data. mod_auth/controllers.py: fetch_username_from_token returns 'testuser' whenever config['TESTING'] is truthy, and github_callback now persists github_login = 'testuser' to the real user row. TESTING is env-var controlled (run.py:109). If ever set in a prod-ish env, it corrupts the value H2's owner-check trusts. Don't couple security-relevant data to TESTING, and never write the stub to the DB.

H4 — API 500s return HTML, not JSON. middleware/error_handler.py:101 @mod_api.errorhandler(500) never fires — Flask doesn't dispatch the 500 code to blueprint handlers; the app-level handler renders 500.html. after_app_request only JSON-ifies 404/405. Any unhandled exception breaks the "always JSON" contract. The unit test calls the handler directly so it passes. Fix at app level for /api/v1 paths. (The MarshmallowValidationError/SQLAlchemyError class handlers do work — only the 500 one is dead.)

H5 — verify_schemathesis.py (938 lines) is dead. CI uses nose2 which collects test*.py — this file (verify_*) is never collected, and it imports schemathesis/hypothesis which aren't in requirements (ImportError if run). The headline "contract test" provides zero coverage. Wire it in or drop it.

H6 — N+1 in get_sample_history. routes/samples.py:318 calls get_run_timestamps → batch_get_run_data([test]) per row = 3 queries × N runs, re-querying already-loaded data. Defeats the PR's own batching claim. Batch once over the distinct tests.

H7 — POST /runs/{id}/cancel check-then-act race. routes/runs.py:481 reads status, then inserts a canceled TestProgress with no lock/re-check → duplicate cancels, or canceled written after completed (flips a finished run). "Idempotent" only holds on the already-terminal fast path. Use with_for_update() and re-check.


MEDIUM (condensed)

  • Rate limiter (middleware/rate_limit.py): it's a fixed window mislabeled "sliding" (allows 2× burst at boundary); per-process so N× limits under multiple gunicorn workers; unbounded global dict = memory-exhaustion DoS (eviction only >900s, no cap); remote_addr keying self-DoSes the 5/15min token endpoint if behind a proxy without ProxyFix.
  • Flask-Limiter declared but never imported — dead dependency (the limiter is hand-rolled).
  • Scope creep: ~20 unrelated dep bumps (sqlalchemy, flask, lxml, google libs, pytz==2026.2) + test-dep bumps (mypy 2.1, isort 8) bundled into a feature PR. Split them; couples the API merge to a possible lint break. Confirm pytz==2026.2/mypy==2.1.0 actually exist on PyPI.
  • Loose pins marshmallow>=3.21 (allows breaking 4.x), argon2-cffi>=23.0, Flask-Limiter>=3.5 in an otherwise all-== file. Pin exactly.
  • diff_service.read_lines does f.readlines() (whole file) with the 10MiB guard only in the route, not the service.
  • storage.resolve_artifact local fallback has no traversal guard (defense-in-depth gap; current callers feed DB hashes).
  • GCS signed URLs minted without blob.exists() → broken links / bucket-layout leak.
  • validate_date_range can compare naive vs aware datetimes → TypeError → 500 (HTML, per H4).
  • Status filter 1000-run cap silently drops matches past position 1000 and reports an inconsistent total.
  • /summary total_samples (planned set) never reconciles with pass/fail/missing counts (existing TestResult rows only).
  • DELETE token scope spec mismatch; conftest.py pytest fixture dead under nose2.

LOW/NIT

No Cache-Control: no-store on the one-time token response; timing oracle on the no-candidate auth path (no dummy verify, unlike the login path); no unique constraint on token_prefix; extension sanitization brittle (_safe_resolve saves it); progress.step hardcoded None; HSTS sent unconditionally.


Bottom line: C1 alone blocks merge — the API would misreport CI results, defeating the platform's purpose. H1–H4 are real security/contract bugs. Architecture and tests are solid; this is a fixable PR, not a rewrite. Recommend: request changes, lead with C1, require the dep bumps be split out.

Want me to (a) post this as a PR review on GitHub, (b) save it to plans/PR_1117_REVIEW.md, or (c) deep-dive/repro any specific finding (e.g. write a failing test that proves C1)?

@cfsmp3 cfsmp3 self-requested a review June 23, 2026 19:56

@cfsmp3 cfsmp3 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants