41 changes: 41 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,41 @@
name: CI

on:
  push:
    branches: [master]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.10', '3.11', '3.12', '3.13']

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: isort
        run: isort . --check-only --diff

      - name: pycodestyle
        run: pycodestyle .

      - name: pydocstyle
        run: pydocstyle sp_cli

      - name: mypy
        run: mypy sp_cli

      - name: pytest
        run: pytest
23 changes: 23 additions & 0 deletions .gitignore
@@ -0,0 +1,23 @@
# Python
__pycache__/
*.py[cod]
*.egg-info/
.eggs/
build/
dist/

# Virtual environments
venv/
.venv/
env/

# Tooling caches
.pytest_cache/
.mypy_cache/
.coverage
htmlcov/

# OS / editor
.DS_Store
.idea/
.vscode/
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# sp — CCExtractor Sample Platform CLI

`sp` is a command-line client for the [CCExtractor Sample Platform](https://github.com/CCExtractor/sample-platform)
REST API. It lets a developer **or an AI agent** investigate CI runs end-to-end
from the terminal — no web frontend required.

Output defaults to **JSON** (ideal for agents and scripts), with a human-friendly
`-o table` view.

## Install

```bash
pip install -e .
```

This installs the `sp` command.

## Configure

`sp` needs the base URL of the API and, if the API requires authentication, a bearer token:

```bash
export SP_BASE_URL=https://sampleplatform.ccextractor.org/api/v1 # or your instance
export SP_API_TOKEN=<your-token> # if the API requires auth
```

Both can also be passed per-command with `--base-url` and `--token`.
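
As a rough illustration of how the two sources could be combined, here is a minimal sketch. It assumes that an explicit `--base-url` / `--token` value takes precedence over the environment variable; that precedence, and the helper name `resolve_config`, are assumptions for illustration and not taken from the `sp` source.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_config(base_url: Optional[str] = None,
                   token: Optional[str] = None,
                   env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (base_url, token), preferring explicit values over the environment."""
    # Assumption: a per-command flag wins over the environment variable.
    url = base_url or env.get("SP_BASE_URL")
    if not url:
        raise ValueError("no API base URL: set SP_BASE_URL or pass --base-url")
    # The token is optional; None means no bearer token is configured.
    resolved_token = token or env.get("SP_API_TOKEN") or None
    # Drop a trailing slash so paths can be appended uniformly.
    return url.rstrip("/"), resolved_token
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.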

## Usage

```bash
sp # banner / help
sp health # API + dependency health
sp run ls # list CI runs
sp run summary <run_id> # pass/fail summary for a run
sp run failures <run_id> # failing tests, each auto-classified
sp run diff <run_id> <id> # expected-vs-actual diff for a result
sp run logs <run_id> # raw run logs
sp investigate <run_id> # one-shot triage: info + counts + classified failures
```

Add `-o table` to any command for a human-readable view (default is JSON):

```bash
sp -o table investigate 9299
```
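
Because JSON is the default output, a script or agent can call `sp` and parse the result directly. The sketch below is one hedged way to do that from Python; `run_sp` and its `runner` parameter are hypothetical names, and the layout of the error output is not specified here, so the sketch simply surfaces whatever the process printed.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def run_sp(args: Sequence[str],
           runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run ``sp`` with the given arguments and return its parsed JSON output."""
    # JSON is the default output format, so no -o flag is passed.
    proc = runner(["sp", *args], capture_output=True, text=True)
    if proc.returncode != 0:
        # Assumption: failure details appear on stdout or stderr; report either.
        detail = (proc.stdout or proc.stderr).strip()
        raise RuntimeError(f"sp {' '.join(args)} exited {proc.returncode}: {detail}")
    return json.loads(proc.stdout)
```

With `sp` installed and configured, `run_sp(["run", "failures", "9299"])` would return the decoded JSON for that run. The `runner` argument exists only so the function can be exercised with a stand-in for `subprocess.run`.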

### The classifier

`sp` labels each test result with a stable code (`SEGFAULT`, `ABORT`, `TIMEOUT`,
`EXIT_CODE_MISMATCH`, `MISSING_OUTPUT`, `OUTPUT_DIFF`, or `PASS`) so a person or an
agent gets a straight answer about *why* a test failed, without reading logs.
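
A consumer can branch on these codes instead of parsing prose. The sketch below groups failures by code and orders the groups by the classifier's own severity order (crash, timeout, missing output, exit-code mismatch, output diff). The input shape is an assumption: each item is taken to be a dict with a `code` key, and `test_id` is a hypothetical field used only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Severity order as described by the classifier: crashes first, diffs last.
SEVERITY = ["SEGFAULT", "ABORT", "TIMEOUT",
            "MISSING_OUTPUT", "EXIT_CODE_MISMATCH", "OUTPUT_DIFF"]


def triage(failures: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group failures by code, most severe group first."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in failures:
        groups[item["code"]].append(item)
    rank = {code: position for position, code in enumerate(SEVERITY)}
    # Codes not in SEVERITY sort after all known ones.
    ordered = sorted(groups, key=lambda code: rank.get(code, len(SEVERITY)))
    return {code: groups[code] for code in ordered}


sample = [
    {"test_id": 1, "code": "OUTPUT_DIFF"},
    {"test_id": 2, "code": "SEGFAULT"},
    {"test_id": 3, "code": "OUTPUT_DIFF"},
]
for code, items in triage(sample).items():
    print(code, [item["test_id"] for item in items])
```

Sorting by a fixed rank keeps the output stable between runs, which matters when an agent compares triage results over time.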

## Development

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

isort . --check-only # import order
pycodestyle . # style
pydocstyle sp_cli # docstrings
mypy sp_cli # types
pytest # tests
```

## Relationship to the platform

`sp` is a **client**: it talks to the Sample Platform's REST API over HTTP. It is
deliberately kept in its own repository, separate from the platform server that
gets deployed on the VM. Point it at any deployment via `SP_BASE_URL`.
22 changes: 22 additions & 0 deletions pyproject.toml
@@ -0,0 +1,22 @@
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "sp-cli"
version = "0.1.0"
description = "AI-friendly CLI for the CCExtractor CI / Sample Platform"
requires-python = ">=3.10"
dependencies = ["click", "requests"]

[project.optional-dependencies]
dev = ["pytest", "pycodestyle", "pydocstyle", "isort", "mypy"]

[project.scripts]
sp = "sp_cli.main:cli"

[tool.setuptools]
packages = ["sp_cli", "sp_cli.commands"]

[tool.pytest.ini_options]
testpaths = ["tests"]
15 changes: 15 additions & 0 deletions setup.cfg
@@ -0,0 +1,15 @@
[pycodestyle]
max-line-length = 120
ignore = E701
exclude = .git,.venv,venv,build,dist,*.egg-info

[pydocstyle]
convention = numpy
add-ignore = D100,D104

[isort]
skip = .venv,venv,build,dist

[mypy]
python_version = 3.10
ignore_missing_imports = True
9 changes: 9 additions & 0 deletions sp_cli/__init__.py
@@ -0,0 +1,9 @@
"""``sp`` — an AI-friendly command-line client for the CCExtractor Sample Platform.

The CLI is a thin layer over the Sample Platform JSON API (``/api/v1``). It is
designed to be driven by AI agents as well as humans: it emits machine-readable
JSON by default and uses non-zero exit codes plus a consistent error envelope on
failure, so it can be scripted without screen-scraping the web UI.
"""

__version__ = "0.1.0"
6 changes: 6 additions & 0 deletions sp_cli/__main__.py
@@ -0,0 +1,6 @@
"""Allow the CLI to be run as ``python -m sp_cli``."""

from sp_cli.main import cli

if __name__ == '__main__':
    cli()
54 changes: 54 additions & 0 deletions sp_cli/banner.py
@@ -0,0 +1,54 @@
"""Branded welcome screen for the ``sp`` CLI.

Shown only when ``sp`` is invoked with no subcommand. Never emitted on command
output, so machine consumers (agents parsing JSON) are unaffected. Colors are
applied via :func:`click.style` and are auto-stripped when output is piped.
"""

import click

from sp_cli import __version__

#: Figlet-style "sp" wordmark.
LOGO = r""" ___ _ __
/ __| '_ \
\__ \ |_) |
|___/ .__/
    |_|"""

_GROUPS = [
    ('TRIAGE', 'sp investigate <run> ← one-shot: what failed and why'),
    ('RUNS', 'sp run ls · show · summary · failures · results · result · diff · artifacts · logs · errors'),
    ('SAMPLES', 'sp sample ls · show · history'),
    ('TESTS', 'sp regression ls'),
    ('SYSTEM', 'sp health · queue'),
    ('AUTH', 'sp auth login · logout'),
]

_EXAMPLES = [
    ('sp investigate 9299', 'triage a run end-to-end'),
    ('sp run failures 9299', 'failing tests, each labeled with why'),
    ('sp run diff 9299 137', 'expected-vs-actual diff (ids auto-resolved)'),
]


def show_welcome() -> None:
    """Print the branded welcome screen (banner, command map, examples)."""
    click.echo()
    click.echo(click.style(LOGO, fg='cyan'))
    click.echo(f" {click.style('CCExtractor CI', bold=True)} · AI-friendly CLI · v{__version__}")
    click.echo(" drive CI investigations from the terminal — no UI, no HTML scraping")
    click.echo()

    for name, line in _GROUPS:
        click.echo(f" {click.style(name.ljust(8), fg='green', bold=True)} {line}")
    click.echo()

    click.echo(f" {click.style('Examples', bold=True)}")
    for command, note in _EXAMPLES:
        click.echo(f" {command.ljust(28)} {click.style('# ' + note, fg='bright_black')}")
    click.echo()

    click.echo(f" {click.style('Help', bold=True)} sp COMMAND --help"
               f" {click.style('Config', bold=True)} SP_BASE_URL · SP_API_TOKEN")
    click.echo()
110 changes: 110 additions & 0 deletions sp_cli/classifier.py
@@ -0,0 +1,110 @@
"""Rule-based classification of regression-test failures into stable codes.

Deterministic, no ML: maps the raw signals a test run exposes (exit code,
expected return code, output presence, pass-history) onto a small, stable
taxonomy so an agent can branch on *why* a test failed instead of parsing
prose. Platform differences are normalized — e.g. a segfault surfaces as ``139``
on Linux and ``-1073741819`` (0xC0000005) on Windows; both classify as
``SEGFAULT``.

Each classification returns a ``code`` (stable, machine-readable), a
``confidence`` (``high`` for unambiguous exit-code rules, ``medium`` for
output-based ones), a human ``reason``, and ``regression`` (True if the test was
passing before — a real regression; False if it never passed; None if unknown).
"""

from typing import Any, Dict, Optional

# --- Failure codes (stable; downstream tools may pin on these) ---------------
CODE_PASS = "PASS"
CODE_SEGFAULT = "SEGFAULT"
CODE_ABORT = "ABORT"
CODE_TIMEOUT = "TIMEOUT"
CODE_MISSING_OUTPUT = "MISSING_OUTPUT"
CODE_EXIT_CODE_MISMATCH = "EXIT_CODE_MISMATCH"
CODE_OUTPUT_DIFF = "OUTPUT_DIFF"
CODE_UNKNOWN = "UNKNOWN"

# --- Exit codes that denote a crash, normalized across platforms -------------
#: SIGSEGV (128+11) on Linux, raw -11, and 0xC0000005 access violation on Windows.
_SEGFAULT_CODES = frozenset({139, -11, -1073741819})
#: SIGABRT (128+6) on Linux and raw -6.
_ABORT_CODES = frozenset({134, -6})
#: `timeout` exit (124) and SIGTERM (143 / -15).
_TIMEOUT_CODES = frozenset({124, 143, -15})


def classify(exit_code: Optional[int], expected_rc: Optional[int], *,
             has_output_diff: bool = False, missing_output: bool = False,
             has_ever_passed: Optional[bool] = None) -> Dict[str, Any]:
    """
    Classify a single regression-test result into a stable failure code.

    Rules are evaluated most-severe first (crash > timeout > missing output >
    exit-code mismatch > output diff), so the most actionable signal wins.

    :param exit_code: The process exit code observed for the test.
    :type exit_code: Optional[int]
    :param expected_rc: The exit code the test was expected to return.
    :type expected_rc: Optional[int]
    :param has_output_diff: True if a differing output file was recorded.
    :type has_output_diff: bool
    :param missing_output: True if output was expected but none was produced.
    :type missing_output: bool
    :param has_ever_passed: Whether this test has ever passed (history), if known.
    :type has_ever_passed: Optional[bool]
    :return: ``{code, confidence, reason, regression}``.
    :rtype: Dict[str, Any]
    """
    regression = _regression_state(has_ever_passed)

    if exit_code in _SEGFAULT_CODES:
        return _result(CODE_SEGFAULT, "high",
                       f"Crash (segfault / access violation), exit {exit_code}", regression)
    if exit_code in _ABORT_CODES:
        return _result(CODE_ABORT, "high", f"Aborted (SIGABRT), exit {exit_code}", regression)
    if exit_code in _TIMEOUT_CODES:
        return _result(CODE_TIMEOUT, "high", f"Timed out / terminated, exit {exit_code}", regression)
    if missing_output:
        return _result(CODE_MISSING_OUTPUT, "high",
                       "No output was produced but one was expected", regression)
    if exit_code != expected_rc:
        return _result(CODE_EXIT_CODE_MISMATCH, "high",
                       f"Exited {exit_code}, expected {expected_rc}", regression)
    if has_output_diff:
        return _result(CODE_OUTPUT_DIFF, "medium",
                       "Exit code matched but output differs from expected", regression)

    return _result(CODE_PASS, "high", "Exit code matched and no output diff recorded", regression)


def _regression_state(has_ever_passed: Optional[bool]) -> Optional[bool]:
    """
    Translate pass-history into the ``regression`` flag.

    :param has_ever_passed: Whether the test has ever passed, if known.
    :type has_ever_passed: Optional[bool]
    :return: True if a real regression, False if never worked, None if unknown.
    :rtype: Optional[bool]
    """
    if has_ever_passed is None:
        return None
    return bool(has_ever_passed)


def _result(code: str, confidence: str, reason: str, regression: Optional[bool]) -> Dict[str, Any]:
    """
    Assemble a classification result dict.

    :param code: The stable failure code.
    :type code: str
    :param confidence: ``high`` or ``medium``.
    :type confidence: str
    :param reason: Human-readable explanation.
    :type reason: str
    :param regression: Regression flag (see :func:`_regression_state`).
    :type regression: Optional[bool]
    :return: The assembled result.
    :rtype: Dict[str, Any]
    """
    return {"code": code, "confidence": confidence, "reason": reason, "regression": regression}
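
The crash and timeout constants in `sp_cli/classifier.py` can be checked by hand. The following is a minimal sketch of that arithmetic, not part of the PR. It assumes Linux signal numbering, the shell convention of reporting `128 + signal` for a signal-killed process, and that the negative values are raw signal numbers as Python's `subprocess` reports them. The helper names are hypothetical.

```python
# Linux signal numbers (assumption: common x86/ARM Linux numbering).
SIGABRT, SIGSEGV, SIGTERM = 6, 11, 15


def shell_status(signum: int) -> int:
    """Exit status a POSIX shell reports for a process killed by ``signum``."""
    return 128 + signum


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit status (e.g. 0xC0000005) as a signed int."""
    return value - (1 << 32) if value >= (1 << 31) else value


# Each set pairs the shell-style status with the raw negative signal number;
# the segfault set adds the Windows access-violation status as a signed value.
segfault_codes = {shell_status(SIGSEGV), -SIGSEGV, as_signed_32(0xC0000005)}
abort_codes = {shell_status(SIGABRT), -SIGABRT}
# 124 is the exit status GNU `timeout` uses when the time limit is hit.
timeout_codes = {124, shell_status(SIGTERM), -SIGTERM}

print(sorted(segfault_codes), sorted(abort_codes), sorted(timeout_codes))
```

The three computed sets match `_SEGFAULT_CODES`, `_ABORT_CODES`, and `_TIMEOUT_CODES` in the diff above.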