Fix FLASHDeconv Sequence View ions not updating on scan change #94
````diff
@@ -1,171 +1,115 @@
-# OpenMS Streamlit WebApp Template
+# CLAUDE.md
 
-## What This Is
-
-**This is the standard framework for building web applications for mass spectrometry (MS) data analysis**, used across the OpenMS ecosystem for proteomics and metabolomics research. When a researcher or developer needs a web-based tool for MS data processing, visualization, or analysis — whether for label-free quantification, untargeted metabolomics, top-down proteomics, or any other MS workflow — this template is how it gets built.
-
-The template wraps **OpenMS/pyOpenMS** (the leading open-source C++/Python library for computational mass spectrometry) and its **TOPP tools** (a suite of ~200 command-line tools for MS data processing pipelines) into interactive Streamlit web applications.
-
-### Production Apps Built From This Template
-
-- **OpenMS/quantms-web** — quantitative proteomics (DDA-LFQ, DDA-ISO, DIA-LFQ quantification)
-- **OpenMS/umetaflow** — untargeted metabolomics (feature detection, alignment, annotation, GNPS molecular networking)
-- **OpenMS/FLASHApp** — top-down proteomics (FLASHDeconv deconvolution result visualization)
-
-### Mass Spectrometry Domain Context
-
-- **Input data** is typically mzML (raw MS spectra), featureXML (detected features), consensusXML (linked features across samples), idXML (peptide/protein identifications), traML (targeted transitions)
-- **Typical workflows chain TOPP tools**: e.g., `FeatureFinderMetabo` (detect LC-MS features) → `FeatureLinkerUnlabeledKD` (align features across runs) → custom Python post-processing
-- **Proteomics** focuses on peptide/protein identification and quantification (tools like `MSGFPlusAdapter`, `FidoAdapter`, `ProteinQuantifier`)
-- **Metabolomics** focuses on feature detection, annotation, and statistical analysis (tools like `FeatureFinderMetabo`, `MetaboliteAdductDecharger`, `SiriusAdapter`)
-- **pyOpenMS** provides Python bindings for programmatic MS data access — reading mzML files, manipulating spectra/chromatograms, computing molecular properties, etc.
-- **MS-specific visualizations**: mass spectra (m/z vs intensity), chromatograms (RT vs intensity), peak maps (RT vs m/z 2D heatmaps), isotope patterns, fragment ion annotations, volcano plots for differential expression
-
-## Architecture
-
-```
-app.py # Entry point — registers pages via st.Page() in a dict
-settings.json # App config: name, version, deployment mode, threading
-default-parameters.json # Default workspace parameters (tracked via widget keys)
-presets.json # Parameter presets for TOPP workflows
-content/ # Streamlit pages (one .py per page)
-src/
-common/common.py # Utilities: page_setup(), save_params(), show_fig(), show_table()
-Workflow.py # Example WorkflowManager subclass (TOPP workflow)
-workflow/
-WorkflowManager.py # Base class: upload/configure/execution/results pattern
-StreamlitUI.py # Widget library: upload_widget, input_TOPP, input_python, etc.
-ParameterManager.py # JSON parameter persistence + TOPP .ini generation
-CommandExecutor.py # Runs TOPP tools and Python scripts as subprocesses
-FileManager.py # Workspace file organization
-Logger.py # Structured workflow logging
-QueueManager.py # Redis queue for online deployments
-python-tools/ # Custom Python analysis scripts (with DEFAULTS dicts)
-Dockerfile # Full build: OpenMS + TOPP tools + pyOpenMS
-Dockerfile_simple # Lightweight: pyOpenMS only
-docker-compose.yml # Deployment config
-```
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
-## Key Patterns
+## What This Is
 
-### Pages
+**FLASHApp** is a Streamlit web application for visualizing **top-down proteomics** results from the OpenMS FLASH* tool family. It is built on the [OpenMS streamlit-template](https://github.com/OpenMS/streamlit-template) and bundles three independent sub-applications, each registered as a section in `app.py`:
 
-Every page starts with `page_setup()` which handles workspace initialization, sidebar rendering, and parameter loading:
+- **FLASHDeconv** (⚡️) — spectral deconvolution: raw MS ion peaks → neutral monoisotopic masses (proteoforms), with isotope/charge resolution and FDR scoring.
+- **FLASHTnT** (🧨) — tag-and-track top-down identification: runs FLASHDeconv, then matches short sequence "tags" against a protein FASTA database to identify proteins (PrSMs), with target/decoy FDR.
+- **FLASHQuant** (📊) — proteoform quantification from FLASHDeconv mass traces (view-only; no run step).
 
-```python
-from src.common.common import page_setup, save_params
-params = page_setup()
-```
+The heavy lifting is done by **TOPP command-line tools** (`FLASHDeconv`, `FLASHTnT`, `DecoyDatabase`) shipped in the Docker image; the app drives them, parses their output into pandas DataFrames, caches them per workspace, and renders them through a custom **Vue.js Streamlit component** (`flash_viewer_grid`).
 
-Pages are registered in `app.py` under named sections:
+## Commands
 
-```python
-pages = {
-"Section Name": [
-st.Page(Path("content", "my_page.py"), title="My Page", icon="🔬"),
-],
-}
-```
+```bash
+# Run locally (online_deployment=false in settings.json → always "local" mode)
+streamlit run app.py local
 
-### Parameters
+# Unit tests (pytest; uses fakeredis, needs pyopenms importable)
+pytest tests/ -v
+pytest tests/test_selection_clear.py -v # single file
+pytest tests/test_selection_clear.py::test_name -v # single test
 
-Parameters are tracked via widget keys that match entries in `default-parameters.json`. The `save_params(params)` call at the end of a page persists any widget state changes:
+# Lint (errors-only; mirrors .github/workflows/pylint.yml, which runs on `main`)
+pylint $(git ls-files '*.py') --errors-only --disable=C0103,C0114,C0301,C0411,W0212,W0631,W0602,W1514,W2402,E0401,E1101,F0001,R1732
 
-```python
-params = page_setup()
-st.number_input("X", value=params["my-param"], key="my-param")
-save_params(params)
+# Docker (full image with OpenMS + TOPP tools + Vue build)
+docker build -f Dockerfile --no-cache -t flashapp:latest --build-arg GITHUB_TOKEN=<gh-token> .
+docker run -p 8501:8501 flashapp:latest # → http://localhost:8501
+# Dockerfile.arm is the linux/arm64 variant (swaps miniforge installer to aarch64).
 ```
 
-### TOPP Workflows (WorkflowManager)
-
-Complex workflows subclass `WorkflowManager` and implement 4 methods:
-- `upload()` — file upload widgets via `self.ui.upload_widget()`
-- `configure()` — TOPP params via `self.ui.input_TOPP()`, Python tool params via `self.ui.input_python()`
-- `execution()` — run tools via `self.executor.run_topp()` and `self.executor.run_python()`
-- `results()` — display outputs
-
-Each workflow gets 4 content pages (upload, configure, run, results) that call `wf.show_*_section()`.
+Python is pinned to **3.11** (matches the Docker runtime). `GITHUB_TOKEN` is required at build time to fetch the private `openms-streamlit-vue-component` submodule and OpenMS resources.
````
Contributor

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Clarify Python version scope (runtime vs CI). Line 35 states Python is pinned to 3.11, but the CI lint workflow uses Python 3.10 (…).
````diff
 
-Decorate `configure()` and `results()` with `@st.fragment` for partial reruns.
+### Prerequisites before any production / Docker build
+
+These are already the defaults on the `main`/release branches; verify them when building or debugging a blank viewer:
+
+1. **Submodule present:** `git submodule init && git submodule update` (update to latest: `git submodule update --remote`).
+2. **Vue component built & copied:** the bundle in `js-component/dist/` is produced from the `openms-streamlit-vue-component/` submodule (a Vite/Vue project). **Always prefer building the bundle via Docker, never a local Node toolchain** — the repo `Dockerfile` `js-build` stage (`node:21` → `npm install && npm run build`) is the canonical, reproducible build. To rebuild the committed bundle from *local* submodule source (e.g. after editing a `.vue` file), use a small Docker stage that `COPY`s the local submodule and runs `npm run build`, then export and copy `dist/` over `js-component/dist/`:
+```dockerfile
+FROM node:21 AS build
+WORKDIR /openms-streamlit-vue-component
+COPY . .
+RUN npm install && npm run build
+FROM scratch AS export
+COPY --from=build /openms-streamlit-vue-component/dist /
+```
+```bash
+docker build -f vue-build.Dockerfile --target export \
+--output type=local,dest=./vue-dist openms-streamlit-vue-component
+# then replace js-component/dist/ with ./vue-dist/
+```
+Only the prebuilt `js-component/dist/` is committed; the submodule source is fetched separately. (A bare local `npm install && npm run build` also works but is **not** preferred — Docker guarantees the toolchain.)
+3. **`src/render/components.py` → `_RELEASE = True`** (loads the bundle from `js-component/dist/`). When `False`, the component is loaded from the Vite dev server at `http://localhost:5173` for live Vue development.
+4. **`.streamlit/config.toml` → `developmentMode = false`**.
+
+> Build order matters: the OpenMS/TOPP build must precede the Vue build in the Dockerfiles (see recent commits reordering this). `--no-cache` is recommended for the full image.
+
+## Architecture
 
-### Python Tools
+### The data pipeline (the core mental model)
 
-Custom scripts in `src/python-tools/` define a `DEFAULTS` list for auto-generated UI:
+Every sub-app follows the same path; understanding it requires reading `src/Workflow.py`, `src/parse/`, `src/workflow/FileManager.py`, and `src/render/`:
 
-```python
-DEFAULTS = [
-{"key": "in", "value": [], "hide": True},
-{"key": "my-param", "value": 5, "name": "My Parameter", "help": "Description",
-"min": 1, "max": 100, "step_size": 1, "widget_type": "slider"},
-]
 ```
````
Contributor

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block. Line 68 opens a fenced block without a language, which triggers markdownlint MD040. Use a language hint (for example, …).

🧰 Tools: 🪛 markdownlint-cli2 (0.22.1): [warning] 68-68: Fenced code blocks should have a language specified (MD040, fenced-code-language)

Source: Linters/SAST tools
````diff
-
-### Presets
-
-Parameter presets in `presets.json` map workflow names (lowercase, hyphens) to named parameter sets:
-
-```json
-{
-"workflow-name": {
-"Preset Name": {
-"_description": "Tooltip text",
-"TOPPToolName": {"algorithm:section:param": value},
-"_general": {"custom-key": value}
-}
-}
-}
+mzML upload ──► WorkflowManager.execution()
+└─ executor.run_topp('FLASHDeconv' / 'FLASHTnT' / 'DecoyDatabase')
+└─ writes out_deconv.mzML, annotated.mzML, *.tsv, *.feature, *.msalign
+└─ src/parse/* turns those into pandas DataFrames
+└─ FileManager.store_data(dataset_id, key, df) ──► workspace cache
+│
+Viewer page ◄── render_grid() (src/render/render.py) ◄── reads cached dfs + layout ◄──┘
+└─ get_component_function() ──► Vue `flash_viewer_grid` component
 ```
 
-## Visualization Libraries
+- **Workflows:** `src/Workflow.py` defines `DeconvWorkflow`, `TagWorkflow` (FLASHTnT), and `QuantWorkflow`, all subclasses of `WorkflowManager` (`src/workflow/WorkflowManager.py`). Each implements `upload()`, `configure()`, `execution()`, `results()`. `execution()` runs TOPP tools, stores every output file via `FileManager.store_file()`, then calls the parsers. `TagWorkflow` chains `DecoyDatabase` → `FLASHDeconv` → `FLASHTnT`.
+- **Parsers (`src/parse/`):** `deconv.py::parseDeconv`, `tnt.py::parseTnT`, `quant.py::parseQuant` / `flashquant.py::parseFLASHQuantOutput` are the entry points. The mzML→DataFrame heavy lifting lives in `masstable.py` (`parseFLASHDeconvOutput`, `parseFLASHTaggerOutput`) using a **multiprocessing pool**; `tag_resolution.py` maps tags ↔ proteoforms.
+- **Renderers (`src/render/`):** `render.py` (`render_grid`, `render_component`) pushes state into the Vue component and reads user selections back out. `components.py` declares the component and defines the per-cell component classes: `PlotlyHeatmap`, `Tabulator` (Scan/Mass/Protein/Tag tables), `PlotlyLineplot`, `Plotly3Dplot`, `FDRPlotly`, `SequenceView`, `InternalFragmentMap`, `FLASHQuant`. `compression.py` compresses payloads sent to the browser; `StateTracker.py` tracks selection state across cells (cross-component linking via shared scan/mass identifiers).
 
-Two libraries are commonly used in template-based apps for MS data visualization:
+### Pages and the Layout Manager
 
-### pyopenms-viz
+Each sub-app's pages (in `content/<SubApp>/`) are registered in `app.py`. The distinctive FLASHApp concept is the **Layout Manager** (`FLASHDeconvLayoutManager.py`, `FLASHTnTLayoutManager.py`): a grid editor where the user composes which visualization components appear in which cells (≤5 experiments, ≤3 columns/row). The chosen layout is persisted to the workspace cache and the **Viewer** page renders it (falling back to a built-in default if none is saved).
 
-Pandas DataFrame extension for MS visualization. Use the plotly backend in Streamlit:
+- Layout is stored via `FileManager.store_data` under dataset key **`'layout'`** for FLASHDeconv and **`'flashtnt_layout'`** for FLASHTnT (separate namespaces — they share the underlying `deconv_dfs`/`anno_dfs` data but keep independent layouts). It is JSON-importable/exportable.
+- **Sequence Input** (`FLASHDeconvSequenceInput.py`) saves a proteoform sequence + fixed modifications to the `'sequence'` dataset; doing so unlocks the `Sequence view` and `Internal fragment map` components in the Layout Manager.
+- **FLASHQuant** is simpler: File Upload + a single fixed-layout Viewer, no Layout Manager, and uses a separate cache subdirectory.
 
-```python
-import pyopenms_viz
-df.plot.ms_spectrum(backend="plotly") # mass spectrum (m/z vs intensity)
-df.plot.peak_map(backend="plotly") # 2D peak map (RT vs m/z heatmap)
-df.plot.chromatogram(backend="plotly") # chromatogram (RT vs intensity)
-df.plot.mobilogram(backend="plotly") # ion mobility trace
-```
+### Workspaces & FileManager
 
-Best for: publication-quality static/interactive plots, small-medium datasets, standard MS plot types.
+State lives in per-session **workspaces** (`enable_workspaces: true`, `workspaces_dir: ".."` → `../workspaces-FLASHApp/`). `FileManager` (`src/workflow/FileManager.py`) is the single gateway to the workspace's `cache/`: `store_file`, `store_data`, `get_results`, `result_exists`, `get_results_list`, `get_files`, keyed by a `dataset_id` (typically `<filename>_<timestamp>`). Demo workspaces are seeded from `example-data/workspaces/` (`demo_workspaces` in `settings.json`).
 
````
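As a minimal sketch of the workspace-cache contract described in the Workspaces & FileManager entry, the hypothetical `ToyFileManager` below reuses the method names `store_data`, `result_exists`, and `get_results`. Everything else is an assumption: the argument lists, the pickle storage, and the `cache/<dataset_id>/<key>.pkl` layout are invented for illustration and are not taken from `src/workflow/FileManager.py`.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


class ToyFileManager:
    """Hypothetical stand-in for FileManager; NOT the repository's implementation."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, dataset_id: str, key: str) -> Path:
        # Assumed layout: one pickle file per (dataset_id, key) pair.
        return self.cache_dir / dataset_id / f"{key}.pkl"

    def store_data(self, dataset_id: str, key: str, data: Any) -> None:
        # Any picklable object works, e.g. a parsed pandas DataFrame.
        path = self._path(dataset_id, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(data, fh)

    def result_exists(self, dataset_id: str, key: str) -> bool:
        return self._path(dataset_id, key).exists()

    def get_results(self, dataset_id: str, keys: Iterable[str]) -> Dict[str, Any]:
        # Load every requested key for one dataset into a dict.
        results = {}
        for key in keys:
            with self._path(dataset_id, key).open("rb") as fh:
                results[key] = pickle.load(fh)
        return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fm = ToyFileManager(Path(tmp) / "cache")
        dataset_id = "sample_20240101-120000"  # made-up <filename>_<timestamp> id
        # Placeholder component names; real layouts are built in the Layout Manager.
        fm.store_data(dataset_id, "layout", [["scan_table", "mass_table"]])
        fm.store_data(dataset_id, "flashtnt_layout", [["protein_table"]])
        print(fm.result_exists(dataset_id, "layout"))
        print(fm.get_results(dataset_id, ["layout", "flashtnt_layout"]))
```

Keeping the FLASHDeconv and FLASHTnT layouts under different keys of the same `dataset_id` mirrors the separate `'layout'` / `'flashtnt_layout'` namespaces the diff describes.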
````diff
-### OpenMS-Insight (t0mdavid-m/openms-insight)
+### Parameters
 
-Vue.js-backed interactive Streamlit components for large MS datasets:
+`ParameterManager` (`src/workflow/ParameterManager.py`) persists widget state to JSON and generates TOPP `.ini` files. `configure()` exposes TOPP tool parameters via `self.ui.input_TOPP('FLASHDeconv', exclude_parameters=[...], custom_defaults={...})`. **Widget keys must match keys in `default-parameters.json`.** `presets.json` holds named parameter bundles (`test_parameter_presets.py` guards this).
 
````
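The Parameters entry says `ParameterManager` generates TOPP `.ini` files, and both versions of the conventions list name TOPP parameters by colon-separated paths (`algorithm:section:param_name`, `tag:min_length`). A small sketch of how such paths can be derived from nested sections follows; `flatten_topp_params` is a hypothetical helper, not part of `ParameterManager`, and the sample values are illustrative only.

```python
from typing import Any, Dict, Mapping


def flatten_topp_params(nested: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: turn {"tag": {"min_length": 4}} into {"tag:min_length": 4}."""
    flat: Dict[str, Any] = {}
    for name, value in nested.items():
        key = f"{prefix}:{name}" if prefix else name
        if isinstance(value, Mapping):
            # Recurse into a sub-section, extending the colon path.
            flat.update(flatten_topp_params(value, key))
        else:
            flat[key] = value
    return flat


if __name__ == "__main__":
    # Illustrative values only, not FLASHDeconv/FLASHTnT defaults.
    print(flatten_topp_params({"tag": {"min_length": 4}, "algorithm": {"section": {"param": 1.5}}}))
    # {'tag:min_length': 4, 'algorithm:section:param': 1.5}
```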
````diff
-- `Table` — server-side pagination with Tabulator.js
-- `LinePlot` — stick-style mass spectra via Plotly
-- `Heatmap` — 2D scatter handling millions of points
-- `VolcanoPlot` — differential expression visualization
-- `SequenceView` — peptide sequence with fragment ion matching
+### Deployment & runtime (`entrypoint.sh`, `k8s/`)
 
-Components support cross-linking via shared identifiers. Best for: large datasets (millions of points), cross-component interactivity, server-side pagination.
+The container entrypoint starts **Redis** + one or more **RQ workers** (queue `openms-workflows`) + Streamlit. When `STREAMLIT_SERVER_COUNT > 1`, it runs N Streamlit instances behind an **nginx** load balancer with sticky-cookie session routing. `QueueManager` (`src/workflow/QueueManager.py`) offloads `execution()` to RQ when `online_deployment` is set. The entrypoint is written to tolerate **Apptainer/Singularity** read-only rootfs (all runtime state goes under `$RUNTIME_DIR`, default `/tmp/opendiakiosk`). `k8s/` is a kustomize base + `overlays/prod` deploying to namespace `openms` as `ghcr.io/openms/flashapp:latest` behind nginx/traefik ingress; `clean-up-workspaces.py` runs via cron for periodic GC.
 
-## Commands
+### CI (`.github/workflows/`)
 
-```bash
-# Run locally
-pip install -r requirements.txt
-streamlit run app.py
-
-# Run tests
-python -m pytest tests/
-
-# Docker
-docker-compose up --build
-```
+- `build-and-test.yml` — multi-arch (amd64 + arm64) Docker build → merged manifest, kustomize/kubeconform lint, and **container smoke tests** under apptainer / nginx-on-kind / traefik-on-kind, then publishes images + an ORAS SIF to GHCR. (Its "test" jobs are deployment smoke tests, **not** pytest.)
+- `unit-tests.yml` — the pytest suite. `pylint.yml` — lint. `build-windows-executable-app.yaml` + `test-win-exe-w-embed-py.yaml` — the PyInstaller desktop build. `ghcr-cleanup.yml` — registry GC.
 
-## Conventions
+## Conventions & gotchas
 
-- Page files go in `content/`, source logic in `src/`
-- Widget keys must match parameter keys in `default-parameters.json`
-- Workflow names use lowercase with hyphens: "My Workflow" -> "my-workflow"
-- Use `show_fig()` and `show_table()` from `src/common/common.py` for consistent display
-- Use `@st.fragment` on methods that should partially rerun (configure, results)
-- TOPP tool parameters use colon-separated paths: `"algorithm:section:param_name"`
+- **`app.py` sets multiprocessing start method to `spawn`** (polars + Unix fork are incompatible) and imports `pyopenms` early (required for the Windows build). Don't remove these.
+- **Running workflows locally needs the TOPP binaries** (`FLASHDeconv`, `FLASHTnT`) on `PATH` — they ship only in the full Docker image. The upload/viewer/download paths work on pre-computed result files without them.
+- **Windows packaged build:** `run_app.py` + `run_app_temp.spec` (PyInstaller). A `windows` arg in `sys.argv` triggers a working-directory `chdir` in `page_setup()`.
+- Pages start with `page_setup()` from `src/common/common.py`, which initializes the workspace, sidebar, and params; call `save_params(params)` at the end. Use `show_fig()` / `show_table()` for consistent display.
+- Decorate `configure()` / page sections with `@st.fragment` for partial reruns.
+- Workflow display names map to lowercase-hyphenated keys ("FLASHTnT" → workflow dir / preset keys); TOPP params use colon paths (`tag:min_length`).
````
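The first entry of the new gotchas list explains why `app.py` switches multiprocessing to `spawn`. The standard-library calls involved look roughly like the sketch below; `use_spawn_start_method` is a hypothetical name, and this is not the actual `app.py` code.

```python
import multiprocessing as mp


def use_spawn_start_method() -> str:
    # "spawn" launches each worker as a fresh interpreter instead of fork()-ing
    # the parent, so workers do not inherit the parent's in-memory library state.
    # force=True replaces a start method that was already chosen.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Under "spawn", workers re-import the main module, so any Pool creation
    # in a script must stay behind this guard.
    print(use_spawn_start_method())  # prints "spawn"
```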
Large diffs are not rendered by default.
`src/components/sequence/SequenceView.vue` (+41 −29)
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Avoid documenting raw token pasting in shell commands. Line 30 currently suggests placing a token directly on the command line, which risks credential leakage via shell history and process lists. Prefer env-var based usage in the example (for example, `--build-arg GITHUB_TOKEN="$GITHUB_TOKEN"` after exporting it).
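A sketch of the reviewer's suggestion, written in Python to match the repository: when `--build-arg` is given a variable name with no `=value`, Docker copies that variable from the caller's environment, so the token appears neither in shell history nor in the `docker` command's arguments. `docker_build_argv` is a hypothetical helper; its other flags are copied from line 30 of the diff.

```python
import os
from typing import List


def docker_build_argv(tag: str = "flashapp:latest") -> List[str]:
    """Hypothetical helper: build the `docker build` argv without the token value."""
    if not os.environ.get("GITHUB_TOKEN"):
        raise RuntimeError("export GITHUB_TOKEN before building the image")
    return [
        "docker", "build", "-f", "Dockerfile", "--no-cache", "-t", tag,
        # No "=value": Docker reads GITHUB_TOKEN from this process's environment.
        "--build-arg", "GITHUB_TOKEN",
        ".",
    ]
```

With the token exported in the calling shell, `subprocess.run(docker_build_argv(), check=True)` would start the build. The `--build-arg GITHUB_TOKEN="$GITHUB_TOKEN"` form also keeps the token out of shell history, but the shell expands it into the argument list, where process listings can still show it.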