# Architecture

## What this tool does
code-to-module converts a script or Git repository into a submission-ready
nf-core module directory. The conversion pipeline is a linear sequence: ingest the
source, discover distinct CLI entry points, assess complexity, call Claude to infer
channel names and the shell command, resolve a container, choose a test data strategy,
and render the module files from Jinja2 templates. The result is a main.nf,
meta.yml, environment.yml, and nf-test spec that the author reviews before
submitting to nf-core.
The tool is LLM-assisted, not fully autonomous. Claude handles only the inference step — reading source code and optional documentation to determine what the module's inputs, outputs, and shell command should be. Everything else is deterministic: rule-based discovery, API-backed container resolution, schema-driven generation. The output is best-effort; structural invariants (meta as first input, `eval()` version capture, `ext.args` wiring) are enforced by post-processing guards, but output glob patterns and process labels need human review before submission.
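The linear shape of the pipeline can be sketched as follows. This is a minimal illustration only: the stage names mirror the module map below, but the real stages pass typed Pydantic models (not dicts), and `run_pipeline` is a hypothetical helper, not the actual orchestration in `api.py`.

```python
def run_pipeline(source_path: str, stages) -> dict:
    """Thread an artifact through each stage in order; stop early on a halt
    signal (e.g. a Tier 5 assessment that blocks generation)."""
    artifact = {"source": source_path}
    for stage in stages:
        artifact = stage(artifact)
        if artifact.get("halt"):
            break
    return artifact

# Toy stand-ins for ingest / discover / assess; real stages do far more.
stages = [
    lambda a: {**a, "code": f"<contents of {a['source']}>"},
    lambda a: {**a, "functionalities": ["main"]},
    lambda a: {**a, "tier": 2},
]
result = run_pipeline("tool.py", stages)
```

The point of the shape is that each stage only consumes what earlier stages produced, which is why the stages can be tested in isolation.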
## Module map
| File | Purpose | Key exports |
|---|---|---|
| `ingest.py` | Accepts file paths, directories, Git URLs; fetches `--docs` content | `ingest()` → `CodeSource` |
| `discover.py` | Rule-based then LLM-based functionality detection; interactive selection UI | `discover()`, `select_functionalities()` → `DiscoveryResult` |
| `assess.py` | Assigns Tier 1–5 complexity to each `FunctionalitySpec` | `assess()` → `(tier, confidence, warnings)` |
| `infer.py` | Calls Claude API on one `FunctionalitySpec`; enforces post-processing invariants | `infer()` → `ModuleSpec` |
| `container.py` | Two-phase: discover all container options in parallel, then select one | `discover()`, `select()` → `ContainerOption` |
| `bioconda.py` | Checks Bioconda for existing packages; generates meta.yaml recipe scaffolds | `check_bioconda()`, `generate_recipe()` |
| `test_data_match.py` | Strategy 1: matches channel specs to nf-core/test-datasets files | `match_test_data()` → `list[ChannelTestData]` |
| `test_data_derive.py` | Strategies 2a/2b: derives test data or chains an upstream nf-core module | `derive_test_data()`, `chain_test_data()` |
| `test_gen.py` | Orchestrates test data strategy selection; produces `TestSpec` and `derive_test_data.sh` | `generate_test_spec()` → `TestSpec` |
| `generate.py` | Renders Jinja2 templates into module files; applies post-processing fixes | `generate_module()` → `list[Path]` |
| `quick_lint.py` | Fast structural checks after generation (no subprocess) | `quick_lint()` → `list[LintWarning]` |
| `validate.py` | Runs `nf-core modules lint` and nf-test; classifies failures A/B/C | `run_validation()` → `TestReport` |
| `fix.py` | Proposes rule-based (A) and LLM-assisted (B) fixes as diffs; applies on approval | `propose_fixes()`, `apply_approved_fixes()` |
| `review.py` | Static analysis against nf-core style conventions | `review_module()` → `ReviewReport` |
| `models.py` | All Pydantic v2 data models shared across the pipeline | `CodeSource`, `ModuleSpec`, `TestSpec`, `ContainerOption`, … |
| `api.py` | Clean programmatic API wrapping the full pipeline; no Rich output | `convert()` → `dict` |
| `cli.py` | `code-to-module` Click group: convert, assess-only, containers, bioconda-recipe, update-standards | — |
| `validate_cli.py` | `validate-module` Click group: test, fix, review | — |
| `regression.py` | Parses generated modules and scores them against nf-core reference modules | `parse_module()`, `score_module()` |
| `standards/loader.py` | Loads and caches `nf_core_standards.json`; exposes the `Standards` singleton | `get_standards()` → `Standards` |
## The conversion pipeline
```mermaid
flowchart TD
    subgraph convert ["code-to-module convert"]
        A[script or repo] --> B[ingest]
        B --> C[discover]
        C --> D{≥2 functionalities?}
        D -- yes --> E[selection UI]
        D -- no --> F[assess]
        E --> F
        F --> G[infer via Claude]
        G --> H[container discovery]
        H --> I[container selection]
        I --> J[test data strategy]
        J --> K[generate]
        K --> L[quick_lint]
    end
    subgraph validate ["validate-module"]
        M[test] --> N{failures?}
        N -- yes --> O[fix]
        O --> M
        N -- no --> P[review]
    end
    L --> M
    P --> Q[submit PR to nf-core]
```
ingest accepts a local file path, directory, or Git URL, clones if necessary, and
builds a CodeSource containing the repo manifest and any documentation fetched via
--docs. It also parses --existing-modules directories so downstream stages can
align container URLs and channel conventions to modules already in use.
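The first decision ingest makes is what kind of source it was handed. A minimal sketch of that classification, assuming simple URL-prefix rules (the real logic in `ingest.py` may differ, and `classify_source` is a hypothetical name):

```python
from pathlib import Path

def classify_source(target: str) -> str:
    """Hypothetical sketch: decide whether the input is a Git URL, a
    directory, or a single script. URL patterns here are assumptions."""
    if target.startswith(("https://", "git@")) or target.endswith(".git"):
        return "git"          # clone into a temp dir, then scan the tree
    if Path(target).is_dir():
        return "directory"    # walk the tree and build a repo manifest
    return "file"             # single-script conversion
```

Whatever the branch, the result is normalised into one `CodeSource` so downstream stages never care where the code came from.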
discover is deliberately split from assess and infer. Rule-based detectors run
first (Click/Typer decorators, argparse subparsers, shell case dispatch, multiple
top-level scripts) because they are fast and deterministic. The LLM fallback only fires
when rule-based detection finds zero or one functionality. This keeps discovery
reproducible: two runs on the same repo produce the same FunctionalitySpec list
regardless of LLM non-determinism.
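One of those rule-based detectors can be sketched with the standard-library `ast` module. This is an illustrative reduction, not the detector in `discover.py`: it only handles Click/Typer-style decorators, while the real detectors also cover argparse subparsers and shell case dispatch.

```python
import ast

CLI_DECORATORS = {"command", "group"}  # click/typer entry-point decorators

def find_click_entry_points(source: str) -> list[str]:
    """Return function names decorated with @command/@group (bare or dotted)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                # @click.command(...) is a Call; bare @command is a Name
                target = dec.func if isinstance(dec, ast.Call) else dec
                attr = (target.attr if isinstance(target, ast.Attribute)
                        else getattr(target, "id", None))
                if attr in CLI_DECORATORS:
                    names.append(node.name)
    return names

sample = """
import click

@click.command()
def trim():
    pass

def helper():
    pass
"""
```

Because this is pure AST traversal, the same repo always yields the same entry-point list, which is exactly the reproducibility property the paragraph above describes.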
assess assigns a complexity tier (1–5) to each FunctionalitySpec. Tier drives
the default container strategy and determines how much of the module can be completed
automatically — Tier 5 means generation cannot proceed and the user is told why.
infer is the only step that calls the Claude API. It sends the relevant code
section plus any documentation to Claude and receives back channel names, types, and
the shell command. Post-processing guards then enforce structural invariants that Claude
is not expected to get right every time: val(meta) as the first input, the eval()
version capture pattern, and no TODO placeholders in required fields.
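The meta-first invariant is the simplest of these guards to illustrate. A sketch of the idea, assuming a plain list-of-dicts channel representation (the real guard operates on `ModuleSpec` fields in `infer.py`):

```python
def enforce_meta_first(input_channels: list[dict]) -> list[dict]:
    """Move the meta channel to the front; inject one if the LLM omitted it."""
    meta = [c for c in input_channels if c.get("name") == "meta"]
    rest = [c for c in input_channels if c.get("name") != "meta"]
    if not meta:
        meta = [{"name": "meta", "type": "map"}]
    return meta + rest

channels = [{"name": "reads", "type": "file"}, {"name": "meta", "type": "map"}]
reordered = enforce_meta_first(channels)
```

The guard is idempotent, so it is safe to run on every inference result regardless of whether Claude already got the ordering right.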
container discovery and selection run as two separate phases. Discovery runs all
checks in parallel (Dockerfile in repo, environment.yml, requirements.txt,
Singularity.def, Bioconda/BioContainers API) and collects every available option. No
check is skipped because a better one was found. Selection then applies the tier-aware
default order — BioContainers ranks first for Tier 1–2 tools, repo files rank first for
Tier 3–5 — or presents an interactive menu on a TTY.
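The two phases can be sketched as below. The check callables, option dicts, and ranking lists are stand-ins; the real checks in `container.py` query repo files and the BioContainers API and return `ContainerOption` objects.

```python
from concurrent.futures import ThreadPoolExecutor

def discover_options(checks) -> list[dict]:
    """Phase 1: run every check concurrently and keep all hits."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(), checks))
    return [opt for opt in results if opt is not None]

def select_option(options: list[dict], tier: int) -> dict:
    """Phase 2: tier-aware default ranking (interactive menu not shown)."""
    if tier <= 2:
        order = ["biocontainers", "dockerfile", "environment_yml"]
    else:
        order = ["dockerfile", "environment_yml", "biocontainers"]
    return min(options, key=lambda opt: order.index(opt["kind"]))

checks = [
    lambda: {"kind": "dockerfile"},
    lambda: None,                     # e.g. no Singularity.def in the repo
    lambda: {"kind": "biocontainers"},
]
options = discover_options(checks)
```

Keeping discovery exhaustive means the interactive menu can always show every option, not just the first one that happened to resolve.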
test data strategy selection tries four strategies in priority order, stopping at the first that succeeds for each input channel. See Test data strategies below.
generate renders the module files from Jinja2 templates, then quick_lint runs
a fast structural check in-process. Any warnings are surfaced in the CLI summary
without requiring nf-core lint to be installed.
## The validation pipeline
validate-module test runs nf-core modules lint and nf-test against a module
directory, captures their output, and classifies each failure into one of three
classes. Class A failures have deterministic structure — wrong emit name, missing
topic: versions tag, container URL prefix error, missing ext.args pattern — and
can be fixed by a deterministic rule. Class B failures require reading the module
in context — wrong output glob, wrong process label — and are addressed by an
LLM-assisted fix with a lower trust signal. Class C failures cannot be resolved
automatically; the tool explains what is wrong and stops.
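The classification itself can be as simple as pattern matching on the failure message. A sketch under that assumption (the pattern table is hypothetical; the real classifier in `validate.py` inspects structured lint output, not free text):

```python
import re

CLASS_A_PATTERNS = [                  # deterministic, rule-fixable
    r"emit name", r"topic:\s*versions", r"container URL", r"ext\.args",
]
CLASS_B_PATTERNS = [r"output glob", r"process label"]   # LLM-assisted

def classify_failure(message: str) -> str:
    """Map a lint/nf-test failure message to class A, B, or C."""
    if any(re.search(p, message, re.IGNORECASE) for p in CLASS_A_PATTERNS):
        return "A"
    if any(re.search(p, message, re.IGNORECASE) for p in CLASS_B_PATTERNS):
        return "B"
    return "C"                        # everything unrecognised needs a human
```

Defaulting unrecognised failures to class C is the conservative choice: the tool would rather stop and explain than guess at a fix.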
validate-module fix presents each fixable failure as a coloured diff panel labelled
with its source ([rule] or [llm]) and waits for explicit approval before writing
anything. This is non-negotiable: silent file modification would make the fix command
a liability rather than an aid. After applying approved fixes, validation re-runs
automatically to confirm the failures are resolved.
validate-module review performs static analysis only — no subprocess calls, no
nf-test execution. It checks channel naming conventions, process label
appropriateness, ext.args usage, meta.yml completeness, versions channel structure,
and EDAM ontology coverage, returning a ReviewReport with ERROR / WARNING / INFO
severity levels.
## The import boundary (critical for contributors)
The codebase is split into two strictly separated halves:
```
┌─────────────────────────────────┐        ┌──────────────────────────────────┐
│ Conversion pipeline             │        │ Validation suite                 │
│                                 │        │                                  │
│ ingest → discover → assess      │        │ validate.py                      │
│ → infer → container             │        │ fix.py                           │
│ → test_gen → generate           │        │ review.py                        │
│                                 │        │ validate_cli.py                  │
│ cli.py   api.py                 │        │                                  │
└─────────────────────────────────┘        └──────────────────────────────────┘
        ↑                                          ↑
        │               standards/                 │
        └──────────────── (shared) ────────────────┘
```
validate_cli.py must never import from ingest, discover, assess, infer,
container, test_gen, or generate. standards/ must never import from any
sibling module in either half. This boundary exists so the validation suite can be
extracted into a standalone nf-module-tools package later without renaming
anything. A test enforces it: pytest tests/test_import_boundaries.py. If you add
an import and this test fails, the boundary has been violated — restructure rather
than loosen the rule.
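The core of such a boundary test is a static scan of each module's imports. A simplified sketch (the real test in `tests/test_import_boundaries.py` walks the installed package rather than taking source strings):

```python
import ast

PIPELINE_MODULES = {"ingest", "discover", "assess", "infer",
                    "container", "test_gen", "generate"}

def forbidden_imports(source: str, banned: set[str]) -> list[str]:
    """Report any import whose final dotted component is a banned sibling."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names
                     if a.name.split(".")[-1] in banned]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[-1] in banned:
                hits.append(node.module)
    return hits
```

Because the check is AST-based, it catches `from code_to_module.infer import infer` as well as plain `import` statements, and never executes the module under test.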
## nf-core conventions enforced by the generator
These are applied automatically. Contributors do not need to implement them manually, but should know they exist when reviewing generated output.
| Convention | Where enforced | Reference |
|---|---|---|
| `val(meta)` as first input channel | `infer.py` post-processing guard | nf-core module template |
| `eval()` version capture pattern | `main.nf.j2` template | `fastqc`, `samtools/sort` modules |
| Both Singularity and Docker container URLs | `container.py` + `main.nf.j2` | nf-core container policy |
| `topic: versions` on versions output channel | `main.nf.j2` template | nf-core 3.5+ convention |
| All params via `ext.args` (none hardcoded) | `main.nf.j2` template | nf-core module guidelines |
Full specification: nf-co.re/docs/guidelines/components/modules
## The standards schema
src/code_to_module/standards/data/nf_core_standards.json is the single source of
truth for all nf-core conventions used by the generator — valid process labels,
container registry URLs, required meta.yml fields, EDAM ontology mappings, test
dataset paths, and derivation templates. Nothing in Python code or Jinja2 templates
should hardcode an nf-core convention directly; it must come from the schema via
get_standards(). This makes convention updates a one-file change. To update the
bundled schema, edit nf_core_standards.json, bump schema_version, add an entry
to STANDARDS_CHANGELOG.md, and run code-to-module update-standards to check
whether the remote schema has a newer version.
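The consumption pattern looks like this. The miniature schema below is an assumption about the file's shape, kept only large enough to show the rule: convention-shaped values come from the schema, never from a literal in code or a template.

```python
# Assumed miniature of nf_core_standards.json; the real file has many more keys.
NF_CORE_STANDARDS = {
    "schema_version": "1.0.0",
    "process_labels": ["process_single", "process_low",
                       "process_medium", "process_high"],
}

def valid_process_label(label: str, standards: dict = NF_CORE_STANDARDS) -> bool:
    """Check a label against the schema instead of a hardcoded list."""
    return label in standards["process_labels"]
```

When nf-core changes its label set, only the JSON file changes; every check that reads through `get_standards()` picks up the new values for free.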
## Test data strategies
The generator tries strategies in this order and stops at the first match for each input channel.
| Strategy | When used | Output |
|---|---|---|
| 1 — Match | A file of the right format already exists in nf-core/test-datasets | Path reference in main.nf.test |
| 2a — Derive | No exact match; file can be subsetted or transformed from existing nf-core data | derive_test_data.sh + path reference |
| 2b — Chain | Input is the natural output of a known upstream nf-core module | setup {} block in main.nf.test |
| 3 — Stub | Data is too large, proprietary, or cannot be derived | stub: block; tests run with nf-test --stub |
Derive (2a) is intentionally ranked above chain (2b). A stored file keeps CI test
runtime predictable; a setup {} block runs an upstream module on every test
invocation, adding latency that nf-core reviewers notice.
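The priority order reduces to a first-match loop with stub as the unconditional fallback. A sketch of that orchestration shape (the strategy callables here are toy stand-ins; the real ones live in `test_data_match.py` and `test_data_derive.py`):

```python
def pick_strategy(channel: dict, strategies) -> tuple[str, object]:
    """Try each (name, attempt) pair in order; first non-None result wins."""
    for name, attempt in strategies:
        result = attempt(channel)
        if result is not None:
            return name, result
    return "stub", None          # Strategy 3 always applies as the fallback

strategies = [
    ("match",  lambda ch: "genome.fasta" if ch["format"] == "fasta" else None),
    ("derive", lambda ch: None),   # stand-in: no derivation recipe available
    ("chain",  lambda ch: None),   # stand-in: no known upstream module
]
```

Each input channel runs through the loop independently, so one module can mix matched files for some channels with a stub for another.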
## Adding a new feature
The most common contribution patterns, in order:
- **Adding a new review check** — add a function to `review.py` that accepts a module path and a `Standards` object and returns a list of `ReviewItem`. Add the function to the `review_module()` call chain. Write a test in `tests/test_review.py` covering both the pass and fail cases.
- **Adding a new generated file** — add a `.j2` template to `src/code_to_module/templates/`. Add any new data fields to the relevant Pydantic model in `models.py`. Update `generate.py` to render the template. Add a test fixture and a test in `tests/test_generate.py` that asserts the file is written and contains the expected content.
- **Extending the standards schema** — add the new convention to `src/code_to_module/standards/data/nf_core_standards.json`. Bump `schema_version`. Add a changelog entry in `STANDARDS_CHANGELOG.md`. Update any code that reads from the schema to use the new field via `get_standards()`. Never duplicate the value in Python code.
- **All of the above** — run `pytest -x -q -m "not network and not llm"` after each step, and `pytest tests/test_import_boundaries.py` before opening a PR.
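To make the first pattern concrete, here is a sketch of what a review check can look like. The `ReviewItem` dataclass and the check's internals are hypothetical stand-ins; the real model lives in `models.py` and takes a `Standards` object rather than a bare set.

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:          # stand-in for the real Pydantic model
    severity: str          # "ERROR" | "WARNING" | "INFO"
    message: str

def check_process_label(main_nf: str, valid_labels: set[str]) -> list[ReviewItem]:
    """Flag any process label not present in the standards schema."""
    items = []
    for line in main_nf.splitlines():
        if line.strip().startswith("label "):
            label = line.split("label", 1)[1].strip().strip("'\"")
            if label not in valid_labels:
                items.append(
                    ReviewItem("WARNING", f"unknown process label: {label}"))
    return items
```

A check like this slots into the `review_module()` call chain unchanged: it takes module content plus standards in, and returns a list of items out.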
## Known limitations and planned improvements
- Library-only tools with no CLI entry point (e.g. tools designed to be called as `import foo; foo.run(...)`) are not yet supported — discovery finds no entry points and generation stops. A `library-to-module` path that generates a CLI wrapper first is planned.
- Perl wrappers and non-Python CLI tools (TrimGalore, for example) have no Python AST to analyse and tend to assess as Tier 4–5. The tool reports the tier honestly rather than producing a misleading partial module.
- Domain-specific test data (`.h5ad`, `.pkl`, `.mzML`, large databases) is not in nf-core/test-datasets, so those input channels fall back to the stub strategy. The generated test is syntactically valid but cannot test real data flow.
Open issues and the development roadmap are tracked at github.com/svigneau/code-to-module/issues.