# Architecture

## What this tool does
code-to-module converts a script or Git repository into a submission-ready
nf-core module directory. The conversion pipeline is a linear sequence: ingest the
source, discover distinct CLI entry points, assess complexity, call Claude to infer
channel names and the shell command, resolve a container, choose a test data strategy,
and render the module files from Jinja2 templates. The result is a main.nf,
meta.yml, environment.yml, and nf-test spec that the author reviews before
submitting to nf-core.
The tool is LLM-assisted, not fully autonomous. Claude handles only the inference step — reading source code and optional documentation to determine what the module's inputs, outputs, and shell command should be. Everything else is deterministic: rule-based discovery, API-backed container resolution, schema-driven generation. The output is best-effort; structural invariants (meta as first input, `eval()` version capture, `ext.args` wiring) are enforced by post-processing guards, but output glob patterns and process labels need human review before submission.
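The linear shape of the pipeline can be sketched as follows. This is a minimal illustration only: the stage names mirror the module map below, but the real stages pass typed Pydantic models (not dicts), and `run_pipeline` is a hypothetical helper, not the actual orchestration in `api.py`.

```python
def run_pipeline(source_path: str, stages) -> dict:
    """Thread an artifact through each stage in order; stop early on a halt
    signal (e.g. a Tier 5 assessment that blocks generation)."""
    artifact = {"source": source_path}
    for stage in stages:
        artifact = stage(artifact)
        if artifact.get("halt"):
            break
    return artifact

# Toy stand-ins for ingest / discover / assess; real stages do far more.
stages = [
    lambda a: {**a, "code": f"<contents of {a['source']}>"},
    lambda a: {**a, "functionalities": ["main"]},
    lambda a: {**a, "tier": 2},
]
result = run_pipeline("tool.py", stages)
```

The point of the shape is that each stage only consumes what earlier stages produced, which is why the stages can be tested in isolation.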
## Module map
| File | Purpose | Key exports |
|---|---|---|
| `ingest.py` | Accepts file paths, directories, Git URLs; fetches `--docs` content | `ingest()` → `CodeSource` |
| `discover.py` | Rule-based then LLM-based functionality detection; interactive selection UI | `discover()`, `select_functionalities()` → `DiscoveryResult` |
| `assess.py` | Assigns Tier 1–5 complexity to each `FunctionalitySpec` | `assess()` → `(tier, confidence, warnings)` |
| `infer.py` | Calls Claude API on one `FunctionalitySpec`; enforces post-processing invariants | `infer()` → `ModuleSpec` |
| `container.py` | Two-phase: discover all container options in parallel, then select one | `discover()`, `select()` → `ContainerOption` |
| `bioconda.py` | Checks Bioconda for existing packages; generates meta.yaml recipe scaffolds | `check_bioconda()`, `generate_recipe()` |
| `test_data_match.py` | Strategy 1: matches channel specs to nf-core/test-datasets files | `match_test_data()` → `list[ChannelTestData]` |
| `test_data_derive.py` | Strategies 2a/2b: derives test data or chains an upstream nf-core module | `derive_test_data()`, `chain_test_data()` |
| `test_gen.py` | Orchestrates test data strategy selection; produces `TestSpec` and `derive_test_data.sh` | `generate_test_spec()` → `TestSpec` |
| `generate.py` | Renders Jinja2 templates into module files; applies post-processing fixes | `generate_module()` → `list[Path]` |
| `quick_lint.py` | Fast structural checks after generation (no subprocess) | `quick_lint()` → `list[LintWarning]` |
| `validate.py` | Runs `nf-core modules lint` and nf-test; classifies failures A/B/C | `run_validation()` → `TestReport` |
| `fix.py` | Proposes rule-based (A) and LLM-assisted (B) fixes as diffs; applies on approval | `propose_fixes()`, `apply_approved_fixes()` |
| `review.py` | Static analysis against nf-core style conventions | `review_module()` → `ReviewReport` |
| `models.py` | All Pydantic v2 data models shared across the pipeline | `CodeSource`, `ModuleSpec`, `TestSpec`, `ContainerOption`, … |
| `api.py` | Clean programmatic API wrapping the full pipeline; no Rich output | `convert()` → `dict` |
| `cli.py` | `code-to-module` Click group: convert, assess-only, containers, bioconda-recipe, update-standards | — |
| `validate_cli.py` | `validate-module` Click group: test, fix, review | — |
| `regression.py` | Parses generated modules and scores them against nf-core reference modules | `parse_module()`, `score_module()` |
| `standards/loader.py` | Loads and caches `nf_core_standards.json`; exposes the `Standards` singleton | `get_standards()` → `Standards` |
## The conversion pipeline
```mermaid
flowchart TD
    subgraph convert ["code-to-module convert"]
        A[script or repo] --> B[ingest]
        B --> C[discover]
        C --> D{≥2 functionalities?}
        D -- yes --> E[selection UI]
        D -- no --> F[assess]
        E --> F
        F --> G[infer via Claude]
        G --> H[container discovery]
        H --> I[container selection]
        I --> J[test data strategy]
        J --> K[generate]
        K --> L[quick_lint]
    end
    subgraph validate ["validate-module"]
        M[test] --> N{failures?}
        N -- yes --> O[fix]
        O --> M
        N -- no --> P[review]
    end
    L --> M
    P --> Q[submit PR to nf-core]
```
ingest accepts a local file path, directory, or Git URL, clones if necessary, and
builds a CodeSource containing the repo manifest and any documentation fetched via
--docs. It also parses --existing-modules directories so downstream stages can
align container URLs and channel conventions to modules already in use.
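The first decision ingest makes is what kind of source it was handed. A minimal sketch of that classification, assuming simple URL-prefix rules (the real logic in `ingest.py` may differ, and `classify_source` is a hypothetical name):

```python
from pathlib import Path

def classify_source(target: str) -> str:
    """Hypothetical sketch: decide whether the input is a Git URL, a
    directory, or a single script. URL patterns here are assumptions."""
    if target.startswith(("https://", "git@")) or target.endswith(".git"):
        return "git"          # clone into a temp dir, then scan the tree
    if Path(target).is_dir():
        return "directory"    # walk the tree and build a repo manifest
    return "file"             # single-script conversion
```

Whatever the branch, the result is normalised into one `CodeSource` so downstream stages never care where the code came from.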
discover is deliberately split from assess and infer. Rule-based detectors run
first (Click/Typer decorators, argparse subparsers, shell case dispatch, multiple
top-level scripts) because they are fast and deterministic. The LLM fallback only fires
when rule-based detection finds zero or one functionality. This keeps discovery
reproducible: two runs on the same repo produce the same FunctionalitySpec list
regardless of LLM non-determinism.
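One of those rule-based detectors can be sketched with the standard-library `ast` module. This is an illustrative reduction, not the detector in `discover.py`: it only handles Click/Typer-style decorators, while the real detectors also cover argparse subparsers and shell case dispatch.

```python
import ast

CLI_DECORATORS = {"command", "group"}  # click/typer entry-point decorators

def find_click_entry_points(source: str) -> list[str]:
    """Return function names decorated with @command/@group (bare or dotted)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                # @click.command(...) is a Call; bare @command is a Name
                target = dec.func if isinstance(dec, ast.Call) else dec
                attr = (target.attr if isinstance(target, ast.Attribute)
                        else getattr(target, "id", None))
                if attr in CLI_DECORATORS:
                    names.append(node.name)
    return names

sample = """
import click

@click.command()
def trim():
    pass

def helper():
    pass
"""
```

Because this is pure AST traversal, the same repo always yields the same entry-point list, which is exactly the reproducibility property the paragraph above describes.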
assess assigns a complexity tier (1–5) to each FunctionalitySpec. Tier drives
the default container strategy and determines how much of the module can be completed
automatically — Tier 5 means generation cannot proceed and the user is told why.
infer is the only step that calls the Claude API. It sends the relevant code
section plus any documentation to Claude and receives back channel names, types, and
the shell command. Post-processing guards then enforce structural invariants that Claude
is not expected to get right every time: val(meta) as the first input, the eval()
version capture pattern, and no TODO placeholders in required fields.
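The meta-first invariant is the simplest of these guards to illustrate. A sketch of the idea, assuming a plain list-of-dicts channel representation (the real guard operates on `ModuleSpec` fields in `infer.py`):

```python
def enforce_meta_first(input_channels: list[dict]) -> list[dict]:
    """Move the meta channel to the front; inject one if the LLM omitted it."""
    meta = [c for c in input_channels if c.get("name") == "meta"]
    rest = [c for c in input_channels if c.get("name") != "meta"]
    if not meta:
        meta = [{"name": "meta", "type": "map"}]
    return meta + rest

channels = [{"name": "reads", "type": "file"}, {"name": "meta", "type": "map"}]
reordered = enforce_meta_first(channels)
```

The guard is idempotent, so it is safe to run on every inference result regardless of whether Claude already got the ordering right.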
container discovery and selection run as two separate phases. Discovery runs all
checks in parallel (Dockerfile in repo, environment.yml, requirements.txt,
Singularity.def, Bioconda/BioContainers API) and collects every available option. No
check is skipped because a better one was found. Selection then applies the tier-aware
default order — BioContainers ranks first for Tier 1–2 tools, repo files rank first for
Tier 3–5 — or presents an interactive menu on a TTY.
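The two phases can be sketched as below. The check callables, option dicts, and ranking lists are stand-ins; the real checks in `container.py` query repo files and the BioContainers API and return `ContainerOption` objects.

```python
from concurrent.futures import ThreadPoolExecutor

def discover_options(checks) -> list[dict]:
    """Phase 1: run every check concurrently and keep all hits."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(), checks))
    return [opt for opt in results if opt is not None]

def select_option(options: list[dict], tier: int) -> dict:
    """Phase 2: tier-aware default ranking (interactive menu not shown)."""
    if tier <= 2:
        order = ["biocontainers", "dockerfile", "environment_yml"]
    else:
        order = ["dockerfile", "environment_yml", "biocontainers"]
    return min(options, key=lambda opt: order.index(opt["kind"]))

checks = [
    lambda: {"kind": "dockerfile"},
    lambda: None,                     # e.g. no Singularity.def in the repo
    lambda: {"kind": "biocontainers"},
]
options = discover_options(checks)
```

Keeping discovery exhaustive means the interactive menu can always show every option, not just the first one that happened to resolve.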
test data strategy selection tries four strategies in priority order, stopping at the first that succeeds for each input channel. See Test data strategies below.
generate renders the module files from Jinja2 templates, then quick_lint runs
a fast structural check in-process. Any warnings are surfaced in the CLI summary
without requiring nf-core lint to be installed.
## The validation pipeline
validate-module test runs nf-core modules lint and nf-test against a module
directory, captures their output, and classifies each failure into one of three
classes. Class A failures have deterministic structure — wrong emit name, missing
topic: versions tag, container URL prefix error, missing ext.args pattern — and
can be fixed by a deterministic rule. Class B failures require reading the module
in context — wrong output glob, wrong process label — and are addressed by an
LLM-assisted fix with a lower trust signal. Class C failures cannot be resolved
automatically; the tool explains what is wrong and stops.
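The classification itself can be as simple as pattern matching on the failure message. A sketch under that assumption (the pattern table is hypothetical; the real classifier in `validate.py` inspects structured lint output, not free text):

```python
import re

CLASS_A_PATTERNS = [                  # deterministic, rule-fixable
    r"emit name", r"topic:\s*versions", r"container URL", r"ext\.args",
]
CLASS_B_PATTERNS = [r"output glob", r"process label"]   # LLM-assisted

def classify_failure(message: str) -> str:
    """Map a lint/nf-test failure message to class A, B, or C."""
    if any(re.search(p, message, re.IGNORECASE) for p in CLASS_A_PATTERNS):
        return "A"
    if any(re.search(p, message, re.IGNORECASE) for p in CLASS_B_PATTERNS):
        return "B"
    return "C"                        # everything unrecognised needs a human
```

Defaulting unrecognised failures to class C is the conservative choice: the tool would rather stop and explain than guess at a fix.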
validate-module fix presents each fixable failure as a coloured diff panel labelled
with its source ([rule] or [llm]) and waits for explicit approval before writing
anything. This is non-negotiable: silent file modification would make the fix command
a liability rather than an aid. After applying approved fixes, validation re-runs
automatically to confirm the failures are resolved.
validate-module review performs static analysis only — no subprocess calls, no
nf-test execution. It checks channel naming conventions, process label
appropriateness, ext.args usage, meta.yml completeness, versions channel structure,
and EDAM ontology coverage, returning a ReviewReport with ERROR / WARNING / INFO
severity levels.
## The import boundary (critical for contributors)
The codebase is split into two strictly separated halves:
```
┌─────────────────────────────────┐        ┌──────────────────────────────────┐
│ Conversion pipeline             │        │ Validation suite                 │
│                                 │        │                                  │
│ ingest → discover → assess      │        │ validate.py                      │
│ → infer → container             │        │ fix.py                           │
│ → test_gen → generate           │        │ review.py                        │
│                                 │        │ validate_cli.py                  │
│ cli.py   api.py                 │        │                                  │
└─────────────────────────────────┘        └──────────────────────────────────┘
        ↑                                          ↑
        │               standards/                 │
        └──────────────── (shared) ────────────────┘
```
validate_cli.py must never import from ingest, discover, assess, infer,
container, test_gen, or generate. standards/ must never import from any
sibling module in either half. This boundary exists so the validation suite can be
extracted into a standalone nf-module-tools package later without renaming
anything. A test enforces it: pytest tests/test_import_boundaries.py. If you add
an import and this test fails, the boundary has been violated — restructure rather
than loosen the rule.
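The core of such a boundary test is a static scan of each module's imports. A simplified sketch (the real test in `tests/test_import_boundaries.py` walks the installed package rather than taking source strings):

```python
import ast

PIPELINE_MODULES = {"ingest", "discover", "assess", "infer",
                    "container", "test_gen", "generate"}

def forbidden_imports(source: str, banned: set[str]) -> list[str]:
    """Report any import whose final dotted component is a banned sibling."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names
                     if a.name.split(".")[-1] in banned]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[-1] in banned:
                hits.append(node.module)
    return hits
```

Because the check is AST-based, it catches `from code_to_module.infer import infer` as well as plain `import` statements, and never executes the module under test.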
## nf-core conventions enforced by the generator
These are applied automatically. Contributors do not need to implement them manually, but should know they exist when reviewing generated output.
| Convention | Where enforced | Reference |
|---|---|---|
| `val(meta)` as first input channel | `infer.py` post-processing guard | nf-core module template |
| `eval()` version capture pattern | `main.nf.j2` template | `fastqc`, `samtools/sort` modules |
| Both Singularity and Docker container URLs | `container.py` + `main.nf.j2` | nf-core container policy |
| `topic: versions` on versions output channel | `main.nf.j2` template | nf-core 3.5+ convention |
| All params via `ext.args` (none hardcoded) | `main.nf.j2` template | nf-core module guidelines |
Full specification: nf-co.re/docs/guidelines/components/modules
## The standards schema
src/code_to_module/standards/data/nf_core_standards.json is the single source of
truth for all nf-core conventions used by the generator — valid process labels,
container registry URLs, required meta.yml fields, EDAM ontology mappings, test
dataset paths, and derivation templates. Nothing in Python code or Jinja2 templates
should hardcode an nf-core convention directly; it must come from the schema via
get_standards(). This makes convention updates a one-file change. To update the
bundled schema, edit nf_core_standards.json, bump schema_version, add an entry
to STANDARDS_CHANGELOG.md, and run code-to-module update-standards to check
whether the remote schema has a newer version.
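The consumption pattern looks like this. The miniature schema below is an assumption about the file's shape, kept only large enough to show the rule: convention-shaped values come from the schema, never from a literal in code or a template.

```python
# Assumed miniature of nf_core_standards.json; the real file has many more keys.
NF_CORE_STANDARDS = {
    "schema_version": "1.0.0",
    "process_labels": ["process_single", "process_low",
                       "process_medium", "process_high"],
}

def valid_process_label(label: str, standards: dict = NF_CORE_STANDARDS) -> bool:
    """Check a label against the schema instead of a hardcoded list."""
    return label in standards["process_labels"]
```

When nf-core changes its label set, only the JSON file changes; every check that reads through `get_standards()` picks up the new values for free.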
## Test data strategies
The generator tries strategies in this order and stops at the first match for each input channel.
| Strategy | When used | Output |
|---|---|---|
| 1 — Match | A file of the right format already exists in nf-core/test-datasets | Path reference in main.nf.test |
| 2a — Derive | No exact match; file can be subsetted or transformed from existing nf-core data | derive_test_data.sh + path reference |
| 2b — Chain | Input is the natural output of a known upstream nf-core module | setup {} block in main.nf.test |
| 3 — Stub | Data is too large, proprietary, or cannot be derived | stub: block; tests run with nf-test --stub |
Derive (2a) is intentionally ranked above chain (2b). A stored file keeps CI test
runtime predictable; a setup {} block runs an upstream module on every test
invocation, adding latency that nf-core reviewers notice.
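The priority order reduces to a first-match loop with stub as the unconditional fallback. A sketch of that orchestration shape (the strategy callables here are toy stand-ins; the real ones live in `test_data_match.py` and `test_data_derive.py`):

```python
def pick_strategy(channel: dict, strategies) -> tuple[str, object]:
    """Try each (name, attempt) pair in order; first non-None result wins."""
    for name, attempt in strategies:
        result = attempt(channel)
        if result is not None:
            return name, result
    return "stub", None          # Strategy 3 always applies as the fallback

strategies = [
    ("match",  lambda ch: "genome.fasta" if ch["format"] == "fasta" else None),
    ("derive", lambda ch: None),   # stand-in: no derivation recipe available
    ("chain",  lambda ch: None),   # stand-in: no known upstream module
]
```

Each input channel runs through the loop independently, so one module can mix matched files for some channels with a stub for another.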
## Adding a new feature
The most common contribution patterns, in order:
- **Adding a new review check** — add a function to `review.py` that accepts a module path and a `Standards` object and returns a list of `ReviewItem`. Add the function to the `review_module()` call chain. Write a test in `tests/test_review.py` covering both the pass and fail cases.
- **Adding a new generated file** — add a `.j2` template to `src/code_to_module/templates/`. Add any new data fields to the relevant Pydantic model in `models.py`. Update `generate.py` to render the template. Add a test fixture and a test in `tests/test_generate.py` that asserts the file is written and contains the expected content.
- **Extending the standards schema** — add the new convention to `src/code_to_module/standards/data/nf_core_standards.json`. Bump `schema_version`. Add a changelog entry in `STANDARDS_CHANGELOG.md`. Update any code that reads from the schema to use the new field via `get_standards()`. Never duplicate the value in Python code.
- **All of the above** — run `pytest -x -q -m "not network and not llm"` after each step, and `pytest tests/test_import_boundaries.py` before opening a PR.
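To make the first pattern concrete, here is a sketch of what a review check can look like. The `ReviewItem` dataclass and the check's internals are hypothetical stand-ins; the real model lives in `models.py` and takes a `Standards` object rather than a bare set.

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:          # stand-in for the real Pydantic model
    severity: str          # "ERROR" | "WARNING" | "INFO"
    message: str

def check_process_label(main_nf: str, valid_labels: set[str]) -> list[ReviewItem]:
    """Flag any process label not present in the standards schema."""
    items = []
    for line in main_nf.splitlines():
        if line.strip().startswith("label "):
            label = line.split("label", 1)[1].strip().strip("'\"")
            if label not in valid_labels:
                items.append(
                    ReviewItem("WARNING", f"unknown process label: {label}"))
    return items
```

A check like this slots into the `review_module()` call chain unchanged: it takes module content plus standards in, and returns a list of items out.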
## Known limitations and planned improvements
- Library-only tools with no CLI entry point (e.g. tools designed to be called as `import foo; foo.run(...)`) are not yet supported — discovery finds no entry points and generation stops. A `library-to-module` path that generates a CLI wrapper first is planned.
- Perl wrappers and non-Python CLI tools (TrimGalore, for example) have no Python AST to analyse and tend to assess as Tier 4–5. The tool reports the tier honestly rather than producing a misleading partial module.
- Domain-specific test data (`.h5ad`, `.pkl`, `.mzML`, large databases) is not in nf-core/test-datasets, so those input channels fall back to the stub strategy. The generated test is syntactically valid but cannot test real data flow.
Open issues and the development roadmap are tracked at github.com/svigneau/code-to-module/issues.