Known Limitations

Tools that assess as Tier 4–5

Perl wrappers, tools with no detectable CLI structure, and tools that require large tool-specific databases (BLAST, Kraken2) typically assess as Tier 4 or 5. code-to-module reports the tier honestly and generates what it can; Tier 5 tools require manual module authoring.

TrimGalore is a representative example: it is a Perl wrapper around cutadapt and FastQC with no Python AST to analyse. It is correctly assessed as Tier 5, and code-to-module stops with an explanation rather than producing a misleading partial module. For these tools, use the assessment output to understand what was detected, then author the module manually using nf-core modules create as a starting point.

Library-only tools (no CLI entry point)

Tools like decoupler and liana-py expose a Python API but no command-line interface. These are not yet supported. code-to-module works by analysing CLI structure — console_scripts entry points, argparse/Click argument parsers, shell case statements — and cannot generate a meaningful module for code that is designed to be called as import foo; foo.run(...) rather than foo --input file --output dir.
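One way to check in advance whether a package falls into this category is to look for console_scripts entry points in its installed metadata. The sketch below uses only the standard library's importlib.metadata; the function names console_scripts and has_cli are illustrative and are not part of code-to-module, and this is not a description of how the tool itself performs detection.

```python
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(dist_name: str) -> dict[str, str]:
    """Map each console_scripts command name to its 'module:function' target.

    Returns an empty dict when the distribution is not installed or
    declares no command-line entry points.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return {}
    return {
        ep.name: ep.value
        for ep in dist.entry_points
        if ep.group == "console_scripts"
    }


def has_cli(dist_name: str) -> bool:
    """True if the installed distribution exposes at least one command."""
    return bool(console_scripts(dist_name))
```

A package for which has_cli returns False may still ship scripts by other means (for example a bin/ directory), so an empty result is a hint that the tool is library-only, not proof.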

Library-to-module support is planned; see the architecture doc for the proposed design. The intended approach is to generate a thin CLI wrapper script first, then feed that into the standard conversion pipeline.
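Until that support exists, a wrapper of this kind can be written by hand. The following is a minimal sketch, assuming a library whose entry point takes an input file and an output directory; the run function here is a stand-in defined locally so the example is self-contained, and the --input and --output flag names simply mirror the foo --input file --output dir form mentioned above.

```python
import argparse
from pathlib import Path


def run(input_path: Path, output_dir: Path) -> Path:
    """Stand-in for a library call such as foo.run(...); purely hypothetical."""
    output_dir.mkdir(parents=True, exist_ok=True)
    result = output_dir / f"{input_path.stem}.result.txt"
    result.write_text(input_path.read_text().upper())
    return result


def build_parser() -> argparse.ArgumentParser:
    """Expose the library call's parameters as explicit command-line flags."""
    parser = argparse.ArgumentParser(
        prog="foo", description="Thin CLI wrapper around a library API."
    )
    parser.add_argument("--input", required=True, type=Path, help="input file")
    parser.add_argument("--output", required=True, type=Path, help="output directory")
    return parser


def main(argv=None) -> int:
    """Parse arguments and delegate to the library; return a shell exit code."""
    args = build_parser().parse_args(argv)
    run(args.input, args.output)
    return 0
```

In a real script, main would be called under an if __name__ == "__main__" guard or registered as a console_scripts entry point. Because the wrapper declares its arguments through argparse, it presents the kind of CLI structure that the analysis described above looks for.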

Domain-specific test data

Input formats not in nf-core/test-datasets (h5ad, pkl, mzML, Visium directories) fall back to stub mode. Real test data must be added manually and submitted to nf-core/test-datasets by pull request before the module is submitted.

Celltypist is a representative example: its inputs (.pkl model files and .h5ad or .csv count matrices) are not in nf-core/test-datasets. All input channels in tests/main.nf.test are assigned the stub strategy, which produces a syntactically valid test that passes lint and confirms channel wiring but cannot exercise real data flow. The TODO comments in the generated test file describe exactly what data would be needed.
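The fallback can be pictured as a per-input decision keyed on file format. The sketch below is an illustration only, not the tool's actual logic: the suffix set contains just the file formats named in this section, and the choose_strategy function and the "lookup" label (meaning "try to find a matching file in nf-core/test-datasets") are hypothetical.

```python
from pathlib import Path

# File formats this section names as absent from nf-core/test-datasets.
# Visium inputs are directories, so they cannot be matched by suffix here.
STUB_ONLY_SUFFIXES = {".h5ad", ".pkl", ".mzml"}


def choose_strategy(filename: str) -> str:
    """Return 'stub' for formats with no known test data, else 'lookup'."""
    suffix = Path(filename).suffix.lower()
    return "stub" if suffix in STUB_ONLY_SUFFIXES else "lookup"


def strategies(filenames: list[str]) -> dict[str, str]:
    """Choose a strategy for each input file independently."""
    return {name: choose_strategy(name) for name in filenames}
```

A suffix check is deliberately crude. As the Celltypist case shows, a common format such as .csv can still lack suitable test data when the content (a count matrix) is domain-specific, so a "lookup" result does not guarantee that a usable file exists.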

LLM non-determinism

Running convert twice on the same tool may produce slightly different channel names, output globs, or script arguments. This is expected behaviour — always review the generated module before submitting. The post-processing guards in infer.py enforce structural invariants (meta as first input, no duplicate versions emit, no TODO placeholders) but do not guarantee identical output across runs.
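The same three invariants can be checked by hand when reviewing a generated module. The sketch below is not the code in infer.py; it is a simplified, line-based illustration with hypothetical names (check_invariants, EXAMPLE), and it assumes the module text places input: on its own line.

```python
import re

# A tiny module body used only to exercise the checks below.
EXAMPLE = "\n".join([
    "process FOO {",
    "    input:",
    "    tuple val(meta), path(reads)",
    "",
    "    output:",
    "    tuple val(meta), path('*.txt'), emit: txt",
    "    path 'versions.yml', emit: versions",
    "}",
])


def first_input_line(module_text: str):
    """Return the first non-empty line after 'input:', or None if absent."""
    lines = [line.strip() for line in module_text.splitlines()]
    if "input:" not in lines:
        return None
    for line in lines[lines.index("input:") + 1:]:
        if line:
            return line
    return None


def check_invariants(module_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    first = first_input_line(module_text)
    if first is None or "val(meta)" not in first:
        problems.append("first input does not carry meta")
    if len(re.findall(r"emit:\s*versions\b", module_text)) > 1:
        problems.append("duplicate versions emit")
    if "TODO" in module_text:
        problems.append("TODO placeholder left in module")
    return problems
```

Calling check_invariants(EXAMPLE) returns an empty list. Checks like these only confirm structure; they say nothing about whether channel names or output globs are the right ones for the tool, which is why manual review is still needed.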

If two runs produce significantly different modules, the tool-specific context is ambiguous and both outputs deserve manual review. Passing --docs with the tool's documentation URL typically reduces variation by giving the LLM more signal to work with.
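To judge how far two runs have drifted apart, a plain text diff of the generated files is usually enough. The sketch below uses the standard library's difflib; the run1/main.nf and run2/main.nf labels are placeholders for wherever the two outputs were saved, and no threshold for "significantly different" is implied.

```python
import difflib


def module_diff(first: str, second: str) -> list[str]:
    """Unified diff between two generated modules, one string per line."""
    return list(
        difflib.unified_diff(
            first.splitlines(),
            second.splitlines(),
            fromfile="run1/main.nf",
            tofile="run2/main.nf",
            lineterm="",
        )
    )


def similarity(first: str, second: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two texts are identical."""
    return difflib.SequenceMatcher(None, first, second).ratio()
```

An empty diff means the runs agreed exactly. Differences confined to channel names are usually cosmetic, while differences in output globs or script arguments are the ones that most deserve a closer look.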

Planned improvements

The following are known future directions. Contributions are welcome — see GitHub Issues for open items.

Library-to-module: generate a CLI wrapper for library-only Python tools, then pipe the result through the standard conversion pipeline.
Strategy 2c (test data from tool docs): if tool documentation links to example data files of appropriate size, use them directly rather than deriving from nf-core/test-datasets.
Snakemake-to-Nextflow: convert Snakemake rules to nf-core module format, using the rule's input/output blocks to infer channel structure instead of CLI analysis.