Guillaume Louvel

Evolutionary genomics & bioinformatics

One year of Nextflow

Automating large analyses (workflows) is the epitome of bioinformatics. I started using the workflow manager and language Nextflow a little over a year ago, so I would like to report on how I have used it and what I think of it.

I had previously used Snakemake, but ran into limitations with the target-based model (see below) and with large execution graphs. It may have improved since, but I was curious to try that new cool kid everyone talked about, Nextflow.

This article is not a tutorial; it will be more interesting if you already have some knowledge of Nextflow or other workflow managers.

Comparisons to other workflow managers

There are too many to give a comprehensive overview. To summarize, however, there are two main types of execution logic:

  1. target-based (e.g. GNU Make, Snakemake, the R package targets): you execute the workflow by indicating the end product, from which all the necessary preceding steps are deduced. This is quite predictable but can feel constraining: for example, it may be difficult or impossible to code a step that produces an unknown number of outputs.
  2. data-flow (e.g. Nextflow, SciPipe): it’s more natural and flexible, as you define steps in the order they execute, not backwards as with target-based workflows.
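
To make the contrast concrete, here is a minimal Python sketch of the two execution models. It is neither Snakemake nor Nextflow code: the rule table, the file names and the step functions are all invented for illustration. The target-based half resolves dependencies backwards from a requested output, while the data-flow half chains steps forwards and lets a step emit a number of outputs that depends on the data.

```python
# Target-based: each rule maps an output name to (inputs, action).
# The runner starts from the requested target and walks backwards.
RULES = {
    "report.txt": (["counts.txt"], lambda deps: "report(" + deps[0] + ")"),
    "counts.txt": (["reads.txt"], lambda deps: "count(" + deps[0] + ")"),
    "reads.txt": ([], lambda deps: "reads"),
}

def build(target, order):
    """Resolve dependencies backwards from the target, then run its rule."""
    inputs, action = RULES[target]
    results = [build(dep, order) for dep in inputs]
    order.append(target)  # record the actual execution order
    return action(results)

# Data-flow: steps are chained forwards; each one consumes a stream of
# items and may emit any number of outputs per input.
def split_chunks(items):
    for item in items:
        for i in range(len(item)):  # the number of outputs depends on the data
            yield f"{item}.chunk{i}"

def count(items):
    for item in items:
        yield f"count({item})"

target_order = []
target_result = build("report.txt", target_order)
flow_result = list(count(split_chunks(["ab", "xyz"])))
```

In the target-based half, every output name must be known before anything runs; in the data-flow half, the five chunks only become known once `split_chunks` has seen its inputs.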

Other key aspects of workflow languages:

  • which language they are based on:
    1. Snakemake is Python-based;
    2. Nextflow is Groovy-based (Groovy is a language for the Java Virtual Machine, with a Java-like syntax);
    3. SciPipe is Go-based;
    4. some, like OpenWDL, were designed as entirely new languages.
  • whether the language and the program to run it are jointly developed: OpenWDL is just a language specification, for which several runners have been developed by independent teams. This has the advantage that the language aims to be standard and implementation-agnostic, but it may have fewer features than workflow languages backed by full-fledged programming languages.

What I like about Nextflow

The data-flow paradigm and syntax

The data-flow model is awesome. Intuitive and flexible.

It is structured around two complementary concepts, the “channel” (think of a tube transmitting data), and “processes” that transform channel contents.

More specifically, Nextflow’s syntax is rather intuitive, and provides easy constructs to build workflows. It has a pipe operator, and a collection of named “operators” (more like built-in processes, actually) for commonly needed operations such as joins, transposes and sorts.
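
For readers who have never seen a data-flow language, the following Python toy gives a rough feel for channels, processes, the pipe and a join operator. It is only an analogy: the `Channel` class, its methods and the `to_upper` step are invented for this sketch and are not Nextflow's API, although the join here loosely mimics matching items on their first element.

```python
class Channel:
    """Toy stand-in for a data-flow channel: an ordered stream of items."""

    def __init__(self, items):
        self.items = list(items)

    def __or__(self, step):
        # `channel | step` feeds the channel into a step, like a pipe.
        return step(self)

    def map(self, func):
        # Apply a function to every item, producing a new channel.
        return Channel(func(item) for item in self.items)

    def join(self, other):
        # Pair up tuples from both channels that share the same first element.
        lookup = {key: rest for key, *rest in other.items}
        return Channel(
            (key, *rest, *lookup[key])
            for key, *rest in self.items
            if key in lookup
        )

def to_upper(channel):
    """A toy 'process': transforms the payload of every item."""
    return channel.map(lambda pair: (pair[0], pair[1].upper()))

reads = Channel([("s1", "acgt"), ("s2", "ttga")])
metadata = Channel([("s2", "liver"), ("s1", "brain")])

# Pipe the reads through a step, then join the result with the metadata.
joined = (reads | to_upper).join(metadata)
```

Here `joined.items` pairs each sample with its metadata regardless of the order in which the two channels delivered them, which is the kind of bookkeeping the built-in operators take care of.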

Although I didn’t know any Groovy or Java, this was not really an obstacle, as the documentation is well made and you only occasionally need to resort to Groovy code. I still liked having that possibility when Nextflow’s own features are not enough, and Groovy’s syntax is rather quick to learn as well. However, a recent decision was made to forbid almost all of Groovy’s syntax in order to make Nextflow a standalone language.

Isolated execution (temporary directories)

Each process is executed in its own separate directory, where input files are made available (by symlinks usually). In theory, this spares you from bothering about file names: files from a process cannot overwrite files from another one. You can just name them input.file and output.file and move on.
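
The idea can be sketched in a few lines of Python. This is an illustration of the principle only, not how Nextflow is implemented: the `run_isolated` helper and the file names are hypothetical, and it assumes a filesystem that supports symlinks.

```python
import os
import tempfile
from pathlib import Path

def run_isolated(work_root, input_path, transform):
    """Run one task in a fresh directory, staging its input as a symlink."""
    task_dir = Path(tempfile.mkdtemp(dir=work_root))
    staged = task_dir / "input.file"
    os.symlink(input_path, staged)  # the input is linked, not copied
    output = task_dir / "output.file"
    output.write_text(transform(staged.read_text()))
    return output

work_root = tempfile.mkdtemp()
source = Path(work_root) / "sample.txt"
source.write_text("acgt")

# Two tasks use identical generic file names without clashing.
out_a = run_isolated(work_root, source, str.upper)
out_b = run_isolated(work_root, source, lambda text: text[::-1])
```

Both tasks write a file called `output.file`, yet neither overwrites the other because each lives in its own directory.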

nf-core

nf-core is a separate organization that curates a collection of standardized workflows. This helps with sharing standardized analyses and “modules” with the community, and with enforcing good code quality through clear guidelines (style, metadata, unit testing, etc.).

The community

Nextflow is supported by a dynamic community. First, the core developers are very available for support on various channels (the Seqera forum, GitHub, a Slack space), and there are many knowledgeable people who can provide help there too. I must say I regret the move away from the forum to the Slack space, as questions on the forum remain publicly available to anyone in the long term.

There is a “Nextflow Summit” event quite regularly (every six months or so), in person and online, and I took that opportunity to participate in a hackathon, which was super interesting. I chose to work on a specific workflow that was new to me, and although I was a bit lost at the beginning, the developers spent time explaining things, so I was actually able to contribute in the end.


What I dislike about Nextflow

I think Nextflow has very solid foundations to remain a prominent choice for developing workflows. However, I have come across some annoying points. These are open to discussion, and I have also been posting in GitHub issues in addition to complaining in front of my screen 😁

Cryptic error messages

This is actively being worked on with the separation of Nextflow from Groovy, so hopefully it will improve. Currently, the number of layers in the stack trace sometimes makes errors hard to debug.

Finicky resumability

The ability to “resume”, i.e. to start from previous half-computed runs instead of restarting from scratch, is a fundamental feature of workflow managers. Nextflow can do this, but it’s rather finicky. Because the criteria for deciding that a step need not be redone are very strict, and largely outside the user’s control, you often end up rerunning steps you would rather have skipped. This causes a significant loss of time and energy, and is the main reason I might actually look for another workflow manager.
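
A rough sketch of why strict criteria bite: the Nextflow documentation describes the default cache key as indexing input file metadata (name, size and last-modified timestamp), with a “lenient” mode that leaves the timestamp out. The Python below only imitates that idea; the `cache_key` function, its hashing scheme and the file names are assumptions made for illustration, not Nextflow’s actual algorithm.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def cache_key(script, input_path, lenient=False):
    """Hypothetical task hash built from the script text and input file metadata."""
    stat = os.stat(input_path)
    parts = [script, str(input_path), str(stat.st_size)]
    if not lenient:
        # In strict mode the modification time is part of the key.
        parts.append(str(stat.st_mtime_ns))
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()

workdir = Path(tempfile.mkdtemp())
data = workdir / "reads.txt"
data.write_text("acgt")
script = "wc -c reads.txt"

strict_before = cache_key(script, data)
lenient_before = cache_key(script, data, lenient=True)

# Bump the modification time by one second without changing the content,
# as a copy or a re-download of the same file might do.
stat = os.stat(data)
os.utime(data, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))

strict_after = cache_key(script, data)
lenient_after = cache_key(script, data, lenient=True)
```

With the strict key, merely touching the file yields a different hash, so the step would be recomputed even though its content is unchanged; the lenient key stays the same.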

This is especially annoying since there is no real “dry-run” (next criticism).

No dry-run

It’s very strange to me that a workflow manager does not have a proper dry-run, yet that is the case with Nextflow. It stems in part from the data-flow model, which makes it hard to know what will be done without actually executing the steps; however, some middle ground could have been implemented, such as easily showing which steps will be resumed from cache and which ones will be redone. Some pull requests have been made, but this does not appear to be a priority.
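
The kind of middle ground meant here could look like the following Python sketch. It is purely hypothetical and does not correspond to any existing Nextflow option: the `plan` function, the task names and the cache keys are invented, and it only covers tasks whose inputs are already known before execution.

```python
def plan(tasks, cached_keys):
    """Split the known tasks into those found in the cache and those to run."""
    report = {"cached": [], "to_run": []}
    for name, key in tasks:
        bucket = "cached" if key in cached_keys else "to_run"
        report[bucket].append(name)
    return report

# Each task is (name, cache key); the keys would come from a previous run.
tasks = [("align:s1", "k1"), ("align:s2", "k2"), ("count:s1", "k3")]
report = plan(tasks, cached_keys={"k1", "k3"})
```

Tasks downstream of a step that has to be redone cannot be listed this way, since their inputs do not exist yet, which is exactly the limitation the data-flow model imposes.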

Breaking changes to the syntax and speed of evolution

This is a major criticism, although who am I to judge? I started using Nextflow only one year ago, and I already cannot run my workflow with today’s version without major changes. Backward compatibility never seemed to be a priority, which is odd for software that is supposed to promote reproducibility. Decisions about major breaking changes seem to have been made in a rather top-down manner.

The introduction of the “strict syntax” and separation from Groovy plays a big role in this.

Along with this new syntax came a code linter that checks for it. The linter is decoupled from what actually runs, which leads to the strange experience of seeing it flag errors everywhere while the code runs fine. But yes, one day I will update my workflow to the new syntax…

I think we are on the verge of seeing a quite different language emerge, but with the same name.


I learnt a lot by getting into Nextflow for my current project, and by having interesting exchanges with the community. Now, this made me think, I am gonna develop my own, obviously better workflow manager! (or not)