fastp: an ultra-fast all-in-one FASTQ preprocessor
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu
2018-04-09 — bioRxiv
Motivation: Quality control (QC) and preprocessing of FASTQ files are necessary steps to provide clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as QC, adapter trimming and quality filtering. These tools are usually not fast enough, since most are developed in high-level programming languages such as Python and Java and provide limited multi-threading support. In addition, the need to read and load the data multiple times makes preprocessing slow and I/O-inefficient.

Results: We developed fastp as an ultra-fast FASTQ preprocessor with the most useful QC and data filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality cutting and many other operations in a single scan of the FASTQ data. It also supports unique molecular identifier (UMI) preprocessing, poly-tail trimming, output splitting, and base correction for paired-end data. It can automatically detect adapters for both single-end and paired-end FASTQ data. The tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2 to 5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt, even though fastp performs many more operations than those tools.

Availability and Implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp
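To illustrate the "single scan" idea, the following is a minimal Python sketch that collects QC statistics and applies a quality filter while reading each record exactly once. It is not fastp's implementation (fastp is written in C++). The function names `read_fastq` and `single_pass`, the mean-quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for illustration.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream.

    Assumes well-formed input with no blank lines between records.
    """
    while True:
        name = handle.readline().rstrip("\n")
        if not name:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield name, seq, qual


def single_pass(handle, min_mean_q=20, phred_offset=33):
    """Gather QC counters and filter reads in one scan of the data.

    min_mean_q and phred_offset are illustrative defaults, not fastp's.
    """
    stats = {"reads_in": 0, "reads_out": 0, "bases_in": 0}
    kept = []
    for name, seq, qual in read_fastq(handle):
        # Decode per-base quality scores from their ASCII encoding.
        scores = [ord(c) - phred_offset for c in qual]
        # QC statistics are updated from the same record that is filtered,
        # so the input never has to be read a second time.
        stats["reads_in"] += 1
        stats["bases_in"] += len(seq)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            stats["reads_out"] += 1
            kept.append((name, seq, qual))
    return stats, kept


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
demo = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
stats, kept = single_pass(StringIO(demo))
print(stats)
print([name for name, _, _ in kept])
```

A pipeline of separate tools would instead parse the same file once per step, which is the repeated I/O cost described in the Motivation.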