raw2one-hot

The container uses bwa-mem2 mem to align raw reads to the M. tuberculosis H37Rv reference genome (ASM19595v2) and then extracts one-hot-encoded consensus sequences for a list of target loci. The start and end coordinates of the target loci are read from a required CSV file, which must have the header line locus,start,end. The sequences are concatenated without gaps.
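As a rough illustration of the expected CSV layout, the sketch below writes a small target-loci file and sanity-checks it before it is passed to the container. The locus names and coordinates are placeholders invented for this example, not real H37Rv annotations, and the start <= end check is an assumption about what a sensible row looks like rather than something the container is documented to enforce.

```shell
# Write a hypothetical target-loci file; the names and coordinates are placeholders.
cat > target_loci.csv <<'EOF'
locus,start,end
locus_A,1000,1500
locus_B,20000,20750
EOF

# Check the header line that the container requires.
header=$(head -n 1 target_loci.csv)
if [ "$header" != "locus,start,end" ]; then
    echo "unexpected header: $header" >&2
    exit 1
fi

# Assumed sanity check: every data row has three fields and start <= end.
awk -F, '
    BEGIN { bad = 0 }
    NR > 1 && (NF != 3 || $2 + 0 > $3 + 0) { print "bad row " NR ": " $0; bad = 1 }
    END { exit bad }
' target_loci.csv
```

If both checks pass silently, the file has the required header and can be supplied through the -r option.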

Usage

FASTQ files with forward and reverse M. tuberculosis reads and a CSV file specifying the target loci are required as input. The container uses /data as its working directory and creates the output file there, so the directory holding the input files should be mounted at /data.

docker run -v "$PWD":/data \
    julibeg/tb-ml-one-hot-encoded-seqs-from-raw-reads:v0.2.0 \
    -r target_loci.csv \
    -o one_hot_seqs.csv \
    my-sample_1.fastq.gz \
    my-sample_2.fastq.gz