minimod

Modkit consistency

Tool versions we used for comparisons are modkit 0.5.1 and minimod 0.5.0

Minimod view vs Modkit extract full

Following pair of commands using minimod v0.5.0 and modkit 0.5.1 should give the same output. Let’s assume reads.bam contains mapped and unmapped, primary, secondary and supplementary alignments.

NOTE:
test/compare_view_mkbed_mmtsv.sh modkit_extract.bed minimod_view.tsvscript takes modkit’s extract bed and minimod’s view tsv and creates four files in the out_dir.

There are similar scripts we provide for view comparison between different file formats.


When the context and modification type is unknown

minimod view -c '*' --skip-supplementary  ref.fa reads.bam > mm_view.tsv

modkit extract full --mapped-only reads.bam mk_extract.bed

test/compare_view_mkbed_mmtsv.sh mk_extract.bed mm_view.tsv out_dir

When the context is known and modification type is unknown

minimod view -c '*[A]' --skip-supplementary ref.fa reads.bam > mm_view_A.tsv

modkit extract full --motif A 0 --mapped-only --reference ref.fa reads.bam mk_extract_A.bed

test/compare_view_mkbed_mmtsv.sh mk_extract_A.bed mm_view_A.tsv out_dir
# alternative way to filter context from modkit output using awk instead of using --motif A 0 
modkit extract full --mapped-only --kmer-size 1 --reference ref.fa reads.bam mk_extract.bed
awk -F '\t' 'function c(b){b=toupper(b);return b=="A"?"T":b=="T"?"A":b=="C"?"G":b=="G"?"C":b} NR==1 || $16=="A" && ($21==16 ? toupper($16)==c($17) : toupper($16)==toupper($17))' mk_extract.bed > mk_extract_A.bed

When the context and modification type is known

minimod view -c 'a[A]' --skip-supplementary ref.fa reads.bam > mm_view_aA.tsv

modkit extract full --motif A 0 --mapped-only --reference ref.fa reads.bam mk_extract_A.bed
awk 'NR==1 || $14=="a"' mk_extract_A.bed > mk_extract_aA.bed

test/compare_view_mkbed_mmtsv.sh mk_extract_aA.bed mm_view_aA.tsv out_dir
# alternative way to filter context from modkit output using awk instead of using --motif A 0 
modkit extract full --mapped-only --kmer-size 1 --reference ref.fa reads.bam mk_extract.bed
awk 'NR==1 || $14=="a"' mk_extract.bed > mk_extract_a.bed
awk -F '\t' 'function c(b){b=toupper(b);return b=="A"?"T":b=="T"?"A":b=="C"?"G":b=="G"?"C":b} NR==1 || $16=="A" && ($21==16 ? toupper($16)==c($17) : toupper($16)==toupper($17))' mk_extract_a.bed > mk_extract_aA.bed

Minimod freq vs Modkit pileup

While the default behaviours mentioned above are applied to minimod freq and modkit pileup tools, following default behaviours can cause decrepencies between the outputs from two tools.

Following pair of commands using minimod v0.5.0 and modkit 0.5.1 should give similar outputs. Let’s assume reads.bam contains mapped and unmapped, primary, secondary and supplementary alignments.

NOTE:
compare.py file1.bed file2.bed outputs Pearson correlation coefficient of modification frequencies in two bedmethyl files.


When the context and modification type is unknown

minimod freq -b -c '*' --skip-supplementary ref.fa reads.bam > mm_freq.bed

modkit pileup --reference ref.fa reads.bam mk_pileup.bed

test/compare.py mk_pileup.bed mm_freq.bed

test/compare.py python script compute Pearson correlation coefficient between two bed files.

When the context is known and modification type is unknown

minimod freq -b -c '*[A]' --skip-supplementary ref.fa reads.bam > mm_freq_A.bed

modkit pileup --motif A 0 --reference ref.fa reads.bam mk_pileup_A.bed

test/compare.py mk_pileup_A.bed mm_freq_A.bed

When the context and modification type is known

minimod freq -b -c 'a[A]' --skip-supplementary ref.fa reads.bam > mm_freq_aA.bed

modkit pileup --motif A 0 --reference ref.fa reads.bam mk_pileup_A.bed
awk 'NR==1 || $4=="a"' mk_pileup_A.bed > mk_pileup_aA.bed # not needed if mod+basecall model is for only type a modifications (ex: m6A_DRACH)

test/compare.py mk_pileup_A.bed mm_freq_aA.bed

NOTE:
test/compare_freq_bed_bed.sh modkit_pileup.bed minimod_freq.bedscript takes modkit’s pileup bed and minimod’s pileup bed and creates four files in the out_dir.


When mod+basecalled using a 2-way classification model such as m6A_DRACH (classifies into modified as m6A or canonical A), minimod’s and modkit’s filtering methods results in same outputs if the modification threshold is matched using -m in minimod or –filter-threshold in modkit.

minimod freq --skip-supplementary -m 0.9 -b -c "a[A]" ref.fa rna_m6A_DRACH.bam > mm_freq_aA.bed

modkit pileup --filter-threshold A:0.9 --motif A 0 --reference ref.fa rna_m6A_DRACH.bam mk_pileup_aA.bed

test/compare_freq_bed_bed.sh mm_freq_aA.bed mk_pileup_aA.bed out_dir

Philosophy

Minimod is intended to be kept simple. Its main use case is for reference-based analysis.

Rationale for decisions