Reports for study 02_denovo

##################################

Software version list

##################################

velvet 1.2.10
oases 1.2.10
SPAdes v.3.11.1
SOLiD script denovo_preprocessor_solid_v2.2.1.pl (creation of double-encoded files for SOLiD reads)
BBTools bioinformatics tools, including BBMap v.37.68
khmer (digital normalization for Desiree and Rywal velvet assemblies)
FASTQ Splitter v.0.1.2 (April 24, 2014)
CLC de-novo v.8.5.4, v.9.1 and v.10.1.1
ToFu cDNA cupcake pipeline v.2017
Trinity v.r2013-02-25

##################################

commands

##################################

#################################

khmer digital normalization

#################################

cleaned reads, RywalNahG example -> digital normalization (diginorm) of the reads, because there are too many of them to assemble at low k-mer values

normalize-by-median.py -C 20 -M 135G -k 20 -o RywalNahG_cleaned_Illumina_SE_norm_k20_C20.fastq RywalNahG_cleaned_Illumina_SE.fastq
normalize-by-median.py -p -C 20 -M 135G -k 20 -o RywalNahG_cleaned_Illumina_PE_norm_k20_C20.fastq RywalNahG_cleaned_Illumina_PE.fastq
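To see how strongly digital normalization reduced the data, the read counts before and after can be compared. The following is a minimal sketch and not part of the original pipeline: the helper names `count_fastq_reads` and `percent_kept` are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Count reads in a FASTQ file.
# Assumption: plain 4-line records (no wrapped sequence lines, no gzip).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of reads kept, given the counts before and after normalization.
percent_kept() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before == 0) { print "NA" }
        else { printf "%.1f\n", 100 * after / before }
    }'
}

# Example usage with the file names from the commands above:
# before=$(count_fastq_reads RywalNahG_cleaned_Illumina_SE.fastq)
# after=$(count_fastq_reads RywalNahG_cleaned_Illumina_SE_norm_k20_C20.fastq)
# percent_kept "$before" "$after"
```

For the paired-end file the count is in reads, so the number of pairs is half of it.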

#################################

SPAdes

#################################

#test spades
#completed in 22 sec
python3 SPAdes-3.11.1-Linux/bin/spades.py --pe1-1 SPAdes-3.11.1-Linux/share/spades/test_dataset/ecoli_1K_1.fq.gz --pe1-2 SPAdes-3.11.1-Linux/share/spades/test_dataset/ecoli_1K_2.fq.gz -t 48 -m 500 -o spades_test

cleaned reads, RywalNahG example

Rywal-NahG dataset, Illumina PE (+SE). The PE reads are strand-specific; both the original data and the FASTQ exported from CLC are rf oriented, hence --ss-rf. The default k-mer sizes are used.

python3 SPAdes-3.11.1-Linux/bin/spades.py --rna --pe1-12 ./clean_reads/RywalNahG_cleaned_Illumina_PE.fastq --pe1-s ./clean_reads/RywalNahG_cleaned_Illumina_SE.fastq --ss-rf -t 48 -m 350 -o RywalIlluminaSpades

#################################

Velvet/Oases

#################################

potato genotype leaf transcriptome assembly using Velvet/Oases

RywalNahG as an example

first reverse-complement the fastq files to get fr orientation of the PE reads!

./seqtk/seqtk seq -r ./clean_reads/RywalNahG_cleaned_Illumina_PE.fastq > ./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.fastq
./seqtk/seqtk seq -r ./clean_reads/RywalNahG_cleaned_Illumina_SE.fastq > ./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.fastq
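To make the effect of the reverse-complement step concrete, here is a small awk-based sketch of the same transformation on a 4-line FASTQ stream. It is an illustration only and not a replacement for seqtk: the function name `fastq_revcomp` is hypothetical, only A/C/G/T (upper and lower case) are complemented, and any other character such as N is left unchanged.

```shell
# Reverse-complement every record of a 4-line FASTQ stream read from stdin.
# Sequence lines are reversed and complemented; quality lines are only
# reversed so that each quality value stays attached to its base.
fastq_revcomp() {
    awk '
        BEGIN {
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
            comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
        }
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
            return out
        }
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            return out
        }
        NR % 4 == 2 { print revcomp($0); next }   # sequence line
        NR % 4 == 0 { print reverse($0); next }   # quality line
        { print }                                 # header and "+" lines
    '
}

# Example usage on a single record:
# printf '@r1\nAACGN\n+\nIIH#F\n' | fastq_revcomp
```

Because each mate is reverse-complemented individually, an rf-oriented pair comes out fr-oriented, while the order of the records in the file does not change.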

BBTools error correction (tadpole) and normalization (BBNorm)

./bbmap/tadpole.sh -Xmx400g in=./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.fastq out=./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.tadpole.fastq mode=correct k=50
./bbmap/bbnorm.sh -Xmx400g in=./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.tadpole.fastq out=./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.tadpole.BBnorm.fastq target=100 min=5
rm -rf ./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.tadpole.fastq

./bbmap/tadpole.sh -Xmx400g in=./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.fastq out=./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.tadpole.fastq mode=correct k=50
./bbmap/bbnorm.sh -Xmx400g in=./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.tadpole.fastq out=./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.tadpole.BBnorm.fastq target=100 min=5
rm -rf ./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.tadpole.fastq
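The SE and PE files go through the same three steps (correct, normalize, delete the intermediate file), differing only in file names. As a hedged sketch of that naming scheme, the hypothetical helper below prints the three commands for any input FASTQ instead of running them, so the derived file names can be checked first. The tool paths and parameters are copied from the commands above; the helper itself is an assumption and not part of the original report.

```shell
# Print (do not run) the tadpole -> bbnorm -> cleanup commands for one FASTQ.
# The function body runs in a subshell, so its variables stay local.
correct_and_normalize_cmds() (
    input=$1
    base=${input%.fastq}                        # strip the .fastq suffix
    corrected="${base}.tadpole.fastq"           # intermediate, removed at the end
    normalized="${base}.tadpole.BBnorm.fastq"   # final input for velveth
    printf '%s\n' \
        "./bbmap/tadpole.sh -Xmx400g in=${input} out=${corrected} mode=correct k=50" \
        "./bbmap/bbnorm.sh -Xmx400g in=${corrected} out=${normalized} target=100 min=5" \
        "rm -rf ${corrected}"
)

# Example usage: review the commands, then pipe them to a shell to run them.
# correct_and_normalize_cmds ./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.fastq
# correct_and_normalize_cmds ./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.fastq | sh
```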

Velvet/Oases

./velvet/velveth RywalIlluminaVelvet 23,84,10 -fastq -shortPaired ./clean_reads/RywalNahG_cleaned_Illumina_PE_rc.tadpole.BBnorm.fastq -short ./clean_reads/RywalNahG_cleaned_Illumina_SE_rc.tadpole.BBnorm.fastq -strand_specific
for((n=23; n<=83; n=n+10)); do ./velvet/velvetg RywalIlluminaVelvet_"$n" -ins_length 175 -read_trkg yes -min_contig_lgth 200 -cov_cutoff 1 -exp_cov auto; done
for((n=23; n<=83; n=n+10)); do ./oases/oases RywalIlluminaVelvet_"$n" -scaffolding yes; done
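The velvetg and oases loops visit k = 23, 33, ..., 83 and expect one velveth output directory per k, named RywalIlluminaVelvet_<k>. A small pre-flight check can list the k values and report any directory that is missing before the long-running steps start. This is a sketch under that naming assumption; the helper names `kmer_series` and `missing_velvet_dirs` are hypothetical.

```shell
# Print the k-mer values from start to end (inclusive) in steps of step.
kmer_series() (
    k=$1
    while [ "$k" -le "$2" ]; do
        printf '%s\n' "$k"
        k=$((k + $3))
    done
)

# Print every expected directory <prefix>_<k> that does not exist.
# Prints nothing when all directories are present.
missing_velvet_dirs() (
    prefix=$1
    for k in $(kmer_series "$2" "$3" "$4"); do
        [ -d "${prefix}_${k}" ] || printf '%s\n' "${prefix}_${k}"
    done
)

# Example usage for the assembly above:
# kmer_series 23 83 10
# missing_velvet_dirs RywalIlluminaVelvet 23 83 10
```

The same check applies to the SOLiD runs with `missing_velvet_dirs solid 24 44 10`.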

SOLID ASSEMBLY

Convert the reads to double encoding with denovo_preprocessor_solid_v2.2.1.pl

Due to palindrome issues with SOLiD data, the assemblies should be done with even k-mer values.

for((n=24; n<=44; n=n+10)); do velvetg_de solid_"$n" -read_trkg yes -min_contig_lgth 200 -amos_file yes; done
for((n=24; n<=44; n=n+10)); do oases solid_"$n" -scaffolding yes -amos_file yes; done