AfshinLab/BLR

 
 

Barcode-Linked Reads Analysis

This GitHub repository describes and distributes all scripts used in "Efficient whole genome haplotyping and high-throughput single molecule phasing with barcode-linked reads"; see figure 1 for an overview. The main repo is used for pre-processing of read data. It takes raw fastq files as input and outputs either (1) fastq files for metagenomic de novo analysis, (2) fastq files for Human Genome haplotyping or (3) bam files ready for custom variant calling and phasing analysis.

To continue from output (1) and obtain metagenomic assemblies, see BLR metagenomics. To continue from output (2) with Human Genome haplotyping or human reference-free assembly, see the wfa2tenx GitHub repository.

BLR Analysis is now also available at OMICtools.

Dependencies

Here follows a list with links to all bioinformatics software needed to use this part of the pipeline.

You will also need Picard Tools and a Bowtie2 reference genome (e.g. GRCh38), available at e.g. Illumina iGenomes. Lastly, some GNU software is also needed to utilize all aspects of the pipeline.

Setup

First, clone this GitHub repository from your terminal.

git clone https://github.com/FrickTobias/BLR.git

Then provide BLR_Analysis with the appropriate paths for Picard Tools and your Bowtie2 reference data (consult example folder for further details).

bash setpath.sh </path/to/picardtools/jarfile> </path/to/bowtie2_reference/fastafile>
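
Before running setpath.sh, it can help to confirm that both paths point at existing files. The sketch below is a hypothetical helper, not part of BLR; the function name `check_setpath_inputs` and the example paths are assumptions made for illustration.

```shell
# Hypothetical helper (not part of BLR): check both inputs before calling setpath.sh.
check_setpath_inputs() {
    local picard_jar="$1"
    local bowtie2_ref="$2"
    if [ ! -f "$picard_jar" ]; then
        echo "Picard Tools jar file not found: $picard_jar" >&2
        return 1
    fi
    if [ ! -f "$bowtie2_ref" ]; then
        echo "Bowtie2 reference fasta file not found: $bowtie2_ref" >&2
        return 1
    fi
    echo "ok"
}

# Example with placeholder paths:
# check_setpath_inputs /path/to/picard.jar /path/to/genome.fa \
#     && bash setpath.sh /path/to/picard.jar /path/to/genome.fa
```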

Usage

For all available options, see -h (--help) and for more details consult the step-by-step folder which describes all steps performed by BLR_automation. For examples and analysis file contents, see the example folder.

(1) de novo Metagenomics

First, use BLR_automation to trim the read sequences, extract the barcode sequences and cluster the barcodes, stopping the analysis after the second step with -e 2 (--end 2).

bash BLR_automation.sh -e 2 -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>

Following this, run athena_assembly.sh provided in the BLR_metagenomics GitHub repository.

(2) Human Haplotyping and Assembly

Start by running the complete pre-processing pipeline with the fastq generation option -f (--fastq).

bash BLR_automation.sh -f -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>

Continue by converting the filtered fastq files to Long Ranger/Supernova input format using wfa2tenx, then run the appropriate pipeline.

(3) Custom Phasing Analysis

Run the preprocessing pipeline using default settings.

bash BLR_automation.sh -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>

Use the .rmdup.x2.filt.bam files for further analysis.
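
The three invocations above differ only in the mode flags (-e 2, -f, or none). As a rough sketch, a small wrapper could print the matching command for a chosen workflow. The function `blr_command` and the workflow names are hypothetical and not part of BLR; the flags are copied from the commands above. It only prints the command, so arguments containing spaces are not handled.

```shell
# Hypothetical wrapper (not part of BLR): print the BLR_automation.sh command
# for one of the three workflows described above.
blr_command() {
    local workflow="$1" email="$2" processors="$3"
    local read1="$4" read2="$5" output="$6"
    local mode_flags
    case "$workflow" in
        metagenomics) mode_flags="-e 2" ;;  # (1) stop after the second step
        haplotyping)  mode_flags="-f" ;;    # (2) fastq generation option
        phasing)      mode_flags="" ;;      # (3) default settings
        *) echo "unknown workflow: $workflow" >&2; return 1 ;;
    esac
    # tr -s squeezes the double space left when mode_flags is empty.
    echo "bash BLR_automation.sh $mode_flags -r -m $email -p $processors $read1 $read2 $output" | tr -s ' '
}

# Example with placeholder values:
# blr_command haplotyping john.doe@myworkplace.com 8 read_1.fq read_2.fq results
```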

Overview

Figure 1: BLR data analysis overview. (a) Reads are trimmed for their first handle using cutadapt, followed by extraction of the barcode sequence to a separate fasta file. Reads are then trimmed for another handle sequence just before the insert sequence, and lastly any traces of reverse complements of handle sequences are stripped from their 3' end. (b) Barcodes are split into several files depending on their first three bases and clustered independently using CD-HIT-454. The results are then combined into a summary file, NNN.clstr. (c) Trimmed reads are assembled into an initial assembly with IDBA-UD, which is then used as the reference for mapping the original trimmed reads with BWA; this step also incorporates the clustered barcode sequences into the resulting bam file. The bam file is used to assemble the original read data, using the spatially divided (by mapping position) barcode information. The resulting assembly is then processed by ARCS and put into LINKS to yield the final scaffolds. (d) Trimmed reads are mapped with Bowtie2 and converted to a bam file. This bam file is tagged with barcode information by tag_bam.py. Picard Tools is used to remove PCR and optical duplicates and is then used again to mark duplicate positions where reads have different barcodes. The marked bam file is filtered for barcode duplicates using cluster_rmdup.py and subsequently filtered for clusters with large numbers of molecules by filter_clusters.py. The reads in this bam file are then written to fastq files and converted according to the input format specifications of Long Ranger and Supernova by wfa2tenx.py.
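
To illustrate step (b), the following is a minimal sketch of splitting a barcode fasta file by the first three bases of each sequence. It is not the pipeline's own implementation: the function name `split_barcodes_by_prefix` and the output naming (`<prefix>.fa`) are assumptions, and it expects each sequence on a single line after its header.

```shell
# Hypothetical sketch (not the BLR implementation): write each barcode record
# to a file named after the first three bases of its sequence.
# Assumes one-line sequences; records are appended, so use an empty output directory.
split_barcodes_by_prefix() {
    local input_fasta="$1"
    local out_dir="$2"
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        /^>/ { header = $0; next }           # remember the header line
        {
            out = dir "/" toupper(substr($0, 1, 3)) ".fa"
            print header >> out              # header, then its sequence
            print $0 >> out
            close(out)                       # avoid keeping many files open
        }
    ' "$input_fasta"
}

# Example with placeholder names:
# split_barcodes_by_prefix barcodes.fa split_barcodes
```

Each resulting file could then be clustered independently, as the caption describes.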
