This GitHub repository describes and distributes all scripts used in "Efficient whole genome haplotyping and high-throughput single molecule phasing with barcode-linked reads"; see Figure 1 for an overview. The main repo handles pre-processing of read data: it takes raw fastq files as input and outputs either (1) fastq files for metagenomic de novo analysis, (2) fastq files for human genome haplotyping, or (3) bam files ready for custom variant calling and phasing analysis.
To run subsequent analysis on output (1) and obtain metagenomic assemblies, see BLR metagenomics. To process output (2) for human genome haplotyping or human reference-free assembly, see the wfa2tenx GitHub repository.
BLR Analysis is now also available at OMICtools.
Here follows a list with links to all bioinformatics software needed to use this part of the pipeline.
It will also be required to have downloaded Picard Tools and a Bowtie2 reference genome (e.g. GRCh38), available at e.g. Illumina iGenomes. Lastly, to utilize all aspects of the pipeline, some GNU software is also needed.
First, download this GitHub repository by running the cloning command in your terminal.
git clone https://github.com/FrickTobias/BLR.git
Then provide BLR_Analysis with the appropriate paths to Picard Tools and your Bowtie2 reference data (consult the example folder for further details).
bash setpath.sh </path/to/picardtools/jarfile> </path/to/bowtie2_reference/fastafile>
For all available options, see -h (--help) and for more details consult the step-by-step folder which describes all steps performed by BLR_automation. For examples and analysis file contents, see the example folder.
First, trim read sequences and extract barcode sequences with BLR_automation, and cluster the barcode sequences (stop the analysis at the second step using -e 2, --end 2).
bash BLR_automation.sh -e 2 -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>
Following this, run athena_assembly.sh provided in the BLR_metagenomics GitHub repository.
Start by running the complete pre-processing pipeline with the fastq generation option -f (--fastq).
bash BLR_automation.sh -f -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>
Continue by converting filtered fastq files to Long Ranger/Supernova input format using wfa2tenx and run the appropriate pipeline.
Run the preprocessing pipeline using default settings.
bash BLR_automation.sh -r -m <john.doe@myworkplace.com> -p <processors> <read_1.fq> <read_2.fq> <output>
Use the .rmdup.x2.filt.bam files for further analysis.
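The barcode-aware duplicate filtering behind these files treats reads at the same mapping position as duplicates only when they also carry the same barcode cluster. A minimal sketch of that idea in Python (the tuple record format is hypothetical; the actual pipeline operates on tagged bam files via cluster_rmdup.py):

```python
def remove_barcode_duplicates(reads):
    """Keep one read per (chromosome, position, barcode_cluster) key.

    `reads` is a list of (chrom, pos, barcode_cluster, name) tuples.
    Reads sharing a mapping position but carrying different barcode
    clusters originate from different molecules and are all kept.
    """
    seen = set()
    kept = []
    for chrom, pos, barcode_cluster, name in reads:
        key = (chrom, pos, barcode_cluster)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept

reads = [
    ("chr1", 100, 7, "r1"),  # first read at this position/barcode
    ("chr1", 100, 7, "r2"),  # same position and barcode: duplicate
    ("chr1", 100, 9, "r3"),  # same position, different barcode: kept
]
kept = remove_barcode_duplicates(reads)
```

This is why position-only duplicate marking is insufficient for linked reads: distinct molecules from different droplets can legitimately map to identical coordinates.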
Figure 1: BLR data analysis overview. (a) Reads are trimmed for their first handle using cutadapt, followed by extraction of the barcode sequence to a separate fasta file. Reads are then trimmed for another handle sequence just before the insert sequence, and lastly stripped of any traces of reverse complements of handle sequences from their 3' end. (b) Barcodes are split into several files depending on their first three bases and clustered independently using CD-HIT-454. The results are then combined into a summary file, NNN.clstr. (c) Trimmed reads are assembled into an initial assembly with IDBA-UD, which is then used as a reference for mapping the original trimmed reads with BWA, which also incorporates the clustered barcode sequences into the resulting bam file. The bam file is used to assemble the original read data, using the spatially divided (by mapping position) barcode information. The resulting assembly is then processed by ARCS and passed to LINKS to yield the final scaffolds. (d) Trimmed reads are mapped with Bowtie2 and converted to bam files. This bam file is tagged with barcode information by tag_bam.py. Picard Tools is used to remove PCR and optical duplicates, and then used again to mark duplicate positions where reads have different barcodes. The marked bam file is filtered for barcode duplicates using cluster_rmdup.py and subsequently filtered for clusters with large numbers of molecules by filter_clusters.py. The reads of this bam file are converted to fastq files formatted according to the input specifications of Long Ranger and Supernova by wfa2tenx.py.
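The barcode binning in step (b) can be sketched as follows; a minimal illustration in Python, assuming a simple two-line-per-record barcode fasta (the record layout and function name are hypothetical; the actual pipeline runs CD-HIT-454 on each resulting bin):

```python
from collections import defaultdict

def bin_barcodes_by_prefix(fasta_lines, prefix_len=3):
    """Group barcode fasta records by the first `prefix_len` bases.

    Returns a dict mapping a prefix (e.g. 'ACG') to a list of
    (header, sequence) records, mirroring the per-NNN split in
    Figure 1b that makes each clustering job small and independent.
    """
    bins = defaultdict(list)
    header = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            header = line
        elif header is not None:
            bins[line[:prefix_len]].append((header, line))
            header = None
    return dict(bins)

# Each bin would then be written to its own file, clustered
# independently (CD-HIT-454 in the real pipeline), and the per-bin
# .clstr outputs concatenated into the NNN.clstr summary file.
records = [">bc1", "ACGTTGCA", ">bc2", "ACGAAGGT", ">bc3", "TTGACGTA"]
bins = bin_barcodes_by_prefix(records)
```

Splitting on a fixed prefix is safe here because two barcodes differing in their first bases will not cluster together anyway, so no clusters are lost by keeping the bins separate.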
