OPTIMIZING SPARSE ARRAY STORAGE FOR GENOMICS

TECHNOLOGY

01 / FAST

Using high-level APIs provided in C++, Java*, and Spark*, users can both write and read variant records to and from GenomicsDB shared-nothing instances in parallel using multiple processes in a Single Process Multiple Data (SPMD) manner.
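As a rough illustration of the SPMD idea, the sketch below runs the same function for every rank, with each rank touching only its own slice of genome positions. It does not use the GenomicsDB API: `RECORDS`, `NUM_COLUMNS`, `column_range`, and `run_rank` are hypothetical names chosen for this example, and the ranks are simulated one after another rather than launched as separate processes.

```python
# Hypothetical in-memory stand-in for variant records: (sample_row, genome_column).
RECORDS = [(0, 100), (1, 100), (0, 2500), (2, 2500), (1, 7200), (2, 9900)]

NUM_COLUMNS = 10_000  # assumed size of the genome-position (column) space


def column_range(rank: int, num_ranks: int) -> range:
    """Return the contiguous block of columns owned by this rank."""
    width = -(-NUM_COLUMNS // num_ranks)  # ceiling division
    start = rank * width
    return range(start, min(start + width, NUM_COLUMNS))


def run_rank(rank: int, num_ranks: int) -> list:
    """The single program every process would run; only `rank` differs."""
    owned = column_range(rank, num_ranks)
    # Each rank reads only the records in its own columns (shared-nothing).
    return [rec for rec in RECORDS if rec[1] in owned]


# Simulate 4 ranks sequentially. In a real SPMD deployment each rank would be
# a separate process, possibly on a separate machine.
results = {rank: run_rank(rank, 4) for rank in range(4)}
```

Because no rank reads another rank's columns, the ranks need no coordination while they work, which is what lets reads and writes proceed in parallel.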

02 / SCALABLE

GenomicsDB uses columnar sparse arrays in which samples are mapped to rows and genome positions (the sites of variants) are mapped to columns. These columns are partitioned in a shared-nothing fashion across thousands of machines, enabling the joint genotyping workflow in the Broad Institute’s Genome Analysis Toolkit (GATK) to scale to 100,000 samples and beyond.
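The layout described above can be pictured with a toy sparse array that stores only non-empty cells and keeps them sorted by column, so a query over a range of genome positions scans one contiguous run. This is a minimal sketch under stated assumptions, not GenomicsDB's storage format: the class `SparseVariantArray`, its methods, and the genotype strings are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class SparseVariantArray:
    """Toy sparse 2-D array: rows are samples, columns are genome positions."""

    def __init__(self):
        # Only non-empty cells are stored, kept sorted in column-major order.
        self._keys = []    # (column, row) pairs
        self._values = []  # payload for each cell, e.g. a genotype string

    def write(self, row, column, value):
        """Insert or overwrite the cell at (row, column)."""
        key = (column, row)
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def read_columns(self, start, end):
        """Return (row, column, value) for cells with start <= column <= end."""
        lo = bisect_left(self._keys, (start, -1))
        hi = bisect_right(self._keys, (end, float("inf")))
        return [(r, c, v)
                for (c, r), v in zip(self._keys[lo:hi], self._values[lo:hi])]


array = SparseVariantArray()
array.write(row=0, column=1200, value="0/1")
array.write(row=2, column=1200, value="1/1")
array.write(row=1, column=98000, value="0/1")
hits = array.read_columns(1000, 2000)  # cells for all samples in this region
```

Since cells are ordered by column, a contiguous slice of columns is also a contiguous slice of storage, which is one reason partitioning by column suits a shared-nothing design.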

03 / EFFICIENT

GenomicsDB allows bioinformaticians to achieve analysis results with high statistical confidence. Its low-level storage format enables faster and more efficient retrieval from disk than file-based storage. Additionally, by using libraries optimized for Intel® architecture to compress data on disk, GenomicsDB cumulatively achieves an orders-of-magnitude improvement in performance over existing tools.

 

CHARTER

OUR STORY

GenomicsDB was initially developed by Intel in collaboration with the Broad Institute of MIT & Harvard. It is an open-source library and set of tools focused on optimizing sparse array storage specifically for genomic data. It is currently hosted and developed by the open-source community, sponsored by Omics Data Automation.

OUR MEMBERS

Karthik Gururaj  - Primary Contributor 

Eric Banks - Broad Institute

Christopher Denny - UCLA

Kemal Sonmez - Oregon Health & Science University

Jaclyn Smith - Oxford University

Melvin Lathara - Omics Data Automation

Nalini Ganapati - Omics Data Automation

Aleks Shar - Omics Data Automation

If you would like to become a charter member, please contact us. Charter members help guide the future development of GenomicsDB.

OUR TECHNOLOGY

GenomicsDB is a C++ library built on top of an array-based storage system for importing, querying, and transforming variant data.

Current Development and Maintenance Supported by

[Image: sponsor logo]


© 2018 GenomicsDB.org 
