
OPTIMIZING SPARSE ARRAY STORAGE FOR GENOMICS

TECHNOLOGY

01 / FAST

Using high-level APIs provided in C++, Java, and Spark, users can write and read variant records to and from shared-nothing GenomicsDB instances in parallel, using multiple processes in a Single Program Multiple Data (SPMD) manner.
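To make the SPMD idea concrete, here is a minimal sketch in plain Python. It is not the GenomicsDB API: the function `spmd_worker`, the record layout, and the round-robin ownership rule are all assumptions made for illustration. Every worker runs the same function and uses its rank to decide which records it owns, writing them to a store that no other worker touches.

```python
# Hypothetical SPMD sketch. This is NOT the GenomicsDB API; all names here
# are illustrative assumptions.

def spmd_worker(rank, world_size, records):
    """Run the same program on every rank; keep only the records this rank owns."""
    # Round-robin ownership: record i belongs to rank (i % world_size).
    owned = [rec for i, rec in enumerate(records) if i % world_size == rank]

    # Shared-nothing: each rank writes to its own private store.
    store = {}
    for sample, position, genotype in owned:
        store[(sample, position)] = genotype
    return store


# Toy variant records: (sample name, genome position, genotype call).
records = [
    ("sample_A", 100, "0/1"),
    ("sample_B", 100, "1/1"),
    ("sample_A", 250, "0/0"),
    ("sample_C", 900, "0/1"),
]

# The ranks are simulated one after another here. In a real deployment,
# each rank would be a separate process, possibly on a separate machine.
world_size = 2
stores = [spmd_worker(rank, world_size, records) for rank in range(world_size)]
```

Because no rank reads or writes another rank's store, the workers need no coordination while loading, which is the property that lets this pattern run in parallel.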

02 / SCALABLE

GenomicsDB uses columnar sparse arrays in which samples are mapped to rows and genome positions (the sites of variants) are mapped to columns. These columns are partitioned in a shared-nothing fashion across thousands of machines, enabling the joint genotyping workflow in the Broad Institute’s Genome Analysis Toolkit (GATK) to scale to 100,000 samples and beyond.
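The layout described above can be sketched as a toy data structure. This is an illustrative model only, not GenomicsDB's storage format: the class `PartitionedSparseArray`, its methods, and the partition boundaries are hypothetical. Rows stand for samples, columns stand for genome positions, and each partition owns a contiguous range of columns and stores only the cells that actually hold data.

```python
from bisect import bisect_right


class PartitionedSparseArray:
    """Toy sparse 2D array: rows are samples, columns are genome positions.

    Hypothetical illustration, not the GenomicsDB storage format.
    """

    def __init__(self, column_starts):
        # column_starts[i] is the first column owned by partition i.
        self.column_starts = sorted(column_starts)
        # One private dict per partition; only non-empty cells are stored.
        self.partitions = [{} for _ in self.column_starts]

    def partition_of(self, column):
        """Return the index of the partition that owns this column."""
        idx = bisect_right(self.column_starts, column) - 1
        if idx < 0:
            raise ValueError(f"column {column} is before the first partition")
        return idx

    def write(self, row, column, value):
        """Store one cell in the partition that owns its column."""
        self.partitions[self.partition_of(column)][(row, column)] = value

    def read_column_range(self, start, end):
        """Return (column, row, value) cells with start <= column <= end."""
        first, last = self.partition_of(start), self.partition_of(end)
        cells = []
        # Only partitions overlapping the queried range are visited.
        for part in self.partitions[first:last + 1]:
            for (row, col), value in part.items():
                if start <= col <= end:
                    cells.append((col, row, value))
        return sorted(cells)  # column-major order


# Three partitions owning columns [0, 1000), [1000, 2000), and [2000, ...).
array = PartitionedSparseArray([0, 1000, 2000])
array.write(0, 150, "0/1")    # sample row 0, position 150
array.write(1, 150, "1/1")    # sample row 1, position 150
array.write(0, 1200, "0/0")   # sample row 0, position 1200
array.write(2, 2500, "0/1")   # sample row 2, position 2500

result = array.read_column_range(100, 1500)
```

In this sketch, a query over a range of positions only visits the partitions whose column ranges overlap it, so adding samples (rows) does not change which partitions a positional query has to consult.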

03 / EFFICIENT

GenomicsDB allows bioinformaticians to achieve analysis results with high statistical confidence. The low-level storage format enables faster and more efficient retrievals from disk compared to the use of files. Additionally, using libraries optimized for Intel® architecture to compress data on disk, GenomicsDB cumulatively achieves orders of magnitude improvement in performance compared to existing tools.


CHARTER

OUR STORY

GenomicsDB was initially developed by Intel in collaboration with the Broad Institute of MIT & Harvard. GenomicsDB is an open-source library and set of tools focused on optimizing sparse array storage specifically for genomic data. It is currently hosted and developed by the open-source community, sponsored by Omics Data Automation.

OUR MEMBERS

Karthik Gururaj - Primary Contributor

Eric Banks - Broad Institute

Christopher Denny - UCLA

Kemal Sonmez - Oregon Health & Science University

Jaclyn Smith - Oxford University

Melvin Lathara - Omics Data Automation

Nalini Ganapati - Omics Data Automation

Aleks Shar - Omics Data Automation

If you would like to become a charter member, please contact us. Charter members help guide the future development of GenomicsDB.

OUR TECHNOLOGY

GenomicsDB is a C++ library built on top of an array-based storage system for importing, querying, and transforming variant data.

Current Development and Maintenance Supported by


GET IN TOUCH

DONATE

