
OPTIMIZING SPARSE ARRAY STORAGE FOR GENOMICS

TECHNOLOGY

01 / FAST

Using high-level APIs provided in C++, Java*, and Spark*, users can both write and read variant records to and from GenomicsDB shared-nothing instances in parallel using multiple processes in a Single Process Multiple Data (SPMD) manner.
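The SPMD pattern described above can be illustrated with a minimal sketch. This is not the GenomicsDB API: the names `records_for_rank`, `run_rank`, `rank`, and `world_size` are hypothetical, the records are placeholder strings, and a plain loop stands in for launching separate processes. The point is only that every process runs the same program and differs solely in the share of data it handles.

```python
def records_for_rank(records, rank, world_size):
    """Return the share of records handled by process `rank` (round-robin split)."""
    return records[rank::world_size]


def run_rank(rank, world_size, records):
    """The single program every process would run; only `rank` differs."""
    mine = records_for_rank(records, rank, world_size)
    # A real worker would write `mine` to its own shared-nothing instance here.
    return {"rank": rank, "written": len(mine), "records": mine}


# Hypothetical input: ten placeholder variant records split across four ranks.
records = [f"variant_{i}" for i in range(10)]
world_size = 4

# This loop simulates the four processes; in practice each rank runs separately.
results = [run_rank(rank, world_size, records) for rank in range(world_size)]
```

Because no two ranks share a record, the workers need no coordination while writing, which is what makes the shared-nothing layout a natural fit for SPMD.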

02 / SCALABLE

GenomicsDB uses columnar sparse arrays in which samples are mapped to rows and genome positions (variant sites) are mapped to columns. These columns are partitioned in a shared-nothing fashion across thousands of machines, enabling the joint genotyping workflow in the Broad Institute’s Genome Analysis Toolkit (GATK) to scale to 100,000 samples and beyond.
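As a rough illustration of this layout, the sketch below models a sparse array whose rows are samples and whose columns are genome positions, with columns split into contiguous ranges that each belong to one partition. It is a toy in-memory model under stated assumptions, not GenomicsDB's storage format or API: the class `ColumnPartitionedSparseArray`, its methods, the partition boundaries, and the variant strings are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class ColumnPartitionedSparseArray:
    """Toy sparse 2-D array: rows are samples, columns are genome positions."""

    def __init__(self, column_bounds):
        # column_bounds[i] is the first column owned by partition i.
        # The list must be sorted and start at 0.
        self.column_bounds = list(column_bounds)
        # One independent store per partition: column -> {sample_row: value}.
        self.partitions = [defaultdict(dict) for _ in self.column_bounds]

    def partition_of(self, column):
        """Index of the partition whose column range contains `column`."""
        return bisect_right(self.column_bounds, column) - 1

    def put(self, sample_row, column, value):
        """Store a cell; only non-empty cells occupy memory."""
        self.partitions[self.partition_of(column)][column][sample_row] = value

    def get_column(self, column):
        """All samples with a value at `column`, read from a single partition."""
        part = self.partitions[self.partition_of(column)]
        return dict(part.get(column, {}))


# Hypothetical boundaries: three partitions of one million columns each.
array = ColumnPartitionedSparseArray([0, 1_000_000, 2_000_000])
array.put(0, 12_345, "A>G")
array.put(7, 12_345, "A>T")
array.put(3, 1_500_000, "C>T")
```

Reading one site across all samples touches exactly one partition, so a query over a column range only involves the machines that own those columns.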

03 / EFFICIENT

GenomicsDB allows bioinformaticians to achieve analysis results with high statistical confidence. The low-level storage format enables faster and more efficient retrievals from disk compared to the use of files. Additionally, using libraries optimized for Intel® architecture to compress data on disk, GenomicsDB cumulatively achieves orders of magnitude improvement in performance compared to existing tools.

CHARTER

OUR STORY

GenomicsDB was initially developed by Intel in collaboration with the Broad Institute of MIT & Harvard. It is an open-source library and set of tools focused on optimizing sparse array storage for genomic data. It is currently hosted and developed by the open-source community, sponsored by dātma Health Science.

OUR MEMBERS

Karthik Gururaj - Primary Contributor

Eric Banks - Broad Institute

Christopher Denny - UCLA

Kemal Sonmez - Oregon Health & Science University

Jaclyn Smith - Oxford University

Melvin Lathara - dātma

Nalini Ganapati - dātma

Aleks Shar - dātma

If you would like to become a charter member, please contact us. Charter members help guide the future development of GenomicsDB.

OUR TECHNOLOGY

GenomicsDB is a C++ library built on top of an array-based storage system for importing, querying, and transforming variant data.

Current Development and Maintenance Supported by dātma

GET IN TOUCH

DONATE
