Towards practical and robust DNA-based data archiving using the yin-yang codec system.

Authors: Ping Z; BGI-Shenzhen, Shenzhen, China.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China.; Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China., Chen S; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China.; China National GeneBank, BGI-Shenzhen, Shenzhen, China., Zhou G; Department of Genetics, Harvard Medical School, Boston, MA, USA.; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA., Huang X; BGI-Shenzhen, Shenzhen, China., Zhu SJ; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK., Zhang H; BGI-Shenzhen, Shenzhen, China.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China.; Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China., Lee HH; Department of Genetics, Harvard Medical School, Boston, MA, USA., Lan Z; School of Mathematical Science, Capital Normal University, Beijing, China., Cui J; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China.; China National GeneBank, BGI-Shenzhen, Shenzhen, China., Chen T; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China.; China National GeneBank, BGI-Shenzhen, Shenzhen, China., Zhang W; BGI-Shenzhen, Shenzhen, China.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China., Yang H; BGI-Shenzhen, Shenzhen, China.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China., Xu X; BGI-Shenzhen, Shenzhen, China. xuxun@genomics.cn.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China. xuxun@genomics.cn.; Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. xuxun@genomics.cn.; China National GeneBank, BGI-Shenzhen, Shenzhen, China. xuxun@genomics.cn., Church GM; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China. gchurch@genetics.med.harvard.edu.; Department of Genetics, Harvard Medical School, Boston, MA, USA. gchurch@genetics.med.harvard.edu.; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA. gchurch@genetics.med.harvard.edu., Shen Y; BGI-Shenzhen, Shenzhen, China. shenyue@genomics.cn.; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China. shenyue@genomics.cn.; George Church Institute of Regenesis, BGI-Shenzhen, Shenzhen, China. shenyue@genomics.cn.; Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. shenyue@genomics.cn.
Language: English
Source: Nature computational science [Nat Comput Sci] 2022 Apr; Vol. 2 (4), pp. 234-242. Date of Electronic Publication: 2022 Apr 25.
DOI: 10.1038/s43588-022-00231-2
Abstract: DNA is a promising data storage medium due to its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes have primarily pursued information density, at the expense of introducing biocompatibility challenges or decoding failure. Here we propose a robust transcoding algorithm named the yin-yang codec, using two rules to encode two binary bits into one nucleotide, to generate DNA sequences that are highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200 nt oligo pools and in vivo as a ~54 kbps DNA fragment in yeast cells. Sequencing results show that the yin-yang codec exhibits high robustness and reliability for a wide variety of data types, with an average recovery rate of 99.9% above 10⁴ molecule copies and an achieved recovery rate of 87.53% at ≤10² copies. Additionally, the in vivo storage demonstration achieved an experimentally measured physical density close to the theoretical maximum.
(© 2022. The Author(s).)
Database: MEDLINE
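
Illustrative note: the abstract states only that the yin-yang codec uses "two rules to encode two binary bits into one nucleotide"; it does not give the rules themselves. The Python sketch below shows one way such a two-rule scheme could work, under assumptions that are not taken from the record: rule one (`RULE_ONE`) splits the four nucleotides into two groups by the first bit, and rule two (`RULE_TWO`) assigns the second bit depending on the previously written nucleotide, so that each bit pair selects exactly one nucleotide. Both tables, the virtual start nucleotide `START`, and the function names are hypothetical and should not be read as the authors' published rules.

```python
NUCLEOTIDES = "ACGT"

# Hypothetical rule one: each nucleotide stands for one value of the first bit.
RULE_ONE = {"A": 0, "C": 1, "G": 0, "T": 1}

# Hypothetical rule two: the second bit a nucleotide stands for depends on the
# previously written nucleotide. Within each rule-one group ({A, G} and {C, T})
# the two nucleotides always carry different second bits, so a bit pair
# identifies exactly one nucleotide.
RULE_TWO = {
    "A": {"A": 0, "G": 1, "C": 0, "T": 1},
    "C": {"A": 1, "G": 0, "C": 0, "T": 1},
    "G": {"A": 0, "G": 1, "C": 1, "T": 0},
    "T": {"A": 1, "G": 0, "C": 1, "T": 0},
}

# Assumed virtual nucleotide that precedes the first position.
START = "A"


def encode(first_bits, second_bits):
    """Merge two equal-length bit lists into one nucleotide string."""
    if len(first_bits) != len(second_bits):
        raise ValueError("both bit segments must have the same length")
    previous = START
    sequence = []
    for b1, b2 in zip(first_bits, second_bits):
        matches = [
            n for n in NUCLEOTIDES
            if RULE_ONE[n] == b1 and RULE_TWO[previous][n] == b2
        ]
        if len(matches) != 1:
            raise ValueError("rule tables do not select a unique nucleotide")
        previous = matches[0]
        sequence.append(previous)
    return "".join(sequence)


def decode(sequence):
    """Recover the two bit lists from a nucleotide string."""
    previous = START
    first_bits, second_bits = [], []
    for n in sequence:
        first_bits.append(RULE_ONE[n])
        second_bits.append(RULE_TWO[previous][n])
        previous = n
    return first_bits, second_bits


first = [0, 1, 1, 0]
second = [1, 0, 1, 1]
dna = encode(first, second)
# Eight bits are carried by four nucleotides, i.e. two bits per nucleotide.
assert len(dna) == 4
assert decode(dna) == (first, second)
```

The sketch covers only the bit-to-base mapping and its inverse. It does not model synthesis or sequencing constraints, error handling, or the recovery rates reported in the abstract.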