Korean Genome Project Data May Be Useful for Cancer, Other Disease Studies

Researchers in South Korea, the US, and the UK have released an initial set of data from
the Korean Genome Project (Korea1K), including Korean-specific genome variation patterns,
which they said can be a useful resource for clinical and ethnogenetic studies.
May 28, 2020

The first phase of Korea1K includes 1,094 whole genomes, sequenced at an average depth of 31x and paired with data on 79 quantitative clinical traits, researchers at the Korean Genomics Center (KOGIC) reported in a study published on Wednesday in Science Advances. They identified 39 million single-nucleotide variants (SNVs) and indels, of which half were singletons or doubletons, meaning they are extremely rare.

Surprisingly, more than 70 percent of these singletons and doubletons had not been previously reported in dbSNP, and less than 20 percent of the variants identified were classified as very common. Among indels, the researchers observed more deletions than insertions, possibly as a result of skewed variant calling.

“Also, Korea1K, as a reference, showed better imputation accuracy for Koreans than the [1000 Genomes (1KGP)] panel,” the authors added. “As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals compared to non-Korean variome sets.”

Of the 1,094 Korean genomes in the dataset, 1,007 were newly generated, the researchers said, and they combined these data with systematically acquired clinical and biochemical measurements from the blood and urine of the participants. They characterized SNVs, indels, copy number variations, transposable element (TE) insertions, and human leukocyte antigen (HLA) types in the Korean population and contrasted the Korean data with similar data from other populations.