Basic Medicine, Bioinformatics

Biomedical Information Analysis

Analysis of Whole-genome, Multi-omics, and/or Metagenomic Data for Biomedical Science on High-Performance Computers

Medical Sciences Course

  • Master / Doctoral Degree



Professor, Ph.D.

*Concurrent Position

Research Theme

  • Large-scale analysis of human genome, multi-omics, and/or metagenomic data on high-performance computers
  • Analysis of whole-genome, multi-omics, and/or metagenomic data for association studies with diseases and phenotypes
  • Data processing, statistical methods, and software development for high-performance sequencers, SNP arrays, and other new high-throughput technologies

Research Keywords:

bioinformatics, data-driven science, high-performance computing, high-performance sequencing, human genome

Technical Keywords:

whole-genome sequencing, genome-wide association study, multi-dimensional omics analysis, metagenomics, machine learning

Laboratory Introduction

Our research aims to make distinct contributions to both the methodological and practical aspects of present-day “Big Data” science, particularly biomedical science. We analyze diverse and heterogeneous types of genomic and biomedical data to find new knowledge and insight. Analyses are conducted using supercomputing resources, informatics, and statistical approaches.
Over a decade ago, the representative human genome was sequenced through the efforts of many researchers worldwide at a cost of more than one billion dollars. Today, however, personal genomes can be sequenced more easily, faster, and at lower cost. This situation means that novel methodological advances are required for the integration and analysis of individual genomes, omics, and biomedical information (e.g., physiological and clinical information). With this in mind, faculty staff specializing in bioinformatics, statistical mathematics, population genetics, and molecular evolutionary biology are actively conducting research in our laboratory. Graduate students can conduct research in advanced computer science, massive data analysis, statistical modeling, and algorithm/software development alongside our laboratory members. We train students who can tackle the flood of biomedical information from a data-driven science point of view.
Our lab has constructed a whole-genome reference panel of 1,070 Japanese individuals from high-performance sequencer data on our supercomputer system and has partially opened it to the public. We have also developed a SNP array optimized for the Japanese population, called the Japonica array.

Figure 1. Large-scale data analysis of individual genomes towards personalized healthcare

Figure 2. Big data analysis with data-driven science

Recent Publications

  • Yamaguchi-Kabata Y et al. iJGVD: an integrative Japanese genome variation database based on whole-genome sequencing. Human Genome Variation. 2:15050, 2015
  • Nagasaki M et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun. 6:8018, 2015
  • Kawai Y et al. Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals. J Hum Genet. 60:581-587, 2015
  • Sato Y et al. Inter-individual differences in the oral bacteriome are greater than intra-day fluctuations in individuals. PLoS One. 10(6): e0131607, 2015
  • Kojima K et al. HapMonster: a statistically unified approach for variant calling and haplotyping based on phase-informative reads. Lecture Notes in Bioinformatics. 8542, 107-118, 2014