To effectively prevent, detect, and treat health conditions that affect people during their lifecourse, healthcare professionals and researchers need to know which sections of the population are susceptible to which health conditions and at which ages. Funded by GlaxoSmithKline, researchers from the UCL Institute of Health Informatics are leading a novel project to develop a reproducible pipeline for defining and redefining human disease at scale.
For 313 disease phenotypes, this portal contains computable phenotype definitions, disease population descriptive statistics, and phenotyping validation information. The manuscript that accompanies this portal can be found here:
Torralbo, A., Davitte, J.M., Croteau-Chonka, D.C. et al. A computational framework for defining and validating reproducible phenotyping algorithms of 313 diseases in the UK Biobank. Sci Rep 15, 24607 (2025). DOI: 10.1038/s41598-025-05838-
Data Sources
1. UK Biobank
UK (United Kingdom) Biobank is a large, population-based prospective study, established to allow detailed investigations of the genetic and nongenetic determinants of the diseases of middle and old age. It aims to combine extensive and precise assessment of exposures with comprehensive follow-up and characterisation of many different health-related outcomes, as well as to promote innovative science by maximising access to the resource. Recruitment of 500,000 participants and the collection of an unprecedented wealth of baseline data and samples were completed in 2010. Activity is now focused on further phenotyping of participants and their health outcomes and on providing access to researchers from around the world.
More information on the UK Biobank can be found here:
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, et al. (2015) UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Medicine 12(3): e1001779. https://doi.org/10.1371/journal.pmed.1001779
2. CALIBER
CALIBER is a research platform that provides reproducible phenotyping algorithms for electronic health records. The data we used cover ~15 million individuals in England and come from primary care (Clinical Practice Research Datalink), hospital admissions (Hospital Episode Statistics), socioeconomic deprivation information (the Index of Multiple Deprivation) and cause-specific mortality data (Office for National Statistics). CALIBER enables researchers to recreate the longitudinal pathway of patients through healthcare settings and study disease onset and progression.
More information on CALIBER can be found here:
Spiros Denaxas, Arturo Gonzalez-Izquierdo, Kenan Direk, Natalie K Fitzpatrick, Ghazaleh Fatemifar, Amitava Banerjee, Richard J B Dobson, Laurence J Howe, Valerie Kuan, R Tom Lumbers, Laura Pasea, Riyaz S Patel, Anoop D Shah, Aroon D Hingorani, Cathie Sudlow, Harry Hemingway, UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER, Journal of the American Medical Informatics Association, Volume 26, Issue 12, December 2019, Pages 1545–1559, https://doi.org/10.1093/jamia/ocz105