Machine learning to identify socio-behavioural predictors of HIV positivity in East and Southern Africa
Authors: | Erol Orel, Stéphane Marchand-Maillet, Aziza Merzouki, Rachel T Esra, Olivia Keiser, Janne Estill |
---|---|
Language: | English |
Publication year: | 2020 |
Subject: |
education.field_of_study; HIV Positivity; business.industry; Risk of infection; Population; 01 natural sciences; Confidence interval; 3. Good health; law.invention; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Condom; law; Medicine; Residence; 030212 general & internal medicine; 0101 mathematics; Predictive variables; education; business; Epidemic control; Demography |
DOI: | 10.1101/2020.01.27.20018242 |
Description: | Background: There is a need for high-yield HIV testing strategies to reach epidemic control. We aimed to predict the HIV status of individuals based on socio-behavioural characteristics. Methods: We analysed over 3,200 variables from the most recent Demographic Health Survey from 10 countries in East and Southern Africa. We trained four machine-learning algorithms and selected the best based on the F1 score. Training and validation were done on 80% of the data. The model was tested on the remaining 20% and on a held-out country, rotating through each country in turn. The best algorithm was retrained on the most predictive variables. We studied two scenarios: one aiming to identify 95% of people living with HIV (PLHIV) and one aiming to identify individuals with 95% or higher probability of being HIV positive. Findings: Overall, 55,151 males and 69,626 females were included. XGBoost performed best in predicting HIV, with a mean F1 of 76·8% [95% confidence interval 76·0%-77·6%] for males and 78·8% [78·2%-79·4%] for females. Among the ten most predictive variables, nine were identical for both sexes: longitude, latitude, and altitude of place of residence, current age, age of most recent partner, total lifetime number of sexual partners, years lived in current place of residence, condom use during last intercourse, and wealth index. Model performance based on these variables decreased minimally. For the first scenario, 7 males and 5 females would need to be tested to identify one HIV positive person. For the second scenario, 4·2% of males and 6·2% of females would have been identified as a high-risk population. Interpretation: We were able to identify PLHIV and those at high risk of infection who may be offered pre-exposure prophylaxis and/or voluntary medical male circumcision. These findings can inform the implementation of HIV prevention and testing strategies. Funding: Swiss National Science Foundation. |
Database: | OpenAIRE |
External link: |
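
The abstract reports two quantities that are easy to confuse: the F1 score used to select the model, and the number of people who would need to be tested to find one HIV positive person. The sketch below shows how both can be derived from confusion-matrix counts. It is an illustration only, not the authors' code: the function names and the counts in the usage lines are hypothetical, and it assumes "number needed to test" means the reciprocal of precision among the individuals flagged by the model.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall.

    Uses the equivalent closed form 2*TP / (2*TP + FP + FN).
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        raise ValueError("F1 is undefined when TP, FP and FN are all zero")
    return 2 * tp / denominator


def number_needed_to_test(tp: int, fp: int) -> float:
    """People flagged by the model per true positive found (1 / precision).

    Assumption: everyone flagged positive by the model gets tested.
    """
    if tp == 0:
        raise ValueError("undefined when no true positives are found")
    return (tp + fp) / tp


# Hypothetical counts chosen for illustration; they are NOT study data.
example_f1 = f1_score(tp=8, fp=2, fn=2)          # precision = recall = 0.8
example_nnt = number_needed_to_test(tp=100, fp=600)
print(f"F1 = {example_f1:.2f}, number needed to test = {example_nnt:.1f}")
```

With these made-up counts, 700 people are flagged to find 100 true positives, so 7 tests are needed per positive person found. A high-recall threshold, as in the first scenario, typically raises this number because more false positives are flagged.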