Popis: |
In 2010, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) started the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b), a prospective cohort study of a racially/ethnically/geographically diverse population of nulliparous women with singleton gestation. The nuMoM2b is a very large dataset, consisting of data for 10,038 patients with over 4,600 features per patient, spread out over 80 files. In this report, we share our experience preparing and working with this dataset. We present our data preprocessing of the nuMoM2b dataset to get a deeper understanding of the data, extract the most relevant features, make the fewest assumptions when filling in unknown values, and reducing the dimensionality of the data. We hope this report is useful to researchers interested in building machine learning and statistical models from the nuMoM2b dataset. |