VADSTI Data Science Training Program

Data Science Approaches to Better Understand Clinical and Genomic Informatics

This free online program presents the recorded live virtual lectures and discussions, as well as the assignments and supplementary resources made available for participants in Howard University's Virtual Applied Data Science Training Institute (VADSTI) Fall Training Series, which ran from September 1, 2022 through October 28, 2022. 

About this Program

Advances in technology and computational tools have made it possible to generate and store large datasets in many disciplines, including public health, clinical and biomedical research, genomics, business, and economics. Demand is growing for data analytics capabilities to examine trends, predict outcomes, and make better clinical, health policy, and economic decisions. Skills in data science are particularly critical for advancing the science of minority health and health disparities.

To address this need, the Howard University Research Centers in Minority Institutions (RCMI) Program, with funding from NIH, offered the VADSTI 2022 Fall Training Series. The program aims to enhance data science capability and application by providing training in the foundations of programming and the critical data analytic skills needed to plan and conduct research that involves big data and is pertinent to minority issues. This free program covers topics including Foundations of Data Science, Introduction to Python, Statistical Concepts, Data Exploration and Visualization, and Predictive Analytics.

Program Objectives & Competencies

The primary objective of this program is to provide training in the foundations of data science, advance analytic skills, and introduce tools for clinical and genomic research. Participants will:

  • Be introduced to the principles of data science
  • Be introduced to Python programming skills
  • Gain practical, hands-on experience with Python and related libraries for accessing data from multiple sources
  • Gain insight into the use of predictive analytics
  • Learn about the underlying concepts of probability and statistics for data analytics
  • Be introduced to advanced statistical analytic techniques utilized in biomedical, clinical, and genomic research
  • Understand the concepts of data partitioning and the practice behind supervised and unsupervised learning
  • Be introduced to advanced algorithmic techniques including machine learning and deep learning

Program Prerequisites

There are no prerequisites for research knowledge topics.

A basic undergraduate knowledge of algebra and probability is recommended for content knowledge topics.

Program Structure

The program has nine course modules. Each course module takes approximately six hours to complete. The first three course modules are foundational and intended for beginners. The remaining six course modules cover intermediate and advanced data science skills. The course modules are:

  • Foundations of Data Science
  • Introduction to Python
  • Probability, Statistical Inference, and Regression Models
  • Data Preparation, Exploration, and Visualization 
  • Classification Models
  • Data Clustering and Dimensionality Reduction
  • Seminal Presentations on Current Research Topics
  • Machine Learning and Predictive Models I
  • Machine Learning and Predictive Models II

All program files, including the datasets used in practice activities, are collected in the public repository on GitHub at

Certificate of Completion

Participants who complete all program modules on AIM-AHEAD Connect will receive a Certificate of Completion from AIM-AHEAD.


At the end of this program, we ask that participants complete the standard course evaluation for AIM-AHEAD Connect.



VADSTI's Fall Training Series was supported (in part) by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number 2U54MD007597. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Instrumental Persons

We would like to recognize and express our deep gratitude for the contributions of the following people who shared in the creation and implementation of the VADSTI 2022 Fall Training Series on AIM-AHEAD Connect:

John Kwagyan, PhD, Howard University
Legand Burge, PhD, Howard University, Data Science Training Core, AIM-AHEAD
Stacey McRae, BS, Howard University
Prem Saggar, PhD, University of Maryland, College Park
Moussa Doumbia, PhD, Howard University
Paul Kolm, PhD, MedStar Health Research Institute
Ebelechukwu Nwafor, PhD, Villanova University
Martin Skarzynski, PhD, Amazon Web Services
Rochelle Tractenberg, PhD, Georgetown University
Maria J. Molina, PhD, University of Maryland, College Park
Johanna Hardin, PhD, Pomona College
Zhe Fei, PhD, University of California, Riverside
Toufeeq Ahmed, PhD, Communications Hub, AIM-AHEAD
Aidan Hoyal, MSIS, MA, Communications Hub, AIM-AHEAD
Zainab Latif, MCS, Communications Hub, AIM-AHEAD