Open Data Science & AI/ML

Welcome to the AIM-AHEAD Open Data Science & AI/ML Course


About this Course

This is an asynchronous, self-directed course for beginner- and intermediate-level learners. The main presentations and Jupyter Notebooks in this course were adapted from IBM's OpenDS4All repository. OpenDS4All is an open-source data science curriculum that provides educators with access to the tools and materials needed to establish data science courses at institutions across the world.

The perspective of the materials largely comes from computer science, with an emphasis on data wrangling and engineering as well as machine learning and validation. Prior versions of the content have been used to teach students ranging from freshmen to PhD students, across a wide range of fields. The emphasis is largely on core concepts and algorithms with grounding in today's technologies and best practices. 

Course Structure
This course contains six modules (units): 

  1. Introduction & Data Science Ethics
  2. Building AI Trust
  3. Intermediate Data Science
  4. Unsupervised Machine Learning
  5. Supervised Machine Learning
  6. Data Science and Healthcare Use Cases 

Each unit contains several lessons in both video and interactive presentation formats, as well as auto-graded quizzes to check your understanding. We encourage course participants to engage in discussion around course content using the Discussion Board. Participants may work through the course units in any order. Those who complete all modules will receive a Certificate of Completion from AIM-AHEAD.

This is a beginner/intermediate-level course. There are no official prerequisites; however, students will benefit from:

  • Some familiarity with programming in Python (writing small functions, importing and calling library functions, using Python data structures)
  • Familiarity with probability theory and very basic statistical notions

For a comprehensive collection of data science teaching and learning resources, including resources for brushing up on essential arithmetic and algebraic skills, be sure to check out the Data Science Training Core Portal in the AIM-AHEAD Connect course catalog.  

At the end of this program, we ask that participants complete the AIM-AHEAD Connect course evaluation.


We would like to acknowledge and express our gratitude to the Northeast Big Data Innovation Hub, the National Student Data Corps (NSDC), and students from Columbia University for their valuable contributions to the creation of content made available in this course. Special thanks to Emily Rothenberg for her course design and development work as NSDC Program Coordinator.


  • Tomislav Galjanic, M.S. in Data Science, Columbia University
  • Yucen Wang, M.A. Statistics student, Columbia University
  • Renyin Zhang, M.A. Statistics student, Columbia University
  • Abhishek Sinha, M.S. in Data Science, Columbia University
  • Rahul Singh, Northeast Big Data Innovation Hub
  • Varalika Mahajan, M.S. in Data Science, Columbia University

Instrumental Persons

We would like to recognize and express our gratitude to the following people who shared in the creation and implementation of this course:

  • Omar Aljawfi, PhD, MedStar Health Research Institute
  • Florence Hudson, PhD, Columbia University
  • Sara Stienecker, MedStar Health Research Institute
  • Emily Rothenberg, Columbia University 
  • Toufeeq Ahmed, PhD, Communications Hub, AIM-AHEAD
  • Aidan Hoyal, MSIS, MA, Communications Hub, AIM-AHEAD
  • Zainab Latif, MCS, Communications Hub, AIM-AHEAD

OpenDS4All materials are licensed under Creative Commons:
Attribution-NonCommercial-ShareAlike 4.0 International