Instructors: Carly Bobak, PhD, and Christian Darabos, PhD
2-Week Course for Summer Scholars 2024
Dates Available:
Session 1: June 30 - July 12, 2024
Session 3: July 28 - August 9, 2024
OVERVIEW
"The world cannot be understood without numbers. But the world cannot be understood with numbers alone." - Hans Rosling, Factfulness (2018)
Data Science, a multidisciplinary field blending data inference, algorithm development, and technology, stands at the forefront of transforming raw data into meaningful insights and innovations. At its core, data science involves extracting knowledge and insights from structured and unstructured data using methods rooted in statistics, machine learning, and data analysis. This field is pivotal in today's information age, driving decision-making in industries ranging from healthcare to finance and influencing societal advancements at an unprecedented scale.
This two-week Data Science Bootcamp is designed to introduce high school students to this critical and burgeoning field. Our program emphasizes the dual importance of quantitative analysis and qualitative interpretation in understanding and leveraging data. Beginning with Python programming fundamentals, a cornerstone in the data science toolkit, the curriculum advances through vital concepts such as data structures, manipulation, and exploratory data analysis (EDA). A special focus on Natural Language Processing (NLP) underscores the interdisciplinary nature of data science, integrating computational methods with linguistic insights.
The bootcamp's structure intertwines theoretical understanding with practical application, ensuring that students learn the mechanics of data science and develop a critical, data-driven mindset. From engaging in hands-on projects to delving into real-world datasets, participants will acquire the skills to convert data into compelling stories and actionable intelligence. This program is not just a technical journey; it's a gateway into the expansive world of data science, where machine learning, artificial intelligence, and big data are pivotal tools in shaping our future.
In this immersive Data Science Bootcamp, students will embark on an enriching educational journey marked by a blend of individual challenges and collaborative exploration.
Real-World Application: Each day, students will engage in meticulously designed exercises that reflect real-world research scenarios. These activities are crafted to reinforce core data science concepts and demonstrate their practical application in various fields.
Core Concept Mastery: From understanding the nuances of Python programming to grasping complex data structures, students will systematically master the fundamental pillars of data science. This structured approach ensures a comprehensive grasp of essential topics.
Guided Group Project: Participants will have the unique opportunity to contribute to a group data science project. This project serves as a capstone experience, allowing students to apply their learning in a collaborative, realistic setting.
Expert Instruction: Leading the sessions are doctoral data scientists who bring a wealth of knowledge and real-world experience. Their hands-on instruction is not just about imparting technical know-how; it's about mentoring future data scientists in the art and science of extracting meaning from data.
Collaborative Learning Environment: The bootcamp fosters a supportive and interactive learning environment. Students will learn not only from experts but also from each other through group discussions, project collaborations, and peer-to-peer interactions.
This blend of theoretical learning, practical exercises, and expert guidance is designed to provide students with a holistic understanding of data science and its impactful applications in the modern world.
LEARNING OUTCOMES
Upon completing this course, students will have worked toward the following outcomes:
- Proficiency in Python for Data Science: Students will gain hands-on experience in Python, focusing on its application in data science. This includes understanding data structures, libraries like Pandas and NumPy, and utilizing Python for data manipulation and analysis.
- Fundamentals of Data Analysis and Visualization: Participants will learn to perform exploratory data analysis (EDA), interpret data through statistical methods, and create meaningful visualizations using tools like Matplotlib and Seaborn. This outcome ensures students can derive insights from data and effectively communicate them.
- Introduction to Natural Language Processing (NLP): Students will be introduced to the basics of NLP, learning to process and analyze text data. Skills acquired will include text manipulation, sentiment analysis, and creating visual representations like word clouds.
- Execution of Data Science: Throughout the course, students will complete exercises on a dataset that align with each day's topic. These exercises serve as a daily treasure hunt, revealing more about the dataset as each day passes. Taken together, the exercises form a project, and students will have the opportunity to reflect on their discoveries at the end of the course.
- Critical Thinking and Problem-Solving in Data Science: Participants will develop critical thinking skills specific to data science, learning to approach problems analytically, question assumptions, and interpret results within context. This outcome is essential for applying data science skills in real-world scenarios.
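To give a flavor of the text-processing skills named in the NLP outcome above, here is a minimal sketch of word counting, the kind of tally that sits behind a word cloud. It is an illustration only, not course material: the function name `word_frequencies`, the tiny stop-word list, and the choice of sample text are all assumptions, and it uses only the Python standard library rather than the libraries listed above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "be", "but", "with", "without", "cannot"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


# Sample text: the Hans Rosling quotation from the overview.
sample = (
    "The world cannot be understood without numbers. "
    "But the world cannot be understood with numbers alone."
)

freqs = word_frequencies(sample)

# Print each remaining word with its count, most frequent first.
for word, count in freqs.most_common():
    print(word, count)
```

A word cloud would typically draw each word at a size proportional to these counts, so the stop-word list largely determines which words stand out.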
PREREQUISITES
- High school-level STEM knowledge, especially mathematics
- Proficient English communication skills, both written and verbal
- A large-screen, Wi-Fi-capable device with a full keyboard, such as:
  - a laptop computer (Windows PC, Mac, or Linux)
  - a large-screen tablet with an external keyboard and mouse/trackpad
BIOGRAPHIES
Carly Bobak, PhD, serves as a Biomedical Informatics Scientist within the Research Computing and Data Services team at Dartmouth College's Information, Technology, and Consulting department. With her PhD in Quantitative Biomedical Sciences from Dartmouth, Dr. Bobak plays a pivotal role in bridging data science with biomedical research. Her responsibilities include collaborating with faculty across various disciplines, addressing their data science needs, and leading workshops in areas like basic programming, data visualization, and generalized AI.
In addition to her role in research computing, Dr. Bobak contributes to academia as an instructor for Dartmouth's QBS program, teaching courses such as Data Wrangling and, previously, Foundations of Data Science. Her active research, which complements her expertise and passion for teaching, focuses on developing diagnostic biomarkers for tuberculosis through multi-cohort, multi-omics approaches. This blend of teaching, research, and practical application positions Dr. Bobak as a key figure in advancing data science within Dartmouth's academic and research communities.
Christian Darabos, PhD, is the Senior Director for Research Computing and Data Services at Dartmouth College. He holds a double PhD in Business Information Systems and Molecular Biotechnologies from Switzerland and Italy. Christian leads a team of more than 20 IT professionals, software engineers, domain experts (STEM, GenAI/AI/ML, Data Science, GIS), and facilitators who provide comprehensive support and services covering the computational needs of the campus research community. This includes high-performance and cloud computing, data storage, grant support, software licensing, security and privacy compliance for sensitive data computing and storage, and custom software development.
Christian also serves as a faculty lecturer in the Quantitative Biomedical Sciences program at the Geisel School of Medicine, where he teaches a graduate-level course and co-leads a computational seminar series on Reproducible Research. His current research interests include data science, biomedical informatics, machine learning, generative AI, reproducible research best practices, and automation.