PHYS 7332 - Network Science Data II - Fall 2024

Wednesdays & Thursdays: 11:00am – 12:45pm
September 5 – December 11, 2024
177 Huntington, room 207

Summary

This course offers an introduction to network analysis and is designed to give students an overview of the core data science skills required to analyze complex networks. Through hands-on lectures, labs, and projects, students will learn actionable network analysis techniques using Python (in particular, the networkx library). The course covers network data collection, data input/output, network statistics, dynamics, and visualization. Students will also learn about random graph models and algorithms for computing network properties like path lengths, clustering, degree distributions, and community structure. In addition, students will develop web scraping skills and will be introduced to the vast landscape of software tools for analyzing complex networks. The course ends with a large-scale final project that demonstrates the students' proficiency in network analysis.
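To make the random graph models and degree distributions mentioned above a bit more concrete, here is a minimal sketch in plain Python (no networkx, so it runs without extra dependencies) that samples an Erdős–Rényi G(n, p) graph and tallies its degree distribution. The helper names `erdos_renyi` and `degree_distribution` are hypothetical, written for illustration rather than taken from the course book; in practice, networkx provides `nx.erdos_renyi_graph(n, p)` for the same model.

```python
import random
from collections import Counter


def erdos_renyi(n, p, seed=None):
    """Return an adjacency dict (node -> set of neighbors) for a G(n, p) graph.

    Hypothetical helper: each of the n*(n-1)/2 possible undirected edges
    is included independently with probability p.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_distribution(adj):
    """Return {degree k: fraction of nodes with degree k}, sorted by k."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    n, p = 1000, 0.01
    G = erdos_renyi(n, p, seed=42)
    pk = degree_distribution(G)
    mean_degree = sum(k * frac for k, frac in pk.items())
    # For G(n, p), the expected mean degree is p * (n - 1), i.e. 9.99 here.
    print(f"mean degree: {mean_degree:.2f} (expected {p * (n - 1):.2f})")
```

Storing the graph as a dict of neighbor sets keeps neighbor lookups cheap and is similar in spirit to networkx's own dict-of-dicts adjacency structure.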

Course book: https://brennanklein.com/network-science-data-textbook

Course Learning Outcomes

  1. Proficiency in Python and networkx for network analysis.

  2. Strong foundation in complex network algorithms and their applications.

  3. Skills in statistical description of networks.

  4. Experience in collecting and analyzing online data.

  5. Broad knowledge of various network libraries and tools.

Course Materials

Our class is based on an evolving Jupyter Book, which can be found at: https://asmithh.github.io/network-science-data-book/. This book is the result of years of effort by Matteo Chinazzi and Qian Zhang, who taught previous versions of this course to Network Science PhD students. In 2024, we reimagined the course while keeping a substantial amount of their original material.

Besides that, there are no required materials for this course, but we will periodically draw from:

Additionally, we recommend engagement with other useful network science and/or Python materials:

Coursework, Class Structure, Grading

This is a twice-weekly, hands-on class that emphasizes building experience with coding. This does not mean every minute of every class will be live-coding, but live-coding will inevitably be part of how the class is taught. We are always looking for ways to improve our pedagogical approach to this material, and we welcome feedback on the class structure. We will try to include a 5-minute break within each class for stretching, Q&A, grabbing water, etc. The course will be co-taught, featuring lectures from the core instructors as well as outside experts. Grading in this course will be as follows:

  • Class Attendance & Participation: 10%

  • Problem Sets: 45%

  • Mid-Semester Project Presentation: 15%

  • Final Project — Presentation & Report: 30%

Final Project

The final project for this course is a chance for students to synthesize their knowledge of network analysis into pedagogical materials around a topic of their choosing. Each project is modeled after the chapters in the Jupyter Book for this course: students will create a new “chapter” for our class’s textbook, which requires a thoroughly documented, informative Python notebook that explains an advanced topic that was not deeply explored in the course. For these projects, students are required to conduct their own research into the background of the technique, the original paper(s) introducing the topic, and whether and how it is used in the current network analysis literature. Students will demonstrate mastery of the technique by applying it to informative data that illustrates the usefulness of the topic they’ve chosen. Every chapter should contain informative data visualizations that build on one another, section by section. The purpose of this assignment is to demonstrate the coding skills gained in this course by learning a new network analysis technique and sharing it with members of the class. Over time, these lessons may find their way into the curriculum for future iterations of this class. Halfway through the semester, there will be project update presentations where students receive class and instructor feedback on their project topics. Throughout the semester, we will be available to brainstorm project topics with students.

Ideas for Final Project Chapters (non-exhaustive):

  1. Graph Embedding (or other ML technique)

  2. Network Reconstruction

  3. Link Prediction

  4. Graph Distances

  5. Motifs in Networks

  6. Network Null Models (advanced)

  7. Network Sparsification

  8. Spectral Properties of Networks (advanced)

  9. Mechanistic vs Statistical Network Models

What You’ll Learn

Students should leave this class with an ever-growing codebase of resources for analyzing and deriving insights from complex networks using Python. These skills range from coding graph algorithms from scratch, including path length calculations, network sampling, dynamical processes, and network null models, to interfacing with standard data science tasks around storing, querying, and analyzing large, complex datasets.
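As an illustration of coding a graph algorithm from scratch, the sketch below computes shortest path lengths with breadth-first search (BFS) and averages them over all ordered pairs of distinct nodes, following the same definition networkx uses in `nx.average_shortest_path_length` for connected, unweighted graphs. The function names are hypothetical, and the sketch assumes an unweighted, connected graph stored as an adjacency dict.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # the first visit to v is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj):
    """Mean distance over all ordered pairs of distinct nodes.

    Assumes a connected, unweighted graph; raises ValueError otherwise.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


if __name__ == "__main__":
    # A 5-node path graph 0-1-2-3-4 as an adjacency dict.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(bfs_distances(path, 0))              # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
    print(average_shortest_path_length(path))  # 2.0
```

BFS takes O(n + m) time per source, so this all-pairs average costs O(n(n + m)) overall; that scaling is one reason algorithmic complexity and sampling come up later in the course.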

Instructors

Brennan Klein is core faculty at the Network Science Institute at Northeastern University. He is the director of the Complexity & Society Lab. His research spans two broad topics: 1) Information, emergence, and inference in complex systems — developing tools and theory for characterizing dynamics, structure, and scale in networks, and 2) Public health and public safety — creating and analyzing large-scale datasets that reveal inequalities in the United States, from epidemics to mass incarceration. Professor Klein received a PhD in Network Science in 2020 from Northeastern University and got his BA in Cognitive Science & Psychology from Swarthmore College in 2014.

Alyssa Smith is a fourth-year PhD student in Network Science at Northeastern University. Her current work focuses on the ways that structure and agency interact in social networks to encourage mobilization. She is interested in making big data and computational tools usable by academics without specialized technical training. She uses mixed methods, ranging from terabyte-scale datasets to autoethnography, to make sense of the world. Her dissertation work revolves around structure (the place one occupies in a social network) and agency (an individual’s characteristics and proclivities), which are thought to be the two main driving forces behind engagement in social movements. We can think of structure and agency as two separate, competing factors, or we can think of them as a duality: in much the same way that light is both a particle and a wave, the interplay of structure and agency is what governs mobilization. Before joining the Network Science Institute, Alyssa received a BS in Humanities and Engineering with Comparative Media Studies and Computer Science from MIT in 2017; after that, she worked in tech for 4 years. Website: https://asmithh.github.io/.

Syllabus below.


This schedule is subject to change.

Class | Date | Topic | Instructor
0 | Wed, Sep. 4, '24 | Introduction to the Course, GitHub, Computing Setup | Brennan Klein
1 | Thu, Sep. 5, '24 | Python Refresher (Data Structures, NumPy, etc.) | Brennan Klein
2 | Wed, Sep. 11, '24 | Networkx 1: Loading Data, Basic Statistics | Brennan Klein
3 | Thu, Sep. 12, '24 | Networkx 2: Graph Algorithms | Brennan Klein
4 | Wed, Sep. 18, '24 | Distributions of Network Properties & Centralities | Brennan Klein
5 | Thu, Sep. 19, '24 | Scraping Web Data 1: BeautifulSoup, HTML, Pandas | Both
6 | Wed, Sep. 25, '24 | Scraping Web Data 2: Creating a Network from Scraped Data | Alyssa Smith
7 | Thu, Sep. 26, '24 | Big Data 1: Algorithmic Complexity & Computing Paths | Alyssa Smith
8 | Wed, Oct. 2, '24 | Data Science 1: Pandas, SQL, Regressions | Alyssa Smith
9 | Thu, Oct. 3, '24 | Data Science 2: Querying SQL Tables for Network Construction | Alyssa Smith
10 | Wed, Oct. 9, '24 | Clustering & Community Detection 1: Traditional | Brennan Klein
11 | Thu, Oct. 10, '24 | Clustering & Community Detection 2: Contemporary | Brennan Klein
12 | Wed, Oct. 16, '24 | Visualization 1: Python | Brennan Klein
13 | Thu, Oct. 17, '24 | Project Update Presentations | Both
14 | Wed, Oct. 23, '24 | Introduction to Machine Learning 1: General | Alyssa Smith
15 | Thu, Oct. 24, '24 | Introduction to Machine Learning 2: Networks | Alyssa Smith
16 | Wed, Oct. 30, '24 | Visualization 2: Guest Lecture (Pedro Cruz, Northeastern University) | Both
17 | Thu, Oct. 31, '24 | Dynamics on Networks 1: Diffusion and Random Walks | Brennan Klein
18 | Wed, Nov. 6, '24 | Dynamics on Networks 2: Compartmental Models | Brennan Klein
19 | Thu, Nov. 7, '24 | Dynamics on Networks 3: Agent-Based Models | Brennan Klein
20 | Wed, Nov. 13, '24 | Big Data 2: Scalability | Alyssa Smith
21 | Thu, Nov. 14, '24 | Network Sampling (Theory) | Brennan Klein
22 | Wed, Nov. 20, '24 | Network Sampling (Practice) | Alyssa Smith
23 | Thu, Nov. 21, '24 | Dynamics of Networks: Temporal Networks | Brennan Klein
- | Wed, Nov. 27, '24 | Office Hours | Both
- | Thu, Nov. 28, '24 | Thanksgiving, no class | -
24 | Wed, Dec. 4, '24 | Spatial Data, OSMnx, GeoPandas | Brennan Klein
25 | Thu, Dec. 5, '24 | Office Hours | Both
26 | Wed, Dec. 11, '24 | Final Presentations | Both