CNET 5442 - Sports Analytics through Data and Networks - Spring 2026

Mondays & Wednesdays: 2:50 – 4:30pm
January 7 – April 24, 2026
Classroom TBD

Summary

Sports provide a powerful lens for learning quantitative thinking. Every game generates a stream of measurable events—goals, passes, shots, substitutions—that can be modeled as time series, networks, probability distributions, or decision processes. In this course, sports analytics is more than an industry tool: it is a Trojan horse for teaching statistics, modeling, scientific computing, and complex systems. Students will work hands-on with real datasets and Python notebooks to practice regression, hypothesis testing, Bayesian inference, classification, and causal inference. These methods expand into the tools of complex systems, from passing networks and player embeddings to tournament forecasting and the diffusion of tactics. The semester culminates in a group project where students apply course methods to a dataset of their choice (preferably, a novel dataset collected for this course). Students will leave the course not only seeing how analytics reshapes modern sports but also having gained a transferable set of skills for analyzing complex systems in any domain.

Course Learning Outcomes

By the end of this course, students will be able to:

  1. Apply core statistical and data science techniques—including regression, hypothesis testing, Bayesian inference, classification, and causal inference—to real sports datasets.

  2. Represent sports and complex systems data in multiple forms (e.g., tables, networks, sequences, tensors) and evaluate the assumptions and limitations of each representation.

  3. Construct and analyze network models of play, including passing networks, ranking systems, and player embeddings, using appropriate computational tools.

  4. Develop and implement forecasting and simulation models to predict matches, tournaments, and tactical trends, and assess their accuracy.

  5. Collect, clean, and process raw sports data; generate reproducible analyses in Python; and communicate findings through effective visualizations, written reports, and oral presentations.

  6. Critically evaluate how analytics influences sports and broader complex systems, and formulate evidence-based arguments that test intuition against data.

________________________________

Coursework, Class Structure, Grading

This course meets twice weekly and combines short lectures with interactive coding and discussion. Lectures will introduce the core ideas, followed by live demonstrations and guided exercises in Python. Students should bring laptops to every class, as many sessions will include in-class coding, data exploration, or short activities. We will occasionally host guest speakers from academia and industry to provide perspective on applied sports analytics. A short mid-class break will be built in to allow for questions and informal interaction. Class time will be complemented by structured weekly assignments, which focus on applying methods to real datasets, and a semester-long group project that allows students to design and carry out their own analysis. The project emphasizes the full research cycle: posing a question, collecting data, building models, evaluating results, and communicating findings clearly.

Students can expect timely communication about assignments via Canvas, feedback on major assessments within two weeks, and responses to emails within 48 hours on weekdays. In turn, I expect students to come prepared to participate in coding and discussion, to collaborate respectfully, and to communicate promptly if challenges arise that may affect their coursework.

Grading will be based on the following:

  • Attendance and Participation 10%

    • Active participation in discussions, coding labs, and peer feedback sessions.

  • Weekly Assignments 45%

    • Seven coding and analysis assignments that build and evaluate technical skills.

  • Midterm Project Proposal and Presentation 10%

    • Groups present their research plan and preliminary analysis to receive structured feedback.

  • Final Project Report and Presentation 35%

    • A group project involving data collection/analysis, culminating in a report and presentation.
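As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The component names and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale; it is not the official grade calculation.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "participation": 0.10,
    "weekly_assignments": 0.45,
    "midterm_proposal": 0.10,
    "final_project": 0.35,
}


def course_grade(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    # Sanity check: the four weights should account for the whole grade.
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "participation": 95,
    "weekly_assignments": 88,
    "midterm_proposal": 90,
    "final_project": 92,
}
print(round(course_grade(example_scores), 1))
```

With these made-up scores, the weekly assignments contribute the most to the total (0.45 × 88), followed by the final project (0.35 × 92).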

Final Project Details

The semester culminates in a group project that applies course methods to a real sports or complex systems dataset. Projects may involve analyzing an existing dataset, scraping or collecting new data, or extending techniques introduced in class. Projects will be evaluated on originality, rigor, use of course methods, clarity of communication, and reproducibility of analysis. Deliverables include:

  • Proposal & Intermediate Presentation (Week 7/8): A short written description of your question, dataset, and plan, plus a 5-7 minute presentation of your project design for feedback.

  • Final Report and Presentation (due Finals Week): An 8-20 page write-up that presents your methods, results, and interpretation, and a group presentation delivered during the final week.

________________________________

Course Materials

There is no single textbook that covers the scope of this course. Instead, students will work with a combination of open-source texts, research articles, and software tools. All required readings will be made available through the course website.

Resources:

  • Network Science, by Albert-László Barabási (free online textbook).

  • Research papers and case studies in sports analytics (provided as PDFs).

Software and Data

  • Python (e.g., numpy, pandas, matplotlib, networkx, statsmodels, scikit-learn, and statsbombpy) and Jupyter notebooks, distributed through the course GitHub.

  • Open datasets, including StatsBomb open event data, NBA play-by-play data, FIFA World Cup tracking datasets, ATP tennis datasets, and more.

  • Students are encouraged to collect or scrape additional data for their final projects.
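Before the first coding session, it may help to confirm that the packages listed above are installed. The sketch below is one possible check, not an official setup script; the helper name `missing_packages` is hypothetical, and the list uses import names, which can differ from install names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages named in this section.
COURSE_PACKAGES = [
    "numpy",
    "pandas",
    "matplotlib",
    "networkx",
    "statsmodels",
    "sklearn",       # installed as "scikit-learn"
    "statsbombpy",
]


def missing_packages(names=COURSE_PACKAGES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages were found.")
```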

________________________________

Instructor

Brennan Klein is core faculty at the Network Science Institute and Assistant Teaching Professor in the Department of Physics. He is the program director of the MS in Complex Network Analysis at Northeastern University. Prof. Klein is also the director of the Complexity & Society Lab, which is focused on two broad research areas: 1) Information, emergence, and inference in complex systems: developing tools and theory for characterizing dynamics, structure, and scale in networks, and 2) Public health and public safety: drawing on complex systems science to document—and fight against—emergent or systemic disparities in society, especially as they relate to public health and public safety. In 2023, Prof. Klein was awarded the René Thom Young Researcher Award, given to a researcher to recognize substantial early career contributions and leadership in research in Complex Systems-related fields. Prof. Klein is the Data for Justice Fellow at the Institute on Policing, Incarceration & Public Safety at Harvard University’s Hutchins Center for African & African American Research. He received a PhD in Network Science in 2020 from Northeastern University and earned his BA in Cognitive Science & Psychology from Swarthmore College in 2014. Website: http://brennanklein.com.

Office Hours

Prof. Klein will hold regular office hours twice per week to support coding assignments, data collection, and group projects. Students are encouraged to attend office hours for technical support, debugging help, and early feedback on project ideas. For Spring 2026, the current plan is to hold office hours:

  • Mondays, 1:30-2:30pm, Network Science Institute (177 Huntington Ave, 10th floor) or Zoom link posted on Canvas.

________________________________

Accessibility and Accommodations

Northeastern is committed to providing equal educational opportunities for all students. Students who require accommodations for a documented disability should contact the Disability Resource Center as early as possible to ensure that appropriate arrangements can be made. Once you have documentation, please share your accommodation letter with me so we can discuss how best to support your learning.

Late Work Policy

Assignments are due on the dates listed in the schedule. Each student has a 48-hour grace period across the semester that can be applied to any assignment without penalty. After this, late work will be marked down 10% per day, up to three days. Extensions for serious circumstances will be considered.
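The arithmetic of this policy can be sketched as follows. This is an illustration under stated assumptions, not the official rule: it assumes grace hours are used before any penalty applies, that partial late days round up to a full day, and that the 10% is taken from the full score. The policy does not say what happens beyond three late days, so the sketch returns `None` in that case. The function name `late_multiplier` is hypothetical.

```python
import math

GRACE_HOURS = 48         # grace period per student, from the policy
PENALTY_PER_DAY = 0.10   # 10% per late day
MAX_LATE_DAYS = 3        # penalty schedule is stated for up to three days


def late_multiplier(hours_late, grace_hours_left=GRACE_HOURS):
    """Return (score multiplier, grace hours left afterward).

    Assumptions (not stated in the policy): grace is consumed first,
    partial late days round up, and submissions more than three
    penalized days late yield a multiplier of None.
    """
    used = min(hours_late, grace_hours_left)
    remaining_grace = grace_hours_left - used
    penalized_hours = hours_late - used
    late_days = math.ceil(penalized_hours / 24)
    if late_days > MAX_LATE_DAYS:
        return None, remaining_grace
    return 1 - PENALTY_PER_DAY * late_days, remaining_grace
```

Under these assumptions, a submission 30 hours late by a student with the full grace period keeps its full score and leaves 18 grace hours, while a submission 60 hours late uses all 48 grace hours and is penalized for one late day.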

Academic Integrity

All students are expected to uphold Northeastern University’s Academic Integrity Policy, which prohibits cheating, plagiarism, fabrication, unauthorized collaboration, and other forms of academic dishonesty. You are responsible for ensuring that your work reflects your own effort and analysis, even when you consult outside resources such as peers, published materials, or AI tools. Proper citation is required whenever you use code, data, text, or ideas that are not your own. Questions about what counts as appropriate collaboration or citation should be raised with me directly. Suspected violations will be referred to the Office of Student Conduct and Conflict Resolution. More information can be found here: https://osccr.sites.northeastern.edu/academic-integrity-policy/.

All student records and coursework in this class are handled in compliance with the Family Educational Rights and Privacy Act. Please use your Northeastern email account for all course communications.

________________________________

Policy on Artificial Intelligence and Large Language Models

This course recognizes the potential of artificial intelligence (AI) tools—such as ChatGPT, Copilot, Claude, and other text or code generators—to support learning, creativity, and efficiency. You are encouraged to use AI when it adds value to your learning process, provided that its use is transparent, relevant, and critically evaluated. AI can help brainstorm ideas, debug code, generate visualizations, or give writing feedback, but it is not a substitute for your own analysis or reasoning.

Guidelines for Use

  • AI use will vary depending on the assignment. Labels will be provided to indicate whether AI use is prohibited, permitted, encouraged, or required, depending on the learning objectives.

  • For assignments where AI use is allowed: cite the tool, include information about the prompt or queries you used, and briefly explain how it contributed to your work. This is not meant to police your prompts, but rather to crowdsource and share effective strategies for navigating the tool.

  • You remain responsible for the accuracy, originality, and integrity of all submitted work. AI tools are known to make errors, invent references, or introduce bias. Verification is your responsibility.

Learning Orientation

Think of AI as a ladder, not a crutch. Its purpose is to extend your abilities, not to replace the productive struggle of problem-solving. Over-reliance on AI will limit your growth, while thoughtful use can accelerate your improvement on a range of quantitative and qualitative skills. Throughout the semester, we will highlight best practices for integrating AI into analysis, coding, and communication in ways that strengthen—not weaken—your understanding.

________________________________

Schedule below (also available as a PDF).

Schedule and topics may be adjusted with reasonable notice.


Week 1

Class 1: Wed. Jan. 7, 2026

Introduction to the course and semester goals: Sports as complex systems; why sports are data-rich laboratories.


Week 2

Class 2: Mon. Jan. 12, 2026

Sports Data Types and Structures I: What is a data structure? Tables, graphs, tensors, sequences.

Class 3: Wed. Jan. 14, 2026

Sports Data Types and Structures II: State of sports analytics across baseball, basketball, soccer, football, hockey.


Week 3

Mon. Jan. 19, 2026

No class, Martin Luther King Jr. Day

Class 4: Wed. Jan. 21, 2026

Distributions & Surprises: Soccer goal intervals, streaks, null models vs observed.


Week 4

Class 5: Mon. Jan. 26, 2026

Survival Analysis: Hazard rates, survival curves for soccer goals.

Class 6: Wed. Jan. 28, 2026

Regression I: Expected goals (xG) model; logistic regression foundations.


Week 5

Class 7: Mon. Feb. 2, 2026

Regression II: Calibration, odds ratios, comparing regression to ML baselines.

Class 8: Wed. Feb. 4, 2026

Ranking I: Elo, Bradley-Terry, PageRank (tennis case study).


Week 6

Class 9: Mon. Feb. 9, 2026

Ranking II: Cross-sport applications (FIFA rankings, NCAA, chess).

Class 10: Wed. Feb. 11, 2026

Hypothesis Testing I: Is there a home advantage? t-tests, ANOVA.


Week 7

Mon. Feb. 16, 2026

No class, Presidents' Day

Class 11: Wed. Feb. 18, 2026

Hypothesis Testing II: Referee bias, fouls/cards, permutation testing.


Week 8

Class 12: Mon. Feb. 23, 2026

Classification: Predicting shot outcomes; ROC curves, AUC, precision-recall; overfitting, cross-validation.

Class 13: Wed. Feb. 25, 2026

Bayesian I: Priors and posteriors; penalty shootouts, hot hand.


Week 9

No class, Spring Break


Week 10

Class 14: Mon. Mar. 9, 2026

Bayesian II: Elo as Bayesian updating; posterior predictive checks.

Class 15: Wed. Mar. 11, 2026

Causality I: Red cards and match outcomes; regression discontinuity. Midterm project proposals due.


Week 11

Class 16: Mon. Mar. 16, 2026

Networks I: Passing networks, centrality measures.

Class 17: Wed. Mar. 18, 2026

Networks II: Motifs, communities, positional roles, robustness.


Week 12

Class 18: Mon. Mar. 23, 2026

Web Scraping Masterclass: How to collect novel data (Python requests, BeautifulSoup, APIs).

Class 19: Wed. Mar. 25, 2026

Guest Lecture: Speaker TBD; discussion and Q&A.


Week 13

Class 20: Mon. Mar. 30, 2026

Embeddings: PCA, MDS, nonlinear embeddings (t-SNE, UMAP).

Class 21: Wed. Apr. 1, 2026

Forecasting I: Match outcomes using networks and xG.


Week 14

Class 22: Mon. Apr. 6, 2026

Forecasting II: Tournament simulations (Monte Carlo, bracket modeling).

Class 23: Wed. Apr. 8, 2026

Information Theory: Entropy of play selection, mutual information between players.


Week 15

Class 24: Mon. Apr. 13, 2026

Innovation & Diffusion: How tactics spread (NBA three-point revolution, soccer pressing).

Class 25: Wed. Apr. 15, 2026

Invited Speaker II: Project workshop with guest feedback.


Week 16

Class 26: Mon. Apr. 20, 2026

Final Presentations I: Group Projects.

Class 27: Wed. Apr. 22, 2026

Final Presentations II: Group Projects (continued).