CNET 5052 - Advanced Tools for Complex Network Analysis
Tuesdays: 1:45–4:50pm
January 7 – April 24, 2026
177 Huntington Ave. #226
Summary
This course extends the foundations of CNET 5051 into a set of advanced, research-facing tools for complex network analysis. Topics emphasize modern workflows for network inference and modeling (e.g., link prediction, sparsification, Bayesian/EM-style reasoning, stochastic block models and model fitting), computational methods for structure (e.g., distances, spectral tools, motifs, signed networks), and dynamics and simulation (e.g., reconstruction, games on networks, agent-based models). A parallel goal throughout the semester is to develop good research habits: reproducible code, clear documentation, defensible evaluation, and careful interpretation. Course materials (notebooks, readings, assignments, and code templates) will be distributed through a dedicated GitHub repository. Students conclude the semester with a final project in the form of a short research-style paper that presents a network-science question, method, or empirical study with clear results and limitations.
Course Learning Outcomes
Build reproducible network-analysis workflows in Python, including clear project structure, documentation, and version-controlled code suitable for research collaboration.
Implement and evaluate methods for network structure and inference, including graph distances, link prediction, and sparsification/sampling, with appropriate baselines and metrics.
Formulate and fit probabilistic and generative network models (e.g., stochastic block models), and interpret results with attention to uncertainty, model assumptions, and diagnostics.
Apply computational tools for network structure beyond standard metrics, including spectral methods, motifs, and signed-network analysis.
Design and analyze network dynamics and simulation studies (e.g., reconstruction problems, games on networks, agent-based models).
Produce a research-style final project that combines data, methods, results, and interpretation into a reproducible repository and a well-structured paper with proper citation practices.
________________________________
Coursework, Class Structure, Grading
This is a once-weekly, hands-on, code-forward course focused on developing comfort, fluency, and independence with computational workflows in network science. Each class meeting blends conceptual discussions, notebook-driven demonstrations, short implementation exercises, and guided time for students to deepen their computational practice.
Grading will be based on the following:
Attendance and Participation 10%
Active participation in discussions, coding labs, and peer feedback sessions.
Assignments 45%
Four coding and analysis assignments that build/evaluate technical skills.
Final Project Report and Presentation 45%
Proposal (5%), mid-semester update presentation (5%), final paper + reproducible repository (25%), final presentation (10%).
Final Project Details
The final project is a research-style project designed to mirror how network science work is actually done: you will pose a question (or evaluate a method), assemble or generate data, implement an analysis pipeline, report results, and communicate limitations. Projects may be methodological (e.g., comparing techniques, extending existing tools, or theoretical work) or applied (e.g., a focused empirical study of an online, biological, spatial, or infrastructure network). The emphasis is on clarity, defensible evaluation, and reproducibility.
Project milestones
Tue, Jan 27 (in class): Proposal + short presentation. Submit a brief (up to 1 page) proposal and give a short (no more than 5 min) in-class overview of your plan.
Tue, Feb 17 (in class): Mid-semester update presentations. Present progress and preliminary results to receive feedback (5 min).
Tue, Apr 21 (in class): Final project presentations. Present completed work and receive peer/instructor feedback (12 min, +3 min Q&A).
Final submission package. The final project submission must include:
Reproducible GitHub repository containing:
A clear README describing the project, how to reproduce results, and how data are obtained.
A reproducible environment specification (e.g., requirements.txt or environment.yml).
Code and/or notebooks that run end-to-end (data -> results -> figures/tables).
Proper attribution for any external code, data, or tools used.
Research paper (PDF), typically 8-12 pages, written for a scientific audience.
Final presentation (in class) that communicates motivation, methods, key results, and limitations.
Evaluation criteria. Projects will be assessed based on the clarity and specificity of the research question and the motivation for the design choices that follow from it. Work should demonstrate methodological correctness, including appropriate use of course tools and accurate implementation. Projects should also include a defensible evaluation strategy—with sensible baselines, well-chosen metrics, and validation or robustness checks that support the claims being made. Strong projects interpret results carefully, making clear what the findings do and do not imply, and explicitly discussing limitations. Reproducibility is essential: repositories should be well organized and documented, with enough information for another reader to rerun the analysis and recover the main results. Finally, projects will be evaluated on communication quality, including the structure and readability of the paper, the clarity of figures and tables, and the effectiveness of the final presentation.
________________________________
Course Materials
There is no single textbook that covers the scope of this course. Instead, students will work with a combination of open-source texts, research articles, and software tools. All required readings, notebooks, assignments, and code templates will be available through the course GitHub repository.
Resources:
Python and Data Science
VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media, Inc. https://jakevdp.github.io/PythonDataScienceHandbook/
Severance, C. (2016). Python for Everybody: Exploring Data using Python 3. Charles Severance. https://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf
Downey, A. (2012). Think Python: How to Think Like a Computer Scientist. https://www.greenteapress.com/thinkpython/thinkpython.pdf
Network Science & Complex Systems
Barabási, A.L. & Pósfai, M. (2016). Network Science. Cambridge University Press. https://networksciencebook.com/
Menczer, F., Fortunato, S., & Davis, C. A. (2020). A First Course in Network Science. Cambridge University Press. https://doi.org/10.1017/9781108653947
Thurner, S., Hanel, R., & Klimek, P. (2018). Introduction to the Theory of Complex Systems. Oxford University Press. https://academic.oup.com/book/25504
Klein, B., Smith, A., Chinazzi, M., Zhang, Q., et al. (2025). Network Science Data & Models Python Textbook. https://network-science-data-and-models.github.io/phys7332_fa25/README.html
Software and Data
Python (e.g., numpy, pandas, matplotlib, networkx, statsmodels, scikit-learn) and Jupyter notebooks, distributed through the course GitHub.
________________________________
Instructors
Brennan Klein is core faculty at the Network Science Institute and Assistant Teaching Professor in the Department of Physics. He is the program director of the MS in Complex Network Analysis at Northeastern University. Prof. Klein is also the director of the Complexity & Society Lab, which is focused on two broad research areas: 1) Information, emergence, and inference in complex systems: developing tools and theory for characterizing dynamics, structure, and scale in networks, and 2) Public health and public safety: drawing on complex systems science to document—and fight against—emergent or systemic disparities in society, especially as they relate to public health and public safety. As of 2025, he is also the director of NetSI Sport, an interdisciplinary research group focusing on complex systems-inspired approaches to sports analytics. In 2023, Prof. Klein was awarded the René Thom Young Researcher Award, given to a researcher to recognize substantial early career contributions and leadership in research in Complex Systems-related fields. Prof. Klein is the Data for Justice Fellow at the Institute on Policing, Incarceration & Public Safety at Harvard University’s Hutchins Center for African & African American Research. He received a PhD in Network Science in 2020 from Northeastern University and earned his BA in Cognitive Science & Psychology from Swarthmore College in 2014. Website: brennanklein.com.
Milo Trujillo is a Postdoctoral Research Fellow and Associate Director of the Communication Media and Marginalization Lab at the Network Science Institute. His primary interest is in how the structure of online platforms, including both their technical design and social policies, influences online group behavior. These topics include content moderation and deplatforming, the emergence of alt-tech, decentralized social platforms, and the governance of open source software. Dr. Trujillo received a PhD in Complex Systems and Data Science in 2024 from the University of Vermont, and received M.S. and B.S. degrees in computer science and a B.S. in Science and Technology Studies from Rensselaer Polytechnic Institute in 2020 and 2018. Website: https://backdrifting.net/.
Office Hours
Friday afternoons from 3:00-4:00pm at 177 Huntington Ave. 10th floor.
________________________________
Accessibility and Accommodations
Northeastern is committed to providing equal educational opportunities for all students. Students who require accommodations for a documented disability should contact the Disability Resource Center as early as possible to ensure that appropriate arrangements can be made. Once you have documentation, please share your accommodation letter with me so we can discuss how best to support your learning.
Late Work Policy
Assignments are due on the dates listed in the schedule. Each student has a 48-hour grace period across the semester that can be applied to any assignment without penalty. After this, late work will be marked down 10% per day, up to three days. Extensions for serious circumstances will be considered.
Academic Integrity
All students are expected to uphold Northeastern University’s Academic Integrity Policy, which prohibits cheating, plagiarism, fabrication, unauthorized collaboration, and other forms of academic dishonesty. You are responsible for ensuring that your work reflects your own effort and analysis, even when you consult outside resources such as peers, published materials, or AI tools. Proper citation is required whenever you use code, data, text, or ideas that are not your own. Questions about what counts as appropriate collaboration or citation should be raised with me directly. Suspected violations will be referred to the Office of Student Conduct and Conflict Resolution. More information can be found here: https://osccr.sites.northeastern.edu/academic-integrity-policy/.
All student records and coursework in this class are handled in compliance with the Family Educational Rights and Privacy Act. Please use your Northeastern email account for all course communications.
________________________________
Policy on Artificial Intelligence and Large Language Models
This course recognizes the potential of artificial intelligence (AI) tools—such as ChatGPT, Copilot, Claude, and other text or code generators—to support learning, creativity, and efficiency. You are encouraged to use AI when it adds value to your learning process, provided that its use is transparent, relevant, and critically evaluated. AI can help brainstorm ideas, debug code, generate visualizations, or give writing feedback, but it is not a substitute for your own analysis or reasoning.
Guidelines for Use
AI use will vary depending on the assignment. Labels will be provided to indicate whether AI use is prohibited, permitted, encouraged, or required, depending on the learning objectives.
For assignments where AI use is allowed: cite the tool, include information about the prompt or queries you used, and briefly explain how it contributed to your work. This is not meant to police your prompts, but rather to crowdsource and share effective strategies for navigating the tool.
You remain responsible for the accuracy, originality, and integrity of all submitted work. AI tools are known to make errors, invent references, or introduce bias. Verification is your responsibility.
Learning Orientation
Think of AI as a ladder, not a crutch. Its purpose is to extend your abilities, not to replace the productive struggle of problem-solving. Over-reliance on AI will limit your growth, while thoughtful use can accelerate your improvement on a range of quantitative and qualitative skills. Throughout the semester, we will highlight best practices for integrating AI into analysis, coding, and communication in ways that strengthen—not weaken—your understanding.
The schedule follows (a PDF version is also available).
Schedule and topics may be adjusted with reasonable notice.
Week 1
Class 1: Tue. Jan. 13, 2026
Introduction, Growth, Distances — (Both)
Course overview; computational expectations; what “advanced tools” means in practice.
Network growth models (with an emphasis on implementable generative processes).
Graph distances at scale: shortest paths, efficiency, diameter, and practical approximations.
Final project examples + structured brainstorming.
Friday, Jan. 16, 2026 — Assignment 1 announced
Week 2
Class 2: Tue. Jan. 20, 2026
Link Prediction and Sparsification — (Klein)
Link prediction as inference: scores, features, and evaluation (with attention to leakage).
Similarity-based predictors and baselines; where they work and where they fail.
Sparsification/sampling for scale: what structure is preserved, what is distorted, and why.
Connections to homophily and robustness (as framing for the homework).
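As a rough illustration of the ideas in this session (not course material), the sketch below hides one edge, scores every non-adjacent pair in the remaining graph with the Jaccard coefficient of their neighborhoods, and ranks the candidates. Scores are computed from the training graph only, which is the simplest guard against leakage. The toy edge list, the held-out edge, and the function names are assumptions made for this example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(adj):
    """Score every non-adjacent node pair by neighborhood overlap."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already an edge in the training graph
        union = adj[u] | adj[v]
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union) if union else 0.0
    return scores


# Hypothetical toy graph: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

# Hide one edge BEFORE computing any scores, so the test edge
# cannot influence the features used to predict it.
held_out = ("a", "b")
train_edges = [e for e in edges if e != held_out]

train_adj = build_adjacency(train_edges)
scores = jaccard_scores(train_adj)
ranked = sorted(scores, key=scores.get, reverse=True)

for pair in ranked[:3]:
    print(pair, round(scores[pair], 3))
```

On a real dataset the same pattern would be repeated over many held-out edges and compared against a baseline (for example, random scores) with a ranking metric such as AUC or precision at k.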
Week 3
Class 3: Tue. Jan. 27, 2026
Bayesian Methods & Expectation Maximization — (Trujillo, both)
Project proposal due (in class): short write-up (up to 1 page) + brief presentation (no more than 5 min).
Bayes’ rule, likelihood, priors, and posteriors.
A compact view of latent-variable models and EM as an inference pattern.
How probabilistic framing changes link prediction and uncertainty reporting.
Friday, January 30, 2026 — Assignment 1 due
Week 4
Class 4: Tue. Feb. 3, 2026
Communities Revisited and the SBM as a Generative Object — (Both)
Community structure: “algorithmic” vs “model-based” perspectives.
Stochastic Block Models as a data-generating story (and what that implies).
What SBMs can/can’t represent; why degree correction matters (conceptually).
Friday, February 6, 2026 — Assignment 2 announced
Week 5
Class 5: Tue. Feb. 10, 2026
Fitting SBMs in Practice with graph-tool — (Klein)
Practical SBM fitting workflows in graph-tool.
Model selection / complexity control (intuition + what the software is optimizing).
Interpreting partitions responsibly: uncertainty, stability, and diagnostics.
Week 6
Class 6: Tue. Feb. 17, 2026
Spatial Networks + Mid-Semester Project Updates — (Klein)
Mid-semester project update presentations (in class, 5 min each).
Embedding networks into space: distance effects, spatial statistics, and null models.
Properties of spatial networks and what changes when geometry matters.
Friday, February 20, 2026 — Assignment 2 due
Week 7
Class 7: Tue. Feb. 24, 2026
Machine Learning Workflows for Network Data — (Trujillo)
End-to-end ML pipelines for network problems: features, splits, baselines, metrics.
When “standard” ML assumptions break on network data (dependence, sampling, leakage).
Friday, February 27, 2026 — Assignment 3 announced
Tue. Mar. 3, 2026
SPRING BREAK NO CLASS
Week 8
Class 8: Tue. Mar. 10, 2026
Topics in Big Data for Network-Scale Questions — (Trujillo)
Streaming constraints and approximate computation as a design choice.
HyperLogLog for approximate distinct counting: intuition and implementation.
Where sketches plug into network analysis workflows (and where they don’t).
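For intuition about the HyperLogLog topic above, here is a minimal sketch of the estimator using only the standard library. It is a simplified version under stated assumptions: a 64-bit BLAKE2b hash, 2^p registers with p = 10, the standard bias constant for 128 or more registers, and only the small-range (linear counting) correction. The class name, the `node-…` identifiers, and the stream itself are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog counter (illustrative, not production code)."""

    def __init__(self, p=10):
        if p < 7:
            raise ValueError("this sketch assumes p >= 7 (m >= 128 registers)")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        x = int.from_bytes(digest, "big")    # 64-bit hash value
        idx = x >> (64 - self.p)             # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical stream: 100,000 observations of 20,000 distinct node IDs.
true_distinct = 20_000
hll = HyperLogLog(p=10)
for i in range(100_000):
    hll.add(f"node-{i % true_distinct}")

estimate = hll.estimate()
print(f"true = {true_distinct}, estimate = {estimate:.0f}")
```

With 1,024 registers the expected relative error is roughly 1.04 / sqrt(1024), about 3%, while memory stays fixed no matter how long the stream is. That trade-off is what makes sketches useful for questions like "how many distinct neighbors has this node seen?" in an edge stream.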
Friday, March 13, 2026 — Assignment 3 due
Week 9
Class 9: Tue. Mar. 17, 2026
Network Dynamics and Reconstruction — (Klein)
Dynamics on networks as computational objects (simulation and inference).
Reconstruction problems: partial observation, missing edges, and temporal evidence.
Connecting mechanistic models to data and to evaluation.
Friday, March 20, 2026 — Assignment 4 announced
Week 10
Class 10: Tue. Mar. 24, 2026
Games on Networks and Agent-Based Models — (Klein)
Games on networks: strategic interaction with topology as structure.
Agent-based models on networks: design patterns, debugging, and interpretation.
What “mechanism” buys you (and what it doesn’t) in network settings.
Week 11
Class 11: Tue. Mar. 31, 2026
Spectral Methods — (Klein)
Laplacians, eigenvectors, and what spectra say about structure.
Spectral clustering (conceptual and computational view).
Spectral ideas as “tools you can reuse” across network tasks.
Friday, April 3, 2026 — Assignment 4 due
Week 12
Class 12: Tue. Apr. 7, 2026
Motifs and Signed Networks — (Klein)
Motifs: counting, null models, and what “significance” really means.
Signed networks: balance, structure, and analysis tools for positive/negative ties.
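As a small illustration of the structural balance idea named above, the sketch below classifies each triangle in a signed graph as balanced when the product of its three edge signs is positive, and unbalanced otherwise. The toy edge list and the function name are assumptions for this example, and the brute-force enumeration over node triples is only suitable for very small graphs.

```python
from itertools import combinations


def triangle_balance(signed_edges):
    """Count (balanced, unbalanced) triangles in an undirected signed graph."""
    sign = {}
    adj = {}
    for u, v, s in signed_edges:
        sign[frozenset((u, v))] = s
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    balanced = unbalanced = 0
    # Brute force over all node triples: fine for a toy graph only.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            product = (sign[frozenset((u, v))]
                       * sign[frozenset((u, w))]
                       * sign[frozenset((v, w))])
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced


# Hypothetical signed edge list: (node, node, sign) with sign in {+1, -1}.
signed_edges = [
    ("a", "b", +1), ("b", "c", +1), ("a", "c", +1),  # triangle a-b-c: + + +
    ("c", "d", -1), ("b", "d", -1),                  # triangle b-c-d: + - -
    ("d", "e", +1), ("c", "e", +1),                  # triangle c-d-e: - + +
]

balanced, unbalanced = triangle_balance(signed_edges)
print(balanced, unbalanced)
```

Whether a given fraction of balanced triangles is "high" still depends on a null model, for example one that shuffles signs while keeping the topology fixed.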
Week 13
Class 13: Tue. Apr. 14, 2026
Flexible Topics / Tooling Comparisons — (Both)
Student-driven topics based on project needs and open questions from the semester.
Tooling comparisons and practical workflow choices (when/why to use what).
Week 14
Class 14: Tue. Apr. 21, 2026
Final Project Presentations — (Both)
Final project paper + repository due
Project presentations + feedback.
Synthesis and wrap-up.
