CNET 5052 - Advanced Tools for Complex Network Analysis

Tuesdays: 1:45–4:50pm
January 7 – April 24, 2026
177 Huntington Ave. #226

Summary

This course extends the foundations of CNET 5051 into a set of advanced, research-facing tools for complex network analysis. Topics emphasize modern workflows for network inference and modeling (e.g., link prediction, sparsification, Bayesian/EM-style reasoning, stochastic block models and model fitting), computational methods for structure (e.g., distances, spectral tools, motifs, signed networks), and dynamics and simulation (e.g., reconstruction, games on networks, agent-based models). A parallel goal throughout the semester is to develop good research habits: reproducible code, clear documentation, defensible evaluation, and careful interpretation. Course materials (notebooks, readings, assignments, and code templates) will be distributed through a dedicated GitHub repository. Students conclude the semester with a final project in the form of a short research-style paper that presents a network-science question, method, or empirical study with clear results and limitations.

Course Learning Outcomes

  1. Build reproducible network-analysis workflows in Python, including clear project structure, documentation, and version-controlled code suitable for research collaboration.

  2. Implement and evaluate methods for network structure and inference, including graph distances, link prediction, and sparsification/sampling, with appropriate baselines and metrics.

  3. Formulate and fit probabilistic and generative network models (e.g., stochastic block models), and interpret results with attention to uncertainty, model assumptions, and diagnostics.

  4. Apply computational tools for network structure beyond standard metrics, including spectral methods, motifs, and signed-network analysis.

  5. Design and analyze network dynamics and simulation studies (e.g., reconstruction problems, games on networks, agent-based models).

  6. Produce a research-style final project that combines data, methods, results, and interpretation into a reproducible repository and a well-structured paper with proper citation practices.

________________________________

Coursework, Class Structure, Grading

This is a once-weekly, hands-on, code-forward course focused on developing comfort, fluency, and independence with computational workflows in network science. Each class meeting blends conceptual discussions, notebook-driven demonstrations, short implementation exercises, and guided time for students to deepen their computational practice.

Grading will be based on the following:

  • Attendance and Participation 10%

    • Active participation in discussions, coding labs, and peer feedback sessions.

  • Assignments 45%

    • Four coding and analysis assignments that build and evaluate technical skills.

  • Final Project Report and Presentation 45%

    • Proposal (5%), mid-semester update presentation (5%), final paper + reproducible repository (25%), final presentation (10%).

Final Project Details

The final project is a research-style project designed to mirror how network science work is actually done: you will pose a question (or evaluate a method), assemble or generate data, implement an analysis pipeline, report results, and communicate limitations. Projects may be methodological (e.g., comparing techniques, extending existing tools, theoretical work, etc.) or applied (e.g., a focused empirical study of an online, biological, spatial, or infrastructure network). The emphasis is on clarity, defensible evaluation, and reproducibility.

Project milestones

  • Tue, Jan 27 (in class): Proposal + short presentation. Submit a brief (up to 1 page) proposal and give a short (no more than 5 min) in-class overview of your plan.

  • Tue, Feb 17 (in class): Mid-semester update presentations. Present progress and preliminary results to receive feedback (5 min).

  • Tue, Apr 21 (in class): Final project presentations. Present completed work and receive peer/instructor feedback (12 min, +3 min Q&A).

Final submission package. The final project submission must include:

  • Reproducible GitHub repository containing:

    • A clear README describing the project, how to reproduce results, and how data are obtained.

    • A reproducible environment specification (e.g., requirements.txt or environment.yml).

    • Code and/or notebooks that run end-to-end (data → results → figures/tables).

    • Proper attribution for any external code, data, or tools used.

  • Research paper (PDF), typically 8-12 pages, written for a scientific audience.

  • Final presentation (in class) that communicates motivation, methods, key results, and limitations.

Evaluation criteria. Projects will be assessed based on the clarity and specificity of the research question and the motivation for the design choices that follow from it. Work should demonstrate methodological correctness, including appropriate use of course tools and accurate implementation. Projects should also include a defensible evaluation strategy—with sensible baselines, well-chosen metrics, and validation or robustness checks that support the claims being made. Strong projects interpret results carefully, making clear what the findings do and do not imply, and explicitly discussing limitations. Reproducibility is essential: repositories should be well organized and documented, with enough information for another reader to rerun the analysis and recover the main results. Finally, projects will be evaluated on communication quality, including the structure and readability of the paper, the clarity of figures and tables, and the effectiveness of the final presentation.

________________________________

Course Materials

There is no single textbook that covers the scope of this course. Instead, students will work with a combination of open-source texts, research articles, and software tools. All required readings, notebooks, assignments, and code templates will be available through the course GitHub repository.

Resources:

Software and Data

  • Python (e.g., numpy, pandas, matplotlib, networkx, statsmodels, and scikit-learn) and Jupyter notebooks, distributed through the course GitHub repository.

________________________________

Instructors

Brennan Klein is core faculty at the Network Science Institute and Assistant Teaching Professor in the Department of Physics. He is the program director of the MS in Complex Network Analysis at Northeastern University. Prof. Klein is also the director of the Complexity & Society Lab, which is focused on two broad research areas: 1) Information, emergence, and inference in complex systems: developing tools and theory for characterizing dynamics, structure, and scale in networks, and 2) Public health and public safety: drawing on complex systems science to document—and fight against—emergent or systemic disparities in society, especially as they relate to public health and public safety. As of 2025, he is also the director of NetSI Sport, an interdisciplinary research group focusing on complex systems-inspired approaches to sports analytics. In 2023, Prof. Klein was awarded the René Thom Young Researcher Award, given to a researcher to recognize substantial early career contributions and leadership in research in Complex Systems-related fields. Prof. Klein is the Data for Justice Fellow at the Institute on Policing, Incarceration & Public Safety at Harvard University’s Hutchins Center for African & African American Research. He received a PhD in Network Science in 2020 from Northeastern University and earned his BA in Cognitive Science & Psychology from Swarthmore College in 2014. Website: brennanklein.com.

Milo Trujillo is a Postdoctoral Research Fellow and Associate Director of the Communication Media and Marginalization Lab at the Network Science Institute. His primary interest is in how the structure of online platforms, including both their technical design and social policies, influences online group behavior. These topics include content moderation and deplatforming, the emergence of alt-tech, decentralized social platforms, and the governance of open source software. Dr. Trujillo received a PhD in Complex Systems and Data Science in 2024 from the University of Vermont, and received M.S. and B.S. degrees in computer science and a B.S. in Science and Technology Studies from Rensselaer Polytechnic Institute in 2020 and 2018. Website: https://backdrifting.net/.

Office Hours

Fridays, 3:00–4:00pm, at 177 Huntington Ave., 10th floor.

________________________________

Accessibility and Accommodations

Northeastern is committed to providing equal educational opportunities for all students. Students who require accommodations for a documented disability should contact the Disability Resource Center as early as possible to ensure that appropriate arrangements can be made. Once you have documentation, please share your accommodation letter with the instructors so we can discuss how best to support your learning.

Late Work Policy

Assignments are due on the dates listed in the schedule. Each student has a 48-hour grace period across the semester that can be applied to any assignment without penalty. After this, late work will be marked down 10% per day, up to three days. Extensions for serious circumstances will be considered.

Academic Integrity

All students are expected to uphold Northeastern University’s Academic Integrity Policy, which prohibits cheating, plagiarism, fabrication, unauthorized collaboration, and other forms of academic dishonesty. You are responsible for ensuring that your work reflects your own effort and analysis, even when you consult outside resources such as peers, published materials, or AI tools. Proper citation is required whenever you use code, data, text, or ideas that are not your own. Questions about what counts as appropriate collaboration or citation should be raised with me directly. Suspected violations will be referred to the Office of Student Conduct and Conflict Resolution. More information can be found here: https://osccr.sites.northeastern.edu/academic-integrity-policy/.

All student records and coursework in this class are handled in compliance with the Family Educational Rights and Privacy Act. Please use your Northeastern email account for all course communications.

________________________________

Policy on Artificial Intelligence and Large Language Models

This course recognizes the potential of artificial intelligence (AI) tools—such as ChatGPT, Copilot, Claude, and other text or code generators—to support learning, creativity, and efficiency. You are encouraged to use AI when it adds value to your learning process, provided that its use is transparent, relevant, and critically evaluated. AI can help brainstorm ideas, debug code, generate visualizations, or give writing feedback, but it is not a substitute for your own analysis or reasoning.

Guidelines for Use

  • AI use will vary depending on the assignment. Labels will be provided to indicate whether AI use is prohibited, permitted, encouraged, or required, depending on the learning objectives.

  • For assignments where AI use is allowed: cite the tool, include information about the prompt or queries you used, and briefly explain how it contributed to your work. This is not meant to police your prompts, but rather to crowdsource and share effective strategies for navigating the tool.

  • You remain responsible for the accuracy, originality, and integrity of all submitted work. AI tools are known to make errors, invent references, or introduce bias. Verification is your responsibility.

Learning Orientation

Think of AI as a ladder, not a crutch. Its purpose is to extend your abilities, not to replace the productive struggle of problem-solving. Over-reliance on AI will limit your growth, while thoughtful use can accelerate your improvement on a range of quantitative and qualitative skills. Throughout the semester, we will highlight best practices for integrating AI into analysis, coding, and communication in ways that strengthen—not weaken—your understanding.


Schedule

Schedule and topics may be adjusted with reasonable notice.


Week 1

Class 1: Tue. Jan. 13, 2026

Introduction, Growth, Distances — (Both)

  • Course overview; computational expectations; what “advanced tools” means in practice.

  • Network growth models (with an emphasis on implementable generative processes).

  • Graph distances at scale: shortest paths, efficiency, diameter, and practical approximations.

  • Final project examples + structured brainstorming.

Friday, Jan. 16, 2026 — Assignment 1 announced

Week 2

Class 2: Tue. Jan. 20, 2026

Link Prediction and Sparsification — (Klein)

  • Link prediction as inference: scores, features, and evaluation (with attention to leakage).

  • Similarity-based predictors and baselines; where they work and where they fail.

  • Sparsification/sampling for scale: what structure is preserved, what is distorted, and why.

  • Connections to homophily and robustness (as framing for the homework).
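
As a preview of what "evaluation with attention to leakage" can look like in code, the following is a minimal sketch, not the course's assignment template. It hides a fraction of edges, scores node pairs with a Jaccard similarity baseline computed from the remaining edges only, and reports an AUC. The function names `jaccard` and `holdout_auc`, the toy graph, and the 20% hold-out fraction are illustrative assumptions. It uses only the standard library. In practice, networkx provides `nx.jaccard_coefficient` for the scoring step.

```python
import random
from itertools import combinations


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def holdout_auc(edges, n_nodes, frac=0.2, seed=0):
    """Hide a fraction of edges, score pairs on the rest, and return the AUC."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})
    rng.shuffle(edges)
    n_test = max(1, int(frac * len(edges)))
    test, train = edges[:n_test], edges[n_test:]

    # Build neighbor sets from TRAINING edges only; using the full graph here
    # would leak the held-out edges into the scores.
    adj = {i: set() for i in range(n_nodes)}
    for u, v in train:
        adj[u].add(v)
        adj[v].add(u)

    all_edges = set(edges)
    non_edges = [p for p in combinations(range(n_nodes), 2) if p not in all_edges]

    pos = [jaccard(adj, u, v) for u, v in test]
    neg = [jaccard(adj, u, v) for u, v in non_edges]

    # AUC: probability that a held-out edge outscores a true non-edge
    # (ties count as one half).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy graph (an assumption for illustration): two 5-node cliques joined by one bridge.
toy_edges = (
    list(combinations(range(5), 2))
    + list(combinations(range(5, 10), 2))
    + [(4, 5)]
)
auc = holdout_auc(toy_edges, n_nodes=10)
print(f"held-out AUC for the Jaccard baseline: {auc:.3f}")
```

On a graph this small the AUC depends heavily on which edges happen to be hidden, so repeating the split over several seeds gives a more defensible estimate.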


Week 3

Class 3: Tue. Jan. 27, 2026

Bayesian Methods & Expectation Maximization — (Trujillo, both)

  • Project proposal due (in class): short write-up (up to 1 page) + brief presentation (no more than 5 min).

  • Bayes’ rule, likelihood, priors, and posteriors.

  • A compact view of latent-variable models and EM as an inference pattern.

  • How probabilistic framing changes link prediction and uncertainty reporting.

Friday, January 30, 2026 — Assignment 1 due

Week 4

Class 4: Tue. Feb. 3, 2026

Communities Revisited and the SBM as a Generative Object — (Both)

  • Community structure: “algorithmic” vs “model-based” perspectives.

  • Stochastic Block Models as a data-generating story (and what that implies).

  • What SBMs can/can’t represent; why degree correction matters (conceptually).
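
To make the "data-generating story" concrete, here is a minimal sketch of sampling from a plain (non-degree-corrected) stochastic block model using only the standard library. The function name `sample_sbm`, the block sizes, and the probability matrix `P` are illustrative assumptions rather than course code. Each pair of nodes is connected independently with a probability that depends only on the two nodes' block labels.

```python
import random
from itertools import combinations


def sample_sbm(block_sizes, P, seed=0):
    """Sample an undirected SBM.

    Edge (u, v) appears independently with probability P[b_u][b_v],
    where b_u and b_v are the block labels of u and v.
    """
    rng = random.Random(seed)
    # Assign block labels: the first block_sizes[0] nodes get label 0, and so on.
    blocks = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = [
        (u, v)
        for u, v in combinations(range(len(blocks)), 2)
        if rng.random() < P[blocks[u]][blocks[v]]
    ]
    return blocks, edges


# Hypothetical parameters: two assortative blocks of 50 nodes each.
block_sizes = [50, 50]
P = [[0.30, 0.02],
     [0.02, 0.30]]
blocks, edges = sample_sbm(block_sizes, P, seed=1)

within = sum(blocks[u] == blocks[v] for u, v in edges)
between = len(edges) - within
print(f"edges within blocks: {within}, between blocks: {between}")
```

Because every node in a block is statistically identical under this model, degrees within a block concentrate around a single value, which is one way to see why degree correction matters for real networks with broad degree distributions.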

Friday, February 6, 2026 — Assignment 2 announced

Week 5

Class 5: Tue. Feb. 10, 2026

Fitting SBMs in Practice with graph-tool — (Klein)

  • Practical SBM fitting workflows in graph-tool.

  • Model selection / complexity control (intuition + what the software is optimizing).

  • Interpreting partitions responsibly: uncertainty, stability, and diagnostics.


Week 6

Class 6: Tue. Feb. 17, 2026

Spatial Networks + Mid-Semester Project Updates — (Klein)

  • Mid-semester project update presentations (in class, 5 min each).

  • Embedding networks into space: distance effects, spatial statistics, and null models.

  • Properties of spatial networks and what changes when geometry matters.

Friday, February 20, 2026 — Assignment 2 due

Week 7

Class 7: Tue. Feb. 24, 2026

Machine Learning Workflows for Network Data — (Trujillo)

  • End-to-end ML pipelines for network problems: features, splits, baselines, metrics.

  • When “standard” ML assumptions break on network data (dependence, sampling, leakage).

Friday, February 27, 2026 — Assignment 3 announced

Tue. Mar. 3, 2026

Spring Break: no class


Week 8

Class 8: Tue. Mar. 10, 2026

Topics in Big Data for Network-Scale Questions — (Trujillo)

  • Streaming constraints and approximate computation as a design choice.

  • HyperLogLog for approximate distinct counting: intuition and implementation.

  • Where sketches plug into network analysis workflows (and where they don’t).
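
For intuition about the HyperLogLog bullet above, the following is a compact sketch of the estimator, assuming a 64-bit hash taken from SHA-1 and 2^12 registers. The class name `HyperLogLog`, the precision `p=12`, and the simulated stream are illustrative choices, and the sketch includes only the standard small-range (linear counting) correction. It is meant to show the idea, not to replace a tested library.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counting with m = 2**p small registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant; this closed form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash of the item (first 8 bytes of its SHA-1 digest).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-indexed position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Harmonic-mean estimator over the registers.
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            e = self.m * math.log(self.m / zeros)
        return e


# Hypothetical stream: 50,000 distinct node IDs, each observed twice.
hll = HyperLogLog(p=12)
for _ in range(2):
    for node_id in range(50_000):
        hll.add(f"node-{node_id}")
estimate = hll.estimate()
print(f"estimated distinct nodes: {estimate:.0f} (true: 50000)")
```

Repeated items never change a register, which is why duplicates in the stream do not inflate the count. With 4,096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%.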

Friday, March 13, 2026 — Assignment 3 due

Week 9

Class 9: Tue. Mar. 17, 2026

Network Dynamics and Reconstruction — (Klein)

  • Dynamics on networks as computational objects (simulation and inference).

  • Reconstruction problems: partial observation, missing edges, and temporal evidence.

  • Connecting mechanistic models to data and to evaluation.

Friday, March 20, 2026 — Assignment 4 announced

Week 10

Class 10: Tue. Mar. 24, 2026

Games on Networks and Agent-Based Models — (Klein)

  • Games on networks: strategic interaction with topology as structure.

  • Agent-based models on networks: design patterns, debugging, and interpretation.

  • What “mechanism” buys you (and what it doesn’t) in network settings.


Week 11

Class 11: Tue. Mar. 31, 2026

Spectral Methods — (Klein)

  • Laplacians, eigenvectors, and what spectra say about structure.

  • Spectral clustering (conceptual and computational view).

  • Spectral ideas as “tools you can reuse” across network tasks.
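
As one small instance of the ideas above, this sketch builds the combinatorial Laplacian L = D - A for a toy graph and splits the nodes by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The toy graph and variable names are assumptions for illustration, and numpy is assumed to be installed. Full spectral clustering typically uses several eigenvectors followed by k-means, which is not shown here.

```python
import numpy as np

# Toy graph (an assumption): two 5-node cliques (nodes 0-4 and 5-9)
# joined by the single edge (4, 5).
n = 10
A = np.zeros((n, n))
for group in (range(0, 5), range(5, 10)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0

# Combinatorial Laplacian L = D - A, where D is the diagonal degree matrix.
L = np.diag(A.sum(axis=1)) - A

# eigh is for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
fiedler_value = eigenvalues[1]
fiedler_vector = eigenvectors[:, 1]

# The sign pattern of the Fiedler vector gives a two-way partition.
# Which side is labeled True is arbitrary, since eigenvectors are
# defined only up to sign.
labels = fiedler_vector > 0
print("algebraic connectivity:", round(float(fiedler_value), 4))
print("partition:", labels.astype(int).tolist())
```

The smallest eigenvalue is zero because the graph is connected, and the small second eigenvalue reflects the single-edge bottleneck between the two cliques.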

Friday, April 3, 2026 — Assignment 4 due

Week 12

Class 12: Tue. Apr. 7, 2026

Motifs and Signed Networks — (Klein)

  • Motifs: counting, null models, and what “significance” really means.

  • Signed networks: balance, structure, and analysis tools for positive/negative ties.
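
For the signed-network bullet, a minimal sketch of checking structural balance at the triangle level: a triangle is balanced when the product of its three edge signs is positive. The function name `triangle_balance` and the four-node example are hypothetical, and the brute-force loop over node triples is only suitable for small graphs.

```python
from itertools import combinations


def triangle_balance(signed_edges):
    """Count (balanced, unbalanced) triangles in an undirected signed graph.

    signed_edges is an iterable of (u, v, sign) with sign in {+1, -1}.
    """
    sign = {frozenset((u, v)): s for u, v, s in signed_edges}
    nodes = sorted({node for edge in sign for node in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        trio = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(edge in sign for edge in trio):
            # Balanced if the product of the three signs is positive.
            product = sign[trio[0]] * sign[trio[1]] * sign[trio[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced


# Hypothetical example: A, B, C are mutual friends; D is in conflict with
# A and B but friendly with C.
example = [
    ("A", "B", +1), ("B", "C", +1), ("A", "C", +1),
    ("A", "D", -1), ("B", "D", -1), ("C", "D", +1),
]
counts = triangle_balance(example)
print("balanced, unbalanced triangles:", counts)
```

Here triangles ABC and ABD are balanced, while ACD and BCD each contain exactly one negative tie and are unbalanced.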


Week 13

Class 13: Tue. Apr. 14, 2026

Flexible Topics / Tooling Comparisons — (Both)

  • Student-driven topics based on project needs and open questions from the semester.

  • Tooling comparisons and practical workflow choices (when/why to use what).


Week 14

Class 14: Tue. Apr. 21, 2026

Final Project Presentations — (Both)

  • Final project paper + repository due.

  • Project presentations + feedback.

  • Synthesis and wrap-up.