Training Resources

This page compiles pre-reading, tools, and further reading for each confirmed session in the SICSS-Melbourne 2026 program. Use it to prepare before each day and to explore topics that interest you after the institute.

General Resources

Bit by Bit: Social Research in the Digital Age — Matthew J. Salganik, 2018. A comprehensive, open-access introduction to computational social science covering surveys, experiments, mass collaboration, and ethics. Book
Computational Social Science — David Lazer et al., 2009. The foundational paper arguing for a field that leverages large-scale digital data to understand human behaviour. Paper
National Statement on Ethical Conduct in Human Research — NHMRC / ARC / Universities Australia, 2023 (updated 2025). Australia's core framework for research ethics, essential for anyone working with human or digital trace data. Guide
Computational Analysis of Communication — Wouter van Atteveldt, Damian Trilling & Carlos Arcila, 2022. An open-access textbook with R and Python code covering text, network, and image analysis. Participants are not expected to know R or Python, but it is worth finding out what they are and learning some foundational terms. Book
R for Data Science — Hadley Wickham, Mine Çetinkaya-Rundel & Garrett Grolemund, 2nd ed. A practical, free introduction to data science with R and the tidyverse, suited to researchers with little coding experience and useful for learning the key aspects of working with data in R. Tutorial
Introduction to Cultural Analytics & Python — Melanie Walsh, 2021. A free online textbook covering Python basics, text analysis, and social media data specifically for humanities and social science scholars. Tutorial

Session-Related Resources

Day 1 — Introduction to Computational Social Science · Monday, 22 June

11:00–12:30 | Keynote Dialogue

What is Computational Social Science and Why It Matters in Australia?

Pre-reading

Computational Social Science — Lazer et al., 2009. The landmark essay that defined the field and its potential for studying society at scale. Paper
Bit by Bit — Chapter 1: Introduction — Salganik, 2018. An accessible overview of how digital data is transforming social research. Book

Tools & Platforms

SICSS Website — Hub for all Summer Institutes in Computational Social Science worldwide.
Australian Internet Observatory (AIO) — Australia-wide research infrastructure for digital platform observability.

Bias in CSS

Pre-reading

Reflecting on Social Bias: Challenges and Opportunities for Computational Social Science — Kathirgamalingam et al., 2025. The keynote speaker's forthcoming paper on social bias in CSS. Preprint
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries — Olteanu et al., 2019. A comprehensive survey of biases arising from digital data sources, collection strategies, and analytical methods. Paper
Bit by Bit — Chapter 2: Observing Behavior — Salganik, 2018. Covers limitations and biases inherent in using digital trace data for research. Book

Day 2 — Working with Data: Ethics and Practices · Tuesday, 23 June

09:00–10:30 | Panel

Ethics in Computational Social Science

Pre-reading

Bit by Bit — Chapter 6: Ethics — Salganik, 2018. A practical guide to ethical considerations in digital-age research, including informed consent and privacy. Book
National Statement on Ethical Conduct in Human Research — NHMRC, 2023. Australia's authoritative framework for research involving human participants. Guide

Tools & Platforms

FAIR Principles — Guidelines for making research data Findable, Accessible, Interoperable, and Reusable.
CARE Principles for Indigenous Data Governance — Framework centred on Collective benefit, Authority to control, Responsibility, and Ethics.

Data Donations and Participant-Centric Research

Pre-reading

A framework for privacy preserving digital trace data collection through data donation — Boeschoten et al., 2022. Introduces data donation as a methodology, covering design principles, participant experience, and infrastructure. Paper

Tools & Platforms

Data Donation (datadonation.eu) — European research hub for data donation methodology, tools, and projects.
Australian Internet Observatory (AIO) Data Donation System — A demo version of AIO's data donation platform demonstrating current functionality.

Nectar Research Cloud

Pre-reading

ARDC Nectar Research Cloud — ARDC. An introduction to Australia's national research cloud: what it offers and who can use it. Guide

Tools & Platforms

Nectar Research Cloud Dashboard — The main portal for accessing virtual machines and cloud compute.
Nectar Cloud Tutorials — Step-by-step guides for getting started with instances, storage, and networking.

Day 3 — Data Collection and Working Across Disciplines · Wednesday, 24 June

09:00–10:30 | Panel

Does Computational Social Science Lack Theory?

Pre-reading

Preface: Big Data Is Not About the Data! — Gary King, 2016. A provocative argument that the value of big data lies less in the data themselves than in the analytic methods used to make sense of them. Paper
Integrating Explanation and Prediction in Computational Social Science — Hofman et al., 2021. Argues for balancing predictive accuracy with explanatory power in CSS research. Paper

The AIReD Platform

Pre-reading

Australian Internet Observatory — About — Overview of the AIO's mission and the platforms it covers. Guide

Tools & Platforms

AIReD — Australian Internet Research Dashboard — Platform holding 500M+ records, including social media posts from Bluesky, Mastodon, YouTube, and historic X/Twitter, plus GDELT news data. Setup (requires an AAF institutional login)

13:30–15:00 | Workshop

Collecting and Analysing Data Download Packages

Pre-reading

A framework for privacy preserving digital trace data collection through data donation — Boeschoten et al., 2022. Covers design principles and infrastructure for data download package studies. Paper

Tools & Platforms

Port — Open-source framework for locally processing donated data download packages.
Data Donation (datadonation.eu) — Resources on best practices for requesting and using DDPs from major platforms.
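The privacy-preserving idea behind data download package (DDP) studies is that raw files are processed on the participant's own device and only an aggregate summary is donated. The Python sketch below illustrates that idea with a hypothetical watch-history export. It does not use Port's actual API, and the field names (`title`, `time`) and the function name are assumptions for illustration only.

```python
import json
from collections import Counter

# Hypothetical DDP excerpt: real platform exports differ in file names and fields.
SAMPLE_DDP = json.dumps([
    {"title": "Video A", "time": "2025-03-02T10:15:00Z"},
    {"title": "Video B", "time": "2025-03-18T21:40:00Z"},
    {"title": "Video C", "time": "2025-04-01T08:05:00Z"},
])

def summarise_watch_history(raw_json: str) -> dict[str, int]:
    """Reduce raw watch events to counts per month, dropping titles and exact times."""
    events = json.loads(raw_json)
    # Keep only the "YYYY-MM" prefix of each timestamp.
    months = Counter(event["time"][:7] for event in events if "time" in event)
    return dict(sorted(months.items()))

summary = summarise_watch_history(SAMPLE_DDP)
# Only this aggregate would be shown to the participant and, with consent, donated.
print(summary)
```

Discarding titles and exact timestamps before anything leaves the device is the key design choice here; a real study would adapt the parsing to each platform's export format.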

Working with Text Using Computational Techniques

Pre-reading

Text Mining with R: A Tidy Approach — Silge & Robinson, 2017. Free online book covering tokenisation, sentiment analysis, and topic modelling in R. Book
Computational Analysis of Communication — Ch. 10-12 — van Atteveldt et al., 2022. Covers text as data from preprocessing to supervised and unsupervised methods. Book

Tools & Platforms

tidytext (R) — R package for tidy text mining workflows.
quanteda (R) — Comprehensive R package for quantitative text analysis.
spaCy (Python) — Industrial-strength NLP library for tokenisation, NER, and text processing.
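Tidy text workflows start by splitting documents into tokens, one token per row, and then counting them. The standard-library Python sketch below mirrors that idea from Text Mining with R. It is not the tidytext, quanteda, or spaCy API, and the stopword list, example posts, and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def tokenise(text: str) -> list[str]:
    """Lowercase text and split it into word tokens, keeping simple contractions."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def tidy_tokens(docs: dict[str, str]) -> list[tuple[str, str]]:
    """Return one (doc_id, token) row per token, with stopwords removed."""
    return [
        (doc_id, tok)
        for doc_id, text in docs.items()
        for tok in tokenise(text)
        if tok not in STOPWORDS
    ]

# Hypothetical example posts.
docs = {
    "post1": "The ethics of data donation is debated.",
    "post2": "Data donation and screen capture complement APIs.",
}
rows = tidy_tokens(docs)
# Corpus-wide term frequencies, analogous to counting the token column.
freq = Counter(tok for _, tok in rows)
print(freq.most_common(2))
```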

Day 4 — Tools and Approaches to Data Analysis · Thursday, 25 June

09:00–10:30 | Workshop

Screen Capture for Data Collection

Pre-reading

MOAT — An overview of the Australian Internet Observatory's Mobile Ad Observatory Toolkit (MOAT) and related research. Guide

Tools & Platforms

AIO Mobile Screen Capture Tools — Research-grade mobile and browser extension tools for capturing personalised platform content (ads, feeds, recommendations).

Using LLMs to Create Data Analysis Pipelines for Text-as-Data Research

Pre-reading

PREPARATION - Steps for the session “Using LLMs to Create Data Analysis Pipelines for Text-as-Data Research”. Use these to set up your computer for the hands-on component of the workshop.
quallmer: AI-Assisted Qualitative Data Analysis — Maerz & Benoit, 2025. CRAN page for the R package that enables codebook-based, LLM-assisted text coding with built-in replication and validation. Tool
ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks — Gilardi et al., 2023. Compares the accuracy of ChatGPT's zero-shot text annotations with those of crowd workers across several annotation tasks. Paper

Tools & Platforms

quallmer (GitHub) — R package for structured LLM-assisted coding, validation, and audit trails in text-as-data research. Setup
quallmer.app — Interactive Shiny companion app for manual coding, reviewing LLM annotations, and computing agreement metrics.
ellmer (R) — Backend package for connecting to multiple LLM providers (OpenAI, Anthropic, Google, Ollama).

RAG 101

Pre-reading

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al., 2020. The seminal paper introducing the RAG framework that combines retrieval with generation. Paper
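RAG first retrieves passages relevant to a question and then passes them to a generator as context. Lewis et al. use a learned dense retriever; the Python sketch below swaps in simple bag-of-words cosine similarity instead and stops at building the prompt, so no language model is called. The passages and function names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Retrieval step: rank passages by similarity to the query and keep the top k."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation step: combine retrieved passages and the question for a generator."""
    joined = "\n".join(f"- {p}" for p in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"

# Hypothetical document store.
passages = [
    "Data donation lets participants share their data download packages with researchers.",
    "UMAP projects high-dimensional embeddings into two dimensions for visualisation.",
    "The Nectar Research Cloud provides virtual machines for Australian researchers.",
]
query = "How do participants share data download packages?"
prompt = build_prompt(query, retrieve(query, passages, k=1))
print(prompt)
```

In a full pipeline the prompt would then be sent to a language model for the generation step.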

Image Analysis for Qualitative and Quantitative Research

Pre-reading

Computational Analysis of Communication — Ch. 14: Multimedia Analysis — van Atteveldt et al., 2022. Introduces computational image analysis concepts and methods for communication researchers. Book

Tools & Platforms

Image Machine — Open-source tool for clustering visually similar images using machine vision embeddings and identifying visual patterns in large datasets.
UMAP — Uniform Manifold Approximation and Projection for dimensionality reduction, used for 2D visualisation of image similarity.

Day 5 — Disciplines, Careers, and Industry · Friday, 26 June

09:00–10:30 | Panel

Cross-Disciplinary Collaboration: Bringing Social Science and Computational Analysis Together

Pre-reading

Interdisciplinary research has consistently lower funding success — Bromham et al., 2016. An analysis of Australian Research Council grant proposals showing that the more interdisciplinary a proposal was, the less likely it was to be funded. Paper

Week 2 — Collaborative Research Projects · Tuesday, 30 June – Wednesday, 1 July

Tue 30 Jun · 10:00 | Workshop

Music Score Analysis through Natural Language Interfaces

Preparation Materials

PREPARATION - Connecting the Encoding Music MCP Server

Wed 1 Jul · 09:00 | Workshop

Validation in Computational Social Science

Pre-reading

PREPARATION - Wikibase preparation materials for Francesco Bailo's session (1 July)
Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts — Grimmer & Stewart, 2013. Foundational discussion of validation strategies including human evaluation, replication, and inter-coder reliability. Paper

Tools & Platforms

irr (R) — R package for computing inter-rater reliability statistics including Cohen's kappa and Krippendorff's alpha.
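Cohen's kappa corrects raw agreement between two coders for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed share of matching labels and p_e is the chance agreement implied by each coder's label frequencies. In R, irr provides this as `kappa2`; the Python sketch below computes the unweighted statistic by hand for two hypothetical coders.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # p_o: observed share of items on which the two coders agree.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # p_e: agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical label throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical sentiment codes from two coders on six posts.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # prints 0.667
```

Here the coders agree on 5 of 6 posts (p_o ≈ 0.83), chance agreement from their label frequencies is 0.5, so kappa is 2/3: noticeably lower than raw agreement.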

Thematic Index

Foundations & Theory

This theme covers the intellectual history, definitions, and core theoretical debates in computational social science. Participants will explore the origins of CSS, discuss the relationship between prediction and explanation, and examine critiques regarding whether CSS lacks theory. Additionally, resources under this theme address key issues of validation and research reproducibility in computational studies.

What is CSS and Why It Matters in Australia — Day 1
Does CSS Lack Theory? — Day 3
Validation in CSS — Week 2

Key resources: Lazer et al. (2009), Bit by Bit, Hofman et al. (2021)

Ethics & Research Design

Computational social science introduces ethical challenges and data quality concerns that go beyond those of traditional social research. This theme focuses on identifying and mitigating algorithmic and data biases, navigating ethical review under Australian and international guidelines, and using participant-centric methods such as privacy-preserving data donation. Participants will also learn how to align their research designs with the FAIR and CARE principles.

Bias in CSS — Day 1
Ethics in CSS — Day 2
Data Donations and Participant-Centric Research — Day 2

Key resources: Olteanu et al. (2019), NHMRC National Statement, FAIR Principles

Data Collection Methods

This theme introduces practical techniques and infrastructure for gathering digital trace data. Resources cover API-based dashboards (specifically the AIReD platform), methods for collecting and processing data download packages donated by users, and screen capture tools for observing personalised algorithmic feeds. It also covers setting up and using the Nectar Research Cloud to scale data collection.

The AIReD Platform — Day 3
Collecting and Analysing Data Download Packages — Day 3
Screen Capture for Data Collection — Day 4
Nectar Research Cloud — Day 2

Key resources: AIReD, Port, Nectar Cloud

Computational Analysis (Text, Images, LLMs)

Once data is collected, computational techniques are needed to process and analyse it at scale. This theme spans quantitative and qualitative text analysis (using tidy workflows and R/Python NLP libraries), large language models for text-as-data annotation, retrieval-augmented generation (RAG) pipelines, and computer vision methods for finding patterns in large image datasets.

Working with Text Using Computational Techniques — Day 3
Using LLMs for Text-as-Data Research (quallmer) — Day 4
RAG 101 — Day 4
Image Analysis for Qualitative and Quantitative Research — Day 4

Key resources: Text Mining with R, quallmer, Lewis et al. (2020) RAG paper, Image Machine (QUT DMRC)

Careers, Collaboration & Publishing

This theme focuses on the professional development of computational social scientists. It provides guides and advice on building successful interdisciplinary collaborations that bridge computer science and social research, working with industry and public policy partners, writing grants for CSS projects, and demystifying the publishing process in cross-disciplinary journals.

Cross-Disciplinary Collaboration — Day 5
Working With and In the Industry — Day 5
Career Success — Day 5
Grant Writing in CSS — Day 5
Demystifying Publishing in CSS — Day 2

Key resources: Bromham et al. (2016), Lazer et al. (2020)

The Australian Internet Observatory (https://doi.org/10.25956/twvn-ca19) is a co-investment partnership with RMIT University, QUT, University of Queensland, University of Melbourne, Swinburne University, Deakin University and the Australian Research Data Commons (ARDC) through the HASS and Indigenous Research Data Commons (DOI:10.3565/hjrp-b141). The ARDC is enabled by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS).
