Training Resources
This page compiles pre-reading, tools, and further reading for each confirmed session in the SICSS-Melbourne 2026 program. Use it to prepare before each day and to explore topics that interest you after the institute.
General Resources
- Bit by Bit: Social Research in the Digital Age — Matthew J. Salganik, 2018. A comprehensive, open-access introduction to computational social science covering surveys, experiments, mass collaboration, and ethics. Book
- Computational Social Science — David Lazer et al., 2009. The foundational paper arguing for a field that leverages large-scale digital data to understand human behaviour. Paper
- National Statement on Ethical Conduct in Human Research — NHMRC / ARC / Universities Australia, 2023 (updated 2025). Australia's core framework for research ethics, essential for anyone working with human or digital trace data. Guide
- Computational Analysis of Communication — Wouter van Atteveldt, Damian Trilling & Carlos Arcila, 2022. An open-access textbook with R and Python code covering text, network, and image analysis. Participants are not expected to know R or Python, but it is worth finding out what they are and learning some foundational terms. Book
- R for Data Science — Hadley Wickham, Mine Çetinkaya-Rundel & Garrett Grolemund, 2nd ed. A practical, free introduction to data science with R and the tidyverse for researchers with little coding experience, and a useful starting point for the key aspects of working with data in R. Tutorial
- Introduction to Cultural Analytics & Python — Melanie Walsh, 2021. A free online textbook covering Python basics, text analysis, and social media data specifically for humanities and social science scholars. Tutorial
Session-Related Resources
11:00–12:30 | Keynote Dialogue
What is Computational Social Science and Why It Matters in Australia?
Pre-reading
Tools & Platforms
Further reading
15:15–16:30 | Keynote
Social Bias in Computational Social Science
Pre-reading
Further reading
09:00–10:30 | Panel
Ethics in Computational Social Science
Pre-reading
Tools & Platforms
Further reading
11:00–12:30 | Workshop
Data Donations and Participant-Centric Research
Pre-reading
Tools & Platforms
Further reading
15:00–16:00 | Talk
Nectar Research Cloud
Pre-reading
Tools & Platforms
Further reading
09:00–10:30 | Panel
Does Computational Social Science Lack Theory?
Pre-reading
Further reading
11:00–12:30 | Talk
The AIReD Platform for Australia-wide Social Media Discovery and Usage
Pre-reading
Tools & Platforms
13:30–15:00 | Workshop
Collecting and Analysing Data Download Packages
Pre-reading
Tools & Platforms
- Port — Open-source framework for locally processing donated data download packages.
- Data Donation (datadonation.eu) — Resources on best practices for requesting and using DDPs from major platforms.
Further reading
15:30–17:00 | Workshop
Working with Text Using Computational Techniques
Pre-reading
Tools & Platforms
- tidytext (R) — R package for tidy text mining workflows.
- quanteda (R) — Comprehensive R package for quantitative text analysis.
- spaCy (Python) — Industrial-strength NLP library for tokenisation, NER, and text processing.
Further reading
09:00–10:30 | Workshop
Screen Capture for Data Collection
Pre-reading
- MOAT — Description of the Australian Internet Observatory's Mobile Ad Observatory Toolkit (MOAT) and relevant work conducted. Guide
Tools & Platforms
- AIO Mobile Screen Capture Tools — Research-grade mobile and browser extension tools for capturing personalised platform content (ads, feeds, recommendations).
Further reading
11:00–12:30 | Workshop
Using LLMs to Create Data Analysis Pipelines for Text-as-Data Research
Pre-reading
Tools & Platforms
- quallmer (GitHub) — R package for structured LLM-assisted coding, validation, and audit trails in text-as-data research. Setup
- quallmer.app — Interactive Shiny companion app for manual coding, reviewing LLM annotations, and computing agreement metrics.
- ellmer (R) — Backend package for connecting to multiple LLM providers (OpenAI, Anthropic, Google, Ollama).
Further reading
13:30–14:30 | Workshop
RAG 101
Pre-reading
Tools & Platforms
- LangChain — Popular Python framework for building RAG pipelines, with pre-built modules for document loading, embedding, and retrieval. Setup
- LlamaIndex — Framework for connecting LLMs to external data sources, purpose-built for RAG applications.
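For orientation, the retrieve-then-prompt idea behind RAG can be sketched in plain Python. This is a toy illustration under stated assumptions, not the LangChain or LlamaIndex API: the function names (`tokenize`, `retrieve`, `build_prompt`), the sample documents, and the word-count similarity are all hypothetical stand-ins. Real pipelines typically use dense embeddings and a vector store, and then send the assembled prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical mini-corpus, for illustration only.
documents = [
    "Data donation lets participants share their platform data download packages with researchers.",
    "UMAP projects high-dimensional embeddings into two dimensions for visualisation.",
    "Cohen's kappa measures agreement between two coders beyond chance.",
]
query = "How do participants share platform data with researchers?"
top = retrieve(query, documents, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

The generation step is deliberately left out: in a full pipeline, `prompt` would be passed to whichever LLM provider the project uses.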
Further reading
14:45–16:15 | Workshop
Image Analysis for Qualitative and Quantitative Research
Pre-reading
Tools & Platforms
- Image Machine — Open-source tool for clustering visually similar images using machine vision embeddings and identifying visual patterns in large datasets.
- UMAP — Uniform Manifold Approximation and Projection for dimensionality reduction, used for 2D visualisation of image similarity.
Further reading
09:00–10:30 | Panel
Cross-Disciplinary Collaboration: Bringing Social Science and Computational Analysis Together
Pre-reading
Further reading
Wed 1 Jul · 09:00 | Workshop
Validation in Computational Social Science
Pre-reading
Tools & Platforms
- irr (R) — R package for computing inter-rater reliability statistics including Cohen's kappa and Krippendorff's alpha.
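As a rough illustration of what an inter-rater statistic measures, Cohen's kappa for two coders can be computed by hand as (observed agreement − chance agreement) / (1 − chance agreement). The sketch below is plain Python rather than the irr package, and the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels: a human coder versus an automated classifier.
human = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
model = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohens_kappa(human, model)
print(round(kappa, 3))
```

Here the two coders agree on 8 of 10 items, while chance agreement from the marginals is 0.5, so kappa is about 0.6. Raw agreement alone (0.8) would overstate reliability.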
Further reading
Thematic Index
Foundations & Theory
This theme covers the intellectual history, definitions, and core theoretical debates in computational social science. Participants will explore the origins of CSS, discuss the relationship between prediction and explanation, and examine critiques regarding whether CSS lacks theory. Additionally, resources under this theme address key issues of validation and research reproducibility in computational studies.
- What is CSS and Why It Matters in Australia — Day 1
- Does CSS Lack Theory? — Day 3
- Validation in CSS — Week 2
Key resources: Lazer et al. (2009), Bit by Bit, Hofman et al. (2021)
Ethics & Research Design
Computational social science introduces unique ethical challenges and data quality concerns that go beyond traditional social research. This theme focuses on identifying and mitigating algorithmic and data biases, navigating ethical review processes under Australian and international guidelines, and implementing participant-centric methodologies such as privacy-preserving data donations. Participants will learn how to align their research designs with the FAIR and CARE principles.
- Bias in CSS — Day 1
- Ethics in CSS — Day 2
- Data Donations and Participant-Centric Research — Day 2
Key resources: Olteanu et al. (2019), NHMRC National Statement, FAIR Principles
Data Collection Methods
This theme introduces practical techniques and infrastructure for gathering digital trace data. Resources cover the use of API-based dashboards (specifically the AIReD platform), methodologies for collecting and processing user-donated data download packages, and screen capture tools for observing personalised algorithmic feeds. It also covers the setup and use of the Nectar Research Cloud for scaling data collection.
- The AIReD Platform — Day 3
- Collecting and Analysing Data Download Packages — Day 3
- Screen Capture for Data Collection — Day 4
- Nectar Research Cloud — Day 2
Key resources: AIReD, Port, Nectar Cloud
Computational Analysis (Text, Images, LLMs)
Once data is collected, computational techniques are required to process and analyse it at scale. This theme spans quantitative and qualitative text analysis (using tidy workflows and R/Python NLP libraries), the deployment of large language models for text-as-data annotation, retrieval-augmented generation (RAG) pipelines, and computer vision methodologies for pattern discovery in large image datasets.
- Working with Text Using Computational Techniques — Day 3
- Using LLMs for Text-as-Data Research (Quallmer) — Day 4
- RAG 101 — Day 4
- Image Analysis for Qualitative and Quantitative Research — Day 4
Key resources: Text Mining with R, quallmer, Lewis et al. (2020) RAG paper, Image Machine (QUT DMRC)
Careers, Collaboration & Publishing
This theme focuses on the professional development of computational social scientists. It provides guides and advice on building successful interdisciplinary collaborations that bridge computer science and social research, working with industry and public policy partners, writing grants for CSS projects, and demystifying the publishing process in cross-disciplinary journals.
- Cross-Disciplinary Collaboration — Day 5
- Working With and In the Industry — Day 5
- Career Success — Day 5
- Grant Writing in CSS — Day 5
- Demystifying Publishing in CSS — Day 2
Key resources: Bromham et al. (2016), Lazer et al. (2020)

The Australian Internet Observatory (https://doi.org/10.25956/twvn-ca19) is a co-investment partnership with RMIT University, QUT, University of Queensland, University of Melbourne, Swinburne University, Deakin University and the Australian Research Data Commons (ARDC) through the HASS and Indigenous Research Data Commons (DOI:10.3565/hjrp-b141). The ARDC is enabled by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS).