Analyze & Collaborate
Build Reproducible Analysis Workflows
Analysis and collaboration transform data into knowledge—but only if your work can be understood, verified, and built upon. This phase covers practices for creating transparent, reproducible analyses and collaborating effectively with your research team.
What you’ll find here: Approaches to computational reproducibility, tools for literate programming, strategies for code documentation, and platforms for collaborative workflows.
- Introduction to R (3h) - Statistical programming fundamentals
- Version Control with Git (1.5h) - Git within RStudio
- Collaborative GitHub (1h) - Team workflows
- Writing Readable Code (45 min) - Clean code practices
- Introduction to Quarto (2h) - Literate programming
- R Package Management with renv (1h) - Manage dependencies
- Git Branching and Merging (1h) - Advanced Git workflows (optional)
Core Analysis Activities
Reproducible analysis requires attention to multiple aspects of your workflow.
Computational Workflows
Creating traceable analyses:
- Scripted workflows (not point-and-click)
- Automated pipelines
- Environment management
- Dependency tracking
Code Documentation
Making code understandable:
- Inline comments explaining decisions
- Function documentation
- Workflow documentation (README)
- Analysis narratives
Collaboration
Working effectively with teams:
- Shared code repositories
- Code review processes
- Collaborative writing platforms
- Communication tools
Quality Assurance
Ensuring analysis quality:
- Code testing and validation
- Peer code review
- Reproducibility checks
- Statistical consulting
Reproducible Analysis Frameworks
Different research contexts require different reproducibility approaches.
Computational reproducibility ensures others can rerun your analysis and obtain the same results.
Core principles (an R sketch applying them follows the list):
Scripted Workflows
- Avoid manual data manipulation
- Use scripts instead of point-and-click
- Document all analysis steps
Environment Control
- Track software versions
- Manage package dependencies
- Use containers when appropriate
Organized Structure
- Consistent project organization
- Relative (not absolute) paths
- Clear file naming
Random Seeds
- Set seeds for random processes
- Document stochastic procedures
- Enable exact replication
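A minimal R sketch applying these principles, assuming a project that uses the here package and a data/ directory (file and column names are illustrative):

# 01_preprocessing.R -- skeleton applying the principles above
library(here)    # paths relative to the project root, never absolute

set.seed(2024)   # fix the seed so stochastic steps replicate exactly

raw <- read.csv(here("data", "raw", "survey.csv"))

# All manipulation is scripted -- no manual edits to the raw file
clean <- subset(raw, !is.na(outcome))

write.csv(clean, here("data", "processed", "survey_clean.csv"),
          row.names = FALSE)

sessionInfo()    # record R and package versions alongside the run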
Inline Comments
Comments explain why you made decisions, not just what the code does; a short example follows the lists below.
What to comment:
- Rationale for methodological choices
- Explanation of complex algorithms
- Context for non-obvious code
- Known limitations or workarounds
- Citations to methods or papers
What not to comment:
- Obvious code
- Outdated information
- Sensitive information
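A short R illustration of the difference (the exclusion rule and its rationale are invented for the example):

# Bad: restates what the code already says
rt <- rt[rt > 0]  # keep values of rt greater than zero

# Good: records reasoning the reader cannot infer from the code
# Reaction times <= 0 are logging artifacts of the hardware timer;
# this exclusion was prespecified in the analysis plan (section 2.3).
rt <- rt[rt > 0]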
README Files
README files are essential for understanding how to use your code repository.
Required elements:
- Project title and description
- How to run the analysis: Step-by-step instructions
- Software requirements: Language versions, dependencies
- Input data: Where to obtain or how to access
- Output: What files are produced
- Project structure: Explanation of directories
Example:
# Analysis of Treatment Effects
## Requirements
- R version 4.3+
- Packages listed in renv.lock
## Running the Analysis
1. Install dependencies: `renv::restore()`
2. Run scripts in order: 01_preprocessing.R, 02_analysis.R
Literate Programming
Literate programming combines code, results, and narrative text in a single document.
Introduction to Quarto (2h) - Create reproducible documents combining code, results, and prose.
Common tools:
Computational notebooks:
- Jupyter Notebooks (Python, R, Julia)
- R Markdown / Quarto
- Observable (JavaScript)
Benefits:
- Results auto-update with code changes
- Combine analysis and reporting
- Output to multiple formats
- Facilitate reproducibility
What to include in analysis notebooks (sketched below):
- Introduction: Research question
- Methods: Statistical procedures
- Data loading: Source and preprocessing
- Main analysis: Tests and models
- Results: Tables and figures
- Session info: Software versions
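A minimal Quarto skeleton covering these elements (title, paths, and model are placeholders):

---
title: "Treatment Effects Analysis"
format: html
---

## Introduction
Research question and hypotheses.

## Data Loading
```{r}
data <- read.csv("data/processed/survey_clean.csv")
```

## Main Analysis
```{r}
model <- lm(outcome ~ treatment, data = data)
summary(model)
```

## Session Info
```{r}
sessionInfo()
```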
Git Best Practices
Version control is essential for tracking changes to analysis code.
Key practices:
- Commit frequently with logical changesets
- Write meaningful commit messages
- Never commit large data files
- Never commit sensitive information
- Use branches for experimental analyses
- Tag releases for publications
Good commit message:
Add power analysis for sample size justification
- Implemented simulation-based power calculation
- Tested effect sizes (d = 0.3, 0.5, 0.8)
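To act on "never commit large data files" and "never commit sensitive information", a starter .gitignore along these lines can help (the patterns are illustrative, not exhaustive):

# .gitignore -- keep data and secrets out of version control
data/
*.rds
.Renviron       # may hold API keys or credentials
.env
renv/library/   # restored from renv.lock, not committed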
Collaboration & Review
Collaboration workflows:
- Centralized workflow: Simple, one main branch
- Feature branch workflow: Each analysis on separate branch
- Pull request review: Review before merging
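A command-line sketch of the feature branch workflow (branch and file names are illustrative):

git checkout -b sensitivity-analysis   # separate branch for an experimental analysis
git add 03_sensitivity.R
git commit -m "Add sensitivity analysis for missing-data assumptions"
git push -u origin sensitivity-analysis
# open a pull request so the changes are reviewed before merging into main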
Code review checklist:
What reviewers check:
- Functionality and correctness
- Reproducibility
- Readability and organization
- Best practices
Analysis Planning
Plan analyses before looking at outcome data to avoid biases.
Analysis plans should specify:
- Statistical tests to be used
- Variables and transformations
- Handling of missing data
- Outlier criteria and treatment
- Multiple comparisons corrections
Exploratory vs. confirmatory:
- Confirmatory: Tests specified in preregistration
- Exploratory: Additional analyses not prespecified
- Clearly label which is which
Quality Assurance
Code quality practices:
- Follow style guides (Tidyverse for R, PEP 8 for Python)
- Use automated styling tools (styler, black, lintr)
- Consistent naming conventions
- Descriptive variable names
- Limit line length (80-100 characters)
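In R, the styling and linting tools mentioned above can be run per file; a minimal sketch (the file name is a placeholder):

# Rewrite the file in place to match the tidyverse style guide
styler::style_file("02_analysis.R")

# Report style problems and likely mistakes without changing the file
lintr::lint("02_analysis.R")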
Statistical consulting:
Consider consulting for complex designs, advanced methods, or results interpretation.
StaBLab (LMU Statistical Consulting Unit) provides expert guidance on statistical analyses.
- Contact: kontakt@stablab.stat.uni-muenchen.de
- Services: Analysis planning, method selection, power analysis
Research Integrity & Avoiding Bias
Questionable research practices (QRPs) can inflate false positive rates and reduce reproducibility.
Common QRPs to avoid:
- p-hacking: Running multiple analyses and reporting only significant ones
- HARKing: Hypothesizing After Results are Known
- Selective reporting: Omitting null results or failed experiments
- Optional stopping: Stopping data collection when results become significant
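To see why optional stopping inflates false positives, here is a small illustrative R simulation under a true null hypothesis (sample sizes, step size, and replication count are arbitrary choices):

# Test after every batch of 10 observations; stop as soon as p < .05
optional_stopping <- function(n_start = 20, n_max = 100, step = 10) {
  x <- rnorm(n_start)              # data from a true null (mean = 0)
  while (length(x) <= n_max) {
    if (t.test(x)$p.value < 0.05) return(TRUE)  # "significant" -> stop
    x <- c(x, rnorm(step))         # otherwise collect more data
  }
  FALSE
}

set.seed(42)
mean(replicate(2000, optional_stopping()))
# well above the nominal 5% false positive rate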
Prevention strategies:
- Preregister hypotheses and analysis plans
- Distinguish confirmatory from exploratory analyses
- Report all conducted analyses
- Use Registered Reports format
Resources:
- p-Hacking interactive demo - See how easy it is to find “significant” results
- Big Little Lies Shiny App - Simulation of p-hacking strategies by Angelika Stefan & Felix Schönbrodt
For detailed guidance on avoiding bias, see the PRO Initiative guidelines for making analyses public.
Tools & Resources
Statistical Software
Open-source tools for transparent, reproducible statistical analysis:
R / RStudio
Statistical programming environment
Tutorial available
Free, open-source language for statistics and data science. Extensive package ecosystem. RStudio IDE provides integrated development environment.
Python
General-purpose programming with data science libraries
Powerful libraries like NumPy, pandas, scikit-learn. Popular in machine learning and computational research.
JASP
GUI for statistical analysis
Free, open-source alternative to SPSS. Bayesian and frequentist analyses. Produces reproducible output.
- R for Data Science - Free online book by Hadley Wickham
- Tidy Data - Foundational paper on data organization
- Swirl - Interactive R learning within RStudio
- RStudio Cheat Sheets - Quick reference guides
Computational Notebooks
Tools for combining code, output, and narrative in reproducible documents:
Quarto
Reproducible documents combining code and narrative
Tutorial available
Create reproducible manuscripts, presentations, and reports. Supports R, Python, Julia. Renders to multiple formats including HTML, PDF, and Word.
Jupyter Notebooks
Interactive computational notebooks
Web-based notebooks combining code, output, and markdown. JupyterLab provides full IDE. Share via JupyterHub or export formats.
Observable
JavaScript notebooks for data visualization
Reactive notebooks with powerful visualization libraries. Ideal for interactive data exploration and communication.
Collaborative Platforms
GitHub
Code hosting and collaboration
Tutorial available
Public and private repositories. Pull request workflows. GitHub Actions for automation. Issue tracking and project boards.
LRZ GitLab
Institutional Git hosting
Supported at LMU
Private repositories for LMU research. CI/CD pipelines. Integrated issue tracking and code review.
OSF
Research project management
Combines storage, version control, and collaboration. Preregistration support. DOIs for projects and components.
Environment Management
Tools to ensure your code runs the same way everywhere:
renv
R package management
Tutorial available
Isolate package dependencies per project. Create reproducible R environments. Works with RStudio projects.
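A typical renv workflow, as a minimal sketch:

renv::init()       # project-local library plus renv.lock
install.packages("lme4")   # install packages as usual
renv::snapshot()   # record exact versions in renv.lock
renv::restore()    # collaborators recreate the same environment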
Conda
Python environment manager
Manage Python packages and environments. Cross-platform. Includes scientific computing packages.
Docker
Containerization for full reproducibility
Package entire computational environment. Ensures exact software versions. Ideal for complex workflows.
Cloud Computing Environments
Avoid the “works on my machine” problem by running code in cloud environments:
Posit Cloud
RStudio in the browser
Run R and RStudio entirely in the cloud. Share projects with collaborators. Free tier available for teaching and small projects.
Code Ocean
Computational reproducibility platform
Create reproducible “compute capsules” with code, data, and environment. Supports R, Python, Julia, and more. DOIs for computational workflows.
Google Colab
Free Jupyter notebooks with GPU
Run Python notebooks in Google’s cloud. Free GPU/TPU access for machine learning. Easy sharing via Google Drive.
Writing & Communication
Tools for collaborative writing and team communication:
Overleaf
Collaborative LaTeX editor in the browser
Real-time collaboration, track changes, comments, and version history. Free tier available. Many journal templates included.
PaperHive
Collaborative annotation and discussion
Annotate and discuss research papers collaboratively. Works with PDFs and supports public or private discussions.
LMU Chat (Matrix)
Decentralized, secure team messaging
Supported at LMU
Open-source, federated messaging protocol. LMU provides Matrix hosting for secure, GDPR-compliant team communication with channels, direct messages, and file sharing.
LRZ Compute Cloud provides computational resources for data-intensive analyses.
- Services: Virtual machines, high-performance computing, storage
- Access: Available to LMU researchers
- Support: LRZ Service Desk
Analysis & Collaboration Checklist
Before Starting Analysis:
- Write an analysis plan specifying tests, variables, missing-data handling, and outlier criteria
- Preregister hypotheses and distinguish confirmatory from exploratory analyses
- Set up a version-controlled project with a consistent structure and a managed environment (e.g., renv)
During Analysis:
- Use scripted workflows; avoid manual data manipulation
- Set seeds for random processes and document stochastic procedures
- Comment the rationale behind methodological decisions
- Commit frequently with meaningful messages; keep data and secrets out of the repository
Before Sharing:
- Rerun the full analysis from scratch to confirm it reproduces
- Update the README with requirements, run instructions, and project structure
- Record session info and pin package versions (renv.lock)
- Report all conducted analyses, clearly labeling exploratory ones