Collect & Manage
Organize Your Research Workflow
Data collection and management aren’t just administrative tasks—they’re the foundation of reproducible research. This phase covers practices for capturing, organizing, documenting, and maintaining research outputs throughout your project.
What you’ll find here: Strategies for FAIR data management, tools for version control and documentation, and approaches to organizing research materials effectively.
- FAIR Data Management (2h) - Managing data following FAIR principles
- Data Documentation & Validation in R (2h) - Create codebooks, validate data, and ensure quality
- Data Sharing (1h) - Lecture on sharing research data
- Report Detailed Methods & Protocols (30 min) - ReproducibiliTeach lecture
- Write Reusable Protocols (30 min) - ReproducibiliTeach lecture
- RDMkit - Discipline-specific data management guidance
Core Data Management Activities
Effective data management involves several interconnected practices.
Organization & Storage
Common practices:
- Consistent file naming conventions
- Logical directory structures
- Regular backups (3-2-1 rule)
- Secure storage solutions
Documentation
Researchers often maintain:
- Electronic lab notebooks
- Data dictionaries and codebooks
- README files for datasets
- Metadata standards
Version Control
Essential for tracking:
- Code and script changes
- Document revisions
- Collaborative workflows
- Analysis reproducibility
Quality & Security
Maintaining integrity through:
- Quality control procedures
- Data validation checks
- Access controls
- Anonymization protocols
Data Management Approaches
Different aspects of data management require different tools and strategies.
FAIR Data Management (2h) - Learn how to organize, document, and manage research data following FAIR principles.
Applying FAIR principles ensures your data is useful beyond your immediate project.
The FAIR Framework:
Findable
- Persistent identifiers (DOIs)
- Rich metadata descriptions
- Searchable repositories
Accessible
- Standard retrieval protocols
- Clear access procedures
- Metadata remain accessible even when the data are no longer available
Interoperable
- Standard data formats
- Controlled vocabularies
- Common terminologies
Reusable
- Rich documentation
- Clear usage licenses
- Provenance information
Naming Conventions
Consistent file naming makes data findable and understandable.
Best practices:
- Use descriptive, meaningful names
- Include dates (YYYY-MM-DD format)
- Avoid spaces (use underscores or hyphens)
- Use version numbers (v01, v02)
- Keep names short but informative
Example:
2025-01-15_experiment-A_participant-001_v02.csv
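File names following this pattern can be generated programmatically, which keeps the convention consistent across a project. A minimal sketch in base R (the experiment label, participant number, and version are hypothetical):

```r
# Assemble a standardized file name: date_experiment_participant_version.ext
date_str  <- format(Sys.Date(), "%Y-%m-%d")   # ISO 8601 date, sorts chronologically
file_name <- sprintf("%s_experiment-A_participant-%03d_v%02d.csv",
                     date_str, 1, 2)
file_name
#> [1] "2025-01-15_experiment-A_participant-001_v02.csv"  (date varies by day)
```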
Directory Structure & Backups
Organization patterns:
- By data type (raw/, processed/, analyzed/)
- By date or session (2025-01/, 2025-02/)
- By subject or condition (control/, treatment/)
3-2-1 Backup Rule:
- 3 copies of your data (original + 2 backups)
- 2 different media types (local drive + cloud)
- 1 copy stored off-site (cloud or separate location)
Never rely on a single device for your only copy of research data.
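A directory skeleton can be created once at project start so the structure stays consistent. A minimal sketch in R, assuming the type-based layout above:

```r
# Create a type-based project skeleton; showWarnings = FALSE makes it idempotent
dirs <- c("data/raw", "data/processed", "data/analyzed", "docs", "scripts")
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
```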
README Files
README files are essential for understanding your data and code.
What to include:
- Project title and description
- Authors and date of collection
- File organization explanation
- Data collection methods
- Variable definitions
- Known issues or limitations
Write README files for your future self—you’ll return months or years later.
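One way to make sure a README exists is to generate a skeleton when the project is set up. A minimal sketch in R; the placeholder headings mirror the list above:

```r
# Write a README skeleton for the project (fill in the sections by hand)
writeLines(c(
  "# Project title and description",
  "",
  "## Authors and date of collection",
  "## File organization",
  "## Data collection methods",
  "## Variable definitions",
  "## Known issues or limitations"
), "README.md")
```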
Data Dictionaries
Data dictionaries define all variables in your dataset.
Essential elements:
- Variable name (as it appears in data)
- Full variable label/description
- Data type (numeric, string, date)
- Allowed values or ranges
- Units of measurement
- Missing data codes
Example:
Variable: age_years
Description: Participant age at data collection
Type: Numeric
Range: 18-65
Units: Years
Missing: -99
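A data dictionary can be kept as a plain table that travels with the data. A minimal sketch in R, using the example entry above (all values are illustrative):

```r
# One row per variable; a CSV stays readable without special software
dictionary <- data.frame(
  variable    = "age_years",
  description = "Participant age at data collection",
  type        = "numeric",
  range       = "18-65",
  units       = "years",
  missing     = "-99"
)
write.csv(dictionary, "data-dictionary.csv", row.names = FALSE)
```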
Data Documentation & Validation in R (2h) - Create codebooks automatically, validate data against expectations, and ensure data quality using R packages like codebook and pointblank.
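As a taste of what the course covers, here is a minimal validation sketch with pointblank; the data frame and column names are hypothetical, and real checks should mirror your data dictionary:

```r
library(pointblank)

# Hypothetical dataset matching the dictionary entry above
dat <- data.frame(participant_id = c("P001", "P002"),
                  age_years      = c(25, 41))

agent <- create_agent(tbl = dat) |>
  col_vals_not_null(columns = participant_id) |>   # completeness
  col_vals_between(columns = age_years,
                   left = 18, right = 65) |>       # allowed range
  rows_distinct(columns = participant_id) |>       # no duplicate IDs
  interrogate()

agent  # prints a validation report with pass/fail per check
```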
Version Control
Version control systems track changes to files over time, enabling collaboration and reproducibility.
Introduction to Git and GitHub (3h) - Learn version control fundamentals for research projects.
Born Open Data
Data that is uploaded automatically to a repository (e.g., GitHub) as it is collected, together with timestamps and automatically generated logs, is called born open.
Advantages:
- Full openness and transparency from the start
- Built-in data management through version control
- Simplified data sharing at publication
- Complete audit trail of changes
Resources:
- Rouder (2016) - The what, why, and how of born-open data
- Rouder, Haaf & Snyder (2018) - Minimizing Mistakes in Psychological Science
Born-open workflows work best with non-sensitive data. For human participant data, consider pseudonymization before automatic uploads.
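Rouder (2016) automates the upload with scheduled scripts. A minimal sketch of the same idea in R with the gert package, assuming a repository with a remote named origin and a configured Git identity; schedule it with cron or Task Scheduler:

```r
library(gert)

# Stage any new raw data files and push them with a timestamped message
new_files <- list.files("data/raw", full.names = TRUE)
git_add(new_files)
git_commit(paste("Automated data upload:", Sys.time()))  # errors if nothing changed
git_push(remote = "origin")
```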
Why use version control:
- Track all changes to files
- Revert to previous versions
- Collaborate without conflicts
- Maintain parallel versions
- Document why changes were made
- Backup all project history
- Share code reliably
- Enable reproducibility
Common workflows:
- Code and scripts: Track all analysis code
- Documentation: Version control for manuscripts, protocols
- Small data files: Track metadata, data dictionaries
- Configuration files: Manage software parameters
Large data files (>100MB) should not be stored directly in Git. Use Git LFS or data repositories instead.
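A typical first workflow, sketched in R with the gert package (file and project names are hypothetical; the command-line equivalents are git init, git add, git commit, and git log):

```r
library(gert)

repo <- git_init("my-analysis")          # create a new repository
setwd(repo)
writeLines("# data cleaning code", "clean_data.R")
git_add("clean_data.R")                  # stage the file
git_commit("Add data cleaning script")   # assumes user.name/user.email are configured
git_log(max = 5)                         # inspect recent history
```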
Quality Control
Quality control identifies errors and issues before analysis.
Common QC checks (sketched in code after this list):
- Completeness: Check for missing data
- Range checks: Verify values within expected ranges
- Consistency: Check logical relationships
- Duplicates: Identify duplicate records
- Format: Ensure consistent data types
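These checks take only a few lines of base R. A minimal sketch, assuming a data frame dat with the variables used earlier:

```r
# Quick QC pass over a hypothetical data frame `dat`
colSums(is.na(dat))                   # completeness: missing values per column
range(dat$age_years, na.rm = TRUE)    # range check: within 18-65?
sum(duplicated(dat$participant_id))   # duplicates: should be 0
str(dat)                              # format: consistent data types
```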
Document QC decisions:
- Procedures and criteria
- Cases flagged or excluded
- Reasons for exclusions
- Date and reviewer
Data Security & Anonymization
Access Control:
- Use permission levels (read-only, read-write, admin)
- Limit access to sensitive data
- Remove access when collaborators leave
- Use institutional authentication systems
Anonymization for human participant data:
Remove personally identifiable information (PII):
- Names, addresses, contact information
- Dates of birth, ages over 89
- Geographic identifiers smaller than state
- Social security numbers, medical records
- Facial features in photographs
Anonymization approaches:
- De-identification: Remove direct identifiers
- Pseudonymization: Replace with codes
- Aggregation: Report group-level data
- Perturbation: Add noise to prevent re-identification
Test anonymization on sample data first. Have a colleague review for remaining identifiers.
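As one concrete example, pseudonymization can be as simple as replacing a direct identifier with a code and storing the linkage key separately. A minimal sketch in R (column names are hypothetical; indirect identifiers still need review, e.g. with sdcMicro):

```r
# Replace a direct identifier with pseudonym codes
dat$pid  <- sprintf("P%03d", seq_len(nrow(dat)))  # P001, P002, ...
key      <- dat[, c("name", "pid")]               # linkage key: store securely, apart from the data
dat$name <- NULL                                  # drop the direct identifier
```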
Tools & Resources
Documentation & Lab Notebooks
Tools for documenting research processes and maintaining lab records:
eLabFTW
Open-source electronic lab notebook
Free, self-hosted ELN with timestamping, templates, and database features. Ideal for documenting experiments and protocols.
Quarto
Reproducible documents combining code and narrative
Tutorial available
Create data collection protocols, README files, and documentation. Supports R, Python, Julia. Renders to multiple formats including HTML, PDF, and Word.
Jupyter Notebooks
Interactive computational notebooks
Combine code, output, and narrative text. Supports Python, R, Julia. Useful for documenting data processing workflows.
Version Control Systems
Platforms for tracking changes and collaborating on code and documents:
LRZ GitLab
Institutional Git hosting for LMU
Supported at LMU
Private repositories hosted by LRZ. Use LMU credentials. Suitable for active research projects.
GitHub
Popular Git hosting with collaboration features
Tutorial available
Free public repositories, GitHub Actions for automation, extensive integrations.
OSF
Research project management platform
Combines version control, storage, and collaboration. Integrates with GitHub, Dropbox, and other services.
Data Storage & Backup
Services for secure storage, synchronization, and backup of research data:
LRZ Sync+Share
LMU cloud storage service
Supported at LMU
50GB+ storage per user. GDPR-compliant. Desktop and mobile sync clients available.
LRZ DSS
Long-term archival storage
Supported at LMU
Tape-based backup for large datasets. Contact LRZ for access and quotas.
re3data
Registry of data repositories
Search to find repositories suited to your data type and discipline.
Anonymization Tools
Tools to help protect participant privacy while enabling data sharing:
Amnesia
Web-based anonymization application
Developed by OpenAIRE. Provides k-anonymity and other privacy-preserving transformations through a browser interface.
ARX
Comprehensive data anonymization tool
Open-source software supporting k-anonymity, l-diversity, t-closeness, and differential privacy. Desktop application with GUI.
sdcMicro
R package for statistical disclosure control
Implements various anonymization methods. Useful for microdata protection and risk assessment.
Maintaining Privacy with Open Data - Workshop by Ruben Arslan on anonymization strategies for behavioral research data.
LMU University Library Data Management Services provides guidance on organizing and storing research data.
- Contact: rdm@ub.uni-muenchen.de
- Services: Storage recommendations, metadata guidance, repository selection, DMP support
Data Management Checklist
Throughout Data Collection:
- Follow your file naming convention and directory structure
- Back up data following the 3-2-1 rule
- Keep README files and data dictionaries current
- Commit code and documentation changes to version control
- Run quality control checks as data arrive
Before Moving to Analysis:
- Verify completeness, ranges, duplicates, and formats one final time
- Document QC decisions, exclusions, and their reasons
- Anonymize or pseudonymize human participant data before sharing
- Confirm access controls and backups are in place