


Collect & Manage

Implement FAIR data management, organize research outputs, maintain version control, and ensure quality

Organize Your Research Workflow

Data collection and management aren’t just administrative tasks—they’re the foundation of reproducible research. This phase covers practices for capturing, organizing, documenting, and maintaining research outputs throughout your project.

What you’ll find here: Strategies for FAIR data management, tools for version control and documentation, and approaches to organizing research materials effectively.

  • Download RDM Checklist
  • View Best Practices Guide


Note

Learn More: Data Management & Sharing
  • FAIR Data Management (2h) - Managing data following FAIR principles
  • Data Documentation & Validation in R (2h) - Create codebooks, validate data, and ensure quality
  • Data Sharing (1h) - Lecture on sharing research data
  • Report Detailed Methods & Protocols (30 min) - ReproducibiliTeach lecture
  • Write Reusable Protocols (30 min) - ReproducibiliTeach lecture
  • RDMkit - Discipline-specific data management guidance

Core Data Management Activities

Effective data management involves multiple interconnected practices working together.

Organization & Storage

Common practices:

  • Consistent file naming conventions
  • Logical directory structures
  • Regular backups (3-2-1 rule)
  • Secure storage solutions

Documentation

Researchers often maintain:

  • Electronic lab notebooks
  • Data dictionaries and codebooks
  • README files for datasets
  • Metadata standards

Version Control

Essential for tracking:

  • Code and script changes
  • Document revisions
  • Collaborative workflows
  • Analysis reproducibility

Quality & Security

Maintaining integrity through:

  • Quality control procedures
  • Data validation checks
  • Access controls
  • Anonymization protocols

Data Management Approaches

Different aspects of data management require different tools and strategies.

Note

Learn More: FAIR Data Management

FAIR Data Management (2h) - Learn how to organize, document, and manage research data following FAIR principles.

  • FAIR Principles
  • File Organization
  • Documentation
  • Version Control
  • Quality & Security

Applying FAIR principles ensures your data is useful beyond your immediate project.

The FAIR Framework:

Findable

  • Persistent identifiers (DOIs)
  • Rich metadata descriptions
  • Searchable repositories

Accessible

  • Standard retrieval protocols
  • Clear access procedures
  • Metadata remains accessible

Interoperable

  • Standard data formats
  • Controlled vocabularies
  • Common terminologies

Reusable

  • Rich documentation
  • Clear usage licenses
  • Provenance information

Naming Conventions

Consistent file naming makes data findable and understandable.

Best practices:

  • Use descriptive, meaningful names
  • Include dates (YYYY-MM-DD format)
  • Avoid spaces (use underscores or hyphens)
  • Use version numbers (v01, v02)
  • Keep names short but informative

Example:

2025-01-15_experiment-A_participant-001_v02.csv
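The convention above can be checked mechanically. A minimal Python sketch (the pattern and helper names are illustrative, not part of any tool on this page):

```python
import re
from datetime import date

# Illustrative helper: build a filename following the convention above
# (ISO date, short descriptive parts, two-digit version, no spaces).
def make_filename(experiment: str, participant: str, version: int, ext: str = "csv") -> str:
    return f"{date.today().isoformat()}_{experiment}_{participant}_v{version:02d}.{ext}"

# Check an existing name against the YYYY-MM-DD_..._vNN pattern
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_[A-Za-z0-9-]+(_[A-Za-z0-9-]+)*_v\d{2}\.\w+$")

def is_valid_name(name: str) -> bool:
    return NAME_PATTERN.match(name) is not None

print(is_valid_name("2025-01-15_experiment-A_participant-001_v02.csv"))  # True
print(is_valid_name("final data (new).csv"))  # False: spaces, no date or version
```

A check like this can run as part of a data-ingest script, so nonconforming names are caught when files are first saved.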

Directory Structure & Backups

Organization patterns:

  • By data type (raw/, processed/, analyzed/)
  • By date or session (2025-01/, 2025-02/)
  • By subject or condition (control/, treatment/)

3-2-1 Backup Rule:

  • 3 copies of your data (original + 2 backups)
  • 2 different media types (local drive + cloud)
  • 1 copy stored off-site (cloud or separate location)
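A project skeleton following the "by data type" pattern can be scaffolded in a few lines. A Python sketch (the folder names are the ones listed above; adjust to your project):

```python
from pathlib import Path

# Create the "by data type" layout described above.
# exist_ok=True makes the script safe to re-run on an existing project.
SUBDIRS = ("raw", "processed", "analyzed")

def scaffold(project_root: str) -> list[Path]:
    root = Path(project_root)
    paths = [root / sub for sub in SUBDIRS]
    for p in paths:
        p.mkdir(parents=True, exist_ok=True)
    return paths
```

Running `scaffold("my-study")` creates `my-study/raw`, `my-study/processed`, and `my-study/analyzed`.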

Warning

Never rely on a single device for your only copy of research data.

README Files

README files are essential for understanding your data and code.

What to include:

  • Project title and description
  • Authors and date of collection
  • File organization explanation
  • Data collection methods
  • Variable definitions
  • Known issues or limitations
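One way to make the list concrete is to start every dataset from the same fill-in-the-blanks skeleton. A small Python sketch (the section names mirror the bullets above; the function itself is illustrative):

```python
# Sections taken from the checklist above
README_SECTIONS = (
    "Project title and description",
    "Authors and date of collection",
    "File organization",
    "Data collection methods",
    "Variable definitions",
    "Known issues or limitations",
)

def readme_skeleton(title: str) -> str:
    """Return a fill-in-the-blanks README covering the sections listed above."""
    lines = [f"# {title}", ""]
    for section in README_SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)

print(readme_skeleton("Experiment A"))
```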

Tip

Write README files for your future self—you’ll return months or years later.

Data Dictionaries

Data dictionaries define all variables in your dataset.

Essential elements:

  • Variable name (as it appears in data)
  • Full variable label/description
  • Data type (numeric, string, date)
  • Allowed values or ranges
  • Units of measurement
  • Missing data codes

Example:

Variable: age_years
Description: Participant age at data collection
Type: Numeric
Range: 18-65
Units: Years
Missing: -99
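An entry like this becomes most useful when it is machine-readable, so the dictionary itself can drive validation. An illustrative Python sketch (the structure and helper are assumptions, not a specific package's API):

```python
# The data-dictionary entry above as a machine-readable record
data_dictionary = {
    "age_years": {
        "description": "Participant age at data collection",
        "type": "numeric",
        "range": (18, 65),
        "units": "years",
        "missing": -99,
    },
}

def validate(var: str, value) -> bool:
    """Check a value against its data-dictionary specification."""
    spec = data_dictionary[var]
    if value == spec["missing"]:
        return True  # the explicit missing-data code is allowed
    lo, hi = spec["range"]
    return isinstance(value, (int, float)) and lo <= value <= hi

print(validate("age_years", 42))   # True
print(validate("age_years", -99))  # True (missing-data code)
print(validate("age_years", 12))   # False (out of range)
```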

Note

Learn More: Data Dictionaries in R

Data Documentation & Validation in R (2h) - Create codebooks automatically, validate data against expectations, and ensure data quality using R packages like codebook and pointblank.

Version control systems track changes to files over time, enabling collaboration and reproducibility.

Note

Learn More: Version Control with Git

Introduction to Git and GitHub (3h) - Learn version control fundamentals for research projects.

Born Open Data

Data that is uploaded automatically to a repository (e.g., GitHub) as it is collected, together with timestamps and automatically generated logs, is called born open.

Advantages:

  • Full openness and transparency from the start
  • Built-in data management through version control
  • Simplified data sharing at publication
  • Complete audit trail of changes

Resources:

  • Rouder (2016) - The what, why, and how of born-open data
  • Rouder, Haaf & Snyder (2018) - Minimizing Mistakes In Psychological Science

Tip

Born-open workflows work best with non-sensitive data. For human participant data, consider pseudonymization before automatic uploads.

Why use version control:

  • Track all changes to files
  • Revert to previous versions
  • Collaborate without conflicts
  • Maintain parallel versions
  • Document why changes were made
  • Backup all project history
  • Share code reliably
  • Enable reproducibility

Common workflows:

  • Code and scripts: Track all analysis code
  • Documentation: Version control for manuscripts, protocols
  • Small data files: Track metadata, data dictionaries
  • Configuration files: Manage software parameters

Warning

Large data files (>100MB) should not be stored directly in Git. Use Git LFS or data repositories instead.

Quality Control

Quality control identifies errors and issues before analysis.

Common QC checks:

  1. Completeness: Check for missing data
  2. Range checks: Verify values within expected ranges
  3. Consistency: Check logical relationships
  4. Duplicates: Identify duplicate records
  5. Format: Ensure consistent data types
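Several of these checks can be scripted so they run on every new data export. An illustrative Python sketch on a small tabular dataset held as a list of dicts (in an R workflow, the pointblank package mentioned above covers this ground):

```python
# Toy dataset with deliberate problems
records = [
    {"id": 1, "age_years": 25, "score": 0.70},
    {"id": 2, "age_years": None, "score": 0.90},  # incomplete
    {"id": 2, "age_years": 31, "score": 1.40},    # duplicate id, score out of range
]

def qc_report(rows, score_range=(0.0, 1.0)):
    issues, seen_ids = [], set()
    for i, row in enumerate(rows):
        for key, val in row.items():                       # 1. completeness
            if val is None:
                issues.append(f"row {i}: missing {key}")
        lo, hi = score_range                               # 2. range check
        if row["score"] is not None and not lo <= row["score"] <= hi:
            issues.append(f"row {i}: score {row['score']} outside [{lo}, {hi}]")
        if row["id"] in seen_ids:                          # 4. duplicates
            issues.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
        if not isinstance(row["id"], int):                 # 5. format/type
            issues.append(f"row {i}: id is not an integer")
    return issues

for issue in qc_report(records):
    print(issue)
```

Consistency checks (point 3) depend on the logical relationships in your own data, so they are omitted from this generic sketch.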

Document QC decisions:

  • Procedures and criteria
  • Cases flagged or excluded
  • Reasons for exclusions
  • Date and reviewer

Data Security & Anonymization

Access Control:

  • Use permission levels (read-only, read-write, admin)
  • Limit access to sensitive data
  • Remove access when collaborators leave
  • Use institutional authentication systems

Anonymization for human participant data:

Remove personally identifiable information (PII):

  • Names, addresses, contact information
  • Dates of birth, ages over 89
  • Geographic identifiers smaller than state
  • Social security numbers, medical records
  • Facial features in photographs

Anonymization approaches:

  • De-identification: Remove direct identifiers
  • Pseudonymization: Replace with codes
  • Aggregation: Report group-level data
  • Perturbation: Add noise to prevent re-identification
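Pseudonymization with a keyed hash can be sketched in a few lines of Python (the salt handling and field names are illustrative; for real projects, see the dedicated anonymization tools listed further down):

```python
import hashlib

# Replace a direct identifier with a stable code. The salt acts as the
# re-identification key: store it separately from the shared data, or
# destroy it for stronger anonymization.
SALT = b"keep-this-key-separate"  # illustrative placeholder, never hard-code in shared scripts

def pseudonym(identifier: str) -> str:
    digest = hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()
    return "P-" + digest[:8]

record = {"name": "Jane Doe", "age_years": 42}
shared = {"participant": pseudonym(record["name"]), "age_years": record["age_years"]}
print(shared)  # the direct identifier is gone; indirect identifiers still need review
```

The same input always maps to the same code, so linked records stay linkable without exposing names.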

Important

Test anonymization on sample data first. Have a colleague review for remaining identifiers.


Tools & Resources

Documentation & Lab Notebooks

Tools for documenting research processes and maintaining lab records:

eLabFTW

Open-source electronic lab notebook


Free, self-hosted ELN with timestamping, templates, and database features. Ideal for documenting experiments and protocols.

Quarto

Reproducible documents combining code and narrative

Tutorial available

Create data collection protocols, README files, and documentation. Supports R, Python, Julia. Renders to multiple formats including HTML, PDF, and Word.

Jupyter Notebooks

Interactive computational notebooks


Combine code, output, and narrative text. Supports Python, R, Julia. Useful for documenting data processing workflows.

Version Control Systems

Platforms for tracking changes and collaborating on code and documents:

LRZ GitLab

Institutional Git hosting for LMU

Supported at LMU

Private repositories hosted by LRZ. Use LMU credentials. Suitable for active research projects.

GitHub

Popular Git hosting with collaboration features

Tutorial available

Free public repositories, GitHub Actions for automation, extensive integrations.

OSF

Research project management platform


Combines version control, storage, and collaboration. Integrates with GitHub, Dropbox, and other services.

Data Storage & Backup

Services for secure storage, synchronization, and backup of research data:

LRZ Sync+Share

LMU cloud storage service

Supported at LMU

50GB+ storage per user. GDPR-compliant. Desktop and mobile sync clients available.

LRZ DSS

Long-term archival storage

Supported at LMU

Tape-based backup for large datasets. Contact LRZ for access and quotas.

re3data

Registry of data repositories


Search to find repositories suited to your data type and discipline.

Anonymization Tools

Tools to help protect participant privacy while enabling data sharing:

Amnesia

Web-based anonymization application


Developed by OpenAIRE. Provides k-anonymity and other privacy-preserving transformations through a browser interface.

ARX

Comprehensive data anonymization tool


Open-source software supporting k-anonymity, l-diversity, t-closeness, and differential privacy. Desktop application with GUI.

sdcMicro

R package for statistical disclosure control


Implements various anonymization methods. Useful for microdata protection and risk assessment.

Note

Learn More: Data Anonymization

Maintaining Privacy with Open Data - Workshop by Ruben Arslan on anonymization strategies for behavioral research data.


Tip

Our Recommendation: Data Management Support at LMU

LMU University Library Data Management Services provides guidance on organizing and storing research data.

  • Contact: rdm@ub.uni-muenchen.de
  • Services: Storage recommendations, metadata guidance, repository selection, DMP support

Data Management Checklist

Throughout Data Collection:

  • Follow your file naming convention and directory structure
  • Back up data following the 3-2-1 rule
  • Keep README files and data dictionaries up to date
  • Commit code and documentation changes to version control

Before Moving to Analysis:

  • Run quality control checks (completeness, ranges, consistency, duplicates, formats)
  • Document QC procedures, flagged cases, and reasons for exclusions
  • Confirm access controls and anonymization for sensitive data
Ludwig-Maximilians-Universität
LMU Open Science Center

Leopoldstr. 13
80802 München

Contact
  • Prof. Dr. Felix Schönbrodt (Managing Director)
  • Dr. Malika Ihle (Coordinator)
  • OSC team
Join Us
  • Subscribe to our announcement list
  • Become a member
  • LMU chat on Matrix

Imprint | Privacy Policy | Accessibility