1 Data project management
Best practices
Project organization
- Put each project in its own directory, which is named after the project.
- Put text documents associated with the project in the docs (or documents) directory.
- Put raw data and metadata in a data directory.
- Put files generated during analysis in a results directory.
- Put project R code in the R directory.
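The layout above can be set up in one step. The following is a minimal sketch in Python (the same idea carries over to R); the function name `create_project_skeleton` and the example project name are hypothetical, and `docs` could equally be `documents`.

```python
from pathlib import Path

# Subdirectory names follow the list above.
PROJECT_SUBDIRS = ("docs", "data", "results", "R")


def create_project_skeleton(parent_dir, project_name):
    """Create <parent_dir>/<project_name>/ with the standard subdirectories."""
    project_dir = Path(parent_dir) / project_name
    for subdir in PROJECT_SUBDIRS:
        # parents=True also creates the project directory itself;
        # exist_ok=True makes the function safe to run again later.
        (project_dir / subdir).mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, `create_project_skeleton(".", "bird_survey")` would create `bird_survey/docs`, `bird_survey/data`, `bird_survey/results`, and `bird_survey/R` in the current directory.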
Naming files
- File names should be both machine and human readable.
- Name all files to reflect their content or function.
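One way to satisfy both points is to build file names from a date and a short description. The sketch below assumes one common convention (an ISO 8601 date, an underscore between fields, hyphens within a field); the function name `make_file_name` is hypothetical, and other conventions work equally well as long as they are applied consistently.

```python
import re
from datetime import date


def make_file_name(created, description, extension):
    """Build a name such as '2017-06-22_field-counts-site-a.csv'."""
    # Lowercase the description and collapse every run of characters that is
    # not a letter or digit into a single hyphen, so the name has no spaces
    # or punctuation that could trip up scripts.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # An ISO 8601 date (YYYY-MM-DD) sorts chronologically in a file listing.
    return f"{created.isoformat()}_{slug}.{extension.lstrip('.')}"
```

Calling `make_file_name(date(2017, 6, 22), "Field counts, site A", "csv")` returns `2017-06-22_field-counts-site-a.csv`, which a person can read at a glance and a script can split on `_` and `-`.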
Raw data
- Save the raw data, and do not modify it directly.
- Record all the steps used to process data.
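Both points can be enforced in code. The sketch below assumes the raw data is a CSV file with a header row; the function names and the choice of cleaning step (dropping rows with empty fields) are hypothetical illustrations. The raw file is only ever read, the cleaned copy goes to a separate file, and each step is appended to a plain-text log.

```python
import csv
import stat
from pathlib import Path


def protect_raw_file(raw_path):
    """Remove write permission so the raw file is not edited by accident."""
    Path(raw_path).chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def drop_incomplete_rows(raw_path, cleaned_path, log_path):
    """Copy rows with no empty fields to cleaned_path and log the step."""
    with open(raw_path, newline="") as raw_file:
        rows = list(csv.reader(raw_file))
    header, records = rows[0], rows[1:]
    complete = [row for row in records if all(field.strip() for field in row)]

    # The cleaned data is written to a new file; the raw file is untouched.
    with open(cleaned_path, "w", newline="") as cleaned_file:
        csv.writer(cleaned_file).writerows([header] + complete)

    # Append (mode "a") so earlier processing steps stay in the log.
    with open(log_path, "a") as log_file:
        log_file.write(
            f"drop_incomplete_rows: {raw_path} -> {cleaned_path}, "
            f"kept {len(complete)} of {len(records)} rows\n"
        )
    return len(complete)
```

Keeping the processing in a script like this means the script itself is a record of the steps, and the log shows when and on which files it was run.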
Backing up/Keeping track of changes
- Back up (almost) everything created by a human being as soon as it is created.
- Keep changes small.
- Create, maintain, and use a checklist for saving and sharing changes to the project.
- Store each project in a folder that is mirrored to a location off the researcher’s working machine, such as a file-syncing service or a remote repository.
- Copy the entire project whenever a significant change has been made.
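Copying the entire project by hand is easy to get wrong, so it may help to script it. The sketch below is one possible approach, not a replacement for a version control system; the function name `snapshot_project` and the timestamp format in the directory name are assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_project(project_dir, backup_root, now=None):
    """Copy the whole project to <backup_root>/<project>_<timestamp>/."""
    project_dir = Path(project_dir)
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    destination = Path(backup_root) / f"{project_dir.name}_{stamp}"
    # copytree raises an error if the destination already exists, so an
    # earlier snapshot is never silently overwritten.
    shutil.copytree(project_dir, destination)
    return destination
```

If `backup_root` points at the mirrored folder, each call leaves a dated, complete copy of the project outside the working directory.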
Software/Code
- Place a brief explanatory comment at the start of every program.
- Give functions and variables meaningful names.
- Make dependencies and requirements explicit.
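The three points above might look as follows in a short program. This is an illustrative sketch in Python; the file name `summarise_counts.py`, the function name, and the example data are hypothetical.

```python
"""Summarise bird counts per site.

Usage:        import mean_count_per_site from summarise_counts.py
Input:        an iterable of (site, count) pairs
Output:       a dictionary mapping each site to its mean count
Requirements: Python 3.8 or later, standard library only
"""
# Dependencies are imported at the top so they are visible at a glance.
import statistics
from collections import defaultdict


def mean_count_per_site(observations):
    """Return {site: mean count} for an iterable of (site, count) pairs."""
    counts_by_site = defaultdict(list)
    for site, count in observations:
        counts_by_site[site].append(count)
    return {
        site: statistics.mean(counts)
        for site, counts in counts_by_site.items()
    }
```

The opening comment tells a reader what the program does and what it needs before they read any code, and names such as `counts_by_site` describe their contents without further explanation. For larger projects, third-party dependencies are usually also listed in a separate requirements file.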
Collaboration
- Create an overview of your project.
- Create a shared “to-do” list for the project.
- Decide on communication and shared writing strategies, e.g. using Google Docs.
References
This lesson has been adapted from the following sources:
- Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), e1005510. https://doi.org/10.1371/journal.pcbi.1005510
- Naming files by Jenny Bryan
- How to make your research reproducible: Managing Your Project