1 Data project management


Best practices

Project organization

  • Put each project in its own directory, which is named after the project.
  • Put text documents associated with the project in the docs (or documents) directory.
  • Put raw data and metadata in a data directory.
  • Put files generated during analysis in a results directory.
  • Put project R code in the R directory.
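
The layout above can be set up in one step. The following is a minimal sketch, not a prescribed tool: `create_project` and the project name `bird_survey` are hypothetical, and the skeleton is built in a temporary directory so that running it touches nothing else.

```r
# Hypothetical helper: create the standard sub-directories for a new project.
create_project <- function(project_name, parent_dir = ".") {
  project_dir <- file.path(parent_dir, project_name)
  sub_dirs <- c("docs", "data", "results", "R")
  for (sub_dir in sub_dirs) {
    # recursive = TRUE also creates the project directory itself if needed
    dir.create(file.path(project_dir, sub_dir),
               recursive = TRUE, showWarnings = FALSE)
  }
  invisible(project_dir)
}

# Example: build the skeleton inside a temporary directory.
project_dir <- create_project("bird_survey", parent_dir = tempdir())
list.files(project_dir)
```

Calling the helper again on an existing project is harmless, because `dir.create` leaves existing directories alone.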

Naming files

  • File names should be both machine and human readable.
  • Name all files to reflect their content or function.
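
One way to get names that are both machine and human readable is to build them from fixed parts in a fixed order. The sketch below assumes a date, project, description pattern separated by underscores; that pattern and the helper `make_file_name` are illustrative choices, not a convention stated in this lesson.

```r
# Hypothetical helper: build a file name from a date, a project, and a description.
make_file_name <- function(date, project, description, ext) {
  # Lower-case each part and replace runs of other characters with "-"
  clean <- function(x) gsub("[^a-z0-9]+", "-", tolower(x))
  sprintf("%s_%s_%s.%s",
          format(as.Date(date), "%Y-%m-%d"),
          clean(project), clean(description), ext)
}

file_name <- make_file_name("2024-03-05", "Bird Survey", "site counts", "csv")
file_name
# "2024-03-05_bird-survey_site-counts.csv"

# Machine readable: the parts can be recovered by splitting on "_".
parts <- strsplit(tools::file_path_sans_ext(file_name), "_")[[1]]
```

Putting the ISO date first means that sorting the names alphabetically also sorts the files chronologically.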

Raw data

  • Save the raw data, and do not modify it directly.
  • Record all the steps used to process data.
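
In practice, recording the steps usually means doing the processing in a script instead of editing the raw file by hand. The sketch below assumes the `data` and `results` layout described above; the file names and the values in the example table are made up for illustration.

```r
# Work in a temporary project so the example is self-contained.
project_dir <- file.path(tempdir(), "raw_data_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "results"), showWarnings = FALSE)

# Stand-in for a raw file received from an instrument or collaborator
# (made-up values, written here only so the example can run).
raw_path <- file.path(project_dir, "data", "counts_raw.csv")
write.csv(data.frame(site = c("A", "B", "C"), count = c(12, NA, 7)),
          raw_path, row.names = FALSE)

# Step 1: read the raw data; the file itself is never edited.
raw_counts <- read.csv(raw_path)

# Step 2: drop rows with missing counts.
clean_counts <- raw_counts[!is.na(raw_counts$count), ]

# Step 3: write the processed table to results/, not back to data/.
clean_path <- file.path(project_dir, "results", "counts_clean.csv")
write.csv(clean_counts, clean_path, row.names = FALSE)
```

Because every step lives in the script, the processed file can be regenerated from the untouched raw file at any time.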

Backing up/Keeping track of changes

  • Back up (almost) everything created by a human being as soon as it is created.
  • Keep changes small.
  • Create, maintain, and use a checklist for saving and sharing changes to the project.
  • Store each project in a folder that is mirrored off the researcher’s working machine.
  • Copy the entire project whenever a significant change has been made.
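
Copying the entire project can be scripted so that each copy gets a distinct, dated name. This is a rough sketch of a manual snapshot, assuming a backup location of your choosing; `snapshot_project` is a hypothetical helper, and a mirrored folder or a version control system may serve the same purpose.

```r
# Hypothetical helper: copy a whole project directory into a dated snapshot.
snapshot_project <- function(project_dir, backup_root) {
  stamp <- format(Sys.time(), "%Y-%m-%d_%H%M%S")
  snapshot_dir <- file.path(backup_root,
                            paste0(basename(project_dir), "_", stamp))
  dir.create(snapshot_dir, recursive = TRUE, showWarnings = FALSE)
  # With recursive = TRUE, the project directory is copied into snapshot_dir.
  copied <- file.copy(project_dir, snapshot_dir, recursive = TRUE)
  if (!all(copied)) stop("snapshot failed for ", project_dir)
  invisible(snapshot_dir)
}

# Example: snapshot a small demo project in a temporary directory.
project_dir <- file.path(tempdir(), "backup_demo")
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("# analysis script", file.path(project_dir, "R", "analysis.R"))

snapshot_dir <- snapshot_project(project_dir, file.path(tempdir(), "backups"))
list.files(snapshot_dir, recursive = TRUE)
```

In real use, `backup_root` should point to a location off the working machine, in line with the mirroring advice above.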

Software/Code

  • Place a brief explanatory comment at the start of every program.
  • Give functions and variables meaningful names.
  • Make dependencies and requirements explicit.
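
The three points above might look like this in a short script. The script name, the function, the example data, and the minimum R version are all assumptions made for illustration.

```r
# summarise_counts.R
# Summarise bird counts per site from a data frame of observations.
# Dependencies: base R only (stats package); R >= 3.5.0 is an assumed minimum.

# Make requirements explicit at the top of the script.
stopifnot(getRversion() >= "3.5.0")
library(stats)

# Mean count per site; `observations` needs the columns `site` and `count`.
mean_count_per_site <- function(observations) {
  aggregate(count ~ site, data = observations, FUN = mean)
}

# Made-up example data.
observations <- data.frame(site = c("A", "A", "B"), count = c(4, 6, 10))
site_means <- mean_count_per_site(observations)
site_means
```

Loading every package at the top of the script, and checking versions there, lets a reader see what is needed before running anything.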

Collaboration

  • Create an overview of your project.
  • Create a shared “to-do” list for the project.
  • Decide on communication and shared writing strategies, e.g. using Google Docs.

References

This lesson has been adapted from the following sources: