February 14th: Documenting, Describing, Defining
Post authored by Lora Leligdon
For the second day of Love Your Data week, we will be discussing good data documentation!
Good documentation tells people they can trust your data by enabling validation, replication, and reuse.
Things to consider:
Why does having good documentation matter?
- It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
- It makes the analysis and write-up stages of your project easier and less stressful.
- It helps your teammates, colleagues, and students understand and build on your work.
- It helps to build trust in your research by allowing others to validate your data or methods.
- It can help you answer questions about your work during pre-publication peer review and after publication.
- It can make it easier for others to replicate or reuse your data. When they cite the data, you get credit! Include these citations in your CV, funding proposal, or promotion and tenure package.
- It improves the integrity of the scholarly record by providing a more complete picture of how your research was conducted. This promotes public trust and support of research!
- Some communities and fields have been talking about documentation for decades and have well-developed standards (e.g., geospatial data, clinical data), while others do not (e.g., psychology, education, engineering). No matter where your research community or field falls on this spectrum, you can start improving your documentation today!
Stories (learn from others’ mistakes and successes)
- Error-laden database kills paper (Retraction Watch): http://retractionwatch.com/2016/12/27/error-laden-database-kills-paper-extinction-patterns/#more-46585
- The value of a good inventory system: https://www.dataone.org/data-stories/inventory-overload
- Metadata? I thought you were in charge of that: https://www.dataone.org/data-stories/metadata-i-thought-you-were-charge
- The case of the missing research protocol: https://www.dataone.org/data-stories/case-missing-research-protocol
- The importance of documenting how your images and visualizations are created: http://retractionwatch.com/2016/07/04/diabetes-researcher-logged-1-retraction-3-correx-after-pubpeer-comments/
Practical Tips by data type & format
- Lab notebooks
- Interviews
  - Define all your codes clearly and operationally
  - Document introductory & debriefing comments
  - Make sure you’ve defined codes for non-verbal behavior
  - Identify annotations separately from quotes or notes
  - Documentation should include
    - Your assumptions
    - Rationale for choices in designing the interview
    - The interview questions or script (if applicable)
    - Relationship or map between the research questions and the interview questions
    - Codes or notations for non-verbal behavior
    - Syntax or codes to indicate annotations versus interview responses
- Georgia Tech’s documentation tips: http://d7.library.gatech.edu/research-data/documentation
- Best Practices for Project Metadata: http://ropensci.github.io/reproducibility-guide/sections/metaData/
- README files are a simple, low-tech way to start documenting your data better. Check out the sample readme.txt (filename = readme_template.txt) from IU, or Cornell University’s data working group guide with tips for using README files.
- Check out Kristin Briney’s post on taking better notes
- Reining in your metadata – advice from an archivist
- Cornell University’s data working group also has some tips for writing metadata
- Want to learn more? Attend the upcoming Dartmouth workshops on data management to learn hands-on approaches to ensuring quality data.
- Check out some of the documentation guidelines and standards out there. What can you borrow or learn from them to improve your own documentation?
- USGS Data Management guidelines: https://www2.usgs.gov/datamanagement/describe/metadata.php
- CDISC maintains foundational standards for clinical research data, including CDASH (Clinical Data Acquisition Standards Harmonization), SDTM (Study Data Tabulation Model), and ADaM (Analysis Data Model)
- Marine Metadata Interoperability: https://marinemetadata.org/
- 10 Simple Rules for a Computational Biologist’s Laboratory Notebook: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004385
- Join the conversation on Twitter or share your insights on Facebook (#LYD17 #loveyourdata)
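If you want to try the README tip from the list above, here is a minimal Python sketch of one way to generate a starter README for a dataset. The function name `build_readme`, the field labels, and the example file names are all hypothetical choices made for illustration; they are not taken from the IU or Cornell templates, so adapt them to whatever your field or repository expects.

```python
from datetime import date


def build_readme(title, creator, contact, files, created=None):
    """Return plain-text README content describing a dataset.

    `files` maps each file name to a one-line description.
    The field labels below are illustrative, not a formal standard.
    """
    # Default to today's date in ISO format (YYYY-MM-DD) if none is given.
    created = created or date.today().isoformat()
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Contact: {contact}",
        f"Date created: {created}",
        "",
        "Files:",
    ]
    # List files alphabetically so the README is easy to scan.
    for name, description in sorted(files.items()):
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own project details.
    text = build_readme(
        title="Interview study, spring term",
        creator="A. Researcher",
        contact="a.researcher@example.org",
        files={
            "transcripts.txt": "Anonymized interview transcripts",
            "codebook.csv": "Definitions of all codes, including non-verbal behavior",
        },
    )
    print(text)
```

Saving the printed text as readme.txt next to your data files is enough to get started; you can add sections for methods, assumptions, and licensing as your project grows.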
Stay tuned… tomorrow we will be providing good data examples!
Our daily blog posts are courtesy of the 2017 LYD Week Planning Committee. Learn more at https://loveyourdata.wordpress.com/lydw-2017/