February 16th: Finding the Right Data
Post authored by Lora Leligdon
Thursday brings us to finding the right data for your project.
To find the right data, have a clear question and locate quality data sources.
Things to consider
In a 2004 Science Daily News article, the National Science Foundation used the phrase “here there be data” to highlight the exploratory nature of traversing the “untamed” scientific data landscape. The phrase harkens back to older maps of the world, where unexplored territories bore the warning ‘here, there be [insert mythical/fantastical creatures]’ to alert explorers to the dangers of the unknown. While the research data landscape is (slightly) less foreboding, there’s still an adventurous quality to looking for research data.
- ‘Dear Mona’ series [FiveThirtyEight] https://fivethirtyeight.com/tag/dear-mona/
- And This is Why We Should Always Provide Our Data [PLOS ONE] http://blogs.plos.org/paleo/2013/01/25/and-this-is-why-we-should-always-provide-our-data/
- Achieving human and machine accessibility of cited data in scholarly publications [PeerJ CompSci] https://peerj.com/articles/cs-1/
- The patience of the data hunter: https://www.dataone.org/data-stories/patience-data-hunter
- Formulate a question
The data you find is only as good as the question you ask. Think of the age-old “who, what, where, when” criteria when putting together a question – specifying these elements helps to narrow the map of data available and can help direct where to look!
- WHO (population)
- WHAT (subject, discipline)
- WHERE (location, place)
- WHEN (longitudinal, snapshot)
This page from Michigan State University Libraries’ “How to find data & statistics” guide does a great job of further articulating these key elements of forming a question and putting together a data search strategy.
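As a rough illustration of the who/what/where/when elements above, here is a minimal Python sketch that records a question and reports which elements are still unspecified. The `DataQuestion` class, its method names, and the example values are hypothetical and invented for this illustration; they do not come from the Michigan State guide or any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataQuestion:
    """Hypothetical container for the four elements of a data question."""

    who: str    # population
    what: str   # subject or discipline
    where: str  # location or place
    when: str   # time frame (longitudinal range or snapshot)

    def search_terms(self) -> list[str]:
        """Return the elements that have been filled in, in who/what/where/when order."""
        parts = (self.who, self.what, self.where, self.when)
        return [part.strip() for part in parts if part.strip()]

    def missing(self) -> list[str]:
        """Return the names of the elements that are still blank."""
        names = ("who", "what", "where", "when")
        return [name for name in names if not getattr(self, name).strip()]


# Example values are made up for illustration only.
question = DataQuestion(
    who="adults over 65",
    what="flu vaccination rates",
    where="Montana",
    when="",  # not decided yet
)

print(question.search_terms())
print(question.missing())
```

A blank element (here, `when`) is a hint that the question may still be too broad to point at a specific data source.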
- Locate data source(s)
After you’ve identified the question, you can begin the scavenger hunt that is locating relevant source(s) of research data. One way to find data is to think about what organization, industry, discipline, etc. might gather and/or disseminate data relevant to your question.
Thinking about your source can also help with evaluating whether or not you have relevant, quality data to use.
- If you’re looking for general, multidisciplinary data sets, check out sources like ICPSR (Inter-university Consortium for Political and Social Research) or Amazon Public Datasets. Lists of open data repositories, such as Open Access Data Repositories, can help point to more discipline-specific data sets.
- There is an increasing number of city or statewide data portals – some examples: New York City, Hawaii, and Illinois – that provide access to regional data on everything from traffic patterns to restaurant inspection results.
- At the federal level, several agencies and organizations, such as Data.gov, the Census Bureau, and the Centers for Disease Control & Prevention, provide access to nationwide data sets.
- For international data, look to sites like UNdata and the World Health Organization, which cover a variety of countries and topics.
Check out this post from Nathan Yau, data viz whiz and creator of FlowingData — his post includes some of the sources listed above, but also highlights tips like scraping data from websites and using APIs to access data.
- Cite accordingly
Data can be reused only if it is of good quality, and relevant data can be found only if it is discoverable. As a producer of data, that means following many of the practices articulated in earlier posts. As a consumer of data, that means being a good researcher and citing your data sources.
In general, citing data follows the same template as any other citation — include information such as author, title, year of publication, edition/version, and persistent identifier (e.g., Digital Object Identifier, Uniform Resource Name). Check with your data source as well – they may provide guidance on how they want to be cited!
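To make the citation elements above concrete, here is a minimal Python sketch that assembles them into a single string. The function name, the order and punctuation of the elements, and the example values (including the placeholder DOI) are assumptions made for illustration; they do not follow any specific style guide, so defer to your data source's preferred format when one is given.

```python
def format_data_citation(author, year, title, version, identifier):
    """Join the basic citation elements into one line.

    The layout (Author (Year). Title. Version N. Identifier) is an
    assumed, generic pattern, not an official citation style.
    """
    parts = [f"{author} ({year}).", f"{title}."]
    if version:  # edition/version is optional
        parts.append(f"Version {version}.")
    parts.append(identifier)  # persistent identifier, e.g. a DOI
    return " ".join(parts)


# Hypothetical data set; the DOI below is a placeholder, not a real record.
citation = format_data_citation(
    author="Doe, J.",
    year=2017,
    title="Example Survey Data",
    version="2",
    identifier="https://doi.org/10.0000/example",
)
print(citation)
```

Keeping the elements as separate fields, rather than one free-text string, makes it easier to reformat the citation later if a publisher or repository asks for a different style.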
BYODM — build your own (research) data map! Ask yourself:
- What data sources are most relevant to my research?
- Are there relevant data sets generated or held locally that I have access to?
- What information do I need to retrace my steps back to these data (e.g., contact information, URLs, etc.)?
Where have you found the right data? Join us on Twitter or Facebook (#LYD17 #loveyourdata) to share your stories! Our daily blog posts are courtesy of the 2017 LYD Week Planning Committee. Learn more at https://loveyourdata.wordpress.com/lydw-2017/
Tomorrow we are going to wrap up the week with rescuing unloved data.