Data Finding: Principles and Strategies
- Datasets do not simply materialize out of thin air. Data must be collected, then assembled and organized into datasets that can be shared. When looking for data, a good place to start is to ask yourself: who might have an incentive or mandate to collect the type of data you are interested in? It could be an individual (e.g., a researcher working in a field where such data could answer questions of interest), a non-profit organization (e.g., the American Civil Liberties Union, or the institutional research department of a college or university), a government agency (e.g., the Bureau of Labor Statistics or the Federal Reserve), a private company (e.g., Facebook), an international organization (e.g., the World Bank or the IMF), an independent research organization (e.g., Gallup), or any number of other groups or organizations.
- When might the data have been collected and published?
- You will rarely search for data without at least a rough idea of the broad topic you want to study. It is therefore often a good strategy to do a preliminary literature review to identify common data sources in your field of interest. What data are influential researchers using, and where does it come from? Review papers and meta-analyses are good places to start answering these questions.
- How might data that could help you answer your question be collected, organized, and stored? Would it take the form of a survey dataset? A time series or panel? A geographic dataset that records where observations are located in space? A network dataset that specifies relations between different units? Thinking about the format your data of interest is likely to take can help you approach your search more strategically. If you plan to apply a specific empirical method, thinking about the type of data suited to that method can also help you narrow your search.
- Think about the relationship between theory and empirics. What are your units of analysis? How might the concept you are interested in be operationalized and measured? Your concept may be defined or measured in several ways, and each could point toward different data sources and different empirical strategies.
- Be flexible and iterate. Sometimes the data you would use in an ideal world does not exist, perhaps because collecting it would be unethical, too expensive, or too difficult. In such cases, think creatively about other ways to measure your concept of interest, or about combining existing data in novel ways. Other times, the process of looking for data may lead you to new research questions that differ from (and may be more interesting, tractable, or generative than) the one you started with. Be willing to pursue these new avenues of inquiry.
- Once you've identified the dataset you need, there may be more than one way to access it; choose the most efficient one for your needs.
- Don't hesitate to ask for help!
Sources
These principles draw heavily from the following sources:
Battista, Andrew. 2021. “Principles of Finding Data.” New York University Libraries Data Services. https://guides.nyu.edu/ds_class_descriptions/Principles-of-Finding-Data
Gao, Wenli. 2021. “Finding Data.” University of Houston Libraries. https://guides.lib.uh.edu/c.php?g=879727&p=6319391