# Evaluating a New Datasource

By Drew Breunig

Important questions to answer (or begin to answer) before using a data source in a project.

- [ ] <strong>Review the data license and ensure it’s compatible with your project.</strong> Check that any share-alike requirements are compatible with your other data sources.
- [ ] <strong>Understand how the data was collected and processed.</strong> These steps “color” the data in (sometimes) subtle ways. Having a good grasp on how the data is produced will help you answer the following questions and mitigate any quirks inherent in the data.
- [ ] <strong>Consider the ethical aspects of the data.</strong> Were any participants properly notified? Is there any personally identifiable information in the dataset? Are you willing to take on the responsibilities implicit in using the data?
- [ ] <strong>Check the timeliness of the data.</strong> When was it last updated, and is it updated regularly? Be cautious of working with brand-new datasets that have yet to establish a cadence: their formats may change, and underlying issues may not have been discovered yet. Consider how often you’ll need fresh data for your project.
- [ ] <strong>Understand how the data is formatted and delivered.</strong> Some datasets are easy to consume. Others may require you to spend time learning new tooling capable of ingesting the information.
- [ ] <strong>Assess the dataset’s coverage.</strong> If you’re doing an analysis over time, does the dataset adequately cover your time periods? If you’re performing a geospatial analysis, does the dataset cover your area of interest?
- [ ] <strong>Sample the dataset and compare it to something you know well.</strong> After all of the above, see if the data passes the sniff test. If it’s geospatial, how well does it represent your home neighborhood? Compare it with the data you already have and see if it agrees with or complements what you already trust. Just <em>look</em> at it; save the deep quality analysis for after you’ve passed all these steps.
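
The timeliness and coverage checks can often be roughed out in a few lines before any deeper analysis. Here is a minimal sketch in Python, assuming you can pull a date out of each sampled record. The function name `coverage_report`, its parameters, and the sample dates are all hypothetical, made up purely for illustration.

```python
from datetime import date


def coverage_report(record_dates, needed_start, needed_end, today):
    """Summarize how well a dataset's dates cover the period a project needs."""
    earliest, latest = min(record_dates), max(record_dates)
    return {
        "earliest": earliest,
        "latest": latest,
        # True only if the data spans the whole period of interest.
        "covers_period": earliest <= needed_start and latest >= needed_end,
        # A rough staleness signal: days since the newest record.
        "days_since_latest": (today - latest).days,
    }


# Hypothetical sample: dates taken from a handful of rows.
sample_dates = [date(2021, 3, 1), date(2022, 7, 15), date(2023, 1, 9)]

report = coverage_report(
    sample_dates,
    needed_start=date(2021, 1, 1),
    needed_end=date(2022, 12, 31),
    today=date(2023, 6, 1),
)
print(report)
```

In this made-up sample the earliest record falls after the start of the period of interest, so `covers_period` comes back `False`. That is the kind of gap worth knowing about before committing to a source.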
