APPENDIX E: Measuring Dataset and Data Asset Usage

In this initial phase, the Democratizing Data Initiative is focusing on measuring dataset and data asset usage in the research community. One way to understand dataset usage in research is to find research publications that mention the dataset. Those mentions can serve as a proxy for dataset usage, and the publications themselves can be analyzed to understand how the dataset is being used.

For example, if we collect all peer-reviewed publications that mentioned the NASS Census of Agriculture and were published between 2017 and 2021, we might find that around 50 of those publications appeared in the Journal of Soil and Water Conservation, giving us insight into the kind of research that used this dataset. We might also find that many of these publications are authored by people affiliated with the University of Nebraska-Lincoln, which suggests that this dataset is particularly useful for researchers at that institution.

Metadata Available

To help you decide which measures might answer your questions about dataset usage, the following list notes some of the information, or metadata, available for each publication:

  • Title

  • Abstract

  • Keywords

  • References List

  • Acknowledgements

  • Full Text

  • Authors

  • Author affiliations (e.g., institution, city, country)

  • Document Type (e.g., journal article)

  • Source (e.g., journal name)

  • Publication year

  • Field (e.g., General Agricultural and Biological Sciences, Biochemistry)

For the purposes of this project, we have restricted the publications in the usage dashboard, as well as the database you will use, to those that mention at least one of three USDA datasets.

Constructing Measures

Using the information about the publications, we can answer questions and construct simple measures to investigate what’s happening in research. For example:

  • Which institutions mentioned at least one of the three USDA datasets in their publications?

    • Take the list of publications that mention at least one of the USDA datasets.

    • Create a list of institutions linked to each publication.

    • Count how many unique publications there are for each institution.

  • Overall, how many authors mentioned at least one of the three USDA datasets in their publications?

    • Take the list of publications that mention at least one of the USDA datasets.

    • Create a list of authors linked to each publication.

    • Count the unique number of authors.

  • Which institutions are publishing research on the topic of cover crops and using at least one of the USDA datasets?

    • Restrict the publication data on USDA datasets to only publications that have the Topic “cover crops” by searching through publication Topics.

    • Create a list of institutions using the restricted publication data.

    • Count how many unique publications there are for each institution.
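The three walkthroughs above can be sketched in a few lines of Python. This is only an illustrative sketch: the sample records, the field names (`id`, `authors`, `institutions`, `topics`), and the helper function names are assumptions made for this example, not the schema of the project database. It also assumes each author name identifies one person; real data would normally rely on author identifiers.

```python
from collections import defaultdict

# Hypothetical sample records (assumed structure, not the project's schema).
# Each record stands for a publication that mentions at least one USDA dataset.
publications = [
    {"id": "P1", "authors": ["A. Lee", "B. Cruz"],
     "institutions": ["Univ A", "Univ B"], "topics": ["cover crops", "soil health"]},
    {"id": "P2", "authors": ["A. Lee"],
     "institutions": ["Univ A"], "topics": ["crop insurance"]},
    {"id": "P3", "authors": ["C. Diaz", "B. Cruz"],
     "institutions": ["Univ B", "Univ C"], "topics": ["cover crops"]},
]


def publications_per_institution(pubs):
    """Count the unique publications linked to each institution."""
    pub_ids = defaultdict(set)
    for pub in pubs:
        for institution in pub["institutions"]:
            pub_ids[institution].add(pub["id"])
    return {institution: len(ids) for institution, ids in pub_ids.items()}


def unique_author_count(pubs):
    """Count distinct authors across all publications."""
    return len({author for pub in pubs for author in pub["authors"]})


def restrict_to_topic(pubs, topic):
    """Keep only publications whose topics include the given topic."""
    wanted = topic.lower()
    return [pub for pub in pubs if wanted in (t.lower() for t in pub["topics"])]


# Question 1: publications per institution.
by_institution = publications_per_institution(publications)

# Question 2: overall number of unique authors.
author_total = unique_author_count(publications)

# Question 3: publications per institution, restricted to "cover crops".
cover_crops_by_institution = publications_per_institution(
    restrict_to_topic(publications, "cover crops")
)

print(by_institution)
print(author_total)
print(cover_crops_by_institution)
```

Sets are used so that a publication or author appearing more than once is counted only once, which matches the "unique" wording in the steps.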

The simple measures used in this project so far focus on two dimensions, research productivity and research impact, although many more exist.

| Measure | Description | Proxy for |
|---|---|---|
| Publication Count | Number of research publications | Research Productivity |
| Citation Count | Raw count of citations for a publication or set of publications | Research Impact / Scientific Impact |

Measures can also be complex, such as field-weighted citation impact (FWCI). FWCI takes a publication’s number of citations and normalizes it by the “expected,” or average, number of citations for publications of the same publication year, document type, and field, to account for differences in citation practices across those three characteristics. An FWCI above 1 means a publication is cited more than expected for comparable publications; an FWCI below 1 means it is cited less. Below, you will see sample data on two institutions and their publications. Although University A has more citations than University B, its FWCI is lower. Digging into the publications further would help us understand whether this is because University B publishes more in fields where fewer citations are the norm, which would make the citations it accrued more meaningful.

| Institution | Publication Count | Citation Count | Field-Weighted Citation Impact |
|---|---|---|---|
| University A | 102 | 315 | 1.69 |
| University B | 95 | 226 | 2.88 |
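The pattern in the sample data (more citations, yet a lower FWCI) can be reproduced with a small Python sketch. Everything here is hypothetical: the four publications, the `expected_citations` values, and the field names are invented for illustration and are unrelated to the figures in the table. The sketch also assumes that an institution's FWCI is the mean of its publications' FWCI values, and that expected citation counts come from a much larger reference database rather than from the sample itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical expected (average) citations for publications sharing the same
# publication year, document type, and field. Assumed values for illustration.
expected_citations = {
    (2019, "article", "Biochemistry"): 20.0,
    (2019, "article", "Agronomy"): 4.0,
}

# Hypothetical publications (assumed structure).
publications = [
    {"institution": "University A", "year": 2019, "doc_type": "article",
     "field": "Biochemistry", "citations": 30},
    {"institution": "University A", "year": 2019, "doc_type": "article",
     "field": "Biochemistry", "citations": 20},
    {"institution": "University B", "year": 2019, "doc_type": "article",
     "field": "Agronomy", "citations": 12},
    {"institution": "University B", "year": 2019, "doc_type": "article",
     "field": "Agronomy", "citations": 8},
]


def publication_fwci(pub):
    """Actual citations divided by expected citations for comparable publications."""
    key = (pub["year"], pub["doc_type"], pub["field"])
    return pub["citations"] / expected_citations[key]


def summarize(pubs):
    """Publication count, citation count, and mean FWCI per institution."""
    grouped = defaultdict(list)
    for pub in pubs:
        grouped[pub["institution"]].append(pub)
    return {
        institution: {
            "publication_count": len(items),
            "citation_count": sum(p["citations"] for p in items),
            # Assumption: institution-level FWCI is the mean of publication FWCIs.
            "fwci": mean(publication_fwci(p) for p in items),
        }
        for institution, items in grouped.items()
    }


summary = summarize(publications)
for institution, measures in summary.items():
    print(institution, measures)
```

In this made-up data, University A collects 50 citations against University B's 20, but University A publishes in a field where 20 citations per article is typical, so its FWCI (1.25) is half of University B's (2.5).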

Additional Data Sources

Additional data sources, such as those listed below, can enable even more measures to be developed.

  • Policy Documents that reference a publication (Source: Overton)

  • Patents that reference a publication (Source: LexisNexis PatentSight)

  • Social media, blog posts, news media, Wikipedia, and other text sources that mention or reference a publication (Source: PlumX)

  • Inferred author diversity, such as gender (binary scale) (Source: NamSor API)

A Note On Scientometrics

Using research outputs like peer-reviewed publications to study scientific research is its own field of study called scientometrics, or the “quantitative study of science, communication in science, and science policy.” Depending on the outputs used, there are other similar and overlapping fields of study, such as bibliometrics, informetrics, technometrics, and altmetrics. These fields apply mathematics and statistical methods to these written data sources to make inferences about behavior.
