Put succinctly, metadata is ‘documentation about your data.’ For the data outputs of your research project, it is the documentation that allows others to make use of that data. To illustrate varying degrees of metadata, consider this graphic, showing cans of soup and information about what they contain.

ICPSR (2013). Metadata Management and Tools [PowerPoint slides]. From workshop “Curating and Managing Research Data for Re-Use.”

On the soup cans above we are given: a) no information at all, b) some information on the front label, and c) rich and consistently structured information about nutrition on the back label. With no information we’re forced to open the can to figure out what is inside; rich and structured nutritional information, on the other hand, lets us incorporate the contents into larger dietary decisions.

Consider the following question: If you were to share your research data with colleagues in your research community for them to use in their own research, and you didn’t want them asking you question after question about your data, what information would you need to give them?

Whether you conducted your research on a server, in the field, in the library or in the lab, there will be some commonalities in what another researcher might want to know about the data used or created. They would clearly need to know your data formats, labels or descriptors for the datasets, and any relevant units of measurement. For observational research, the metadata could further include the instrument or technique used for observation, settings of any such instrument, and when and where the observations were taken. For research involving code or software, the metadata could also include the configurations, parameters or settings, properties of the computer and operating system, and even the scripts or code used to generate the data.

Further, another researcher would want to know the context in which the data were collected or generated. What was the original purpose for collecting these data? No doubt some of these details could be found in a relevant publication or other narrative, but probably not all!

Creating comprehensive metadata for your research data will ease the sharing, understanding and re-use of your research data. Also note that the colleagues who re-use your research data could very well be others in your research group, or even your future self! You likely know all the details of your research project and its data when you’ve finished with it; will you know those details in a year or two?

What steps should I take in creating good metadata?

First, start early! It is important to consider the metadata needed for your research project from the outset, as it will likely be much less onerous to create this documentation during the research process instead of after the project is completed.

To potentially reduce the burden of generating metadata for your research project, consider capturing notes about your research in a digital format instead of a physical lab notebook. Doing so will allow you to search your research notes more efficiently and can lead to easier aggregation of metadata at the end of the project.

Photo illustration: Troy Slama/DoIT, from http://www.news.wisc.edu/23146

Also consider automatic generation for some of your metadata during your research project, if possible. Perhaps you can have your instrument automatically record some important information about the images it generates, or perhaps you can add a few lines of code to your computer model to record metadata on the fly.
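
As one possible illustration of recording metadata on the fly, the sketch below shows a few lines of Python that a modeling script might call when it saves its output. The function names (`collect_run_metadata`, `write_sidecar`), the `.meta.json` file suffix, and the recorded fields are all assumptions made for this example, not an established convention; adapt them to whatever your discipline expects.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def collect_run_metadata(parameters):
    """Gather basic facts about this run: when, where, and with what settings."""
    return {
        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        "script": Path(sys.argv[0]).name,
        # Hypothetical: whatever settings your model or analysis actually uses.
        "parameters": parameters,
    }


def write_sidecar(data_path, metadata):
    """Save metadata beside the data file, e.g. results.csv -> results.csv.meta.json."""
    data_path = Path(data_path)
    sidecar_path = data_path.with_name(data_path.name + ".meta.json")
    sidecar_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar_path
```

A script could then call `write_sidecar("results.csv", collect_run_metadata({"random_seed": 42}))` immediately after writing `results.csv`, so that the settings and computing environment travel with the data file rather than living only in your memory.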

However you generate it, the metadata needed to make your research data understandable to other scientists, including your colleagues, will be unique to your research discipline and to the research project. The following questions and prompts can help you decide what to record:

  1. Are there standards for metadata (documentation about your research data) in your research community that you will be using? If you are unsure, the Research Data Alliance Metadata Directory is one resource you can check.
  2. Who are the creators of the dataset for which you are now creating metadata (e.g. full names or organizations, with contact information)?
  3. Who are the funders of the research that produced the dataset in question? List and include funding/award numbers if appropriate.
  4. Briefly describe the project that produced this dataset, including the purpose, methods and results of the analysis. This could be the same or similar to information found in a publication abstract.
  5. List significant dates associated with the dataset, e.g. dates of data acquisition or generation, and project start and end dates.
  6. Briefly describe the data sources used and the products acquired and generated in the course of the project. These data product descriptions could include file sizes and formats. Also document how your data products are organized (e.g. in folders and files, or in a database).
  7. Briefly describe the methodology and techniques used in acquisition and generation of the above data products. In doing so consider mapping out your research workflow and any tools required to make use of the datasets (e.g., software, instruments). Be sure to document specific procedures and protocols used.
  8. When appropriate describe data formatting (e.g. column headers) and coding (e.g. how is a missing value recorded?) for the data products.
  9. Be sure to keep track of dates, times and versions of this dataset!
  10. Make clear your rights as the data creator and others’ rights as data users (e.g. through use of a Creative Commons license).
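
The prompts above can also be captured in a simple machine-readable record that is stored with the dataset. The following is a minimal sketch in Python; the field names, the `missing_fields` helper, and every value in `record` are placeholders invented for illustration, not a community metadata standard. If your discipline has a standard, its schema should take precedence.

```python
import json

# Field names are illustrative assumptions that loosely mirror the ten prompts.
REQUIRED_FIELDS = [
    "metadata_standard", "creators", "funders", "description",
    "dates", "data_products", "methods", "data_dictionary",
    "version", "license",
]


def missing_fields(record):
    """Return the checklist fields that are absent or left empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values only; replace with the details of the real project.
record = {
    "metadata_standard": "none identified yet",
    "creators": [{"name": "A. Researcher", "contact": "researcher@example.org"}],
    "funders": [],  # left empty on purpose to show the check below
    "description": "Purpose, methods and results in a few sentences.",
    "dates": {"collection_start": "2020-01-15", "collection_end": "2020-06-30"},
    "data_products": [{"file": "observations.csv", "format": "CSV"}],
    "methods": "Instruments, software and protocols used.",
    "data_dictionary": {
        "temp_c": {
            "label": "Air temperature",
            "unit": "degrees Celsius",
            "missing_value": "-999",
        },
    },
    "version": "1.0",
    "license": "CC BY 4.0",
}

print(json.dumps(record, indent=2))
print("Still to fill in:", missing_fields(record))
```

With the placeholder record shown, the final line reports `funders` as the one field still to fill in. A check like this is only a reminder to complete the record; it does not judge whether the descriptions themselves are adequate.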

Contact JHU Data Management Services at datamanagement@jhu.edu for assistance in effective and efficient development of metadata for your research project.