Data Management Plan


Overview

The data management plan described here will guide the management, use, and dissemination of all materials created as a result of the AG2PI project. Materials may result from any AG2PI activity, event, or funded seed grant. Materials produced from or used in a funded seed grant may be subject to a non-disclosure agreement (NDA); such cases will be handled on a case-by-case basis.

The following principles drive AG2PI’s data management policies:

  • Security: All data generated through this project will be stored securely, with password protection and reasonable physical and cybersecurity safeguards.
  • Access to public data: Unless restricted by a non-disclosure agreement or security concerns, data and source code used by members of the AG2PI project team to prepare journal publications, annual reports, workshops/webinars, and conference presentations will be made easily available to other researchers through open data and code repositories such as GitHub, Bitbucket, CyVerse Data Commons, CyVerse Data Store, CoGe, NEON, Dryad, NCBI GenBank, NCBI SRA, Open Science Framework, and the Protein Data Bank. In addition, all training workshop materials will be made available online through ReadTheDocs (https://readthedocs.org/). Training workshops and field days will be recorded and made available via YouTube and other channels for later asynchronous learning. Data and related information (metadata, code) generated by AG2PI will be made publicly available within 6 months of the project's completion by deposition in a federal data repository (preferred, where applicable) or another open repository.
  • Data curation: The libraries of the lead institutions are expanding their role in research data management (RDM) and curation, and the AG2PI project team will work with them as needed. Technical services staff trained in metadata standards are available to tag submitted data collections for maximum accessibility. The libraries are also developing technology migration plans so that research data are not lost as technologies change, and will add research data collections to existing disaster management plans. These measures will ensure that the research and data generated by this project, including the results of seed grants and micro-grants, are properly managed, curated, and preserved for use by other researchers in perpetuity, and that they meet the data management guidelines of USDA NIFA and other agencies.

Types of Data, Samples, Physical Collections, Software, Curriculum Materials, and other Materials subject to these policies

This project will develop curriculum materials, workshops, case studies, YouTube videos, webinars, and best practices. Formative and summative surveys and other evaluations will be administered during each activity or event. Seed grant recipients may be required to make their data publicly available, compliant with the FAIR principles (Findable, Accessible, Interoperable, Reusable), and properly annotated with appropriate metadata.


Standards to be used for Data and Metadata Format and Content

Data may be stored in binary, tab-delimited text, plain-text, and Microsoft Excel formats. Some data may also be stored in relational and non-relational databases. Imagery and video will be kept with appropriate metadata, such as the date and time of acquisition and the project under which they were collected, stored either as internal metadata supported by the file format (e.g., GeoTIFF) or in relational databases. All metadata will follow the FAIR principles and, where applicable, community standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment).
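As one possible illustration of the relational-database option, the Python sketch below keeps acquisition metadata for image files in a SQLite table using only the standard library. The table name `image_metadata`, its columns, and the helper functions are hypothetical; they are not taken from MIAPPE or from any AG2PI schema, and a production catalog would map its fields to the applicable community standard.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical catalog schema (not MIAPPE): one row per image or video file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS image_metadata (
    file_name   TEXT PRIMARY KEY,
    project     TEXT NOT NULL,
    acquired_at TEXT NOT NULL,  -- ISO 8601 timestamp, normalized to UTC
    file_format TEXT NOT NULL   -- e.g. 'GeoTIFF'
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata catalog and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_image(conn: sqlite3.Connection, file_name: str, project: str,
                 acquired_at: datetime, file_format: str) -> None:
    """Insert one metadata row; acquired_at must be timezone-aware."""
    if acquired_at.tzinfo is None:
        raise ValueError("acquired_at must include a timezone")
    utc_time = acquired_at.astimezone(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO image_metadata VALUES (?, ?, ?, ?)",
            (file_name, project, utc_time, file_format),
        )


def images_for_project(conn: sqlite3.Connection,
                       project: str) -> list[tuple[str, str]]:
    """Return (file_name, acquired_at) pairs for one project, oldest first."""
    rows = conn.execute(
        "SELECT file_name, acquired_at FROM image_metadata "
        "WHERE project = ? ORDER BY acquired_at",
        (project,),
    )
    return rows.fetchall()
```

For example, `record_image(conn, "plot12.tif", "seed-grant-A", datetime(2021, 6, 1, 14, 30, tzinfo=timezone.utc), "GeoTIFF")` would add one row (file and project names here are made up), and `images_for_project(conn, "seed-grant-A")` would list that project's files in acquisition order. Normalizing timestamps to UTC keeps records from field sites in different time zones directly comparable.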


Plans for Access and Sharing Including Provisions for Appropriate Protection of Privacy, Confidentiality, Security, Intellectual Property, or other Rights or Requirements

Curriculum materials, webinars, YouTube videos, and best practices created by this project will be openly and widely disseminated for the duration of the project. All data protocols will be shared through scientific publications and published best-practice standards. Some collaborators may provide proprietary data to serve as test beds for the best practices and data management protocols developed by this project, or may generate proprietary data through seed grant projects. In such cases, non-disclosure agreements will be signed and the data will be stored on secure servers. Where possible, data, code, and training materials will be released under an open-source license such as the GPL or MIT license.


Policies and Provisions for Reuse, Redistribution, and the Production of Derivatives

Software developed by this project will be packaged as libraries of subroutines that implement the specific algorithms developed in the project. Unless restricted by a non-disclosure agreement or security concerns, data and source code used to prepare journal publications, annual reports, workshops, and conference publications will be made easily available to other researchers through open data and code repositories such as GitHub and Bitbucket. Where possible, programs will be containerized using Docker or Singularity and published on Docker Hub. Similarly, analyses and workflows will be integrated into the CyVerse Discovery Environment or the Visual Interactive Computing Environment (VICE) for immediate reuse by researchers. Curriculum materials, webinars, YouTube videos, and best practices will be open to reuse and modification.


Plans for Archiving Data, Samples, and other Research Products, and for Preservation of Access to Them

Data management will be coordinated through three phases: initial, near-term, and long-term. The specific practices will depend on several factors, especially the available technology and associated costs, data volume, and requirements for long-term public access.

Initial data management will provide immediate, robust storage and support later analysis phases. At least two copies of all data files will reside on redundant disk systems (e.g., RAID): a working copy, backed up by at least one archive copy. All analyses will use the working copy of a data set. The primary goal is to preserve the raw data in the event of catastrophic hardware failure.

Near-term data management will support the project's computational analyses and facilitate public access. For example, CyVerse and CoGe allow researchers to store private data, genomes, and experiments; create notebooks; share data with other researchers; and access their analysis history.

Long-term data management will store the data indefinitely in a publicly accessible venue. Where possible, data will be submitted to the appropriate publicly funded database, such as the NCBI Sequence Read Archive. We will also use the CyVerse Data Store and Data Commons. The CyVerse Data Commons supports public publishing of datasets with associated metadata and can issue Digital Object Identifiers (DOIs). At all phases of data generation, processing, and storage, metadata will be collected and stored.
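A working copy backed by an archive copy only protects the raw data if the archive copy still matches it. A minimal Python sketch of a routine integrity check follows, assuming both copies are reachable as ordinary directory trees: it reports every file in the working copy that is missing from the archive copy or whose SHA-256 digest differs. The function names `sha256_of` and `verify_archive` are hypothetical and are not part of any AG2PI or CyVerse tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(working_dir: Path, archive_dir: Path) -> list[str]:
    """Return relative POSIX paths of working-copy files that are missing
    from the archive copy or whose contents differ (by SHA-256).

    Intended for raw-data directories, whose files should never change.
    """
    problems = []
    for src in sorted(p for p in working_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(working_dir)
        dst = archive_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

Because analyses run on the working copy, derived files written there would be reported as missing from the archive, so a check like this is best pointed at raw-data directories only. A nonempty result signals that the archive copy should be investigated or refreshed.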

The AG2PI project will work with the participating university libraries, together with project participants, to oversee the development of protocols, policies, and procedures for technology migration, for ongoing curation of curriculum materials and all associated research products, and for the permanent maintenance of these data and products, in accordance with the data management guidelines of USDA and other agencies. This will include integrating data management into existing disaster management plans. Metadata tagging will be used to enhance the accessibility of the data, and the services of the Iowa State University Library will be leveraged to ensure that long-term curation meets appropriate standards.