DataONE Dash Production Release Criteria

The following features should be tested, documented, and available as a precondition to the production release of DataONE Dash.

Infrastructure

  • DataONE support in all three parallel Dash VM instances (dash-dev.cdlib.org, dash-stage.cdlib.org, dash.cdlib.org), each configured with the full technology stack:
    • Apache 2.4.x SSL-enabled, reverse proxy web server
    • Ruby 2.x
    • Rails 3.x application framework
    • Unicorn 4.x web server
    • XTF 3.1
    • Tomcat
    • J2SE 1.5.0 or newer
    • Python 2.7.6 for metadata harvester
    • Server certificates
    • Start-up scripts (init.d)
  • Codebase managed in GitHub repositories, one each for ingest, harvest, and XTF, but all under the umbrella of a common UC3 dash administrative user.
  • Deployment scripts for infrastructure (puppet).
  • Deployment scripts for application and brochure pages (capistrano).

Policies

Standardized submission agreement & data use agreement (CC-0)

  • Maximum size of individual dataset files and datasets
  • Maximum size of individual campus collections
  • No price model necessary since storage paid for by UNM

Application features

  • Branding
    • URL: oneshare.cdlib.org
    • DataONE logo in all headers (ingest and XTF).
    • “Contact us” in all footers goes to UC3 email only (not DataONE)
    • Attribution in all footers, e.g., “DataONE Dash is a collaborative project of the DataONE project, University of New Mexico, and UC3”.
    • DataONE-specific text for About and FAQ pages.
  • Contribution/Submission
    • OAuth-based authentication with Google credentials, supported in the Ruby/Rails application using the OmniAuth gem.
    • File browse and drag-and-drop dataset selection, subject to maximum dataset and file size limits.
    • Metadata entry form, with tooltip for each element:
      • Title
      • Data type - pick list
      • Author (repeatable)
      • Keywords (3 required + repeatable)
      • Abstract
      • Methods
      • Link (repeatable): identifier + identifier type (pick list) + description (pick list)
      • Identifier
      • Funder
      • Grant number
    • Auto-filled fields in metadata:
      • Date - YYYY
      • Institution - ONEShare
      • Rights - CC-0
      • Rights URI - http://creativecommons.org/publicdomain/zero/1.0/
      • Opportunity to review metadata and add/delete files prior to submission.
    • Automated packaging and submission to the Merritt ONEShare collection.
      • Creation of equivalent DataCite and Dublin Core metadata files, see the crosswalk
      • Creation of the DataONE manifest file. There will be a pair of lines (one for “mrt-datacite.xml” and one for “mrt-dc.xml”) for each data file in the submission.

          <code>#%dataonem_0.1
          #%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-dataone-manifest
          #%prefix | dom: | http://uc3.cdlib.org/ontology/dataonem#
          #%prefix | mrt: | http://uc3.cdlib.org/ontology/mom#
          #%fields | dom:scienceMetadataFile | dom:scienceMetadataFormat | dom:scienceDataFile | mrt:mimeType
          mrt-datacite.xml | http://schema.datacite.org/meta/kernel-3/metadata.xsd | <u>file</u> | text/xml
          mrt-dc.txt | http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd | <u>file</u> | text/xml
          ...
          #%eof</code>
        
  • Discovery: Faceted XTF search and browse, with faceting of all DataCite metadata elements.
  • Miscellaneous
    • Pages for “About”, “Terms of service”, “FAQ”, “Policies” in all footers.
    • “Contact Us” page

Configuration

  • OAuth-based authentication with Google credentials, supported in the Ruby/Rails application with the OmniAuth gem.
  • Merritt ONEShare user role account.
  • Public Merritt ONEShare collection (already existing).
  • Submission profile for the ONEShare collection, using the VM-based storage service (uc3-mrt-store.cdlib.org) to deposit data in the UNM storage node with assigned DOIs and register with the Merritt member node. [Work with Mark to modify the existing DataUp profile]
  • EZID DOI minter for ONEShare. DataONE will become an EZID member.
  • Maximum size of individual campus collections.

Operations

  • Cron job for Python harvesting script.
  • Groundwork/Nagios monitors.
  • Weekly report of the total and incremental number and size of datasets by campus/Merritt collection, mailed to designated campus contacts.