Feature #3108

Initial implementation of Data Deposit API (SWORD v2 compliant)

Added by Philip Durbin 10 months ago. Updated 7 months ago.

Status: Completed
Start date: 06/14/2013
Priority: High
Due date:
Assignee: Kevin Condon
% Done: 100%
Category: -
Target version: 3.6
Usability Testing:

Description

We plan to implement a Data Deposit API that is compliant with version 2 of a protocol called "SWORD" (Simple Web-service Offering Repository Deposit).

Developers should be able to use the Data Deposit API to (list from http://devguide.thedata.org/features/api/data-deposit/v1 ):

  • Create a study
  • Add files to a study with a zip file
  • Delete a file by database id
  • Replace cataloging information (title, author, etc.) for a study
  • List studies in a dataverse
  • Display a study atom entry
  • Display a study statement
  • Delete a draft study
  • Release a study
  • Deaccession a released study
  • ...

Developers should not be able to use the Data Deposit API to (should be tested):

  • Replace all files on a study with a new zip file
  • Release a dataverse (SwordError with instructions expected)
  • Unrelease a dataverse (SwordError with instructions expected)

To be in compliance with SWORDv2, we should implement as much as possible of the official specification: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html (via http://swordapp.org/sword-v2/sword-v2-specifications/)

In order to implement SWORDv2, we plan to use the official Java server library (or a fork thereof) at https://github.com/swordapp/JavaServer2.0

The first expected use case for the DVN's SWORDv2-compliant Data Deposit API is the "Dataverse" plugin for Open Journal Systems (OJS), which is currently under development and is described at http://projects.iq.harvard.edu/ojs-dvn and http://pkp.sfu.ca/wiki/index.php?title=PKP/Dataverse_Network_Integration

In addition to this ticket, planning, design, and documentation are also taking place at http://devguide.thedata.org/features/api/data-deposit


Related issues

Related to Feature #3186: Map additional Dublin Core Metadata Initiative (DCMI) ter... New 07/18/2013
Related to Feature #3202: SWORD API: Investigate how or if TOU will be implemented ... Completed 07/29/2013
Related to Suggestion #3208: Allow the Data Deposit API to use API keys New 08/01/2013
Related to Feature #3182: File upload: perform md5 checksums on all files. Completed 07/12/2013
Related to Bug #3224: View Study: Versioning tab should not show a null field a... Completed 08/14/2013
Related to Suggestion #3225: Refactor code for releasing dataverses out of backing bea... New 08/15/2013
Related to Feature #3232: Let file metadata (i.e. description) be specified during ... New 08/19/2013
Related to Documentation #3234: Release DVN 3.6 to SourceForge Completed 08/20/2013
Related to Bug #2748: File Upload: Clean out temp files from upload directory, ... New 01/28/2013
Related to Bug #3246: Sword API: Study metadata required fields are not require... Completed 08/27/2013
Related to Bug #3250: Sword: File name not required to be unique Completed 08/28/2013
Related to Bug #3269: Sword: Released study is deleted not deaccessioned Completed 09/06/2013
Related to Bug #3270: Sword: Uploading subsettable files fails, however study i... Completed 09/06/2013
Related to Bug #3273: SWORD API: Depositing particular zip file fails with null... Completed 09/06/2013
Related to Bug #3262: SWORD API: When requesting service document, should only ... Completed 09/03/2013
Related to Bug #3260: Sword API: Can't create study in dvn-3 Completed 08/30/2013
Related to Bug #3271: SWORD API: Updated field value in study statement does no... In Review 09/06/2013
Related to Bug #3256: SWORD API: Failure to create study due to too large metad... Completed 08/29/2013
Related to Bug #3255: SWORD API: Date field in study metadata is not validated,... Completed 08/29/2013
Related to Feature #3279: Data Deposit API v1 bug fixes and enhancements New 07/18/2013
Related to Feature #3278: Data Deposit API v2 New 08/01/2013
Related to Bug #3284: DOI: Studies created with SWORD API do not resolve, are n... Completed 09/10/2013
Related to Bug #3317: Attempting to release a deaccessioned study via SWORD res... New 09/18/2013

History

#1 Updated by Gustavo Durand 10 months ago

  • Assignee set to Philip Durbin
  • Target version set to 58

#2 Updated by Gustavo Durand 10 months ago

  • Status changed from New to In Design
  • % Done changed from 0 to 20

#3 Updated by Philip Durbin 10 months ago

  • Priority changed from Normal to High

#4 Updated by Philip Durbin 10 months ago

Today I created a couple of proof-of-concept servlets and wrote a couple of scripts to exercise them.

This is all in a new branch called "3108-data-deposit-api": https://github.com/IQSS/dvn/commit/4838565

The first script exercises the "service document" servlet, which for now only shows a minimal amount of XML:

murphy:dvn pdurbin$ tools/scripts/data-deposit-api/test-service-document 
<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom">
  <generator xmlns="http://www.w3.org/2005/Atom" uri="http://www.swordapp.org/" version="2.0"/>
  <version xmlns="http://purl.org/net/sword/terms/">2.0</version>
</service>
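
For anyone writing a client against this output, here is a minimal Python sketch of how the service document above could be read. The function names (sword_version, collection_hrefs) are made up for illustration and are not part of DVN or the SWORD library. The collection lookup assumes the usual AtomPub layout, where collections appear as app:collection elements with an href attribute.

```python
import xml.etree.ElementTree as ET

# Service document copied from the script output above.
SERVICE_DOCUMENT = """<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom">
  <generator xmlns="http://www.w3.org/2005/Atom" uri="http://www.swordapp.org/" version="2.0"/>
  <version xmlns="http://purl.org/net/sword/terms/">2.0</version>
</service>"""

APP_NS = "http://www.w3.org/2007/app"
SWORD_NS = "http://purl.org/net/sword/terms/"


def sword_version(xml_text):
    """Return the text of the sword:version element, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find("{%s}version" % SWORD_NS)
    return node.text if node is not None else None


def collection_hrefs(xml_text):
    """Return the href of every app:collection element (assumed AtomPub layout)."""
    root = ET.fromstring(xml_text)
    return [c.get("href") for c in root.iter("{%s}collection" % APP_NS)]


if __name__ == "__main__":
    print(sword_version(SERVICE_DOCUMENT))
    print(collection_hrefs(SERVICE_DOCUMENT))
```

With the minimal document above, the collection list comes back empty, which matches the fact that no collection URLs are advertised yet.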

Eventually, the DVN service document will have a number of "collection" URLs. I don't think the DASH folks will mind me sharing the titles for some of their collection URLs:
  • FAS Scholarly Articles
  • FAS Student Papers
  • FAS Theses and Dissertations
  • GSD Scholarly Articles
  • GSE Scholarly Articles

So we should think about titles for our collection URLs. We should also think about the URLs themselves, which might look something like this:

  • /dvn/api/data-deposit/swordv2/collection/1902.1/1
  • /dvn/api/data-deposit/swordv2/collection/1902.1/2

The spec shows what the final XML should look like: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_retreivingservicedocument

The second script deposits some data (i.e. "example.zip") into a directory specified by the "config-impl" parameter of the "collection" servlet. Expected output is something like this:

murphy:dvn pdurbin$ file example.zip 
example.zip: Zip archive data, at least v1.0 to extract
murphy:dvn pdurbin$ ls -a /tmp/uploads
.    ..
murphy:dvn pdurbin$ tools/scripts/data-deposit-api/test-collection 
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <generator uri="http://www.swordapp.org/" version="2.0"/>
  <id>fakeIri</id>
  <link href="fakeIri" rel="edit"/>
  <link href="fakeIri" rel="http://purl.org/net/sword/terms/add"/>
  <treatment xmlns="http://purl.org/net/sword/terms/">no treatment information available</treatment>
</entry>
murphy:dvn pdurbin$ ls -a /tmp/uploads
.        ..        xample.zip
murphy:dvn pdurbin$ file /tmp/uploads/xample.zip 
/tmp/uploads/xample.zip: Zip archive data, at least v1.0 to extract
murphy:dvn pdurbin$

Note that for now "example.zip" is changed to "xample.zip" due to a bug in the SWORDv2 library that is being discussed at https://github.com/swordapp/JavaServer2.0/pull/2 and http://www.mail-archive.com/sword-app-tech@lists.sourceforge.net/msg00321.html

The XML above needs a lot of work. It should provide a data deposit receipt per the spec: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#depositreceipt
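
As a rough way to see what is missing, the sketch below collects the atom:link elements from the entry shown above and reports which link relations are absent. The list of required relations ("edit", "edit-media", and the SWORD "add" term) is an assumption based on my reading of the deposit receipt section of the profile and should be checked against the spec. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Atom entry copied from the test-collection output above.
RECEIPT = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <generator uri="http://www.swordapp.org/" version="2.0"/>
  <id>fakeIri</id>
  <link href="fakeIri" rel="edit"/>
  <link href="fakeIri" rel="http://purl.org/net/sword/terms/add"/>
  <treatment xmlns="http://purl.org/net/sword/terms/">no treatment information available</treatment>
</entry>"""

ATOM = "{http://www.w3.org/2005/Atom}"
SWORD_ADD = "http://purl.org/net/sword/terms/add"

# Assumed set of link relations a deposit receipt should carry (verify against the profile).
ASSUMED_REQUIRED_RELS = ("edit", "edit-media", SWORD_ADD)


def links_by_rel(xml_text):
    """Map each rel value to the list of hrefs found on the entry's atom:link elements."""
    root = ET.fromstring(xml_text)
    links = {}
    for link in root.findall(ATOM + "link"):
        links.setdefault(link.get("rel"), []).append(link.get("href"))
    return links


def missing_rels(xml_text, required=ASSUMED_REQUIRED_RELS):
    """Return the required rel values that the entry does not provide."""
    found = links_by_rel(xml_text)
    return [rel for rel in required if rel not in found]


if __name__ == "__main__":
    print(links_by_rel(RECEIPT))
    print(missing_rels(RECEIPT))
```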

No authentication takes place at all. The servlet simply accepts the file (!). I could use some assistance with figuring out how best to use DVN's authentication before accepting the file for upload.
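
One possible starting point, sketched in Python rather than in the servlet's Java: decode HTTP Basic credentials from the Authorization header before touching the upload. This assumes clients send HTTP Basic credentials, which this ticket does not confirm. The function name parse_basic_auth is hypothetical, and checking the decoded username and password against DVN's user store is deliberately left out.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None if unusable."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    header = "Basic " + base64.b64encode(b"sword-user:secret").decode("ascii")
    print(parse_basic_auth(header))
```

A servlet following this shape would reject the request when the function returns None or when the credentials do not match a DVN account, and only then read the request body.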

#5 Updated by Gustavo Durand 10 months ago

  • Target version changed from 58 to 3.6

#6 Updated by Gustavo Durand 9 months ago

  • Status changed from In Design to In Dev
  • % Done changed from 20 to 50

#7 Updated by Philip Durbin 8 months ago

  • Status changed from In Dev to In QA
  • Assignee changed from Philip Durbin to Kevin Condon
  • % Done changed from 50 to 90

I'm passing this ticket over to QA.

http://devguide.thedata.org/features/api/data-deposit/v1 is the place to look for the latest curl commands for the various supported operations (create study, etc.).

Below is a brain dump of documentation work, questions that can be asked of the SWORD community, and ideas for v2 of the Data Deposit API:

  • document 15 allowed dublin core fields
  • document dvn.dataDeposit.maxUploadInBytes JVM option
  • email SWORD list about moving "add files" from edit-media to edit iri? draft at https://docs.google.com/document/d/1T0AvK0CwD6fwjrgdkOb6TC7BrQrBzWQWtmkaYcvjqMI/edit?usp=sharing
  • weird [Empty] on diff page after editing study in GUI (Redmine #3224)
  • test study.entry.add_author("Foo Bar") in python dvn_client (Kevin will test)
  • add to v2 ticket: support auto renaming of 50by1000.tab to 50by1000_1.tab? (original was 50by1000.dta)
  • add to v2 ticket: support RDF version of SWORD statement?
  • add to v2 ticket: split bibliographicCitation into more fields?
  • add to v2 ticket: support adding file metadata (description, category, filename)? (Redmine #3232)
  • add to v2 ticket: check md5sum on file add? (Redmine #3182)
  • add to v2 ticket: indicate acceptance of sword:collectionPolicy inside DVN? (emailed sword list)
  • add to v2 ticket: MTOMXMLStreamWriter and related DEBUG messages make logs very chatty
  • add to v2 ticket: fix bug in SWORD Java library that results in an ArrayIndexOutOfBoundsException at https://github.com/IQSS/swordv2-java-server-library/blob/aeaef8342361bf3de7e9fc8f2a979cd742bc31ae/src/main/java/org/swordapp/server/SwordAPIEndpoint.java#L334
  • add to v2 ticket: more SWORD compliant to not use attributes like this? <dcterms:isReferencedBy holdingsURI="http://dx.doi.org/10.1038/dvn333" agency="DOI" IDNo="10.1038/dvn333">Peets, J., & Stumptown, J. (2013). Roasting at Home. New England Journal of Coffee, 3(1), 22-34.</dcterms:isReferencedBy>
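
To make the auto-renaming idea in the list above concrete, here is a small Python sketch of one way it could work. This is only an illustration of the proposal, not behavior DVN has; the function name unique_filename and the "_N" suffix scheme are assumptions modeled on the 50by1000_1.tab example.

```python
import os


def unique_filename(name, existing):
    """Return name unchanged if unused, otherwise append _1, _2, ... before the extension."""
    if name not in existing:
        return name
    stem, ext = os.path.splitext(name)
    counter = 1
    while True:
        candidate = "%s_%d%s" % (stem, counter, ext)
        if candidate not in existing:
            return candidate
        counter += 1


if __name__ == "__main__":
    print(unique_filename("50by1000.tab", {"50by1000.tab"}))
```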

In lieu of an update at our Monday meeting (which I'll miss due to vacation), here are the main changes made this week:

  • allowing (related) Publications (i.e. OJS citation) to be populated via dcterms:isReferencedBy
  • added dvn.dataDeposit.maxUploadInBytes JVM option because otherwise the size of files uploaded via SWORD appears to be unlimited (!)
  • fixed #3250 regarding duplicate filename
  • fixed #3246 to make sure studies have titles
  • persistent URL exposed in individual field
  • tested Content-MD5 header, which works
  • atom entry clean up
  • checking study locks in more places to prevent exceptions due to ingest
  • better error handling and messages
  • disabled "delete all studies" network admin operation
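
Two of the changes above, the upload size cap and the Content-MD5 check, can be illustrated with a short Python sketch. It is not the DVN implementation (which is Java); the names read_limited, UploadTooLarge, and md5_matches are hypothetical. The ticket does not say which encoding of the digest was tested, so the comparison accepts either a hex digest or the base64 form defined for the Content-MD5 header.

```python
import base64
import hashlib
import io


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured byte limit."""


def read_limited(stream, max_bytes, chunk_size=8192):
    """Read the whole stream, failing as soon as more than max_bytes have arrived."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge("upload exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


def md5_matches(data, header_value):
    """Compare data's MD5 with a header value given as hex or base64 (assumed encodings)."""
    digest = hashlib.md5(data)
    hex_form = digest.hexdigest()
    b64_form = base64.b64encode(digest.digest()).decode("ascii")
    candidate = header_value.strip()
    return candidate.lower() == hex_form or candidate == b64_form


if __name__ == "__main__":
    payload = b"example"
    body = read_limited(io.BytesIO(payload), max_bytes=1024)
    print(md5_matches(body, hashlib.md5(payload).hexdigest()))
```

Reading in chunks means an oversized upload is rejected without buffering the whole body first, which is the point of having a limit at all.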

Finally, I made sure all of these changes and the ones from the last two weeks are ok with Jen from OJS. When I asked if there is anything she needs she said, "I'm pretty clear that I have what I need." The full chat log is at http://irclog.iq.harvard.edu/dvn/2013-08-27

#8 Updated by Kevin Condon 7 months ago

  • Status changed from In QA to Completed
  • % Done changed from 90 to 100

Basic SWORD server implementation delivered and tested. Individual tickets were opened for issues found.

Closing ticket
