Data Management Plan Questionnaire

Following the recommendations of the Deutsche Forschungsgemeinschaft for edition projects in literary studies (DFG 105)

Answers for Digitale Gesamtedition und Übersetzung des koptisch-sahidischen Alten Testaments (Complete Digital Edition and Translation of the Coptic-Sahidic Old Testament)

General

Requirements I

Are there requirements regarding the data management from your scholarly/scientific community?

Requirements II

Which are these additional requirements regarding data management?

Support

Who provides support for research data management issues for the project?

Göttingen eResearch Alliance (https://www.eresearch.uni-goettingen.de/de/) is supporting the creation of the data management plan.

Content classification

Datasets

What kind of dataset is it?

Dataset "A (Images)": Sets of digital images of Sahidic Biblical manuscripts and manuscript fragments on a page-by-page basis. The images come from heterogeneous sources, ranging from current state-of-the-art color images taken of the actual objects to scans of black-and-white legacy photographs or microfilms. The sets are arranged on the basis of a careful examination of the archives of more than 60 holding institutions in order to reconstruct, as far as possible, the dismembered and widely dispersed codices.

Dataset "B (Metadata)": Descriptive metadata of Sahidic Biblical manuscripts covering the areas of provenance and date, original and current state, material and artisanal characteristics, textual and paratextual features, as well as content and current location. These metadata are aggregated from first-hand study of the original items, collection of information from secondary literature, and integration of existing databases (e.g., Pleiades).

Dataset "C (Transcriptions)": Digital transcriptions of the text of the extant Sahidic Biblical manuscripts containing TEI-compliant markup. The transcriptions are produced in an Online Transcription Editor (OTE; https://sourceforge.net/projects/wfce-ote/) that was custom-made for the transcription of Greek New Testament manuscripts as part of a DFG/AHRC-funded project for developing a Collaborative Research Environment. The OTE has been specially adapted to accommodate Sahidic Old Testament data and is used in accordance with an elaborate set of Transcription Guidelines. Each final transcription is produced in two stages: one expert adapts a base text to the given manuscript to produce a diplomatic transcription, and a second expert reviews this work. Any discrepancies are resolved first between the two and, where necessary, with the help of additional expertise. The resulting final transcriptions are published as self-contained diplomatic manuscript editions and are also used to produce critical editions of every book of the Sahidic Old Testament.
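The review stage described above can be pictured with a small sketch. It assumes, hypothetically, that the first expert's draft and the reviewed version of a line are available as whitespace-separated word sequences; Python's standard `difflib.SequenceMatcher` then lists the places where they disagree, i.e., the discrepancies the two experts would have to resolve. This is an illustration only, not the OTE's actual review mechanism, and the sample strings in Coptic script are invented.

```python
from difflib import SequenceMatcher


def find_discrepancies(draft: str, reviewed: str) -> list[tuple[str, str, str]]:
    """Return (operation, draft_words, reviewed_words) for every disagreement."""
    a, b = draft.split(), reviewed.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    discrepancies = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'delete' or 'insert'
            discrepancies.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return discrepancies


# Invented sample line: the draft omits one word and misspells another.
draft = "ϩⲛ ⲧⲉϩⲟⲩⲉⲓⲧⲉ ⲁⲡⲛⲟⲩⲧⲉ ⲧⲁⲙⲓⲟ ⲙⲛ ⲡⲕⲁ"
reviewed = "ϩⲛ ⲧⲉϩⲟⲩⲉⲓⲧⲉ ⲁⲡⲛⲟⲩⲧⲉ ⲧⲁⲙⲓⲟ ⲛⲧⲡⲉ ⲙⲛ ⲡⲕⲁϩ"

for discrepancy in find_discrepancies(draft, reviewed):
    print(discrepancy)
```

Each reported tuple corresponds to one point the reviewer and transcriber would discuss; word-level comparison is chosen here because diplomatic transcriptions are typically checked word by word.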

Dataset "D (Edition Rules)": A set of editorial decisions applied to automatic collations of the digital transcriptions by means of the collation engine CollateX (https://collatex.net/). Several user interfaces allow editors to create rules that, when applied to the automatic collation, shape the resulting collation into an apparatus criticus reflecting the philological and editorial principles of the edition.
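To make the idea of rules that shape an automatic collation more concrete, here is a minimal sketch in plain Python. It assumes, hypothetically, that the collation has already been aligned into columns (one reading per witness, as in a CollateX alignment table) and that an editorial rule is a simple mapping from an orthographic variant to its regularized form. Columns in which all witnesses agree after regularization are dropped; the remaining columns become apparatus entries grouping witnesses by reading. The witness sigla, readings, and rule format are invented and do not reflect the project's actual rule schema.

```python
from collections import defaultdict

# Hypothetical alignment table: each column maps witness siglum -> reading
# ("" marks an omission). In practice such a table would come from CollateX.
alignment = [
    {"W1": "ⲁⲡⲛⲟⲩⲧⲉ", "W2": "ⲁⲡⲛⲟⲩⲧⲉ", "W3": "ⲁⲡⲛⲟⲩⲧⲉ"},
    {"W1": "ⲧⲁⲙⲓⲟ", "W2": "ⲧⲁⲙⲓⲉ", "W3": "ⲧⲁⲙⲓⲟ"},
    {"W1": "ⲛⲧⲡⲉ", "W2": "ⲛⲧⲡⲉ", "W3": ""},
]

# Hypothetical editorial rule: treat this spelling as a non-significant variant.
regularization_rules = {"ⲧⲁⲙⲓⲉ": "ⲧⲁⲙⲓⲟ"}


def build_apparatus(alignment, rules):
    """Return (column_index, {reading: [witnesses]}) for each significant variant."""
    apparatus = []
    for index, column in enumerate(alignment):
        groups = defaultdict(list)
        for witness, reading in sorted(column.items()):
            # Apply the rule if one exists; otherwise keep the reading as is.
            groups[rules.get(reading, reading)].append(witness)
        if len(groups) > 1:  # witnesses still disagree after regularization
            apparatus.append((index, dict(groups)))
    return apparatus


for index, readings in build_apparatus(alignment, regularization_rules):
    print(index, readings)
```

Without the rule, the spelling difference in the second column would also surface as an apparatus entry; adding or removing rules thus changes the apparatus without touching the transcriptions themselves.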

Dataset "E (Citations)": A collection of Old Testament citations from extra-Biblical Sahidic literature, i.e., writings of the Sahidic Church Fathers, Sahidic translations of writings from other traditions, prayer books, letters, etc., as well as inscriptions and ostraca (potsherds).

Dataset "F (Edition Text)": A critically reconstructed edition text of every book of the Sahidic Old Testament aiming at representing the oldest attainable text of the earliest translation from the Greek version of the Old Testament into the Egyptian language.

Dataset "G (Translations)": Translations of the critically reconstructed edition text of the Sahidic Old Testament into English, German, and Arabic.

Dataset "H (Bibliography)": A comprehensive bibliography of literature on the Coptic-Sahidic Biblical text tradition.

Reuse

Which individuals, groups, or institutions could be interested in re-using this dataset? What are possible scenarios? What consequences does the reuse potential have for the provision of the data later?

Dataset "A": This is the most comprehensive collection of manuscript images for the Sahidic Old Testament. In addition, it is the most advanced reconstruction of the dismembered codices containing text of the Sahidic Old Testament. As such, it is of interest to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic, as well as to art historians and manuscript researchers more generally. Our aim is to obtain new digital images of every item necessary for the edition from every holding institution, to secure the right to display these images freely as part of the edition, and to encourage holding institutions to make them available via their own digital outlets so that the public has as many options as possible to access them. At the same time, well ahead of the completion of our editorial work, we grant colleagues access to the images via our own digital workspace as soon as they become available to us, provided we have secured the right to share them.

Dataset "B": This is the most comprehensive set of metadata for the manuscripts of the Sahidic Old Testament. With it, we justify our reconstruction of the dismembered codices and stimulate scholarly discussion to advance work on these manuscripts. As such, it is of interest to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic. Some of the metadata are of special interest to Greek paleographers and are expected to contribute to the dating of Greek handwriting as well. In cooperation with the Institut für neutestamentliche Textforschung (Münster), which contributes metadata for the Sahidic New Testament manuscripts, our metadata form part of the official checklist of Sahidic Biblical manuscripts. We plan to contribute (part of) our metadata to the Leuven Database of Ancient Books (LDAB) and to the Archaeological Atlas of Coptic Literature (PAThs). Since all of our data and metadata are exposed via a web API anyway, sharing our metadata with anyone who expresses interest in them will incur only very modest additional costs.

Dataset "C": This is the most comprehensive, and hopefully the most accurate, collection of published manuscript transcriptions of the Sahidic Old Testament. As such, it is of interest to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic, as well as to historians of liturgy and church historians more broadly. Users can already download any page of our published manuscript transcriptions directly from the transcription viewer in either TEI or HTML format. Through our web API, we offer plain text as well.

Dataset "D": The rule set containing all the editorial decisions that create the critical apparatus on the fly is closely tied to CollateX and to the form and format of the manuscript transcription files produced in this project. It is of interest to Coptologists who plan to create their own edition based on the same set of resources. The rule set can be called through our web API and used by anyone with access to a workspace that provides these resources.

Dataset "E": This is the most comprehensive collection of Old Testament citations gathered from Sahidic literature, i.e., Church Fathers' writings, prayer books, etc., as well as inscriptions and ostraca (potsherds). As such, it is of interest to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic, as well as to archeologists, historians of liturgy, and church historians more broadly. In part, we are reusing material that has been gathered and digitally published by others, e.g., Koptische Ostraca Online (https://www.koptolys.gwi.uni-muenchen.de/) or The Canons of Apa Johannes the Archimandrite (https://coptot.manuscriptroom.com/web/apa-johannes). Hence, this material is already available through other venues. Once our citation database is reasonably complete and integrated into the edition, however, we plan to expose the data via a web API as well.

Dataset "F": This is the most up-to-date hypothesis on the original translation of the Sahidic Old Testament, grounded in philological and text-historical research. As such, this text will be of interest not only to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic, but also to modern communities of Copts in Egypt and all over the world, as well as to all modern-day churches engaged in ecumenical dialogue. This text will be free for everyone to use and download through our web API as well as directly from our digital edition.

Dataset "G": These translations into English, German, and modern Arabic will be the first translations of the edition text into modern languages; they aim to broaden the reception of the edition and the appreciation of a historically unique translation project from late antiquity. They will be of interest not only to scholars with limited expertise in the Coptic language but especially to laypeople with no such expertise at all. In addition, they will be of interest to Bible software developers and Bible Societies who create and publish modern-day versions of the Bible for all sorts of special audiences. These translations will be free to use, accessible through our web API, and directly downloadable from our digital edition.

Dataset "H": This is the most comprehensive catalog of secondary literature on the Coptic-Sahidic Biblical tradition. As such, it is of interest to Coptologists and to experts in Biblical textual research on other ancient language traditions, in particular Greek, Latin, Syriac, Arabic, Armenian, and Ethiopic, as well as to art historians and manuscript researchers more generally.

Technical classification

Formats

Which file formats are used?

Dataset "A": The images that are used for the edition and exposed online are in JPEG format, with an average file size of 1 to 2 MB. For some of them, lossless TIFF images are available as well, but these are not exposed online. These decisions were made with the cost of online bandwidth in mind.

Dataset "B": The metadata are stored in a UTF-8 encoded relational SQL database. A REST-like API exposes the metadata in XML, JSON, and various other formats (https://coptot.manuscriptroom.com/community/vmr/api/metadata/).
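As an illustration of the storage-and-export pattern described above (a relational database behind an API that serializes records to XML or JSON), the following sketch uses Python's built-in `sqlite3`, `json`, and `xml.etree.ElementTree` modules. The table layout, column names, and sample record are hypothetical and do not reproduce the project's actual schema or the output of its metadata API.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified manuscript table (not the project's schema).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be accessed by column name
conn.execute(
    "CREATE TABLE manuscript (id TEXT PRIMARY KEY, holding_institution TEXT, "
    "material TEXT, date_range TEXT)"
)
conn.execute(
    "INSERT INTO manuscript VALUES (?, ?, ?, ?)",
    ("sa-0001", "Example Library", "parchment", "9th-10th c."),
)


def fetch_record(manuscript_id: str) -> dict:
    """Read one manuscript row and return it as a plain dictionary."""
    row = conn.execute(
        "SELECT * FROM manuscript WHERE id = ?", (manuscript_id,)
    ).fetchone()
    return dict(row)


def to_json(record: dict) -> str:
    # ensure_ascii=False keeps non-ASCII text (e.g., Coptic) readable as UTF-8
    return json.dumps(record, ensure_ascii=False)


def to_xml(record: dict) -> str:
    """Serialize the record as one XML element with one child per field."""
    root = ET.Element("manuscript", id=record["id"])
    for key, value in record.items():
        if key != "id":
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


record = fetch_record("sa-0001")
print(to_json(record))
print(to_xml(record))
```

Keeping a single relational source and generating each output format on request means that corrections made in the database propagate to every format at once.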

Dataset "C": The transcriptions are text files containing TEI-compliant XML markup, saved in a versioned git repository. The XML specifications used are based on those devised for other digital Biblical transcription work produced in Birmingham (Institute for Textual Scholarship and Electronic Editing) and Münster (Institut für neutestamentliche Textforschung); they are available at http://epapers.bham.ac.uk/1892/5/IGNTP_XML_guidelines_1-5.pdf
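Because the transcriptions are TEI XML, derived formats such as plain text can be generated programmatically. The sketch below extracts word-level plain text from a minimal, hypothetical TEI fragment using `xml.etree.ElementTree`. The fragment uses only the generic TEI elements `ab` and `w` in the TEI namespace, and the `n` value is invented; it is not a faithful sample of the project's markup, and a real conversion would have to follow the full guidelines linked above.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Minimal, hypothetical fragment: one text block containing word elements.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab n="Gen1_1">
      <w>ϩⲛ</w> <w>ⲧⲉϩⲟⲩⲉⲓⲧⲉ</w> <w>ⲁⲡⲛⲟⲩⲧⲉ</w> <w>ⲧⲁⲙⲓⲟ</w>
    </ab>
  </body></text>
</TEI>"""


def tei_to_plaintext(xml_text: str) -> dict[str, str]:
    """Map each <ab> block's n attribute to its words joined by single spaces."""
    root = ET.fromstring(xml_text)
    result = {}
    for ab in root.iterfind(".//tei:ab", TEI_NS):
        # itertext() also collects text nested inside a <w> (e.g., inline markup)
        words = ["".join(w.itertext()).strip() for w in ab.iterfind("tei:w", TEI_NS)]
        result[ab.get("n")] = " ".join(words)
    return result


print(tei_to_plaintext(sample))
```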

Dataset "D": The rule set governing the creation of the critical apparatus is stored in a UTF-8 encoded relational SQL database with XML-based markup.

Dataset "E": Longer citations, e.g., from ostraca, are captured and treated like transcriptions, i.e., they are text files containing TEI-compliant XML markup saved in a versioned git repository. Shorter citations and mere allusions are stored in a UTF-8 encoded relational SQL database with XML-based markup.

Dataset "F": The edition text is a text file containing TEI-compliant XML markup saved in a versioned git repository.

Dataset "G": The translations are text files containing TEI-compliant XML markup saved in a versioned git repository.

Dataset "H": The bibliography is managed as a Zotero library (https://www.zotero.org/groups/1062845/coptic_bible/library) and can be exported from the Zotero application in several formats (e.g., CSL-JSON, BibTeX, EndNote XML).
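For readers who want to reuse such an export programmatically, the sketch below shows the general shape of a CSL-JSON item (the `type`, `title`, `author`, and `issued`/`date-parts` fields belong to the CSL-JSON schema) together with a deliberately simplified conversion to a BibTeX entry. The sample item is invented, and Zotero's own BibTeX export handles many more fields, item types, and escaping rules than this sketch does.

```python
import json

# Invented CSL-JSON export containing a single item.
csl_export = json.loads("""[
  {"id": "example2020", "type": "book",
   "title": "An Example Study of Sahidic Manuscripts",
   "author": [{"family": "Doe", "given": "Jane"}, {"family": "Roe", "given": "Richard"}],
   "issued": {"date-parts": [[2020]]}, "publisher": "Example Press"}
]""")

# Minimal mapping from CSL item types to BibTeX entry types (far from complete).
CSL_TO_BIBTEX = {"book": "book", "article-journal": "article", "chapter": "incollection"}


def csl_item_to_bibtex(item: dict) -> str:
    """Convert one CSL-JSON item into a simplified BibTeX entry string."""
    entry_type = CSL_TO_BIBTEX.get(item["type"], "misc")
    authors = " and ".join(f'{a["family"]}, {a["given"]}' for a in item.get("author", []))
    fields = {"author": authors, "title": item.get("title", "")}
    if "issued" in item:
        fields["year"] = str(item["issued"]["date-parts"][0][0])
    if "publisher" in item:
        fields["publisher"] = item["publisher"]
    # Skip empty fields; braces protect capitalization in BibTeX.
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items() if value)
    return f"@{entry_type}{{{item['id']},\n{body}\n}}"


for item in csl_export:
    print(csl_item_to_bibtex(item))
```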

Data usage

Data organisation

Where is the dataset stored during the project?

Dataset "A": Most of the images are stored on servers of the GWDG under the domain coptot.manuscriptroom.com. A small subset is stored on the servers of holding institutions and integrated into our system via IIIF.
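IIIF makes this kind of integration possible because every image is addressable through a standard URL pattern. In version 3.0 of the IIIF Image API, an image request has the form `{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, and the image information document is found at `{identifier}/info.json`. The sketch below assembles such URLs; the base URL and identifiers are hypothetical placeholders, not actual endpoints of any holding institution or of coptot.manuscriptroom.com.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 3.0 image request URL."""
    # The identifier must be URI-encoded, including any "/" it contains.
    return "/".join([base.rstrip("/"), quote(identifier, safe=""),
                     region, size, rotation, f"{quality}.{fmt}"])


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


base = "https://images.example.org/iiif/3"  # hypothetical image server
print(iiif_image_url(base, "ms-123/f001r"))
print(iiif_image_url(base, "ms-123/f001r", size="!600,600"))  # fit within 600x600
print(iiif_info_url(base, "ms-123/f001r"))
```

Since the URL pattern is standardized, a viewer can display an image hosted by a holding institution in the same way as one hosted locally, without copying the file.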

Dataset "B": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "C": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "D": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "E": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "F": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "G": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com.

Dataset "H": This dataset is stored on servers of the GWDG under the domain coptot.manuscriptroom.com and on Zotero's servers (https://www.zotero.org/groups/1062845/coptic_bible/library).

Data sharing and re-use

Will this dataset be published or shared?

Dataset "A": Yes, externally limited with individual approval

Dataset "B": Yes, externally for everyone

Dataset "C": Yes, externally for everyone

Dataset "D": Yes, externally for everyone

Dataset "E": Yes, externally for everyone

Dataset "F": Yes, externally for everyone

Dataset "G": Yes, externally for everyone

Dataset "H": Yes, externally for everyone

Under which terms of use or license will the dataset be published or shared?

Dataset "A": Individual licenses apply for digital images of manuscript pages and fragments depending on the holding institutions.

Dataset "B":

Creative Commons Attribution (CC-BY)
Creative Commons Attribution-ShareAlike (CC-BY-SA)

Dataset "C": Creative Commons Attribution (CC-BY)

Dataset "D": Creative Commons Attribution (CC-BY)

Dataset "E": Creative Commons Attribution (CC-BY)

Dataset "F": Creative Commons Attribution (CC-BY)

Dataset "G": Creative Commons Attribution (CC-BY)

Dataset "H":

Creative Commons Attribution (CC-BY)
Creative Commons Attribution-NonCommercial (CC-BY-NC)

When will the data be published (if they are)?

Dataset "H": Nov. 16, 2021

Quality assurance

Which measures of quality assurance are taken for this dataset?

Dataset "A": During the project phase, holding institutions that have no or only poor-quality manuscript images available are contacted regularly to remedy the situation. Dedicated funds are used to obtain good digital images and permission to display them without restriction.

Dataset "B": As work on the manuscript images continues, the metadata are continually reviewed and updated as appropriate.

Dataset "C": As the diplomatic manuscript transcriptions are published well ahead of the publication of the edition, they will be used by peers (faculty and students) through the Virtual Manuscript Room (VMR). A report functionality is in place for easy feedback.

Dataset "D": Users are invited to comment on the critical apparatus. A report functionality is in place for easy feedback.

Dataset "E": Users are invited to comment on the dataset. A report functionality is in place for easy feedback.

Dataset "F": Users are invited to comment on the dataset. A report functionality is in place for easy feedback.

Dataset "G": Users are invited to comment on the dataset. A report functionality is in place for easy feedback.

Dataset "H": Users are invited to comment on the dataset. A report functionality is in place for easy feedback.

Documentation

Data documentation

Which components of the data documentation are available together with the dataset?

Dataset "A": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "B": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "C": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "D": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "E": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "F": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "G": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Dataset "H": Every single data point from this dataset is exposed via a web API (https://coptot.manuscriptroom.com/community/vmr/api/); every API call is documented there.

Legal and ethics

Personal data

Does this dataset contain personal data?

Dataset "A": No

Dataset "B": No

Dataset "C": No

Dataset "D": No

Dataset "E": No

Dataset "F": No

Dataset "G": No

Dataset "H": No

Sensitive data

Will the data be anonymized or pseudonymized?

Intellectual property rights I

Does the project use and/or produce data that is protected by intellectual or industrial property rights?

Intellectual property rights II

Does copyright law apply to this dataset?

Do other intellectual property rights apply to this dataset?

Storage and long-term preservation

Long-term preservation

Does this dataset have to be preserved for the long term?

Dataset "A": Yes

Dataset "C": Yes

Dataset "D": Yes

Dataset "E": Yes

Dataset "F": Yes

Dataset "G": Yes

Dataset "H": Yes

What are the reasons this dataset has to be preserved for the long term?

Dataset "A":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

Dataset "B":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

Dataset "C":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

Dataset "D":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

Dataset "E":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

Dataset "F":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others
Documentation, because of their societal relevance

Dataset "G":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others
Documentation, because of their societal relevance

Dataset "H":

Basis of a publication/proof of good scientific practice
Re-use in subsequent projects or by others

How long will the data be stored?

Dataset "A": open end

Dataset "B": open end

Dataset "C": open end

Dataset "D": open end

Dataset "E": open end

Dataset "F": open end

Dataset "G": open end

Where will the data (including metadata, documentation and, if applicable, relevant code) be stored or archived after the end of the project?

Dataset "A": Generic data center: https://www.gwdg.de/

Dataset "B": Generic data center: https://www.gwdg.de/

Dataset "C": Generic data center: https://www.gwdg.de/

Dataset "D": Generic data center: https://www.gwdg.de/

Dataset "E": Generic data center: https://www.gwdg.de/

Dataset "F": Generic data center: https://www.gwdg.de/

Dataset "G": Generic data center: https://www.gwdg.de/

Dataset "H": Own institution

Is the repository or data center chosen certified (e.g. Core Trust Seal, nestor Seal, or ISO 16363)? (If the dataset is archived at several places, you may answer this question with yes, if this applies to at least one of these.)