Data Publication


Possibilities for Publishing Research Data

Publishing data means making it accessible and citable. A persistent identifier (PID) is an important component of a dataset's citability.

Different models exist for the publication of research data. These models can be separated into three basic groups:

  • As an independent object in a research data repository
  • With textual documentation in a data journal
  • In the form of a data supplement to enrich an interpretive text publication


Persistent Identifier (PID)

An identifier uniquely identifies a (digital) resource. The International Standard Book Number (ISBN) is an example of an identification system used in print media. For digital objects, the Uniform Resource Locator (URL) is often used. Because URLs frequently change or stop working, they are often not suitable for long-term, unambiguous scientific citation of research data. In these cases, so-called persistent identifiers (PIDs) are needed.

A PID gives research data a permanent and unchangeable identifier (URI), which it retains throughout its entire lifecycle. Examples of PID systems include the Handle System®, the Digital Object Identifier (DOI), and the Archival Resource Key (ARK). With the help of PIDs, research data can be unambiguously identified, found, and cited in the same way as a print publication. Some PID systems are offered as a service by third-party providers (e.g. DOI), while others must be implemented locally (e.g. Handle).

DOI is the best-known and most widespread system for citing and searching for research data. The members of DataCite e.V. act as allocation agencies that assign DOIs to research data.
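To illustrate how a DOI works in practice: a DOI name consists of a prefix beginning with "10." followed by a registrant code, a slash, and a suffix chosen by the registrant. It becomes resolvable by appending it to the public resolver at https://doi.org/. The following Python sketch normalizes DOIs as they commonly appear in citations ("doi:…" or resolver URLs) and builds a resolvable link. The pattern is a simplified check, not the full DOI syntax, and the DOI used in the example is hypothetical.

```python
import re

# Simplified DOI check (assumption): "10." + numeric registrant code
# (optionally with dot-separated sub-codes) + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations; stripped before validation.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI name, e.g. '10.1234/abc', or raise ValueError."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid DOI name: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolvable link via the public doi.org resolver."""
    return RESOLVER + normalize_doi(raw)


def same_doi(a: str, b: str) -> bool:
    """DOI names are case-insensitive, so compare them case-folded."""
    return normalize_doi(a).casefold() == normalize_doi(b).casefold()


if __name__ == "__main__":
    # Hypothetical DOI, used for illustration only.
    print(doi_to_url("doi:10.1234/Example-Dataset"))
    print(same_doi("10.1234/Example-Dataset", "https://doi.org/10.1234/example-dataset"))
```

The identifier stays the same whichever form it is written in, and only the resolver maps it to the dataset's current location. This separation is what makes a DOI more durable than a plain URL.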

I. Research Data Repositories

In a repository, research data are published independently of the time and place of an interpretive publication.

Depending on the discipline, there are more or fewer research data centers and repositories. The DFG-funded project Re3Data (Registry of Research Data Repositories) addresses the quality requirements for research data repositories in various scientific disciplines and aims to gather these repositories in a central, web-based search and documentation system.

The World Data System of the International Council for Science (ICSU) is a network of data centers and other institutions that store research data for individual disciplines or offer other services in this field. Through an evaluation process, the ICSU World Data System guarantees quality, long-term availability, interoperability, and data services for its members. The DataCite association offers a list of quality-verified repositories with which it collaborates when allocating DOIs. This list also contains the repositories of the ICSU World Data System.

Important interdisciplinary research data repositories that also support the allocation of DOIs are:



Examples of subject-specific data archives are Dryad for the biosciences/medicine and Pangaea for the geosciences.

II. Data Journals

In data journals, research data are published together with documentation that is generally peer-reviewed and non-interpretive. This form of data publication requires researchers to also describe their data in a text. This text contains no interpretation; instead, it documents how the data were gathered (methods, structure, relevance, access). This reviewed contextual information enables interested individuals to reuse the data, and the review process ensures the quality of the published datasets. Like text publications, published datasets cannot be altered.

With this type of publication, often only the dataset documentation is published in the journal, while the data themselves are held in an external repository. In this case, the data journal and the repository are separate.

Examples of Data Journals

Scientific Data (NPG)

Earth System Science Data (ESSD)
Journal of Open Archaeology Data (JOAD)

Journal of Chemical and Engineering Data

Scientific Technical Report Data (STR-Data), GFZ

III. Data Supplements

Interpretive text publications can be supplemented with so-called data supplements. Such a publication contains the data that specifically support the interpretive publication. In this publication model, research data are most closely linked to traditional publication processes.

Originally, supplemental data were published as an appendix to interpretive articles in the form of illustrations, tables, or detailed information about the methodology. In this case, the data were published on the same platform as the text, but they were neither individually addressable nor standardized. In this model, the data are part of the article.

In newer models, research data can also be published as independent objects on the publisher's platform, or in an external data repository at the same time as the interpretive article. Their citability is ensured by a persistent identifier, which is then given in the interpretive article. In some cases, publishing houses and data repositories collaborate (e.g. Elsevier and PANGAEA). In this model, textual publications and research data can be used and cited individually.