Mapping the “Polytechnisches Journal” to DM2E

Johann Gottfried Dingler, born in 1778, was a German chemist and industrialist. He realised that the reporting on technological innovations was insufficient in his time. In 1820 he started to publish his “Polytechnisches Journal” on a monthly basis. It included scientific articles in the fields of electrical technology, mining and chemical engineering, as well as translations and discussions of European patent specifications. The journal is often referred to as “Dingler” and is seen as a valuable resource:

“The journal was published over a period of 111 years and has hence became an important and European-wide source for the history of knowledge, culture, and technology — in Germany at least it is without compare” (Polytechnische Journal website).

The Humboldt-Universität zu Berlin provides the digitised edition of the “Polytechnisches Journal” (figure 1), which was created by the Institute for Cultural History and Theory at Humboldt-Universität zu Berlin in cooperation with the Saxon State and University Library Dresden (SLUB). During the DM2E project, the Berlin School of Library and Information Science created a mapping from the metadata of the digitised journal to the DM2E model.

Figure 1: Screenshot from the digitised edition of the “Polytechnisches Journal”

Facsimile: CC-BY-NC-ND 3.0 (SLUB Dresden), Text: CC-BY-SA 3.0 (HU Berlin)

The metadata is described in unmodified TEI-P5 XML. The logical description of the records follows the recommendations of the TEI guidelines.

Background: The Metadata Format TEI

TEI stands for Text Encoding Initiative, a consortium that collectively develops a standard format for the representation of texts in digital form. The guidelines provided by the initiative are standard specifications of encoding methods for machine-readable texts. They are widely used by libraries, museums, publishers and individual scholars to present texts for online research, teaching and preservation. The most recent version of the guidelines is TEI-P5.

The metadata used for the mappings in DM2E came directly from the owner and creator of the records, the Institute for Cultural History and Theory. For the final version of the mapping to the DM2E model, DM2E received local copies of the most recently modified TEI-XML metadata records of the complete journal at volume and article level.

The current mapping is based on the first test mappings, which were carried out using the DM2E model v1.0 schema in MINT. Two different ore:Aggregation and edm:ProvidedCHO classes were created: one for a journal issue, another for a journal article. After the first mapping cycle with MINT, which already covered about two-thirds of the first mapping, further mapping steps were carried out by manually editing the MINT output (supported by the Oxygen editor). This was done mainly for three reasons: to improve readability (the output file was split into separate files for the creation of journal issues and articles), to reduce redundant steps in the mapping workflow (the URIs of all classes were created as variables instead of being typed repeatedly) and to include steps that could not be carried out in MINT (e.g. normalising URIs or creating titles for smaller CHOs). Furthermore, the mappings were first created for the DM2E model v1.0 and then manually adapted to DM2E v1.1. It was much easier and faster to do this step by hand than to repeat the whole mapping in MINT.

The structure of the custom XSLT script is based on the XSLT script provided by the Berlin State Library and was further developed to meet the requirements of the Institute for Cultural History and Theory at Humboldt-Universität zu Berlin.

The TEI data of the Dingler records are mapped at journal, issue, article and page level, since almost all TEI documents encode full texts. Basic descriptive metadata of the provider from the TEI header is transformed into DM2E without any loss of data. Mandatory elements that the DM2E model requires but that are missing in the source are filled in with default values.
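The step of filling in default values can be illustrated with a minimal Python sketch. It is not the XSLT used in the project: the field names and fallback values below are hypothetical placeholders chosen for illustration, since the actual mandatory elements and defaults are not listed here.

```python
# Hypothetical mandatory fields and fallback values (assumptions for
# illustration only; the defaults actually used in the mapping may differ).
DEFAULTS = {
    "dc:language": "de",
    "dc:title": "[untitled]",
}


def fill_mandatory(record, defaults=DEFAULTS):
    """Return a copy of record with missing or empty mandatory fields filled in."""
    completed = dict(record)
    for field, fallback in defaults.items():
        # Treat an absent key and an empty value in the same way.
        if not completed.get(field):
            completed[field] = fallback
    return completed
```

Values that are already present in the source record are left untouched, so the sketch only ever adds information.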

Although all TEI-encoded full texts are based on philological methods, almost no persons, corporate bodies or other subjects are semantically marked up. In order to produce not only RDF literals but also URI references (resources), full-text literals have to be transformed into URIs during the mapping, or they have to be extracted and processed by SILK in a second step, the contextualisation.
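As a rough illustration of turning a literal into a URI during the mapping, the following Python sketch normalises a name and appends it to a base namespace. The base URI, the function name mint_uri and the normalisation rules are assumptions made for this example and do not reproduce the rules of the project's XSLT script.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical namespace; not the URI scheme actually used in DM2E.
BASE = "http://example.org/dingler/"


def mint_uri(kind, label, base=BASE):
    """Build a URI for a resource of the given kind from a text literal."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of other characters into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", stripped).strip("_").lower()
    return f"{base}{quote(kind)}/{quote(slug)}"
```

Under these assumptions, mint_uri("agent", "Dingler, Johann Gottfried") yields http://example.org/dingler/agent/dingler_johann_gottfried, so the same name always maps to the same resource.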

Representation of Hierarchical Levels

The TEI records include a representation of the hierarchical structure of the journal. The top level is described within the TEI header at article level and includes the basic metadata about both the physical and the online journal. The metadata on the journal is mapped to the top-level CHO, which is related to the CHOs on the next level, the issues of the journal, via the dcterms:hasPart property. Issues include articles, which in turn gather the CHOs on the lowest representational level in the object hierarchy: the pages. All top-down hierarchical relations are described by dcterms:hasPart and all bottom-up relations by dcterms:isPartOf, as these are inverse properties.

Figure 2 illustrates the hierarchical concept in the Dingler records. The linear relations between the resources on one level are defined with the property edm:isNextInSequence, as proposed in the Europeana Data Model specification.


Figure 2: Hierarchical concept in the Dingler records
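The relations in this hierarchy can be sketched in a few lines of Python that produce the triples for one parent and its ordered children. This is an illustration only, not the project's XSLT: the function name hierarchy_triples and the URIs in the usage note are hypothetical, while the two namespaces are the published Dublin Core Terms and EDM namespaces.

```python
DCTERMS = "http://purl.org/dc/terms/"
EDM = "http://www.europeana.eu/schemas/edm/"


def hierarchy_triples(parent, children):
    """Return (subject, predicate, object) triples for one hierarchy level.

    children must be given in reading order.
    """
    triples = []
    for i, child in enumerate(children):
        # Top-down and bottom-up relations are stated as inverse pairs.
        triples.append((parent, DCTERMS + "hasPart", child))
        triples.append((child, DCTERMS + "isPartOf", parent))
        if i > 0:
            # A resource points to its direct predecessor on the same level.
            triples.append((child, EDM + "isNextInSequence", children[i - 1]))
    return triples
```

Calling the function once per level (journal to issues, issue to articles, article to pages) with placeholder URIs such as "urn:example:issue1" would produce the complete set of relations; the first child of each level carries no edm:isNextInSequence statement because it has no predecessor.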

 

Julia Iwanowa, Evelyn Dröge and Violeta Trkulja

Berlin School of Library and Information Science, Humboldt Universität zu Berlin
