DM2E

Open Humanities Awards: Early Modern European Peace Treaties Online final update

Lieke Ploeger — Fri, 30 Jan 2015 16:11:08 +0000

*This is the final post in a series from Dr Michael Piotrowski, one of the recipients of the DM2E Open Humanities Awards – Open track. You can find the final report here.*

Europäische Friedensverträge der Vormoderne online (“Early Modern European Peace Treaties Online”) is a comprehensive collection of about 1,800 bilateral and multilateral European peace treaties from the period of 1450 to 1789, published as an open access resource by the Leibniz Institute of European History (IEG). The goal of the project, funded by a DM2E Open Humanities Award, is to publish the treaties’ metadata as Linked Open Data and to evaluate the use of nanopublications as a representation format for humanities data.

The Final Countdown

Due to the problems at the outset of the project (see my first blog post), we lost about a month, and with the holiday season in between, it’s hard to catch up. So we’ve always been trailing behind the other projects, or at least that’s what it felt like when reading their status updates.

On Friday we finally got the virtual machine to run our stuff on. We had ordered the VM from our computing center before Christmas, but due to an acute shortage of personnel our order was delayed by several weeks. We immediately started setting up the software and migrating the data from the development machine to the production server, and had it up and running by midnight…

We now have Fuseki running on http://data.ieg-friedensvertraege.de. Since you can’t really “see” Linked Open Data, we’ve also set up Pubby, a Linked Data frontend for SPARQL endpoints, which gives the data a friendlier face. For example, the screenshot below shows how the information on the Friedenspräliminarien von Breslau (known in English as the Treaty of Breslau) is presented in Pubby.

Figure 1: Screenshot of Pubby

You may note that there are still some errors, and in fact the published state of the data is not quite final at this point—we’re still working on some last-minute issues—but it should give you an idea.

Ongoing Work

Still missing in this version are links to other LOD sources, but these will be there after the next update in a few days.

As I’ve said above, you can’t really “see” the data, so there isn’t really much to show; what is exciting is the potential it has for automatic processing. As a very low-key example, the homepage at http://data.ieg-friedensvertraege.de currently shows a “live” list of all treaty partners, i.e., when you load the Web page, a query is sent to the SPARQL endpoint to retrieve all entities of type edm:Agent.¹ We’re planning to replace this rather boring list with something more interesting, for example, a map showing the treaty locations, which could look like this:
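
To give an idea of what such a “live” query might look like, here is a minimal Python sketch. It only builds the query text and the request URL and sends nothing over the network. The endpoint path `/sparql` and the function names are assumptions made for illustration, and the query actually used on the homepage may well differ. The `edm:` prefix is the namespace published for the Europeana Data Model.

```python
from urllib.parse import urlencode

# Namespace of the Europeana Data Model (EDM).
EDM_NS = "http://www.europeana.eu/schemas/edm/"

# Hypothetical endpoint path; the real Fuseki service URL may differ.
ENDPOINT = "http://data.ieg-friedensvertraege.de/sparql"


def build_agent_query():
    """Return a SPARQL query selecting every resource typed edm:Agent."""
    return (
        f"PREFIX edm: <{EDM_NS}>\n"
        "SELECT DISTINCT ?agent WHERE {\n"
        "  ?agent a edm:Agent .\n"
        "}\n"
        "ORDER BY ?agent"
    )


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request (query=... parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_agent_query())
    print(build_request_url(ENDPOINT, build_agent_query()))
```

A page script would send this request when the page loads and render the returned bindings as the list of treaty partners.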

Figure 2: Mockup of a map display

However, this will need to wait for the next update, because it is currently not possible to automatically retrieve the geographical coordinates of the places, as they’re not yet linked to suitable data sources.

Winding Down

With DM2E coming to an end, this is also the final blog post for this project on this blog; I’ll report on further updates on my personal blog. I’d like to thank the DM2E project for giving me the award, which has made it possible to make some important steps toward opening up historical research data. I’d also like to thank Magnus and his team for doing the “heavy lifting” in this project.

Footnotes:

¹See the documentation of the Europeana Data Model (EDM).

Open Humanities Hack report

Lieke Ploeger — Thu, 29 Jan 2015 17:57:08 +0000

On 28 November 2014, King’s College London, the Digital Manuscripts to Europeana (DM2E) project and the Open Humanities working group of Open Knowledge held an Open Humanities Hackday as part of the DM2E project. Iain Emsley of the Open Humanities group reports on the event in this guest blog.

Whilst opening data is a useful and important issue, making tools to allow for use and enrichment of the data is perhaps even more important. This is exactly what the Open Humanities Hackday is all about.

The day began with talks from Net7 about the Pundit tool, from King’s College about mobile data, and from me about the First Folio data that the Bodleian Library has released under CC BY-SA. Following these talks, participants split into groups to start hacking with humanities data.

One hack looked at personal diaries from the First World War and linked them to Twitter through a bot that personalizes the war using the soldiers’ own voices. The Pundit team worked with the group to enrich the data and add it to their platform. You can now follow @ww1witness on Twitter to see the results.

One member of the King’s team worked on mining mobile data and finding out what can be extracted from it, in preparation for another hack.

I worked on a citation plugin for WordPress to embed the citations from Shakespeare plays into a post and also a visualization of the speakers and their lines within a play. Taking the idea from citing data sets in the sciences, the citation plugin looks at using existing marked up texts as data sources rather than expecting the user to recite the text and link to it. The visualization of Shakespeare plays was further developed in a subsequent Shakespeare hack day.

These projects explored two of the day’s strands: linking data together and building tools to use open data sets. I found it encouraging that we are able to use an existing tool as well as develop new ones and look at new ideas. With data sets such as the Text Creation Partnership becoming open, these issues can only become more important and we had a really positive day working towards these goals.

Open Humanities Awards: FinderApp WITTFind final update

Lieke Ploeger — Tue, 27 Jan 2015 17:06:12 +0000

*This is the final in a series of posts from Dr Maximilian Hadersbeck, the recipient of the DM2E Open Humanities Awards – DM2E track. You can read the final research paper here (in German).*

The research group “Wittgenstein in Co-Text” worked on extending the FinderApp WiTTFind tool, which is currently used for exploring and researching Wittgenstein’s Big Typescript TS-213 (BT), to the rest of the 5000 pages of Wittgenstein’s Nachlass that are made freely available by the Wittgenstein Archives at the University of Bergen and are used as linked data software from the DM2E project. Our work in January focused on finishing the final milestone of our project, finalizing the device- and browser-independent multidoc web frontend, and successfully placing a publication about our project. We close our blogging series with some final remarks and a look into the future.

Delivering the final milestone

We finished our gitlab and docker environment for producing a high quality FinderApp for Digital Humanities projects running under different operating systems. The git server site has been set up and offers all software modules. After extensive work and testing on the development server, we finalized the new FinderApp WiTTFind and transferred it to our master server, reachable under the permalink http://wittfind.cis.uni-muenchen.de.

To illustrate the extensive work done during our git-server-based program development, figure 1 shows a small excerpt of the issue management and branching concept on our git server. There you see around 20 features which were finished within the final milestone.

fig. 1 Issue-management and branching-concept of our git-server based development

Finalizing the device and browser independent multidoc web-frontend

After finishing the development work of our bootstrap based FinderApp, we finalized our new multidoc web frontend and will also transfer it to our WiTTFind master server.

Starting in February, we will offer a browser- and device-independent web frontend which works with the same look & feel on mobile devices, tablet computers and arbitrary browsers. In figure 2 you see a screenshot of our new bootstrap driven webpage, reachable under our permalink address: http://wittfind.cis.uni-muenchen.de.

Fig. 2: Our new multidoc webpage (http://wittfind.cis.uni-muenchen.de)

Successfully placing a publication

One aim of our project was to place a publication about our work at an important conference. We followed the call for papers of the conference Digital Humanities im deutschsprachigen Raum – DhD 2015 (23-27 February 2015, Graz, Austria) and sent a paper and a poster to the conference committee. Both were well reviewed and accepted. In Graz, we will be presenting the paper “Wittgensteins Nachlass: Erkenntnisse und Weiterentwicklung der FinderApp WiTTFind”. The speaker will be Maximilian Hadersbeck. Co-authors of the text are Alois Pichler, Florian Fink, Daniel Bruder and Ina Arends. The poster, which gives a demo of our project, has the title “Wittgensteins Nachlass: Aufbau und Demonstration der FinderApp WiTTFind und ihrer Komponenten.” The authors of the poster are Yuliya Kalasouskaya, Matthias Lindinger, Stefan Schweter and Roman Capsamun. Both presentations can be found in the programme of the conference: https://www.conftool.pro/dhd2015/sessions.php

Final remarks and a look into the future

With this blog post we conclude our five months of work for the DM2E Open Humanities Award. I want to thank the DM2E management and partners very much for the fruitful cooperation. I also want to thank Lieke Ploeger, who organized the award work and the Pisa event absolutely perfectly. Thank you Lieke!

We also want to thank Dr. Alois Pichler and his Wittgenstein Archives group in Bergen for their continuous, long-standing support and their ideas. It has always been, and still is, a great pleasure for us to cooperate with him and his group. Thank you Alois!

Last but not least, I want to mention the colleagues and students who did the fantastic and thoroughly committed programming work around our gitlab, docker environment, WiTTFind and webpage programming, facsimile integration and so on. Those people are my students Roman Capsamun, Yuliya Kalasouskaya, Matthias Lindinger and Stefan Schweter and my colleagues Daniel Bruder and Florian Fink. Their work, out of all proportion to the little money they got, brought the FinderApp WiTTFind a great step forward. Thank you very much to all of them.

Thanks also for the support from LEHRE@LMU at our Ludwig Maximilians University Munich, from Prof. Hinrich Schuetze and Prof. Klaus Schulz from my institute, and from the Wittgenstein-Trust in Cambridge. They will fund the attendance of the award group at the conference in Graz, where we will celebrate the award work on our FinderApp WiTTFind.

A look into the future: we are now well prepared for the conference in Graz, where we can present an EU-awarded, modern software tool that is ready for new cooperation in the field of Digital Humanities.

Announcing the EuropeanaTech conference 2015

Lieke Ploeger — Mon, 26 Jan 2015 16:00:39 +0000

On 12-13 February 2015 the 2nd EuropeanaTech Conference will take place at the National Library of France in Paris. The title of this year’s conference is ‘Making the beautiful thing – Transforming technology and culture’. Presenters and participants from Europe and around the globe will be sharing knowledge and collaborating on the themes of data modelling (including the Europeana Data Model), content re-use, discovery, multilingualism and open data. For more information about the conference, including the themes of the breakout sessions and topics of the renowned international keynote speakers, take a look at the conference programme.

Registration costs 60 Euro and is possible through this Eventbrite page.

This blog entry was written by Kristin Dill (Austrian National Library) and is a condensed version of the one originally posted on the Europeana Professional Website.

Open Humanities Awards: finderApp WITTFind update 4

Lieke Ploeger — Wed, 21 Jan 2015 13:00:59 +0000

*This is the fourth in a series of posts from Dr Maximilian Hadersbeck, the recipient of the DM2E Open Humanities Awards – DM2E track.*

The research group “Wittgenstein in Co-Text” is working on extending the FinderApp WiTTFind tool, which is currently used for exploring and researching Wittgenstein’s Big Typescript TS-213 (BT), to the rest of the 5000 pages of Wittgenstein’s Nachlass that are made freely available by the Wittgenstein Archives at the University of Bergen and are used as linked data software from the DM2E project. The work in December focused on the implementation of new features in WiTTFind, extensive work for the PISA-Demo Milestone, the speech and presentation at the DM2E final event on 11 December in Pisa, and the discussions that followed.

Extensive work for our PISA-Demo Milestone

We continued to strengthen our gitlab and docker environment to reach our aim of producing a high quality FinderApp for Digital Humanities projects running under different operating systems. For our presentation in Pisa we defined a PISA-Demo milestone which introduced new features, defined in 27 issues such as: setting permanent webpage configuration values; redesigning the webpage with bootstrap and adding multidoc behavior; a new logo and header line; adding a scrollbar to our webpage; sorting the display of hits; switching to HD facsimiles; adapting the facsimile reader to HD facsimiles; rewriting the help page; and new E2E tests. Figures 1, 2 and 3 show the extensive software activity in our gitlab shortly before the PISA-Demo Milestone.

fig. 1 Feature- and Issue-List

fig. 2 New multidoc web frontend, see: http://dev.wittfind.cis.uni-muenchen.de

fig. 3 gitlab activities before the PISA-Demo Milestone

Speech and Demonstration of our FinderApp at DM2E final event, 11.12.2014, Pisa

One aim of our award project was to give a speech and a demo at the DM2E final event in Pisa. Our speech had the title “Open Humanities Awards DM2E track: FinderApp WiTTFind, Wittgensteins Nachlass: Computational linguistics and philosophy”, and the authors were Max Hadersbeck, Roman Capsamun, Yuliya Kalasouskaya and Stefan Schweter from the Centrum für Informations- und Sprachverarbeitung (CIS), LMU, München.

In our speech we first gave a short overview of Ludwig Wittgenstein’s Nachlass and the texts available to our FinderApp. Then we described what kind of “fine-grained computational linguistic perspectives on editions” our finder WiTTFind offers. We showed the open source aspects of our software and demonstrated the tools. After this we went into the details of our implementation: the rule-based access to the data together with local grammars. We showed how our rule-based tool differs from statistical indexing search engines like Google Books, the Open Library project and Apache Solr. We gave a short insight into one foundation of our tool, our digital full-form lexicon of Wittgenstein’s Nachlass with 46000 entries (see figure 4, The Digital Lexicon WiTTLex).

fig. 4 The Digital Lexicon WiTTLex

In the next part of our speech, we informed the attendees about other important aims of our project, like extending the data to the 5000 pages of Wittgenstein’s Nachlass and making our finder openly available to other digital humanities projects by defining APIs and an XML-TEI-P5 tagset. We presented OCR tools for facsimile integration and a facsimile reader for the new multidoc environment. The last aim of our project was that our software should work as an interoperable distributed application (Linux, Mac OS, Windows) and should be browser- and device-independent. We reached this aim by using gitlab, docker and bootstrap. In the final part of our speech we presented our new multidoc browser frontend (see figure 2).

Discussion after the speech in Pisa

After our speech and at the evening meeting we had very interesting discussions with the DM2E partners about our FinderApp’s rule-based, rather than statistical, access to the Wittgenstein Nachlass. We showed that rule-based access works perfectly on limited data like ours, because with the help of rules (local grammars) we can resolve many ambiguities in semantics and syntax. The second remarkable point in the discussions concerned observations from our cooperation with “humanities” researchers, in our case philosophers. We found that philosophers fully accept and use tools only if they find their specific scientific language and categories represented, and if the search tool offers almost 100% precision and 100% recall. We admit that these limits can never be reached, but the important point is this: “humanists” are not interested in the sophisticated programming tricks and features that computer scientists love so much; they expect solid and clear algorithms behind the retrieval of the sentences around their specified word phrases. They also expect interactive menus with fine-grained settings, so that they can inspect and influence how the text that fits their question is found.
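
For readers less familiar with the two measures mentioned above, the following minimal Python sketch shows how precision and recall are usually computed for a single search. The sentence identifiers are made up for illustration and do not come from WiTTFind or the Nachlass; the function name is likewise hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example: the tool returns four sentences, two of which are
    # among the three sentences a philosopher considers relevant.
    found = ["s1", "s2", "s3", "s4"]
    wanted = ["s2", "s3", "s5"]
    p, r = precision_recall(found, wanted)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case half of the returned sentences are relevant and one relevant sentence is missed, which is exactly the kind of result the philosophers described above would not accept.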

Open Humanities Awards: Early Modern European Peace Treaties Online update 3

Lieke Ploeger — Mon, 19 Jan 2015 12:04:00 +0000

*This is the third in a series of posts from Dr Michael Piotrowski, one of the recipients of the DM2E Open Humanities Awards – Open track.*

On December 11, I was invited to speak about the project at the DM2E Final Event in Navacchio, Italy. I gave a talk entitled “Early Modern European Peace Treaties Online—The LOD Remix.” You can find the slides on SlideShare; for a full report of the event, see the blog post Final DM2E & All-WP meeting, 11–12 December, Pisa.

I gave the talk the subtitle “The LOD Remix” because it started with a brief account of the prehistory, i.e., the project that created the database we used as the starting point for our project: “Europäische Friedensverträge der Vormoderne – online,” which was funded by the DFG from 2005 to 2010. I then went on to describe the current state of our work at that time; you can find the main points in my previous blog post.

I could also report on a surprising discovery I had made just a few days before the event. If you’ve read my previous posts, you may have noticed that we’re missing one interesting type of information: the names of the negotiators involved in the negotiation of the treaties. So I was happy to tell the audience that in the context of the BMBF-funded project Übersetzungsleistungen von Diplomatie und Medien im vormodernen Friedensprozess. Europa 1450–1789 (June 2009–May 2012), researchers at the University of Augsburg have gathered all the negotiators relevant to the treaties contained in our database, as well as the languages they’re written in. Some of the data is published as lists on their website, but these lists are actually exported from a Microsoft Access database, which contains additional information.¹ We’re in contact with our colleagues in Augsburg and working towards a way to combine their data with our data and to publish everything as LOD. We may not be able to complete the merge in time for the first release, but we hope to finish it soon afterwards.

I also had some interesting conversations during the event, which prompted me to think a bit more about the modeling from a conceptual perspective. Our current modeling essentially represents the contents of the original relational database in RDF. For a future version I’d like to re-examine the relations between the various entities involved, such as those between a conclusion of peace (an event), a peace treaty (perhaps a work in the sense of FRBR), and various copies and versions of a treaty (manifestations).

We hope to release the first version next week, and in the next post I will then describe this release and maybe do a little retrospective of the project.

Footnotes:

¹The database is described in: German Penzholz, Andrea Schmidt-Rösler (2014). “Die Sprachen des Friedens – eine statistische Annäherung”. In: Johannes Burkhardt, Kay Peter Jankrift, Wolfgang E. J. Weber (eds.): Sprache. Macht. Frieden. Augsburg: Wißner. PDF

Modeling the Scholarly Domain

Lieke Ploeger — Thu, 15 Jan 2015 14:29:28 +0000

1 Introduction

The aim of DM2E’s Task 3.4 was to investigate how a digital humanist uses digital research tools and how his or her actions can be modeled. With respect to the environment of Linked Data and Europeana, the initial questions were “What does the humanist want to do with the digital tools?” and “What are the ‘functional primitives’ of the digital humanities?”

With our model, the Scholarly Domain Model (SDM), we try to initiate and encourage more reflection on the methods of humanities scholars in a digital environment, and, at the same time, to connect the development of applications closer to scholarly practices.

2 Scholarly Domain Model

The model groups together the activities of a generic research process and constitutes the primitives of scholarly work. The model itself consists of different layers of abstraction, each one describing the field with more granularity. The top layer displays Research as the central aspect of the SDM, but it makes clear that research depends on input and will ideally produce output. The arrows leading back to input indicate that the output of one iteration can be used as input to a following research process.

Additionally, Research is embedded in a social context, which includes collaborative aspects and a documentary context, like reporting to a funding organization or blogging about a project.

Each layer zooms further in on the different activities that can be part of the scholarly domain. The lowest level, Scholarly Operations, is designed to hold domain-specific implementations of the upper levels: all scholars use references, but how exactly is a reference made in, for example, linguistics?

We see the SDM as a living model and a framework for discussions on the humanities and the digital domain.

3 Application Scenarios

We hope that with the implementation of the SDM as an RDFS/OWL ontology we can build a bridge between the model and applications that draw on this model. Also, the model will help to identify gaps in the scholarly workflow and to design tools that fill these gaps. Thirdly, aspects of scholarly work that are not yet covered by tools can be integrated. By providing this model we aim to contribute a significant building block to the digital humanities and try to enhance the sustainability of infrastructures.

A full report on the Scholarly Domain Model and the work conducted in Task 3.4 will be published at the end of DM2E in January 2015.

Steffen Hennicke, Humboldt-Universität zu Berlin

Final DM2E & All-WP meeting, 11-12 December, Pisa

Lieke Ploeger — Mon, 05 Jan 2015 22:16:24 +0000

The DM2E consortium in Pisa

Last month the DM2E project organised its final event under the title ‘Enabling humanities research in the Linked Open Web’. Over 50 participants gathered at the Auditorium Incubatore of the Polo Tecnologico di Navacchio near Pisa, Italy to hear more about the final results of the three-year project, as well as those of the winners of the second round of the Open Humanities Awards.

DM2E: context and background

The day started with a welcome on behalf of the project coordinator, Humboldt Universität zu Berlin, by Violeta Trkulja. She briefly introduced the project, the partners involved and the main activities and outcomes.

Then it was time for the keynote: Sally Chambers (DARIAH-EU and Göttingen Centre for the Digital Humanities) gave an inspiring talk on sustainable digital services for humanities research communities in Europe, and the role that the DARIAH infrastructure can play in this regard.

Antoine Isaac (Europeana) followed with an illustration of the relevance of the DM2E results in the wider context of Europeana, Europe’s platform to access cultural heritage. Because of the work done within the project, there is now more material available in Europeana that is relevant to digital humanities researchers, workflows for data aggregation have been improved and of course the EDM (Europeana Data Model) has been specialized for the manuscript domain.

DM2E results

The work package leaders of the four DM2E work packages then went on to present what has been achieved over the course of the project in the areas of:

Content aggregation to Europeana (WP1 – Doron Goldfarb, Austrian National Library);

The interoperability infrastructure, including the DM2E data model, ingestion and contextualization – even with an ‘Oh, yeah?’ button (WP2 – Kai Eckert, University of Mannheim);

The Pundit tool for semantic annotation and enrichment (WP3 – Christian Morbidoni, Università Politecnica delle Marche / Net7 & Alessio Piccioli, Net7);

Experiments conducted to investigate how humanists work with linked data and tools such as Pundit and to better understand their reasoning process (WP3 – Steffen Hennicke, Humboldt Universität zu Berlin);

And community building around open data for cultural heritage and humanities research (WP4 – Lieke Ploeger, Open Knowledge).

Open Humanities Awards – round 2

Another part of the programme was reserved for the winners of the second round of the Open Humanities Awards. The first winner of the Open track, Michael Piotrowski (Leibniz Institute for European History), talked about how the metadata of European peace treaties from the Early Modern period can be published as Linked Open Data, and about evaluating the use of the nanopublications format for such content, so that it can become a more valuable resource for the humanities. Read more

Also for the Open track, Rainer Simon (Austrian Institute of Technology) shared the success story of the SEA CHANGE (SEmantic Annotation for Cultural Heritage And Neo-GEography) workshops, where people helped turn raw geographical data into Linked Open Data with the Recogito geo-annotation tool. In just two workshops, nearly 15.000 contributions were made, a great crowdsourcing achievement. Read more

For the DM2E track of the awards, Max Hadersbeck (University of Munich) demonstrated the impressive FinderApp WITTFind, with which it is possible to search Wittgenstein’s Nachlass in an extensive way, because of the incorporation of a full-form lexicon and features such as highlighting the hits in the displayed facsimile – all very valuable for researchers. Read more

It was very inspiring to see the impact the Open Humanities Awards have had on furthering teaching and research in the humanities – once again congratulations to all winners!

DM2E Final All-WP meeting

On the following day, Friday 12 December, the project consortium held the final All-WP meeting in Pisa. The first part of the meeting was dedicated to wrapping up the achievements of the four work packages: each WP lead gave an overview of the main achievements in the project, with a focus on the final six months.

(for WP2 and 3, the slides were similar to those presented at the final event on 11 December, so you can find the slides above under ‘DM2E results’)

Then some time was reserved for a look at applications developed for creative use of cultural contents in another FP7 research project, AthenaPlus, presented by Gábor Palkó (Petőfi Literary Museum).

The day concluded with a collective lookback at lessons learned throughout the project, and preparation for the final month of reporting, wrap up and review. After two successful days it was time for a final concluding consortium lunch.

Many thanks to all participants for joining our final event, and of course you can follow us on either Twitter or our blog to stay updated on all final deliverables!

Open Humanities Awards: SEA CHANGE final update

Lieke Ploeger — Tue, 09 Dec 2014 10:27:18 +0000

This is the final post from Dr Rainer Simon, one of the recipients of the DM2E Open Humanities Awards – Open track. The final report is available here.

Last Thursday, the University of Applied Sciences Mainz was the scene of our second SEA CHANGE annotation workshop. First things first: Mainz broke the record! Despite having a few fewer participants than last time in Heidelberg and, overall, a few minutes less time, they made it. Below, I’ll speculate a bit on how and where Mainz may have scored those extra points. But for the sake of completeness, I should point out that this public boasting on Twitter by our host Kai-Christian Bruhn would have made a day without breaking the record SOMEWHAT embarrassing.

A day without a new record? No longer an option now for Mainz, it seems.

But Kai’s students did not let him down. The day ended with a breathtaking 7.511 contributions, totally smashing our previous record of 6.620 by almost 900! Totes amaze. We were knocked out by their efforts. Kudos to the students at Mainz!

Annotation Session

Like last time, Leif and I kicked off the day by introducing the Pelagios project (into which the SEA CHANGE results feed) to our audience. Participants had a mixed background (engineering and archaeology), attending a joint course in both universities in Mainz. After a guided tour of our Web-based geo-annotation tool Recogito, people got to work.

One of our participants at work on a 15th century Portolan by Grazioso Benincasa.

At times, the silence in the room was almost eerie as the students worked, in a highly concentrated way, on a selection of both texts (Medieval geographic narratives and travel writing) and maps (a set of beautiful maritime charts from the 14th and 15th century). Despite the fact that most students lacked expert knowledge about the texts and their historical background, they obviously found it meaningful to add annotations. They grasped the idea and, to quote Kai: “They won the race because what they were doing during the annotation session was meaningful to them.”

To back this up with some numbers: here’s the sum total of what the day ended with.

Approx. 2.600 place name identifications in text. That’s almost an identical number to our first workshop. So far so good.
Almost 3.200 place name identifications on images. Wow! That’s almost 700 more than last time!
About 620 map transcriptions. That’s a bit less than last time, where we had 830.
My personal favourite: 544 gazetteer resolutions. That’s almost four times as many as last time! Gazetteer resolution is the type of activity that’s most complex and time-consuming. Since our last workshop, we completely overhauled the user interface for this, and it’s great to see such an improvement.
537 other activities such as corrections, comments, deletions, etc.
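
As a quick sanity check, the figures in the list above can be added up and compared with the day's reported total. The following Python sketch uses only numbers quoted in this post. Because several of them are rounded (“approx.”, “almost”, “about”), the sum lands slightly below the reported 7.511, presumably a rounding effect.

```python
# Contribution counts as quoted in the list above (partly rounded).
counts = {
    "place name identifications in text": 2600,
    "place name identifications on images": 3200,
    "map transcriptions": 620,
    "gazetteer resolutions": 544,
    "other activities": 537,
}

reported_total = 7511   # total for Mainz as stated in this post
previous_record = 6620  # previous record from the first workshop

total = sum(counts.values())
margin = reported_total - previous_record
rounding_gap = reported_total - total

print(f"sum of listed figures: {total}")
print(f"margin over previous record: {margin}")
print(f"gap attributable to rounding: {rounding_gap}")
```

The margin of 891 is the “almost 900” mentioned earlier in this post.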

It’s good to see how stable the number of place name identifications in text was. This seems to show that (despite the occasional glitch and known issue) Recogito has really reached a level of maturity now. It’s also interesting to see how many more place name identifications we had in images this time. My personal take on this is that the different material may have played a small part in there, too. Portolan charts are very “dense” in place names, and the place names are typically arranged in sequence, in the same orientation. So there is less need to search and navigate the map. That may have allowed for slightly speedier tagging this time. On the other hand, though, the style of lettering in these maps was rather different from last time and much more challenging for the non expert to decipher. This may well be the reason why we got a lower number of transcriptions on this occasion. But in any case: the overall result speaks for itself.

Data Re-Use Session

The late afternoon was again dedicated to the topic of data re-use. This time, however, we tried something a little different. We ran two sessions in parallel. Participants could choose between them, depending on their own interest and background. Leif once again walked his half of the audience through a tutorial that uses the Open Source Geographical Information System QGIS to explore a medieval travel itinerary embedded in 3D terrain. (The resulting 3D visualization is available online here). In the meantime I ran a small “Pelagios hack tutorial” in which I guided the other half of the audience through three JavaScript examples that demonstrate how you can easily re-use Pelagios data in your own applications and mashups through our API, e.g. to create Web maps, timelines or network graphs. (The tutorial examples are on GitHub here.)

Well, I guess this concludes the SEA CHANGE project. Leif, Elton, Pau and I are happy to have gotten the opportunity to do this, and are very excited about what came out of it. We would love to repeat workshops like these at some point. (Maybe also in virtual online form?) If you’re interested in participating or hosting: by all means, do get in touch!

Last but not least (my standard reminder…): above all, our project is about gathering data and making it openly available to everyone. So do take a look at the CC-licensed annotation data that is now available for download through Recogito, as well as through the Pelagios API. We’d love to hear from you!

Open Humanities Awards: finderApp WITTFind update 3

Lieke Ploeger — Mon, 08 Dec 2014 09:00:31 +0000

*This is the third in a series of posts from Dr Maximilian Hadersbeck, the recipient of the DM2E Open Humanities Awards – DM2E track.*

The research group “Wittgenstein in Co-Text” is working on extending the FinderApp WiTTFind tool, which is currently used for exploring and researching Wittgenstein’s Big Typescript TS-213 (BT), to the rest of the 5000 pages of Wittgenstein’s Nachlass that are made freely available by the Wittgenstein Archives at the University of Bergen and are used as linked data with software from the DM2E project. In November, they concentrated on delivering a development milestone for the final DM2E event in Pisa, redesigning the WiTTFind web frontend and integrating a new facsimile (typescript/manuscript).

Delivering development milestone

We continued to strengthen our GitLab and Docker environment to reach our aim of producing a high-quality FinderApp for Digital Humanities projects that runs under different operating systems. All software development is done within a git branching model, as used by professional software teams. We run one tagged “master” branch, which is deployed on our master server, and a “development” branch, which runs on our development server. All new software features are programmed, added, maintained and tested within a specific feature branch (see Fig. 1). We define milestones by which all the different feature developments have to be finished; they are then merged into the development branch and deployed on the development server (http://dev.wittfind.cis.uni-muenchen.de). After extensive work and testing on the development server, we finalize the programs on the development branch and transfer them to the master branch, which becomes the new release of our FinderApp.

Fig. 1: git branching model
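
As a rough illustration of the workflow described above, the following Python sketch lists the git commands that one milestone might involve. It is not our actual tooling: the branch names “development” and “master” come from the text, while the feature branch names, the release tag and the use of --no-ff merges are assumptions made for the example.

```python
def merge_features_commands(feature_branches):
    """Git commands that merge finished feature branches into 'development'.

    The feature branch names are supplied by the caller and are hypothetical.
    """
    commands = [["git", "checkout", "development"]]
    for branch in feature_branches:
        # --no-ff keeps a merge commit, so each feature stays visible in history.
        commands.append(["git", "merge", "--no-ff", branch])
    return commands


def release_commands(release_tag):
    """Git commands that promote the tested 'development' branch to 'master'.

    These would only be run after testing on the development server.
    The tag name is a hypothetical example.
    """
    return [
        ["git", "checkout", "master"],
        ["git", "merge", "--no-ff", "development"],
        ["git", "tag", "-a", release_tag, "-m", f"Release {release_tag}"],
    ]


if __name__ == "__main__":
    plan = merge_features_commands(["feature/multidoc", "feature/hd-facsimile"])
    plan += release_commands("pisa-demo")
    for command in plan:
        print(" ".join(command))
```

Splitting the plan into two functions mirrors the two stages in the text: features are collected on the development branch first, and the release to master happens only after that branch has been tested.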

Extending our E2E tests for continuous testing

To detect software errors as early as possible, both during development and during integration into the development branch, we wrote a large number of automatic E2E tests (end-to-end tests), which must succeed before we accept and integrate a new feature. E2E tests are very similar to integration tests, because they test the interaction between different software components. They are mainly used in web frameworks and in the development of webpages, because they simulate the activities of web users automatically. As our testing environment we use casper.js.
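
To give an idea of what such a test does, here is a minimal sketch in Python that uses only the standard library. It is a stand-in for illustration and not one of our casper.js tests: the DemoFinder server, its paths and the names of the checks are all hypothetical. The sketch starts a tiny web application, requests its pages the way a user’s browser would, and reports whether each check passed.

```python
import html
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoFinder(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the web application under test."""

    def do_GET(self):
        url = urllib.parse.urlparse(self.path)
        if url.path == "/":
            body = "<title>Demo finder</title><form action='/search'></form>"
        elif url.path == "/search":
            query = urllib.parse.parse_qs(url.query).get("q", [""])[0]
            body = f"<p class='hit'>Results for {html.escape(query)}</p>"
        else:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def fetch(base_url, path):
    """Request a page as a browser would and return (status, page text)."""
    with urllib.request.urlopen(base_url + path, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


def run_e2e_checks(base_url):
    """Run each user-level check and record whether it passed."""
    results = {}
    status, page = fetch(base_url, "/")
    results["start page loads"] = (
        status == 200 and "<title>Demo finder</title>" in page
    )
    status, page = fetch(base_url, "/search?q=Sprache")
    results["search shows query"] = (
        status == 200 and "Results for Sprache" in page
    )
    return results


def run_against_demo_server():
    """Start the demo server on a free local port and run the checks."""
    server = HTTPServer(("127.0.0.1", 0), DemoFinder)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return run_e2e_checks(f"http://127.0.0.1:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for name, passed in run_against_demo_server().items():
        print("PASS" if passed else "FAIL", name)
```

In a setup like ours, a new feature would only be accepted if every check in such a run passes.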

Web frontend for multidoc

Discussions with a web specialist and ideas from the “Nietzsche-Source” webpage led to the decision to rewrite our WiTTFind webpage:

Many documents have to be searched and the user should not lose the overview
There should be the same look & feel for different browsers and web devices
Our webpage should use more modern browser features to offer dynamic behavior
We use suggestions from the Web Corporate Identity Team of our university

To reach these goals, we decided to use Bootstrap, one of the most popular HTML, CSS and JavaScript frameworks for web development. With this framework, our WiTTFind webpage offers the same look & feel on mobile devices, tablet computers and arbitrary browsers. In Fig. 2 you can see a first screenshot of our new Bootstrap-driven webpage.

Fig. 2: Our new multidoc webpage – http://dev.wittfind.cis.uni-muenchen.de

Integration and OCR of new HD-Facsimile (Typescript/Manuscript)

After we integrated the new high-density facsimile into our WiTTFind project structure, we started to OCR the scans with the OCR software tesseract and can now show first results. The OCR results for typescripts are rather good compared to those for manuscripts. In Fig. 3 and Fig. 4 you can see the results in the right column. Currently we are working on a multi-user, semi-automatic, web-based correction tool for OCR errors.

Fig. 3 OCR of a Wittgenstein typescript-scan

Fig. 4 OCR of a Wittgenstein manuscript-scan
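
For readers who want to try something similar, the sketch below shows one way to call the tesseract command-line program from Python. It is an illustration under assumptions, not our pipeline: the file names are hypothetical, and the choice of the German language model (“deu”) is a guess for the example. The only interface it relies on is the basic tesseract call with an image, an output base name and the -l option.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, language="deu"):
    """Build the tesseract call: tesseract <image> <output_base> -l <language>."""
    return ["tesseract", str(image), str(output_base), "-l", language]


def ocr_scan(image, output_base, language="deu"):
    """Run tesseract on one scan and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image, output_base, language), check=True)
    # By default tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt")


if __name__ == "__main__":
    # Hypothetical file names, shown without running the OCR itself.
    print(" ".join(tesseract_command("scan_page1.png", "scan_page1")))
```

Building the command in its own function makes it easy to inspect or log the exact call before running it over a large batch of scans.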

Video: Extensive work for our Pisa demo milestone

For our presentation at the DM2E final event in Pisa we defined a new milestone, in which we fixed 27 issues, including: setting permanent configuration values for the webpage; redesigning the webpage with Bootstrap and adding multidoc behavior; a new logo and header line; adding a scrollbar to our webpage; sorting the display of hits; switching to the HD facsimile; adapting the facsimile reader to the HD facsimile; rewriting the help page, semantic finder and graphical finder; and new E2E tests. To show the extensive software activity in our GitLab shortly before the Pisa demo milestone, we produced a git activity video.
You can watch it here: http://wast.cis.uni-muenchen.de/tutorial/gitlab-log/