Open Humanities Awards: FinderApp WiTTFind update 3

*This is the third in a series of posts from Dr Maximilian Hadersbeck, the recipient of the DM2E Open Humanities Awards – DM2E track.*

The research group “Wittgenstein in Co-Text” is working on extending the FinderApp WiTTFind tool, which is currently used for exploring and researching Wittgenstein’s Big Typescript TS-213 (BT), to the rest of the 5000 pages of Wittgenstein’s Nachlass. These pages are made freely available by the Wittgenstein Archives at the University of Bergen and are used as linked data by software from the DM2E project. In November, the group concentrated on delivering a development milestone for the final DM2E event in Pisa, redesigning the WiTTFind web frontend and integrating a new facsimile (typescript/manuscript).

Delivering a development milestone

We continued to strengthen our GitLab and Docker environment to reach our aim of producing a high-quality FinderApp for Digital Humanities projects running under different operating systems. All software development follows a git branching model of the kind used by professional software teams. We run one tagged “master” branch, which is deployed on our master server, and a “development” branch, which runs on our development server. Each new software feature is programmed, maintained and tested within its own feature branch (see Fig. 1). We define milestones by which all feature developments have to be finished; the feature branches are then merged into the development branch and deployed to the development server (http://dev.wittfind.cis.uni-muenchen.de). After extensive work and testing on the development server, we finalize the code on the development branch and merge it into the master branch, which becomes the new release of our FinderApp.

Fig. 1: git branching model
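
The milestone rule described above can be sketched as a small Python model. This is only an illustration of the workflow, not our actual tooling: the class names, branch names and the all-or-nothing merge check are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    branch: str               # hypothetical feature-branch name
    finished: bool = False    # development of the feature is complete
    tests_pass: bool = False  # the feature's automatic tests succeed


@dataclass
class Repo:
    development: list = field(default_factory=list)  # features merged into development
    master_tags: dict = field(default_factory=dict)  # release tag -> released features

    def close_milestone(self, features):
        """Merge all feature branches of a milestone into development, or none."""
        blocked = [f.branch for f in features if not (f.finished and f.tests_pass)]
        if blocked:
            raise ValueError("milestone blocked by: " + ", ".join(blocked))
        self.development.extend(f.branch for f in features)

    def release(self, tag):
        """Transfer the tested development state to master under a new tag."""
        if tag in self.master_tags:
            raise ValueError("tag already exists: " + tag)
        self.master_tags[tag] = list(self.development)
        return self.master_tags[tag]
```

In this model, a milestone with one unfinished feature raises an error and leaves the development branch unchanged, which mirrors the rule that all feature developments must be finished before the merge.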

Extending our E2E tests for continuous testing

To detect software errors as early as possible, both during development and during integration into the development branch, we wrote a large number of automatic E2E (end-to-end) tests, which must succeed before we accept and integrate a new feature. E2E tests are very similar to integration tests, because they test the interaction between different software components. They are mainly used in web frameworks and the development of webpages, because they simulate the activities of web users automatically. As our testing framework we use casper.js.
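
To illustrate the idea of an end-to-end test, here is a minimal sketch in Python using only the standard library. It is not casper.js code and not taken from our test suite: the toy corpus, the `/search?q=` route and the JSON response format are assumptions made for the example. The test starts a small search server, sends a request the way a user's browser would, and checks the answer that comes back through the whole stack.

```python
import http.client
import json
import threading
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in corpus; the real application searches the Nachlass.
PAGES = {"page-1": "red apple", "page-2": "green apple tree"}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the search word from the query string, e.g. /search?q=apple
        query = urllib.parse.parse_qs(urllib.parse.urlparse(self.path).query)
        word = query.get("q", [""])[0].lower()
        hits = sorted(k for k, text in PAGES.items() if word in text.lower().split())
        body = json.dumps({"hits": hits}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the test output quiet


def run_e2e_search(word):
    """Start the server, issue one search request over HTTP, return the hits."""
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/search?q=" + urllib.parse.quote(word))
        response = conn.getresponse()
        assert response.status == 200
        hits = json.loads(response.read())["hits"]
        conn.close()
        return hits
    finally:
        server.shutdown()
        server.server_close()
```

A feature would only be accepted if checks such as `run_e2e_search("tree") == ["page-2"]` succeed, because they exercise the request parsing, the search and the response together.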

Web frontend for multidoc

Discussions with a web specialist and ideas from the “Nietzsche-Source” webpage led to the decision to rewrite our WiTTFind webpage:

  • Many documents have to be searched and the user should not lose the overview
  • There should be the same look & feel for different browsers and web devices
  • Our webpage should use more modern browser features to offer dynamic behavior
  • We use suggestions from the Web Corporate Identity Team of our university

To reach these goals, we decided to use Bootstrap, one of the most popular HTML, CSS and JavaScript frameworks for web development. With this framework, our WiTTFind webpage can be accessed with the same look & feel from mobile devices, tablet computers and arbitrary browsers. In Fig. 2 you can see a first screenshot of our new Bootstrap-driven webpage.

Fig. 2: Our new multidoc webpage – http://dev.wittfind.cis.uni-muenchen.de

Integration and OCR of new HD-Facsimile (Typescript/Manuscript)

After we integrated the new high-density facsimile into our WiTTFind project structure, we started to OCR the scans with the OCR software Tesseract and can now show first results. The OCR results for typescripts are rather good compared with those for manuscripts. In Fig. 3 and Fig. 4 you can see the results in the right column. Currently we are working on a multi-user, semi-automatic, web-based correction tool for OCR errors.

Fig. 3: OCR of a Wittgenstein typescript scan
Fig. 4: OCR of a Wittgenstein manuscript scan
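
One common way to put a number on such a comparison is the character error rate: the edit distance between the OCR output and a hand-made reference transcription, divided by the length of the reference. The following Python sketch shows the calculation. It is an illustration only; we did not use this code to assess the results above, and the function names are chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance between OCR output and reference, relative to the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

A lower value means fewer character corrections are needed, so the same measure could also indicate how much work remains for a correction tool.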

Video: Extensive work for our Pisa demo milestone

For our presentation at the DM2E final event in Pisa we defined a new milestone, in which we fixed 27 issues, including: setting permanent webpage configuration values; redesigning the webpage with Bootstrap and adding multidoc behavior; adding a new logo and header line; adding a scrollbar to our webpage; sorting the display of hits; switching to the HD facsimile; adapting the facsimile reader to the HD facsimile; rewriting the help page, semantic finder and graphical finder; and adding new E2E tests. To show the extensive software activity in our GitLab shortly before the Pisa demo milestone, we produced a git activity video.
You can watch it here: http://wast.cis.uni-muenchen.de/tutorial/gitlab-log/

 

 
