JudaicaLink: pioneering initiative to link reference works on Jewish culture and history

The DM2E project has provided the inspiration for two of its partners ― Dr Kai Eckert of the University of Mannheim and Dov Winer of the European Association for Jewish Culture and Judaica Europeana ― to embark on an initiative to publish existing reference works on Jewish history and culture as Linked Data under the name JudaicaLink.

Reference works such as encyclopedias, glossaries, thesauri or catalogues function as guides to a scholarly domain as well as anchor points and manifestations of scholarly work. On the web of Linked Data, they can perform a key function of interlinking resources related to the described concepts. In effect, this means they can be enriched by creating new links between and within different encyclopedias. This function could revolutionize the work of digital humanists and become the bread and butter of their research diet.

To the best of our knowledge, JudaicaLink is the first such initiative and platform in the field of Jewish studies.

JudaicaLink: a platform for access to Linked Data versions of encyclopedias

As with many pioneering LOD publishing efforts, the first challenge was to persuade the publishers and maintainers of such reference works to give their permission to create a Linked Data version of their encyclopedia and publish it on JudaicaLink.org. Provided the work is already online, the minimal requirement is that the URLs of the articles in the encyclopedia remain stable. It is also possible to publish an LOD version of a given work on the publisher’s own website, provided they have the technical infrastructure and capacity to do so. In this case, JudaicaLink can provide information and a central search functionality.

The YIVO Encyclopedia of Jews in Eastern Europe

We have been fortunate that after some discussion the leaders of the YIVO Institute for Jewish Research in New York saw the potential of LOD for their extraordinary YIVO Encyclopedia of Jews in Eastern Europe and gave us the go-ahead. From the point of view of a Linked Data enthusiast, the YIVO Encyclopedia is really a great resource. All articles are highly interlinked, and often they even provide a hierarchy of sub-concepts described under a superordinate concept. Links to glossary terms provide further terminological control.

An example of topic headings from the YIVO Encyclopedia of Jews in Eastern Europe published in a LOD format

Encyclopedia of Russian Jewry

Recently, JudaicaLink also announced the first release of the Encyclopedia of Russian Jewry (Rujen), published in Moscow since 1994, as Linked Open Data. Rujen is not as interlinked as YIVO and the articles are much shorter on average, but it contains many more articles (about 20,000 compared to about 2,500 YIVO articles). The first obvious feature of Rujen for English-speaking people is the language: it’s Russian. The Cyrillic alphabet raises an important question regarding Linked Data: how to coin the URIs for the articles. We are still considering different solutions. Basically, there are three options, all based on the actual identifier, the Cyrillic title of the article:

  1. Use a percent-encoded URI where each Cyrillic letter is represented by the hexadecimal values of its UTF-8 bytes. For example: http://data.judaicalink.org/data/rujen/%D0%93%D0%B0%D1%82%D0%BE%D0%B2_%D0%A8%D0%B0%D0%BF%D1%81%D0%B5%D0%BB%D1%8C_%D0%B3%D0%B8%D1%80%D1%88%D0%B5%D0%B2%D0%B8%D1%87. This is technically straightforward, but has several disadvantages. First of all: no one can read it, not even Russians. We tested it. Second, people could inadvertently use the decoded form, because the browser conveniently decodes the URI when it is entered in the address bar. But this would be a different URI, actually an IRI, which brings us to the next option:
  2. Use an Internationalized Resource Identifier, an IRI. For example: http://data.judaicalink.org/data/rujen/гатов_шапсель_гиршевич. This is perfectly readable (at least by Russians) and probably the option to be preferred. However, it is not clear if all applications support IRIs correctly and we would like to have the data as easily accessible as possible. Therefore, and also because we wanted to try it, we decided on the next option:
  3. Use a transliterated URI. For example: http://data.judaicalink.org/data/rujen/gatov_shapselj_girshevich. Again, this is perfectly readable, and since Rujen is mostly about persons and locations, people familiar with the Latin alphabet can make sense of it. However, there are drawbacks. We did not transliterate just because of our widely shared ignorance of the Cyrillic alphabet. We adopted this option because we wanted to have valid URIs, for the sake of backwards-compatibility and technical interoperability. This means using only the 26 common Latin letters, no diacritics, no special characters. And the transliteration should be simple, based on a lookup-table that translates every Cyrillic letter consistently to one or more Latin characters. This is obviously not an ideal way to transliterate and, according to our tests with our nice colleagues from Belarus and Russia, it quite often produces somewhat strange results for native speakers. However, they assured us that it is still readable and not insulting.
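The three options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by JudaicaLink: the lookup table `TRANSLIT` is hypothetical and was chosen only so that it reproduces the single example given above, the function names are made up for this sketch, and lowercasing the title for the IRI is inferred from the example in option 2.

```python
from urllib.parse import quote, unquote

BASE = "http://data.judaicalink.org/data/rujen/"

# Hypothetical lookup table (an assumption, not JudaicaLink's actual table).
# Every Cyrillic letter maps consistently to zero or more plain Latin letters.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "jo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shh",
    "ъ": "", "ы": "y", "ь": "j", "э": "e", "ю": "ju", "я": "ja",
}

LATIN = "abcdefghijklmnopqrstuvwxyz"


def percent_encoded_uri(title):
    """Option 1: percent-encode the UTF-8 bytes of the Cyrillic title."""
    return BASE + quote(title)


def iri(title):
    """Option 2: keep the Cyrillic letters (lowercased, as in the example)."""
    return BASE + title.lower()


def transliterated_uri(title):
    """Option 3: map each Cyrillic letter through the lookup table."""
    out = []
    for ch in title.lower():
        if ch in TRANSLIT:
            out.append(TRANSLIT[ch])
        elif ch in LATIN or ch == "_":
            out.append(ch)
        else:
            out.append("_")  # anything else becomes a separator
    return BASE + "".join(out)


title = "Гатов_Шапсель_гиршевич"
print(percent_encoded_uri(title))
print(iri(title))
print(transliterated_uri(title))
```

Note that decoding the percent-encoded form with `unquote` gives back the Cyrillic title, which is exactly the ambiguity between option 1 and option 2 described above.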

Actually, there is also a fourth option that is completely different: using some kind of numbering or code scheme (for example a hash value of the title), but despite leading to shorter URIs, this again has the effect that no one can make sense of it, similar to option 1. There are people who advocate this approach precisely for this reason: a URI should not contain possibly misleading semantics. And, of course, a number does not show an arbitrary preference for a language or an alphabet.
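For comparison, the hash-based variant could look roughly like this. The choice of SHA-1, the truncation to 12 characters and the function name are assumptions made for illustration only; we have not settled on any particular scheme.

```python
import hashlib

BASE = "http://data.judaicalink.org/data/rujen/"


def hashed_uri(title, length=12):
    """Derive an opaque identifier from the title (hypothetical scheme)."""
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()
    # Truncating shortens the URI but raises the chance of collisions.
    return BASE + digest[:length]


print(hashed_uri("Гатов_Шапсель_гиршевич"))
```

The result is short and language-neutral, but it says nothing about the article behind it.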

So, we settled on transliteration as our first attempt, but we are curious about your ideas and opinions. After reading this long and hopefully interesting digression, you are probably much more interested in the question: how can I access this LOD resource?

The easiest way is the following: while browsing the YIVO Encyclopedia, you can access the data representation by simply replacing www.yivoencyclopedia.org/article.aspx/ with data.judaicalink.org/data/yivo/ in the URL field of the browser. For convenience, you can also use our bookmarklets. They are provided together with additional information for each encyclopedia here. Just drag and drop them to your bookmarks and when you click on this bookmark while on an encyclopedia article, you will be directed to the Linked Data version. For an even quicker look, you can also just start at the concept used above in option 3, or for YIVO, for example here: http://data.judaicalink.org/data/html/yivo/Abramovitsh_Sholem_Yankev.
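The URL replacement described above is simple enough to script. The following is a small sketch of that rule; the function name is made up, and the article URL in the example is assumed from the YIVO concept mentioned above rather than checked against the live site.

```python
ARTICLE_PREFIX = "www.yivoencyclopedia.org/article.aspx/"
DATA_PREFIX = "data.judaicalink.org/data/yivo/"


def to_linked_data_url(article_url):
    """Rewrite a YIVO article URL to its JudaicaLink data URL."""
    if ARTICLE_PREFIX not in article_url:
        raise ValueError("not a YIVO Encyclopedia article URL: " + article_url)
    # Replace only the first occurrence, i.e. the host and path prefix.
    return article_url.replace(ARTICLE_PREFIX, DATA_PREFIX, 1)


# Assumed article URL, for illustration only.
print(to_linked_data_url(
    "http://www.yivoencyclopedia.org/article.aspx/Abramovitsh_Sholem_Yankev"
))
```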

All in all, JudaicaLink now provides access to 22,808 concepts in English (approx. 10%) and in Russian (approx. 90%), mostly locations and persons.

JudaicaLink gets links

From the beginning our vision was not only about the provision of stable URIs and data for concepts described in the encyclopedias. It was also about the generation of links between these resources and other linked data resources on the Web. In a first run, we used Silk to generate links between JudaicaLink and the following sources:

All the links have been created automatically and are primarily based on the labels of the resources, so some wrong links are to be expected. Nevertheless this is an important first step. For the present, we provide the links directly together with the resource descriptions (as owl:sameAs links), but we will separate them with proper identification of the provenance as soon as we are able to use more sophisticated linking approaches. One immediate benefit from this simple linking is that we could already generate links between both encyclopedias. This works because several sources, like DBpedia, are multilingual and therefore links to both encyclopedias could be established. Whenever a single resource has two links to one resource in each encyclopedia, an additional link establishing the identity of these two resources could be inferred.
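The inference step in the last sentences can be sketched as follows. This is not our actual pipeline, only an illustration of the idea: links are given as plain (source, target) pairs, all URIs in the example are hypothetical, and a cross-encyclopedia link is inferred only when an external resource is linked from exactly one resource in each encyclopedia.

```python
from collections import defaultdict


def infer_cross_links(same_as_links, prefix_a, prefix_b):
    """Infer links between two datasets that share an external link target."""
    # For each external target, collect the linking resources per dataset.
    by_target = defaultdict(lambda: (set(), set()))
    for source, target in same_as_links:
        if source.startswith(prefix_a):
            by_target[target][0].add(source)
        elif source.startswith(prefix_b):
            by_target[target][1].add(source)

    inferred = set()
    for from_a, from_b in by_target.values():
        # Skip ambiguous cases where several resources share one target.
        if len(from_a) == 1 and len(from_b) == 1:
            inferred.add((next(iter(from_a)), next(iter(from_b))))
    return inferred


YIVO = "http://data.judaicalink.org/data/yivo/"
RUJEN = "http://data.judaicalink.org/data/rujen/"

# Hypothetical links, for illustration only.
links = [
    (YIVO + "Example_Person", "http://dbpedia.org/resource/Example_Person"),
    (RUJEN + "example_person", "http://dbpedia.org/resource/Example_Person"),
    (YIVO + "Other_Topic", "http://dbpedia.org/resource/Other_Topic"),
]
print(infer_cross_links(links, YIVO, RUJEN))
```

Since the underlying links are label-based and may be wrong, any link inferred this way inherits that uncertainty.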

JudaicaLink arrives in the Cloud

With all these links, JudaicaLink is now also part of the famous LOD Cloud that was released recently in its latest version. You can find us at about 4 o’clock, close to the border and right beside our neighbour project DM2E.


We hope the readers of this blog will spread the word and help us to convince more publishers to work with us. And do let us know what you think of JudaicaLink and what additional ideas you have. We look forward to hearing from you!

Kai Eckert, University of Mannheim

Lena Stanley-Clamp, EAJC
