ExploreAt! Project.

Exploring Austria's culture through the language glass

This project aims to reveal unique insights into the rich texture of the German language, especially in Austria, by providing state-of-the-art tools for exploring the unique collection (1911-1998) of the Bavarian dialects in the region of the Austro-Hungarian Empire. The corpus is large and rich, with an estimated 200,000 headwords in an estimated 4 million records. The collection includes a five-volume dictionary of about 50,000 headwords, covering the period from the beginnings of the German language until the present (DBÖ, WBÖ).

ExploreAt! Dublin City University, ADAPT Centre

ExploreAt! Dublin focuses on research and implementation of semantic web solutions for describing, annotating, exploring and analysing cultural, historical and lexicographic data collections. This involves top-down and bottom-up ontology development, automatic and semi-automatic semantic annotation, and publication and integration of the resources as linked open data. The project develops methods and tools for semantic interlinking, exploration and analysis of the spatial and temporal aspects of legacy data collections. It further focuses on developing an open multilingual infrastructure capable of providing access to historical data collections in a language-independent manner. The project uses state-of-the-art methods and tools in natural language processing, information extraction, and database systems.

People

Yalemisew Abgaz

Postdoctoral researcher in the ADAPT Centre, Dublin City University. His core interests include the semantic web, computational creativity and natural language processing, with the aim of fostering a deep understanding of the fundamentals and applying it to solve practical problems. Yalemisew has hands-on experience in building a large-scale computational creativity system using analogical reasoning. He has expertise in artificial intelligence, programming, database systems, and semantic web technologies and standards, and has experience with technologies such as RDF/S, OWL and SPARQL. In the past, he was a member of the Dr Inventor Project (http://drinventor.eu/) and a lecturer and project coordinator at Addis Ababa University.

Andy Way

Full professor in Computing. From 2011 to 2013, he spent a period of sabbatical leave working in the translation and localisation industry in the UK. Since the start of 2014, he has been back in DCU full-time as Professor in the School of Computing at Dublin City University, where he began working in 1991. On 1 January 2014, he became Deputy Director of the CNGL Centre for Intelligent Content at DCU. This programme ceased at the end of August 2015 and was replaced by the ADAPT Centre for Digital Content Technology, where he remains Deputy Director.

Semantic Model

Ontology for Lexical Data ANalysis (OLDCAN)

The ExploreAt! project deals with various types of information resources, including questionnaires, answers on paper slips, and dictionaries prepared from the answers collected on those slips. The questionnaires collect information through questions about specific objects or concepts. The responses were used to prepare the Dictionary of Bavarian Dialects in Austria (WBÖ).

The purpose of this model is to support a shared understanding and formal representation of the concepts, relationships and axioms of the data collection methods used to collect historical-cultural linguistic information. The task involves domain analysis and schema analysis of the questionnaire data and representing the domain with a semantic model.

Semantic Uplifting

The purpose of this task is to convert the questionnaire and question data into linked open data, represented using RDF. The conversion uses the ontology developed in the previous task together with other available vocabularies such as FOAF, Dublin Core and SKOS. It is carried out with R2RML, which transforms individual records into the respective entities and maps their attributes to data and object properties. We generated a mapping rule for the questionnaire and the questions.
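The record-to-entity transformation described above can be imitated in a few lines of plain Python. This is only a sketch of the effect of such a mapping, not the project's R2RML rules: the namespace, the class name, the column names (`id`, `title`, `author_id`) and the choice of Dublin Core properties are all assumptions made for illustration.

```python
# Hypothetical namespace for project resources; not the project's real URIs.
BASE = "https://example.org/exploreat/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialise a Python value as an N-Triples plain string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def uplift_questionnaire(record):
    """Turn one questionnaire row (a dict) into N-Triples statements."""
    subject = f"<{BASE}questionnaire/{record['id']}>"
    triples = [
        # The row becomes an entity typed with an (assumed) ontology class.
        (subject, f"<{RDF_TYPE}>", f"<{BASE}ontology#Questionnaire>"),
        # A plain column becomes a data property with a literal value.
        (subject, f"<{DCTERMS}title>", literal(record["title"])),
    ]
    # A foreign key becomes an object property pointing at another entity.
    author_id = record.get("author_id")
    if author_id is not None:
        triples.append(
            (subject, f"<{DCTERMS}creator>", f"<{BASE}person/{author_id}>")
        )
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Made-up sample row, used only to show the shape of the output.
    row = {"id": 1, "title": "Sample questionnaire", "author_id": 7}
    print("\n".join(uplift_questionnaire(row)))
```

In an actual R2RML mapping the same three decisions (subject URI template, literal-valued columns, and references to other entities) would be declared in the mapping document rather than coded by hand.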

Linguistic Linked Open Data

API Endpoints

ExploreAt! provides APIs to query data related to the questionnaires and questions used to collect historical-lexical data. The APIs provided here are for testing purposes only. A full version of the API will be hosted on the main page of the project.

Multiple Questionnaires

The following endpoint provides access to multiple questionnaires, with a limit (max 100) and an offset (1-120): Questionnaires. In its current state, the repository contains 120 questionnaires.
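Because a single request returns at most 100 records, reading all 120 questionnaires takes more than one call. The sketch below computes the limit and offset pairs needed to cover a collection. The base URL and the query parameter names `limit` and `offset` are hypothetical, and the offset is assumed to start at 1, as the range 1-120 suggests.

```python
from urllib.parse import urlencode

MAX_LIMIT = 100  # maximum page size stated for the endpoints

# Hypothetical base URL; the real endpoint address is not given here.
BASE_URL = "https://example.org/api/questionnaires"


def page_params(total, limit=MAX_LIMIT, first_offset=1):
    """Yield (limit, offset) pairs that together cover `total` records."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    offset = first_offset
    last = first_offset + total - 1  # offset of the final record
    while offset <= last:
        # The final page may be shorter than the requested limit.
        yield min(limit, last - offset + 1), offset
        offset += limit


def page_urls(total, limit=MAX_LIMIT):
    """Build one request URL per page (parameter names are assumed)."""
    return [
        f"{BASE_URL}?{urlencode({'limit': lim, 'offset': off})}"
        for lim, off in page_params(total, limit)
    ]


if __name__ == "__main__":
    for url in page_urls(120):
        print(url)
```

For the 120 questionnaires this yields two requests: one for 100 records starting at offset 1 and one for the remaining 20 starting at offset 101. The same helper applies to the other collection endpoints, provided their totals are known.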

Individual Questionnaires

The following endpoint provides access to an individual questionnaire identified by id (1-120): A Questionnaire

Multiple Questions

The following endpoint provides access to multiple questions, with a limit (max 100) and an offset: Questions

Individual Questions

The following endpoint provides access to an individual question identified by id (for example, 1): A Question

Multiple Sources

The following endpoint provides access to multiple sources, with a limit (max 100) and an offset: Sources

Individual Sources

The following endpoint provides access to an individual source identified by id: A Source

Multiple Multimedia

The following endpoint provides access to multiple multimedia items, with a limit (max 100) and an offset: Multimedia

Individual Multimedia

The following endpoint provides access to an individual multimedia item identified by id: A Multimedia

Multiple PaperSlips

The following endpoint provides access to multiple PaperSlips, with a limit (max 100) and an offset: PaperSlips

Individual PaperSlips

The following endpoint provides access to an individual PaperSlip identified by id: A PaperSlip

Multiple PaperSlip Records

The following endpoint provides access to multiple PaperSlip Records, with a limit (max 100) and an offset: PaperSlip Records

Individual PaperSlip Records

The following endpoint provides access to an individual PaperSlip Record identified by id: A PaperSlip Record

Multiple Lemmas

The following endpoint provides access to multiple lemmas, with a limit (max 100) and an offset: Lemmas

Individual Lemmas

The following endpoint provides access to an individual lemma identified by id: A Lemma

Multiple Persons

The following endpoint provides access to multiple persons, with a limit (max 100) and an offset: Persons

Individual Persons

The following endpoint provides access to an individual person identified by id: A Person

Dictionary Sort code generator API

To generate a dictionary sort code for a lemma, append the desired lemma to the endpoint in place of "test". The API returns a JSON file containing the lemma and its sort code.

The sorting API is available here: Dictionary Sorting API
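As a rough illustration of replacing "test" with a lemma, the sketch below builds the request URL and decodes a response. The URL template and the JSON key names `lemma` and `sortcode` are assumptions, since neither is specified here. Percent-encoding the lemma matters because dialect headwords often contain umlauts and other non-ASCII characters.

```python
import json
from urllib.parse import quote

# Hypothetical URL template; "test" marks where the lemma goes.
ENDPOINT_TEMPLATE = "https://example.org/api/sortcode/test"


def sort_code_url(lemma, template=ENDPOINT_TEMPLATE):
    """Replace the trailing "test" path segment with the encoded lemma."""
    head, sep, tail = template.rpartition("/")
    if tail != "test":
        raise ValueError("template must end with the placeholder 'test'")
    # safe="" also encodes "/" so the lemma stays a single path segment.
    return f"{head}{sep}{quote(lemma, safe='')}"


def parse_response(body):
    """Decode the JSON body; the key names used here are assumptions."""
    data = json.loads(body)
    return data["lemma"], data["sortcode"]


if __name__ == "__main__":
    print(sort_code_url("Äpfel"))
```

The request itself could then be sent with any HTTP client, for example `urllib.request.urlopen`, and its body passed to `parse_response`.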

Publications

Upcoming Publications

  1. Yalemisew Abgaz, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt. A semantic Model for Traditional Data Collection Questionnaires enabling Cultural Analysis. Submitted to: 6th Workshop on Linked Data in Linguistics LDL-2018: Towards Linguistic Data Science at LREC 2018.
  2. Amelie Dorn, Eveline Wandl-Vogt, Yalemisew Abgaz, Alejandro Benito Santos, Roberto Therón. Unlocking Cultural Knowledge in Indigenous Language Resources: Collaborative Computing Methodologies. Submitted to: The 3rd workshop on collaboration and computing for under-resourced languages CCURL at LREC 2018.

Conference Papers

  1. Piringer, Barbara, Eveline Wandl-Vogt, Yalemisew Abgaz, and Katalin Lejtovicz. 2017. Exploring and exploiting biographical and prosopographical information as common access layer for heterogeneous data facilitating inclusive, gender-symmetric research. Eveline Wandl-Vogt and Lejtovicz, Katalin. Biographical Data in a Digital World 2017. A conference in the framework of the project APIS, 6-7 November 2017.