All posts by year


2021

EndoLinked: A Platform for Documenting Endonyms in Minority and Endangered Languages using Linked Data

15 Mar 2021
research proposal linked data
Because I cannot currently focus on too many things at once due to existing projects, I would like to share this idea as a blog post, to be addressed, hopefully, sometime in the future! There are thousands of languages around the world that are regional, minority, endangered or under-documented. Often, these languages are spoken within oppressed linguistic communities where the development and documentation of the language have received trivial...
2020

10 basic but essential SPARQL queries for lexicographical data on Wikidata

10 Dec 2020
NLP linked data
The Semantic Web, an extension of the World Wide Web (WWW), is an effective means of data representation that enables users and computers to retrieve and share information efficiently. The Resource Description Framework (RDF) is the foundational data model of the Semantic Web. Unlike traditional databases, such as relational ones, where data has to adhere to a fixed schema, RDF documents are not prescribed by a schema and can be described without additional information, making...
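To make the RDF and SPARQL idea concrete, here is a minimal Python sketch (standard library only) that builds a request URL for the public Wikidata Query Service. It assumes, following the Wikidata lexeme data model, that each lexeme is linked to its language through `dct:language` and carries its lemma through `wikibase:lemma`. The function name `build_lexeme_query_url` is hypothetical, and the query is an illustration rather than one of the ten queries from the post.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Each lexeme links to its language via dct:language and carries its lemma via
# wikibase:lemma. WDQS predefines the dct:, wd: and wikibase: prefixes, so no
# PREFIX declarations are needed. Double braces are literal braces for str.format.
QUERY_TEMPLATE = """
SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma .
}}
LIMIT {limit}
"""


def build_lexeme_query_url(language_qid: str, limit: int = 10) -> str:
    """Return a GET URL asking WDQS for `limit` lexemes in the given language."""
    query = QUERY_TEMPLATE.format(language_qid=language_qid, limit=limit)
    # urlencode percent-encodes the query text and appends format=json.
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Q1860 is the Wikidata item for English.
    print(build_lexeme_query_url("Q1860", limit=5))
```

Opening this URL with `urllib.request.urlopen` should return SPARQL JSON results. Wikimedia asks clients to identify themselves with a descriptive `User-Agent` header, so a real script would set one before sending the request.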

A note on the release of KLPT

18 Nov 2020
Kurdish NLP Kurdish language processing Note
It was autumn 2011. I was a second-year Bachelor’s student in Software Engineering, attending basic computer science modules, including Automata Theory. I remember well how excited I was when I came back home from the first session of that module. A cold but sweet evening in Kurdistan. The idea of automata as very basic yet amazing machines for processing information fascinated me. So, I took a piece of paper that same...

Ph.D. in Ireland vs. Europe: a comparative overview

25 Sep 2020
Academia Research PhD life
Many postgraduate students in Ireland, including myself, have recently been feeling more ignored and exploited than before, due to the lack of regulation and fair remuneration within universities. In the past few years, many student organizations have discussed our issues and tried to raise awareness among both students and university administrations about the challenges we are currently facing. A few particular examples are the unpaid teaching hours at the National University of...

A more representative solution for calculating academic citations

07 Sep 2020
Academia Science Research
Millions of scientific articles are published every year and made available to scientific communities through various bibliographic databases such as Google Scholar, CiteSeerX or Scopus. To understand the impact of a publication and how it shapes the future of its research field, citation indexes and citation impact measures are used. Such measures enable researchers to trace the subsequent papers that rely on their publications. Although there are many ways to calculate citation index...
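For reference, one widely used citation impact measure is the h-index: a researcher has index h if h of their papers have at least h citations each. The sketch below computes it. It is included only as a familiar baseline for the measures mentioned above and is not the solution the post proposes.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


print(h_index([3, 0, 6, 1, 5]))  # prints 3
```

With citation counts `[3, 0, 6, 1, 5]`, three papers have at least three citations each but no four papers have at least four, so the h-index is 3. Measures like this reduce a whole citation record to a single number, which is exactly the kind of simplification the post questions.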

Sentence alignment using GIZA++

01 Apr 2020
Tutorial
Recently, I was working on a task that required aligning sentences in two languages in a parallel corpus. For this purpose, I wanted to use GIZA++, a statistical machine translation toolkit used to train IBM Models 1-5 and an HMM word alignment model. The toolkit is a bit outdated and does not seem to be actively developed anymore. Therefore, it was a bit challenging to find out how to install and...
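GIZA++ itself is a C++ toolkit. To give a feel for what it estimates, here is a toy Python version of IBM Model 1, the simplest of the models it trains, using expectation-maximisation over word co-occurrences. This is an illustrative sketch, not GIZA++'s implementation: it ignores word order, fertility and the HMM model. The function names and the tiny German-English corpus in the usage note are made up for the example.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty word that lets a foreign word stay unaligned


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f|e) with EM on a list of (foreign_tokens, english_tokens) pairs."""
    f_vocab = {f for f_sent, _ in corpus for f in f_sent}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform translation probabilities
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among all English words in its sentence.
        for f_sent, e_sent in corpus:
            e_with_null = [NULL] + e_sent
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_with_null)
                for e in e_with_null:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities t(f|e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(t, f_sent, e_sent):
    """Link each foreign word to its most probable English word (or NULL)."""
    e_with_null = [NULL] + e_sent
    return [(f, max(e_with_null, key=lambda e: t[(f, e)])) for f in f_sent]
```

On a toy corpus such as `[("das Haus".split(), "the house".split()), ("das Buch".split(), "the book".split()), ("ein Buch".split(), "a book".split())]`, a few iterations are enough for `best_alignment` to link *das* with *the* and *Haus* with *house*. No dictionary is involved: the links emerge only because *das* keeps co-occurring with *the*.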

A Comprehensive List of Kurdish dictionaries

17 Mar 2020
languages Kurdish lexicography
In this post, all the available lexical resources for the Kurdish language, particularly dictionaries, are listed. This list is the result of recent work in which we carried out a comprehensive study of the existing resources. Find out more about this work at https://sinaahmadi.github.io/resources/kurdishlex.html. Please note that all the tables are sortable (by clicking on the name of a column) and only 10 rows are shown by default. You can search in the table or change...

"[v], idiot, [v], not [w]!"

20 Feb 2020
languages Kurdish
"If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart." (Nelson Mandela) Dyako, my classmate, was beaten just because he was incorrectly pronouncing "W" instead of "V". I still see that scene before my eyes, as if it were happening now. This is what was said to Dyako, a classmate who used to...
2019

Is Kurdish a less-resourced language?

09 Nov 2019
Kurdish NLP Kurdish language processing Less-resourced languages
TL;DR: Yes, for the time being, but hopefully not for long, if we take serious action. To find out why a language spoken by 20-30 million speakers is still less-resourced, read the post. 🙃 A language is called “less-resourced” when the only available resources for computationally processing it are descriptive grammars and online resources. Researchers have always highlighted that Kurdish is a less-resourced language. As the following figure indicates,...

Dynamic spreadsheets for data linking annotation

31 Jul 2019
Data mining Data linking NLP Apps Script Tutorial
Much has been said about the importance of data. We are living in the age of data, in which our online presence constantly produces data. At some point in your career or studies, you may need to create your own dataset for a specific task, based on your domain knowledge. Such a dataset, technically called a gold standard, is used as a collection of trustworthy instances for applying machine learning techniques...

Danish lexicographic data linking

26 Jun 2019
Lexicography Data linking NLP ELEXIS Danish
Joseph Kosuth's 'Titled (Art as Idea as Idea) [Radical]' at the Louisiana Museum of Modern Art. I had the chance to visit two amazing centres in the beautiful city of Copenhagen, Denmark, during the last two weeks. This visit was part of my Ph.D. project in lexicographic data linking within ELEXIS. ELEXIS, the European Lexicographic Infrastructure, aims to pave the way for efficiently creating, maintaining and updating dictionaries. Heterogeneity of data, diversity in structure and...

Summer datathon in Dagstuhl

18 May 2019
Linked Data Datathon NLP
Last week, I participated in the 3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19), held at Schloss Dagstuhl – Leibniz Center for Informatics in Wadern, Germany. It was my first datathon as a tutor, and such an amazing experience that I would like to write about it here. 3rd Summer Datathon on Linguistic Linked Open Data (11-17 May 2019). Schloss Dagstuhl (or Dagstuhl Castle) is an amazing historical place where seminars...

Resource population using Wikidata and Wiktionary

16 May 2019
Linked Data SPARQL Resource population NLP
There is an increasing number of lexical resources available online that are machine-friendly and can be accessed using linked data techniques. In this repository, we provide a program that collects linguistic information for a given word, in raw text or as triples, and converts the collected data to the Lemon-OntoLex ontology. Currently, the resources used are Wikidata and Wiktionary. However, any resource that can be accessed via a SPARQL endpoint can be used in this program....
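The conversion step can be pictured with a small example. The sketch below serialises a single lemma and its part of speech as an OntoLex-Lemon lexical entry in Turtle. The `ex:` namespace, the entry naming scheme and the function `to_ontolex_turtle` are hypothetical and not taken from the repository. Only the `ontolex:` and `lexinfo:` vocabulary terms come from the published models.

```python
# Prefixes: the OntoLex-Lemon core module, LexInfo 2.0, and a hypothetical local namespace.
ONTOLEX_PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .\n"
    "@prefix ex: <http://example.org/lexicon/> .\n"
)


def to_ontolex_turtle(lemma: str, pos: str, lang: str) -> str:
    """Serialise one lexical entry (lemma + LexInfo part of speech) as Turtle."""
    # Hypothetical naming scheme for the entry IRI, e.g. ex:cat_noun_en.
    entry_id = f"{lemma.replace(' ', '_')}_{pos}_{lang}"
    # Escape backslashes and double quotes so the lemma is a valid Turtle string.
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return (
        ONTOLEX_PREFIXES
        + f"\nex:{entry_id} a ontolex:LexicalEntry ;\n"
        + f"    lexinfo:partOfSpeech lexinfo:{pos} ;\n"
        + "    ontolex:canonicalForm [\n"
        + "        a ontolex:Form ;\n"
        + f'        ontolex:writtenRep "{escaped}"@{lang}\n'
        + "    ] .\n"
    )


print(to_ontolex_turtle("cat", "noun", "en"))
```

The canonical form is written as a blank node to keep the example short. The naming scheme assumes simple lemmas. Punctuation such as apostrophes is not valid in a Turtle local name, so a real converter would mint IRIs differently.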

Text pre-processing in command-line

16 Apr 2019
NLP Text processing Bash Tutorial
Whether you are a newbie or a professional in natural language processing or data mining, text pre-processing is a task you will need to go through at some point. Although there is a plethora of libraries that pre-process text perfectly, none of them may offer the power, simplicity and flexibility that command-line programs provide. In this tutorial, a few simple but essential programs for text pre-processing on the command line are introduced....
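A classic command-line pipeline for counting word frequencies chains `tr`, `sort` and `uniq`. It is an assumption that the tutorial uses exactly this pipeline. The Python function below mirrors what each stage does, which can help when moving a quick shell experiment into a larger program.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """Rough Python counterpart of:
    tr -sc 'A-Za-z' '\\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn
    """
    tokens = re.findall(r"[A-Za-z]+", text)        # tr -sc 'A-Za-z' '\n': split on non-letters
    lowered = (token.lower() for token in tokens)  # tr 'A-Z' 'a-z': lowercase every token
    return Counter(lowered).most_common()          # sort | uniq -c | sort -rn: count, then rank


print(word_frequencies("The cat saw the hat. THE end."))
```

Like the `A-Za-z` ranges given to `tr`, the regular expression only matches ASCII letters. Text in other scripts, for instance Kurdish written in Arabic script, would need a Unicode-aware pattern such as `[^\W\d_]+`. Ties may also come out in a different order than with `sort -rn`.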

Data modelling with RDF: a tutorial

21 Mar 2019
RDF Ontology Data model Linked Data Tutorial
What is RDF? RDF stands for Resource Description Framework, a framework for describing resources on the Web. It was initially designed to represent metadata on the Web. Nowadays, however, RDF is the foundational data model of the Semantic Web. In addition, RDF, along with other technologies such as SPARQL, OWL, and SKOS, empowers Linked Data. In short, RDF is fun, easier than relational databases and efficient to use. RDF expressions are in the...
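To illustrate the triple-based model in a few lines, the sketch below stores an RDF graph as a set of (subject, predicate, object) tuples. It answers simple patterns, with `None` playing the role of a SPARQL variable, and serialises the graph as N-Triples. It is a toy that assumes IRI-only triples, with no literals or blank nodes. The `http://example.org/` resources and the helper names are hypothetical, while `rdf:type`, `foaf:Person` and `foaf:knows` are real vocabulary terms. A real project would use a dedicated RDF library instead.

```python
EX = "http://example.org/"  # hypothetical namespace for the example resources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# An RDF graph is just a set of (subject, predicate, object) statements.
graph: set[tuple[str, str, str]] = {
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", RDF_TYPE, FOAF + "Person"),
}


def match(graph, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None matches anything."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def to_ntriples(graph):
    """Serialise IRI-only triples as N-Triples lines (literals are not handled)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


# Roughly: SELECT ?x WHERE { ?x rdf:type foaf:Person }
for subject, _, _ in match(graph, p=RDF_TYPE, o=FOAF + "Person"):
    print(subject)
print(to_ntriples(graph))
```

No schema had to be declared before adding statements: a new property is simply another triple, which is the flexibility the tutorial contrasts with relational tables.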

Foreign loanwords in Modern Greek

18 Mar 2019
Modern Greek
It is no exaggeration to say that Greek is the mother language of knowledge. No civilisation other than Hellenism played such a primordial role in seeking knowledge and enriching humanity’s understanding of the world and nature. No doubt, there were other civilisations with wonderful achievements and contributions to humanity, but none of them seems to have preserved its works in writing as well as the Greeks did. Mathematics, physics, medicine, literature, philosophy are...

Why does Kurdish language processing matter?

05 Mar 2019
Kurdish NLP Kurdish language processing Less-resourced languages
⚡️ October 2020 Update: Check out the Kurdish Language Processing Toolkit. Ever since I first touched the keyboard of my home computer in 2001, I have asked myself whether it would be possible to make computers understand my mother language, Kurdish. Well, what I was imagining was definitely limited to a Kurdish interface for Windows, particularly when I was going through the installation descriptions of Return to Castle Wolfenstein and Mafia: The City...