In the lab where I previously worked, I trained neural network models on a GPU server without any job scheduler: ssh in, create a Python virtual environment and train in a screen session! That simple. I recently changed labs and suddenly found myself working with a cluster that requires considerably more preparation to train models. However, as I gradually learnt more about it, it became clearer that it is actually...
✴️ This project has moved to https://sinaahmadi.github.io/SPARQLify/! Following one of my previous posts on the 10 basic but essential SPARQL queries for lexicographical data on Wikidata, I decided to create a simple SPARQL query generator that helps non-experts become more familiar with SPARQL and also create queries to quickly look up words in Wikidata and Dbnary on their endpoints (Wikidata’s endpoint, Dbnary’s endpoint). Please note that the translation option is only available for headwords. Also,...
Given that I cannot currently focus on too many things at the same time due to existing projects, I would like to write this idea in the form of a blog post. To be addressed, hopefully, sometime in the future! There are thousands of languages around the world which are regional, minority, endangered or under-documented. Oftentimes, these languages are spoken within oppressed linguistic communities where the development and documentation of the language have received trivial...
The Semantic Web, as an extension of the World Wide Web (WWW), represents an effective means of data representation and enables users and computers to retrieve and share information efficiently. The Resource Description Framework (RDF) is the foundational data model for the Semantic Web. Unlike traditional databases, such as relational ones, where data has to adhere to a fixed schema, RDF documents are not prescribed by a schema and can be described without additional information, making...
It was Autumn 2011. I was a second-year Bachelor’s student in Software Engineering and was attending basic computer science modules, including Automata Theory. I remember well how excited I was when I came back home from the first session of that module. A cold but sweet evening in Kurdistan. The idea of automata as very basic yet amazing machines to process information was fascinating to me. So, I took a piece of paper that same...
Updates (May 5th, 2022) Even though this blog post has remained very relevant even after two years, I don’t think I will update the collected data. However, I’ll provide a few links here for future readers as follows: 6 Reasons Not to Move to Ireland Many postgraduate students, including myself, have recently felt more ignored and exploited in Ireland than before, due to the lack of regulations and fair remuneration within universities. In the...
Millions of scientific articles are published every year and released to the scientific communities using various bibliographic databases such as Google Scholar, CiteSeerX or Scopus. In order to understand the impact of a publication and how it shapes the future of the research field, citation index and citation impact measures are used. Such measures enable researchers to trace the succeeding papers that rely on their publications. Although there are many ways to calculate citation index...
Recently, I was working on a task for which I needed to align sentences in two languages in a parallel corpus. For this purpose, I wanted to use GIZA++, a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model. The toolkit is a bit outdated and no longer seems to be actively developed. Therefore, it was a bit challenging to find out how to install and...
This post lists all the available lexical resources, particularly dictionaries, for the Kurdish language. It is the result of recent work in which we carried out a comprehensive study of the existing resources. Find out more about this work at https://sinaahmadi.github.io/resources/kurdishlex.html. Please note that all the tables are sortable (by clicking on the name of the column) and only 10 rows are shown by default. You can search in the table or change...
"If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart." Nelson Mandela Dyako, my classmate, was beaten just because he was incorrectly pronouncing "W" instead of "V" (Image credits) I still remember that scene in front of my eyes, just like it is happening now. This is what Dyako was told, a classmate who used to...
Less-resourced languages
NLP
Kurdish language processing
Kurdish
TLDR Yes, for the time being, but hopefully no, if we take serious action. To know more about why a language spoken by 20-30 million speakers should still be less-resourced, read the post. 🙃 A language is called “less-resourced” when the only available resources for computationally processing it are descriptive grammars and online resources. Researchers have always highlighted that Kurdish is a less-resourced language. As the following figure indicates,...
A lot has been said about the importance of data. We are living in the age of data, where we constantly produce data through our presence online. At some point in your career or your studies, you may need to create your own dataset according to your field knowledge for a specific task. Such a dataset, which is technically called a gold standard, is used as a collection of trustworthy instances for applying machine learning techniques...
Joseph Kosuth's 'Titled (Art as Idea as Idea)’ [Radical]' work at Louisiana Museum of Modern Art I had the chance to visit two amazing centres in the beautiful city of Copenhagen, Denmark, during the last two weeks. This visit was part of my Ph.D. project on lexicographic data linking within ELEXIS. ELEXIS, the European Lexicographic Infrastructure, aims to pave the way for efficiently creating, maintaining and updating dictionaries. Heterogeneity of data, diversity in structure and...
Last week, I participated in the 3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19), which was held at Schloss Dagstuhl – Leibniz Center for Informatics, Wadern, Germany. As it was my first datathon as a tutor, it was such an amazing experience that I would like to write about it here. 3rd Summer Datathon on Linguistic Linked Open Data (11-17 May 2019) Schloss Dagstuhl (or Dagstuhl Castle) is an amazing historical place where seminars...
There is an increasing number of lexical resources available online which are machine-friendly and can be accessed through linked data techniques. In this repository, we provide a program which collects linguistic information for a given word, in raw text or triples, and converts the collected data to the Lemon-OntoLex ontology. The resources currently used are Wikidata and Wiktionary. However, any resource which can be accessed via a SPARQL endpoint can be used in this program....
Whether you are a newbie or a professional in natural language processing or data mining, text pre-processing is a task that you need to go through at some point. Although there is a plethora of libraries which pre-process your text perfectly, none of them may offer the power, simplicity and flexibility that command-line programs provide. In this tutorial, a few simple but essential programs are introduced for text pre-processing on the command line....
What is RDF? RDF stands for Resource Description Framework, a framework for describing resources on the web. It was initially designed to represent metadata on the Web. However, nowadays RDF is the foundational data model for the Semantic Web. In addition, RDF, along with other technologies such as SPARQL, OWL, and SKOS, empowers Linked Data. In other words, RDF is fun, easier than relational databases and efficient to use. RDF expressions are in the...
It is no exaggeration to say that Greek is the mother language of knowledge. No civilisation other than Hellenism played such a primordial role in seeking knowledge and enriching humanity’s understanding of the world and nature. No doubt, there were other civilisations with wonderful achievements and contributions to humanity, but none of them seems to have preserved their works as well as the Greeks did through writing. Mathematics, physics, medicine, literature, philosophy are...
Kurdish
NLP
Kurdish language processing
Less-resourced languages
⚡️ October 2020 Update: Check out the Kurdish Language Processing Toolkit Since the first time that I touched my home computer keyboard in 2001, I used to ask myself if it would be possible to make a computer understand my mother language, Kurdish. Well, what I was imagining was definitely something limited to a Kurdish interface for Windows, particularly when I was going through the installation descriptions of Return to Castle Wolfenstein and Mafia: The City...