All posts by year

2023

A Minimalistic Guide to Clusters (and SLURM)

28 Jan 2023
SLURM Tutorial
In the lab where I previously worked, I trained neural network models on a GPU server without any job scheduler: ssh in, create a Python virtual environment and train in a screen session! That simple. I recently changed labs and suddenly found myself working with a cluster that requires considerably more preparation to train models. However, as I gradually learnt more about it, it became clearer that it is actually...
2021

SPARQL query generator for lexicographical data

04 Oct 2021
SPARQL Linked Data Lexicography
✴️ This project has moved to https://sinaahmadi.github.io/SPARQLify/! Following one of my previous posts on the 10 basic but essential SPARQL queries for lexicographical data on Wikidata, I decided to create a simple SPARQL query generator that helps non-experts get more familiar with SPARQL and also create queries to look up words in Wikidata and Dbnary quickly on their endpoints (Wikidata’s endpoint, Dbnary’s endpoint). Please note that the translation option is only available for headwords. Also,...

EndoLinked: A Platform for Documenting Endonyms in Minority and Endangered Languages using Linked Data

15 Mar 2021
Research Proposal Linked data
Given that I cannot currently focus on too many things at the same time due to existing projects, I would like to write this idea in the form of a blog post. To be addressed, hopefully, sometime in the future! There are thousands of languages around the world which are regional, minority, endangered or under-documented. Oftentimes, these languages are spoken within oppressed linguistic communities where the development and documentation of the language have received trivial...
2020

10 basic but essential SPARQL queries for lexicographical data on Wikidata

10 Dec 2020
NLP Linked data SPARQL
The Semantic Web, as an extension of the World Wide Web (WWW), represents an effective means of data representation and enables users and computers to retrieve and share information efficiently. The Resource Description Framework (RDF) is the foundational data model for the Semantic Web. Unlike traditional databases, such as relational ones, where data has to adhere to a fixed schema, RDF documents are not prescribed by a schema and can be described without additional information making...

A note on the release of KLPT

18 Nov 2020
Kurdish NLP Kurdish language processing Note
It was Autumn 2011. I was a second-year Bachelor’s student in Software Engineering and was attending basic computer science modules, including Automata Theory. I remember well how excited I was when I came back home from the first session of that module. A cold but sweet evening in Kurdistan. The idea of automata as very basic yet amazing machines to process information was fascinating to me. So, I took a piece of paper that same...

Ph.D. in Ireland vs. Europe: a comparative overview

25 Sep 2020
Academia Research PhD life
Updates (May 5th, 2022) Even though this blog post has remained quite relevant even after two years, I don’t think I will update the collected data. However, I’ll provide a few links here for future readers as follows: 6 Reasons Not to Move to Ireland Many postgraduate students, including myself, have recently been feeling more ignored and exploited in Ireland than before, due to the lack of regulations and fair remuneration within universities. In the...

A more representative solution for calculating academic citations

07 Sep 2020
Academia Science Research
Millions of scientific articles are published every year and released to the scientific communities using various bibliographic databases such as Google Scholar, CiteSeerX or Scopus. In order to understand the impact of a publication and how it shapes the future of the research field, citation index and citation impact measures are used. Such measures enable researchers to trace the succeeding papers that rely on their publications. Although there are many ways to calculate citation index...

Sentence alignment using GIZA++

01 Apr 2020
Tutorial
Recently, I was working on a task for which I needed to align sentences in two languages in a parallel corpus. For this purpose, I wanted to use GIZA++, a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model. The toolkit is a bit outdated and does not seem to be developed any further. Therefore, it was a bit challenging to find out how to install and...

A Comprehensive List of Kurdish dictionaries

17 Mar 2020
Kurdish Lexicography
In this post, all the available lexical resources, particularly dictionaries, for the Kurdish language are listed. This is the result of a recent work where we carried out a comprehensive study on the existing resources. Find out more about this work at https://sinaahmadi.github.io/resources/kurdishlex.html. Please note that all the tables are sortable (by clicking on the name of the column) and only 10 rows are shown by default. You can search in the table or change...

"[v], idiot, [v], not [w]!"

20 Feb 2020
Languages Kurdish Languages
"If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart." Nelson Mandela Dyako, my classmate, was beaten just because he was incorrectly pronouncing "W" instead of "V" (Image credits) I still remember that scene in front of my eyes, just like it is happening now. This is what Dyako was told, a classmate who used to...
2019

Is Kurdish a less-resourced language?

09 Nov 2019
Less-resourced languages NLP Kurdish language processing Kurdish
TLDR: Yes, for the time being, but hopefully no, if we take serious action. To know more about why a language spoken by 20-30 million speakers should still be less-resourced, read the post. 🙃 A language is called “less-resourced” when the only available resources for computationally processing it are descriptive grammars and online resources. Regarding Kurdish, researchers have always highlighted that it is a less-resourced language. As the following figure indicates,...

Dynamic spreadsheets for data linking annotation

31 Jul 2019
Data mining Data linking NLP Apps Script Tutorial
A lot has been said on the importance of data. We are living in the age of data, where we constantly produce data through our presence online. At some point in your career or your studies, you may need to create your own dataset according to your field knowledge for a specific task. Such a dataset, which is technically called a gold-standard, is used as a collection of trustworthy instances for applying machine learning techniques...

Danish lexicographic data linking

26 Jun 2019
Lexicography Research visit NLP ELEXIS Danish
Joseph Kosuth's 'Titled (Art as Idea as Idea)’ [Radical]' work at Louisiana Museum of Modern Art I had the chance to visit two amazing centres in the beautiful city of Copenhagen, Denmark, during the last two weeks. This visit was a part of my Ph.D. project in lexicographic data linking within ELEXIS. ELEXIS, the European Lexicographic Infrastructure, aims at paving the way for efficiently creating, maintaining and updating dictionaries. Heterogeneity of data, diversity in structure and...

Summer datathon in Dagstuhl

18 May 2019
Linked Data Datathon NLP
Last week, I participated in the 3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD-19), which was held at Schloss Dagstuhl – Leibniz Center for Informatics, Wadern, Germany. As my first datathon where I was a tutor, it was such an amazing experience that I would like to write about it here. 3rd Summer Datathon on Linguistic Linked Open Data (11-17 May 2019) Schloss Dagstuhl (or Dagstuhl Castle) is an amazing historical place where seminars...

Resource population using Wikidata and Wiktionary

16 May 2019
Linked Data SPARQL NLP
There is an increasing number of lexical resources available online which are machine-friendly and can be accessed by linked data techniques. In this repository, we provide a program which collects linguistic information for a given word, in raw text or triples, and converts the collected data to the Lemon-OntoLex ontology. The current resources being used are Wikidata and Wiktionary. However, any resource which can be accessed via a SPARQL endpoint can be used in this program....

Text pre-processing in command-line

16 Apr 2019
NLP Text processing Bash Tutorial
Whether you are a newbie or a professional in natural language processing or data mining, text pre-processing is a task that you need to go through at some point. Although there is a plethora of libraries which do the pre-processing of your text perfectly, none of them may have the power, simplicity and flexibility that command-line programs provide. In this tutorial, a few simple but essential programs are introduced for text pre-processing on the command line....

Data modelling with RDF: a tutorial

21 Mar 2019
Semantic Web Linked Data Tutorial
What is RDF? RDF stands for Resource Description Framework, which is a framework for describing resources on the web. It was initially designed to represent metadata on the Web. However, nowadays RDF is the foundational data model for the Semantic Web. In addition, RDF along with other technologies such as SPARQL, OWL, and SKOS empowers Linked Data. In other words, RDF is fun, easier than relational databases and efficient to use. RDF expressions are in the...

Foreign loanwords in Modern Greek

18 Mar 2019
Modern Greek Languages
It is no exaggeration to say that Greek is the mother language of knowledge. No civilisation other than Hellenism played such a primordial role in seeking knowledge and enriching humanity’s understanding of the world and nature. No doubt, there were other civilisations with wonderful achievements and contributions to humanity, but none of them seems to have preserved their works in writing as well as the Greeks did. Mathematics, physics, medicine, literature, philosophy are...

Why does Kurdish language processing matter?

05 Mar 2019
Kurdish NLP Kurdish language processing Less-resourced languages
⚡️ October 2020 Update: Check out the Kurdish Language Processing Toolkit Since the first time that I touched my home computer keyboard in 2001, I used to ask myself if it would be possible to make a computer understand my mother language, Kurdish. Well, what I was imagining was definitely something limited to a Kurdish interface for Windows, particularly when I was going through the installation descriptions of Return to Castle Wolfenstein and Mafia: The City...