It was autumn 2011. I was a second-year Bachelor’s student in Software Engineering, taking basic computer science modules, including Automata Theory. I remember well how excited I was when I came back home from the first session of that module. A cold but sweet evening in Kurdistan.
The idea of automata as very basic yet amazing machines to process information was fascinating to me. So, I took a piece of paper that same evening and wrote a simple verb in my mother tongue, Sorani Kurdish, and conjugated it:
|xwardin (to eat)|
|xwardim '(I) ate'|xwardman '(we) ate'|
|xwardit '(you, singular) ate'|xwardtan '(you, plural) ate'|
|xwardî '(he/she) ate'|xwardyan '(they) ate'|
While the base “xward” remains the same, each ending produces a different form. That was already an automaton: the endings are the transitions and each verb form is a state. A spark in my mind! 🎇
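As a rough illustration, that paper sketch could be written as a tiny automaton in Python. This is only a minimal sketch and not part of KLPT: the names `STEM`, `TRANSITIONS`, `conjugate` and `recognize` are made up for this example, and the singular/plural labels on "you" are inferred from the layout of the table above.

```python
# Hypothetical sketch of the conjugation table as a one-step automaton.
# Start state: the base "xward". Each ending is a transition that leads
# to one accepting state, i.e. one conjugated verb form.

STEM = "xward"

# Transition label (ending) -> gloss of the verb form it leads to.
TRANSITIONS = {
    "im": "(I) ate",
    "it": "(you, singular) ate",
    "î": "(he/she) ate",
    "man": "(we) ate",
    "tan": "(you, plural) ate",
    "yan": "(they) ate",
}


def conjugate(stem, ending):
    """Follow one transition from the stem and return the resulting form."""
    if ending not in TRANSITIONS:
        raise ValueError(f"no transition for ending {ending!r}")
    return stem + ending


def recognize(word, stem=STEM):
    """Return the gloss if the word is the stem plus a known ending, else None."""
    if not word.startswith(stem):
        return None
    return TRANSITIONS.get(word[len(stem):])


if __name__ == "__main__":
    for ending in TRANSITIONS:
        form = conjugate(STEM, ending)
        print(form, "->", recognize(form))
```

Running it in one direction generates the six forms; running it in the other recognizes them, so a form such as "xwardyan" is accepted while the infinitive "xwardin" is rejected because "in" is not one of the listed endings.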
Already in love with linguistics and languages, this practical example helped me delve more into the topic and finally pursue my studies in natural language processing (NLP).
Now, after almost a decade, I am thrilled to release the Kurdish Language Processing Toolkit (KLPT). This toolkit provides basic language processing tools for the Kurdish language. My knowledge of and passion for NLP in general, and Kurdish language processing in particular, motivated me to create this package and release it under an open-source license.
I know there is still a long way to go before Kurdish becomes a high-resource language, and before I discover more of the amazing world of languages and computer science. However, I believe that this step paves the way for further advances in Kurdish language processing.
Find out more about this project at https://sinaahmadi.github.io/klpt/.
November 18, 2020