Wergor transliteration corpus

About

Wergor is a transliteration system that converts Sorani Kurdish text between its Latin-based and Arabic-based orthographies. This first version uses a rule-based method. Wergor is the result of a research project published in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP); the paper is available at https://dl.acm.org/citation.cfm?id=3278623.

Wergor comes with a transliteration corpus of 20k manually transliterated Sorani Kurdish words, each given in both the Arabic-based and the Latin-based orthographies.
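
To give a rough idea of what a rule-based approach looks like, here is a minimal Python sketch of character-level mapping from the Arabic-based to the Latin-based orthography. The mapping table, its name ARABIC_TO_LATIN, and the function transliterate are hypothetical and cover only a handful of unambiguous letters; they are not Wergor's actual rules or API. A real system would presumably also need context-dependent rules, for example for letters that can stand for either a vowel or a consonant.

```python
# Illustrative subset only: these are NOT Wergor's actual rules.
# Code points are written as escapes to avoid look-alike characters.
ARABIC_TO_LATIN = {
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u062F": "d",  # ARABIC LETTER DAL
    "\u0631": "r",  # ARABIC LETTER REH
    "\u0645": "m",  # ARABIC LETTER MEEM
    "\u0646": "n",  # ARABIC LETTER NOON
    "\u0634": "ş",  # ARABIC LETTER SHEEN
    "\u0627": "a",  # ARABIC LETTER ALEF
    "\u06D5": "e",  # ARABIC LETTER AE (Kurdish short e)
}


def transliterate(word: str) -> str:
    """Map each known character one-to-one; leave unknown characters unchanged."""
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in word)


if __name__ == "__main__":
    # U+0646 U+0627 U+0646 spells the Sorani word for "bread".
    print(transliterate("\u0646\u0627\u0646"))  # nan
```

A one-to-one table like this only handles the easy cases. Word pairs from a manually transliterated corpus such as the one described above are the kind of data one could use to check such rules against.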

Get Wergor

The Wergor transliterator and corpus can be downloaded at https://github.com/sinaahmadi/wergor.

Please cite the following paper if you use Wergor:

    @article{ahmadi2019rule,
      title={A Rule-Based Kurdish Text Transliteration System},
      author={Ahmadi, Sina},
      journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
      volume={18},
      number={2},
      pages={18},
      year={2019},
      publisher={ACM}
    }