# 10 basic but essential SPARQL queries for lexicographical data on Wikidata

###### Dec 10, 2020

The Semantic Web, an extension of the World Wide Web (WWW), offers an effective means of data representation and enables users and computers to retrieve and share information efficiently. The Resource Description Framework (RDF) is the foundational data model of the Semantic Web. Unlike traditional databases, such as relational ones, where data has to adhere to a fixed schema, RDF documents are not bound to a schema and can be described without additional information, which makes the RDF data model self-describing. To learn more about RDF, you can read one of my previous blog posts on data modelling with RDF.

More recently, the concepts of the Web of Linked Data, which makes RDF data available over the HyperText Transfer Protocol (HTTP), and Linguistic Linked Open Data have gained traction alongside the Semantic Web, particularly in the natural language processing (NLP) community, as a standard for creating linguistic resources. According to the official W3C definition:

> Linked Data lies at the heart of what Semantic Web is all about: large scale integration of, and reasoning on, data on the Web. Almost all applications listed in, say collection of Semantic Web Case Studies and Use Cases are essentially based on the accessibility of, and integration of Linked Data at various level of complexities.

Moreover, the Semantic Web and Linked Data offer unique potential to electronic lexicography: they enable interoperability across lexical resources by converting printed or unstructured linguistic data into machine-readable semantic formats.

Semantic Web and Linked Data facilitate retrieving information from huge resources such as printed dictionaries (photo taken at DSL in Copenhagen)

## Queries

We present 10 essential queries in SPARQL, the query language for RDF, for retrieving lexicographical information. To this end, we use the Wikidata SPARQL endpoint, which also comes with a few example lexeme queries.

It is important to be familiar with Ontolex-Lemon and its lexicography module (lexicog), as lexicographical data on Wikidata is modelled on those ontologies.
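As a rough orientation, the sketch below builds a query that lists a few lexemes of one language together with their lemma and lexical category. It assumes the vocabulary that Wikidata's RDF representation uses for lexemes (`ontolex:LexicalEntry`, `dct:language`, `wikibase:lemma`, `wikibase:lexicalCategory`) and the item `wd:Q1860` for English. The helper name `build_lexeme_query` is made up for this illustration.

```python
# The Wikidata endpoint predeclares these prefixes; they are spelled out
# here so that the query also reads correctly on its own.
PREFIXES = """\
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
"""


def build_lexeme_query(language_qid: str = "Q1860", limit: int = 10) -> str:
    """Return a SPARQL query listing lexemes of one language.

    language_qid is the Wikidata item of the language
    (assumption: Q1860 = English, Q150 = French).
    """
    return PREFIXES + f"""
SELECT ?lexeme ?lemma ?category WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma ;
          wikibase:lexicalCategory ?category .
}}
LIMIT {limit}
"""
```

Calling `print(build_lexeme_query("Q150", limit=5))` yields a query that can be pasted into the Wikidata query interface.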

Moreover, a list of other useful queries is provided at:

Unfortunately, not all languages are equally represented on Wikidata. In this tutorial, we focus on some of the richly represented ones, e.g. English and French. So, if you adapt the queries to another language, first make sure that the language is sufficiently represented on Wikidata before double-checking the syntax of your queries.
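One way to gauge how well a language is covered is to count its lexemes before adapting a query. The sketch below builds such a counting query. It relies on the prefixes that the Wikidata endpoint predeclares (`ontolex:`, `dct:`, `wd:`), and `build_coverage_query` is a hypothetical helper name.

```python
def build_coverage_query(language_qids: list[str]) -> str:
    """Return a SPARQL query counting the lexemes of each given language item."""
    # VALUES restricts the count to the languages of interest,
    # e.g. Q1860 (English) and Q150 (French).
    values = " ".join(f"wd:{qid}" for qid in language_qids)
    return f"""
SELECT ?language (COUNT(?lexeme) AS ?lexemes) WHERE {{
  VALUES ?language {{ {values} }}
  ?lexeme a ontolex:LexicalEntry ;
          dct:language ?language .
}}
GROUP BY ?language
ORDER BY DESC(?lexemes)
"""
```

A language that comes back with only a handful of lexemes is unlikely to give useful results for the queries in this tutorial.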

#### 10- Check if a word exists in a given language (i.e. spelling error detection)
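A minimal sketch of such a check, not necessarily the query this post originally used: an `ASK` query returns true if any lexeme of the language has a form whose written representation equals the word. The helper name `build_word_exists_query`, the default item `Q1860` (English) and the language tag `en` are assumptions for illustration; the prefixes are the ones the Wikidata endpoint predeclares.

```python
def build_word_exists_query(word: str, language_qid: str = "Q1860",
                            lang_tag: str = "en") -> str:
    """Return a SPARQL ASK query that is true if some lexeme form equals `word`."""
    # Escape backslashes and double quotes so the word stays a valid
    # SPARQL string literal.
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
ASK {{
  ?lexeme dct:language wd:{language_qid} ;
          ontolex:lexicalForm ?form .
  ?form ontolex:representation "{escaped}"@{lang_tag} .
}}
"""
```

The match is exact and case-sensitive, and a negative answer only means that the form is absent from Wikidata, not that the word is misspelled.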


Besides running queries in the Wikidata query interface, you can send SPARQL queries to the endpoint from your own code, for instance from Python.
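The following is one possible sketch using only the Python standard library. It assumes the public endpoint URL `https://query.wikidata.org/sparql` and the standard SPARQL JSON results format. The function names and the `User-Agent` string are placeholders to adapt.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str) -> urllib.request.Request:
    """Prepare a GET request for the endpoint without sending it."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder value; replace it with a User-Agent describing your tool.
        "User-Agent": "lexeme-tutorial-sketch/0.1 (example)",
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query: str) -> dict:
    """Send the query and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)


def column(results: dict, variable: str) -> list[str]:
    """Collect the values bound to one variable of a SELECT result."""
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]
```

For a `SELECT` query, `column(run_query(query), "lemma")` returns the lemmas as a list of strings. For an `ASK` query, the answer is found under the `"boolean"` key of the returned dictionary. The third-party SPARQLWrapper package is a common alternative to hand-written requests.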

Last updated on 9 March 2021.