In our work on the semantic web and the role it can play in digital exploration, we are naturally interested in DBpedia.
I will present an experimental approach that allowed me to find, through a series of automatic processes, knowledge that is not explicitly available in DBpedia: the association between a type of building and a religion. I will address the generalization of this approach in a future post.
In connection with the French curriculum on the History of Arts, we found that, at various levels, monuments and other architectural elements play an important role in what is taught. In this context, one can address questions about materials, shapes, the relationship between the arrangement of space and power, the role of the spiritual and the religious, and many other subjects.
This led us to ‘snoop’ around DBpedia (with the ‘follow your nose’ approach) to see what we could get out of it. This post describes some aspects of this exploration.
First, let’s look at religious architecture. The History of Arts curriculum quickly yields topics such as Abbey, Gothic church, Mosque, Romanesque church, Synagogue.
We will look for the corresponding resources in DBpedia, identify the properties they have in common, and see whether other elements share these properties.
Find a DBpedia resource
We know that many resources of type ‘Category’ are built on the template
http://fr.dbpedia.org/resource/<resource name capitalized>
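The template above can be applied mechanically. Here is a minimal sketch in Python, assuming the usual Wikipedia convention of replacing spaces with underscores; the helper name fr_resource_uri is hypothetical and not part of any DBpedia tooling.

```python
def fr_resource_uri(name: str) -> str:
    """Build a French DBpedia resource URI from a plain name (hypothetical helper)."""
    # Assumption: spaces become underscores, as in Wikipedia page titles.
    cleaned = name.strip().replace(" ", "_")
    # Capitalize only the first character; the rest of the name is kept as typed.
    return "http://fr.dbpedia.org/resource/" + cleaned[:1].upper() + cleaned[1:]


print(fr_resource_uri("abbaye"))
print(fr_resource_uri("église (édifice)"))
```

A URI built this way is only a guess: the resource may not exist, so it still has to be checked against the endpoint.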
A possible starting point is the following SPARQL query (on fr.dbpedia.org/sparql)
select reduced * where {?s rdfs:label "Abbaye"@fr } LIMIT 100
which gives
http://dbpedia.org/resource/Abbey
This suggests four sets encompassing the concept of Abbey (and, as a bonus, gives me one unrelated link: a functional link for readers).
For the other religious building types I selected, it appears that two categories apply to all of them:
On dbpedia-fr, the query
select distinct ?s where { ?s ?p <http://fr.dbpedia.org/resource/Catégorie:Édifice-type> . } LIMIT 100
returns a set of 257 items related to Catégorie:Édifice-type, essentially building types, plus an equivalent category (owl:sameAs) in DBpedia:
http://dbpedia.org/page/Category:Buildings_and_structures_by_type
Among the building types, we find:
http://fr.dbpedia.org/resource/Synagogue
http://fr.dbpedia.org/resource/Église_(édifice)
http://fr.dbpedia.org/resource/Mosquée
but we see that for ‘church’ we do not have strictly the same category, and that ‘Romanesque church’ and ‘Gothic church’ are missing.
Which predicates are shared by multiple categories?
Let us search for the predicates and objects shared by resources typed respectively as ‘Synagogue’ and ‘Abbey’:
select distinct ?p where { ?s1 <http://dbpedia.org/ontology/type> <http://fr.dbpedia.org/resource/Synagogue> . ?s2 <http://dbpedia.org/ontology/type> <http://fr.dbpedia.org/resource/Abbaye> . ?s1 ?p ?o . ?s2 ?p ?o . } LIMIT 100
returns 33 items, from which a manual selection highlights the following non-geographical features
and the following geographical features
Predicates most used by a category
The query
SELECT (COUNT(DISTINCT ?s2) AS ?total) WHERE { ?s2 <http://dbpedia.org/ontology/type> <http://fr.dbpedia.org/resource/Abbaye> . ?s2 ?p ?o } LIMIT 100
tells me that, at the time of running this query, there are 631 elements of type dbpedia-fr:Abbaye.
The query
SELECT ?p (COUNT(?p) AS ?pTotal) WHERE { ?s2 <http://dbpedia.org/ontology/type> <http://fr.dbpedia.org/resource/Abbaye> . ?s2 ?p ?o } GROUP BY ?p ORDER BY DESC(?pTotal) LIMIT 100
gives me the predicates that apply to abbeys, ranked by number of occurrences. This is the beginning of the results:
This lets us move forward in the search for the ‘properties’ that describe most dbpedia-fr:Abbaye resources.
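To work with such a ranking outside the SPARQL endpoint, the results can be requested as JSON and parsed. This is a minimal sketch in Python, assuming the endpoint returns the standard SPARQL 1.1 JSON results layout; the sample payload and its counts are invented for illustration and are not real DBpedia output, and the function name ranked_counts is hypothetical.

```python
import json

# Hypothetical sample shaped like the SPARQL 1.1 JSON results format.
# The counts are invented for illustration, not taken from DBpedia.
SAMPLE = """
{
  "head": {"vars": ["p", "pTotal"]},
  "results": {"bindings": [
    {"p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
     "pTotal": {"type": "literal", "value": "12"}},
    {"p": {"type": "uri", "value": "http://dbpedia.org/ontology/wikiPageWikiLink"},
     "pTotal": {"type": "literal", "value": "340"}}
  ]}
}
"""


def ranked_counts(raw: str, key_var: str, count_var: str) -> list[tuple[str, int]]:
    """Return (value, count) pairs sorted by decreasing count."""
    bindings = json.loads(raw)["results"]["bindings"]
    # Every binding maps a variable name to a dict holding a string "value".
    pairs = [(b[key_var]["value"], int(b[count_var]["value"])) for b in bindings]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for predicate, total in ranked_counts(SAMPLE, "p", "pTotal"):
    print(total, predicate)
```

The same function can be reused for the later queries by passing "o" and "oTotal" as variable names.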
We see that there are many http://dbpedia.org/ontology/wikiPageWikiLink triples, which correspond to links to other pages. This finding can be generalized to many categories of DBpedia. I will come back to it in a future post. One may wonder whether some links appear more often than others.
Search for links shared by many elements of a category
The query
SELECT ?o (COUNT(?o) AS ?oTotal) WHERE { ?s2 <http://dbpedia.org/ontology/type> <http://fr.dbpedia.org/resource/Abbaye> . ?s2 <http://dbpedia.org/ontology/wikiPageWikiLink> ?o } GROUP BY ?o ORDER BY DESC(?oTotal) LIMIT 100
gives us the answer:
http://fr.dbpedia.org/resource/Abbaye appears 629 times, which is easily explained.
http://fr.dbpedia.org/resource/Catholicisme appears 377 times, which is also logical.
Then comes http://fr.dbpedia.org/resource/Ordre_de_Saint-Benoît, 220 times; the explanation remains to be found. Suggestions are welcome.
The fourth link is
http://fr.dbpedia.org/resource/Révolution_française
which is consistent, although it suggests a strongly historical bias in DBpedia's sources.
The fifth is http://fr.dbpedia.org/resource/Monument_historique_(France), which could also be useful to us.
We can consider that all the elements at the top of this result can contribute to our exploration/exploitation of DBpedia.
Compare this with what we get for Synagogue, with a similar query and 78 synagogues:
We find that the most common link is, in almost all cases but not all, a link to the resource type that was used.
The second link appears to be a URI designating a religion. Perhaps this gives us a way to automatically associate a religious building type with a religion. Let’s try a few examples:
http://fr.dbpedia.org/page/Mosquée is thus associated with http://fr.dbpedia.org/page/Islam
http://fr.dbpedia.org/resource/Cathédrale is thus associated with http://fr.dbpedia.org/resource/Église_catholique_romaine and in third position with http://fr.dbpedia.org/resource/Catholicisme
http://fr.dbpedia.org/resource/Temple is thus associated with http://fr.dbpedia.org/resource/Protestantisme and in third position with http://fr.dbpedia.org/resource/Bouddhisme
In all these cases, the second answer is admissible; the last two examples show situations where, for different reasons, the third answer is also relevant.
In fact, this suggests that all religions referenced after the first link would be admissible answers. So we need to see whether they share a property that identifies them as religions in DBpedia.
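The idea of keeping every religion that appears after the link to the building type itself can be sketched as a simple filter. This is a minimal Python sketch: the link counts are those reported above for Abbaye, while the set KNOWN_RELIGIONS is a hand-written assumption used only for illustration, and the function name religion_candidates is hypothetical.

```python
# Top links for dbpedia-fr:Abbaye, with the counts reported in this post.
ABBAYE_LINKS = [
    ("http://fr.dbpedia.org/resource/Abbaye", 629),
    ("http://fr.dbpedia.org/resource/Catholicisme", 377),
    ("http://fr.dbpedia.org/resource/Ordre_de_Saint-Benoît", 220),
]

# Assumption: a hand-written set of religion URIs, for illustration only.
KNOWN_RELIGIONS = {
    "http://fr.dbpedia.org/resource/Catholicisme",
    "http://fr.dbpedia.org/resource/Église_catholique_romaine",
    "http://fr.dbpedia.org/resource/Protestantisme",
    "http://fr.dbpedia.org/resource/Judaïsme",
    "http://fr.dbpedia.org/resource/Islam",
    "http://fr.dbpedia.org/resource/Bouddhisme",
}


def religion_candidates(ranked_links, type_uri, religions):
    """Keep the links that designate a religion, preserving the ranking order."""
    return [
        (uri, count)
        for uri, count in ranked_links
        if uri != type_uri and uri in religions
    ]


candidates = religion_candidates(
    ABBAYE_LINKS, "http://fr.dbpedia.org/resource/Abbaye", KNOWN_RELIGIONS
)
print(candidates)
```

A hand-written set does not scale, which is why a property shared by all religions is needed to build that set automatically.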
Qualify links
A first visual exploration gives us
http://fr.dbpedia.org/page/Judaïsme has dcterms:subject dbpedia-fr:Catégorie:Religion_au_Moyen-Orient and the property ‘is prop-fr:religion of’
http://fr.dbpedia.org/page/Catholicisme has dcterms:subject dbpedia-fr:Catégorie:Branche_du_christianisme and the property ‘is prop-fr:religion of’
It appears that the common feature is to be the object of the predicate prop-fr:religion (displayed as ‘is prop-fr:religion of’).
The query
SELECT ?o (COUNT(?o) AS ?oTotal) WHERE { ?s <http://fr.dbpedia.org/property/religion> ?o } GROUP BY ?o ORDER BY DESC(?oTotal) LIMIT 20
lets us find the elements that are objects of that predicate:
Associate buildings with religion
Let us try to associate a religion with each building type, using the query:
SELECT ?s ?o (COUNT(?o) AS ?oTotal) WHERE { ?s ?p <http://fr.dbpedia.org/resource/Catégorie:Édifice-type> . ?s2 <http://dbpedia.org/ontology/type> ?s . ?s2 <http://dbpedia.org/ontology/wikiPageWikiLink> ?o . ?s3 <http://fr.dbpedia.org/property/religion> ?o } GROUP BY ?s ?o ORDER BY DESC(?oTotal) LIMIT 100
These are the first results (slightly edited to fit the page width):
s | o | oTotal |
dbpedia-fr:Abbaye | dbpedia-fr:Catholicisme | 599430 |
dbpedia-fr:Église_(édifice) | dbpedia-fr:Catholicisme | 418170 |
dbpedia-fr:Basilique_(christianisme) | dbpedia-fr:Catholicisme | 357750 |
dbpedia-fr:Chapelle | dbpedia-fr:Catholicisme | 206700 |
dbpedia-fr:Mosquée | dbpedia-fr:Islam | 167970 |
dbpedia-fr:Église_(édifice) | dbpedia-fr:Église_catholique_romaine | 132086 |
dbpedia-fr:Collégiale | dbpedia-fr:Catholicisme | 89040 |
dbpedia-fr:Basilique_(christianisme) | dbpedia-fr:Église_catholique_romaine | 78914 |
dbpedia-fr:Abbaye | dbpedia-fr:Église_catholique_romaine | 71740 |
dbpedia-fr:Couvent | dbpedia-fr:Catholicisme | 62010 |
dbpedia-fr:Chapelle | dbpedia-fr:Église_catholique_romaine | 56126 |
dbpedia-fr:Monastère | dbpedia-fr:Catholicisme | 34980 |
dbpedia-fr:Collégiale | dbpedia-fr:Église_catholique_romaine | 21944 |
dbpedia-fr:Synagogue | dbpedia-fr:Judaïsme | 17640 |
dbpedia-fr:Monastère | dbpedia-fr:Église_catholique_romaine | 11394 |
dbpedia-fr:Couvent | dbpedia-fr:Église_catholique_romaine | 8862 |
dbpedia-fr:Abbaye | dbpedia-fr:Protestantisme | 8550 |
dbpedia-fr:Synagogue | dbpedia-fr:Catholicisme | 7950 |
dbpedia-fr:Basilique_(christianisme) | dbpedia-fr:Christianisme | 7436 |
dbpedia-fr:Baptistère | dbpedia-fr:Catholicisme | 6360 |
dbpedia-fr:Mosquée | dbpedia-fr:Sunnisme | 5066 |
dbpedia-fr:Temple | dbpedia-fr:Protestantisme | 4788 |
dbpedia-fr:Église_(édifice) | dbpedia-fr:Protestantisme | 4104 |
dbpedia-fr:Église_(édifice) | dbpedia-fr:Christianisme | 4004 |
dbpedia-fr:Abbaye | dbpedia-fr:Christianisme | 4004 |
dbpedia-fr:Cloître | dbpedia-fr:Catholicisme | 3180 |
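We see that relevant associations are obtained, and that there are sometimes several associations for the same building type.
The resulting count must be used to assess the reliability of the association. Note that these counts are inflated: since ?s3 is otherwise unconstrained in the query, each link is counted once per resource that uses prop-fr:religion with that value. For instance, 599430 = 377 × 1590, where 377 is the number of abbey links to Catholicisme found earlier, which suggests that about 1590 resources have Catholicisme as their prop-fr:religion.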
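To read such a result per building type, the rows can be grouped and ranked. This is a minimal Python sketch using a few rows copied from the table above, with the dbpedia-fr: prefix dropped for brevity; the counts are taken as they are, and the function name group_by_building is hypothetical.

```python
from collections import defaultdict

# A few (building type, religion, oTotal) rows copied from the table above.
ROWS = [
    ("Abbaye", "Catholicisme", 599430),
    ("Mosquée", "Islam", 167970),
    ("Abbaye", "Église_catholique_romaine", 71740),
    ("Synagogue", "Judaïsme", 17640),
    ("Abbaye", "Protestantisme", 8550),
    ("Synagogue", "Catholicisme", 7950),
    ("Mosquée", "Sunnisme", 5066),
    ("Temple", "Protestantisme", 4788),
]


def group_by_building(rows):
    """Map each building type to its religions, sorted by decreasing count."""
    grouped = defaultdict(list)
    for building, religion, total in rows:
        grouped[building].append((religion, total))
    for candidates in grouped.values():
        candidates.sort(key=lambda candidate: candidate[1], reverse=True)
    return dict(grouped)


associations = group_by_building(ROWS)
for building, candidates in associations.items():
    print(building, "->", candidates)
```

The first entry of each list is the strongest association for that building type; the following entries are the alternatives to be assessed.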
The query
SELECT ?s (COUNT(?s) AS ?sTotal) WHERE { ?s ?p <http://fr.dbpedia.org/resource/Catégorie:Édifice-type> . ?s2 <http://dbpedia.org/ontology/type> ?s . ?s2 <http://dbpedia.org/ontology/wikiPageWikiLink> ?o . ?s3 <http://fr.dbpedia.org/property/religion> ?o } GROUP BY ?s ORDER BY DESC(?sTotal) LIMIT 100
gives us the building types associated with a religion, in the following list (shortened by discarding the least representative answers):
It would probably be useful to divide the number of (religion, building) associations by the number of instances of each building type, to obtain a measure of how representative each association is.
I do not know (as of the publication date of the French version of this post) how to do that in SPARQL to obtain the best associations. But does it have to be done in SPARQL? In JavaScript, for example, I could easily run the first query and retrieve the results as JSON, then run the other query, and finally compute the relevance from the two JSON results. To be continued.
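The two-step computation described above can be sketched outside SPARQL. This is a minimal Python sketch rather than the JavaScript mentioned, assuming both results have already been fetched and reduced to plain dictionaries; the only figures used are the 377 abbey links to Catholicisme and the 631 abbeys reported earlier, and the function name representativity is hypothetical.

```python
def representativity(link_counts, instance_counts):
    """Divide each (building, religion) link count by the building's instance count."""
    scores = {}
    for (building, religion), linked in link_counts.items():
        total = instance_counts.get(building)
        if total:  # skip building types whose instance count is missing or zero
            scores[(building, religion)] = linked / total
    return scores


# Figures reported earlier in this post for dbpedia-fr:Abbaye.
LINK_COUNTS = {("Abbaye", "Catholicisme"): 377}
INSTANCE_COUNTS = {"Abbaye": 631}

scores = representativity(LINK_COUNTS, INSTANCE_COUNTS)
for (building, religion), score in scores.items():
    print(f"{building} / {religion}: {score:.2f}")
```

With these figures, roughly six abbeys out of ten link to Catholicisme. A threshold on this ratio would be one possible way to keep only the best associations, but choosing it remains an open question.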
In summary
I found that the predicate most used on the set of elements I was considering was http://dbpedia.org/ontology/wikiPageWikiLink. By searching for links shared by many elements of a class, I highlighted a ‘feature’ of that class. Then, by looking at the predicates commonly used with this feature, I could qualify it. Finally, although my original class did not carry this knowledge, I could establish an association between a category such as Abbey and one or more religions. This processing is partly statistical. Its level of relevance remains to be assessed, but the first results are interesting.
That’s all for now. I have shown how, with some fairly simple queries, you can generate knowledge that is not explicitly available in DBpedia.
In a future post, I will try to systematize and generalize the approach.
(The French version of this post was first published here: https://onsem.wp.imt.fr/2015/05/15/creer-des-connaissances-formalisees-pour-le-web-semantique-a-partir-de-dbpedia/)