Abstract

Knowledge Graphs are increasingly being employed to improve data interoperability, search, and recommendation, and they also foster the adoption of semantic web technologies. The quality of the data in these graphs is pivotal, and it is often checked by validating the data against expected data models or shapes. Knowledge Graphs can be implemented with various technologies: RDF-based triplestores are canonical in the Semantic Web, while in the graph database context, Property Graphs are also used for Knowledge Graphs. Wikidata, a popular Knowledge Graph, offers RDF through its SPARQL query service, but its data model, with qualifiers and references attached to statements, is closer to Property Graphs. The recent RDF-Star proposal can help bridge the gap between RDF and Property Graphs.
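To make the contrast between these data models concrete, the following Python sketch represents RDF triples, a property graph edge carrying its own property, and a Wikibase-style statement with a qualifier, and shows one way an edge property could be turned into an RDF-Star annotation on a quoted triple. It is an illustration under stated assumptions, not part of any tool: the prefixed names such as `:alice`, the Wikibase-style identifiers `Q_alice`, `P_employer`, `Q_acme` and `P_start_time`, and the helper `edge_to_rdf_star` are hypothetical, and literals are plain strings wrapped in double quotes.

```python
from dataclasses import dataclass, field

# RDF: a graph is a set of (subject, predicate, object) triples.
Triple = tuple[str, str, str]
rdf_graph: set[Triple] = {
    (":alice", ":name", '"Alice"'),
    (":alice", ":knows", ":bob"),
}

@dataclass
class Edge:
    """Property graph edge: a labelled edge with its own key/value properties."""
    source: str
    label: str
    target: str
    properties: dict[str, str] = field(default_factory=dict)

@dataclass
class Statement:
    """Wikibase-style statement: a main value plus qualifiers and references."""
    subject: str
    prop: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[dict[str, str]] = field(default_factory=list)

# RDF-Star annotation: a quoted triple used as the subject of another triple.
QuotedTriple = tuple[Triple, str, str]

def edge_to_rdf_star(edge: Edge) -> tuple[Triple, list[QuotedTriple]]:
    """Hypothetical mapping: one base triple plus one annotation per edge property."""
    base = (f":{edge.source}", f":{edge.label}", f":{edge.target}")
    annotations = [(base, f":{key}", f'"{value}"') for key, value in edge.properties.items()]
    return base, annotations

# Hypothetical example data.
knows = Edge("alice", "knows", "bob", {"since": "2020"})
employment = Statement("Q_alice", "P_employer", "Q_acme", qualifiers={"P_start_time": "2020"})

base, annotations = edge_to_rdf_star(knows)
```

Plain RDF has no place to attach `since` to the `:knows` triple itself, which is why reification or RDF-Star is needed; in the property graph and Wikibase representations the extra data sits directly on the edge or statement.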

Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL) were proposed for RDF validation, while for Property Graphs, PG-Schema was proposed, along with other proposals such as PShEx and ProGS. Wikidata has adopted Entity Schemas, which are based on ShEx, in addition to its own property constraint system, and another proposal, WShEx, targets Wikidata and Wikibase graphs.
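The core idea these languages share, describing which properties a node must have, how many times, and with which kinds of values, can be illustrated with a deliberately simplified checker. The Python sketch below is not ShEx or SHACL and ignores their actual semantics (shape references, closed shapes, recursion, and so on); the names `PropertyConstraint` and `validate_node`, the toy `:name`/`:knows` vocabulary, and the convention that IRIs start with `:` and literals with a double quote are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Triple = tuple[str, str, str]

@dataclass
class PropertyConstraint:
    predicate: str
    min_count: int
    max_count: Optional[int]  # None means unbounded
    value_ok: Callable[[str], bool]

def is_literal(value: str) -> bool:
    return value.startswith('"')

def is_iri(value: str) -> bool:
    return value.startswith(":")

def validate_node(graph: list[Triple], node: str, shape: list[PropertyConstraint]) -> list[str]:
    """Return a list of violations; an empty list means the node conforms."""
    errors = []
    for c in shape:
        values = [o for (s, p, o) in graph if s == node and p == c.predicate]
        if len(values) < c.min_count:
            errors.append(f"{node}: expected at least {c.min_count} {c.predicate}, found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            errors.append(f"{node}: expected at most {c.max_count} {c.predicate}, found {len(values)}")
        errors.extend(f"{node}: invalid value {v} for {c.predicate}" for v in values if not c.value_ok(v))
    return errors

# Roughly: "a person has exactly one literal name and knows zero or more IRIs".
person_shape = [
    PropertyConstraint(":name", 1, 1, is_literal),
    PropertyConstraint(":knows", 0, None, is_iri),
]

graph = [
    (":alice", ":name", '"Alice"'),
    (":alice", ":knows", ":bob"),
    (":bob", ":knows", '"Carol"'),  # no name, and a literal where an IRI is expected
]
```

Here `:alice` conforms, while `:bob` yields two violations. In SHACL a similar constraint would be written as a node shape with `sh:property`, `sh:minCount` and `sh:maxCount`, and in ShEx as triple constraints with cardinalities; the sketch only mirrors that cardinality-and-value idea.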

This tutorial explores different types of Knowledge Graphs and approaches for their validation. We will also review practical applications like inferring shapes from existing data and creating conforming subsets of Knowledge Graphs.
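As a rough illustration of inferring shapes from existing data, the Python sketch below counts, for every instance of a class, how many times each predicate occurs and reports the minimum and maximum counts as a candidate cardinality. This naive frequency-based approach is an assumption for illustration, not the algorithm of any particular tool; the function name `infer_shape`, the use of `rdf:type` to select instances, and the example data are hypothetical.

```python
from collections import Counter

Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"

def infer_shape(graph: list[Triple], cls: str) -> dict[str, tuple[int, int]]:
    """Infer (min, max) cardinalities per predicate from the instances of cls."""
    instances = {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}
    # Per-instance predicate counts; a Counter returns 0 for predicates a node lacks.
    counts = [
        Counter(p for (s, p, _) in graph if s == node and p != RDF_TYPE)
        for node in instances
    ]
    predicates = sorted(set().union(*counts))
    return {
        pred: (min(c[pred] for c in counts), max(c[pred] for c in counts))
        for pred in predicates
    }

graph = [
    (":alice", RDF_TYPE, ":Person"),
    (":alice", ":name", '"Alice"'),
    (":alice", ":knows", ":bob"),
    (":alice", ":knows", ":carol"),
    (":bob", RDF_TYPE, ":Person"),
    (":bob", ":name", '"Bob"'),
]

shape = infer_shape(graph, ":Person")  # {":knows": (0, 2), ":name": (1, 1)}
```

The observed maximum is only evidence from the sample; a practical inference tool might generalise it, for example to an unbounded cardinality, and would usually infer value types as well.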

Slides

Topics

This is a half-day tutorial with the following topics:

  • Introduction to Knowledge graphs
  • Types of Knowledge Graphs:
    • RDF graphs
    • Property Graphs
    • Wikidata and Wikibase graphs
    • RDF-Star (RDF 1.2)
  • Shaping RDF:
    • Introduction to ShEx
    • Introduction to SHACL
    • ShEx & SHACL compared
  • Shaping other types of Knowledge Graphs:
    • Shaping Wikidata and Wikibase graphs: Entity Schemas and WShEx
    • Shaping property graphs: PShEx, PG-Schema, etc.
    • Shaping RDF-Star: ShEx-Star
  • Applications: inferring shapes from data, Knowledge Graph subsets, etc.
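Knowledge Graph subsets can be sketched in a similar spirit: given a class and the predicates a shape mentions, keep only the matching triples of that class's instances. The Python function `shape_subset` below is a hypothetical, simplified illustration of shape-guided subsetting, not the method used by any specific subsetting tool, and the example vocabulary is invented.

```python
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"

def shape_subset(graph: list[Triple], cls: str, predicates: set[str]) -> list[Triple]:
    """For each instance of cls, keep its type triple and triples whose predicate is in predicates."""
    instances = {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}
    return [
        (s, p, o)
        for (s, p, o) in graph
        if s in instances and ((p == RDF_TYPE and o == cls) or p in predicates)
    ]

graph = [
    (":alice", RDF_TYPE, ":Person"),
    (":alice", ":name", '"Alice"'),
    (":alice", ":age", '"30"'),
    (":alice", ":knows", ":bob"),
    (":bob", RDF_TYPE, ":Person"),
    (":bob", ":name", '"Bob"'),
    (":acme", RDF_TYPE, ":Company"),
    (":acme", ":name", '"ACME"'),
]

subset = shape_subset(graph, ":Person", {":name", ":knows"})
```

The `:age` triple and the `:Company` instance are dropped; a fuller approach might also follow `:knows` links and apply the shapes of the referenced nodes.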

We plan to devote the first slot to the first three items (Knowledge Graphs and the RDF validation technologies ShEx and SHACL), which are more introductory, and the second slot to the remaining items, which are more specialised.

Goals

  • Attendees will understand the different types of technologies used to implement Knowledge Graphs.
  • Users will understand the differences between the data models of RDF, Property graphs, Wikibase and RDF-Star.
  • Participants will understand use cases for defining shapes and validating Knowledge Graphs.
  • Participants will be able to create their own RDF data shapes or schemas and validate instance data against them using ShEx and SHACL.
  • They will see how RDF validation works in ShEx and SHACL.
  • Hands-on experience will leave users comfortable using existing tools to address practical needs in communicating schemas and verifying instance data conformance.
  • Users will be able to assess and compare the differences between ShEx, SHACL and other validation approaches for property graphs and Wikibase.

Tutorial type and intended audience

Anyone interested in Semantic Web technologies and tools can attend this tutorial. Some rudimentary knowledge of RDF and Turtle is expected, although a short introduction to the RDF data model will be given.

Tutoring team

  • Jose Emilio Labra Gayo. Full Professor at University of Oviedo, Spain. Founder and main researcher of WESO (Web Semantics Oviedo) research group, which collaborates with different companies around the world applying semantic web technologies. The development of data portals for several companies and public administrations led to his interest in RDF validation. He was a member of the W3C Data Shapes working group and of the W3C community groups: Shape Expressions and SHACL. He implemented the SHACL and ShEx library SHaclEX in Scala, maintains the online RDF validator services RDFShape and WikiShape, and is now implementing the rudof library in Rust, which can also be used to validate RDF with ShEx, SHACL, DCTAP, etc.

Registration and schedule

To register, visit: ISWC'24

The tutorial will start at 9:00h on Monday, 11th November 2024, and will have two slots: 9:00h to 10:40h and 11:00h to 11:40h (see Program).

Examples

Examples and other material will be available at this repository