Validating RDF data - KGSWC'19 Tutorial

Abstract

RDF was proposed as a graph-based data model which became a key part of the Semantic Web vision. Although the benefits of RDF for data representation and integration are indisputable, it has not been embraced by everyday programmers and software architects who care about safely creating and accessing well-structured data. Semantic Web projects still lack some common tools and methodologies that are available in more conventional settings to describe and validate data. In particular, relational databases and XML have popular technologies for defining data schemas and validating data, which for a long time had no analog in RDF.

Two technologies have been proposed for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL).

ShEx was designed in 2014 as an intuitive, human-friendly, high-level language for RDF validation. ShEx 2.0 has recently been proposed by the W3C ShEx community group.

SHACL was proposed by the Data Shapes Working Group and accepted as a W3C Recommendation in July 2017.

This tutorial, presented by the authors of the book Validating RDF Data, will describe both ShEx and SHACL using examples, showing the rationales for their designs, a comparison of the two, and some example applications.

Overview

RDF is growing in popularity for both data transfer and data storage/recall. In both of these capacities, it is important to describe and verify conformance with a particular graph structure. While the Semantic Web is an environment where anybody can say anything about any topic, we still need to make sure that clinical, genetic, manufacturing, etc. databases capture data in a predictable way.

When we record or exchange data, programs or human operators are expected to synthesize and interpret it. Processing data safely additionally requires that the data maintain a specified structure and can be described by that structure.

Non-RDF data storage systems offer and rely on schemas both to increase data integrity and to enable efficient storage and static query analysis for optimization. SQL's Data Definition Language completely constrains what may appear in an SQL database (with minor exceptions like some databases that don't ensure homogeneity in a column). XML's use of W3C XML Schema and Relax NG typically involves validation on data creation and ingestion. Even JSON Schema is growing in popularity as that developer community recognizes the need for basic structural description.

RDF, and graph stores in general, don't demand an initial schema definition like SQL, but operate more like XML where the basic language allows many structural constructs but specific applications impose further practical demands. In that sense ShEx and SHACL work with the open spirit of RDF (natively schema-less), while giving developers and data architects a tool to impose and validate some specific constraints.
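To make the idea of imposing constraints on otherwise schema-less data concrete, here is a minimal Python sketch. It is not ShEx or SHACL and uses no RDF library: triples are plain tuples, and the `Constraint` type, the `validate_node` function, the `user_shape` list and the prefixed names are all hypothetical, introduced only for illustration. The shape is "open", so predicates it does not mention are tolerated.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

# A triple is (subject, predicate, object); names are plain strings here.
Triple = Tuple[str, str, object]


class Constraint(NamedTuple):
    """One hypothetical shape constraint on a single predicate."""
    predicate: str
    value_ok: Callable[[object], bool]  # check applied to each value
    min_count: int = 1
    max_count: Optional[int] = 1        # None means unbounded


def validate_node(node: str, triples: List[Triple],
                  shape: List[Constraint]) -> List[str]:
    """Return a list of violation messages; an empty list means the node conforms."""
    errors = []
    for c in shape:
        # Collect the values of this predicate for the node being checked.
        values = [o for s, p, o in triples if s == node and p == c.predicate]
        if len(values) < c.min_count:
            errors.append(f"{c.predicate}: expected at least {c.min_count} "
                          f"value(s), found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            errors.append(f"{c.predicate}: expected at most {c.max_count} "
                          f"value(s), found {len(values)}")
        for v in values:
            if not c.value_ok(v):
                errors.append(f"{c.predicate}: unexpected value {v!r}")
    return errors


# Illustrative data: the extra ex:favouriteColour triple is not covered by
# the shape and is simply ignored, in keeping with RDF's open nature.
triples = [
    (":alice", "schema:name", "Alice"),
    (":alice", "schema:knows", ":bob"),
    (":alice", "ex:favouriteColour", "green"),
    (":bob", "schema:name", 42),
]

# Exactly one string name; zero or more schema:knows links to other nodes.
user_shape = [
    Constraint("schema:name", lambda v: isinstance(v, str)),
    Constraint("schema:knows",
               lambda v: isinstance(v, str) and v.startswith(":"),
               min_count=0, max_count=None),
]

alice_errors = validate_node(":alice", triples, user_shape)
bob_errors = validate_node(":bob", triples, user_shape)
```

In this sketch `:alice` conforms despite carrying an extra property, while `:bob` is reported because its name is not a string. ShEx and SHACL express the same kind of cardinality and value constraints declaratively and add much more, such as references between shapes.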

The practicalities of data exchange faced by the Open Services for Lifecycle Collaboration (OSLC) initiative led to the development of Resource Shapes, a language for communicating the data structures managed by Linked Data Platform endpoints. Likewise, the Dublin Core community defined Description Set Profiles for describing constraints and expectations about bibliographic records. Neither of these underwent a standardization and implementation phase leading to widely deployed, general-purpose validation tools.

The current work developed by the Shape Expressions community and the W3C Data Shapes Working Group may help to improve RDF adoption in industrial scenarios where there is a real need to ensure the structure of RDF data, both in production and consumption.

More information about ShEx is available in the ShEx Primer, and about SHACL in the W3C Recommendation.

Topics

Goals

Audience

Some rudimentary knowledge of RDF and Turtle is expected, although a short introduction to the RDF data model will be given.

Presenter

Registration

To register, visit: KGSWC'19 registration

Schedule

The tutorial will be given on 8th/9th October, 2018 (see Workshops & Tutorials Program)

Slides

Examples and other material will be available at this repository