Spatial Data on the Web using the current SDI

Report of the research results in the Geonovum testbed “Spatial Data on the Web” (topic 4)

08/06/2016

Authors

This report is published under a CC-BY license (http://creativecommons.org/licenses/by/4.0/)

Table of Contents

Abstract

The Goal

Our Motivation

Our Approach - an Overview

Schema.org

GeoDCAT-ap

URI strategy

Representations

Linking

Implementations

Validation and testing

Observations related to schema.org and its use by search engines

Experiences with existing WFS deployments

Experiences with existing metadata

Open challenges

Links

Appendix A: XML schemas

Appendix B: Crosswalk - Spatial Data on the Web best practices in the WFS proxy

Appendix C: Example of an ISO 19139 document transformed to GeoDCAT-ap

Abstract

This document reports the findings of interactive instruments, GeoCat and the Linked Data Factory from their work on research topic 4 in Geonovum's "Spatial data on the Web" testbed.

Finding, accessing and using data disseminated through spatial data infrastructures (SDIs) based on OGC web services is difficult for non-expert users. Our research has investigated how to improve this while keeping the current spatial data infrastructures intact. I.e., we have been exploring how to realize synergies between the current spatial data infrastructures and the developments on the Web of data.

The approach has been to design and implement an intermediate layer of proxies that makes data and metadata from the OGC web services available according to the following principles, which we consider important from a Web perspective:

To a large extent we were successful in implementing the proxy layer. This report in particular documents

A number of open challenges remain, together with takeaways from our work on the research questions.

The Goal

Finding, accessing and using data disseminated through OGC services is difficult for non-expert users. This has several reasons, including:

The research documented in this report has investigated how to improve this while keeping the current spatial data infrastructure (SDI) intact. I.e., how can synergy be realized between the current spatial data infrastructures and the developments on the Web?

Our Motivation

This research topic combines two aspects that are both important to us. First of all, it allows us to think ahead and investigate how to make feature-based spatial data sets available on the Web. At the same time, it does not simply start from a clean sheet, but takes the existing infrastructure for spatial datasets and the related workflows for data management and dissemination into account.

We believe that it is important to avoid being constrained by the technical details of WFS, GML, CSW and ISO 19139 when exploring how the data should be made available on the Web, i.e. when focusing on the first aspect. In the end, it is essential that the data is useful for those who want to use it, e.g. for implementing web or mobile applications for the new environmental act.

At the same time, there is a lot of utility in leveraging the current infrastructure. This approach evaluates the extent to which the current spatial data infrastructures can support the provision of spatial data on the Web, and what their conceptual limitations are. If the approach proves to be feasible, it provides a migration path with potentially low effort for a data provider - or even a central operator of a spatial data infrastructure - to make their spatial dataset(s) also available "on the Web", as all the software that is used is made available under an open source license.

Since the time and resources available for this research are limited, it is also helpful that the scope of the research topic does not cover everything related to spatial data on the Web, but is focused on a few aspects that are particularly important:

Our Approach - an Overview

There are ongoing discussions about good practices for publishing spatial data on the Web.

For example, a joint Spatial Data on the Web working group of W3C and OGC is currently in the process of documenting good practices. In an appendix of this report, we summarize an assessment of how far our work in this testbed follows the current draft of best practices identified by the working group.

A relevant ongoing discussion in that working group is to what extent a Linked Data approach, with its 5-star ranking and preference for RDF, should be the basis for the best practices, or whether the approach should be more open with respect to technologies (see, for example, here) as long as they are consistent with the Web architecture.

In our work we lean towards the latter approach, although we keep in mind that, given the strong support for Linked Open Data in the Netherlands, there is value in also following the Linked Open Data principles. I.e., our design aims at being consistent with the Web as it is today as well as with the Linked Open Data principles.

As organisations already have an infrastructure set up to publish OGC web services, this research investigates whether we can design an intermediate layer that makes data from those OGC web services available in a 'webby' way. In the context of our research, 'webby' means:

The research focusses on tools used daily by citizens, such as Google Search or Cortana. It is conducted in the scope of the upcoming broad environmental law, the 'wet van de leefomgeving', which serves as the basis for use cases. We envision questions like 'can I build a fish pond in my garden' or, more generally, 'which recent announcements are available for this area'.

Proxy approach

To establish that intermediate layer, proxies are introduced on top of the WFS (data service) and CSW (metadata service), so that the contained resources are made available to other communities in line with the practices they follow.

The “blue” layer in the figure below represents the spatial data infrastructure that addresses needs of the community of geospatial experts.

The “green” proxy layer is intended to support the following communities and their practices (not pretending to be complete):

An essential part of the proxies is to transform between the resources, their representations as published by the proxies and their hypermedia controls on the one hand, and the OGC web service requests and response messages and formats on the other. Transformations are usually combined efforts of

Content negotiation plays an important role in this area.

This report discusses the following aspects of the implementation:

Based on this discussion, we document our findings on the research questions.

Schema.org

Relevance of schema.org

This research focusses on common tools like Cortana and search engines. For that reason transformation to the schema.org ontology is relevant, since it is the vocabulary understood, and as a result mandated, by the search engines. The scope of schema.org is to provide an extensive data model of objects commonly advertised on the web. Not all OGC feature types and properties can be transformed to schema.org without loss of information. For that reason schema.org supports extensions. However, it is unclear whether using extensions may have a negative impact when data is indexed or ranked in searches (the testing tools for structured data that the search engine providers offer sometimes report errors when extensions are included, but it is not clear whether that would impact indexing; we have no evidence from our research that this is the case). In our research we make use of extensions to facilitate applications that do support them.

Mapping of features in datasets to schema.org

Schema.org dictates a certain hierarchy and order for the way data sources must be modeled. Details are given in the documentation of the schemas. For an easy overview, we depict the most important geo relations in a drawing of classes (the ovals) and properties (the arrows):

[Figure: schema-geo6.png]

This is a partial representation of schema.org that depicts the most important classes (ovals) and properties (arrows) in which geo information is presented.

This image makes clear that it is not possible to model every class with every property; sometimes intermediate classes must be used to attach properties to a class. For example, a Place does not have a latitude and longitude, but a GeoCoordinates does. So in order to add latitude and longitude to a Place, one needs to include the GeoCoordinates class in the mapping to make the mapping compliant with schema.org.

BAG data

We used the simple BAG WFS to demonstrate the use of schema.org for point datasets. The source XML Schema is listed in Appendix A. The BAG data model contains a postal address and coordinates, so if we use Place as the main class, we can use GeoCoordinates and PostalAddress as secondary classes to model every property necessary.

Mapping of BAG dataset 

The BAG WFS proxy landing page can be represented by a schema:Dataset that refers to another Dataset (the layer) via the schema:hasPart property.

Mapping of BAG layer

The layer dataset contains a collection of Places. The property that links the Dataset to the Place is schema:spatial.
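
As an illustration, a minimal JSON-LD sketch of these two relations as described above (the URIs are taken from the testbed deployment; the actual proxy output, shown later in this report, differs in details):

{
  "@context" : "http://schema.org",
  "@type" : "Dataset",
  "@id" : "http://www.ldproxy.net/bag/",
  "hasPart" : {
    "@type" : "Dataset",
    "@id" : "http://www.ldproxy.net/bag/inspireadressen/",
    "spatial" : {
      "@type" : "Place",
      "@id" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079"
    }
  }
}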

Mapping of BAG feature

| source | type | destination | type | remark |
|---|---|---|---|---|
| inspireadressen:inspireadressen | complex type | instance of schema:Place | schema:Place | has URI |
| | | instance of schema:PostalAddress | schema:PostalAddress | Blank Node |
| all address info | | schema:address | rdf:Property | to schema:PostalAddress |
| straatnaam | element of type string | schema:streetAddress | | concatenate these values into schema:streetAddress |
| huisnummer | element of type string | schema:streetAddress | | concatenate these values into schema:streetAddress |
| huisletter | element of type string | schema:streetAddress | | concatenate these values into schema:streetAddress |
| toevoeging | element of type string | schema:streetAddress | | concatenate these values into schema:streetAddress |
| woonplaats | element of type string | schema:addressLocality | string | |
| postcode | element of type string | schema:postalCode | string | |
| geom | element of type pointPropertyType | schema:geo | rdf:Property | to schema:GeoCoordinates |
| | | instance of schema:GeoCoordinates | schema:GeoCoordinates | Blank Node |
| | | schema:latitude | number | |
| | | schema:longitude | number | |

A JSON-LD example:

{
  "@context" : "http://schema.org",
  "@type" : "Place",
  "@id" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079",
  "url" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079",
  "geo" : {
    "@type" : "GeoCoordinates",
    "longitude" : "5.456203403470931",
    "latitude" : "51.34206514984284"
  },
  "address" : {
    "@type" : "PostalAddress",
    "streetAddress" : "de Molensteen 6",
    "addressLocality" : "Valkenswaard",
    "postalCode" : "5554VC"
  }
}

Farmland data

The farmland data XML Schema is listed in Appendix A. It maps to schema.org by using Place as the main class, and GeoShape as the class that contains the polygon coordinates.

| source | type | destination | type | remark |
|---|---|---|---|---|
| aan:aan | complexType | schema:Place | | has URI |
| objectid | element of type integer | ignore | | this attribute is a system-internal id |
| aanid | element of type integer | qname of the resource | | the QName of the identifier (URI) of the instance of schema:Place; must use the URL of the WFS proxy (see aan:aan) |
| versiebron | element of type string | must be added in an extension to schema.org | | there is no property for Place or above Place that can represent "source" |
| type | element of type string | schema:description | string | the information in the data contains info about the type of land |
| geom | element of type gml:MultiSurfacePropertyType | schema:geo | rdf:Property | to schema:GeoShape |
| gml:MultiSurfacePropertyType | complexType | instance of schema:GeoShape | | Blank Node |
| | | schema:polygon | number list | the information in the data contains the polygon coordinates |

A JSON-LD example, which also includes additional properties; for now, these follow the approach described in the JSON-LD standard of using null values in the context:

{
 "@context" : {
   "@vocab" : "http://schema.org/",
   "versiebron" : null
 },
 "@type" : "Place",
 "@id" : "http://www.ldproxy.net/aan/aan/aan.1",
 "url" : "http://www.ldproxy.net/aan/aan/aan.1",
 "geo" : {
   "@type" : "GeoShape",
   "polygon" : "4.73669815895657,53.04210214780725 4.737386778482686,53.04225965287612 4.738189184603757,53.041560320183144 4.738334449604591,53.04144258121632 4.7383578153531065,53.04142028437658 4.737935049254453,53.04125675245224 4.737676098195687,53.04115710925473 4.737337237145229,53.041069844303244 4.737220698890215,53.04104358436285 4.736999734927018,53.0410103498384 4.736914748307701,53.04100668450981 4.736679349996972,53.04146456389503 4.736380359462239,53.04202780911646 4.73669815895657,53.04210214780725"
 },
 "name" : "1356489",
 "versiebron" : "luchtfoto 2013",
 "description" : "BTR-landbouw"
}

Bekendmakingen data

The testbed also allowed us to research how search engines respond to a mapping to an existing, already published Linked Data set that is linked from the WFS proxy. We have used a website that publishes government notifications. This dataset supports the following use case: a person located at a particular "Place" wants to know "what official notifications have been published by governmental organisations at this location?".

The data behind the website https://www.officielebekendmakingen.nl/ is served by more than one type of service, amongst them a WFS service and an SRU service. Output from the SRU service, filtered on 'Valkenswaard', was used to set up two Linked Data versions of "Bekendmakingen": a spatial version with coordinates and no postal code, and an administrative version with postal codes and no coordinates. The data was not cleansed, to illustrate issues with data quality.

We have used a selection of properties to represent the bekendmakingen in both services. The schema.org class used is Report. The non-spatial version is represented as follows:

[Figure: bekendmakingen.png]

The spatial version is represented like this:

[Figure: bekendmakingen-spatial.png]

It is important to mention that the address information in the original Bekendmakingen dataset that was retrieved is quite incomplete. Many address fields are empty, and the address information is scattered as text snippets across the other fields. At best, the postal code field is used with a 4-digit postal code. Needless to say, this does not help when trying to automatically integrate this data with other data sets.

Both datasets are published in a SPARQL endpoint and are dereferenceable via their URLs. For details see: https://github.com/geo4web-testbed/topic4-task2.

Mapping of metadata records to schema.org

Catalog records are typically structured as ISO 19139 XML documents. As part of this research we've looked at optimal scenarios to transform this metadata to schema.org and make it crawlable by search engines.

Schema.org does not have a concept of a catalog record, so we map each ISO 19139 catalog record describing a resource directly to a schema.org class.

In ISO 19115, the hierarchyLevel element determines the type of object described in the metadata. The available types are listed in the ScopeCode codelist of ISO 19115. For each of these concepts a different schema.org class should be used:

| hierarchyLevel | schema.org class |
|---|---|
| dataset | http://schema.org/Dataset |
| series | http://bib.schema.org/Collection |
| service | http://schema.org/Service |
| application | http://schema.org/SoftwareApplication |
| collectionHardware | ? |
| nonGeographicDataset | http://schema.org/Dataset |
| dimensionGroup | ? |
| featureType | http://schema.org/Dataset |
| model | ? |
| tile | http://schema.org/Dataset |
| fieldSession | ? |
| collectionSession | ? |
| other | http://schema.org/Thing |

For quite a few of the hierarchy levels no suitable classes are available in schema.org; creating a schema.org extension or using one of the available extensions may make sense. In the scope of this research only metadata describing datasets, services and series is relevant; other hierarchy levels are not commonly used in the current national spatial data infrastructure. Schema.org does not have an exactly matching concept for dataset series. A potential approach is to model the series as bib:Collection.
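
A minimal sketch of what that could look like in JSON-LD (the use of hasPart to list the member datasets is our assumption, not a prescribed mapping):

{
  "@context" : {
    "@vocab" : "http://schema.org/",
    "bib" : "http://bib.schema.org/"
  },
  "@type" : "bib:Collection",
  "name" : "Example dataset series",
  "hasPart" : [ {
    "@type" : "Dataset",
    "name" : "Member dataset 1"
  }, {
    "@type" : "Dataset",
    "name" : "Member dataset 2"
  } ]
}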

Dataset descriptions structured as ISO 19139 map quite well onto schema.org/Dataset. The most obvious missing properties are typical spatial properties such as spatial resolution and projection info. The spatial extent of a dataset is poorly modelled in schema.org: most ISO 19139 metadata uses a bounding box element to indicate the spatial extent of a dataset, whereas schema.org expects (in line with DCAT) an identifier of a geographic location. Of course, a transformation may create a location object specifically for the dataset, having a geometry property similar to the dataset bounds, but that is less powerful from a linking perspective. Instead, we recommend that data providers creating ISO 19139 documents include both a geographic identifier and a geographic extent element as the location of a dataset.

For each ISO 19139 record that links to a WFS, the original WFS endpoint is provided as a schema.org/DataDownload.contentUrl. The Dutch metadata profile requires the feature type name to be made available in the name element of the gmd:online section. This property can be used to construct a GetFeature request as the contentUrl that will return the full dataset in any of the formats supported by the WFS.
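
For illustration, a sketch of such a distribution in JSON-LD, using the BAG WFS endpoint mentioned later in this report (the exact GetFeature parameters are an assumption based on WFS 2.0):

{
  "@context" : "http://schema.org",
  "@type" : "Dataset",
  "distribution" : {
    "@type" : "DataDownload",
    "contentUrl" : "http://geodata.nationaalgeoregister.nl/inspireadressen/wfs?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&TYPENAMES=inspireadressen:inspireadressen"
  }
}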

For those catalog records describing a dataset that is available via the ldproxy software an additional schema.org/DataDownload.contentUrl is added with a link to ldproxy. It would be interesting to experiment with automatic configuration of the ldproxy software for any WFS endpoint available in the catalog.

To support such a workflow, with ldproxy automatically configured for any WFS listed in the catalogue, the catalogue should notify ldproxy of the WFS URL to be used. ldproxy supports this via a query parameter containing the WFS URL. For example, http://www.ldproxy.net/?wfsUrl=http://geodata.nationaalgeoregister.nl/inspireadressen/wfs.

This URL will return an HTTP 307 redirect to the proxy service http://www.ldproxy.net/bag/.
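
This behaviour can be observed with cURL (a sketch; -I issues a HEAD request, which we assume the proxy answers like a GET):

curl -I "http://www.ldproxy.net/?wfsUrl=http://geodata.nationaalgeoregister.nl/inspireadressen/wfs"
# Expected response headers (abridged):
# HTTP/1.1 307 Temporary Redirect
# Location: http://www.ldproxy.net/bag/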

The WFS URLs are normalized by removing known parameters like SERVICE, VERSION, etc.

If the WFS proxy does not have a configuration for that WFS yet, it has to be set up first. An idea for the future would be to add an API to ldproxy that can be used by the catalogue (or others) to retrieve and set up such a proxy endpoint. That administration endpoint would need to be protected so that it cannot be misused.

Testing in Structured Data Testing Tool

While testing the schema.org documents in Google's Structured Data Testing Tool, we identified that Google enforces some additional, undocumented properties as part of the contactPoint information. For example, a url and contactType are required, and a telephone number should use the international syntax. It is to be expected that newer versions of (implementations of) the ontology will introduce more restrictions as soon as the search engines start to interact with the structured data about datasets.

Mapping of other WFS resources to schema.org

WFS landing page 

An OGC Web Feature Service provides access to a dataset. As described above, the proxy landing page for a WFS is therefore represented by a schema:Dataset.

Information from the capabilities document of the WFS is mapped to schema.org properties, where possible.

Service identification section:

Service Provider section:

Feature type section:

The URL of the GetCapabilities request is mapped to schema:sameAs.

In addition, we also want to include the link to the dataset metadata, which is stored in an OGC Catalog Service (CSW). The value is represented as schema:isPartOf with a schema:DataCatalog resource. The general approach is discussed in more detail here.

For the dataset metadata, ldproxy extracts the metadata URL information from the WFS capabilities document, if available. In an INSPIRE service the link to the service metadata is contained in the ows:ExtendedCapabilities.

The structured data according to schema.org from the BAG WFS is shown below (using Google's Structured Data Testing Tool):

WFS feature type

A feature type is also represented by a schema:Dataset, which is part of the WFS dataset described above.

Information from the capabilities document of the WFS is mapped to schema.org properties, where possible.

Feature type section:

The WFS dataset is included using schema:isPartOf.

If named subsets have been configured, like the municipalities for the BAG dataset, the list of subsets is referenced using schema:hasPart with a schema:Dataset resource as its value. Note that the use of hasPart does not seem right, as this is an address dataset and the list of municipalities is just derived information - the distinct values of all municipality attributes; i.e., this aspect is an open issue.

In addition, (up to) 25 features of the dataset are shown - basically a page. As schema.org does not include any pagination support, links to other pages are not included.

The structured data according to schema.org from the address feature type in the BAG WFS is shown below (using Google's Structured Data Testing Tool):

Subsets of a WFS feature type

The schema.org representation of a subset of a feature type is basically the same as for the feature type itself.

WFS feature

The mapping to schema.org will differ by feature type. The main elements of the mappings for the address and the farmland features are discussed above. 

In addition:

The structured data according to schema.org from an address feature in the BAG WFS is shown below (using Google's Structured Data Testing Tool):

GeoDCAT-ap

A proxy approach as described in this report can also be applied to facilitate spatial data discovery by the linked data community. The proxies should be configured in such a way that they can also return RDF encodings of objects, structured using common ontologies from the linked data community. A widely used ontology to describe datasets in the linked data domain is DCAT.

GeoDCAT-ap is an extension of DCAT and PROV, developed in the framework of the EU ISA Programme, that provides an ontology to express the full richness of INSPIRE metadata encoded according to ISO 19139. Initially, the goal of the GeoDCAT-ap initiative was to facilitate discovery of INSPIRE data in the open data domain, but recent discussions tend to promote GeoDCAT-ap as an alternative metadata format within an INSPIRE discovery service.

Mapping of metadata records to GeoDCAT-ap

Together with the GeoDCAT-AP specification, an XSLT has been released which can transform ISO 19139 metadata to GeoDCAT-AP. We've used this XSLT to improve the existing RDF export capabilities in GeoNetwork. Until recently, RDF/XML could only be exported from GeoNetwork using an RSS type of search. In recent versions, RDF/XML can also be exported using CSW and as a full catalog dump. Only RDF/XML is supported; no transformations to Turtle or JSON-LD are currently available.

The GeoDCAT-ap XSLT has been improved in a number of respects.

An example of a transformed ISO 19139 document is available in Appendix C.

Besides DCAT, the VoID ontology is also relevant in the scope of our research. DCAT is widely used to describe traditional datasets in a structure other than RDF; VoID is used to describe datasets that are structured as RDF. As part of this research we suggest ways to convert non-RDF data structures to RDF. To make those structures discoverable on the Semantic Web, VoID is a relevant ontology.

To facilitate Semantic Web bots, a SPARQL endpoint can be set up based on a (nightly) full RDF dump of the catalogue. Alternatively, semantic web users (bots and people) can follow links to metadata URIs from external sources (using a content negotiation Accept header of "application/rdf+xml").
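
For example, a request for the GeoDCAT-ap encoding of a dataset record in the testbed catalogue could look like this (a sketch; whether a given record supports content negotiation depends on the deployment):

# Request the RDF/XML (GeoDCAT-ap) representation of a catalogue record
curl -H "Accept: application/rdf+xml" http://opendatacat.net/geonetwork-geo4web/doc/dataset/3a97fbe4-2b0d-4e9c-9644-276883400dd7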

URI strategy

General considerations

The Platform Linked Data Nederland has formulated a national strategy for the minting of URIs. The idea behind this is that, by formulating a number of recommendations for minting URIs, organisations have a quick-start procedure. The URI strategy describes a pattern for URIs: http://{domain}/{type}/{concept}/{reference}

Since the publication of this strategy there have been a number of review sessions in which lessons learned have been discussed. One lesson was that “what could work for one use case might not work for another”, so the “URI strategy” is now more of a guideline than a strategy.

As the URI strategy may well have an impact on search engine crawling and ranking, and since it is important for APIs and developers, the work on research topic 3 has compared different URI strategies. The findings are documented in their report. In our work with the proxies, we have implemented a straightforward URI schema that is documented in this section.

Depending on the use cases, some resources that are "child" properties of higher-level resources may never be referenced directly. Our general approach was to not assign a persistent URI to those resources, i.e. to use blank nodes, as they do not need to be linked.

A typical example is geometry, which we consider as “owned” by the feature (schema:Place in schema.org terminology) as sharing geometry is out-of-scope for our experiments. In other contexts, this might be handled differently.

WFS proxy

The WFS proxy provides URIs for the following resources.

Resource: WFS landing page

URI template: {baseuri}/{wfs}[/?f={format}]

Example: http://www.ldproxy.net/aan 

Remarks:

Resource: WFS feature type

URI template: {baseuri}/{wfs}/{featuretype}[/?page={n}&f={format}]

Example: http://www.ldproxy.net/aan/aan 

Remarks:

Resource: WFS feature

URI template: {baseuri}/{wfs}/{featuretype}/{id}[/?f={format}]

Example: http://www.ldproxy.net/aan/aan/aan.81 

Remarks:

Resource: Named subsets of a WFS feature type

URI template: {baseuri}/{wfs}/{featuretype}/?fields={attributes}&distinctValues=true

Example: http://www.ldproxy.net/bag/inspireadressen/?fields=addressLocality&distinctValues=true 

Remarks:

Resource: WFS feature type subset

URI template: {baseuri}/{wfs}/{featuretype}/?{attributes}={value}[&page={n}&f={format}]

Example: http://www.ldproxy.net/bag/inspireadressen/?addressLocality=Valkenswaard  

http://www.ldproxy.net/bag/inspireadressen/?postalCode=5551AB

Remarks:

CSW proxy

If a catalog needs to be accessed that does not offer transformations and access methods similar to those implemented in GeoNetwork, an approach is envisioned in which a component within GeoNetwork acts as a CSW proxy to that service. The component propagates a request for a catalog record to the CSW service and returns the CSW results in the requested RDFa or RDF/XML format. This situation occurs if a WFS capabilities document contains a link to a CSW service in a domain other than nationaalgeoregister.nl. Two approaches are possible:

A basic implementation of the first approach is added to the catalog used in the project.

Representations

WFS proxy

This section describes how the ldproxy software, our WFS proxy implementation, represents the different types of resources.

All representations are available both via content negotiation and by adding a query parameter. The query parameter option mainly exists to support clickable links and to enable copy & paste of the URL into a browser.
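
For example (a sketch; the exact format tokens accepted by the f parameter are deployment-specific assumptions):

# Request the JSON-LD representation via content negotiation
curl -H "Accept: application/ld+json" http://www.ldproxy.net/bag/inspireadressen/

# Request the same representation via the query parameter
curl "http://www.ldproxy.net/bag/inspireadressen/?f=jsonld"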

Main page

HTML

The main page of an ldproxy instance simply lists the Web Feature Services that are available through the proxy using the title and abstract provided by the WFS. At the moment, it does not have any schema.org markup or any other representations.

WFS landing page

HTML

The landing page of a WFS displays the information derived from the metadata provided by the service that can be represented in the schema.org vocabulary as described here. The HTML is annotated with the schema.org markup to improve indexing by search engines.

XML

In addition, the resource is also available as XML. The WFS Capabilities document is returned.

WFS feature type

HTML

The HTML page of a feature type displays the information derived from the metadata provided by the service that can be represented in the schema.org vocabulary as described above in the schema.org discussion for the BAG and farmland features. The HTML is annotated with the schema.org markup to improve indexing by search engines.

In addition, it also displays the first 25 features of that feature type - or fewer, if there are fewer than 25 features in the dataset. Pagination controls allow users (and search engines) to browse through all features of that feature type. 25 is the number of features per page that we used in the testbed; in general, any other number that is suitable for the data can be used.

Many datasets will include a large number of features. For example, the BAG dataset has about 8 million features. To support browsing by users ldproxy can also be configured to browse by subsets of a feature type. For example, for the BAG dataset, a user can select only the addresses in a municipality (“Browse by addressLocality”).

Leaflet is used to display the features shown on that page on a map. As Leaflet supports GeoJSON, we simply make use of the GeoJSON representation of the resource (discussed below).

We are using Bootstrap for an “informative and pleasant to consume” layout that works on all kinds of devices.

JSON-LD

The JSON-LD representation is the same information as included in the HTML as annotations.

{
 "@context" : {
   "@vocab" : "https://schema.org/",
   "features" : {
     "@container" : "@set"
   }
 },
 "@type" : "Dataset",
 "@id" : "http://www.ldproxy.net/bag/inspireadressen/",
 "name" : "INSPIRE Adressen WFS",
 "description" : "INSPIRE Adressen themalaag, gevuld met relevante objecten uit de Basisregistratie Adressen en Gebouwen (BAG), beheerd door het Kadaster.",
 "url" : "http://www.ldproxy.net/bag/inspireadressen/",
 "keywords" : "Adressen",
 "spatial" : {
   "@type" : "Place",
   "geo" : {
     "@type" : "GeoShape",
     "box" : "3.053,47.975 7.24,53.504"
   }
 },
 "isPartOf" : [ {
   "@type" : "Dataset",
   "url" : "http://www.ldproxy.net/bag/"
 }, {
   "@type" : "DataCatalog",
   "url" : "http://opendatacat.net/geonetwork-geo4web/doc/dataset/3a97fbe4-2b0d-4e9c-9644-276883400dd7"
 } ],
 "features" : [ {
   "@type" : "Place",
   "@id" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.1"
 }, ... ]
}

Some JSON objects have been omitted for better readability.

The representation does not yet contain any pagination relations - which it should if the JSON-LD is to be used by applications, but this has not been the focus of this research. In addition, there is no clear consensus on a vocabulary to support pagination in JSON-LD, but there are examples that could be used. One example is Hydra; a sketch follows below.
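
A minimal sketch of how pagination could be expressed with the Hydra core vocabulary (the page URLs follow the URI templates described earlier; this is an illustration, not part of the current implementation):

{
  "@context" : {
    "hydra" : "http://www.w3.org/ns/hydra/core#"
  },
  "@id" : "http://www.ldproxy.net/bag/inspireadressen/?page=2",
  "@type" : "hydra:PartialCollectionView",
  "hydra:first" : { "@id" : "http://www.ldproxy.net/bag/inspireadressen/?page=1" },
  "hydra:previous" : { "@id" : "http://www.ldproxy.net/bag/inspireadressen/?page=1" },
  "hydra:next" : { "@id" : "http://www.ldproxy.net/bag/inspireadressen/?page=3" }
}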

GML

The GML representation is the WFS feature collection of the first 25 features as returned by the WFS (and used to derive the other representations). It also includes links to previous and next pages, if applicable.

<wfs:FeatureCollection next="http://geodata.nationaalgeoregister.nl/inspireadressen/wfs?NAMESPACES=xmlns%28inspireadressen%2Chttp%3A%2F%2Finspireadressen.geonovum.nl%29&STARTINDEX=25&COUNT=25&VERSION=2.0.0&TYPENAMES=inspireadressen%3Ainspireadressen&OUTPUTFORMAT=application%2Fgml%2Bxml%3B+version%3D3.2&SERVICE=WFS&REQUEST=GetFeature" numberMatched="8802046" numberReturned="25">
  <wfs:member>
    <inspireadressen:inspireadressen gml:id="inspireadressen.1">
      <inspireadressen:straatnaam>Fivelkade W</inspireadressen:straatnaam>
      <inspireadressen:huisnummer>4</inspireadressen:huisnummer>
      <inspireadressen:woonplaats>Appingedam</inspireadressen:woonplaats>
      <inspireadressen:postcode>9901GE</inspireadressen:postcode>
      <inspireadressen:geom>
        <gml:Point srsDimension="2" srsName="urn:ogc:def:crs:EPSG::28992">
          <gml:pos>252466.7467750016 593751.8139905097</gml:pos>
        </gml:Point>
      </inspireadressen:geom>
    </inspireadressen:inspireadressen>
  </wfs:member>
  ...
</wfs:FeatureCollection>

Some XML elements and attributes have been omitted for better readability.

GeoJSON

The GeoJSON representation is a feature collection of the first 25 features. Coordinates in other coordinate reference systems, e.g. the Dutch RD system (EPSG:28992), have to be converted to WGS84 longitude/latitude.

{
 "type" : "FeatureCollection",
 "features" : [ {
   "type" : "Feature",
   "id" : "inspireadressen.1",
   "geometry" : {
     "type" : "Point",
     "coordinates" : [6.849895040224361 ,53.32130214986961 ]
   },
   "properties" : {
     "straatnaam" : "Fivelkade W",
     "huisnummer" : 4,
     "woonplaats" : "Appingedam",
     "postcode" : "9901GE"
   }
 }, ...

  ]
}

Some JSON objects have been omitted for better readability.

The GeoJSON representation does not include any pagination information (out of scope for GeoJSON).

Subsets of a WFS feature type

Subsets of a feature type are established by queries that select the subset. Otherwise, their representation is identical to the representations of a feature type.

In the HTML representation a subset can be selected by name. For example, in the BAG WFS a municipality can be selected.

Once a subset has been selected, the subset resource is made available in all representations.

WFS feature

The HTML page of a feature displays all information associated with the feature. The schema.org vocabulary is used as the basis, i.e., the mapping to schema.org will differ by feature type. The mappings for the address and the farmland features are discussed here (which also includes the JSON-LD representation). The HTML is annotated with the schema.org markup to improve indexing by search engines.

The GML and GeoJSON representations of a feature are identical to the feature in the respective feature collections on the feature type level.

CSW proxy

HTML

The HTML display of a CSW record is based on a very plain Bootstrap template. The HTML is annotated with schema.org microdata. A Leaflet map view is added which displays the bounds of the dataset; if available, a WMS GetMap request is added as a map overlay. Links are provided to alternative representations of the metadata in either XML (ISO 19139) or RDF/XML (GeoDCAT-ap). If available, a link to a WFS and a link to the same WFS via ldproxy are provided.

RDF/XML

When using an Accept header of application/rdf+xml or an explicit URL /dcat/{uuid}, a document encoded as RDF/XML will be returned by the service. The RDF/XML is structured as GeoDCAT-ap, using a transformation from ISO 19139 based on a modified version of the XSLT provided by the GeoDCAT-ap initiative.

JSON-LD

This encoding is currently not available; it can be implemented using RDFconvert (from the Sesame library) to transform from RDF/XML to JSON-LD.

Linking

Separate data sets can be linked together into one data set if they both rely on the same principles: they use the same data framework, in this case RDF, and the same data modeling fundamentals, in our case Semantic Web modeling principles. The RDF framework dictates that the data is recorded according to the triple sequence of subject, predicate, object. Semantic Web data modeling principles are based on semantic statements, implemented by logical rules that can be validated by dedicated tools such as the structured data validators of the respective schema.org sponsors.

Links are established via HTTP-resolvable identifiers in RDF triple statements (example: http://www.ldproxy.net/bag/inspireadressen/inspireadressen.1). When these addresses are resolved, the RDF data must be retrievable.

To improve discoverability, backward and forward linking is advised.

Semantic workspace

The image below shows which semantic constructions are valid in the context in which we are working. The direction of the arrows is mandatory. Links can only be made to or from the circles.

[Figure: schema-geo linking.png]

This shows us that, for instance, we could in principle create the following relations (links):

A schema:Dataset can be linked to a schema:Report ,used in Bekendmakingen, with the schema:citation property (implemented).

 

A schema:Report can be linked to a Place in the BAG or Farmland dataset using the property schema:contentLocation. In triples:

Subject = <http://the_Bekendmakingen_base_uri/the_Report_identifier>
Predicate = schema:contentLocation
Object = <http://the_BAG_base_uri/the_Place_identifier>

As JSON-LD:

  {"@context": "http://schema.org",

          "@id": "<http://the_Bekendmakingen_base_uri/the_Report_identifier> ",

               

          "contentLocation": {

            "@id": "<http://the_BAG_base_uri/the_Place_identifier>"

            }

  }

 

A Place recorded in the Farmland dataset can be linked to a PostalAddress in the BAG dataset using the schema:address property. In JSON-LD:

  {"@context": "http://schema.org",

          "@id": "<http://the_AAN_base_uri/the_Place_identifier> ",

              "address": {

            "@id": "<http://the_BAG_base_uri/the_PostalAddress_identifier>"

            }

  }

 

The linking possibilities are numerous, and as long as they follow the semantic workspace anything is possible. This means it is in principle also possible to link the Farmland Places with Bekendmakingen PostalAddresses, provided that all these resources have URIs and are not blank nodes.

The Report and Dataset resources are a special case, since they are both types of CreativeWork. That means that the properties citation and mainEntityOfPage work in both directions.

Link storage location

Linking datasets means creating triple statements that connect the source and the destination via a property. Where should these statements be stored? The answer to this question is a matter of data maintenance and link discovery. Technically it does not matter where these statements are located. In the Linked Data cloud there are examples of outgoing links, incoming links and so-called linksets: collections of triple statements that represent the connecting links. A linkset can be located anywhere in the Linked Data cloud. As long as the linkset can be discovered, the connection between the source and destination datasets can be established.

Establishing links

Where the planned use cases, in particular for the new environmental law, expect links between datasets, it will be valuable if the links between resources can be established automatically. Otherwise, creating and maintaining the expected links might become complicated.

In the case of the links from BAG resources to Bekendmakingen features, links are dynamically identified in the WFS proxy using SPARQL queries on the Bekendmakingen dataset. This ensures that the links included with the BAG data are always up-to-date.

Link maintenance

Tools have been developed (e.g. Semantic Pingback) to overcome problems with links. Link issues are not Linked Data issues, they are Web issues. With the Web of data we are at a new level, due to the fine-grained nature of the data and the number of new resources compared to the Web of documents, but the fundamentals are unchanged. We have dealt with broken links in the document Web for many years now, and we are used to it. It is the responsibility of the data provider to maintain, as much as possible, redirects for deprecated links. The user, or software, that downloaded the data must however understand that this copy is detached from the data web ecosystem and might become orphaned over time. After all, a broken link in the Web of data is no different from a broken link in the Web of documents.

Links from BAG to Bekendmakingen features

The Bekendmakingen features contain information that allows us to dynamically discover links from the municipality (woonplaats) and post code resources that are provided by the BAG WFS proxy.

When a BAG WFS proxy resource with a configuration for dynamically establishing links is being loaded, it will send a SPARQL query to the Bekendmakingen endpoint in the background. The configured query can contain substitution points for values from the proxy query or from the WFS response.

The query to establish the links from the municipality would look like this:

prefix schema: <http://schema.org/>

SELECT ?id ?title
WHERE {
  GRAPH <http://data.linkeddatafactory.nl/bekendmakingen/> {
    ?id schema:contentLocation [ schema:address [ schema:addressLocality "$$$municipality$$$" ] ] .
    ?id schema:headline ?title
  }
}

The substitution point “$$$municipality$$$” would then be replaced with the attribute value, e.g. “Valkenswaard”, and the response from the Bekendmakingen endpoint would look like this:

{ "head": { "link": [], "vars": ["id", "title"] },

"results": { "distinct": false, "ordered": true, "bindings": [

         { "id": { "type": "uri", "value": "http://data.linkeddatafactory.nl/bekendmakingen/1000" }        , "title": { "type": "literal", "xml:lang": "nl", "value": "Strodekkerwei thv 2" }},

         { "id": { "type": "uri", "value": "http://data.linkeddatafactory.nl/bekendmakingen/1010" }        , "title": { "type": "literal", "xml:lang": "nl", "value": "Tinnegieterwei thv 19" }},

         ...

] } }

The ids and titles are then parsed from the response and inserted into the BAG WFS proxy resource.

As a result, when we have a municipality or post code with associated public announcements, these are displayed. The schema:headline and the local id of the Bekendmakingen feature are used.

A click on a link leads to the announcement.

Links between the spatial data and metadata

First we will introduce how catalog records are typically linked to datasets served through services in an OGC / ISO/TC 211 / INSPIRE-based spatial data infrastructure. A single OGC service endpoint (WMS, WFS, ...) typically exposes one or more datasets. Each service offers a GetCapabilities operation which returns, amongst other things, the list of datasets available in that service. For each dataset, a link is provided to an ISO 19115 (profile on ISO 19139) metadata document describing that dataset (usually via a CSW GetRecordById operation). Other metadata elements describing the service are also available in the capabilities document (contact point, license, available operations, formats and projections). INSPIRE requires some additional metadata elements, but allows part of the metadata to be moved into an external ISO 19119 (profile on ISO 19139) document, to which a link should then be established from the capabilities document. Such an ISO 19119 document duplicates the links to all the datasets that are available in the service, with their corresponding metadata links. Besides the links based on metadata URLs, the INSPIRE technical guidelines suggest a supplementary linking mechanism: the dataset identifier and authority. Both the capabilities document and the metadata documents should contain this information.

Different INSPIRE member states have implemented different conventions around the authority/identifier linkage; hopefully a soon-to-be-released new version of the metadata technical guidelines will provide central guidance to align these conventions. From the perspective of this research, the proposal to concatenate the authority, a '#' or '/', and the dataset identifier into a dataset identification (URI) makes the most sense.

In the services that were used as input for this research we identified a number of issues with the authority/identifier linkage (Agrarisch Areaal is not an INSPIRE dataset; the inspireadressen WFS capabilities SpatialDataSetIdentifier does not match the metadata dataset identifier). We have therefore only used metadata identifiers and service endpoints to discover links between datasets, services and metadata.

We need to lift the existing links between the spatial dataset (e.g., the BAG WFS) and the metadata (e.g., the nationaalgeoregister.nl Catalog Service) from the “SDI layer” to our more ‘webby’ proxy layer that includes HTML representations with schema.org annotations.

I.e., links from the spatial data made available via the WFS proxy have to go to the Catalog Service proxy, not the Catalog Service itself - and the other way round. The XML provided by the Catalog Service would not be helpful to search engine crawlers or users. I.e., we want to link to the URL of the dataset metadata as it is made available by the proxy service for the Catalog Service, as that representation of the dataset metadata provides HTML with schema.org annotations and/or JSON-LD.

Let us take the link from a WFS capabilities document to a metadata record in the Catalog Service as an example. For the BAG WFS we find the URL http://www.nationaalgeoregister.nl/geonetwork/srv/dut/csw?service=CSW&version=2.0.2&request=GetRecordById&outputschema=http://www.isotc211.org/2005/gmd&elementsetname=full&id=3a97fbe4-2b0d-4e9c-9644-276883400dd7.

We use this URL to construct the URL of the corresponding dataset metadata resource in the Catalog Service proxy. This requires knowledge of how the URLs of the Catalog Service proxy are constructed. In this case, the UUID value of the id query parameter is extracted and used in the Catalog Service proxy URL, e.g. http://opendatacat.net/geonetwork-geo4web/doc/dataset/3a97fbe4-2b0d-4e9c-9644-276883400dd7.

If we detect that the metadata link in the WFS capabilities references a resource outside nationaalgeoregister.nl, a notification has to be sent to the CSW proxy to retrieve that resource before we can ask it to transform the record to the required schema/format.

Implementations

ldproxy

ldproxy is our implementation of the WFS proxy. It reuses two libraries that interactive instruments has previously developed for its XtraProxy for WFS product. These libraries have been published under an Apache license as part of our contributions to the testbed and are available on GitHub:

ldproxy extends these libraries with support for the WFS proxy capabilities described in this report. The software is also available on GitHub and published under the Apache license. interactive instruments will add sufficient documentation on GitHub so that interested parties can set up their own deployments of ldproxy.

The implementation supports

Several ideas for enhancements have been identified during the testbed. These are recorded as issues on GitHub.

In principle, such a proxy could be operated by a central SDI operator or individual data providers. An advantage of doing it “close” to the dataset owner would be the domain knowledge necessary for the appropriate schema.org mapping and for configuring other aspects like subsetting resources or how to establish links dynamically for known use cases and related datasets. In addition, the URIs could be in a domain owned by the dataset owner.

GeoNetwork

The technology used in this project to query, store and transform metadata is GeoNetwork opensource, an open source catalogue that is common in spatial data infrastructures at local to global scales.

GeoNetwork uses a generalised Lucene index to query metadata content structured according to a number of pluggable schemas (ontologies). The metadata itself is stored as XML blobs in a database. GeoNetwork currently has three technologies to transform documents between schemas and encodings: Groovy, XSLT and Jsonix. For this project we decided to use XSLT to convert XML according to ISO 19139 to microdata-annotated HTML, crawlable by the search engines.

The latest version of GeoNetwork has a new feature, called formatters, which provides an optimal framework for this research. Formatters are renderers based on Groovy or XSLT that can be plugged into schema plugins to provide specific rendering for any metadata profile. The output of formatters is cached on the server, so subsequent requests for the same resource are very fast. This is good for search engine optimisation, because search engines usually apply high penalties in search ranking to slow-responding websites.

The Dutch metadata profile implementation in nationaalgeoregister.nl (an instance of GeoNetwork) is split into two schema plugins: ISO 19115 and ISO 19119. For each of these profiles a separate schema.org formatter can be developed.

To validate the optimised crawlability of GeoNetwork we have created an experimental setup at http://opendatacat.net/geonetwork-geo4web. At that URL a modified instance of GeoNetwork has been installed. The sitemap service has been modified to facilitate search engine crawling and the HTML display has been optimised to include schema.org annotations. The instance has harvested the full content of nationaalgeoregister.nl. The sitemap has been registered with Google and Bing and a Google Analytics script has been added. From that point in time, the crawling and indexation status has been monitored in the Google and Bing webmaster tools and the Google Structured Data Testing Tool.

Validation and testing

Indexing by search engines

Search engines have similar approaches for registering content, which is then crawled by the search engine. Google has the widest set of tools to evaluate the crawlability of web pages. A short description of the tooling used is given below.

Submit a url

Search engines offer an option to submit a URL. The search engine will then start the crawl process by following links through that website.

Sitemap

By providing an XML sitemap containing links to each of the resources, the initial crawling process can be sped up. The sitemap format allows additional options, such as indicating the last modification date of a resource, so the crawler knows whether it has the latest version. CSW metadata generally has such a date, but that date is not available for WFS features.
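
A minimal sitemap entry for one of the catalogue records could look like this (the lastmod value is illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://opendatacat.net/geonetwork-geo4web/doc/dataset/3a97fbe4-2b0d-4e9c-9644-276883400dd7</loc>
    <!-- illustrative date; in practice this comes from the metadata record -->
    <lastmod>2016-01-15</lastmod>
  </url>
</urlset>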

A sitemap has a maximum size of 2500 web pages; the sitemap specification, however, supports the concept of pagination through sitemap index files. A previous version of the specification allowed a sitemap to contain a spatial extent for a resource, but this functionality is no longer operationally supported.

A sitemap will also help if a website uses dynamic content (AJAX) to display search results. Google is not able to crawl dynamic content, so it would not be able to navigate through the search results.

Robots.txt

This file is placed at the root of the website and can be used to instruct crawlers not to index certain parts of the website. First crawl results indicated that the crawlers also crawled the WFS XML content; this behaviour can be prevented in robots.txt.
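
A sketch of such a rule (the wildcard syntax is supported by Google; the format token f=xml is an assumption based on the proxy's query parameter):

# Keep crawlers away from the raw XML representations of proxy resources
User-agent: *
Disallow: /*?f=xml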

Crawl errors

Google offers a dashboard on which webmasters can monitor the crawling process: get an impression of how many pages are indexed, but also see on which pages broken links occur. Especially the catalogue pages showed a huge number of broken links, most of them related to bad links in the metadata.

Structured data

One of the options in Google's webmaster tools is to display the structured data identified in the crawled pages. This functionality seems quite strict: Google will indicate that it has not found any structured data if the structured data has a schema error (to be verified).

Google analytics

By adding Google Analytics, the idea was to enable user monitoring: which routes do users (including the search engines themselves) take through the website?

Content loaded with ajax

Search engines are not able to crawl content that is dynamically added to a website (AJAX). Modern websites that use a lot of AJAX usually duplicate the content into static web pages to facilitate the search engines. For example, most of the Dutch national georegistry is not crawled by Google because AJAX is used to display the search results and there is no static alternative. The same applies to maps which display vector features; these features are usually loaded with AJAX and are not harvested by search engines.

Tests with Yandex and Bing gave similar results to Google, although these engines are usually less capable of identifying the full schema.org annotations in the web pages.

Below we describe our experiences with search engines and the proxies.

Indexing results

We have registered the WFS proxies in the webmaster consoles of Google and Bing to monitor the indexing progress with two major search engines. The WFS proxy resources have been available since mid-January 2016. At the time of writing (early March 2016), only very few pages have been indexed. The BAG dataset has around 8 million features and the farmland dataset around 50,000 features, each provided in multiple representations.

Bing has crawled a total of 7 pages and indexed one. Google has indexed around 9000 pages, but only 45 structured data resources.

From the indexing we can see that Google indexes HTML and XML representations; JSON and JSON-LD are not indexed.

Interestingly, Google also "forgets" pages and structured data that have been indexed. All crawl results were reported as error-free. The reasons for dropping indexed results again, or how to avoid such drops, are not clear to us.

We noticed similar results for the crawling of the catalogue. Google claims to have crawled and indexed all 5500 catalogue records, but has only found 12 pages with structured data content. Bing crawled 185 pages (of 5500 submitted) and indexed 5. The diagram below shows a steep flat line, instead of the curvy line in the diagram above for ldproxy. This is related to the fact that all catalogue URLs were submitted at once, using a sitemap, whereas for ldproxy the crawler locates new links to crawl itself.

For ldproxy, no sitemaps are created for the spatial datasets, as the sheer number of features (pages) in many spatial datasets indicates that sitemaps, which are limited to 2500 entries per sitemap, are probably not the right mechanism. In addition, dynamically creating sitemaps for spatial datasets may also be a challenge and would involve crawling through the whole dataset via the WFS to create/update the sitemap.

Ranking in search results

We have tried to investigate whether, and to what extent, the annotations improve the ranking in search engine results. This turned out to be really difficult, at least within the (calendar) time available in the testbed, as most of the pages had not yet been indexed by search engines, as documented above. As a result, we do not have solid evidence, just some indications.

We have executed several searches to test whether data available via the proxies can be discovered through search engines. In all searches we use a different order of the search words than the order in the page name.

In the first search we use a street name, house number and woonplaats of a BAG feature for which the structured data has been indexed. In this search the BAG feature page is listed as the first hit.

From the response by Google we can also see that the address is identified as such and that the search engine manages information associated with the address. However, unlike with other types of data like events or reviews, information about places (schema:Place) is not supported by Google with rich snippets.

Using the recently published Google Knowledge Graph Search API we can also determine that our structured data is not added to the Google Knowledge Graph. The searches using the API only return curated content.

In the next search we use an address where the HTML page has been indexed by Google, but not the structured data. In this search the BAG feature page is listed as the second entry.

When we search for an address where the HTML page has not yet been indexed by Google, the BAG feature page is (unsurprisingly) not listed.

The same is the case in Bing:

 

The catalogue has fierce competition in the search rankings from other catalogues that harvest the same content, but have a longer history. With the search term "wederopbouwgebieden", for example, our page is not on the first pages of Google. Adding the term "dataset" shows the page at rank 6 (behind data.overheid.nl and europeandataportal).

Another interesting aspect is observed in image search. Search engines offer options to query only images. The search engines will not find any imagery from our test setup, since our metadata links to imagery that is hosted at the data provider or at nationaalgeoregister.nl. Our setup, however, does help in making that imagery available in the search engine index.

One of the original goals of the research was to explore whether a search engine can also be used to help answer more structured questions, e.g. "buildings in Valkenswaard built before 1960". As the structured data is not available via the Google Knowledge Graph Search API, and searching using spatial and temporal aspects does not seem to be supported in general, our conclusion is that currently the search engines can only be used to identify resources by searching via keywords.

Proxy performance

The performance of the WFS proxy is basically the same as the performance of the underlying WFS. The overhead for processing the WFS responses is minimal. The proxy of the farmland WFS (http://www.ldproxy.net/aan/) has significantly less latency in its responses than the BAG WFS proxy (http://www.ldproxy.net/bag/), which is a result of the higher latency of the BAG WFS.

The load times of pages not only impact the user experience; they may also affect how search engines crawl sites and how they rank results (see, for example, this blog post). The usability of the proxies will therefore directly depend on the performance of the underlying SDI components.

Data from the WFSs is not cached, with one exception. When defining browsable subsets, as with the municipalities and post codes in the BAG dataset, it is necessary to periodically determine and cache the list of named subsets in the proxy. It is not possible to determine the named subsets dynamically in a way that would support interactive use of the proxy by a user.
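For illustration, the periodic job could enumerate the values to be cached using the WFS 2.0 GetPropertyValue operation, assuming the WFS implements it (the endpoint URL is illustrative):

> curl "http://example.com/wfs?service=WFS&version=2.0.0&request=GetPropertyValue&typeNames=inspireadressen&valueReference=woonplaats"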

The CSW proxy uses an HTML cache provided by the Formatter framework. The benefit is that the proxy is very fast on subsequent requests. The initial request has similar performance challenges as the WFS proxy.

Understandability by non-experts

The HTML pages have been designed to use layouts that users are familiar with. Screenshots of the layouts are included in this report.

One aspect of the HTML representation that could be improved is the use of human-readable labels where type and attribute names are currently shown. For example, we could write “woonplaats” / “municipality” instead of the schema.org property name “addressLocality”.

We also have to be aware that users will judge each proxied WFS feature or CSW record as they would any other website. It is important to add a statement on each page indicating the purpose and origin of the data represented on that page. A related aspect is that for some datasets, such as a network topology or a heatmap, presenting individual records as web pages does not make much sense; the added value of the set lies in the combined representation of multiple records.

Interoperability with the Linked Data Cloud

The Linked Data Cloud is built on the principles originally formulated by Tim Berners-Lee. One can test whether this all works as expected by executing a series of tests.

The requirements from Tim Berners-Lee are:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.

#1 and #2 are self-explanatory and do not need any tests. #3 is somewhat cryptic, and #4 is mostly understood but also sometimes implemented incorrectly.

How #3 can be implemented is described, amongst others, in the Data on the Web Best Practices. There are some tests you can perform to check whether everything works correctly; these are described in the next section.

#4 might sound simple; however, there are two important requirements:

  a. the link URLs must dereference to RDF, otherwise the data cannot be queried;
  b. the semantics of the link must validate correctly: the expected value of a property must comply with the specification. For example, the value of schema:addressLocality must be of type Text and cannot be an instance of a class, which automatically implies that you cannot link from schema:addressLocality to another resource.

Test method

To test #3, you can use a tool that can make HTTP requests, such as cURL. With cURL you can test whether a resource responds correctly to content negotiation for RDF. The returned media type should be application/ld+json or application/rdf+xml (or any other RDF serialisation, but others are not used in this testbed).
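For example, the following cURL command prints only the Content-Type returned for an RDF Accept header, using one of the BAG resources from this testbed; if content negotiation works, the output should be application/ld+json:

> curl -s -o /dev/null -w "%{content_type}\n" -H "Accept: application/ld+json" http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079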

To test #4 a) properly, one must install a SPARQL client that supports querying a web resource via its persistent URI. Jena/arq is such a tool; it runs from the command line with the following syntax:

> arq --data <the resource URL> --query <a local file with a query statement>

The content of a simple query file looks like this:

SELECT * WHERE { ?s ?p ?o } LIMIT 20

The response should be a table of up to 20 rows with the triples belonging to that resource:

> arq --data http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079 --query general_query.rq

Requirement #4 b) has been verified by human inspection only, simply by looking at the schema definitions. There are tools that can do this automatically, but since this is not a complex schema we have not used them.

The same works for the catalogue proxy URLs:

> arq --data http://opendatacat.net/geonetwork-geo4web/dcat/88a3c1ae-902e-4efa-8663-ddd836101929 --query general_query.rq

This query can easily be extended by including (for instance) the DBpedia endpoint:

PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?postaladdress ?dbp ?dbppredicate ?dbpobject WHERE {
  ?postaladdress schema:addressLocality ?locality .
  BIND (strlang(?locality, "nl") AS ?localitylang) .
  SERVICE <http://nl.dbpedia.org/sparql> {
    SELECT * WHERE {
      ?dbp rdfs:label ?localitylang .
      ?dbp ?dbppredicate ?dbpobject .
    }
  }
}

The result is a combination of the BAG address data and DBpedia data.
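Saved to a file, the extended query runs in the same way as the earlier ones; a sketch, where the query file name is ours:

> arq --data http://www.ldproxy.net/bag/inspireadressen/inspireadressen.23079 --query bag_dbpedia.rq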

Observations related to schema.org and its use by search engines

This section summarizes observations from our use of the schema.org vocabulary and how search engines support the vocabulary.

Schema.org geo extension

During the mapping of the datasets to schema.org we noticed that the schema.org geo classes and properties are not optimally modelled. Details can be found in the text below. During the testbed we have developed a straw man for an alternative geo extension for schema.org. The main idea behind this extension is to allow for a simple geo model at the highest level of the schema vocabulary (i.e. schema:Thing). Discussions about how to proceed with this extension are ongoing.

Schema.org validation and Linked Data principles

The correct use of schema.org can be checked with (amongst others) Google’s Structured Data Testing Tool. Using this tool shows us that schema.org does not comply with some of the Linked Data principles.

What we noticed in our research about how schema.org differs from Linked Data principles is summarised in this chapter.

Dereferenceability

The schema.org resource URLs do not dereference to RDF. This can be tested by submitting an arq request to one of the schema.org URLs (see above for detailed instructions); a sketch using one of the property URLs:
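> arq --data http://schema.org/addressLocality --query general_query.rq

This sketch reuses the general_query.rq file from above; since schema.org does not return an RDF serialisation, no triples are returned.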

Multiple Types

Having more than one type for a resource is a common mechanism in Linked Data.

The Google validator does not allow a resource to have more than one type (although insiders tell us the crawler will simply ignore annotations it does not recognise). For example, the following JSON-LD, which assigns two types to the same resource, is not accepted:

[
  {
    "@context": "http://schema.org",
    "@type": "Organization",
    "@id": "http://example.com/1234"
  },
  {
    "@context": "http://schema.org",
    "@type": "WebSite",
    "@id": "http://example.com/1234"
  }
]

If you remove the @id tags and use a “url” tag instead, the following happens:

[Screenshot “typesvalidator.png”: output of the Google structured data validator]

Note: the “url” tag does not represent the identifier of the resource; that is the role of the @id tag. In the Google validator an identifier is shown in italics, every other value in normal text, and labels in bold face.

Usually the JSON-LD markup in websites does not contain any @id tags. According to the Linked Data principles this means that a blank node is created. Although this is allowed, the result is that the resource cannot be used for external linking, since its identifier is not known to the outside world.

Linking principles

In the Google validator it is not allowed to use owl:sameAs to specify similarity between resources, as can be seen in the screenshot below.

[Screenshot “sameAs mag niet.png” (“sameAs not allowed”): the Google validator rejecting owl:sameAs]

So linking between schema.org classes must happen by means of other properties that are members of the schema.org vocabulary. Moreover, the classes that we link to must also be members of the schema.org vocabulary:

[Screenshot “typesvalidator2.png”: the Google validator rejecting a link to a class outside the schema.org vocabulary]

We have asked the schema.org developers for feedback on these findings; based on their response we conclude that schema.org allows the use of other vocabularies, but these are not used by the search engines.

Experiences with existing WFS deployments

This section documents common issues that we have seen in the past using OGC Web Feature Services. We focus on WFS version 2.0 as it is the version referenced from the INSPIRE technical guidelines for download services.

First of all, before publishing a WFS 2.0 service, it should be validated using the OGC Conformance Test Suite for WFS 2.0 against all conformance classes that the WFS deployment is implementing.

For a WFS that also implements the INSPIRE technical guidance for pre-defined datasets or direct access (or the Dutch profiles), Geonovum provides a validator.

A key issue that often blocks the use of a WFS in a client or library is XML Schema problems. Valid XML is often a pre-condition, and failure to conform makes the WFS unusable for many clients. This is also the case with ldproxy, which uses Java libraries that only accept valid XML.
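A quick way to reproduce such problems outside of a full client is to fetch the application schema and a sample response and validate them locally; a sketch using cURL and xmllint, where the endpoint URL and feature type name are illustrative and the schema's imports must be resolvable:

> curl -s "http://example.com/wfs?service=WFS&version=2.0.0&request=DescribeFeatureType" -o application.xsd
> curl -s "http://example.com/wfs?service=WFS&version=2.0.0&request=GetFeature&typeNames=ex:mytype&count=10" -o response.xml
> xmllint --noout --schema application.xsd response.xml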

Some typical problems related to XML Schemas that we have seen with WFSs:

Other typical problems are:

In addition to these issues, some WFSs seem to have been tested only to a limited extent, and requests beyond basic ones may lead to errors and exceptions. Using the validators listed above should help to identify many such issues.

Experiences with existing metadata

As stated in various places in this document, we have identified multiple broken links between metadata and services, as well as inconsistencies between the capabilities and the catalogue content. There were also other observations related to metadata content that surfaced once the search engine crawlers started crawling the content and reported multiple errors in the crawled web pages. These errors are typically caused by bad input in the metadata. Having these types of errors in crawled content is not optimal for the search ranking. It would be good to point metadata providers to these issues and/or improve the software so that it can fix common mistakes in metadata while transforming to schema.org.

A listing of common mistakes:

Links in metadata missing the http:// prefix.

The crawler will then try to resolve the link locally, e.g. as http://.../geonetwork/www.rijkswaterstaat.nl.

Illegal modification date

For example, a modification date of ‘1969-01-01’ will cause crawling of the sitemap to fail; search engines use this field to determine whether the resource should be revisited.

Missing thumbnails

A link to a thumbnail that does not exist causes a broken-link error.

Use of illegal characters in a URL

Some data providers use a UUID enclosed in {...}. GeoNetwork uses these characters in a URL without percent-encoding them (e.g. { should be encoded as %7B and } as %7D), which causes issues in some HTTP clients.

Open challenges

This section summarizes challenges that we have been facing in the testbed and that cannot be addressed or resolved as part of our work.

Challenges related to the existing infrastructure

Support for establishing links dynamically

Existing datasets are sometimes structured in ways that make it difficult to dynamically establish links between resources (features) in the datasets.

For example, the Bekendmakingen dataset contains an entry that a human will understand as linking to the addresses Dragonder 48 and Dragonder 50 in Valkenswaard. However, this information is part of text attributes, and there is no reliable query that would allow the links to be established dynamically, for example in the WFS proxy. The result is that we can link to the public announcement only from the Valkenswaard or postcode resource, but not from the more specific resources of the two addresses.

In fact, the schema used by the Bekendmakingen dataset supports structured address information, but these fields are always left empty; street name and house number information is only provided as part of textual descriptions.

For setting up the data infrastructure supporting the new environmental law, it seems important that data that needs to be linked supports dynamic link establishment, where feasible. This may require changes to the schemas of the datasets and/or to the data capturing processes. In the example of the Bekendmakingen dataset, it would be helpful if announcements related to an address were captured with structured address information.

Metadata today is often incomplete and inconsistent

We found that the metadata provided in the WFS capabilities and the dataset metadata in the catalogue are incomplete, inconsistent with each other, or both.

Just a few examples:

Metadata element           | Catalog                | WFS
---------------------------|------------------------|------------------------
Bounding box (West)        | 3.37087°               | 3.053°
Bounding box (East)        | 7.21097°               | 7.24°
Bounding box (South)       | 50.7539°               | 47.975°
Bounding box (North)       | 53.4658°               | 53.504°
Contact, individual name   | Beheer PDOK            | KlantContactCenter PDOK
Contact, organisation name | Beheer PDOK            | -
Contact, phone             | -                      | -
Contact, email             | beheerPDOK@kadaster.nl | -
Contact, address           | -                      | Apeldoorn, Nederland
Contact, role              | pointOfContact         | pointOfContact

As spatial data infrastructures separate metadata from data, the metadata will often be collected and maintained through separate processes. This has to be taken into account when planning how to support the new environmental law use cases.

Performance & data compactness

The response time and size of resources on the Web may have an impact on the indexing and ranking by search engines, as well as on the usability in general. This has to be taken into account when publishing spatial data on the Web. Where proxies on top of the existing infrastructure are used, the capabilities of the existing SDI components have to be analysed to determine whether they meet expectations.

The report of research topic 3 (“crawlable spatial data”) has analysed this issue in more detail. See their report for additional information.

Challenges related to the Web

Search engines are a black box

As described in more detail earlier in this report, search engines are largely a “black box”. It is not clear, at least to us, whether and how structured spatial data will be used and how users can use it to improve finding data they are interested in. Therefore, it is not entirely clear whether there are better ways (URI strategies, data formats, use of vocabularies, use of RDFa vs microformats vs JSON-LD, etc.) to make spatial data ‘crawlable’.

Again, the report of research topic 3 (“crawlable spatial data”) has additional information.

Content negotiation based on media types insufficient

The HTTP content negotiation mechanism has its limits: it supports only selection based on the data format (media type), but it cannot select between different schemas (e.g. schema.org for the search engines vs DCAT for governmental data portals).

The approach that we have taken, and which mostly solves the issue for us, is to simply provide separate URIs for the different schemas / user communities.
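For example, the catalogue proxy exposes the same metadata record under two different URIs, both of which appear elsewhere in this report; a sketch:

> curl http://opendatacat.net/geonetwork-geo4web?uuid=88a3c1ae-902e-4efa-8663-ddd836101929
> curl http://opendatacat.net/geonetwork-geo4web/dcat/88a3c1ae-902e-4efa-8663-ddd836101929

The first URI returns the HTML page with schema.org annotations for the search engines; the second returns the GeoDCAT-ap RDF for data portals.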

Links

In this section we summarize links to all resources that resulted from our work on research topic 4, “Spatial Data on the Web using the current SDI”. These resources will remain available for up to half a year after the end of the research, but may be subject to change afterwards.

Deployments

ldproxy for BAG WFS

http://www.ldproxy.net/bag/ 

ldproxy for Farmland WFS

http://www.ldproxy.net/aan/ 

GeoNetwork for NSDI CSW

http://opendatacat.net/ 

BaseURI for Bekendmakingen dataset

http://data.linkeddatafactory.nl/bekendmakingen/ 

BaseURI for Bekendmakingen Spatial dataset

http://data.linkeddatafactory.nl/bekendmakingenspatial/

SPARQL endpoint

http://data.linkeddatafactory.nl:8890/sparql 

use either “http://data.linkeddatafactory.nl/bekendmakingenspatial/” or “http://data.linkeddatafactory.nl/bekendmakingen” as the “Default Data Set Name (Graph IRI)”
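The endpoint can also be queried directly over HTTP; a sketch using cURL and the standard SPARQL protocol parameters:

> curl -G "http://data.linkeddatafactory.nl:8890/sparql" --data-urlencode "default-graph-uri=http://data.linkeddatafactory.nl/bekendmakingen" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10"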

Source code repositories

ldproxy

https://github.com/interactive-instruments/ldproxy

https://github.com/interactive-instruments/XtraPlatform-SDI-Tools

https://github.com/interactive-instruments/XtraPlatform

GeoNetwork

https://github.com/geo4web-testbed/core-geonetwork
https://github.com/geonetwork/core-geonetwork
https://github.com/metadata101/iso19139.nl.geografie.1.3.1 

Discussions and other resources

Geonovum website

http://www.geonovum.nl/onderwerp-artikel/testbed-locatie-data-het-web 

Our proposal

https://github.com/geo4web-testbed/topic4-general/blob/master/Proposal_topic_4_ii_GeoCat_LDF.pdf

General discussions and presentations

https://github.com/geo4web-testbed/topic4-general 

schema.org mapping (BAG features)

https://github.com/geo4web-testbed/simpleBAG-to-schemaorg 

schema.org mapping (farmland features)

https://github.com/geo4web-testbed/aan_to_schema_org 

Identification and processing of the Bekendmakingen dataset

https://github.com/geo4web-testbed/topic4-task2 

schema.org geo-extension

https://github.com/geo4web-testbed/geo-extension-to-schemaorg 

ISO 19139 to GeoDCAT-ap mapping

https://github.com/geo4web-testbed/iso-19139-to-dcat-ap 

ldproxy design

https://github.com/geo4web-testbed/ldproxy-design 

ldproxy development

https://github.com/interactive-instruments/ldproxy/issues 

Research topic 2 report: Spatial Data Platform

To be added when available

Research topic 3 report: Crawlable Spatial Data

To be added when available

Appendix A

XML Schema: Farmland data

<?xml version="1.0" encoding="utf-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:aan="http://aan.geonovum.nl" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:wfs="http://www.opengis.net/wfs/2.0" elementFormDefault="qualified" targetNamespace="http://aan.geonovum.nl">
 <xsd:import namespace="http://www.opengis.net/gml/3.2" schemaLocation="http://geodata.nationaalgeoregister.nl/schemas/gml/3.2.1/gml.xsd"/>
 <xsd:complexType name="aanType">
   <xsd:complexContent>
     <xsd:extension base="gml:AbstractFeatureType">
       <xsd:sequence>
         <xsd:element maxOccurs="1" minOccurs="0" name="objectid" nillable="true" type="xsd:int"/>
         <xsd:element maxOccurs="1" minOccurs="0" name="aanid" nillable="true" type="xsd:int"/>
         <xsd:element maxOccurs="1" minOccurs="0" name="versiebron" nillable="true" type="xsd:string"/>
         <xsd:element maxOccurs="1" minOccurs="0" name="type" nillable="true" type="xsd:string"/>
         <xsd:element maxOccurs="1" minOccurs="0" name="geom" nillable="true" type="gml:MultiSurfacePropertyType"/>
       </xsd:sequence>
     </xsd:extension>
   </xsd:complexContent>
 </xsd:complexType>
 <xsd:element name="aan" substitutionGroup="gml:AbstractFeature" type="aan:aanType"/>
</xsd:schema>

XML Schema: Simple BAG data

<xsd:complexType name="inspireadressenType">
  <xsd:complexContent>
    <xsd:extension base="gml:AbstractFeatureType">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="0" name="straatnaam" nillable="true" type="xsd:string"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="huisnummer" nillable="true" type="xsd:int"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="huisletter" nillable="true" type="xsd:string"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="toevoeging" nillable="true" type="xsd:string"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="woonplaats" nillable="true" type="xsd:string"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="postcode" nillable="true" type="xsd:string"/>
        <xsd:element maxOccurs="1" minOccurs="0" name="geom" nillable="true" type="gml:PointPropertyType"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
<xsd:element name="inspireadressen" substitutionGroup="gml:AbstractFeature" type="inspireadressen:inspireadressenType"/>

Appendix B - Crosswalk: Spatial Data on the Web best practices in the WFS proxy

A joint working group of W3C and OGC on “Spatial Data on the Web” is working on a best practices document. The first stable version is a First Public Working Draft (https://www.w3.org/TR/2016/WD-sdw-bp-20160119/), the latest draft version is the Editor’s Draft (http://w3c.github.io/sdw/bp/).

This annex uses the Editor’s Draft dated 02/03/2016 and lists, for each best practice, if and how the WFS proxy implementation follows it.

Best Practice 1: Use globally unique HTTP identifiers for entity-level resources - entities within datasets should have unique, persistent HTTP or HTTPS URIs as identifiers

Best Practice 3: Convert or map dataset-scoped identifiers to URIs - find (or create) URIs for resources in spatial datasets that have locally-scoped identifiers

Implemented. For example, http://www.ldproxy.net/bag/inspireadressen/inspireadressen.3456789. This URI is stable and persistent. It will only change if the address is no longer part of the dataset (assuming that the WFS implementation conforms to the rules and does not change the local gml:id identifiers of features).

Best Practice 2: Reuse existing (authoritative) identifiers when available - avoiding the creation of lots of identifiers for the same resource

Not applicable. There are no (authoritative) URIs that we are aware of for the features in the datasets.

Best Practice 4: Provide stable identifiers for Things (resources) that change over time - even though resources change, it helps when they have a stable, unchanging identifier

Not implementable. We do not know how the data providers handle changes. It is, for example, entirely possible that the data provider of the Farmland dataset would consider a 2017 release a new dataset, with new identifiers.

Best Practice 5: Provide identifiers for parts of larger information resources - identify subsets of large information resources that are a convenient size for applications to work with

Implemented. For the BAG dataset, there are subsets for each woonplaats (municipality) and post code.

Best Practice 6: Provide a minimum set of information for your intended application - when someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats

Implemented, at least to the extent possible. All information in the dataset is provided, and it is possible to configure the proxy to map information to well-known properties, e.g. in the schema.org vocabulary.

Best Practice 7: How to describe geometry - geometry data should be expressed in a way that allows its publication and use on the Web

Implemented. We use schema.org GeoCoordinates and GeoShape in HTML annotations and JSON-LD, since we are targeting search engine crawlers. In addition, we provide GeoJSON and GML representations of the features for clients that prefer those representations. Content negotiation is supported.

Best Practice 8: Specify Coordinate Reference System for high-precision applications - coordinate referencing system should be specified for high-precision applications to locate geospatial entities

HTML/schema.org: http://www.ldproxy.net/aan/aan/28  

JSON-LD/schema.org: http://www.ldproxy.net/aan/aan/aan.28/?f=jsonld   

GeoJSON: http://www.ldproxy.net/aan/aan/aan.28/?f=json  

GML: http://www.ldproxy.net/aan/aan/aan.28/?f=xml  

Implemented. As schema.org and GeoJSON require WGS84, this is what is used for these representations. The GML representation, however, uses the Dutch CRS of the source data.

Best Practice 9: How to describe relative positions - provide a relative positioning capability in which the entities can be linked to a specific position

Not applicable. This might be relevant for other data linking to the BAG data, but it is not relevant for the two datasets accessible via the proxy.

Best Practice 10: How to describe positional (in)accuracy - accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form

Not implementable. We have no information about the accuracy. In principle, if the dataset metadata linked from the WFS capabilities provided this information in a consistent way, this could be implemented, too; in this case, the dataset metadata did not provide it.

Best Practice 11: How to describe properties that change over time - entities and their data should have versioning with time/location references

Not applicable. The owners of the dataset did not include any such information in the data. If it were included, this information would also be exposed via the proxy, in the same way it is included in the source dataset.

Best Practice 12: Use spatial semantics for Spatial Things - the best vocabulary should be chosen to describe the available SpatialThings

Implemented. We use schema.org, since the key use case is indexing by search engine crawlers.

Best Practice 13: Assert known relationships - spatial relationships between Things should be specified in forms of geographical, topological and hierarchical links

Partly implemented. Relationships that could be identified dynamically between address data and public announcements are established and included in the representations, where possible. If the data included street address information in a structured, reliable way, links at the address level could be included, too.

Best Practice 14: Provide context required to interpret observation data values

Best Practice 15: Describe sensor data processing workflows

Best Practice 16: Relate observation data to the real world

Best Practice 18: How to publish (and consume) sensor data streams

Not applicable. There is no observation data involved.

That said, the Farmland data seems to contain information about the year in which the data was captured, but there is no information on where more details about that data capturing process can be found.

Best Practice 17: How to work with crowd-sourced observations

Not applicable. There is no crowd-sourced data involved.

Best Practice 19: Make your entity-level links visible on the web - the data should be published with explicit links to other resources

Best Practice 21: Link to spatial Things - create durable links by connecting Spatial Things

Best Practice 23: Link to related resources - link your spatial resources to other related resources

Implemented. Through content negotiation and multiple formats (HTML, JSON-LD, GeoJSON, GML), both humans and machines are supported. Links have been added between datasets and to the catalogue.

See also Best Practice 13.

Best Practice 20: Provide meaningful links - when providing a link, a data publisher should opt for a level of formal and meaningful semantics that helps data consumers to decide if the target resource is relevant to them

Implemented. We use schema.org (or other vocabularies) to express the semantics.

Best Practice 22: Link to resources with well-known or authoritative identifiers - link your spatial resources to others that are commonly used

Not implemented. Due to time/resource constraints we have not addressed this, as this was not the focus of the research.

Best Practice 24: Use links to find related data - related data to a spatial dataset and its individual data items should be discoverable by browsing the links

Partly implemented. This has been implemented only to a limited extent with the links to the catalogue information and Bekendmakingen features.

Best Practice 25: Make your entity-level data indexable by search engines - search engines should receive a metadata response to a HTTP GET when dereferencing the link target URI

Implemented. See https://www.google.de/search?q=site:ldproxy.net.

Note that from the experience so far, it is unclear to what extent the structured data included in the annotations has any effect.

Best Practice 26: Include spatial information in dataset metadata - the description of datasets that have spatial features should include explicit metadata about the spatial information

Implemented. While this is mainly addressed via links to the CSW proxy, the WFS capabilities include bounding box information, and this is included in the schema.org Datasets, i.e. it is provided on the WFS landing page (http://www.ldproxy.net/aan/) and for each feature type (http://www.ldproxy.net/aan/aan/).

Best Practice 27: Publish data at the granularity you can support - granularity of mechanisms provided to access a dataset should be decided based on available resources

Implemented. Since the WFS provides feature data and a proxy can support that granularity, the proxy provides feature-level granularity. In addition, named subsets have been introduced that are not directly available via the WFS (see Best Practice 5).

Best Practice 28: Expose entity-level data through 'convenience APIs' - if you have a specific application in mind for publishing your data, tailor the spatial data API to meet that goal

Best Practice 30: Include search capability in your data access API - if you publish an API to access your data, make sure it allows users to search for specific data

Implemented. Since the main use case is supporting search engine crawlers, the API is quite restricted. However, it provides basic selection capabilities in addition to links to each feature.

Best Practice 29: APIs should be self-describing - APIs should provide a discoverable description of their contents and how to interact with it; ideally this should be a machine-readable description

Considered in the design. The idea is to automatically generate Swagger API descriptions and provide those. We will not address this in the testbed due to time and resource constraints. There are also open questions (Swagger uses something similar to JSON Schema, but we want to support JSON-LD). See https://github.com/interactive-instruments/ldproxy/issues/9.

Appendix C: Example of an ISO 19139 document transformed to GeoDCAT-ap

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <dcat:CatalogRecord xmlns:dcat="http://www.w3.org/ns/dcat#"                     rdf:about="http://opendatacat.net/geonetwork-geo4web/metadata/88a3c1ae-902e-4efa-8663-ddd836101929">

      <foaf:primaryTopic xmlns:foaf="http://xmlns.com/foaf/0.1/"

rdf:resource="http://opendatacat.net/geonetwork-geo4web/resource/f7d22d90-7651-11e0-a1f0-0800200c9a63"/>

      <dct:issued xmlns:dct="http://purl.org/dc/terms/"/>

      <dct:modified xmlns:dct="http://purl.org/dc/terms/"/>

      <dct:references xmlns:dct="http://purl.org/dc/terms/">

         <rdf:Description rdf:about="http://opendatacat.net/geonetwork-geo4web/srv/eng/xml.metadata.get?uuid=88a3c1ae-902e-4efa-8663-ddd836101929">

            <dct:format>

               <dct:IMT>

                  <rdf:value>application/xml</rdf:value>

                  <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">XML</rdfs:label>

               </dct:IMT>

            </dct:format>

         </rdf:Description>

      </dct:references>

      <dct:references xmlns:dct="http://purl.org/dc/terms/">

         <rdf:Description rdf:about="http://opendatacat.net/geonetwork-geo4web?uuid=88a3c1ae-902e-4efa-8663-ddd836101929">

            <dct:format>

               <dct:IMT>

                  <rdf:value>text/html</rdf:value>

                  <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">HTML</rdfs:label>

               </dct:IMT>

            </dct:format>

         </rdf:Description>

      </dct:references>

   </dcat:CatalogRecord>

   <dcat:Dataset xmlns:dcat="http://www.w3.org/ns/dcat#"

rdf:about="http://opendatacat.net/geonetwork-geo4web/resource/f7d22d90-7651-11e0-a1f0-0800200c9a63">

      <dct:identifier xmlns:dct="http://purl.org/dc/terms/">f7d22d90-7651-11e0-a1f0-0800200c9a63</dct:identifier>

      <dct:title xmlns:dct="http://purl.org/dc/terms/">Werelderfgoed</dct:title>

      <dct:abstract xmlns:dct="http://purl.org/dc/terms/">Begrenzingen van het Nederlandse cultureel erfgoed dat vanwege haar uitzonderlijke universele waarde door het Werelderfgoed comite van UNESCO is geplaatst op de Werelderfgoedlijst. De begrenzingen zijn gebaseerd op de kaarten die destijds opgenomen waren in het nominatiedossier dat aan Unesco is aangeboden.</dct:abstract>

      <dcat:keyword>werelderfgoed</dcat:keyword>

      <dcat:keyword>UNESCO</dcat:keyword>

      <dcat:theme rdf:resource="http://opendatacat.net/geonetwork-geo4web/thesaurus/GEMET%20-%20INSPIRE%20themes%2C%20version%201.0/Beschermde gebieden"/>

      <dcat:theme rdf:resource="http://opendatacat.net/geonetwork-geo4web/thesaurus/iso/topicCategory/structure"/>

      <foaf:thumbnail xmlns:foaf="http://xmlns.com/foaf/0.1/"

rdf:resource="http://services.rce.geovoorziening.nl/www/PreviewWerelderfgoedKlein200x150.png"/>

      <foaf:thumbnail xmlns:foaf="http://xmlns.com/foaf/0.1/"

                      rdf:resource="http://services.rce.geovoorziening.nl/www/PreviewWerelderfgoedGroot800x600.png"/>

      <dct:spatial xmlns:dct="http://purl.org/dc/terms/">

         <ogc:Polygon xmlns:ogc="http://www.opengis.net/rdf#">

            <ogc:asWKT rdf:datatype="http://www.opengis.net/rdf#WKTLiteral">

            &lt;http://www.opengis.net/def/crs/OGC/1.3/CRS84&gt;

            Polygon((3.4 50.8, 3.4 53.6, 7.2 53.6, 7.2 50.8, 3.4 50.8))

          </ogc:asWKT>

         </ogc:Polygon>

      </dct:spatial>

      <dct:temporal xmlns:dct="http://purl.org/dc/terms/"/>

      <dct:publisher xmlns:dct="http://purl.org/dc/terms/"

rdf:resource="http://opendatacat.net/geonetwork-geo4web/organization/Rijksdienst%20voor%20het%20Cultureel%20Erfgoed"/>

      <dct:accrualPeriodicity xmlns:dct="http://purl.org/dc/terms/">notPlanned</dct:accrualPeriodicity>

      <dcat:granularity>10000</dcat:granularity>

      <dct:language xmlns:dct="http://purl.org/dc/terms/">dut</dct:language>

      <dct:license xmlns:dct="http://purl.org/dc/terms/">otherRestrictions</dct:license>

      <dct:license xmlns:dct="http://purl.org/dc/terms/">Geo Gedeeld licentie</dct:license>

      <dct:license xmlns:dct="http://purl.org/dc/terms/">http://services.rce.geovoorziening.nl/www/GeoGedeeld_WereldErfgoed_20111101.pdf</dct:license>

      <dcat:distribution rdf:resource="http://services.rce.geovoorziening.nl/rce/wms"/>

      <dcat:distribution rdf:resource="http://services.rce.geovoorziening.nl/rce/wfs"/>

      <dcat:distribution rdf:resource="http://services.rce.geovoorziening.nl/www/download/data/Unesco_Werelderfgoed_nl.xml"/>

      <dcat:dataQuality>De begrenzingen zijn door de Rijksdienst voor het Cultureel Erfgoed gedigitaliseerd op basis van TOP10vector, GBKN en topografische kaarten 1:25.000.</dcat:dataQuality>

   </dcat:Dataset>

   <skos:Concept xmlns:skos="http://www.w3.org/2004/02/skos/core#"

rdf:about="http://opendatacat.net/geonetwork-geo4web/thesaurus/GEMET%20-%20INSPIRE%20themes%2C%20version%201.0/Beschermde%20gebieden">

      <skos:inScheme rdf:resource="http://opendatacat.net/geonetwork-geo4web/thesaurus/GEMET%20-%20INSPIRE%20themes%2C%20version%201.0"/>

      <skos:prefLabel>Beschermde gebieden</skos:prefLabel>

   </skos:Concept>

   <dcat:Distribution xmlns:dcat="http://www.w3.org/ns/dcat#"

                      rdf:about="http://services.rce.geovoorziening.nl/rce/wms">

      <dcat:accessURL>http://services.rce.geovoorziening.nl/rce/wms</dcat:accessURL>

      <dct:title xmlns:dct="http://purl.org/dc/terms/">WorldHeritage</dct:title>

      <dct:format xmlns:dct="http://purl.org/dc/terms/">

         <dct:IMT>

            <rdf:value>OGC:WMS</rdf:value>

            <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">OGC:WMS</rdfs:label>

         </dct:IMT>

      </dct:format>

   </dcat:Distribution>

   <dcat:Distribution xmlns:dcat="http://www.w3.org/ns/dcat#"

                      rdf:about="http://services.rce.geovoorziening.nl/rce/wfs">

      <dcat:accessURL>http://services.rce.geovoorziening.nl/rce/wfs</dcat:accessURL>

      <dct:title xmlns:dct="http://purl.org/dc/terms/">WorldHeritage</dct:title>

      <dct:format xmlns:dct="http://purl.org/dc/terms/">

         <dct:IMT>

            <rdf:value>OGC:WFS</rdf:value>

            <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">OGC:WFS</rdfs:label>

         </dct:IMT>

      </dct:format>

   </dcat:Distribution>

   <dcat:Distribution xmlns:dcat="http://www.w3.org/ns/dcat#"

rdf:about="http://services.rce.geovoorziening.nl/www/download/data/Unesco_Werelderfgoed_nl.xml">

<dcat:accessURL>http://services.rce.geovoorziening.nl/www/download/data/Unesco_Werelderfgoed_nl.xml</dcat:accessURL>

      <dct:title xmlns:dct="http://purl.org/dc/terms/">Atom feed WorldHeritage</dct:title>

      <dct:format xmlns:dct="http://purl.org/dc/terms/">

         <dct:IMT>

            <rdf:value>INSPIRE Atom</rdf:value>

            <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">INSPIRE Atom</rdfs:label>

         </dct:IMT>

      </dct:format>

   </dcat:Distribution>

   <foaf:Organization xmlns:foaf="http://xmlns.com/foaf/0.1/"

rdf:about="http://opendatacat.net/geonetwork-geo4web/organization/Rijksdienst%20voor%20het%20Cultureel%20Erfgoed">

      <foaf:name>Rijksdienst voor het Cultureel Erfgoed</foaf:name>

      <foaf:member rdf:resource="http://opendatacat.net/geonetwork-geo4web/organization/info%40cultureelerfgoed.nl"/>

   </foaf:Organization>

   <foaf:Agent xmlns:foaf="http://xmlns.com/foaf/0.1/"

rdf:about="http://opendatacat.net/geonetwork-geo4web/person/info%40cultureelerfgoed.nl">

      <foaf:name>InfoDesk</foaf:name>

      <foaf:phone>033 4217456</foaf:phone>

      <foaf:mbox rdf:resource="mailto:info@cultureelerfgoed.nl"/>

   </foaf:Agent>

</rdf:RDF>