Tyler Procko - Resources

Under construction...

Resources

This page contains Semantic Web and Linked Data resources useful for the researcher. In my experience, learning the constructs of this domain is not trivial. Despite its pretense of widespread interconnection, the Semantic Web community is fractured and confused; its best resources are buried in large technical documents scattered across the Web, with broken links abounding. Moreover, its best practitioners are difficult to pinpoint. On this page, I intend to provide as much of my own insight as possible in this regard. If I can save one researcher even a minute, I am happy to do so. I am aware of how daunting research in this domain is.

People

Michael K. Bergman

Co-founder of Cognonto and the founder of the KBpedia project; author of many Semantic Web articles and books; his site is probably the best resource in the domain

Link

Archive

Henriette Harmse

European Bioinformatics Institute; maintains a good Semantic Web blog that is still relevant and responsive as of 2023; experienced in DLs

Link

Archive

Barry Smith

Philosopher turned OWL ontologist; primary creator of the Basic Formal Ontology (BFO) and a co-founder of the OBO Foundry

Link

Ontology Repos

OBO Foundry

Open Biological and Biomedical Ontology Foundry; home of BFO and many BFO-compliant ontologies; mostly biomedical in nature

Link

Archive

Ontobee

The default Linked Data server for most OBO ontologies; also hosts many ontologies not found on the OBO Foundry

Link

Archive

AberOWL

Semantic search engine for ontology-based access to biological data

Link

Archive

Ontology Lookup Service (OLS)

A repository and lookup service for biomedical ontologies, run by EMBL-EBI

Link

Archive

BioPortal

Self-described "world's most comprehensive repository of biomedical ontologies"

Link

Archive

Ontohub

A central hub aggregating ontology repositories (like BioPortal) and their ontologies

Link

Archive

Linked Open Vocabularies (LOV)

The most popular hub for Linked Data vocabularies; unlike the biology repos, it has no focus on OWL ontologies and DLs

Link

Archive

DBPedia Archivo

DBpedia's automated Web crawler service that finds OWL ontologies on the Web and rates them with its own 5-star rating scheme; the "back-end" service of DBpedia's Databus

Link

Archive

Industrial Ontologies Foundry (IOF)

A project of Barry Smith's and the OBO Foundry; a hub-and-spokes collection of ontologies, with BFO at the hub, for the industrial domain, as opposed to the biomedical focus of the OBO Foundry

Link

Archive

Industry Portal

Ontologies in various industry domains

Link

Upper Ontologies

TBD

W3C Specific

TBD

Coterie

What is a coterie?

A coterie is a small group of people with similar interests. Given the confusing nature of the Semantic Web, I have gathered some related "resource dumps" here.

Awesome Semantic Web

The most exhaustive and up-to-date list of Semantic Web resources, covering everything from standards to code libraries to companies; in the form of an "awesome list" in a GitHub repo

Link

Request For Comments (RFC) Series

What is an RFC Series? RFC stands for Request for Comments. The original RFC Series began in 1969 as part of the ARPANET project, the precursor to the Internet. It is, more or less, a series of memos, notes and technical documents intended to document Internet protocols, standards and practices. I use the name here for my own series of notes; I would use the term "blog", but its connotations seem unprofessional to me.

Semantic Web

Why did the Semantic Web fail?

Tim Berners-Lee has a new project for personal data governance called the Solid Project. On the FAQ page of that project (archived here), the Solid team mentions that Linked Data and the Semantic Web 'never took off'. I have researched in this domain for some time now. From my research, and from discussions with other expert practitioners (e.g., Nicholas del Rio, Jarno van Driel), it is apparent that the Semantic Web failed as an ideal in part because adopting the W3C's standards was not clear or straightforward for data providers on the Web: it simply was not easy to do. The working groups pushing for the Semantic Web were composed primarily of academics, not business people or developers, and so the more complex constructs, e.g., OWL, were never realized to any great extent on the Web. Also, the rise of social media shifted the Web toward interactive websites, displacing the vision of Web 3.0 as a Web of interlinked concepts. That being said, Linked Data is commonly used by search engines, e.g., via the Schema.org vocabulary, which is embedded in web pages as JSON-LD, Microdata or RDFa to enable more accurate search results, recommendations, etc.

OWL and DLs

What is OWL good for?

OWL is based on Description Logics (DLs); OWL 2 also defines several profiles (EL, QL and RL) that restrict expressivity in exchange for more tractable reasoning. For most use cases, OWL is exceptional overkill. Inasmuch as OWL is a DL implementation, it abides by the open-world assumption (OWA), which is extremely difficult for most people to reconcile with anything else they do, e.g., OOP, which is closed-world by nature. For example, in Java, we may define a class Human but never give it a brain attribute. Under closed-world logic, the absence of that attribute means that Humans do not have brains. For an OWL Human class, unless an OWL axiom explicitly states that Humans do NOT have brains, it cannot be inferred that they do not. In other words, OWL never treats missing information as false: a negative conclusion requires an explicit assertion. This makes inferences on larger ontologies difficult to understand; even veteran OWL modelers struggle to explain them.

OWL is a very heavy-handed, complicated modeling language. Unlike RDF / RDFS, which are rather simple, easily extended and offer only limited reasoning (e.g., subsumption inferencing), OWL is very difficult to extend, because, with each new OWL constraint added, the complexity of the OWA makes erroneous inferences exceptionally hard to diagnose. In my experience, OWL is only used to its full extent in the biomedical domain. Most people who work with an "OWL ontology" do not come close to fully utilizing the DL underlying OWL: for the most part, they implement a class/relationship hierarchy (which can be done with RDFS), add annotation properties and perhaps a few necessary conditions (subClassOf constraints), and then never run a reasoner. This is not an OWL ontology. In any case, for most business use cases, OWL is far too complicated.
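The Human/brain example can be made concrete with a toy sketch in Python. This is not how an OWL reasoner works internally (a reasoner computes entailments from axioms); it is only a three-valued lookup, assuming facts are stored as plain (subject, predicate, object) tuples, with hypothetical `ex:` names and a separate set standing in for explicitly asserted negative facts. The closed-world query answers "no" when a fact is missing, while the open-world query answers "unknown" unless the negation was stated.

```python
from typing import Optional

# Toy fact base of (subject, predicate, object) tuples; the ex: names are hypothetical.
Triple = tuple[str, str, str]

ASSERTED: set[Triple] = {
    ("ex:Human", "ex:hasPart", "ex:Heart"),
}
# Explicitly asserted negative facts: a stand-in for an OWL axiom stating
# that something does NOT hold.
NEGATED: set[Triple] = {
    ("ex:Rock", "ex:hasPart", "ex:Brain"),
}


def closed_world(fact: Triple) -> bool:
    """Closed-world reading: anything not asserted is false."""
    return fact in ASSERTED


def open_world(fact: Triple) -> Optional[bool]:
    """Open-world reading: True if asserted, False only if explicitly
    negated, otherwise None (unknown)."""
    if fact in ASSERTED:
        return True
    if fact in NEGATED:
        return False
    return None


brain = ("ex:Human", "ex:hasPart", "ex:Brain")
print(closed_world(brain))  # False: missing means "no"
print(open_world(brain))    # None: missing means "unknown"
print(open_world(("ex:Rock", "ex:hasPart", "ex:Brain")))  # False: stated explicitly
```

The same missing fact yields two different answers, which is exactly the mismatch between OOP-style thinking and OWL described above.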
Furthermore, OWL is not designed to validate instance data against the ontology: under the OWA, a missing value is never an error, and a reasoner can only report logical inconsistencies. Validation requires very specific SPARQL queries, and, even then, "bad" instance triples cannot be rejected without a codebase written for this purpose. A language like the Shapes Constraint Language (SHACL) is a much more appropriate means of ontological modeling for business use cases, because it is a closed-world language and most major graph database platforms support SHACL validation. SHACL can be used to define an ontology as a set of shapes, which can then be applied automatically on data ingestion to validate new triples. Inveterate OWL modelers like the founder of TopQuadrant have abandoned OWL in favor of SHACL. So, I point everyone I can to SHACL as an alternative to OWL.
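The ingestion-time validation described above can be sketched, very loosely, in plain Python. The `NodeShape`/`PropertyShape` classes and the `validate`/`ingest` functions are hypothetical and only mirror the spirit of SHACL's `sh:targetClass`, `sh:path`, `sh:minCount` and `sh:maxCount`; a real deployment would use a SHACL engine (e.g., pySHACL) or the graph database's built-in validator. The key point is the closed-world check: an absent property counts as zero values, so the offending batch is rejected.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class PropertyShape:
    """Loosely mirrors a SHACL property shape (sh:path, sh:minCount, sh:maxCount)."""
    path: str
    min_count: int = 0
    max_count: Optional[int] = None


@dataclass(frozen=True)
class NodeShape:
    """Loosely mirrors a SHACL node shape with sh:targetClass."""
    target_class: str
    properties: tuple[PropertyShape, ...]


def validate(graph: set[Triple], shape: NodeShape) -> list[str]:
    """Return a message for every violation on the shape's focus nodes.

    Closed-world: an absent property simply counts as zero values.
    """
    violations: list[str] = []
    focus_nodes = sorted(
        s for (s, p, o) in graph if p == RDF_TYPE and o == shape.target_class
    )
    for node in focus_nodes:
        for prop in shape.properties:
            count = sum(1 for (s, p, _) in graph if s == node and p == prop.path)
            if count < prop.min_count:
                violations.append(
                    f"{node}: {prop.path} has {count} value(s), needs at least {prop.min_count}"
                )
            if prop.max_count is not None and count > prop.max_count:
                violations.append(
                    f"{node}: {prop.path} has {count} value(s), allows at most {prop.max_count}"
                )
    return violations


def ingest(store: set[Triple], batch: set[Triple], shape: NodeShape) -> list[str]:
    """Validate store + batch; commit the batch only if nothing is violated."""
    violations = validate(store | batch, shape)
    if not violations:
        store.update(batch)
    return violations


PERSON_SHAPE = NodeShape(
    target_class="ex:Person",
    properties=(PropertyShape(path="ex:name", min_count=1, max_count=1),),
)

store: set[Triple] = set()
good = {("ex:alice", RDF_TYPE, "ex:Person"), ("ex:alice", "ex:name", '"Alice"')}
bad = {("ex:bob", RDF_TYPE, "ex:Person")}  # ex:bob has no ex:name

print(ingest(store, good, PERSON_SHAPE))  # []
print(ingest(store, bad, PERSON_SHAPE))   # ['ex:bob: ex:name has 0 value(s), needs at least 1']
print(len(store))                         # 2: the bad batch was rejected
```

Validating the store and the incoming batch together lets count constraints see the combined data; re-validating the whole store on every batch is fine for a toy but would not scale.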

AI

How do you reconcile AI/ML with Semantic Web constructs like ontologies?

This is a rather informal response. Historically, ontologies and ML went hand-in-hand: ontologies were fed into ML, and vice versa. For instance, IBM's Watson touted ontology-based ML. In any case, as it stands now, in 2023 and beyond, ontologies are nearly irrelevant in both practical use and the research landscape. A Google search for 'Python ML' returns thousands of posts from the same week; a Google search for 'Python ontology' returns posts from over a decade ago. Ontologies are complicated artifacts espoused by very few, while ML consistently explodes in popularity and use. Not many individuals can put forth the effort to abandon the closed-world logic driving nearly everything we do as humans in order to fully comprehend the open-world logic of description logics, to which ontologies (at least, OWL ontologies) subscribe. But collecting large amounts of data, training a model and using the resulting model? That is more approachable. And the research landscape is hot, with new articles being published literally by the minute. There simply is not the same support for ontology work. Steadfast ontologists like Barry Smith are probably (and understandably) dismayed by the exploding popularity of undocumented, unexplainable ML. Veterans of the Semantic Web space like Jens Lehmann have, in a sense, tipped the proverbial hat to the rise of ML and general AI. Whatever the thinking, ontologies still serve some purpose. For instance, the heart of an ontology, its taxonomy, is essential everywhere: models need to be classified to be found and used, as does training data. But for ontologists, true ontology work seems to be a concern only for biomedical clients. It's a bad game to be in.