OCO for ontology architects

Audience perspective · ontology architects

Four layers, three orthogonal axes, 132 recorded ADRs — transparently auditable

OCO is not a concept paper but an implemented, validated, released distribution. Every architectural decision is documented as an ADR with rationale. Every layer is internally consistent (HermiT-validated, 0 inconsistent classes). Every licence assignment is REUSE-3.3-compliant per path. Every bridge mapping is SHA-pinned against an upstream version. Here is what you actually need as an architect to evaluate the architecture.

What you get

  • Three orthogonal classification axes (layer · audience · mechanistic), each with its own logic and version life-cycle — a consumer selects depth on each axis separately.
  • 132 documented architectural decisions, of which seven are particularly consequential (curated on the paper detail page), including negative ADRs for rejected modelling alternatives: you see not only what was decided, but also what was deliberately rejected.
  • Full validation published: all 9 Norouzi quality requirements met, OOPS! pitfall audit clean, 52/52 SPARQL tests PASS, 1,120 SHACL NodeShapes, HermiT-validated with 0 inconsistent classes.
  • Open/closed mix as an architectural property: the modular cut of the three axes enables per-module licence differentiation (CC-BY for L0, CC-BY-SA for material-agnostic L1, proprietary for supplier/material detail/compliance/L2/L3) — structurally, not as an all-or-nothing workaround.

The problem the architecture solves — 94 ontologies that don’t speak to each other

Today’s materials science landscape contains dozens of ontologies, built by research projects unaware of each other. Every new initiative starts from scratch and re-models the same laboratory processes a neighbouring project already covered.

94 MSE ontologies mapped in the field (Norouzi et al. 2024)
40+ of those structurally incompatible (Rajamohan et al. 2025)
~70 % workflow duplication across PMD consortium projects (Norouzi et al. 2024)

The German MaterialDigital platform alone bundles more than a dozen projects — KnowNow, SmaDi, KupferDigital, GlasDigital, StahlDigital, DiProMag, iBain, Mieller-Ferrit — each starting with its own material-specific ontology. Spray dryers, XRD diffractometers, poling stations: re-modelled in every project, modelled differently in every project.

And yet only a small fraction of that knowledge is genuinely material-specific. The bulk — workflow provenance, equipment classification, measurement methods, identifier schemes — is material-agnostic and reusable across all domains. This duplication is not just wasteful; it defeats comparability between projects — the actual purpose of digital materials research. The three-axis architecture below is the direct answer to that problem.

The three orthogonal axes

The central architectural claim of the paper: OCO classifies not along one axis but along three independent ones. Every module sits on each axis at the level appropriate to its content. Consumers select subsets per axis.

| Axis | Values | Answers | Selected by |
|---|---|---|---|
| 1: Layer of abstraction | L0 · L1 · L2 · L3 | How deep should consumption go? Bridge-only or with reasoning axioms? | every consumer, via distribution bundle |
| 2: Audience | material · compliance (+ dual) | Materials research or EU regulatory? | every consumer, via audience marker |
| 3: Mechanistic explanation depth | 7 layers (symmetry → bonding) | Which causal reasoning chain should be queryable? | material-audience consumers, optional |

The orthogonality is not a slogan but is carried through structurally: a module like oco-symmetry sits on axis 1 at L2, on axis 2 in the material audience, and on axis 3 at layer 1, and each of these three placements is independently versionable. A polymer L2 would replace only the L2 content on axis 1 without touching axes 2 and 3. A compliance consumer makes its selection on axis 2 independently of the others.
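The per-axis selection described above can be sketched in a few lines of Python. This is an illustration only, not OCO's actual tooling: the `Placement` record and the `select` function are hypothetical, only the oco-symmetry placement comes from the text, and treating a "dual" module as belonging to both audiences is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    module: str
    layer: str                  # axis 1: "L0" .. "L3"
    audience: str               # axis 2: "material", "compliance" or "dual"
    mechanistic: Optional[int]  # axis 3: 1 .. 7, None if the module has no placement


MODULES = [
    Placement("oco-symmetry", "L2", "material", 1),            # placement named in the text
    Placement("example-equipment", "L1", "material", None),    # hypothetical module
    Placement("example-reporting", "L2", "compliance", None),  # hypothetical module
]


def select(modules, layers=None, audiences=None, mechanistic=None):
    """Filter on each axis independently; None means no constraint on that axis."""
    selected = []
    for m in modules:
        if layers is not None and m.layer not in layers:
            continue
        # Assumption: a "dual" module belongs to both audiences.
        if audiences is not None and m.audience != "dual" and m.audience not in audiences:
            continue
        if mechanistic is not None and m.mechanistic not in mechanistic:
            continue
        selected.append(m.module)
    return selected


# A compliance consumer constrains axis 2 only.
print(select(MODULES, audiences={"compliance"}))
# A bridge-plus-skeleton consumer constrains axis 1 only.
print(select(MODULES, layers={"L0", "L1"}))
```

Because each filter ignores the other two axes, changing a module's placement on one axis never alters the result of a selection made on another.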

This is the answer to the three simultaneous challenges a production-grade materials ontology must solve today: horizontal fragmentation of the MSE landscape, vertical EU-regulatory convergence, and mechanistic explanation depth. One architectural primitive (modular layering on an orthogonal axis, applied once per challenge) solves all three without collapsing them onto each other.

The four layers (axis 1)

| Layer | Content | Reasoner profile | Licence default |
|---|---|---|---|
| L0: Bridge | Pure anchors to existing standards (PMDco, QUDT, EMMO, CIF, PROV-O, …) | RDFS | CC-BY 4.0 |
| L1: Material-agnostic skeleton | Sample, equipment, measurement, identifier, provenance, investigation, process; tensor roots, role individuals, cross-axioms | RDFS | CC-BY-SA 4.0 (except supplier/material/compliance) |
| L2: Material / methods specific | Material classes (230 space groups, 32 coupled effects, Kröger-Vink, Newnham, phases), compliance detail (CSRD/LCA/CSDDD/CBAM/AI Act/…) | OWL 2 EL | proprietary |
| L3: Categorical reasoning | 325 logical axioms (route templates, lifecycle constraints, symmetry-effect coupling); 5,920 reified Neumann constraints | OWL 2 DL | proprietary |

Layer separation is strict: each layer has a clearly bounded responsibility; higher layers import lower ones, and lower layers never depend on higher ones. L0 knows nothing about L1; L1 knows nothing about L2. A materials ontology for polymers can reuse L0+L1 and replace only L2, without re-modelling the agnostic laboratory layer.
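The one-directional dependency rule lends itself to an automated check. The sketch below assumes an import graph has already been extracted (for example from owl:imports statements); the module names and the `upward_imports` helper are hypothetical and not part of the OCO distribution.

```python
LAYER_RANK = {"L0": 0, "L1": 1, "L2": 2, "L3": 3}


def upward_imports(layer_of, imports):
    """Return (importer, imported) pairs where a module depends on a higher layer."""
    violations = []
    for importer, targets in imports.items():
        for target in targets:
            if LAYER_RANK[layer_of[target]] > LAYER_RANK[layer_of[importer]]:
                violations.append((importer, target))
    return violations


# Hypothetical modules, one per layer L0..L2.
layer_of = {"example-bridge": "L0", "example-skeleton": "L1", "example-material": "L2"}

# Clean graph: every import points to a lower layer.
clean = {"example-skeleton": ["example-bridge"], "example-material": ["example-skeleton"]}
print(upward_imports(layer_of, clean))

# Broken graph: the L0 bridge reaches up into L1.
broken = {"example-bridge": ["example-skeleton"]}
print(upward_imports(layer_of, broken))
```

A check of this kind would report an empty list for a distribution that honours the rule, so it could run as a gate before each release.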

OCO in numbers

5,196 classes across 44 modules (material 29 · compliance 15)
1,674 properties (574 object + 1,051 datatype + 49 annotation)
167,348 axioms total, of which 40,454 logical
11 substantial L0 bridges (+ 829 cross-ontology mappings across 40 sections)
325 logical L3 reasoning axioms
1,120 SHACL NodeShapes for data validation
52 / 52 SPARQL tests passing (gold-standard ABoxes)
132 recorded architecture decisions

Consistency validation: HermiT reports 0 unsatisfiable classes; Pellet serves as a cross-check.

Expressivity: the full distribution sits in OWL 2 DL with 0 internal profile violations (ROBOT validate-profile DL). The L0+L1+L2 bundle (without L3) reduces to OWL 2 EL — consumers choose their reasoner profile via distribution.
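How the reasoner profile follows from the chosen bundle can be illustrated with a small lookup. The per-layer profiles are those listed in the layer table; the rule that a bundle takes the most expressive profile among its layers is a simplifying assumption of this sketch, and `bundle_profile` is a hypothetical helper, not an OCO or ROBOT function.

```python
# Profiles ordered from least to most expressive (simplified linear ordering).
PROFILE_ORDER = ["RDFS", "OWL 2 EL", "OWL 2 DL"]

# Per-layer reasoner profiles as given in the layer table.
LAYER_PROFILE = {"L0": "RDFS", "L1": "RDFS", "L2": "OWL 2 EL", "L3": "OWL 2 DL"}


def bundle_profile(layers):
    """Assumption: a bundle needs the most expressive profile of any layer it contains."""
    if not layers:
        raise ValueError("a bundle must contain at least one layer")
    return max((LAYER_PROFILE[layer] for layer in layers), key=PROFILE_ORDER.index)


print(bundle_profile({"L0", "L1", "L2"}))        # bundle without L3
print(bundle_profile({"L0", "L1", "L2", "L3"}))  # full distribution
```

In practice the authoritative answer comes from a profile validator run over the merged bundle; the lookup only shows why dropping L3 is enough to stay within OWL 2 EL.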

Licence architecture: REUSE 3.3 compliant (FSFE standard). All 5,352 repository files carry SPDX-License-Identifier annotations. Per-path mapping in REUSE.toml (ADR-103 v3).

Norouzi validation: all nine quality requirements met

The Norouzi et al. (2024) study classifies 94 MSE ontologies and defines nine quality requirements against which OCO v0.94 is fully validated:

| REQ | Requirement | OCO compliance |
|---|---|---|
| REQ1 | Modularity | 44 modules, 4 layers, 2 audiences, explicitly separated |
| REQ2 | Adaptability | Cache pattern with version-pinned manifests, 40 bridge sections independently versionable |
| REQ3 | Interoperability | 11 substantial L0 targets, 829 cross-ontology mappings, ELN-Filetype bridge (14 systems) |
| REQ4 | Purpose (CQ anchoring) | 163 published CQs across 10 reasoning areas, 52 executable with gold-standard ABoxes |
| REQ5 | Equality (bilingual definitions) | DE/EN as SHACL-mandatory fields; the only one among the 94 Norouzi ontologies with a hard schema constraint for this |
| REQ6 | Compatibility | OWL 2 DL compatible, ROBOT validate-profile clean, HermiT-consistent |
| REQ7 | Functionality | Neumann engine + phase-state coupling as operative reasoning components |
| REQ8 | Authoritativeness | 132 ADRs with rationale; seven-layer mechanistic skeleton makes “why?” queryable |
| REQ9 | Facetedness | Multi-axis classification for material parameters (role × reference × material abstraction) |

Additionally: OOPS! pitfall scanner audit clean (0 critical, 0 important pitfalls). 5 of 9 OCO engineering principles (modularity, adaptability, interoperability, authoritativeness, facetedness, plus the three modularity extensions) go beyond the Norouzi REQ canon.

Sister-project reuse — the L1 reusability claim

The architecture is designed so that a metallurgy, polymer, battery, or pharmaceuticals ontology can share L0+L1 with OCO and replace only L2. The equipment, sample, provenance, identifier, investigation classes stay unchanged.

The first empirical evidence pass is in progress: a second ceramic material system (ferritic high-performance ceramics) is being developed in parallel as a sister pilot and validates L1 reusability within the ceramics family. The ferritic variant additionally stresses the seven-layer skeleton on the magnetic side (Bloch / Néel walls, superexchange, Jahn-Teller distortion).

Open invitation to sister domains: metallurgy, polymers, batteries, pharmaceuticals. We share L0+L1, you contribute your own L2. That would be the first genuine cross-domain validation of the architecture, and it is the reason the release is called v0.94, not v1.0.

Architectural argument: open/closed mix as a property, not a workaround

A non-obvious benefit of strict layer separation: the architecture enables a mixed open/proprietary distribution model that flat ontologies cannot offer.

In the current release, L0 is under CC-BY 4.0 (mirroring the bridge targets PMDco, EMMO, PROV-O, FaBiO; QUDT: Apache-2.0). L1 without oco-supplier is under CC-BY-SA 4.0, with dual-licensing to CC-BY on request. oco-supplier, all of L2 and L3, and all compliance modules are proprietary and under project confidentiality in the present release.

The same modular boundaries that make this split clean would, in a future configuration, also enable a per-module choice: a sister project working under different commercial constraints could open or close a different subset without restructuring. This mix is structurally impossible in monolithic ontologies (everything-or-nothing open) and in purely proprietary industrial schemas (no external adoption).
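The current-release split can be written down as a rule over a module's axis placement, which is what makes it a structural property rather than a per-file convention. The sketch below is a simplified reading of the paragraph above: `licence_for` is a hypothetical function, `LicenseRef-Proprietary` is a placeholder identifier, the precedence of the compliance rule over the layer rule is an assumption, and the material-detail exception and "dual" audience are not modelled.

```python
PROPRIETARY = "LicenseRef-Proprietary"  # placeholder identifier, not taken from the source


def licence_for(module, layer, audience):
    """Simplified licence rule for the current release, derived from axis placement."""
    # Assumption: the compliance rule wins over the layer rule.
    if audience == "compliance":
        return PROPRIETARY
    if layer == "L0":
        return "CC-BY-4.0"
    if layer == "L1" and module != "oco-supplier":
        return "CC-BY-SA-4.0"
    # oco-supplier and everything in L2 and L3.
    return PROPRIETARY


print(licence_for("example-bridge", "L0", "material"))    # hypothetical L0 module
print(licence_for("example-skeleton", "L1", "material"))  # hypothetical L1 module
print(licence_for("oco-supplier", "L1", "material"))
print(licence_for("oco-symmetry", "L2", "material"))
```

A sister project with different commercial constraints would change only the body of this rule; the module boundaries it relies on stay the same.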

External caches as an architectural pattern

Depth without TBox inflation: reference-data corpora are wired in as version-pinned local caches, not embedded into the OWL hierarchy. The OWL TBox stays at ~5,200 classes (reasoner-trivial); consumers can still query against ~155,000 Materials Project DFT records, 1,934 IUCr BVPs, 1,731 Wyckoff positions, 497 Shannon radii, and 91 Pauling values. Each cache is SHA-pinned in bridge/external_versions.yaml.

This is architecturally critical because a flat alternative would either embed reference data as TBox classes (intractable for the reasoner) or leave them outside the knowledge graph entirely (loss of provenance). The cache pattern avoids both problems.
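Pin verification for such caches might look like the following sketch. It assumes that a pin is a SHA-256 digest of the cache content and that the pins have already been loaded into a dictionary; the actual layout of bridge/external_versions.yaml and the kind of SHA used are not described here, so the cache name, payload, and helper functions are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_pins(pins, read_bytes):
    """Return the sorted names of caches whose content no longer matches its pin."""
    return sorted(
        name for name, expected in pins.items()
        if sha256_hex(read_bytes(name)) != expected
    )


# Hypothetical in-memory cache standing in for a local cache file.
cache = {"example-radii": b"example cache payload"}
pins = {"example-radii": sha256_hex(cache["example-radii"])}

# Unchanged content: nothing to report.
print(verify_pins(pins, cache.__getitem__))

# Changed upstream data: the cache is flagged until the pin is updated deliberately.
tampered = {"example-radii": b"changed upstream"}
print(verify_pins(pins, tampered.__getitem__))
```

Passing the reader in as a function keeps the check independent of where the caches live, whether on disk or in a test fixture.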

Example competency questions

Six of the 163 published CQs that are particularly relevant for ontology architects and reviewers.

Which ADRs document the audience axis, and which modelling alternatives were rejected?

plan/decisions/* · ADR index

Which L3 axioms lift the distribution from OWL 2 EL to OWL 2 DL?

oco-master_full.ttl · ROBOT validate-profile · executable SPARQL

Which SHACL NodeShape verifies the cross-layer annotation between layer 2 (Energy/DFT) and layer 6 (defect chemistry)?

shapes/m72_cross_layer_shapes.ttl · ADR-134

Which Norouzi REQs are currently met, and with which concrete architectural measure?

audit/norouzi_req_coverage.md · REQ1-REQ9

Which bridges have been updated since the last release, and what mapping count do they have?

bridge_mappings.yaml · bridge/external_versions.yaml

Which module profiles exist as owl:imports wrappers, and which modules do they bundle?

oco_eln_profile.ttl · oco_materials_db_profile.ttl · oco_ml_profile.ttl

Relation to the OCO distribution

The full licence architecture, REUSE 3.3 with per-path SPDX mapping across all 5,352 repository files, is documented in the OCO distribution. CC-BY-4.0 portions (bridge/**) and CC-BY-SA-4.0 portions (L1 skeletons of material-audience modules except supplier/material/compliance) are directly downloadable; the proprietary portions (all compliance modules, material/supplier detail, L2/L3, SHACL implementation, L3 axioms, generator stack) are available on request. ROBOT validation reports, the Norouzi REQ coverage audit, and the full ADR catalogue are part of the proprietary documentation package.

Paper

Full text of the architecture description, BibTeX citation, and seven selected ADRs in detail.

Distribution & licence

The four distribution variants and their terms — what’s public, what’s on request.

Direct contact

Discuss an architecture review or sister-project cooperation — usually answered within 24 hours.
