Four layers, three orthogonal axes, 132 recorded ADRs — transparently auditable
OCO is not a concept paper but an implemented, validated, released distribution. Every architectural decision is documented as an ADR with rationale. Every layer is internally consistent (HermiT-validated, 0 inconsistent classes). Every licence assignment is REUSE-3.3-compliant per path. Every bridge mapping is SHA-pinned against an upstream version. Here is what you actually need as an architect to evaluate the architecture.
What you get
- Three orthogonal classification axes (layer · audience · mechanistic), each with its own logic and version life-cycle — a consumer selects depth on each axis separately.
- 132 documented architectural decisions, of which seven are particularly consequential (curated on the paper detail page), including negative ADRs for rejected modelling alternatives, so you see not only what was decided but also what was deliberately rejected.
- Full validation published: all 9 Norouzi quality requirements met, OOPS! pitfall audit clean, 52/52 SPARQL tests PASS, 1,120 SHACL NodeShapes, HermiT-validated with 0 inconsistent classes.
- Open/closed mix as an architectural property: the modular cut of the three axes enables per-module licence differentiation (CC-BY for L0, CC-BY-SA for material-agnostic L1, proprietary for supplier/material detail/compliance/L2/L3) — structurally, not as an all-or-nothing workaround.
The problem the architecture solves — 94 ontologies that don’t speak to each other
Today’s materials science landscape contains dozens of ontologies, built by research projects unaware of each other. Every new initiative starts from scratch and re-models the same laboratory processes a neighbouring project already covered.
The German MaterialDigital platform alone bundles more than a dozen projects — KnowNow, SmaDi, KupferDigital, GlasDigital, StahlDigital, DiProMag, iBain, Mieller-Ferrit — each starting with its own material-specific ontology. Spray dryers, XRD diffractometers, poling stations: re-modelled in every project, modelled differently in every project.
And yet only a small fraction of that knowledge is genuinely material-specific. The bulk — workflow provenance, equipment classification, measurement methods, identifier schemes — is material-agnostic and reusable across all domains. This duplication is not just wasteful; it defeats comparability between projects — the actual purpose of digital materials research. The three-axis architecture below is the direct answer to that problem.
The three orthogonal axes
The central architectural claim of the paper: OCO classifies not along one axis but along three independent ones. Every module sits on each axis at the level appropriate to its content. Consumers select subsets per axis.
| Axis | Values | Answers | Selected by |
|---|---|---|---|
| 1 — Layer of abstraction | L0 · L1 · L2 · L3 | How deep should consumption go? Bridge-only or with reasoning axioms? | every consumer, via distribution bundle |
| 2 — Audience | material · compliance (+ dual) | Materials research or EU regulatory? | every consumer, via audience marker |
| 3 — Mechanistic explanation depth | 7 layers (symmetry → bonding) | Which causal reasoning chain should be queryable? | material-audience consumers, optional |
The orthogonality is not a slogan but is carried through structurally: a module like oco-symmetry sits on axis 1 at L2, on axis 2 in the material audience, and on axis 3 at layer 1, and each of these three placements is independently versionable. A polymer L2 would replace the L2 content on axis 1 without touching axes 2 and 3. A compliance consumer loads axis 2 without the others.
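The per-axis selection described above can be sketched as a small filter over module placements. This is only an illustration under stated assumptions: the `Placement` record, the `select` helper, and every module name except oco-symmetry are hypothetical, and "dual" is assumed to mean that a module serves both audiences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    module: str
    layer: str                  # axis 1: "L0" .. "L3"
    audience: str               # axis 2: "material", "compliance" or "dual"
    mechanistic: Optional[int]  # axis 3: 1 .. 7, None if the module has no placement


CATALOGUE = [
    Placement("oco-symmetry", "L2", "material", 1),        # placement taken from the text
    Placement("example-bridge", "L0", "dual", None),       # hypothetical module
    Placement("example-sample", "L1", "material", None),   # hypothetical module
    Placement("example-csrd", "L2", "compliance", None),   # hypothetical module
]


def select(catalogue, layers=None, audience=None, max_mechanistic=None):
    """Filter on each axis independently; None means 'no constraint on this axis'."""
    chosen = []
    for p in catalogue:
        if layers is not None and p.layer not in layers:
            continue
        # Assumption: "dual" modules are visible to both audiences.
        if audience is not None and p.audience not in (audience, "dual"):
            continue
        # Modules without a mechanistic placement are unaffected by the axis-3 limit.
        if (max_mechanistic is not None and p.mechanistic is not None
                and p.mechanistic > max_mechanistic):
            continue
        chosen.append(p.module)
    return chosen
```

With this toy catalogue, `select(CATALOGUE, audience="compliance")` returns `["example-bridge", "example-csrd"]`, and constraining a second axis does not require touching the first.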
This is the answer to the three simultaneous challenges a productive materials ontology must solve today: horizontal fragmentation of the MSE landscape, vertical EU-regulatory convergence, mechanistic explanation depth. One architectural primitive (modular layering on an orthogonal axis) solves all three without collapsing them onto each other.
The four layers (axis 1)
| Layer | Content | Reasoner profile | Licence default |
|---|---|---|---|
| L0 — Bridge | Pure anchors to existing standards (PMDco, QUDT, EMMO, CIF, PROV-O, …) | RDFS | CC-BY 4.0 |
| L1 — Material-agnostic skeleton | Sample, equipment, measurement, identifier, provenance, investigation, process; tensor roots, role individuals, cross-axioms | RDFS | CC-BY-SA 4.0 (except supplier/material/compliance) |
| L2 — Material / methods specific | Material classes (230 space groups, 32 coupled effects, Kröger-Vink, Newnham, phases), compliance detail (CSRD/LCA/CSDDD/CBAM/AI Act/…) | OWL 2 EL | proprietary |
| L3 — Categorical reasoning | 325 logical axioms (route templates, lifecycle constraints, symmetry-effect coupling); 5,920 reified Neumann constraints | OWL 2 DL | proprietary |
Layer separation is strict: each layer has a clearly bounded responsibility, and dependencies point only downwards. Higher layers may import lower ones, but lower layers never depend on higher ones: L0 knows nothing about L1; L1 knows nothing about L2. A materials ontology for polymers can reuse L0+L1 and replace only L2, without re-modelling the agnostic laboratory layer.
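The rule that a lower layer never depends on a higher one can be checked mechanically. The following is a minimal sketch, not OCO's actual tooling: the function name, the module names, and the dictionary representation of the import graph are all assumptions made for illustration.

```python
LAYER_RANK = {"L0": 0, "L1": 1, "L2": 2, "L3": 3}


def upward_couplings(layer_of, imports):
    """Return (importer, imported) pairs where a module imports from a higher layer.

    layer_of: module name -> layer label ("L0" .. "L3")
    imports:  module name -> list of modules it imports
    """
    violations = []
    for importer, targets in imports.items():
        for target in targets:
            if LAYER_RANK[layer_of[target]] > LAYER_RANK[layer_of[importer]]:
                violations.append((importer, target))
    return sorted(violations)


# Hypothetical module graph: every import points to the same or a lower layer.
layer_of = {"bridge": "L0", "sample": "L1", "ceramics": "L2", "routes": "L3"}
imports = {
    "sample": ["bridge"],
    "ceramics": ["sample", "bridge"],
    "routes": ["ceramics"],
}
```

An empty result means the graph respects the layering. Swapping `ceramics` for a hypothetical `polymers` L2 module changes nothing for `bridge` and `sample`, which is the reuse property the paragraph describes.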
OCO in numbers
Consistency validation: HermiT reports 0 unsatisfiable classes, with Pellet used as a cross-check.
Expressivity: the full distribution sits in OWL 2 DL with 0 internal profile violations (ROBOT validate-profile DL). The L0+L1+L2 bundle (without L3) reduces to OWL 2 EL — consumers choose their reasoner profile via distribution.
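One way to read "consumers choose their reasoner profile via distribution" is that a bundle needs the most expressive profile among the layers it contains. The sketch below uses the per-layer profiles from the layer table; the helper name and the linear ordering RDFS < OWL 2 EL < OWL 2 DL are simplifying assumptions for illustration, not a statement about the formal relationship between these languages.

```python
# Per-layer reasoner profiles as listed in the layer table.
LAYER_PROFILE = {"L0": "RDFS", "L1": "RDFS", "L2": "OWL 2 EL", "L3": "OWL 2 DL"}

# Simplifying assumption: treat the profiles as a linear order of expressivity.
PROFILE_ORDER = ["RDFS", "OWL 2 EL", "OWL 2 DL"]


def bundle_profile(layers):
    """Return the most expressive profile required by any layer in the bundle."""
    layers = list(layers)
    if not layers:
        raise ValueError("a bundle must contain at least one layer")
    return max((LAYER_PROFILE[layer] for layer in layers), key=PROFILE_ORDER.index)
```

Under these assumptions, `bundle_profile(["L0", "L1", "L2"])` gives `"OWL 2 EL"`, and adding `"L3"` lifts the result to `"OWL 2 DL"`, matching the two bundles described above.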
Licence architecture: REUSE 3.3 compliant (FSFE standard). All 5,352 repository files carry SPDX-License-Identifier annotations. Per-path mapping in REUSE.toml (ADR-103 v3).
Norouzi validation: all nine quality requirements met
The Norouzi et al. (2024) study classifies 94 MSE ontologies and defines nine quality requirements against which OCO v0.94 is fully validated:
| REQ | Requirement | OCO compliance |
|---|---|---|
| REQ1 | Modularity | 44 modules, 4 layers, 2 audiences — explicitly separated |
| REQ2 | Adaptability | Cache pattern with version-pinned manifests, 40 bridge sections independently versionable |
| REQ3 | Interoperability | 11 substantial L0 targets, 829 cross-ontology mappings, ELN-Filetype bridge (14 systems) |
| REQ4 | Purpose (CQ anchoring) | 163 published CQs across 10 reasoning areas, 52 executable with gold-standard ABoxes |
| REQ5 | Equality (bilingual definitions) | DE/EN as SHACL-mandatory fields; the only one of the 94 Norouzi ontologies with a hard schema constraint for this |
| REQ6 | Compatibility | OWL 2 DL compatible, ROBOT validate-profile clean, HermiT-consistent |
| REQ7 | Functionality | Neumann engine + phase-state coupling as operative reasoning components |
| REQ8 | Authoritativeness | 132 ADRs with rationale; seven-layer mechanistic skeleton makes “why?” queryable |
| REQ9 | Facetedness | Multi-axis classification for material parameters (role × reference × material abstraction) |
Additionally: OOPS! pitfall scanner audit clean (0 critical, 0 important pitfalls). 5 of 9 OCO engineering principles (modularity, adaptability, interoperability, authoritativeness, facetedness, plus the three modularity extensions) go beyond the Norouzi REQ canon.
Sister-project reuse — the L1 reusability claim
The architecture is designed so that a metallurgy, polymer, battery, or pharmaceuticals ontology can share L0+L1 with OCO and replace only L2. The equipment, sample, provenance, identifier, investigation classes stay unchanged.
The first empirical evidence pass is in progress: a second ceramic material system (ferritic high-performance ceramics) is being developed in parallel as a sister pilot and validates L1 reusability within the ceramics family. The ferritic variant additionally stresses the seven-layer skeleton on the magnetic side (Bloch / Néel walls, superexchange, Jahn-Teller distortion).
Open invitation to sister domains (metallurgy, polymers, batteries, pharmaceuticals): we share L0+L1, you contribute your own L2. That would be the first genuine cross-domain validation of the architecture, and the fact that it is still outstanding is the reason the release is called v0.94, not v1.0.
Architectural argument: open/closed mix as a property, not a workaround
A non-obvious benefit of strict layer separation: the architecture enables a mixed open/proprietary distribution model that flat ontologies cannot offer.
In the current release, L0 is under CC-BY 4.0 (mirroring the bridge targets PMDco, EMMO, PROV-O, FaBiO; QUDT is Apache-2.0). L1, excluding oco-supplier, is under CC-BY-SA 4.0, with dual licensing to CC-BY on request. oco-supplier, all of L2 and L3, and all compliance modules are proprietary and under project confidentiality in the present release.
The same modular boundaries that make this split clean would, in a future configuration, also enable a per-module choice: a sister project working under different commercial constraints could open or close a different subset without restructuring. This mix is structurally impossible in monolithic ontologies (everything-or-nothing open) and in purely proprietary industrial schemas (no external adoption).
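Because the licence follows module boundaries, the current split can be written as a small rule over a module's layer and audience. This is a sketch of the release configuration as described in this section, not the project's REUSE.toml: the function, the example module names other than oco-supplier, and the `LicenseRef-Proprietary` identifier are assumptions, and the finer exceptions for material detail are not modelled.

```python
PROPRIETARY = "LicenseRef-Proprietary"  # hypothetical SPDX-style custom identifier


def licence_for(module, layer, audience):
    """Map a module to its licence in the current release, as described in the text."""
    # Compliance modules and oco-supplier are closed regardless of layer.
    if audience == "compliance" or module == "oco-supplier":
        return PROPRIETARY
    if layer == "L0":
        return "CC-BY-4.0"
    if layer == "L1":
        return "CC-BY-SA-4.0"
    # All of L2 and L3 is proprietary in the present release.
    return PROPRIETARY
```

A sister project with different commercial constraints would change only this rule, for example opening a different subset of L1, without restructuring the modules themselves.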
External caches as an architectural pattern
Depth without TBox inflation: reference-data corpora are wired in as version-pinned local caches, not embedded into the OWL hierarchy. The OWL TBox stays at ~5,200 classes (reasoner-trivial); consumers can still query against ~155,000 Materials Project DFT records, 1,934 IUCr BVPs, 1,731 Wyckoff positions, 497 Shannon radii, and 91 Pauling values. Each cache is SHA-pinned in bridge/external_versions.yaml.
This is architecturally critical, because a flat alternative would either embed reference data as TBox classes (intractable for the reasoner) or leave them outside the knowledge graph entirely (loss of provenance). The cache pattern handles both problems simultaneously.
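The pinning idea can be illustrated with a short verification routine. The sketch assumes that a pin is the SHA-256 digest of the cache file's content and that the manifest has already been parsed into a dictionary; the actual hash algorithm, the schema of bridge/external_versions.yaml, and the file names used here are not given in the text and are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large caches do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pins(pins, root):
    """Return the names of caches that are missing or whose hash differs from the pin.

    pins: cache name -> {"path": relative file path, "sha256": expected hex digest}
          (hypothetical manifest layout)
    root: directory the relative paths are resolved against
    """
    failures = []
    for name, entry in pins.items():
        path = Path(root) / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            failures.append(name)
    return sorted(failures)
```

An empty return value means every cache matches its pin; any other result names the caches that must be refreshed or re-pinned before queries against them can be trusted.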
Example competency questions
Six of the 163 published CQs that are particularly relevant for ontology architects and reviewers.
- Which ADRs document the audience axis, and which modelling alternatives were rejected?
  plan/decisions/* · ADR index
- Which L3 axioms lift the distribution from OWL 2 EL to OWL 2 DL?
  oco-master_full.ttl · ROBOT validate-profile · executable SPARQL
- Which SHACL NodeShape verifies the cross-layer annotation between layer 2 (Energy/DFT) and layer 6 (defect chemistry)?
  shapes/m72_cross_layer_shapes.ttl · ADR-134
- Which Norouzi REQs are currently met, and with which concrete architectural measure?
  audit/norouzi_req_coverage.md · REQ1-REQ9
- Which bridges have been updated since the last release, and what mapping count do they have?
  bridge_mappings.yaml · bridge/external_versions.yaml
- Which module profiles exist as owl:imports wrappers, and which modules do they bundle?
Relation to the OCO distribution
The full licence architecture (REUSE 3.3 with per-path SPDX mapping across all 5,352 repository files) is documented in the OCO distribution. The CC-BY-4.0 portions (bridge/**) and CC-BY-SA-4.0 portions (L1 skeletons of material-audience modules except supplier/material/compliance) are directly downloadable; the proprietary portions (the complete compliance modules, material/supplier detail, L2/L3, SHACL implementation, L3 axioms, generator stack) are available on request. ROBOT validation reports, the Norouzi REQ coverage audit, and the full ADR catalogue are part of the proprietary documentation package.