The End of “Black Box” Engineering: A Blueprint for Safe AI in Asset Integrity

A technical review of the ‘Crystalyse’ framework and a practical implementation of Provenance-Enforced AI for NACE MR0175 Compliance.

*By Ahmed Aburakhia

Senior Materials & Project Engineer*

Executive Summary

The adoption of Generative AI in the heavy industries (Oil & Gas, Petrochemicals, Manufacturing) has been stalled by a single, fatal flaw: Trust. While Large Language Models (LLMs) excel at summarizing text, they notoriously “hallucinate” numerical data. In software development, a hallucinated line of code causes a syntax error. In Materials Engineering, a hallucinated yield strength or chemical composition causes catastrophic containment failure.

This article reviews the 2025 breakthrough paper “Crystalyse” (Imperial College London), which proposes a cryptographic “Render Gate” to solve this problem. To validate its industrial viability, I developed and deployed a prototype NACE Compliance Agent that utilizes this architecture to verify materials for Sour Service ($H_2S$) environments without human intervention.

1. The Liability Gap in Industrial AI

For the past two years, the “Digital Transformation” conversation has been dominated by chatbots. Vendors promise that engineers can simply ask an AI, “What is the best alloy for this well?” and receive an instant answer.

For a Senior Engineer, this proposition is terrifying.

The Probabilistic vs. Deterministic Conflict

LLMs are probabilistic engines. They predict the next word based on statistical likelihood. Engineering, however, is deterministic. Physics does not care about probability; it cares about laws.

The Hallucination Risk: If you ask a standard model (like GPT-4) for the PREN (Pitting Resistance Equivalent Number) of Inconel 718, it might generate the number “45” because it has seen that number associated with high-performance alloys in its training data.
The Reality: The PREN is a mathematical function of specific chemical elements (%Cr + 3.3%Mo + 16%N). If the AI guesses the chemistry, the PREN is wrong.

If a Project Engineer approves a Liner Hanger based on a hallucinated material property, the liability sits with the engineer, not the AI vendor. Therefore, “Chatbot AI” is functionally useless for safety-critical engineering.

2. The Academic Solution: The “Render Gate”

In 2025, researchers Nduma, Park, and Walsh published Crystalyse: a multi-tool agent for materials design. While their work focused on discovering battery materials, they introduced an architectural concept that is immediately applicable to Oil & Gas: Provenance Enforcement.

How it Works

The authors argue that an AI should never be allowed to “speak” a number unless it can prove where that number came from. They introduced a system component called the Render Gate.

Imagine a digital security guard standing between the AI and the Engineer.

The Agent generates an answer: “The formation energy is -2.5 eV.”
The Render Gate intercepts the message and demands a Provenance Tuple: (Value, Unit, Source_Tool, Timestamp, Hash).
If the Agent cannot produce a cryptographic hash linking that number to a specific execution of a physics tool (like DFT or MACE), the Render Gate blocks the output.

This shifts the paradigm from “Trust the Robot” to “Trust the Audit Trail.”

3. Case Study: Building the NACE Compliance Agent

To prove that this academic concept works in an industrial setting, I built a prototype agent tailored for Saudi Aramco Standards (NACE MR0175 / ISO 15156).

The Mission: Automate the verification of alloys for Sour Service ($H_2S$ environments) without risking hallucinations.

The “Hybrid RAG” Architecture

I moved beyond simple prompting and built a System of Systems using Python, LangChain, and Pymatgen.

System Architecture

The agent follows a strict 3-Layer Safety Protocol:

Layer 1: The Trusted Knowledge Layer (RAG)

In a real enterprise, we do not want AI to be creative; we want it to be compliant.

When a user searches for “Inconel 625”, the Agent first queries an internal CSV database (simulating an ERP or SAP system).
If the material exists in the approved vendor list, it retrieves the exact chemistry from the Mill Test Report (MTR).
Benefit: Zero hallucination risk for known materials.

Layer 2: The Deterministic Physics Layer

If the material is unknown (e.g., a new trade name like “Super Duplex 32750”), the Agent switches to Discovery Mode:

It uses an LLM to estimate the chemical formula but flags it as “Low Confidence.”
Crucially, it does not guess the PREN. It passes the chemical string to Pymatgen (Python Materials Genomics), a quantum mechanics library.
Pymatgen calculates the atomic weights and executes the NACE formula mathematically.

Layer 3: The Render Gate (Digital MTR)

Every calculation result is hashed using SHA-256.

The User Interface (Streamlit) checks this hash before rendering the result.
If the data payload has been altered by the LLM during the summary generation, the hash will fail, and the UI will show a “Security Alert” instead of the data.

4. Results & Methodology

I subjected the agent to a “Red Team” stress test to see if I could force it to hallucinate safe ratings for dangerous materials.

Scenario A: The “316L” Trap

Input: “Is 316L Stainless Steel safe for severe sour service?”
Standard AI Response: Often vague. “It is generally corrosion resistant but depends on conditions.”
My Agent’s Response:
- Identified UNS S31603.
- Calculated PREN: 30.7.
- Verdict: 🛑 FAIL / RESTRICTED. (NACE requires PREN > 40 for severe service).

Scenario B: The “Human-in-the-Loop”

Input: “Duplex” (Ambiguous).
Agent Action: The Agent identified multiple potential alloys but selected the most likely standard (UNS S31803).
Safety Feature: The UI presented the estimated chemical formula in an Editable Field and waited for my confirmation. I was able to correct the Molybdenum content to match my specific paperwork before the physics calculation ran.

👉 Try the Live Prototype Here (Hugging Face)

5. The Strategic Roadmap for Heavy Industry

This experiment proves that “Agentic AI” is ready for deployment in heavy industry, provided we abandon the “Chatbot” model.

For EPCs (Engineering, Procurement, Construction) and Operators looking to adopt this, here is the roadmap:

Phase 1: Digitize the Standards (RAG)

We must convert PDF standards (API 5CT, ASME B31.3, NACE MR0175) into structured Vector Databases. An engineer should be able to query the standard, but the answer must be a direct citation, not a summary.

Phase 2: Tool-Use over Fine-Tuning

Do not waste millions fine-tuning Llama-3 on metallurgy textbooks. It will still hallucinate math. Instead, invest in Function Calling. Teach the LLM how to use:

Corrosion Simulation Tools (e.g., OLI Systems, Norsok M-506).
Finite Element Analysis (ANSYS).
Internal Inventory Databases (SAP).

Phase 3: The “Digital MTR”

We need to establish an industry standard for AI Provenance. Just as we require a physical stamp on a steel pipe, we must require a cryptographic hash on every AI-generated engineering recommendation.

Conclusion

The future of Materials Engineering is not “AI replacing Engineers.” It is AI protecting Engineers.

By implementing architectures like Crystalyse, we create a second layer of defense—an automated auditor that checks every assumption, calculates every variable, and flags every risk before a human ever signs off on a design.

🔗 Resources

Live App: ML4MS Material Validator
Source Code: GitHub Repository
Original Paper: Crystalyse (arXiv:2512.00977)