ML4MS

Machine Learning for Materials Science (The Engineering Layer)

From Atoms to Algorithms: How AI ‘Sees’ Materials

For decades, materials scientists have been translating crystal structures into spreadsheet rows, a process that worked reasonably well until it didn’t. The rise of Graph Neural Networks (GNNs) represents not a hype-driven revolution but a fundamental correction in how we encode structural information for machine learning.

This shift matters because, as any metallurgist knows, structure dictates properties far more than composition alone.

The “Excel” Problem: When Feature Vectors Break Down

Traditional machine learning for materials relied on feature engineering: the manual construction of descriptors that capture a material’s essence in vector form. A typical workflow might encode a compound as a fixed-length vector of hand-built descriptors, for example its composition fractions, density, and average coordination number.
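As a minimal sketch, using illustrative descriptors chosen to match the diamond/graphite comparison below (the exact feature set varies from study to study):

import numpy as np

# Hypothetical hand-built descriptors: composition, density, mean coordination.
# Every material collapses to one fixed-length row, whatever its structure.
def featurize(frac_carbon, density, avg_coordination):
    return np.array([frac_carbon, density, avg_coordination])

diamond  = featurize(1.00, 3.51, 4.0)
graphite = featurize(1.00, 2.26, 3.3)

# The design matrix a Random Forest would consume: one row per material
X = np.vstack([diamond, graphite])
print(X)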

This approach produced rows in a design matrix, where each material became a point in feature space. Algorithms like Random Forests could then map these vectors to target properties.

The method fails spectacularly for a simple reason: topology is invisible to feature vectors.

Consider Diamond and Graphite. Both are 100% carbon. A feature vector representation sees:

Material    Atomic % C    Density (g/cm³)    Avg. Coordination
Diamond     100           3.51               4
Graphite    100           2.26               ~3.3

The differences are secondary statistics—averaged quantities that obscure the fundamental distinction. Diamond’s tetrahedral sp³ network creates a 3D lattice with immense hardness. Graphite’s planar sp² sheets slide past each other, yielding a lubricant.

A spreadsheet sees similar materials; a structural engineer sees entirely different bonding topologies.

The Solution: Crystal Structures as Computational Graphs

The seminal 2018 paper by Xie and Grossman introduced Crystal Graph Convolutional Neural Networks (CGCNN), which recast the problem in graph-theoretic terms:

  1. Atoms become nodes, each carrying a feature vector describing the element.
  2. Bonds become edges, each carrying the interatomic distance.
  3. Target properties become functions learned over the entire graph.

This is not metaphorical. The crystallographic information file (CIF) for any material defines a periodic graph: the atomic sites are the nodes, and the lattice geometry fixes the interatomic distances that become edge features.

The Engineering Analogy: Truss Analysis

For mechanical engineers, the parallel is direct. Consider a steel Truss Bridge:

You cannot predict deflection just by knowing the “average beam length” (Excel method). You solve the system by propagating forces through the connectivity graph. A CGCNN does the same for crystal lattices. When predicting formation energy, the network propagates information through the bonding network, updating each atom’s representation based on its local coordination environment.

The Mechanism: Message Passing

The term “convolutional” in CGCNN derives from the message-passing operation, which proceeds in three steps (sketched in code after the list):

  1. Message Construction: Atom i asks its neighbors: “Who are you and how far away are you?”
  2. Aggregation: The atom sums up these messages.
  3. Update: The atom updates its own state based on this new information.
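A minimal sketch of one such step in plain NumPy (the weight shapes and the tanh update are illustrative; CGCNN itself uses a specific gated convolution):

import numpy as np

def message_passing_step(node_feats, edges, edge_feats, W_msg, W_self):
    # node_feats: (N, F) per-atom vectors; edges: list of (i, j) neighbor pairs;
    # edge_feats: dict (i, j) -> bond feature vector (e.g., an expanded distance)
    new_feats = node_feats @ W_self                      # start the update from self
    for i, j in edges:
        # 1. Message: neighbor identity concatenated with bond information
        msg = np.concatenate([node_feats[j], edge_feats[(i, j)]])
        # 2. Aggregation: sum the transformed messages into atom i
        new_feats[i] += msg @ W_msg
    return np.tanh(new_feats)                            # 3. Nonlinear update

# Toy usage: two atoms, 4-dim features, one scalar bond feature per edge
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4))
edges = [(0, 1), (1, 0)]
efeat = {e: np.array([1.544]) for e in edges}
out = message_passing_step(feats, edges, efeat,
                           W_msg=rng.normal(size=(5, 4)),
                           W_self=rng.normal(size=(4, 4)))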

Physical Interpretation: For an interstitial hydrogen atom in a BCC iron lattice, the model learns that proximity to Iron atoms (strong bonding) differs fundamentally from proximity to Carbon atoms (trapping)—distinctions invisible to composition-only features.

The Code: Demystifying the Graph Construction

Using the pymatgen library, converting a crystal structure to a graph is deterministic.

Note: CrystalNN below is not a Neural Network; it is a deterministic algorithm (based on Voronoi tessellation) used to identify which atoms are actually connected before the AI starts learning.

from pymatgen.core import Structure
from pymatgen.analysis.local_env import CrystalNN

# Load structure from CIF or Materials Project
structure = Structure.from_file("diamond.cif")

# Define the neighbor-finding algorithm (Voronoi-based; see note above)
nn_finder = CrystalNN()

# Build graph: nodes = atomic sites, edges = bonds
for i, site in enumerate(structure):
    neighbors = nn_finder.get_nn_info(structure, i)
    for neighbor in neighbors:
        j = neighbor['site_index']
        # 'weight' here is CrystalNN's bond weight, not a length, so
        # compute the actual bond length from the neighboring site
        distance = site.distance(neighbor['site'])  # Bond length (Å)

        # This is what the AI sees: a list of connections
        print(f"Atom {i} → Atom {j}: {distance:.3f} Å")

Output (diamond):

Atom 0 → Atom 1: 1.544 Å
Atom 0 → Atom 2: 1.544 Å
Atom 0 → Atom 3: 1.544 Å
Atom 0 → Atom 4: 1.544 Å

The graph structure is the material’s structure, represented in a format neural networks can process.
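To make “a format neural networks can process” concrete, here is a sketch of packing that neighbor list into the arrays most GNN frameworks consume (the array names follow common convention, not any particular library):

import numpy as np
from pymatgen.core import Structure
from pymatgen.analysis.local_env import CrystalNN

structure = Structure.from_file("diamond.cif")
nn_finder = CrystalNN()

# Node features: just the atomic number per site here (real models embed richer chemistry)
node_feats = np.array([[site.specie.Z] for site in structure])

# Edge list and edge features (bond lengths in Å)
edge_index, edge_feats = [], []
for i, site in enumerate(structure):
    for neighbor in nn_finder.get_nn_info(structure, i):
        edge_index.append((i, neighbor['site_index']))
        edge_feats.append(site.distance(neighbor['site']))

edge_index = np.array(edge_index).T   # shape (2, num_edges)
edge_feats = np.array(edge_feats)     # shape (num_edges,)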

Why This Matters for ML4MS

The CGCNN architecture achieved a Mean Absolute Error (MAE) of ~0.039 eV/atom for formation energy.

To put that in engineering terms: this sits at the threshold of “chemical accuracy” (1 kcal/mol ≈ 0.043 eV). It means the AI can predict whether a new alloy is thermodynamically stable (does it exist?) at close to the accuracy of the density functional theory (DFT) calculations it was trained on, in seconds rather than days of compute.

For an industrial engineer, this means we can now screen 10,000 potential alloys in an afternoon, identifying candidates for high-strength or corrosion-resistant applications before we ever melt a single bar in the lab.
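As a sketch of what that screening loop looks like, assuming a trained model exposed through a hypothetical predict_formation_energy function (replace the stub with real inference):

from pathlib import Path
from pymatgen.core import Structure

def predict_formation_energy(structure):
    # Stub standing in for a trained CGCNN-style model's inference call
    return 0.0  # eV/atom; dummy value for illustration

candidates = []
for cif in Path("alloy_library").glob("*.cif"):   # assumed directory of CIFs
    structure = Structure.from_file(cif)
    e_form = predict_formation_energy(structure)  # milliseconds per structure
    if e_form < 0.0:                              # negative => thermodynamically favored
        candidates.append((cif.name, e_form))

candidates.sort(key=lambda pair: pair[1])         # most stable first
print(f"{len(candidates)} candidates worth taking to the lab")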

The Evolution: From Distance to Angles

However, CGCNN had one major limitation: it only “saw” bond lengths. It was blind to bond angles, and in many crystals the spatial arrangement of bonds, not just their lengths, determines stability.
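A toy illustration of that blind spot: two local environments with identical bond lengths but different bond angles produce exactly the same distance-only edge features.

import numpy as np

def distance_features(center, neighbors):
    # All a distance-only model sees: one bond length per edge
    return sorted(round(float(np.linalg.norm(n - center)), 3) for n in neighbors)

center = np.zeros(3)
bent     = [np.array([1.5, 0.0, 0.0]), np.array([0.0, 1.5, 0.0])]   # 90° apart
straight = [np.array([1.5, 0.0, 0.0]), np.array([-1.5, 0.0, 0.0])]  # 180° apart

print(distance_features(center, bent))      # [1.5, 1.5]
print(distance_features(center, straight))  # [1.5, 1.5] -> indistinguishable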

This limitation led to the next generation of models (MACE, NequIP, and CHGNet), which incorporate full 3D geometry through equivariant representations, pushing accuracy closer to the quantum-mechanical reference data they are trained on.

I discuss these advanced models in my deep dive on Universal Potentials: 👉 Read: The Death of the Lookup Table: Understanding MACE & M3GNet

The shift from feature vectors to graphs isn’t about replacing human intuition. It’s about encoding materials the way we already think about them: as networks of interacting atoms where structure determines performance.