Recursive Feature Machines
Recursive Feature Machines (RFMs) provide a kernel-based approach to feature learning, iteratively reweighting features using the average gradient outer product (AGOP) to mimic neural-network feature learning in convex, interpretable models. Introduced by Radhakrishnan et al. (arXiv 2022; published in Science in 2024) in research linking neural feature learning to kernel methods, RFMs outperform standard baselines on tabular data and help explain phenomena such as grokking and lottery tickets.
Origins and Mechanism
RFMs emerged from analysis of feature learning in deep neural networks, where the AGOP captures gradient-based feature adaptation. The core algorithm recursively updates a feature matrix via fixed-point iteration on a learned kernel metric, yielding scalable, low-dimensional representations without backpropagation. Variants like diagonal or tree-structured RFMs extend the method to high-dimensional tasks such as molecular modeling.
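The loop described above can be sketched in a few lines. The following is a minimal illustration, assuming kernel ridge regression under a Gaussian kernel with a learned metric M; the published RFM uses a Laplace kernel and differs in details such as normalization, so this is a sketch of the idea rather than the reference implementation.

```python
import numpy as np

def mahalanobis_sq(X, Z, M):
    # pairwise squared Mahalanobis distances (x - z)^T M (x - z)
    XM = X @ M
    d = (XM * X).sum(1)[:, None] + (Z @ M * Z).sum(1)[None, :] - 2 * XM @ Z.T
    return np.maximum(d, 0.0)

def kernel(X, Z, M, L):
    # Gaussian kernel over the learned metric M (stand-in for the
    # Laplace kernel used in the RFM papers)
    return np.exp(-mahalanobis_sq(X, Z, M) / (2 * L**2))

def rfm_fit(X, y, iters=5, L=2.0, reg=1e-3):
    """Minimal RFM sketch: kernel ridge regression + AGOP reweighting."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(iters):
        K = kernel(X, X, M, L)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)
        # gradient of the kernel predictor at each training point
        grads = np.empty((n, d))
        for j in range(n):
            diff = X[j] - X                    # (n, d) rows x_j - x_i
            w = alpha * K[j]                   # alpha_i * K(x_j, x_i)
            grads[j] = -(M @ (diff.T @ w)) / L**2
        # average gradient outer product (AGOP) becomes the new metric
        M = grads.T @ grads / n
        M /= np.trace(M)                       # keep the scale stable
    return M, alpha

# toy check: the target depends only on the first coordinate,
# so the learned metric should concentrate its mass there
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0])
M, _ = rfm_fit(X, y)
print(np.diag(M))
```

The fixed-point character is visible here: the metric M produced by one AGOP step re-enters the kernel on the next iteration, so irrelevant coordinates are progressively downweighted.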
Application to AI Interpretability
Work by Beaglehole et al. (2025) adapts RFMs to extract abstract concepts (e.g., moods, biases) from large language models by iteratively refining features of neuron activations across layers. This RFM-based approach reveals internal representations that transfer across languages and tasks, enabling detection of hallucinations and of adversarial bypasses of safety guards. It also supports multi-concept steering: modulating outputs by combining concept vectors for precise control.
PDF: https://arxiv.org/pdf/2502.03708.pdf
GitHub code (related neural controllers): https://github.com/dmbeaglehole/neural_controllers
This work builds on the RFM foundations laid by Radhakrishnan et al. (2022) but focuses on interpretability applications.
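Multi-concept steering of the kind described above can be sketched as adding a weighted combination of concept directions to a hidden-state vector. The sketch below uses simple difference-of-means directions as a stand-in for RFM-extracted concept vectors, and all names and the synthetic activations are illustrative assumptions, not the paper's code.

```python
import numpy as np

def concept_vector(acts_pos, acts_neg):
    # stand-in for an RFM-extracted concept direction: the normalized
    # difference of mean activations with / without the concept
    v = acts_pos.mean(0) - acts_neg.mean(0)
    return v / np.linalg.norm(v)

def steer(activation, concepts, weights):
    # multi-concept steering: nudge one hidden state along a
    # weighted combination of concept directions
    for v, w in zip(concepts, weights):
        activation = activation + w * v
    return activation

# synthetic "activations" with two planted concept directions
rng = np.random.default_rng(1)
d = 64
mood_true = rng.standard_normal(d)
bias_true = rng.standard_normal(d)
mood = concept_vector(rng.standard_normal((50, d)) + mood_true,
                      rng.standard_normal((50, d)))
bias = concept_vector(rng.standard_normal((50, d)) + bias_true,
                      rng.standard_normal((50, d)))

h = rng.standard_normal(d)
# amplify "mood", suppress "bias" in one combined nudge
h_steered = steer(h, [mood, bias], weights=[1.5, -1.0])
```

Combining vectors this way is what makes the control "multi-concept": each weight independently dials one concept up or down in the same forward pass.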
Transformative Impact on Control
As a general-purpose concept extractor, RFMs shift AI interpretability from post-hoc attribution to proactive monitoring of a model's internal pathways, exposing "silent knowledge reservoirs" beyond surface outputs. This strengthens model resilience, ethical alignment, and human-AI collaboration by allowing dynamic nudges toward desired reasoning. Limitations include the computational cost of processing deep layers, but broad architecture compatibility positions RFMs as a cornerstone for trustworthy AI.
Relevance to symbolic AI KR
RFMs bridge neural feature learning with symbolic KR by extracting interpretable, hierarchical concepts from black-box models, aligning subsymbolic activations with structured representations. This relevance stems from your focus on neurosymbolic AI, ontology mapping, and agentic KR, where RFMs enable dynamic knowledge encoding for reasoning.
Feature Extraction as KR Primitive
RFMs recursively refine features via AGOP kernels, producing low-dimensional, transferable concept vectors akin to knowledge-graph embeddings or ontology primitives. In neural controllers, they map activations to abstract entities (e.g., biases, moods), supporting semantic similarity and inference without explicit symbols, directly aiding your LLM concept extraction work.

Neurosymbolic Integration
RFMs offer a convex alternative to LLMs for KR learning, countering Hinton/LeCun's "nail in the coffin" for symbols by hybridizing gradient-based learning with recursive hierarchies for agent interoperability. For IJCAI agentic AI, they enable "OntoMapNet"-style mappings from embeddings to taxonomies, boosting explainable multi-agent reasoning.

Control and Misrepresentation Mitigation
By exposing hidden concepts, RFMs detect KR misrepresentations like hallucinations or biases, facilitating steered updates in voice-driven or ontology-based agents. This transforms control from ad-hoc prompts to principled, recursive refinement, enhancing acquisitional efficiency in your complex systems research.
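Detecting a misrepresentation concept can be sketched as thresholding the alignment between a hidden state and a learned concept direction. The detector, names, and threshold below are hypothetical illustrations under that assumption, not the method from the paper.

```python
import numpy as np

def concept_score(h, v):
    # cosine similarity between a hidden state h and a concept direction v
    return float(h @ v / (np.linalg.norm(h) * np.linalg.norm(v)))

def flag_misrepresentation(h, v_concept, threshold=0.25):
    # hypothetical detector: flag activations that align strongly with a
    # concept direction (e.g., "hallucination") found by an RFM-style probe
    return concept_score(h, v_concept) > threshold

rng = np.random.default_rng(2)
d = 64
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                          # unit concept direction

h = rng.standard_normal(d)
h_clean = h - (h @ v) * v                       # no component along the concept
h_suspect = h_clean + 2.0 * np.linalg.norm(h_clean) * v
print(flag_misrepresentation(h_clean, v), flag_misrepresentation(h_suspect, v))
```

In a steered-update loop, a positive flag would trigger a corrective nudge along (or away from) the same direction, which is what moves control from ad-hoc prompts to principled refinement.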