verifiable reasoning
the inspiration for the project stems from my thoughts on onism. because of the realization that i won’t be able to experience the universe, i want to build a system that potentially can.
i was reading foundation recently. there’s a moment where hari seldon uses symbolic logic to strip a political treaty down to its formal structure. he scrubs away the rhetoric, the noise, and the human obfuscation, allowing him to reason about it scientifically as if it were a set of theorems. it’s fiction, but the idea landed somewhere in my brain.
why don’t we do something like that to the way LLMs think?
the problem with how current language models reason isn’t that they’re unintelligent, but that the reasoning is unverifiable and untraceable. when a model chains thoughts together, there’s no guarantee that any given step actually follows from what came before. hallucination isn’t always a knowledge problem: a model can “know” all the right facts and still connect them badly.
my idea: strip the input, the prompt, the retrieved context, everything, down to a formal set of axioms. then constrain the model’s chain-of-thought so each step only proceeds if it’s formally derivable from that axiom set. every thought has to be proved.
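as a toy sketch of what “every thought has to be proved” could mean: here’s a minimal forward-chaining check over propositional horn clauses. the names (`derivable`, the string propositions) and the horn-clause restriction are my own simplifications for illustration, not a real prover:

```python
# toy propositional prover: an axiom is (premises, conclusion), where
# premises is a frozenset of propositions; an empty premise set means
# the conclusion is a bare fact. a proposed reasoning step is accepted
# only if its conclusion is derivable by forward chaining.

def derivable(axioms, conclusion):
    """return True if `conclusion` follows from the horn-clause axioms."""
    known = {c for premises, c in axioms if not premises}
    changed = True
    while changed:
        changed = False
        for premises, c in axioms:
            if c not in known and premises <= known:
                known.add(c)
                changed = True
    return conclusion in known

# hypothetical axiom set distilled from some input text
axioms = [
    (frozenset(), "socrates_is_human"),
    (frozenset({"socrates_is_human"}), "socrates_is_mortal"),
]

assert derivable(axioms, "socrates_is_mortal")        # step accepted
assert not derivable(axioms, "socrates_is_immortal")  # step rejected
```

a real system would need first-order (or richer) logic and a serious prover backend, but the shape is the same: the model proposes, the checker disposes.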
this is hard in at least two ways. first, converting natural language into symbolic logic: a bunch of research is already going into this, and maybe some existing approach is good enough for the project, but the core issue is that language is slippery and formal systems are not. second, a model constrained this way may find shortcuts through axiom space, generating valid but useless proof paths to avoid doing real work. gaming the structure is something i’ll need to check for.
what will be interesting is what happens with fuzzy and imperfect axiom sets. i think the system will still help. at least this way, when something goes wrong, you can see where: which step broke, and from what, or why an agent decided to perform a certain action. interpretability as a first-class output, not an afterthought.
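to make that concrete: if every chain-of-thought step is forced through a checker, a broken chain fails at a specific index instead of silently drifting. a minimal sketch, assuming the steps are already in formal form and each step must follow from the axioms plus all previously accepted steps (`check_chain` and the toy entailment function are hypothetical names of mine):

```python
def check_chain(axioms, steps, entails):
    """verify a reasoning chain step by step.
    `entails(premises, claim)` is any checker (e.g. a prover call).
    returns (True, None) on success, or (False, i) where i is the
    index of the first step that doesn't follow."""
    accepted = set(axioms)
    for i, step in enumerate(steps):
        if not entails(accepted, step):
            return False, i   # exact step where the reasoning broke
        accepted.add(step)
    return True, None

# toy entailment: a claim follows only if it's already among the
# premises (a stand-in for a real prover)
toy_entails = lambda premises, claim: claim in premises

ok, broken_at = check_chain({"p", "q"}, ["p", "r"], toy_entails)
# ok == False, broken_at == 1: step "r" doesn't follow from {p, q, p}
```

the failure index is the interpretability payoff: instead of a plausible-sounding paragraph, you get a pointer to the exact inference that has no support.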
of course, this is going to make the LLM incredibly slow. but there are surely domains where that’s acceptable and verifiability matters more than latency. perhaps this could be a tool call?
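if it lived behind a tool call, the declaration might look something like this. the schema below follows the common json-schema style used for function/tool declarations; the tool name `verify_step` and its fields are placeholders of mine, not any provider’s real API:

```python
# hypothetical tool declaration for a step verifier; the model would
# call this with its current axioms and a candidate step, and only
# continue reasoning from steps the verifier accepts
verify_step_tool = {
    "name": "verify_step",
    "description": "check whether a candidate reasoning step is "
                   "formally derivable from the given axiom set",
    "parameters": {
        "type": "object",
        "properties": {
            "axioms": {"type": "array", "items": {"type": "string"}},
            "step": {"type": "string"},
        },
        "required": ["axioms", "step"],
    },
}
```

the nice property of the tool-call framing is that the prover stays outside the model: the LLM never gets to grade its own homework.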
there’s prior work on this in formal mathematics. systems like safe and hilbert can verify LLM reasoning on math proofs with high accuracy, but they require structured inputs. they’re already operating inside the formal domain. the open problem is the upstream step: how can we get there from raw language?
this is going to be a fun project to try out. maybe it can be a bridge to understanding whether LLMs actually understand things or just pattern-match at scale.
i’ll keep you updated
rishabh