Researchers from MIT CSAIL have developed PDDL-INSTRUCT, a framework designed to improve the multi-step planning capabilities of large language models (LLMs) by combining logical reasoning with an external plan validator.
The PDDL-INSTRUCT framework trains models to recognize and explain why a candidate plan fails, identifying unsatisfied preconditions, incorrect effects, frame violations, or an unmet goal. It does this through logical chain-of-thought prompts that guide the LLM through step-by-step inference over state and action transitions, producing traceable state→action→state sequences written as ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ triples.
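The per-step check that such a reasoning chain is expected to reproduce can be illustrated with a small sketch under STRIPS-style semantics. The `Action` class, `check_transition` helper, and the Blocksworld-flavored example below are hypothetical illustrations of the error categories described above, not code from the paper.

```python
from dataclasses import dataclass

# Hypothetical ground action with STRIPS-style preconditions and effects;
# the names are illustrative, not taken from the PDDL-INSTRUCT codebase.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset   # atoms that must hold in s_i
    add_effects: frozenset     # atoms made true by the action
    del_effects: frozenset     # atoms made false by the action

def check_transition(s_i: set, a: Action, s_next: set) -> list:
    """Explain why a claimed <s_i, a, s_i+1> transition is invalid (empty list = valid)."""
    errors = []
    # 1. Unsatisfied preconditions
    missing = a.preconditions - s_i
    if missing:
        errors.append(f"unsatisfied preconditions of {a.name}: {sorted(missing)}")
    # 2. Expected successor under STRIPS semantics: (s_i \ del_effects) ∪ add_effects
    expected = (s_i - a.del_effects) | a.add_effects
    # 3. Incorrect effects / frame violations: any deviation from the expected state
    wrongly_added = s_next - expected
    wrongly_removed = expected - s_next
    if wrongly_added:
        errors.append(f"atoms asserted without justification: {sorted(wrongly_added)}")
    if wrongly_removed:
        errors.append(f"atoms dropped without justification: {sorted(wrongly_removed)}")
    return errors

# Example: a Blocksworld-style pickup applied in a state where the hand is not empty
pickup_a = Action("pickup(a)",
                  preconditions=frozenset({"clear(a)", "ontable(a)", "handempty"}),
                  add_effects=frozenset({"holding(a)"}),
                  del_effects=frozenset({"clear(a)", "ontable(a)", "handempty"}))
print(check_transition({"clear(a)", "ontable(a)"}, pickup_a, {"holding(a)"}))
```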
For external validation, PDDL-INSTRUCT integrates the VAL plan validator, which checks each step of a generated plan and returns feedback that is either binary (valid/invalid) or detailed, with the detailed feedback yielding superior performance. Training follows a two-stage optimization process: the first stage penalizes errors in the reasoning chains, and the second optimizes for final planning accuracy.
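A rough sketch of such a validate-and-refine loop is shown below. Here `llm_generate` and `val_validate` are hypothetical stand-ins for the tuned model and a wrapper around the VAL validator, and the prompt format is purely illustrative.

```python
from typing import Callable, Optional

# Minimal sketch of the external-validation loop, assuming `llm_generate(prompt)`
# calls the tuned model and `val_validate(domain, problem, plan)` wraps VAL,
# returning (is_valid, detailed_report). Neither name comes from the paper.
def plan_with_feedback(domain: str, problem: str,
                       llm_generate: Callable[[str], list],
                       val_validate: Callable[[str, str, list], tuple],
                       feedback_budget: int = 5,
                       detailed: bool = True) -> Optional[list]:
    prompt = f"Domain:\n{domain}\n\nProblem:\n{problem}\n\nProduce a step-by-step plan."
    for _ in range(feedback_budget):
        plan = llm_generate(prompt)                      # candidate plan, one action per entry
        is_valid, report = val_validate(domain, problem, plan)
        if is_valid:
            return plan
        # Detailed feedback names the failing step and the violated condition;
        # binary feedback only says "invalid", which the article reports as weaker.
        feedback = report if detailed else "The previous plan was invalid."
        plan_text = "\n".join(plan)
        prompt += f"\n\nPrevious plan:\n{plan_text}\n\nValidator feedback:\n{feedback}\n\nRevise the plan."
    return None  # feedback budget exhausted without producing a valid plan
```

In this sketch, raising `feedback_budget` and keeping `detailed=True` correspond to the longer feedback budgets and richer validator output that the researchers found most effective.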
The effectiveness of PDDL-INSTRUCT was evaluated on the PlanBench benchmark, which includes planning domains known to challenge LLMs, such as Blocksworld, Mystery Blocksworld, and Logistics. In the Blocksworld domain, a tuned Llama-3-8B model produced valid plans for 94% of problems, significantly outperforming previous models. Notably, PDDL-INSTRUCT achieved up to a 64-fold improvement in the Mystery Blocksworld domain, where predicate names are obfuscated to prevent pattern matching.
Significant performance gains were also recorded in the Logistics domain. Across all test domains, the framework delivered up to a 66% absolute improvement compared to untuned baseline models. Researchers observed that performance improved with longer feedback budgets and more detailed output from the validator.
The current implementation of PDDL-INSTRUCT applies to classical PDDL domains and relies on the VAL validator as an external oracle. The results demonstrate a way of grounding LLM reasoning in formal semantics, a direction relevant to agent systems that include a verifier in the planning loop. Extending the framework to long-horizon, temporal, numeric, and cost-sensitive planning tasks remains an area for further work.




