MIT Researchers Boost LLM Planning with New Framework

Researchers from MIT CSAIL have developed PDDL-INSTRUCT, a framework designed to improve the multi-step planning capabilities of large language models (LLMs) by combining logical reasoning with an external plan validator.

PDDL-INSTRUCT trains models to recognize and explain why a candidate plan fails, whether through an unsatisfied precondition, an incorrectly applied effect, a frame violation, or an unmet goal. Logical chain-of-thought prompts guide the LLM through step-by-step inference over states and action transitions, producing a traceable sequence of state→action→state triples, written ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩.
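
To make the state→action→state idea concrete, here is a minimal Python sketch of a STRIPS-style plan trace. It is an illustration only, not the researchers' code: the `Action` class, the `trace_plan` function, and the tiny Blocksworld facts are all hypothetical names chosen for this example. It checks preconditions and the goal; it does not model every failure type the framework covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action: facts are plain strings."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def trace_plan(initial_state, plan, goal):
    """Simulate a plan step by step.

    Returns (triples, error). Each triple is (state, action name, next state).
    error is None when every precondition holds and the goal is reached.
    """
    state = frozenset(initial_state)
    triples = []
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            return triples, (
                f"step {step} ({action.name}): "
                f"unsatisfied preconditions {sorted(missing)}"
            )
        # Frame assumption: facts not deleted or added carry over unchanged.
        next_state = (state - action.del_effects) | action.add_effects
        triples.append((state, action.name, next_state))
        state = next_state
    unmet = frozenset(goal) - state
    if unmet:
        return triples, f"goal not reached: missing {sorted(unmet)}"
    return triples, None


# Toy Blocksworld instance (assumed for illustration).
initial = {"ontable a", "ontable b", "clear a", "clear b", "handempty"}
pickup_a = Action(
    "pickup a",
    frozenset({"ontable a", "clear a", "handempty"}),
    frozenset({"holding a"}),
    frozenset({"ontable a", "clear a", "handempty"}),
)
stack_a_b = Action(
    "stack a b",
    frozenset({"holding a", "clear b"}),
    frozenset({"on a b", "clear a", "handempty"}),
    frozenset({"holding a", "clear b"}),
)
goal = {"on a b"}

good_triples, good_error = trace_plan(initial, [pickup_a, stack_a_b], goal)
bad_triples, bad_error = trace_plan(initial, [stack_a_b], goal)
```

In this sketch the two-step plan yields two triples and no error, while the one-step plan stops immediately because `holding a` is not true in the initial state.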

For external validation, PDDL-INSTRUCT uses the VAL plan validator, which checks each step of a generated plan and returns feedback that is either binary (valid/invalid) or detailed. Detailed feedback produced the better results. Training uses a two-stage optimization process: the first stage penalizes errors in the reasoning chains, and the second optimizes final planning accuracy.
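
The difference between binary and detailed feedback can be sketched with a toy refinement loop. This is an assumption-laden illustration, not the framework's training procedure and not the VAL interface: `make_feedback`, `refine`, `toy_validate`, and `toy_propose` are hypothetical stand-ins, and the "model" here is a hard-coded function that only repairs its plan when the feedback names the missing fact.

```python
from typing import Callable, List, Optional, Tuple


def make_feedback(error: Optional[str], detailed: bool) -> str:
    """Format a validator result as binary or detailed feedback."""
    if error is None:
        return "valid"
    return f"invalid: {error}" if detailed else "invalid"


def refine(
    propose: Callable[[str], List[str]],
    validate: Callable[[List[str]], Optional[str]],
    budget: int,
    detailed: bool = True,
) -> Tuple[Optional[List[str]], List[str]]:
    """Propose, validate, and feed the result back, up to `budget` rounds."""
    feedback = ""
    history = []
    for _ in range(budget):
        plan = propose(feedback)
        error = validate(plan)
        feedback = make_feedback(error, detailed)
        history.append(feedback)
        if error is None:
            return plan, history
    return None, history


# Stand-in validator: accepts exactly one plan (assumed for illustration).
def toy_validate(plan: List[str]) -> Optional[str]:
    if plan == ["pickup a", "stack a b"]:
        return None
    return "step 1 (stack a b): unsatisfied preconditions ['holding a']"


# Stand-in "model": fixes the plan only if told which fact is missing.
def toy_propose(feedback: str) -> List[str]:
    if "holding a" in feedback:
        return ["pickup a", "stack a b"]
    return ["stack a b"]


detailed_plan, detailed_history = refine(toy_propose, toy_validate, budget=3, detailed=True)
binary_plan, binary_history = refine(toy_propose, toy_validate, budget=3, detailed=False)
```

With detailed feedback the toy proposer corrects itself on the second round; with binary feedback it exhausts the budget without a valid plan. The outcome is built into the toy proposer, so it shows only why a more informative signal can help, not how much.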

PDDL-INSTRUCT was evaluated on the PlanBench benchmark, which includes planning domains known to challenge LLMs, such as Blocksworld, Mystery Blocksworld, and Logistics. In Blocksworld, a tuned Llama-3-8B model generated valid plans 94% of the time, significantly outperforming previous models. In Mystery Blocksworld, where predicate names are obfuscated to prevent pattern matching, PDDL-INSTRUCT achieved up to a 64-fold improvement.

Significant gains were also recorded in the Logistics domain. Across all test domains, the framework delivered an absolute improvement of up to 66 percentage points over untuned baseline models. The researchers observed that performance improved with longer feedback budgets and with more detailed output from the validator.

The current implementation of PDDL-INSTRUCT applies to classical PDDL domains and relies on the VAL validator as an external oracle. The results demonstrate a method for grounding LLM reasoning in formal semantics for use in agent systems that include a verifier during planning. Extending the framework to handle long-horizon, temporal, numeric, and cost-sensitive planning tasks remains an area for further work.
