LLMs have demonstrated remarkable capabilities across a wide range of natural language tasks (see Language Models are Few-Shot Learners and the GPT-4 Technical Report). However, using LLMs as autonomous agents that can interact with environments and solve complex problems remains challenging. Recent work has explored various approaches to enhance the abilities of LLMs in areas such as reasoning, planning, and decision-making.
In Four AI Agent Strategies That Improve GPT-4 and GPT-3.5 Performance, the author highlights that agentic workflows, in which an LLM iterates over its outputs multiple times with feedback, can yield significant performance gains compared to single-pass generation. On benchmarks like HumanEval for code generation, agentic approaches using GPT-3.5 have achieved up to 95.1% accuracy, far surpassing the 67% of GPT-4 in zero-shot settings.
The author outlines four key design patterns for agentic workflows: reflection, tool use, planning, and multi-agent collaboration.
A key development has been the advent of chain-of-thought prompting, which allows LLMs to break down complex problems into steps. Building on this, several frameworks have emerged that leverage multiple LLM instances working collaboratively as agents. For example, the Reflexion framework uses verbal reinforcement learning, where an LLM agent provides feedback on its own outputs and uses that to improve in subsequent iterations. This allows for targeted improvements without requiring model fine-tuning.
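The self-feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the Reflexion authors' implementation: `call_model` is a hypothetical stand-in for any LLM API, stubbed here with canned responses so the example runs without network access.

```python
# Sketch of a Reflexion-style verbal feedback loop: draft an answer,
# critique it, and feed the critique back in -- no fine-tuning involved.

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM call; returns canned responses.
    if "Critique" in prompt:
        return "too vague" if "v1" in prompt else "OK"
    return "v2 answer" if "too vague" in prompt else "v1 answer"

def reflexion_loop(task: str, max_iters: int = 3) -> str:
    answer = call_model(f"Task: {task}")
    for _ in range(max_iters):
        feedback = call_model(f"Critique this answer: {answer}")
        if feedback == "OK":
            break
        # Verbal feedback is injected into the next prompt, so the
        # model improves across iterations without weight updates.
        answer = call_model(
            f"Task: {task}\nPrevious answer was {feedback}. Revise: {answer}"
        )
    return answer
```

With the stub, the first draft ("v1 answer") is judged "too vague", revised once, and the second draft passes the critique; a real deployment would replace the stub with actual model calls and a task-specific critique prompt.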
Multi-agent debate (MAD) approaches take collaboration further by having multiple distinct LLM agents discuss and reason about problems together; such debates have been shown to improve factuality and reasoning in language models. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate proposed a debate framework with "angel" and "devil" agents to encourage divergent thinking. The CAMEL framework explored role-playing between agents with distinct personas.
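A minimal sketch of one such debate round, loosely following the "angel"/"devil" setup: one agent defends a candidate answer, the other challenges it, and a judge settles the outcome. All three roles are hypothetical stubs standing in for separate LLM instances with distinct personas.

```python
# Sketch of a multi-agent debate (MAD) loop: devil challenges,
# angel defends, judge aggregates the transcript into a verdict.

def devil(question: str, history: list[str]) -> str:
    # Divergent-thinking role: always raises an objection.
    return "Consider edge cases that contradict answer A."

def angel(question: str, history: list[str]) -> str:
    # Defender role: responds to the most recent objection.
    last = history[-1] if history else "no objections yet"
    return f"I maintain answer A, addressing: {last}"

def judge(question: str, history: list[str]) -> str:
    # A real judge model would weigh both sides of the transcript.
    return "A"

def debate(question: str, rounds: int = 2) -> str:
    history: list[str] = []
    for _ in range(rounds):
        history.append(devil(question, history))
        history.append(angel(question, history))
    return judge(question, history)
```

The design point is that divergence is built in structurally: the devil's objections force the angel to justify or revise the answer each round, rather than relying on a single model to critique itself.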