Gemini-Powered Automatic Evaluation and Prompt Refinement System
Crafting effective prompts for nuanced text simplification is hard: the output has to be easier to read without sacrificing meaning or detail. To tackle this, we developed an automated system that uses Gemini models to evaluate simplification quality and to refine the simplification prompt itself. This approach supports rapid experimentation and tuning to discover the most effective prompt strategies.
Automated Evaluation
Manual evaluation of simplification quality is impractical when prompts change quickly and often. Our automated system relies on two evaluation components:
- Readability Assessment: Traditional readability metrics, such as the Flesch-Kincaid index, rely on surface features like sentence length and syllable counts, so they often fall short in capturing the nuances of comprehension. To overcome this limitation, we used a Gemini prompt to rate text readability on a scale of 1 to 10. This prompt was iteratively refined through comparisons with human judgment, leading to a more sophisticated evaluation of how easy the text is to understand. Testing revealed that our LLM-based readability assessment aligns closely with human evaluations, offering a more accurate measure of readability than conventional metrics.
- Fidelity Assessment: Preserving the original meaning during simplification is paramount. Using Gemini 1.5 Pro, we implemented a fidelity assessment that maps claims from the original text to the simplified version. This lets us identify specific error types (information loss, gain, or distortion), each assigned a severity weight. The result is a detailed measure of how faithfully the simplified version reflects the original text’s meaning, covering both completeness and entailment.
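The two evaluation components above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the model calls are left out entirely, and the names `parse_readability`, `ClaimError`, `fidelity_score`, and the values in `SEVERITY_WEIGHTS` are all assumptions made for the example. The article does not state the real severity weights or how errors are aggregated; here the weighted error total is simply normalized by the number of claims in the original text.

```python
import re
from dataclasses import dataclass
from typing import Sequence

# Hypothetical severity weights per error type. The real values are not
# given in the article; these are placeholders for illustration only.
SEVERITY_WEIGHTS = {"loss": 1.0, "gain": 0.5, "distortion": 2.0}


@dataclass
class ClaimError:
    """One problem found when mapping an original claim to the simplified text."""
    claim: str
    error_type: str  # must be a key of SEVERITY_WEIGHTS


def parse_readability(reply: str) -> int:
    """Extract the first standalone integer from 1 to 10 in a model reply."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    if match is None:
        raise ValueError(f"no 1-10 rating found in reply: {reply!r}")
    return int(match.group(1))


def fidelity_score(num_claims: int, errors: Sequence[ClaimError]) -> float:
    """Return 1.0 for an error-free simplification, lower as weighted errors add up.

    Assumed aggregation: total severity divided by the number of original
    claims, subtracted from 1 and clamped at 0.
    """
    if num_claims <= 0:
        raise ValueError("num_claims must be positive")
    penalty = sum(SEVERITY_WEIGHTS[e.error_type] for e in errors)
    return max(0.0, 1.0 - penalty / num_claims)
```

For example, `parse_readability("Score: 7/10")` yields 7, and a ten-claim text with one `loss` and one `gain` error loses 1.5 weighted points out of 10 under these placeholder weights.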
Iterative Prompt Refinement: LLMs Optimizing LLMs
The quality of the final simplification output, generated by Gemini 1.5 Flash, depends heavily on the quality of the simplification prompt. We therefore automated prompt optimization with a refinement loop. In this loop, the auto-eval scores for readability and fidelity feed back into the system, where another instance of Gemini 1.5 Pro analyzes how the current simplification prompt performed and proposes a refined prompt for the next iteration.
This iterative feedback loop lets one LLM improve another LLM's instructions based on performance metrics. By reducing the reliance on manual prompt engineering, the system can discover effective simplification strategies on its own. The refinement loop ran for 824 iterations before performance plateaued.
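The shape of such a loop can be sketched as follows. This is an illustrative outline under stated assumptions, not the system's code: `simplify`, `evaluate`, and `propose` are hypothetical callables standing in for the Gemini 1.5 Flash simplifier, the auto-eval (readability and fidelity collapsed into a single number, which the article does not specify), and the Gemini 1.5 Pro refiner. The `patience` stopping rule is likewise an assumed way to detect a plateau.

```python
from typing import Callable, Tuple


def refine_prompt(
    initial_prompt: str,
    simplify: Callable[[str], str],        # prompt -> simplified text (hypothetical)
    evaluate: Callable[[str], float],      # simplified text -> combined score (hypothetical)
    propose: Callable[[str, float], str],  # (prompt, score) -> refined prompt (hypothetical)
    max_iters: int = 1000,
    patience: int = 3,
) -> Tuple[str, float, int]:
    """Iteratively refine a prompt until the score stops improving.

    Returns the best prompt seen, its score, and the number of
    refinement iterations that were run.
    """
    prompt = initial_prompt
    score = evaluate(simplify(prompt))
    best_prompt, best_score = prompt, score
    stale = 0        # consecutive iterations without a new best score
    iterations = 0

    while iterations < max_iters and stale < patience:
        # The refiner sees the previous prompt and how it scored.
        prompt = propose(prompt, score)
        score = evaluate(simplify(prompt))
        iterations += 1
        if score > best_score:
            best_prompt, best_score, stale = prompt, score, 0
        else:
            stale += 1

    return best_prompt, best_score, iterations
```

Passing the model calls in as plain functions keeps the loop testable with stubs and makes it easy to swap in a different evaluator or stopping rule.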
The Innovation of Automated Processes
Automating both evaluation and prompt refinement is the core innovation here. By having one LLM evaluate the output of another and refine its instructions based on specific performance metrics, namely readability and fidelity, the system moves beyond the limits of manual prompt engineering. This streamlines development and makes it possible to identify highly effective strategies for nuanced simplification over many iterations.
The main implication is better accessibility of complex information: key details are preserved while readability improves. As we continue to refine this system, the potential for automating language simplification grows, pointing toward a future where information can be easily understood by diverse audiences without losing its essence.
By harnessing the power of Gemini models, we are paving the way for advanced automated solutions that address the challenges of language simplification in an increasingly complex world.
Inspired by: Source

