Promptimizer explores mutations of your prompt — shortening, clarifying, adding examples, tightening constraints — and scores each candidate against your rubric to find the best version automatically.
Never stored — lives only in memory for this session.
Each box is a separate test input. Add at least two to enable train/test splitting.
Uses GPT-4o to convert your free-text rubric into structured scoring criteria.
Maximum depth of the search tree. Deeper trees explore more mutations at higher cost.
Total nodes to explore before stopping.
How much to penalize a candidate prompt that is longer than its parent.
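One way a length penalty like this could work is sketched below. The exact formula is not stated here, so the linear penalty on relative growth, the function name `penalized_score`, and the use of character counts are all assumptions made for illustration.

```python
def penalized_score(raw_score: float, prompt: str, parent: str, weight: float) -> float:
    """Hypothetical helper: subtract a penalty that grows with extra length.

    Assumes length is measured in characters and the penalty is linear in
    the relative growth over the parent. Shorter prompts are not rewarded.
    """
    extra_chars = max(0, len(prompt) - len(parent))
    relative_growth = extra_chars / max(1, len(parent))
    return raw_score - weight * relative_growth
```

With `weight=0` the rubric score is used unchanged; larger weights make the search favor candidates that do not grow.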
After the optimization run, the top-scoring nodes are evaluated on held-out test inputs to check for overfitting. Only applies when 2+ eval inputs are provided.
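A minimal sketch of a held-out check, assuming the eval inputs are split by position and overfitting is judged by the gap between train and test scores. The names `split_inputs` and `overfit_gap` and the default 50% test fraction are hypothetical, not taken from the tool.

```python
def split_inputs(inputs, test_fraction=0.5):
    """Hypothetical split: hold out the last part of the eval inputs.

    With fewer than two inputs there is nothing to hold out, so the
    test set is empty and the held-out check is skipped.
    """
    if len(inputs) < 2:
        return list(inputs), []
    n_test = max(1, int(len(inputs) * test_fraction))
    n_test = min(n_test, len(inputs) - 1)  # always keep one training input
    return list(inputs[:-n_test]), list(inputs[-n_test:])


def overfit_gap(train_score: float, test_score: float) -> float:
    """A large positive gap suggests the prompt overfits the training inputs."""
    return train_score - test_score
```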
A UCB bandit tracks which mutation types consistently beat their parent and allocates more budget to them. This mirrors the multi-armed bandit explore/exploit tradeoff.
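For illustration, here is a sketch of the standard UCB1 rule applied to mutation types. It assumes a "win" means a child scored higher than its parent. The function name `ucb1_pick`, the stats layout, and the exploration constant are assumptions, and the tool's actual variant may differ.

```python
import math


def ucb1_pick(stats: dict, c: float = math.sqrt(2)) -> str:
    """Pick the next mutation type with UCB1.

    stats maps a mutation type to (wins, trials), where a win is assumed
    to mean the mutated prompt beat its parent.
    """
    if not stats:
        raise ValueError("no mutation types to choose from")
    total_trials = sum(trials for _, trials in stats.values())
    best_name, best_value = None, float("-inf")
    for name, (wins, trials) in stats.items():
        if trials == 0:
            return name  # try every mutation type at least once
        exploit = wins / trials
        explore = c * math.sqrt(math.log(total_trials) / trials)
        if exploit + explore > best_value:
            best_name, best_value = name, exploit + explore
    return best_name
```

Mutation types with a high win rate are picked more often, while rarely tried types keep a large exploration bonus until they have been sampled enough.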
Adds decaying noise to beam selection so lower-scoring nodes can survive early generations — helps escape local optima. Temperature cools 35% per generation.
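The cooling schedule can be sketched as follows. The 0.65 multiplier follows from the 35% cooling stated above. The Gaussian noise, the initial temperature of 1.0, and the names `temperature` and `select_beam` are assumptions for illustration only.

```python
import random

COOLING = 0.65  # temperature drops 35% each generation


def temperature(initial: float, generation: int) -> float:
    """Temperature after `generation` cooling steps."""
    return initial * COOLING ** generation


def select_beam(scored, beam_width, generation, initial_temp=1.0, rng=None):
    """Keep the top `beam_width` candidates after adding noise to scores.

    scored is a list of (name, score) pairs. The noise is assumed to be
    Gaussian with standard deviation equal to the current temperature.
    """
    rng = rng or random.Random()
    temp = temperature(initial_temp, generation)
    noisy = [(score + rng.gauss(0.0, temp), name) for name, score in scored]
    noisy.sort(reverse=True)
    return [name for _, name in noisy[:beam_width]]
```

Under this schedule the noise scale falls to about 42% of its starting value after two generations, so later generations select almost purely on score.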
Stop the run if the best score improves by less than this between depths. Lower values let the run go deeper before giving up. Default 0.1.
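A minimal sketch of this stopping rule, assuming the best score is recorded once per depth. The function name `should_stop` and the list-based history are hypothetical.

```python
def should_stop(best_by_depth, min_improvement: float = 0.1) -> bool:
    """Return True when the latest depth improved the best score too little.

    best_by_depth holds the best score seen at each depth, in order.
    """
    if len(best_by_depth) < 2:
        return False  # nothing to compare against yet
    improvement = best_by_depth[-1] - best_by_depth[-2]
    return improvement < min_improvement
```

For example, a history of `[6.0, 7.0, 7.05]` stops the run at the default threshold, because the last depth gained only about 0.05.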