SkyDiscover includes ~200 benchmarks across math, systems, algorithms, and reasoning domains. Each benchmark demonstrates how to set up and run evolutionary search for different types of optimization problems.

Example Categories

Math Optimization

Circle packing, Heilbronn problems, autocorrelation inequalities, and geometric optimization

Systems Optimization

Cloud scheduling, load balancing, model placement, and database optimization

Algorithm Design

Competitive programming problems from the Frontier-CS benchmark (172 tasks)

Custom Problems

Learn how to create your own benchmarks with custom evaluators

Quick Start

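All benchmarks are run the same way: change into the benchmark directory, then pass the initial program, the evaluator, and the config to skydiscover-run: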
cd benchmarks/math/circle_packing

uv run skydiscover-run \
  initial_program.py \
  evaluator.py \
  -c config.yaml \
  -s adaevolve \
  -i 100
To use a different search algorithm, replace adaevolve after -s with evox, openevolve, gepa, or shinkaevolve.
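To compare several search algorithms on one benchmark, the command above can be generated programmatically. The following is a minimal sketch, not part of SkyDiscover: `build_command` and `i_value` are hypothetical names, and the only flags used are the ones shown in the Quick Start command.

```python
import shlex

# Algorithm names listed in the Quick Start section.
ALGORITHMS = ["adaevolve", "evox", "openevolve", "gepa", "shinkaevolve"]


def build_command(algorithm: str, i_value: int = 100) -> list[str]:
    """Return the skydiscover-run argument list for one search algorithm.

    i_value is passed through unchanged to the -i flag from the Quick Start.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown search algorithm: {algorithm}")
    return [
        "uv", "run", "skydiscover-run",
        "initial_program.py",
        "evaluator.py",
        "-c", "config.yaml",
        "-s", algorithm,
        "-i", str(i_value),
    ]


if __name__ == "__main__":
    for name in ALGORITHMS:
        # Print only. To execute, pass the list to subprocess.run()
        # with cwd set to the benchmark directory.
        print(shlex.join(build_command(name)))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later handed to `subprocess.run`.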

Benchmark Structure

Every benchmark contains three core files:
1. Initial Program

The starting solution with an EVOLVE-BLOCK marking the code to be evolved:
initial_program.py
# EVOLVE-BLOCK-START
def solve(input_data):
    # Your initial solution here
    result = input_data  # placeholder: returns the input unchanged
    return result
# EVOLVE-BLOCK-END
2. Evaluator

A scoring function that returns a combined_score (higher is better):
evaluator.py
def evaluate(program_path: str) -> dict:
    # Load and run the program
    # Compute performance metrics
    return {"combined_score": 0.73}  # other metrics can be added as extra keys
3. Configuration

System prompt and search settings:
config.yaml
system_prompt: "Optimize the circle packing algorithm..."
language: python
diff_based_generation: true
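To make the relationship between the initial program and the evaluator concrete, here is a minimal sketch on a hypothetical toy task (approximating the sum of a list). The task, the fixed test input, and the scoring formula are all assumptions chosen for illustration; only the `evaluate(program_path) -> dict` shape, the `combined_score` key, and the EVOLVE-BLOCK markers come from the description above. Loading the candidate with `importlib` is one possible approach, not necessarily how the bundled evaluators do it.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical toy starting solution for the "sum a list" task.
INITIAL_PROGRAM = '''\
# EVOLVE-BLOCK-START
def solve(input_data):
    # Deliberately weak starting point: only the first element is counted.
    return input_data[0]
# EVOLVE-BLOCK-END
'''


def evaluate(program_path: str) -> dict:
    """Load the candidate program, run solve(), and return a score in (0, 1]."""
    try:
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        data = [1.0, 2.0, 3.0, 4.0]
        error = abs(module.solve(data) - sum(data))
    except Exception:
        # A crashing candidate gets the worst score instead of raising.
        return {"combined_score": 0.0}
    # Higher is better: 1.0 for an exact answer, approaching 0 as the error grows.
    return {"combined_score": 1.0 / (1.0 + error), "abs_error": error}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "initial_program.py"
        path.write_text(INITIAL_PROGRAM)
        print(evaluate(str(path)))
```

Mapping the raw error into a bounded score keeps "higher is better" true for a minimization task, and catching exceptions means a broken candidate is scored rather than halting the evaluation.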

Available Benchmarks

| Domain | Benchmarks | Example Problems |
| --- | --- | --- |
| Math | 14 tasks | Circle packing, Erdos problems, Heilbronn triangle |
| Systems | 5 tasks | Cloud routing, MoE load balancing, GPU scheduling |
| GPU | 4 tasks | Triton kernel optimization (vecadd, matmul) |
| Algorithms | 172 tasks | Competitive programming (Frontier-CS) |
| Reasoning | Multiple | ARC-AGI visual reasoning |
| Prompts | 1 task | Natural language prompt evolution (HotPotQA) |

Installation

Install dependencies based on which benchmarks you want to run:
uv sync
Some benchmarks ship an additional requirements.txt file in their directory. Install it with:
uv pip install -r benchmarks/<task>/requirements.txt

Environment Setup

Set your API key before running:
export OPENAI_API_KEY="sk-..."
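A missing key is easier to diagnose before a run starts than partway through it. The check below is a small sketch for that purpose; `api_key_is_set` is a hypothetical helper, not part of SkyDiscover, and it only tests that the variable is non-empty, not that the key is valid.

```python
import os


def api_key_is_set(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_is_set():
        print("OPENAI_API_KEY found.")
    else:
        print("OPENAI_API_KEY is not set; export it before running a benchmark.")
```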

Next Steps

Math Examples

Explore mathematical optimization problems

Systems Examples

Learn about systems optimization tasks

Create Custom

Build your own benchmark

View Benchmarks

Browse all benchmarks on GitHub