Implement your own search strategies by extending ProgramDatabase and DiscoveryController
SkyDiscover’s architecture lets you plug in custom search algorithms without rewriting the generate-evaluate loop. You only implement what changes—everything else is inherited.
Here’s the full implementation of TopK search (56 lines):
skydiscover/search/topk/database.py
import loggingfrom typing import List, Optional, Tuplefrom skydiscover.config import DatabaseConfigfrom skydiscover.search.base_database import Program, ProgramDatabaselogger = logging.getLogger(__name__)class TopKDatabase(ProgramDatabase): """Database for top-k programs""" def __init__(self, name: str, config: DatabaseConfig): super().__init__(name, config) self.initial_program = None def add(self, program: Program, iteration: Optional[int] = None, **kwargs) -> str: """Add a program to the database.""" # Store the initial program if iteration == 0 or program.iteration_found == 0: self.initial_program = program # Store the program self.programs[program.id] = program # Track last iteration if iteration is not None: self.last_iteration = max(self.last_iteration, iteration) # Save to disk if configured if self.config.db_path: self._save_program(program) # Update best program tracking (required) self._update_best_program(program) logger.debug(f"Added program {program.id} to top-k database") return program.id def sample( self, num_context_programs: Optional[int] = 4, **kwargs ) -> Tuple[Program, List[Program]]: """ Sample a program and context programs for the next discovery step. Top-K sampling strategy: - Parent: Top 1 program (best program) - Context programs: Next K programs (ranks 2 to K+1) """ if not self.programs: raise ValueError("Cannot sample: no programs in database") # Get top (K+1) programs: top 1 for parent, next K for context total_needed = num_context_programs + 1 top_programs = self.get_top_programs(total_needed) if len(top_programs) < 2: parent = top_programs[0] context_programs = [top_programs[0]] else: parent = top_programs[0] context_programs = top_programs[1:min(len(top_programs), num_context_programs + 1)] return parent, context_programs
Use this when you need behavior that spans across iterations:
Tracking improvement history
Reacting to stagnation
Filtering results before they enter the population
Multi-island search with migration
Acceptance gating
The key point: you do not rewrite generate-evaluate logic. You call _run_iteration(), which runs the full sample → prompt → LLM → evaluate cycle, then decide what to do with the result.