# Blaze

A high-performance, full-text search engine in Go featuring an inverted index with skip lists, roaring bitmaps, BM25 ranking, and a type-safe query builder.
## Overview

Blaze is a Go search engine that provides fast full-text search through an inverted index. It's designed for applications that need to search text documents efficiently without relying on external search engines.
- **Inverted Index**: maps terms to document positions for instant lookups
- **Skip Lists**: probabilistic data structure providing O(log n) operations
- **Query Builder**: type-safe, fluent API for boolean queries backed by roaring bitmaps
- **BM25 Ranking**: industry-standard relevance scoring with IDF and length normalization
- **Text Analysis**: tokenization, stemming, stopword filtering, case normalization
- **Hybrid Storage**: roaring bitmaps for speed, skip lists for precision
## Installation

```shell
go get github.com/wizenheimer/blaze
```
## Quick Start

```go
package main

import (
	"fmt"

	"github.com/wizenheimer/blaze"
)

func main() {
	// Create a new inverted index
	idx := blaze.NewInvertedIndex()

	// Index some documents
	idx.Index(1, "The quick brown fox jumps over the lazy dog")
	idx.Index(2, "A quick brown dog runs fast")
	idx.Index(3, "The lazy cat sleeps all day")

	// Search with BM25 ranking
	matches := idx.RankBM25("quick brown", 10)

	// Print results
	for _, match := range matches {
		fmt.Printf("Document %d (score: %.2f)\n", match.DocID, match.Score)
	}
}
```
## Query Builder API

The Query Builder provides a type-safe, fluent API for constructing complex boolean queries over roaring bitmaps. No string parsing, no syntax errors: just clean, composable code.
### Basic Queries

#### Single Term

```go
// Find all documents containing "machine"
results := blaze.NewQueryBuilder(idx).
	Term("machine").
	Execute()
```

#### AND Query

```go
// Find documents with BOTH "machine" AND "learning"
results := blaze.NewQueryBuilder(idx).
	Term("machine").
	And().
	Term("learning").
	Execute()
```

#### OR Query

```go
// Find documents with "python" OR "javascript"
results := blaze.NewQueryBuilder(idx).
	Term("python").
	Or().
	Term("javascript").
	Execute()
```

#### NOT Query

```go
// Find documents with "python" but NOT "snake"
results := blaze.NewQueryBuilder(idx).
	Term("python").
	And().Not().
	Term("snake").
	Execute()
```

### Complex Grouped Query

```go
// (machine OR deep) AND learning
results := blaze.NewQueryBuilder(idx).
	Group(func(q *blaze.QueryBuilder) {
		q.Term("machine").Or().Term("deep")
	}).
	And().
	Term("learning").
	Execute()
```

### BM25 Ranked Results

```go
// Get top 10 results ranked by relevance
matches := blaze.NewQueryBuilder(idx).
	Term("machine").
	And().
	Term("learning").
	ExecuteWithBM25(10)

for _, match := range matches {
	fmt.Printf("Doc %d: score=%.2f\n", match.DocID, match.Score)
}
```

### Convenience Functions

```go
// AllOf: documents containing ALL terms (AND)
results := blaze.AllOf(idx, "machine", "learning", "python")

// AnyOf: documents containing ANY term (OR)
results := blaze.AnyOf(idx, "cat", "dog", "bird")

// TermExcluding: term with exclusion (AND NOT)
results := blaze.TermExcluding(idx, "python", "snake")
```
## Architecture

### Query Processor Architecture

The query processor uses a hybrid storage approach: roaring bitmaps for document-level operations and skip lists for position-level operations.
```
User Query: "machine AND learning"
                 │
                 ▼
  ┌─────────────────────────────┐
  │        Text Analyzer        │
  │   (tokenize, stem, etc.)    │
  └──────────────┬──────────────┘
                 │ ["machine", "learning"]
                 ▼
  ┌─────────────────────────────┐
  │        Query Builder        │
  │   (constructs query tree)   │
  └──────────────┬──────────────┘
                 │ Query Tree: AND(machine, learning)
      ┌──────────┼──────────────┐
      ▼          ▼              ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Bitmap Ops  │ │  Skip List  │ │ BM25 Scorer │
│(fast AND/OR)│ │ (positions) │ │  (ranking)  │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       └───────────────┼───────────────┘
                       ▼
               ┌───────────────┐
               │    Results    │
               │ (ranked docs) │
               └───────────────┘
```
### Hybrid Storage Model

Blaze uses a hybrid storage model that combines the speed of roaring bitmaps with the precision of skip lists.

```
For each term "machine":

DOCUMENT LEVEL (Roaring Bitmap)
───────────────────────────────
DocBitmaps["machine"] = {1, 2, 4, 5, 100, 500, 1000, ...}

Compressed representation of ALL documents containing "machine"
Use:  fast boolean operations (AND, OR, NOT)
Size: ~60 KB for 500k documents (400x compression!)
        │
        │ links to
        ▼
POSITION LEVEL (Skip List)
──────────────────────────
PostingsList["machine"] = SkipList:

Level 2: [Doc1:Pos5] ─────────────────────────────────> [Doc100:Pos12]
Level 1: [Doc1:Pos5] ──> [Doc2:Pos3] ─────────────────> [Doc100:Pos12]
Level 0: [Doc1:Pos5] ──> [Doc2:Pos3] ──> [Doc4:Pos1] ──> [Doc5:Pos7]
         ──> [Doc100:Pos12] ──> [Doc500:Pos2] ──> ...

Detailed position information for EVERY occurrence
Use:  phrase search, proximity ranking, snippets
Size: ~24 MB for 500k positions
```

Why hybrid?

1. Bitmaps: lightning-fast document filtering (AND, OR, NOT in microseconds)
2. Skip lists: precise position tracking for phrases and proximity
3. Best of both worlds: speed + precision
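The two levels can be sketched with plain Go maps standing in for the real structures. This is a toy model only: `termEntry`, `add`, and the field names are illustrative, not Blaze's API, and a map is a stand-in for both the roaring bitmap and the skip list.

```go
package main

import "fmt"

// termEntry is a toy model of per-term hybrid storage: a
// document-level set for fast boolean filtering, plus a
// position-level map for phrase/proximity checks.
type termEntry struct {
	Docs      map[uint32]struct{} // document level (bitmap stand-in)
	Positions map[uint32][]int    // position level (skip list stand-in)
}

func newTermEntry() *termEntry {
	return &termEntry{
		Docs:      make(map[uint32]struct{}),
		Positions: make(map[uint32][]int),
	}
}

// add records one occurrence of the term at (doc, pos),
// updating both levels in lockstep.
func (t *termEntry) add(doc uint32, pos int) {
	t.Docs[doc] = struct{}{}
	t.Positions[doc] = append(t.Positions[doc], pos)
}

func main() {
	machine := newTermEntry()
	machine.add(1, 5)
	machine.add(2, 3)
	machine.add(100, 12)

	// Document-level question: does doc 2 contain the term?
	_, inDoc2 := machine.Docs[2]
	fmt.Println("doc 2 contains term:", inDoc2)

	// Position-level question: where does it occur in doc 100?
	fmt.Println("positions in doc 100:", machine.Positions[100])
}
```

Boolean queries only ever touch `Docs`; phrase and proximity checks descend into `Positions` for the surviving candidates, which is the point of the split.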
### Query Execution Flow

Here's how a complex query executes step by step, showcasing the power of the hybrid storage approach.
```
QUERY: (machine OR deep) AND learning AND NOT neural

STEP 1: BITMAP PHASE (fast document filtering)
──────────────────────────────────────────────
Term lookups (O(1) hash map):
  DocBitmaps["machine"]  = {1, 2, 4, 5, 7, 8, 9, 10}
  DocBitmaps["deep"]     = {2, 3, 5, 6, 8, 9}
  DocBitmaps["learning"] = {1, 2, 4, 5, 6, 7, 8, 9, 10}
  DocBitmaps["neural"]   = {3, 6, 8, 9}

Boolean operations (O(1) per chunk):
  Step 1: machine OR deep
    {1, 2, 4, 5, 7, 8, 9, 10} ∪ {2, 3, 5, 6, 8, 9}
      = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
  Step 2: (machine OR deep) AND learning
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} ∩ {1, 2, 4, 5, 6, 7, 8, 9, 10}
      = {1, 2, 4, 5, 6, 7, 8, 9, 10}
  Step 3: result AND NOT neural
    {1, 2, 4, 5, 6, 7, 8, 9, 10} \ {3, 6, 8, 9}
      = {1, 2, 4, 5, 7, 10}   ← CANDIDATE DOCUMENTS

Time: ~10 microseconds for 1M documents!

STEP 2: POSITION PHASE (optional: for phrases/proximity)
────────────────────────────────────────────────────────
If phrase search is needed, for each candidate doc {1, 2, 4, 5, 7, 10}:
  - use skip lists to verify exact positions
  - check consecutive positions for phrases
  - extract position data for snippets
Time: O(log n) per position lookup

STEP 3: RANKING PHASE (BM25 scoring)
────────────────────────────────────
For each candidate document:
  1. Calculate IDF (inverse document frequency):
     - uses bitmap cardinality for instant document counts
     - IDF("machine") = log((N - df + 0.5) / (df + 0.5))
     - df = DocBitmaps["machine"].GetCardinality()
  2. Calculate TF (term frequency):
     - retrieved from pre-computed DocStats
     - TF("machine", Doc1) = termFreqs["machine"]
  3. Apply the BM25 formula:
     - combines IDF, TF, and length normalization
     - score = IDF × (TF × (k1 + 1)) / (TF + k1 × length_norm)
  4. Sum scores over all query terms

Results sorted by score:
  Doc 5: 8.45
  Doc 2: 7.23
  Doc 1: 6.91
  ...
Time: O(candidates × terms)
```
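The bitmap phase is ordinary set algebra. A minimal sketch using plain Go map-based sets (standing in for roaring bitmaps; `docSet`, `union`, `intersect`, and `difference` are illustrative helpers, not Blaze's API) reproduces the candidate set from the walkthrough:

```go
package main

import (
	"fmt"
	"sort"
)

type docSet map[int]struct{}

func set(ids ...int) docSet {
	s := make(docSet)
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// union implements OR.
func union(a, b docSet) docSet {
	out := make(docSet)
	for id := range a {
		out[id] = struct{}{}
	}
	for id := range b {
		out[id] = struct{}{}
	}
	return out
}

// intersect implements AND.
func intersect(a, b docSet) docSet {
	out := make(docSet)
	for id := range a {
		if _, ok := b[id]; ok {
			out[id] = struct{}{}
		}
	}
	return out
}

// difference implements AND NOT.
func difference(a, b docSet) docSet {
	out := make(docSet)
	for id := range a {
		if _, ok := b[id]; !ok {
			out[id] = struct{}{}
		}
	}
	return out
}

func sorted(s docSet) []int {
	ids := make([]int, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	machine := set(1, 2, 4, 5, 7, 8, 9, 10)
	deep := set(2, 3, 5, 6, 8, 9)
	learning := set(1, 2, 4, 5, 6, 7, 8, 9, 10)
	neural := set(3, 6, 8, 9)

	// (machine OR deep) AND learning AND NOT neural
	candidates := difference(intersect(union(machine, deep), learning), neural)
	fmt.Println(sorted(candidates)) // [1 2 4 5 7 10]
}
```

Roaring bitmaps perform the same algebra, but chunk-by-chunk over compressed containers rather than element-by-element.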
### Roaring Bitmap Internals

Roaring bitmaps provide exceptional compression and performance through adaptive container selection.
```
ROARING BITMAP STRUCTURE

Document IDs: {1, 2, 3, 100, 101, 102, 500000, 500001, 999999}

Traditional bitmap (naive):
  [1,1,1,0,0...0,1,1,1,0...0,1,1,0...0,1]
  Size: 1,000,000 bits = 125 KB (wasteful for sparse data)

Roaring bitmap (smart):
  Split into 65,536-entry chunks (high 16 bits = chunk ID):
    Chunk 0  (docs 0-65535):        [1, 2, 3, 100, 101, 102]
    Chunk 7  (docs 458752-524287):  [500000, 500001]
    Chunk 15 (docs 983040-1048575): [999999]

  Storage per chunk (adaptive):
    If cardinality < 4096:
      → Array container: sorted uint16 values, 2 bytes × cardinality
    If cardinality > 4096:
      → Bitmap container: 65,536-bit bitmap, 8 KB fixed
    If cardinality = 65536 (all docs):
      → Run container: store [0-65535] as a run, 4 bytes

  Total size: ~60 bytes (vs 125 KB!)

Operations:
  AND: container-by-container intersection
       skip non-matching chunks (O(1)), intersect matching chunks (O(min(n,m)))
  OR:  container-by-container union, merge all chunks (O(n+m))
  NOT: complement within the document space, flip bits in each chunk
```
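The adaptive container choice boils down to a size estimate per chunk. A sketch of that decision rule, using the thresholds stated above (`containerFor` and `containerBytes` are illustrative helpers, not the roaring library's API):

```go
package main

import "fmt"

// containerFor picks a container type for one 65,536-entry chunk
// based on its cardinality, mirroring the rules above.
func containerFor(cardinality int) string {
	switch {
	case cardinality == 65536:
		return "run" // a single run [0-65535]
	case cardinality < 4096:
		return "array" // sorted uint16 values
	default:
		return "bitmap" // fixed 65,536-bit bitmap
	}
}

// containerBytes estimates the storage cost of that choice.
func containerBytes(cardinality int) int {
	switch containerFor(cardinality) {
	case "run":
		return 4 // one [start, length] run
	case "array":
		return 2 * cardinality // 2 bytes per value
	default:
		return 8192 // 65,536 bits
	}
}

func main() {
	for _, c := range []int{6, 5000, 65536} {
		fmt.Printf("cardinality %5d → %-6s (%d bytes)\n",
			c, containerFor(c), containerBytes(c))
	}
}
```

A sparse chunk of 6 documents costs 12 bytes instead of 8 KB, which is where the document-level compression ratio comes from.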
### Memory Layout

Understanding the memory layout helps optimize for large-scale deployments.
```
INVERTED INDEX STRUCTURE

InvertedIndex {
  DocBitmaps: map[string]*roaring.Bitmap
    "machine"  → [compressed bitmap: 512 bytes]
    "learning" → [compressed bitmap: 448 bytes]
    "deep"     → [compressed bitmap: 256 bytes]
    ...
    Memory: ~100 bytes per term (compressed)

  PostingsList: map[string]SkipList            // parallel storage
    "machine"  → skip list with 10,000 position nodes
    "learning" → skip list with 8,000 position nodes
    "deep"     → skip list with 5,000 position nodes
    ...
    Memory: ~48 bytes per position (node overhead)

  DocStats: map[int]DocumentStats              // statistics
    Doc1 → {Length: 150, TermFreqs: {"machine": 3, ...}}
    Doc2 → {Length: 200, TermFreqs: {"learning": 5, ...}}
    ...
    Memory: ~16 bytes per term per document

  Mutex for thread safety (sync.RWMutex)
}

MEMORY BREAKDOWN (1M documents, 10M unique positions):
──────────────────────────────────────────────────────
DocBitmaps:   ~10 MB   (compressed bitmaps)
PostingsList: ~480 MB  (skip list nodes)
DocStats:     ~500 MB  (per-doc statistics)
Overhead:     ~10 MB   (maps, pointers, etc.)
──────────────────────────────────────────────────────
TOTAL:        ~1 GB
```
## Performance

### Benchmarks

| Operation               | Ops/sec | Time/op   | Allocs/op |
|-------------------------|---------|-----------|-----------|
| Query Builder (Simple)  | 440,616 | 2,511 ns  | 39        |
| Query Builder (Complex) | 222,024 | 5,333 ns  | 98        |
| Query Builder (BM25)    | 411,124 | 2,955 ns  | 46        |
| Term Search             | 300,000 | 4,123 ns  | 3         |
| Phrase Search           | 100,000 | 12,456 ns | 12        |
### Operation Costs

| Operation       | Complexity     | Why It's Fast               |
|-----------------|----------------|-----------------------------|
| Bitmap AND      | O(1) per chunk | Roaring bitmap intersection |
| Bitmap OR       | O(1) per chunk | Roaring bitmap union        |
| Bitmap NOT      | O(1) per chunk | Roaring bitmap difference   |
| Term Lookup     | O(1)           | Direct hash map access      |
| Position Lookup | O(log n)       | Skip list binary search     |
## Search Operations

### Phrase Search

Find exact sequences of words. Blaze uses skip lists to efficiently verify consecutive positions.
```
SEARCHING FOR PHRASE: "brown fox"

Document:  "the quick brown dog jumped over the brown fox"
Positions:   0    1     2    3     4     5    6    7    8

Phase 1: Find END (last word "fox")
  Find "brown"          → Doc:Pos2
  Find "fox" after Pos2 → Doc:Pos8   ← END position

Phase 2: Walk BACKWARDS from END to find START
  From Pos9, find previous "brown" → Doc:Pos7   ← START

Phase 3: Validate
  Start: Pos7, End: Pos8
  Distance: 8 - 7 = 1
  Expected: 2 words - 1 = 1   → MATCH!

  "brown" "fox"
     ▲      ▲
   Pos7   Pos8   (consecutive positions)
```
```go
// Find exact phrase "quick brown fox"
matches := idx.FindAllPhrases("quick brown fox", blaze.BOFDocument)

for _, match := range matches {
	start, end := match[0], match[1]
	fmt.Printf("Found in Doc %d from Pos %d to %d\n",
		int(start.DocumentID), int(start.Offset), int(end.Offset))
}
```
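At its core, the validation phase is a consecutive-position check. A self-contained sketch over plain position lists (not Blaze's skip-list-backed implementation; `positionsOf` and `findPhrase` are illustrative names) shows the idea:

```go
package main

import (
	"fmt"
	"strings"
)

// positionsOf builds a word → positions map for one document.
func positionsOf(doc string) map[string][]int {
	pos := make(map[string][]int)
	for i, w := range strings.Fields(doc) {
		pos[w] = append(pos[w], i)
	}
	return pos
}

// findPhrase returns [start, end] position pairs where the words
// of the phrase occur at strictly consecutive positions.
func findPhrase(doc string, phrase ...string) [][2]int {
	pos := positionsOf(doc)
	var matches [][2]int
	for _, start := range pos[phrase[0]] {
		ok := true
		for i := 1; i < len(phrase); i++ {
			found := false
			for _, p := range pos[phrase[i]] {
				if p == start+i { // exactly one past the previous word
					found = true
					break
				}
			}
			if !found {
				ok = false
				break
			}
		}
		if ok {
			matches = append(matches, [2]int{start, start + len(phrase) - 1})
		}
	}
	return matches
}

func main() {
	doc := "the quick brown dog jumped over the brown fox"
	fmt.Println(findPhrase(doc, "brown", "fox")) // [[7 8]]
}
```

Note the "brown" at Pos2 is rejected because no "fox" follows it at Pos3; only the Pos7 occurrence validates, matching the walkthrough above.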
### Proximity Search

Find documents containing all terms (not necessarily consecutive), and score each document by how close together the terms appear.
```
SEARCHING FOR: ["quick", "fox"]  (any order, not necessarily consecutive)

Document:  "the quick brown dog jumped over the lazy fox"
Positions:   0    1     2    3     4     5    6   7    8

Phase 1: Find COVER END (furthest term)
  Find "quick" after BOF → Doc:Pos1
  Find "fox" after BOF   → Doc:Pos8   ← FURTHEST (cover end)

Phase 2: Find COVER START (earliest term before end)
  Find "quick" before Pos9 → Doc:Pos1   ← EARLIEST (cover start)
  Find "fox" before Pos9   → Doc:Pos8

Phase 3: Calculate proximity score
  Cover: [Pos1, Pos8]
  Same document?      Yes
  All terms present?  Yes

  "quick" ... ... ... ... ... ... ... "fox"
     ▲                                  ▲
   Pos1                               Pos8
     └────────── cover range ───────────┘

  Proximity score: 1 / (8 - 1 + 1) = 1/8 = 0.125

PROXIMITY SCORING EXAMPLES:
───────────────────────────
Doc 1: "machine learning is machine learning"
  Pos:      0        1     2     3        4
  Cover 1: [Pos 0-1] → score += 1/(1-0+1) = 1/2 = 0.500
  Cover 2: [Pos 3-4] → score += 1/(4-3+1) = 1/2 = 0.500
  Total score: 1.000

Doc 2: "learning about machine and learning"
  Pos:       0      1      2      3     4
  Cover 1: [Pos 0-2] → score += 1/(2-0+1) = 1/3 = 0.333
  Cover 2: [Pos 2-4] → score += 1/(4-2+1) = 1/3 = 0.333
  Total score: 0.666

Doc 3: "machine ... ... ... ... learning"
  Pos:      0    1   2   3   4      5
  Cover 1: [Pos 0-5] → score += 1/(5-0+1) = 1/6 = 0.167
  Total score: 0.167

Ranking: Doc 1 (1.000) > Doc 2 (0.666) > Doc 3 (0.167)
         terms closest   terms medium    terms far apart
```
```go
// Rank documents by proximity
matches := idx.RankProximity("quick brown", 5)

for _, match := range matches {
	fmt.Printf("Doc %d: Score %.2f\n",
		int(match.Offsets[0].DocumentID),
		match.Score)
}
```
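The cover scores in the examples above can be reproduced with a few lines of arithmetic. This sketches only the scoring rule (each cover `[start, end]` contributes `1 / (end - start + 1)`), not Blaze's cover-finding algorithm; `coverScore` and `proximityScore` are illustrative names:

```go
package main

import "fmt"

// coverScore implements the rule from the walkthrough: a cover
// spanning [start, end] contributes 1 / (end - start + 1).
func coverScore(start, end int) float64 {
	return 1.0 / float64(end-start+1)
}

// proximityScore sums the contributions of all covers in a document.
func proximityScore(covers [][2]int) float64 {
	total := 0.0
	for _, c := range covers {
		total += coverScore(c[0], c[1])
	}
	return total
}

func main() {
	// Doc 1: "machine learning is machine learning" → covers [0,1], [3,4]
	fmt.Printf("Doc 1: %.3f\n", proximityScore([][2]int{{0, 1}, {3, 4}}))
	// Doc 2: "learning about machine and learning"  → covers [0,2], [2,4]
	fmt.Printf("Doc 2: %.3f\n", proximityScore([][2]int{{0, 2}, {2, 4}}))
	// Doc 3: widely separated terms                 → cover  [0,5]
	fmt.Printf("Doc 3: %.3f\n", proximityScore([][2]int{{0, 5}}))
}
```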
## Skip Lists Explained

Skip lists are the core data structure for position tracking. They provide O(log n) operations while being simpler than balanced trees.
```
SKIP LIST STRUCTURE: multiple levels act like express lanes

Level 3: HEAD ────────────────────────────────> [30] ──────> NULL
Level 2: HEAD ─────────────> [15] ────────────> [30] ──────> NULL
Level 1: HEAD ─────> [10] ──> [15] ──> [20] ──> [30] ──────> NULL
Level 0: HEAD ─> [5] ─> [10] ─> [15] ─> [20] ─> [25] ─> [30] ─> [35] ─> NULL
         (ALL nodes appear at level 0)

Each node has a "tower" of forward pointers.
Example: node [15]
  Lvl 2 ──> [30]   (skip ahead)
  Lvl 1 ──> [20]   (skip a little)
  Lvl 0 ──> [20]   (next node)

SEARCHING FOR KEY = 20:
───────────────────────
Level 3: HEAD ──────────────────> [30]   (30 > 20, drop down)
Level 2: HEAD ──────> [15]               (15 < 20, advance)
Level 2: [15] ──────> [30]               (30 > 20, drop down)
Level 1: [15] ──────> [20]               (20 = 20, FOUND!)

Journey recorded (used for insertions/deletions):
  Level 3: HEAD
  Level 2: [15]
  Level 1: [15]
  Level 0: [15]

Time complexity: O(log n) on average

HEIGHT DISTRIBUTION (probabilistic), for 1000 nodes:
  Level 0: ~1000 nodes (all)  ████████████████████████████████████████
  Level 1:  ~500 nodes        ████████████████████
  Level 2:  ~250 nodes        ██████████
  Level 3:  ~125 nodes        █████
  Level 4:   ~62 nodes        ██
```
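A compact skip list with probabilistic tower heights and the top-down search described above might look like this. It's a generic toy over ints, not Blaze's position-keyed implementation, and the fixed RNG seed is just for reproducibility:

```go
package main

import (
	"fmt"
	"math/rand"
)

const maxLevel = 8

type node struct {
	key  int
	next [maxLevel]*node // the "tower" of forward pointers
}

type skipList struct {
	head  *node
	level int // highest level currently in use
	rng   *rand.Rand
}

func newSkipList() *skipList {
	return &skipList{head: &node{}, level: 1, rng: rand.New(rand.NewSource(1))}
}

// randomLevel flips a fair coin: each extra level is half as likely
// as the one below, giving the express-lane height distribution.
func (s *skipList) randomLevel() int {
	lvl := 1
	for lvl < maxLevel && s.rng.Intn(2) == 0 {
		lvl++
	}
	return lvl
}

func (s *skipList) insert(key int) {
	var journey [maxLevel]*node // rightmost node visited per level
	n := s.head
	for lvl := s.level - 1; lvl >= 0; lvl-- {
		for n.next[lvl] != nil && n.next[lvl].key < key {
			n = n.next[lvl] // advance along this level
		}
		journey[lvl] = n // drop down, recording the turn
	}
	lvl := s.randomLevel()
	if lvl > s.level {
		for i := s.level; i < lvl; i++ {
			journey[i] = s.head
		}
		s.level = lvl
	}
	nn := &node{key: key}
	for i := 0; i < lvl; i++ { // splice into each level of the tower
		nn.next[i] = journey[i].next[i]
		journey[i].next[i] = nn
	}
}

// contains runs the top-down search: advance while the next key is
// smaller, drop a level when it would overshoot.
func (s *skipList) contains(key int) bool {
	n := s.head
	for lvl := s.level - 1; lvl >= 0; lvl-- {
		for n.next[lvl] != nil && n.next[lvl].key < key {
			n = n.next[lvl]
		}
	}
	n = n.next[0]
	return n != nil && n.key == key
}

func main() {
	sl := newSkipList()
	for _, k := range []int{5, 10, 15, 20, 25, 30, 35} {
		sl.insert(k)
	}
	fmt.Println(sl.contains(20), sl.contains(22)) // true false
}
```

The `journey` array is exactly the "journey recorded" table above: it remembers where the search turned downward at each level, which is what insertion and deletion splice against.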
## Text Analysis Pipeline

Blaze transforms raw text into searchable tokens through a multi-stage pipeline.
```
TEXT ANALYSIS PIPELINE

1. Tokenization        split on non-alphanumeric characters
2. Lowercasing         normalize case ("Quick" → "quick")
3. Stopword Filtering  remove common words (the, a, is)
4. Length Filtering    remove tokens < 2 chars
5. Stemming            Snowball/Porter2; reduce to root ("running" → "run")

EXAMPLE TRANSFORMATION:
───────────────────────
Input: "The Quick Brown Fox Jumps!"
  Step 1: Tokenization       → ["The", "Quick", "Brown", "Fox", "Jumps"]
  Step 2: Lowercasing        → ["the", "quick", "brown", "fox", "jumps"]
  Step 3: Stopword Filtering → ["quick", "brown", "fox", "jumps"]   (remove "the")
  Step 4: Length Filtering   → ["quick", "brown", "fox", "jumps"]   (all pass ≥ 2 chars)
  Step 5: Stemming           → ["quick", "brown", "fox", "jump"]    ("jumps" → "jump")
```
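The five stages can be sketched end to end in a few lines. This toy version substitutes a naive plural-stripping step for Blaze's Snowball stemmer and uses a three-word stopword set; `analyze` is an illustrative name, not Blaze's API:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

var stopwords = map[string]bool{"the": true, "a": true, "is": true}

// analyze runs the five pipeline stages on raw text.
func analyze(text string) []string {
	// 1. Tokenization: split on non-alphanumeric characters.
	tokens := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	var out []string
	for _, tok := range tokens {
		// 2. Lowercasing
		tok = strings.ToLower(tok)
		// 3. Stopword filtering
		if stopwords[tok] {
			continue
		}
		// 4. Length filtering: drop tokens shorter than 2 chars
		if len(tok) < 2 {
			continue
		}
		// 5. "Stemming": naive plural strip, standing in for Porter2
		if len(tok) > 3 && strings.HasSuffix(tok, "s") {
			tok = strings.TrimSuffix(tok, "s")
		}
		out = append(out, tok)
	}
	return out
}

func main() {
	fmt.Println(analyze("The Quick Brown Fox Jumps!"))
	// → [quick brown fox jump]
}
```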
## BM25 Algorithm

BM25 (Best Matching 25) is the industry-standard ranking function used by Elasticsearch, Solr, and Lucene. It considers term frequency, document length, and term rarity.
```
BM25 FORMULA:
─────────────
                       IDF(q_i) × TF(q_i, D) × (k1 + 1)
BM25(D, Q) =   Σ    ─────────────────────────────────────────
             q_i∈Q    TF(q_i, D) + k1 × (1 - b + b × |D|/avgdl)

Where:
  D     = document being scored
  Q     = query (set of terms q_1, q_2, ..., q_n)
  q_i   = individual query term
  IDF   = inverse document frequency (term rarity)
  TF    = term frequency (occurrences in the document)
  k1    = 1.5  (term frequency saturation parameter)
  b     = 0.75 (length normalization parameter)
  |D|   = document length
  avgdl = average document length

WHAT BM25 CONSIDERS:
────────────────────
1. Term frequency:  more occurrences = higher relevance
2. TF saturation:   diminishing returns; 3→10 occurrences matters
                    less than 0→3
3. Document length: normalized by document size, so long docs
                    don't dominate results
4. Term rarity:     rare terms are more important
                    ("quantum" outweighs "the")

TERM FREQUENCY SATURATION (k1 = 1.5, b = 0.75):
  The score contribution rises steeply for the first few
  occurrences, then flattens: going from 0→3 occurrences adds
  more to the score than going from 7→10 (diminishing returns).
```
```go
// Search and rank using BM25
results := idx.RankBM25("machine learning", 10)

for _, match := range results {
	fmt.Printf("Doc %d: Score %.2f\n",
		match.DocID,
		match.Score)
}
```
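The formula and its saturation behaviour can be checked numerically. This is a sketch of the single-term score with the parameters stated above (k1 = 1.5, b = 0.75), not Blaze's internal scorer; the corpus numbers in `main` are made up for illustration:

```go
package main

import (
	"fmt"
	"math"
)

const (
	k1 = 1.5  // term frequency saturation
	b  = 0.75 // length normalization strength
)

// idf follows the formula used above: log((N - df + 0.5) / (df + 0.5)),
// where N is the corpus size and df the term's document frequency.
func idf(n, df float64) float64 {
	return math.Log((n - df + 0.5) / (df + 0.5))
}

// bm25Term scores a single query term against a single document.
func bm25Term(tf, docLen, avgDocLen, n, df float64) float64 {
	norm := 1 - b + b*docLen/avgDocLen
	return idf(n, df) * (tf * (k1 + 1)) / (tf + k1*norm)
}

func main() {
	// 1000 docs, term appears in 10 of them, average length 100 words.
	n, df, avg := 1000.0, 10.0, 100.0

	// Saturation: early occurrences contribute more than later ones.
	gainEarly := bm25Term(3, 100, avg, n, df) - bm25Term(0, 100, avg, n, df)
	gainLate := bm25Term(10, 100, avg, n, df) - bm25Term(7, 100, avg, n, df)
	fmt.Printf("0→3 adds %.3f, 7→10 adds %.3f\n", gainEarly, gainLate)

	// Length normalization: the same TF scores lower in a longer doc.
	fmt.Printf("len 100: %.3f, len 400: %.3f\n",
		bm25Term(3, 100, avg, n, df), bm25Term(3, 400, avg, n, df))
}
```

Running it shows both effects from the diagram: the 0→3 gain dwarfs the 7→10 gain, and a four-times-longer document scores noticeably lower for the same term frequency.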
## Real-World Examples

### E-commerce Search

```go
func SearchProducts(idx *blaze.InvertedIndex, query string,
	category string, excludeOutOfStock bool) []blaze.Match {

	qb := blaze.NewQueryBuilder(idx).Term(query)

	// Add category filter
	if category != "" {
		qb.And().Term(category)
	}

	// Exclude out-of-stock items
	if excludeOutOfStock {
		qb.And().Not().Term("outofstock")
	}

	return qb.ExecuteWithBM25(20)
}
```
### Multi-Category Search

```go
func SearchInCategories(idx *blaze.InvertedIndex,
	query string, categories []string) []blaze.Match {

	qb := blaze.NewQueryBuilder(idx).Term(query)

	if len(categories) > 0 {
		qb.And().Group(func(q *blaze.QueryBuilder) {
			q.Term(categories[0])
			for i := 1; i < len(categories); i++ {
				q.Or().Term(categories[i])
			}
		})
	}

	return qb.ExecuteWithBM25(50)
}
```
Advanced Boolean Search
// Find documents about (machine learning OR deep learning)
// AND (python OR tensorflow)
results := blaze.NewQueryBuilder(idx).
Group(func(q *blaze.QueryBuilder) {
q.Phrase("machine learning").Or().Phrase("deep learning")
}).
And().
Group(func(q *blaze.QueryBuilder) {
q.Term("python").Or().Term("tensorflow")
}).
ExecuteWithBM25(20)
## Use Cases

- **Document Search Systems**: build full-text search for documentation, articles, or knowledge bases
- **Log Analysis**: search through application logs with boolean queries and ranking
- **Code Search**: index and search source code repositories
- **E-commerce**: product catalog search with filters and facets
- **Email Search**: fast, local email search without external services
- **Content Management**: search articles, posts, and media metadata
## Key Features

### Search Capabilities

- **Term Search**: find documents containing specific terms
- **Phrase Search**: exact multi-word matching
- **Boolean Queries**: type-safe AND, OR, NOT operations
- **BM25 Ranking**: industry-standard relevance scoring
- **Proximity Ranking**: score by term proximity
- **Position Tracking**: track exact word positions

### Text Processing

- **Tokenization**: Unicode-aware text splitting
- **Stemming**: Snowball (Porter2) stemmer
- **Stopword Filtering**: remove common words
- **Case Normalization**: case-insensitive search
- **Configurable Pipeline**: customize analysis behavior

### Data Structures

- **Roaring Bitmaps**: 400x compression for document sets
- **Skip Lists**: O(log n) position lookups
- **Inverted Index**: efficient term-to-position mapping
- **Binary Serialization**: compact storage format