Chris Hayes' Journal

Building a Semantic Bible Search Engine with RAG: From 61% to 80% Accuracy

A journey through embeddings, hybrid search, and making the KJV Bible searchable by meaning, not just keywords


The Problem: Searching the Bible is Hard

Traditional Bible search tools are limited. Type “God’s love” and you get verses containing those exact words. But what about the hundreds of verses about divine compassion, mercy, or grace that never use the word “love”?

What if you search for “trusting God in difficult times” but the KJV says “trust in the LORD” or “lean not on thine own understanding”? Keyword search fails.

We needed semantic search - the ability to find verses by meaning, not just exact words.

The Solution: RAG (Retrieval-Augmented Generation)

We built a semantic search engine for the King James Bible using:

- Local embeddings from Ollama's nomic-embed-text model (768 dimensions)
- Hybrid retrieval: 70% semantic similarity + 30% keyword matching
- Query expansion for archaic KJV vocabulary
- Popularity boosting for 76 curated famous verses
- A FastAPI REST layer, running 100% locally

The result? 80% top-10 accuracy on 5,000 test queries.


The Journey: Three Critical Breakthroughs

1. The Parser Crisis: When Verses Were 7,837 Characters Long

Early on, we hit a wall. Our embeddings were failing with mysterious EOF errors from Ollama:

Error embedding verse Mark 12:24: EOF (status code: 500)

The Investigation: We discovered verses were abnormally long:

Matthew 5:3: 5,990 characters (should be ~74)
Mark 12:24: 2,889 characters
242 verses total > 1,000 characters

The Root Cause: The KJV XML uses OSIS milestone markers (sID and eID) that can cross element boundaries:

<q sID="q1"/>
  <verse sID="Gen.3.1" osisID="Gen.3.1"/>
  Now the serpent was more subtil...
  <verse eID="Gen.3.1"/>
<q eID="q1"/>

Quote boundaries can cross verse boundaries, putting markers at different nesting levels. Our recursive parser was concatenating multiple verses together.

The Fix: Complete parser rewrite using single-pass traversal with state tracking:

# Single-pass traversal: track the current verse via milestone markers
current_verse_id = None
collecting = False
verse_elements = []

for elem in root.iter():  # root: the parsed OSIS ElementTree root
    if elem.get('sID') and elem.get('osisID'):  # Verse start milestone
        current_verse_id = elem.get('osisID')
        verse_elements = []
        collecting = True

    if collecting:
        verse_elements.append(elem)

    if elem.get('eID') == current_verse_id:  # Matching verse end milestone
        build_verse(verse_elements)
        collecting = False

Result: 0 verses over 1,000 characters. 100% parsing accuracy.


2. The “Subtil” Problem: When Modern Language Fails

Early testing showed poor accuracy on archaic KJV language:

Query: "serpent was crafty"
Expected: Genesis 3:1 ("serpent was more subtil")
Actual: Not in top 10 ❌

The KJV uses archaic words that modern embeddings don’t understand: “subtil” for crafty, “slay”/“slew”/“slain” for kill, “thine” for your.

Attempt 1: Query Expansion

We built a synonym expander:

query = "you shall not kill"
expanded = "you shall not kill slay slew slain"  # Add KJV variants

Result: It helped, but only modestly. Accuracy improved from 61% to 65%.
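The expander can be sketched in a few lines. The synonym map here is a tiny illustrative sample, not the project's actual table:

```python
# Minimal query-expansion sketch. KJV_SYNONYMS is a small illustrative
# sample; a real expander would carry a much larger archaic-word map.
KJV_SYNONYMS = {
    "kill": ["slay", "slew", "slain"],
    "crafty": ["subtil"],
    "your": ["thine", "thy"],
}

def expand_query(query: str) -> str:
    """Append KJV-era variants of any modern words found in the query."""
    words = query.lower().split()
    extras = [syn for w in words for syn in KJV_SYNONYMS.get(w, [])]
    return " ".join(words + extras)

print(expand_query("you shall not kill"))
# you shall not kill slay slew slain
```

The expanded string is what gets embedded and keyword-matched, so archaic variants participate in both retrieval paths.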

Attempt 2: Hybrid Search

We combined semantic search (70%) with keyword search (30%):

semantic_score = cosine_similarity(query_embedding, verse_embedding)
keyword_score = exact_word_matches(query, verse_text)
final_score = 0.7 * semantic_score + 0.3 * keyword_score

Result: Massive improvement! Accuracy jumped to 80%.

| Query Type | Semantic Only | Hybrid | Improvement |
|---|---|---|---|
| Keywords | 57.3% | 84.7% | +27.3% |
| Paraphrases | 37.7% | 61.4% | +23.8% |
| Modern | 84.6% | 95.6% | +11.0% |
| Overall | 61.1% | 79.7% | +18.6% |

Trade-off: 7.5x slower (277ms vs 37ms), but worth it for the accuracy gain.
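The scoring above reduces to a small pure-Python sketch (stdlib only; a real system would run numpy over the cached 768-dimension vectors):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query: str, verse_text: str) -> float:
    """Fraction of query words that appear verbatim in the verse."""
    q, v = set(query.lower().split()), set(verse_text.lower().split())
    return len(q & v) / len(q) if q else 0.0

def hybrid_score(q_emb, v_emb, query, verse_text,
                 w_sem: float = 0.7, w_kw: float = 0.3) -> float:
    """Weighted blend: 70% semantic similarity, 30% exact-word overlap."""
    return (w_sem * cosine_similarity(q_emb, v_emb)
            + w_kw * keyword_score(query, verse_text))
```

Because the keyword term rewards verbatim overlap, exact KJV phrases rank high even when the embedding model misreads the archaic vocabulary.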


3. The John 3:16 Problem: Famous Verses Not Ranking High

Users expect famous verses to rank higher. When searching “God loved the world,” John 3:16 should be #1, not buried on page 2.

The Solution: Popularity Boosting

We curated a database of 76 famous verses with popularity weights:

{
  "John 3:16": {"weight": 3.0, "category": "salvation"},
  "Psalm 23:1": {"weight": 3.0, "category": "comfort"},
  "Genesis 1:1": {"weight": 3.0, "category": "creation"},
  "Romans 3:23": {"weight": 2.5, "category": "salvation"},
  // ... 72 more
}

Boost formula:

boost = (popularity_weight - 1.0) * 0.3
new_score = original_score + boost
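In code, the boost is a small lookup-and-add; the weight table here abbreviates the curated 76:

```python
# Popularity-boost sketch; FAMOUS_WEIGHTS abbreviates the curated table.
FAMOUS_WEIGHTS = {
    "John 3:16": 3.0,
    "Romans 3:23": 2.5,
}

def boosted_score(reference: str, score: float) -> float:
    """Add (weight - 1.0) * 0.3; verses outside the table get no boost."""
    weight = FAMOUS_WEIGHTS.get(reference, 1.0)
    return score + (weight - 1.0) * 0.3

print(boosted_score("Romans 3:23", 0.925))  # ~1.375, matching the results below
print(boosted_score("Obadiah 1:2", 0.925))  # unchanged: 0.925
```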

Results:

| Query | Without Boost | With Boost |
|---|---|---|
| “beginning of everything” | Genesis 1:1 at #3 | Genesis 1:1 at #1 ⬆️ |
| “saved by faith” | Ephesians 2:8 at #2 | Ephesians 2:8 at #1 ⬆️ |
| “all have sinned” | Romans 3:23 at #1 (0.925) | Romans 3:23 at #1 (1.375) ✅ |

Impact: 20-50% of queries benefit from boosting, with famous verses rising 1-3 positions on average.


The Architecture: How It All Works

1. Build Phase (One Time)

python build.py

What happens:

  1. Parse 31,102 KJV verses from OSIS XML
  2. Generate embeddings using nomic-embed-text (768 dimensions)
  3. Cache everything to disk (174 MB)
  4. Validate with test queries

Time: 8-10 minutes first run, <1 second subsequent runs
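The embed-and-cache loop can be sketched against Ollama's local `/api/embeddings` endpoint. The `build_cache` helper and pickle path are illustrative, not the project's actual build.py:

```python
# Build-phase sketch: embed every verse once, persist vectors to disk.
# Requires a running `ollama serve` with nomic-embed-text pulled.
import json
import pickle
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed(text: str, model: str = "nomic-embed-text") -> list:
    """Fetch one embedding vector (768 floats for nomic-embed-text)."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def build_cache(verses: dict, path: str = "embeddings.pkl") -> None:
    """Embed every verse and cache the vectors so later runs skip Ollama."""
    cache = {ref: embed(text) for ref, text in verses.items()}
    with open(path, "wb") as f:
        pickle.dump(cache, f)
```

The on-disk cache is why the first build takes minutes while later startups take under a second.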

2. Search Phase (Every Query)

from src.rag_search import RAGBibleSearch

rag = RAGBibleSearch(
    model='nomic-embed-text',
    use_query_expansion=True,      # Handle archaic language
    use_popularity_boost=True      # Boost famous verses
)

results = rag.hybrid_search("trusting God in difficult times", top_k=10)

What happens:

  1. Query expansion: Add archaic synonyms
  2. Semantic search: Convert query to embedding, find similar verses (70% weight)
  3. Keyword search: Find exact word matches (30% weight)
  4. Popularity boost: Add (popularity_weight - 1.0) × 0.3 to famous verses
  5. Return ranked results

Time: ~277ms average
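Stripped of embeddings, the five steps reduce to a toy pipeline (keyword overlap stands in for semantic similarity; the verse and weight tables are illustrative):

```python
# Toy end-to-end ranking: score each verse, apply popularity boost, sort.
VERSES = {
    "John 3:16": "for god so loved the world that he gave his only begotten son",
    "Genesis 1:1": "in the beginning god created the heaven and the earth",
}
FAMOUS_WEIGHTS = {"John 3:16": 3.0, "Genesis 1:1": 3.0}

def search(query: str, top_k: int = 10):
    q = set(query.lower().split())
    scored = []
    for ref, text in VERSES.items():
        overlap = len(q & set(text.split())) / len(q)       # stand-in for hybrid score
        boost = (FAMOUS_WEIGHTS.get(ref, 1.0) - 1.0) * 0.3  # popularity boost
        scored.append((ref, overlap + boost))
    return sorted(scored, key=lambda rs: rs[1], reverse=True)[:top_k]

print(search("god loved the world")[0][0])  # John 3:16
```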

3. API (Optional)

python api.py

RESTful API with FastAPI:

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "God so loved the world",
    "top_k": 10,
    "search_mode": "hybrid"
  }'

Response:

{
  "query": "God so loved the world",
  "results": [
    {
      "reference": "John 3:16",
      "text": "For God so loved the world...",
      "confidence": 1.623,
      "rank": 1
    }
  ],
  "processing_time_ms": 245
}

The Stack

Core Technology:

- Python 3
- Ollama running nomic-embed-text (768-dimension embeddings, fully local)
- FastAPI for the REST layer

Data:

- King James Version in OSIS XML (31,102 verses)
- 174 MB embedding cache on disk

Techniques:

- Retrieval-Augmented Generation (RAG)
- Hybrid search (70% semantic / 30% keyword)
- Query expansion and popularity boosting


Key Learnings

1. Hybrid > Pure Semantic

Pure semantic search sounds cool, but real-world accuracy demands hybrid approaches. Combining semantic understanding (70%) with keyword matching (30%) gave us the best of both worlds.

2. Archaic Language is Hard

Modern embedding models trained on contemporary text struggle with 400-year-old English. Query expansion helps, but hybrid search is the real solution.

3. Parser Edge Cases Matter

OSIS milestone markers are tricky. What seems like simple XML parsing becomes complex when elements can cross boundaries. Single-pass traversal with state tracking was the key.

4. User Expectations Drive Features

Users expect John 3:16 to rank #1 when relevant. Popularity boosting addresses this without sacrificing accuracy for less famous verses.

5. Benchmarking is Essential

We ran 5,000 query benchmarks to validate improvements. Without hard numbers, we’d never know if changes helped or hurt.


Performance at Scale

5,000 Query Benchmark Results:

| Metric | Semantic | Hybrid | Change |
|---|---|---|---|
| Rank #1 | 39.0% | 57.3% | +18.3% |
| Top 3 | 50.9% | 71.1% | +20.2% |
| Top 10 | 61.1% | 79.7% | +18.6% |
| Queries/sec | 27.0 | 3.6 | -85.7% |
| Latency | 37ms | 277ms | +647% |

Trade-off Analysis: hybrid search pays ~7.5× higher latency (37ms → 277ms) for an 18.6-point gain in top-10 accuracy. At well under a second per query, the trade clearly favors accuracy.

Query Type Breakdown:

| Type | Semantic | Hybrid | Improvement |
|---|---|---|---|
| Keywords | 57.3% | 84.7% | +27.3% |
| Modern | 84.6% | 95.6% | +11.0% |
| Paraphrase | 37.7% | 61.4% | +23.8% |
| Partial | 61.7% | 83.0% | +21.2% |
| Typo | 66.7% | 80.7% | +14.0% |

Key insight: Hybrid search excels at keywords (+27%) and paraphrases (+24%), precisely where pure semantic search struggles.


Try It Yourself

Quick Start

# 1. Clone the repo
git clone https://github.com/chrishayescodes/biblesearch.git
cd biblesearch

# 2. Setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 3. Install Ollama
# Visit https://ollama.ai
ollama serve
ollama pull nomic-embed-text

# 4. Build (first time: ~10 minutes)
python build.py

# 5. Search!
python tests/test_search_interactive.py

Example Queries

Try these to see semantic search in action:

"God made light" → Genesis 1:3
"serpent was subtil" → Genesis 3:1 (archaic spelling!)
"you shall not kill" → Exodus 20:13 (modern language)
"love your enemies" → Matthew 5:44
"beginning of everything" → Genesis 1:1 (popularity boost)

The Road Ahead

What’s Next?

Potential improvements:

  1. Contextual search - Search within specific books or chapters
  2. Multi-verse results - Return passage ranges, not just single verses
  3. Translation comparison - Add NIV, ESV, etc.
  4. Question answering - Use LLM to generate answers from retrieved verses
  5. Cross-references - Link related verses automatically

Performance Optimizations

  1. FAISS indexing - Speed up vector similarity search
  2. Keyword term indexing - Faster exact matching
  3. Query result caching - Cache popular queries
  4. Incremental embedding updates - Add verses without rebuilding

Conclusion

Building a semantic Bible search engine taught us valuable lessons about:

- XML parsing edge cases (OSIS milestone markers)
- the limits of modern embeddings on 400-year-old English
- combining semantic and keyword retrieval
- benchmark-driven iteration

The result? A search tool that understands meaning, not just keywords. Type “trusting God in difficult times” and get Proverbs 3:5-6, even though those exact words don’t appear.

80% accuracy. 31,102 verses. 100% local. 0% cloud APIs.


Resources


Acknowledgments

Built with:

- Ollama and nomic-embed-text
- FastAPI
- The King James Version in OSIS XML


Questions? Issues? PRs welcome!

Open an issue on GitHub or reach out at @chrishayescodes


This project demonstrates practical RAG implementation for educational purposes. May it help others learn about semantic search, embeddings, and building intelligent text retrieval systems.

📖 “Search the scriptures” - John 5:39 📖