Multi-Domain Cache LoRA

Single LoRA adapter for improved semantic cache accuracy across medical, law, programming, and psychology domains

• +21.5% average margin improvement (mean over the three tested domains)
• 4 domains
• 596 KB adapter size
• 90.6 MB total memory

Overview

Multi-domain LoRA is a single 596KB adapter trained on semantic cache triplets from medical, law, programming, and psychology domains. It improves cache accuracy by helping the base embedding model better distinguish between semantically similar but different queries.

🎯 Purpose
Enhance semantic cache hit/miss decisions by improving the separation between true duplicates and related-but-different queries across multiple specialized domains.
⚡ Performance
Improves the cache margin by 12-27% on every tested domain with a single small adapter; no domain detection is required.
💾 Efficiency
The adapter is only 596 KB, with 0.44% of parameters trainable. Total memory is 90.6 MB (90 MB base model + 0.6 MB adapter).
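The page reports results as a "margin" but does not define it. A common reading, assumed here, is the gap between the anchor's cosine similarity to its true duplicate and its similarity to a related-but-different query, averaged over a test set; a larger margin means a single similarity threshold separates hits from misses more reliably. The stdlib-only sketch below computes that assumed metric; the function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin(anchor: Vector, positive: Vector, negative: Vector) -> float:
    """How much closer the true duplicate is to the anchor than the hard negative."""
    return cosine(anchor, positive) - cosine(anchor, negative)


def mean_margin(triplets: Sequence[tuple[Vector, Vector, Vector]]) -> float:
    """Assumed metric: average margin over (anchor, positive, negative) embeddings."""
    return sum(triplet_margin(a, p, n) for a, p, n in triplets) / len(triplets)


# Toy 2-D embeddings: the positive is nearly parallel to the anchor,
# while the hard negative points partly in another direction.
toy = [((1.0, 0.0), (0.9, 0.1), (0.6, 0.8))]
print(round(mean_margin(toy), 4))
```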

Performance Results

Comparison of the baseline (no LoRA), domain-specific LoRAs, and the multi-domain LoRA on each domain's test set. Values are margins; percentages are relative to the baseline.

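| Domain      | Test Triplets | Baseline Margin | Domain-Specific LoRA | Multi-Domain LoRA | Winner            |
|-------------|---------------|-----------------|----------------------|-------------------|-------------------|
| Medical     | 200           | 0.4416          | 0.6305 (+42.8%)      | 0.5517 (+24.9%)   | Domain-Specific   |
| Law         | 20,862        | 0.4940          | 0.6219 (+25.9%)      | 0.6290 (+27.3%)   | Multi-Domain ✓    |
| Programming | 20,862        | 0.2358          | 0.2367 (+0.4%)       | 0.2651 (+12.4%)   | Multi-Domain ✓    |
| Psychology  | N/A           | -               | -                    | -                 | N/A (no test set) |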
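The relative improvements in the table, and the headline average, can be recomputed directly from the margins. This short sketch copies the table's values into a hypothetical `MARGINS` dictionary and derives each gain as tuned / baseline - 1.

```python
# Margins copied from the table: (baseline, domain-specific LoRA, multi-domain LoRA).
MARGINS = {
    "Medical": (0.4416, 0.6305, 0.5517),
    "Law": (0.4940, 0.6219, 0.6290),
    "Programming": (0.2358, 0.2367, 0.2651),
}


def relative_gain(baseline: float, tuned: float) -> float:
    """Relative improvement over the baseline margin, in percent."""
    return (tuned / baseline - 1.0) * 100.0


multi_gains = []
for domain, (base, specific, multi) in MARGINS.items():
    spec_gain = relative_gain(base, specific)
    multi_gain = relative_gain(base, multi)
    multi_gains.append(multi_gain)
    winner = "Multi-Domain" if multi > specific else "Domain-Specific"
    print(f"{domain:<12} specific {spec_gain:+.1f}%  multi {multi_gain:+.1f}%  -> {winner}")

# Unweighted mean of the multi-domain gains over the three tested domains.
average = sum(multi_gains) / len(multi_gains)
print(f"Average multi-domain gain: {average:+.2f}%")
```

Recomputing from the four-decimal margins gives about +21.56%; averaging the rounded per-domain figures (24.9, 27.3, 12.4) gives the +21.5% shown at the top, so the small difference is rounding.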
📊 Key Finding

The multi-domain LoRA improves the margin by at least 10% on every tested domain with a single adapter. Domain-specific LoRAs still win on medical (+42.8% vs. +24.9%), but the multi-domain adapter wins on law and programming. Its simplicity and consistency make it the recommended choice for production.

Architecture

Base Model
sentence-transformers/all-MiniLM-L12-v2
• 12-layer transformer
• 384-dimensional embeddings
• 90MB model size
• Fast CPU inference
LoRA Configuration
Low-Rank Adaptation
• Rank: 8
• Alpha: 16
• Target: query, value projections
• 0.44% trainable params
Training
Triplet Loss
• 1 epoch training
• Combined multi-domain data
• ~40 min on 4x A10G GPUs
• Final loss: 0.0955
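The 0.44% trainable share and the sub-megabyte adapter follow from the LoRA configuration above. The arithmetic below is a sketch assuming the standard LoRA layout (one low-rank pair A, B added to each adapted projection) and the base model's published BERT-style dimensions; the constant names are hypothetical, and the base parameter count is an approximation computed from the model config (embeddings, 12 layers, pooler).

```python
# Assumed dimensions of all-MiniLM-L12-v2 (BERT-style encoder).
HIDDEN = 384               # hidden / embedding size
NUM_LAYERS = 12            # transformer layers
BASE_PARAMS = 33_360_000   # approximate parameter count of the base model

# LoRA settings listed above.
RANK = 8
ALPHA = 16
TARGETS_PER_LAYER = 2      # query and value projections

# Each adapted HIDDEN x HIDDEN projection gains A (RANK x HIDDEN) and B (HIDDEN x RANK).
params_per_target = RANK * HIDDEN + HIDDEN * RANK
lora_params = params_per_target * TARGETS_PER_LAYER * NUM_LAYERS

# The low-rank update B @ A is scaled by alpha / rank before being added to the frozen weight.
scaling = ALPHA / RANK

# Share of trainable parameters in the combined (base + adapter) model.
trainable_pct = 100 * lora_params / (BASE_PARAMS + lora_params)

# Raw size if the adapter weights are stored as 4-byte fp32 values (an assumption).
fp32_bytes = lora_params * 4

print(f"LoRA parameters: {lora_params:,}")
print(f"Trainable share: {trainable_pct:.2f}%")
print(f"Scaling factor alpha/r: {scaling}")
print(f"Raw fp32 weight size: {fp32_bytes / 1000:.0f} KB")
```

Under these assumptions the raw weights come to roughly 590 KB, close to the 596 KB adapter size quoted above; file metadata plausibly accounts for the remainder.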

Training Data Generation

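Triplets are synthetically generated using Qwen/Qwen2.5-7B-Instruct. Each triplet contains:
• Anchor: a domain query
• Positive: a paraphrase of the anchor that should be a cache hit
• Negative: a hard negative, i.e. a related query that should be a cache miss

The 7B model provides high-quality paraphrases and challenging hard negatives that are semantically related but distinct enough to require separate LLM responses.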
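A triplet and the hinge-style triplet loss used to train on it can be sketched as follows. This is an illustration, not the training code: the `CacheTriplet` field names, the example strings, the choice of cosine distance, and the 0.5 margin are all hypothetical, since the page does not state which distance or margin was used (sentence-transformers' `TripletLoss`, for instance, defaults to Euclidean distance).

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CacheTriplet:
    """One training example (hypothetical field names)."""
    anchor: str    # domain query
    positive: str  # paraphrase that should hit the cache
    negative: str  # related query that needs its own LLM response


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.5) -> float:
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical medical triplet in the style the generator is meant to produce.
example = CacheTriplet(
    anchor="What are the symptoms of diabetes?",
    positive="Which signs indicate that someone has diabetes?",
    negative="How is type 2 diabetes treated?",
)
```

During training, the three strings would be embedded by the LoRA-wrapped model and the loss pushes the positive's distance below the negative's by at least the margin.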

Usage

Load the adapter from the Hugging Face Hub:

```python
from sentence_transformers import SentenceTransformer
from peft import PeftModel

# Load the base embedding model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Wrap the underlying Hugging Face transformer with the multi-domain LoRA adapter
model[0].auto_model = PeftModel.from_pretrained(
    model[0].auto_model,
    "llm-semantic-router/multi-domain-cache-lora-L12",
)

# Encode queries with the adapted model
embedding = model.encode("What are the symptoms of diabetes?")
```
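In a semantic cache, these embeddings are compared with those of previously answered queries, and a similarity threshold decides between hit and miss. The class below is a minimal sketch of that decision; `SemanticCache`, its methods, and the 0.85 default threshold are hypothetical and not part of the adapter's release. The embedding function is injected, so with the LoRA-wrapped SentenceTransformer you could pass its `encode` method.

```python
import math
from typing import Callable, Optional, Sequence

Embedder = Callable[[str], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class SemanticCache:
    """Minimal in-memory semantic cache (hypothetical; threshold is illustrative)."""

    def __init__(self, embed: Embedder, threshold: float = 0.85) -> None:
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[Sequence[float], str]] = []

    def put(self, query: str, response: str) -> None:
        """Store a query embedding together with the LLM response it produced."""
        self.entries.append((self.embed(query), response))

    def get(self, query: str) -> Optional[str]:
        """Return the closest cached response if it clears the threshold, else None (miss)."""
        if not self.entries:
            return None
        q = self.embed(query)
        score, response = max((cosine(q, e), r) for e, r in self.entries)
        return response if score >= self.threshold else None
```

A larger margin between duplicates and hard negatives, which is what the adapter is trained for, gives more room to place this threshold without producing false hits.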

Resources

📊 Test Datasets
Test Sets Repository
• Medical: 200 triplets
• Law: 20,862 triplets
• Programming: 20,862 triplets
💡 Production Recommendation

Use the multi-domain LoRA for production deployments. It provides consistent gains across all tested domains with a single 596 KB adapter, requires no domain detection logic, and is easier to maintain than multiple domain-specific adapters.