Case Study | LLM as a judge | search quality | NER | multilingual

Zalando Deploys LLM-As-Judge For Search Quality Assurance

engineering.zalando.com | March 17, 2026

Relevance Score: 8.2

Zalando published a research paper in 2024 and, in 2025, applied an LLM-as-a-judge framework to evaluate search relevance proactively, before new markets go live. The system combines query clustering based on named-entity recognition (NER), LLM translation, and visual and textual context to score search results at scale for new markets including Luxembourg, Portugal and Greece. This approach automates pre-launch QA, reduces the need for manual annotation, and makes it possible to re-run the same evaluation reproducibly after fixes.
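The NER-based clustering step can be sketched as follows. This is a minimal illustration, not Zalando's implementation: the source does not describe its NER model, entity schema, or sampling strategy, so the toy lexicon `TOY_LEXICON` and the helpers `tag_entities`, `entity_signature`, `cluster_queries` and `sample_per_cluster` are hypothetical. Queries are grouped by the set of entity types they contain (for example BRAND + CATEGORY), and a seeded sample is drawn from each cluster.

```python
import random
from collections import defaultdict

# Hypothetical toy lexicon standing in for a trained NER model.
# The source does not describe Zalando's tagger or entity schema.
TOY_LEXICON: dict[str, str] = {
    "nike": "BRAND",
    "adidas": "BRAND",
    "red": "COLOR",
    "black": "COLOR",
    "sneakers": "CATEGORY",
    "dress": "CATEGORY",
    "jacket": "CATEGORY",
}


def tag_entities(query: str) -> list[tuple[str, str]]:
    """Return (token, entity_type) pairs for tokens the toy lexicon knows."""
    return [(tok, TOY_LEXICON[tok]) for tok in query.lower().split() if tok in TOY_LEXICON]


def entity_signature(query: str) -> tuple[str, ...]:
    """Cluster key: the sorted, de-duplicated entity types found in the query."""
    return tuple(sorted({etype for _, etype in tag_entities(query)}))


def cluster_queries(queries: list[str]) -> dict[tuple[str, ...], list[str]]:
    """Group queries that share the same entity signature."""
    clusters: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for q in queries:
        clusters[entity_signature(q)].append(q)
    return dict(clusters)


def sample_per_cluster(
    clusters: dict[tuple[str, ...], list[str]], k: int, seed: int = 0
) -> dict[tuple[str, ...], list[str]]:
    """Draw up to k queries per cluster with a fixed seed.

    Clusters are visited in sorted key order so the random generator is
    consumed identically on every run, which keeps the evaluation set
    stable when it is re-scored after a fix.
    """
    rng = random.Random(seed)
    return {key: rng.sample(qs, min(k, len(qs))) for key, qs in sorted(clusters.items())}


if __name__ == "__main__":
    queries = ["nike sneakers", "adidas sneakers", "red dress", "black jacket", "summer sale"]
    clusters = cluster_queries(queries)
    print(clusters)
    print(sample_per_cluster(clusters, k=1, seed=42))
```

In a full pipeline, the sampled queries would feed the LLM translation and judging steps described above, with the judge seeing visual and textual context; those steps are omitted here. Sampling per cluster rather than from the raw query log keeps less frequent query types represented in the evaluation set, and the fixed seed means a re-run after a fix scores the same queries.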

Scoring Rationale

Strong practical impact from an official Zalando deployment with reproducible pipelines; limited academic novelty compared with foundational LLM research.