Open-source RAG for Zurich SMEs

The Canton of Zurich AI Program delivers a Retrieval-Augmented Generation prototype for SMEs

Started: February 1, 2026
Status: Completed

Abstract

As part of the Canton of Zurich SME AI Program, nine companies joined forces in a collaborative prototyping effort to design and build a Retrieval-Augmented Generation (RAG) system for regulatory compliance. Rather than working on separate solutions, participants jointly developed a shared architecture, learning from one another while tackling a concrete, realistic scenario: enabling a packaging company to navigate heterogeneous documentation and meet sustainability compliance requirements. The outcome is a functional, open-source AI framework, together with a set of shared design principles that each participating organization can reuse, extend and adapt to its own operational and regulatory needs.

People

Collaborators

SDSC Team:
Paulina Körner
Ivan-Daniel Sievering
Thibaut Loiseau
Anna Fournier

PI | Partners:

Canton of Zurich | Department of Economics:

Markus Müller
Raphael von Thiessen

Ahead Zurich:

Chantal Stäuble
Anna Zakharova

Collaborating SMEs


Motivation

The central question underlying the collaborative use case was: How can we design RAG systems that provide reliable outputs when the available evidence and data are incomplete, contradictory, or of uneven quality?

To address this, the nine SMEs participating in the AI Innovation Program of the Canton of Zurich identified seven practical challenges that arise when deploying AI in compliance-sensitive or regulated environments:

  1. Bringing knowledge together: Important information is often spread across many different documents, formats and systems. Collecting and organizing this information into a reliable knowledge base is the first major challenge.
  2. Finding the right information: Specialized language and complex questions make it difficult to consistently identify the most relevant information needed to accurately answer a query.
  3. Trustworthy and transparent answers: Every answer must be based on verifiable sources, clearly reference where the information comes from, and avoid unsupported or fabricated claims.
  4. Reliable and consistent results: The system must produce results that are consistent, measurable, and dependable enough for real-world, production-ready use, not just demonstrations.
  5. Structured and usable outputs: Results need to be delivered in formats that fit existing business processes, reports, and software systems, rather than as unstructured text alone.
  6. Understanding user questions: People often ask questions that are vague, incomplete, or dependent on context. The system must interpret these questions correctly to retrieve the right information.
  7. Meeting regulatory compliance: In regulated industries, AI systems must provide clear records of how decisions are made and comply with governance, traceability, and audit requirements.

Solution

The prototype implemented a full end-to-end RAG pipeline covering five core stages: chunking, embedding, storage, retrieval, and answer generation. On top of this shared framework, four feature tracks were developed in parallel, allowing all nine companies to contribute to and benefit from a common architecture simultaneously:

·      Track 1 introduced an evaluation infrastructure from the outset, with metrics for faithfulness, answer relevancy, context precision, and context recall, alongside a synthetic question-answer dataset for scalable testing.

·      Track 2 enforced a structured JSON (JavaScript Object Notation) response format so that every answer included explicit source references and suggested follow-up questions, making the outputs transparent, auditable, and ready for integration into downstream applications and business processes.

·      Track 3 implemented and compared advanced retrieval strategies including semantic vector search, keyword search, and hybrid retrieval with Reciprocal Rank Fusion, as well as query expansion and conversational context management for complex multi-turn questions.

·      Track 4 extended the system toward agentic workflows, enabling the model to reason across multiple steps, decompose complex questions, compare sources independently, and identify missing evidence before generating a final answer.
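A minimal sketch can illustrate the hybrid retrieval in Track 3 through Reciprocal Rank Fusion (RRF). The sketch does not come from the conversational-toolkit codebase. It assumes that each retriever returns an ordered list of document IDs and uses the conventional constant k = 60. The document IDs are hypothetical. Each document's score is the sum of 1/(k + rank) over every list in which it appears. A document that both semantic and keyword search rank well therefore rises above one that only a single method favors.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    A document receives sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. Larger k flattens the advantage of top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a semantic (vector) retriever and a keyword retriever.
semantic_hits = ["doc_epr_guideline", "doc_supplier_spec", "doc_recycling_rates"]
keyword_hits = ["doc_supplier_spec", "doc_packaging_ordinance", "doc_epr_guideline"]

for doc_id, score in reciprocal_rank_fusion([semantic_hits, keyword_hits]):
    print(f"{doc_id}: {score:.5f}")
```

RRF uses only rank positions. It therefore needs no calibration between vector-search similarities and keyword-search scores, which live on different scales.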

All components were implemented as modular, swappable building blocks within an editable Python library (conversational-toolkit), complemented by a use-case-specific application layer. This architecture makes it possible to add, replace, or customize components without rewriting the underlying system.
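Python's structural typing offers one way to sketch the idea of modular, swappable building blocks. The example is a hypothetical illustration and not the conversational-toolkit API. The names Chunk, Retriever, Generator, KeywordRetriever, CitingStubGenerator and RAGPipeline are invented for this sketch. The generator is a stub that stands in for an LLM call. The pipeline depends only on the retrieve and generate method signatures, so any component that matches them can be dropped in.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[Chunk]: ...


class Generator(Protocol):
    def generate(self, query: str, context: list[Chunk]) -> str: ...


class KeywordRetriever:
    """Toy retriever: ranks chunks by how many distinct query words they contain."""

    def __init__(self, chunks: list[Chunk]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, top_k: int) -> list[Chunk]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c) for c in self.chunks]
        # Stable sort: chunks with equal overlap keep their original order.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [chunk for overlap, chunk in ranked if overlap > 0][:top_k]


class CitingStubGenerator:
    """Stand-in for an LLM call: reports which sources it received."""

    def generate(self, query: str, context: list[Chunk]) -> str:
        sources = ", ".join(c.doc_id for c in context) or "none"
        return f"Answer to {query!r} based on sources: {sources}"


@dataclass
class RAGPipeline:
    retriever: Retriever
    generator: Generator
    top_k: int = 3

    def answer(self, query: str) -> str:
        context = self.retriever.retrieve(query, self.top_k)
        return self.generator.generate(query, context)


chunks = [
    Chunk("spec_sheet", "recycled content of the tray is 80 percent"),
    Chunk("ordinance", "packaging must declare recycled content"),
    Chunk("memo", "team offsite is on friday"),
]
pipeline = RAGPipeline(KeywordRetriever(chunks), CitingStubGenerator(), top_k=2)
print(pipeline.answer("recycled content"))
# Answer to 'recycled content' based on sources: spec_sheet, ordinance
```

To replace KeywordRetriever with a vector-search or hybrid retriever, or to replace the stub with a real LLM client, you only change the objects passed to RAGPipeline. Because the interfaces are typing.Protocol classes, the components need no shared base class.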

Figure 1: RAG prototype development structure — baseline pipeline and four feature tracks, SME AI Program in the Canton of Zürich (February–March 2026).

Figure 2: RAG prototype frontend (PrimePack AG use case), SME AI Program in the Canton of Zürich (February–March 2026).

Impact

The project showed that modular, evidence-aware RAG systems can be developed using open-source tools and shared infrastructure. More importantly, it demonstrated the value of collaborative prototyping as a way to learn, test ideas, and build common understanding across organizations.

Participants not only created a working prototype, but also developed a shared understanding of good design principles for RAG systems. These include building evaluation infrastructure early, combining multiple retrieval methods, separating stronger and weaker evidence, and structuring outputs to make answers transparent and auditable.
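Dataclasses serialized to JSON can sketch a structured response that separates stronger from weaker evidence, carries source references and suggests follow-up questions. The field names supporting_sources, weak_sources and follow_up_questions are assumptions made for illustration, as is the unknown_citations check. They are not the schema used in the prototype, and the example content is hypothetical. The check flags any cited document that was not part of the retrieved context, which is one simple guard against fabricated references.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceRef:
    doc_id: str
    quote: str


@dataclass
class StructuredAnswer:
    answer: str
    supporting_sources: list[SourceRef]  # evidence that directly backs the answer
    weak_sources: list[SourceRef] = field(default_factory=list)  # indirect or uncertain evidence
    follow_up_questions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested SourceRef dataclasses.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


def unknown_citations(answer: StructuredAnswer, retrieved_ids: set[str]) -> list[str]:
    """Return cited doc IDs that were not in the retrieved context (possible fabrications)."""
    cited = [s.doc_id for s in answer.supporting_sources + answer.weak_sources]
    return [doc_id for doc_id in cited if doc_id not in retrieved_ids]


result = StructuredAnswer(
    answer="The tray contains 80 % recycled material.",
    supporting_sources=[SourceRef("spec_sheet", "recycled content of the tray is 80 percent")],
    weak_sources=[SourceRef("supplier_email", "we aim for roughly 80 % recycled input")],
    follow_up_questions=["Is the recycled content certified by a third party?"],
)
print(result.to_json())
print(unknown_citations(result, retrieved_ids={"spec_sheet"}))  # ['supplier_email']
```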

The resulting open-source prototype, codebase, and notebooks now provide participants with a practical reference architecture that can be reused and extended for future applications.

Open-source code available

The RAG prototype is open source, available to SMEs, and maintained by the SDSC on GitHub:

https://github.com/SwissDataScienceCenter/sme-kt-zh-collaboration-rag



