AI and confidentiality: how your data stays private when we wire an LLM into your firm

The legitimate fear (and the wrong answers)

Every firm we meet asks the same question in the first five minutes: "If we plug AI into our files, does our clients' data end up inside the model?"

The short answer: no, if it is architected correctly.

The longer answer: most "AI data leak" incidents come from two sources — employees copying client data into ChatGPT, or poorly configured integrations sending raw data to public APIs. Neither is inevitable.

What Law 25 actually requires

Since Quebec's Law 25 came into full force, a firm that uses AI has three direct obligations, which the Commission d'accès à l'information enforces:

Transparency. If a decision affects an individual (conflict check, case prioritization, request triage), the firm must be able to explain to the person concerned how the AI reached that decision.
Consent off by default. Tracking and analytics tools are opt-in, not opt-out. No tracking cookies without explicit consent.
Data minimization. Only pass the LLM the data strictly necessary for the task. No full-case dumps "just in case."
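
One way to picture data minimization in practice is a per-task allowlist applied before anything reaches the LLM. The sketch below is illustrative only: the task names, the field names, and the `minimize` helper are assumptions made for this example, not a description of any specific deployment.

```python
# Hypothetical per-task allowlists. Task and field names are illustrative only.
TASK_FIELDS = {
    "conflict_check": {"client_name", "opposing_party", "matter_type"},
    "email_triage": {"subject", "received_at", "matter_id"},
}


def minimize(record: dict, task: str) -> dict:
    """Return only the fields this task is allowed to send to the LLM."""
    allowed = TASK_FIELDS[task]  # unknown task raises KeyError: fail closed
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    full_record = {
        "client_name": "A. Example",
        "opposing_party": "B. Sample",
        "matter_type": "lease",
        "sin": "000-000-000",        # never needed for a conflict check
        "notes": "full case history",
    }
    print(minimize(full_record, "conflict_check"))
```

The design choice worth noting is the allowlist: a new field added to the case file stays out of the prompt until someone deliberately permits it for a task.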

Penal fines under Law 25 can reach C$25 million or 4% of worldwide turnover, whichever is greater. But the real loss is client trust.

The architecture we deploy (and why it works)

Here are the four layers of protection we put in place in every AI integration project:

1. No training on your data

The providers and models we use (Azure OpenAI, Anthropic, self-hosted open-source) do not train on client data. Your queries do not end up in the next public model. This is contractual, not just a promise: enterprise APIs provide written non-retention guarantees.

2. Access control in RAG (the problem nobody sees)

The standard technique for wiring an LLM into your data is called RAG (Retrieval-Augmented Generation). The LLM does not "see" your entire database — it only receives the fragments relevant to the question asked.

The invisible risk: data bleed. An administrative assistant asks the system a question, and the AI surfaces a confidential document from the executive committee because the permissions filter does not exist.

The fix: we implement role-based access control (RBAC) directly in the vector database. If you do not have access to a document in Clio, you do not have access to it in the AI either. Same permissions, same perimeter.
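
To make the idea concrete, here is a minimal in-memory sketch of permission-filtered retrieval. It assumes each stored chunk carries the set of roles allowed to read its source document, and it stands in for a real vector database with a plain Python list. The `Chunk` type, the role names, and the `retrieve` function are hypothetical. The point it illustrates is the ordering: filter by permission first, rank by similarity second, so a restricted chunk is never a candidate at all.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_roles: frozenset[str]  # assumed to be synced from the source system


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_embedding, user_roles, chunks, top_k=3):
    """Return the top_k most similar chunks the user is allowed to read."""
    # Permission filter runs BEFORE ranking: unreadable chunks never compete.
    visible = [c for c in chunks if c.allowed_roles & set(user_roles)]
    visible.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return visible[:top_k]


if __name__ == "__main__":
    store = [
        Chunk("exec-minutes", "Executive committee minutes", (1.0, 0.0),
              frozenset({"partner"})),
        Chunk("office-guide", "Office procedures guide", (0.9, 0.1),
              frozenset({"partner", "assistant"})),
    ]
    for chunk in retrieve((1.0, 0.0), {"assistant"}, store):
        print(chunk.doc_id)
```

In a production vector database the same filter would typically be expressed as a metadata condition on the query rather than a Python list comprehension; the exact syntax depends on the product.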

3. Synthetic data for testing

When we develop and test a pilot, we never work with real client data. We generate synthetic datasets: data that has the same statistical properties as the real thing but describes no real person.

For a law firm: fictitious files with fictitious names, coherent dates, realistic amounts. The pilot works exactly the same way, but zero sensitive data touches the development environment.
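
A minimal sketch of what such a generator can look like, using only the Python standard library. Everything in it is made up for illustration: the name lists, the matter types, the date window, and the amount range are assumptions, and matching the real statistical profile would mean tuning those ranges against aggregate figures rather than individual records.

```python
import random
from datetime import date, timedelta

# Illustrative value pools: deliberately artificial so nothing maps to a real person.
FIRST_NAMES = ["Alex", "Camille", "Jordan", "Maxime", "Sacha"]
LAST_NAMES = ["Exemple", "Fictif", "Specimen", "Testcase"]
MATTER_TYPES = ["lease dispute", "incorporation", "estate planning"]


def make_synthetic_files(n: int, seed: int = 0) -> list[dict]:
    """Generate n fictitious case files; the same seed gives the same files."""
    rng = random.Random(seed)  # local generator keeps test runs reproducible
    files = []
    for i in range(n):
        opened = date(2020, 1, 1) + timedelta(days=rng.randint(0, 1460))
        closed = opened + timedelta(days=rng.randint(30, 720))  # always after opening
        files.append({
            "file_id": f"SYN-{i:05d}",
            "client_name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "matter_type": rng.choice(MATTER_TYPES),
            "opened_on": opened.isoformat(),
            "closed_on": closed.isoformat(),
            "billed_cad": round(rng.uniform(500, 50_000), 2),
        })
    return files


if __name__ == "__main__":
    for synthetic_file in make_synthetic_files(3, seed=42):
        print(synthetic_file)
```

The `SYN-` prefix on every identifier is a small safeguard: if a synthetic file ever turns up outside the development environment, it is recognizable at a glance.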

4. Sovereign hosting when needed

For firms handling high-sensitivity cases (immigration, family law, criminal), we can deploy open-source models on private infrastructure: Azure's Canadian regions (Toronto and Quebec City) or a dedicated server. Data never leaves Canadian soil.

This is not necessary for every use case. For a conflict check or email triage, enterprise APIs with non-retention guarantees are sufficient. We size the protection to the actual risk.
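
Sizing protection to risk can be expressed as a simple routing rule. The sketch below is an assumption-laden illustration: the practice-area labels come from the examples above, while the backend names and the `choose_backend` function are hypothetical placeholders rather than real endpoints.

```python
# Practice areas treated as high sensitivity in this sketch (taken from the examples above).
HIGH_SENSITIVITY = {"immigration", "family law", "criminal"}

# Hypothetical backend labels, not real service names.
SELF_HOSTED = "self_hosted_canada"
ENTERPRISE_API = "enterprise_api"


def choose_backend(practice_area: str) -> str:
    """Route high-sensitivity practice areas to private hosting, the rest to an enterprise API."""
    normalized = practice_area.strip().lower()
    return SELF_HOSTED if normalized in HIGH_SENSITIVITY else ENTERPRISE_API


if __name__ == "__main__":
    for area in ("Immigration", "Corporate", "family law"):
        print(area, "->", choose_backend(area))
```

A real policy would likely weigh more than the practice area (client instructions, data types in the file), but keeping the rule in one function makes it auditable.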

The question partners forget to ask

Most firms focus on "will AI leak our data?" when the real risk is already here: your employees have been using ChatGPT, Copilot, and Claude as shadow AI, with zero guardrails, since 2023.

A structured AI integration project replaces wild usage with controlled usage. Instead of blocking AI (which never works), we channel it into a system with permissions, logs, and guardrails.
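
As an illustration of the logging piece, here is a minimal sketch of an audited call. It assumes the model is reached through some callable, named `call_llm` here purely as a placeholder, and it records who asked and a hash of the prompt rather than the prompt itself, so the audit trail does not become a second copy of the confidential content. The log is a plain list in this example; a real system would write to append-only storage.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable


def audited_call(user_id: str, prompt: str,
                 call_llm: Callable[[str], str], log: list[str]) -> str:
    """Record an audit entry, then forward the prompt to the model."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        # Hash instead of raw text: the log proves what was sent without storing it.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    log.append(json.dumps(entry))  # written before the call, so failed calls are logged too
    return call_llm(prompt)


if __name__ == "__main__":
    audit_log: list[str] = []
    fake_llm = lambda p: "stub answer"  # stand-in for a real model client
    print(audited_call("assistant-01", "Summarize the lease terms.", fake_llm, audit_log))
    print(audit_log[0])
```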

What we sign before we start

Every project begins with a mutual NDA. Here is what it covers in practice:

Non-retention of data: queries sent to the LLM are not stored beyond the session.
No training: your data does not contribute to any future model.
Right to audit: you can audit our infrastructure and logs at any time.
Destruction at project end: all project data is purged per your retention policy.

If confidentiality is what is holding you back

The AI Integration Project starts with a 30-minute discovery call where we address the confidentiality question first. We do not touch anything until you are comfortable with the architecture.

Book a 30-min discovery call →

Or start lighter: download the AI readiness checklist for your vertical.