Your privacy AI is only as honest as what it's allowed to refuse
Leban · 28 February 2026 · 13 min read
A privacy lead at a UK insurance broker asked me last quarter why I had bothered putting an AI feature on Sylure at all. She had spent the previous afternoon watching a colleague paste a partially redacted CSV into ChatGPT and ask it to summarise the data subjects whose claims appeared in the export. The summary was excellent. It read like a careful paralegal had written it. It also confidently described two people whose names did not appear anywhere in the file, and miscounted the number of records by about fifteen percent. The colleague had not noticed. The privacy lead had only noticed because she had run the same query herself a week earlier and remembered the numbers.
That kind of failure is the worst kind a privacy tool can have. A wrong answer that looks right is worse than no answer at all, because no answer prompts the reviewer to do the work themselves, and a wrong answer that looks right gets signed off, attached to a DSAR response, and lives in an evidence pack that later has to be defended to a regulator. The privacy lead's question to me was not "do I trust AI." It was sharper than that. It was "what stops your AI doing exactly what we watched ours do."
This post is the long answer. It walks through the four design constraints that sit behind every AI feature in Sylure, what each one is for, and what it costs. I have written it not to make the case that AI is safe in privacy work (that case is not mine to make in general), but to describe the specific architectural decisions that determine whether one particular AI integration is worth signing off on. If you come away thinking AI is the answer to DSAR fulfilment, I have written it badly. The honest argument is narrower than that.
What "grounded" actually buys you
The term that gets used most often in vendor pitches for AI features in regulated software is "grounded." The implication is that the model is being given the relevant data alongside the prompt, so it cannot hallucinate facts that contradict the data. This is a real and useful property, and most AI products in the privacy space implement some version of it. Sylure does too. But the term is dangerous because it has been used loosely enough that buyers have started treating it as a finished safety story rather than a starting point.
The honest description of what grounding gives you is narrower than the vendor language suggests. Sending a document or a result set in the prompt does make the model less likely to invent facts that contradict that document. It does not stop the model from inventing certainty about what the document says. A grounded summary of a fifty-page export can still describe trends that are not in the data, attribute statements to people who did not make them, and produce confidence intervals on numbers it computed itself. The grounding tells the model what is true. It does not stop the model from also telling you things that are not.
Grounding stops the model inventing facts. It does not stop the model inventing certainty.
This matters in privacy work specifically because the failure modes are asymmetric. Most software domains can tolerate a small rate of confidently wrong outputs as long as the average answer is useful. Privacy operations cannot. A confident-but-wrong answer about a data subject's personal data either gets disclosed to that subject (which is a personal data accuracy problem under Article 5(1)(d)), or gets used internally to justify a decision about them (which is a fairness and transparency problem under Article 5(1)(a)), or gets quoted in a response to the ICO (which is a credibility problem with consequences that extend beyond a single request). The asymmetry is the reason a privacy AI needs a tighter safety posture than a generic productivity tool, and grounding alone does not get you there.
Everything that follows in this post is structured around what, in addition to grounding, has to be true for an AI feature in privacy work to be worth shipping.
Do the maths before the model sees the data
The single most important guardrail in Sylure's AI architecture is that no number generated by the system is computed by the model. The model never counts anything. It never adds. It never averages. It never decides whether a value is high or low. Every numeric statement in a Sylure AI briefing exists in a server-side analytic that was computed deterministically before the prompt was constructed, and the model's job is to read those analytics and compose prose around them.
To make this concrete: when a DSAR briefing for a data subject says "47 occurrences of personal data across 9 source assets, weighted towards email and HR documents," both 47 and 9 come from SQL aggregations against an indexed search result. The "weighted towards email and HR documents" phrase comes from a precomputed breakdown of those occurrences by file type and asset category. The model receives a small structured object that contains those numbers, the category breakdown, and a few flags about confidence. It composes the sentence. It does not arrive at the numbers.
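For readers who want the shape of this in code, here is a minimal sketch of the pattern, not Sylure's implementation: a deterministic aggregation produces the figures, and the model is handed only the resulting structured object. The table and field names are invented for illustration.

```python
import json
import sqlite3

# Illustrative schema: one row per personal-data occurrence found by discovery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrences (subject_id TEXT, asset_id TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?)",
    [("subj-1", "asset-a", "email"),
     ("subj-1", "asset-a", "email"),
     ("subj-1", "asset-b", "hr_document")],
)

def build_briefing_payload(subject_id: str) -> dict:
    """Every number is computed here, deterministically, before a prompt exists."""
    total = conn.execute(
        "SELECT COUNT(*) FROM occurrences WHERE subject_id = ?", (subject_id,)
    ).fetchone()[0]
    assets = conn.execute(
        "SELECT COUNT(DISTINCT asset_id) FROM occurrences WHERE subject_id = ?",
        (subject_id,),
    ).fetchone()[0]
    by_category = dict(conn.execute(
        "SELECT category, COUNT(*) FROM occurrences "
        "WHERE subject_id = ? GROUP BY category", (subject_id,)
    ).fetchall())
    return {"total_occurrences": total, "asset_count": assets,
            "by_category": by_category}

# The model sees this JSON and composes prose around it. It never sees the
# rows, and it is never asked to count anything.
prompt = ("Write a short DSAR briefing using only these precomputed figures:\n"
          + json.dumps(build_briefing_payload("subj-1")))
```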
This pattern is older than transformer-based language models. It is the same pattern used in business intelligence reporting, where queries are computed against a warehouse and templates render the prose around them. The only thing language models add is that the prose is more fluent and can adapt to the shape of the underlying data. The arithmetic is not their job, and the moment you let it become their job, you have lost the property that makes the output defensible.
The counter-example is the failure mode the insurance broker's colleague hit. Asking a model to count rows in a CSV directly will sometimes produce the right number and will sometimes produce a number that is off by tens of percent, depending on file size, prompt structure, and a dozen other factors that no engineering team controls reliably. Even when the count is right, the way the model arrived at it is not auditable. You cannot point at a SQL query that produced it. You cannot rerun the same operation deterministically. You cannot tell a regulator how the figure was generated, because the figure was generated by a probabilistic process whose internals are not legible to anyone, including the model's authors.
The practical cost of this constraint is that the AI layer in Sylure cannot answer questions that have not been pre-computed. If an analyst wants to know the count of identity hits filtered by a combination of asset type and risk band and date range that the analytics layer does not currently expose, the AI cannot generate that count. It can describe the analytics that do exist, suggest related angles, and recommend a manual analysis. It cannot synthesise the missing aggregation by itself. That is a real ceiling on what the feature can do, and the alternative (letting the model do the aggregation) would erase the property that makes the rest of the feature worth using.
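The enforcement of that ceiling can be as blunt as an allowlist. A sketch, with invented names, of the check that stops the AI layer synthesising an aggregation the analytics layer does not expose:

```python
# Aggregations the analytics layer actually computes and surfaces. Anything
# outside this set is refused, never improvised by the model.
EXPOSED_ANALYTICS = {
    ("identity_hits", frozenset({"asset_type"})),
    ("identity_hits", frozenset({"asset_type", "risk_band"})),
}

def resolve_analytic(metric: str, dimensions: set[str]) -> dict:
    if (metric, frozenset(dimensions)) not in EXPOSED_ANALYTICS:
        return {"refusal": f"No precomputed analytic for {metric} by "
                           f"{sorted(dimensions)}. Recommend manual analysis."}
    return {"metric": metric, "dimensions": sorted(dimensions)}  # served from the analytics layer

# The combination from the example above is not precomputed, so the call
# returns a refusal rather than a synthesised count:
resolve_analytic("identity_hits", {"asset_type", "risk_band", "date_range"})
```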
Number traceability and the show-me-the-row test
The companion property to pre-computed analytics is that every figure surfaced in an AI-generated draft has to be reachable from the draft itself. When a Sylure briefing says "47 occurrences across 9 assets," the reviewer should be able to click into a view that lists those 9 assets, drill into each one, and see the occurrences. The number is not just a number. It is an entry point into the underlying evidence.
This is what I call the show-me-the-row test, and it is the simplest single test for whether an AI feature in privacy software is doing real work or theatre. Pick any number in the briefing. Can you reach the source rows it summarises in two clicks? If yes, the figure is evidential and the briefing is doing its job. If no, the figure is decorative and the briefing is performing the appearance of insight without being usable in a regulated workflow.
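One way to make the test pass by construction is to forbid bare numbers entirely: every figure travels with a reference to the query that produced it and the view that lists its rows. A sketch with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class Figure:
    """A briefing number is an entry point into evidence, not just a value."""
    value: int
    label: str
    query_id: str       # the deterministic query that produced the value
    drilldown_url: str  # the analytics view listing the underlying rows

asset_count = Figure(
    value=9,
    label="source assets containing personal data",
    query_id="dsar-1042/assets-by-subject",
    drilldown_url="/subjects/subj-1/assets",
)

def render(figure: Figure) -> str:
    # A renderer built this way cannot print a number that lacks a drilldown,
    # which is the show-me-the-row test enforced at the type level.
    return f"{figure.value} {figure.label} [{figure.drilldown_url}]"
```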
The cost of enforcing this is that the analytics layer and the AI layer are tightly coupled. Sylure cannot ship an AI feature that operates over data the analytics layer does not also surface, because there would be nothing for the reviewer to click into when verifying a number. This is one of the largest engineering constraints on the AI roadmap, and it is the reason that new AI features tend to lag the underlying analytics rather than precede them. The analytics have to exist, with their own UI affordances and audit trail, before the AI layer is allowed to summarise them. Doing it the other way around would produce briefings that are pleasant to read and impossible to defend.
There is a deeper reason for this beyond regulatory defensibility. Reviewers who use AI-generated content in privacy work develop their trust calibration through the small act of clicking through to source rows. Each time a briefing's claim is verified against the underlying data, the reviewer is teaching themselves where the AI is reliable and where it is not. A briefing that cannot be drilled into prevents that calibration from happening, which means the reviewer either over-trusts the output or under-trusts it, and either failure mode is worse than the calibration that two-click verification produces.
The refusal surface
The thing that distinguishes a credible privacy AI from a confident chatbot is not what it says. It is what it refuses to say. The refusal surface (the set of questions the system is required to decline) is the most important design artefact in the whole feature, and it deserves more attention than any of the headline capabilities the briefing pages talk about.
There are four categories of refusal that Sylure AI is required to make, every time, without negotiation. The first is questions about facts that are not derivable from indexed data. "Did this person consent to marketing?" is a question Sylure has no way to answer because consent records are not part of the data Sylure ingests. The right output is not a best-guess inference from adjacent fields. The right output is a clear statement that consent records are not in scope and the question has to be answered against the consent management system, which is somebody else's product. Saying this is short, boring, and correct. Saying anything more produces a guess that a reviewer might mistake for evidence.
The second category is legal determinations. "Is this a breach under Article 33?" looks like a question an AI assistant should be able to answer because the criteria are written down and the AI has access to the data. It is not. Breach determination requires judgement about likelihood and severity of harm, awareness of the organisation's overall risk register, and an understanding of regulatory expectations that change over time. A model that produces an answer to that question is producing an answer it should not be authorised to produce, regardless of how confident the answer sounds. Sylure AI refuses this class of question explicitly, and the user interface around the refusal points the reviewer to the breach assessment workflow that lives outside the AI feature.
The third category is questions outside the indexed event surface. "How many emails did this subject send last month?" presumes that Sylure has indexed email send-events, which it has not, because that is not Sylure's job. A model that is asked this question and produces a number is fabricating, full stop. The right answer is to name the systems where the question could be answered and decline to attempt an answer that cannot be sourced from Sylure's data.
The fourth category is questions where the supporting analytic exists but the underlying confidence is too low to merit a statement. If a precomputed risk score for an asset depends on twelve findings but only three of them have been classified at high confidence, the briefing cannot describe the risk score as if it were a settled value. The honest output is that the asset has an indicative risk score that should be reviewed manually, with the unreviewed findings flagged for triage. This is less impressive than a confident risk narrative. It is also the only output that survives a serious review.
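Written as code, the four categories collapse into a small gate that runs before any prompt is constructed. This is a sketch, with invented surface names and an invented threshold, not Sylure's implementation:

```python
INDEXED_SURFACES = {"identity_hits", "asset_metadata"}  # what the product indexes
MIN_HIGH_CONF_FRACTION = 0.6                            # illustrative threshold

def refusal_for(required_surface: str, intent: str,
                high_conf_fraction: float) -> str | None:
    """Return a refusal string, or None if drafting may proceed."""
    if intent == "legal_determination":                 # category two
        return ("Declined: breach and lawfulness determinations belong to "
                "the assessment workflow, not the briefing.")
    if required_surface not in INDEXED_SURFACES:        # categories one and three
        return ("Not determined from available data: "
                f"'{required_surface}' is not indexed by Sylure.")
    if high_conf_fraction < MIN_HIGH_CONF_FRACTION:     # category four
        return ("Indicative only: too few findings are classified at high "
                "confidence; flag for manual triage.")
    return None
```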
The phrase that does most of the work in these refusal cases is short and deliberately unimpressive: "Not determined from available data." It is meant to be boring. Briefings that contain this phrase are doing their job. Briefings that never contain it are either operating on a dataset rich enough to support every claim, or are quietly inventing certainty in places they should not. In practice, complex briefings should hit at least one refusal somewhere in their output, and the absence of any refusal in a long briefing is itself a signal that the system is not being honest about the limits of its data.
The reason I think the refusal surface is more important than any of the other guardrails is that it is the property that survives changes in the underlying model. The model can be upgraded, swapped for a different provider, retrained, or tuned. The refusal surface is enforced at the boundary between the application and the model. As long as the application is asking the model to compose prose around a small, structured object whose schema includes refusal flags, the model has no choice but to honour them. If the application were instead handing the model a wide-open prompt and trusting it to know when to decline, the refusal surface would be a soft property, and soft properties in privacy work are not properties at all.
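Concretely, "enforced at the boundary" means the refusals arrive in the payload as data, and the composition instruction leaves the model nothing to decide. A sketch of the contract, with illustrative field names:

```python
import json

briefing_input = {
    "figures": {"total_occurrences": 47, "asset_count": 9},
    "refusals": [
        "Not determined from available data: consent records are not "
        "indexed by Sylure."
    ],
}

# The model's only job is composition. Refusals are data it must render,
# not judgement calls it is trusted to make.
instruction = (
    "Compose a DSAR briefing. Use the numbers in `figures` verbatim. "
    "Reproduce every entry in `refusals` word for word. Add no claim that "
    "is not present in the input object.\n\n" + json.dumps(briefing_input)
)

def validate(draft: str) -> bool:
    # Belt and braces: reject any draft that drops a required refusal.
    return all(r in draft for r in briefing_input["refusals"])
```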
The metadata-only payload contract
The fourth design constraint is the contract that governs what information Sylure is allowed to send to its AI providers. The short version is that no raw personal data ever leaves Sylure's UK-hosted environment in an AI request. The longer version is more useful because it is also more honest about where the line actually falls.
What goes into an AI prompt for a DSAR briefing is a structured object containing aggregate counts (occurrences by category, by file type, by asset), risk scores derived from those counts, severity bands, asset-level summaries that include path information and structural metadata, and a small set of flags that describe the confidence level of the underlying analytics. None of the raw matched values (the names, the email addresses, the phone numbers, the financial identifiers, the National Insurance numbers) ever appear in the payload. The model is told that there are 17 high-confidence email-format matches on a specific asset. It is not told the email addresses themselves.
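A useful way to hold that line is to make it mechanical: scan every outbound payload for anything shaped like a raw matched value before the request is allowed to leave the environment. A sketch, with field names and patterns that are deliberately rough approximations rather than Sylure's actual rules:

```python
import json
import re

# Illustrative guard, run on every payload before an AI request is sent.
RAW_VALUE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),             # email address shape
    re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),  # rough NI number shape
]

def assert_metadata_only(payload: dict) -> None:
    """Refuse to send any payload in which a raw matched value appears."""
    blob = json.dumps(payload)
    for pattern in RAW_VALUE_PATTERNS:
        if pattern.search(blob):
            raise ValueError("AI payload contains a raw matched value; request blocked.")

# Counts and flags pass; an actual email address or NI number would not.
assert_metadata_only({"figures": {"email_matches": 17}, "asset": "hr-export-q3"})
```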
This is the architectural commitment that makes the AI layer compatible with the UK-only data residency posture I wrote about in the hosting post. Sylure uses an AI provider whose infrastructure operates outside the UK. That would be a problem if customer personal data were flowing through the prompt. None is, because the prompt is structured to contain only the kinds of metadata that aggregate descriptions are built from.
The bit I want to be honest about is that path metadata can include human-meaningful folder names. An asset path like /exports/HR/leavers/2024/q3/anonymised.csv tells the model something about the document beyond its raw file type, and a thoughtful adversary could use folder structures to make inferences about an organisation's internal categorisation. This is the most uncomfortable corner of the metadata-only contract, and the right response is not to pretend it does not exist. Sylure customers who want to tighten this further can configure path redaction for AI payloads at the workspace level, which keeps file types and asset categories in the prompt but strips folder names. The default keeps folder names because in practice the information they carry is useful for generating accurate briefings and the residual risk is small. Customers operating in high-sensitivity environments can decide the trade-off differently, and the configuration exists precisely so that decision is theirs.
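The workspace-level knob can be as simple as a function that always keeps the file type and only conditionally keeps the folder names. A sketch, not Sylure's configuration API:

```python
from pathlib import PurePosixPath

def path_metadata(asset_path: str, redact_folders: bool) -> dict:
    """Keep the file type either way; keep folder names only where the
    workspace has accepted the residual inference risk."""
    p = PurePosixPath(asset_path)
    folders = list(p.parts[1:-1])  # drop the root and the filename
    meta = {"file_type": p.suffix.lstrip("."), "folder_count": len(folders)}
    if not redact_folders:
        meta["folders"] = folders
    return meta

path_metadata("/exports/HR/leavers/2024/q3/anonymised.csv", redact_folders=True)
# -> {"file_type": "csv", "folder_count": 5}
```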
The point of being specific about this is not to make the architecture sound impeccable. It is to make it inspectable. A vendor who claims that no personal data goes to AI providers and stops there is not telling the buyer enough to evaluate the claim. The buyer is left to take the claim on faith, which is what regulated buyers cannot do. The version that fits in a procurement conversation is more useful: here is exactly what goes into the prompt, here is exactly what does not, here is the one part of the boundary that has a defensible default and a configurable alternative.
What this approach cannot do
A post about architectural decisions has to include the section where those decisions turn out to be limits, because anything else reads as marketing.
Sylure AI cannot answer questions whose underlying data is not in Sylure. A privacy team asking the AI to summarise consent records, breach incidents, or vendor risk assessments will get a refusal, because those datasets live elsewhere. The AI is not a generalist privacy assistant. It is a layer that composes prose around the discovery and analytics work that Sylure does itself, and the boundaries of its competence are exactly the boundaries of that work.
Sylure AI cannot produce narrative judgement about a person. It cannot decide that a subject is a higher or lower priority for a DSAR response, that an asset is more or less sensitive than the score suggests, or that a complaint pattern indicates a particular intent. These are judgements that have to live with a human reviewer, and the AI is structured to lay out the evidence rather than to advocate for an interpretation. This is sometimes frustrating to people who want a more opinionated tool. It is also the only posture that makes the output usable in a workflow where the reviewer carries the accountability.
Sylure AI will never be the smartest-sounding model in the room. A model with full access to raw customer data and no refusal surface can produce more fluent, more confident, more specific-sounding briefings. The trade-off is that those briefings are not defensible, and the point of the AI feature in a privacy tool is not to be eloquent. It is to be the kind of draft that a reviewer can sign off on and a regulator would not laugh at. Those are different optimisation targets, and most of what feels like restraint in the Sylure AI feature is the result of choosing the second one.
The most useful sentence Sylure AI writes is the one it refuses to.