Posts by Sarah Roberts


What Counts as Personal Data? A Field Guide for Anyone Who’s Ever Had to Actually Decide

Most people encounter “personal data” as a definition in GDPR training slides, a neat sentence in Article 4(1), or a diagram with arrows pointing at names, addresses, IPs, cookies, and maybe a silhouette of a person for good measure. It looks crisp on paper. It feels graspable.

Then you sit down in front of a real Subject Access Request and discover the definition is only the outline. The shape you have to fill in yourself.

In practice, deciding what is and isn’t personal data is less like checking items off a list and more like handling smoke. It gets everywhere. It curls into places you didn’t expect. And just when you think you’ve pinned it down, another strand drifts out of frame and forces you back to the definition again.

For years I treated these decisions as admin tasks. You read the request, pull the data, redact anything that belongs to someone else, and send the rest back. Straightforward. Bureaucratic. Maybe even a little boring.

Then I had to deal with a DSAR that turned that entire surface-level understanding inside out.

It was one of those requests that looks simple until you realise it isn’t. Tens of thousands of records. Old systems that predate GDPR’s fashionably neat abstractions. Internal identifiers that sit innocently in a database column but, when stitched together at scale, tell a completely different story. And an organisation with mixed instincts: good intentions, inconsistent governance muscle memory, a fondness for doing things informally because “none of this has ever been a problem before”.

If you work in information governance, you probably just felt a chill run down your spine.

That’s the bit no one tells you: the law is clean. The lived reality is messy.

So this is an attempt to bridge those worlds — not by quoting doctrine at people, but by describing the actual work of figuring out what personal data is, when it isn’t, and why you sometimes end up arguing about an ID number that looks meaningless right up until it isn’t.

The definition is simple until it isn’t

GDPR gives you two tests:

  1. Does the data relate to someone?
  2. Can the person be identified, directly or indirectly, from it?

Everything else (Recital 26’s “means reasonably likely to be used”, the WP29’s four building blocks, the case law) is basically the regulatory community trying to fill in the gaps humans create by being inventive, inconsistent and context-dependent.

Take the “relates to” question. The YS, M and S case is the one everyone drags out: the legal analysis in an applicant’s file isn’t personal data because the reasoning is really the authority’s, not information about the applicant (although the personal data the analysis draws on still is). Fine. Sensible.

But when you put that next to an operational system — say, a moderation log, an access audit, a workflow report — nothing slots neatly into either category. Real systems mix content, context and purpose like sediment layers.

A timestamp: is it about the person?
An action they took: yes, but does the identifier they touched then become about them too?
A record they interacted with: only sometimes.
And sometimes: absolutely not.

The law expects you to make that call, without turning everything into personal data just because a person breathed near it, and without accidentally releasing someone else’s information because you treated a linking field as meaningless.

You have to look at the substance, not the surface. It is astonishing how often that’s forgotten.

Identifiability isn’t hypothetical

One of the things GDPR got right, and which the CJEU’s Breyer judgment hammered home, is that identifiability is not a theoretical parlour game. It’s not “could a nation-state reidentify this with unlimited resources?” It’s: what could realistically happen, using means that are actually available?

That should be simple, but many organisations still approach it from the wrong direction: they respond to identifiability questions by waving at how “public” the data already is, as if public equals harmless.

Public data is not safe when you rearrange it into new shapes.
Ask anyone who’s ever worked with open registers, scraped datasets or historic archives. Combinations matter. Context matters. Structure matters.

A hundred thousand identifiers sitting quietly in a relational database don’t behave the same way once they’re handed over in a single structured extract.

The identifiability shifts.
The risk shifts.
Your obligation shifts with it.

You cannot skip that step because someone insists “but she already sees this on the website”.
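
If the point about combinations feels abstract, here is a deliberately toy Python sketch. Every dataset, field name and record in it is invented for the example; the only thing it demonstrates is how two releases that each look harmless become identifying the moment somebody joins them.

    # Toy illustration: two datasets that are harmless on their own,
    # re-identified the moment you join them on quasi-identifiers.
    # Every record and field name here is invented for the example.

    public_register = [
        {"postcode": "CV9 1AA", "birth_year": 1978, "name": "A. Resident"},
        {"postcode": "CV9 2BB", "birth_year": 1985, "name": "B. Resident"},
    ]

    # An "anonymised" extract: no names, just a postcode and birth year.
    extract = [
        {"postcode": "CV9 1AA", "birth_year": 1978, "note": "complaint upheld"},
    ]

    # Build a lookup keyed on the quasi-identifiers.
    lookup = {(r["postcode"], r["birth_year"]): r["name"] for r in public_register}

    for row in extract:
        name = lookup.get((row["postcode"], row["birth_year"]))
        if name:
            print(f"Re-identified: {name} -> {row['note']}")

That join is what the “means reasonably likely to be used” test is really asking about: not whether re-identification is theoretically possible, but whether anyone with ordinary means could run it.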

Mixed data isn’t exotic; it’s daily

Most guidance treats mixed data like a rare species — the kind of thing you only encounter in HR disputes, tribunal files or messy inboxes.

In reality, mixed data is the default state of almost every operational system. Especially ones built long before data protection was fashionable.

A single row can contain:

  • a timestamp relating to Person A
  • an action taken by Person B
  • a link to something about Person C
  • and a system note by Person D

If you treat all of it as personal data about A or B or C, you’ve misunderstood the test. If you treat none of it as personal data because it’s “just how the system works”, you’ve misunderstood the test in the other direction.
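
To make that concrete, here is a minimal Python sketch of the per-field judgment a mixed row forces on you. The row, the field names and the subject mapping are all invented; in a real system, deciding that mapping is the hard part, because no schema records whose data a field is.

    # A sketch of the per-field judgment a mixed row forces on you.
    # The row, field names and subject map are all invented examples.

    row = {
        "timestamp": "2024-03-01T09:14:00Z",              # relates to Person A's visit
        "actor_id": "staff-0042",                         # an action taken by Person B
        "record_ref": "case-7781",                        # links to a record about Person C
        "system_note": "escalated on Person D's advice",  # written by Person D
    }

    # Whose personal data is each field? This is the contested question.
    subjects = {
        "timestamp": {"A"},
        "actor_id": {"B"},
        "record_ref": {"C"},
        "system_note": {"D"},
    }

    def extract_for(subject: str) -> dict:
        """Keep the fields that are this subject's personal data;
        redact the fields that belong to someone else."""
        return {
            field: value if subject in subjects[field] else "[REDACTED]"
            for field, value in row.items()
        }

    print(extract_for("A"))  # A gets the timestamp; B, C and D's fields are redacted

Run it for Person B or Person C instead and different fields survive; the row itself is never simply “A’s data” or “not A’s data”.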

The balancing test in the DPA 2018 (Schedule 2, paragraph 16) for mixed data cases is famously context-heavy. What people rarely mention is the emotional reality: you will probably disappoint someone. DSARs involving mixed data are frequently exercises in expectations management, not customer service.

There’s a story nobody writes down in case studies:
The point where you’re sitting in a meeting, explaining, for the third time, that it matters whose data you’re releasing, not just how much of it the requester once had access to. Or that public availability is not the same thing as lawful disclosure in a structured format. Or that “trust” is not an exemption under Article 15(4).

These conversations take longer than the redaction work.

Governance is only governance if you write it down

The quickest way to undermine a compliant DSAR response is to make all your decisions in Slack threads, reference your “general sense” of past practice, and call it a day.

GDPR cares less about the outcomes we feel comfortable with and more about the processes we can demonstrate. Article 24 is quietly brutal about this: accountability is not a vibe.

What the law expects is boring:

  • a documented necessity assessment
  • a clear record of third-party considerations
  • a reasoning trail that would survive daylight
  • the alternatives you considered
  • the minimisation steps you took
  • and why you rejected or accepted them

In other words, the kind of record that means you can answer a regulator with something more substantial than: “We talked about it and we think it’s probably fine.”
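
If it helps to picture what that looks like as an artefact rather than a vibe, here is a minimal Python sketch of such a decision record. Every field name and value is an invented example; the point is only that each bullet above becomes a slot someone has to fill in before anything leaves the building.

    # A sketch of a decision record with the fields from the list above.
    # The structure, names and values are illustrative, not a prescribed format.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class DisclosureDecision:
        dsar_ref: str
        decided_on: date
        necessity_assessment: str           # why disclosure or refusal is necessary
        third_party_considerations: str     # whose data is mixed in, and the balance struck
        alternatives_considered: list[str]  # e.g. summary instead of full extract
        minimisation_steps: list[str]       # what was redacted or excluded, and why
        outcome: str                        # what was decided, with reasoning
        decided_by: str

    record = DisclosureDecision(
        dsar_ref="DSAR-2024-017",
        decided_on=date(2024, 3, 8),
        necessity_assessment="Extract limited to fields relating to the requester.",
        third_party_considerations="Staff identifiers redacted; disclosure without consent not reasonable.",
        alternatives_considered=["full export", "narrative summary"],
        minimisation_steps=["free-text notes naming third parties removed"],
        outcome="Partial disclosure with redactions",
        decided_by="Information governance officer",
    )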

The gap between those two mindsets is where most DSAR pain lives.

The weird thing you discover when you finally decide

The most surprising part of wrestling with a complex DSAR is this: the hardest decisions aren’t about what the law says. They’re about what you’re willing to stand behind.

It’s easy to talk about proportionality when everything is abstract.
It’s harder when you’re looking at a dataset that affects real people, or when senior staff want a certain outcome because it feels right, or when a volunteer questions something you’ve been arguing internally for days.

You learn very quickly that good governance isn’t about being right — it’s about being consistent. It’s about being able to say, with a straight face and a full audit trail: “We did this because the law required it, not because it was convenient.”

You also learn that “public interest” is a phrase people like to use when they simply want something to happen. And that minimisation is the principle most organisations forget exists until it becomes inconvenient.

Why this matters beyond any one case

People think DSARs are about disclosure. They’re not. They’re about the organisation’s self-understanding. They expose everything:

  • how well you know your systems
  • how shaky your access controls are
  • how much institutional memory you rely on
  • how you handle disagreement
  • how allergic your senior leadership is to slow, structured decision-making
  • how much your processes depend on personalities rather than principles

A DSAR is like turning on the lights in the server room you’ve been walking through by instinct. Suddenly you notice the tangle of cables, the unlabelled switches, the access someone still has from five years ago, and the realisation that your governance posture is basically hope with a spreadsheet.

So what counts as personal data?

After months of wrestling with it, my answer is painfully simple:

Personal data is whatever forces you to confront the fact that context is doing more work than you realised.

The law gives you the frame.
The organisation gives you the mess.
Your job is to bridge the gap without pretending one can be flattened to fit the other.

If there’s one thing I wish more organisations understood, it’s this:
DSARs are not admin.
They’re not customer service.
They’re not something you can rush out the door because the deadline is looming.

They’re governance in miniature.
And if you do them properly, they show you exactly where your weak points are — long before the ICO has to.

Mapping the Bins – How one person used FOI to fix a very ordinary, very real problem

Kamran Ali wants to build a free app so residents of North Warwickshire can find their nearest public bin. That’s it. Not a commercial venture, not a data harvesting scheme, just a straightforward civic tech project to help people dispose of their rubbish responsibly.

Which sounds like exactly the kind of thing councils should be falling over themselves to support, right? Open data, community engagement, residents taking initiative to solve local problems. All the things we’re supposed to want.

Except it’s now mid-November, and Kamran’s been navigating the council’s FOI process since September to get permission to actually use the data they’ve already given him.


What Kamran’s trying to do

The concept is simple: take the council’s bin location data, convert it to a format that works on mobile devices, and display it on a map. If you’re out and about with a coffee cup or a dog waste bag, you can check the app and find the nearest bin rather than carrying it around for twenty minutes or (let’s be honest) leaving it on a wall somewhere.

It’s the sort of low-stakes, high-utility tool that proves useful not because it’s revolutionary but because it solves a minor everyday annoyance. No different from apps that show you the nearest public toilet, EV charging point, or bus stop.

The data already exists: the council maintains a spreadsheet of all their public litter and dog waste bins with locations. Kamran requested it under the Environmental Information Regulations in early September, and the council provided it within six days. Perfectly straightforward.
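
For the curious, the technical half of the project really is as small as it sounds. Here is a rough Python sketch of the conversion Kamran describes, under assumptions: that the spreadsheet is a CSV with easting, northing and description columns (my guesses, not the council’s actual headers), and that the third-party pyproj library handles the grid-to-lat/long conversion.

    # A sketch of the conversion: council spreadsheet in, map-friendly
    # GeoJSON out. The file name and column names ("easting", "northing",
    # "description") are guesses at the council's format, not confirmed.
    import csv
    import json

    from pyproj import Transformer  # third-party; OS grid -> lat/long

    # British National Grid (EPSG:27700) to WGS84 (EPSG:4326) for web maps.
    to_wgs84 = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

    features = []
    with open("bin_locations.csv", newline="") as f:
        for row in csv.DictReader(f):
            lon, lat = to_wgs84.transform(float(row["easting"]), float(row["northing"]))
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"description": row["description"]},
            })

    with open("bins.geojson", "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)

The build itself is a weekend’s work.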

But here’s where most people would run into a problem they didn’t know existed.

The bit nobody tells you about

Getting data under FOI or EIR doesn’t actually give you permission to republish it. Disclosure laws require councils to show you information; they don’t automatically grant you a licence to put it on a website or in an app.

That’s governed by a completely separate piece of legislation called the Re-use of Public Sector Information Regulations 2015 (RPSI). And Kamran, unlike most requesters, knew this.

Six days after receiving the bin data, he sent a second request, this time under RPSI, asking for explicit permission to reuse the dataset under the Open Government Licence v3.0. This is the standard open licence that allows anyone (commercial or non-commercial) to copy, publish, and adapt public sector information as long as they attribute the source.

His request was thorough. He explained exactly what he planned to do with the data (convert coordinates to JSON for a map interface), confirmed he wouldn’t republish the raw spreadsheet, asked about attribution requirements, and even preemptively addressed potential complications around Ordnance Survey-derived coordinates that might have licensing restrictions.

It’s the kind of request that shows someone’s done their homework.

Then he waited.

Twenty working days came and went

RPSI has a statutory response deadline of 20 working days. By early November, nearly two months had passed with no reply.

Kamran followed up. Politely, professionally, asking the council to confirm either that reuse was granted under OGL or that they’d be issuing a formal refusal notice with reasons and appeal rights.

On 18 November, the council finally responded:

“Yes you can re-use as long as all personal and sensitive data is redacted.”

Which is… not really an answer.

What’s actually wrong here

The bin locations aren’t personal data.
We’re talking about geographic coordinates for public infrastructure: “outside the library on Market Street” or “junction of X Road and Y Avenue”. There’s no personal data in this dataset. Nothing to redact. The council appears to have reached for a reflex data protection caveat that has nothing here to attach to.

“Yes you can re-use” isn’t a licence.
RPSI requires councils to issue clear licensing terms. What’s the attribution wording? Can the data be adapted? Can it be used commercially? Can derivatives be created? None of this is answered. Kamran still doesn’t have the legal certainty he needs to publish his app without potential IP infringement concerns.

They didn’t address third-party licensing.
Kamran specifically asked about Ordnance Survey-derived content. If the council’s bin coordinates were generated using OS products, there might be licensing restrictions. The council didn’t engage with this question at all.

They blew the statutory deadline by weeks.
RPSI responses are supposed to take 20 working days. This one took more than twice that, and only arrived after Kamran chased.

Why this matters more than one bin app

This isn’t about one frustrated requester or one council getting RPSI wrong. It’s a pattern.

Local authorities hold enormous amounts of geographic data that would be valuable to residents: bin locations, grit bins, dog waste bins, EV charging points, accessible parking bays, public toilets, dropped kerbs, street furniture. Most of this should be open by default—it’s factual information about public infrastructure that everyone’s already entitled to see.

But there’s a persistent gap between:

  • Disclosed (handed over in response to individual FOI requests)
  • Published (proactively made available on the council website)
  • Licensed for reuse (explicitly cleared for republication and adaptation)

Kamran’s trying to bridge that gap. He wants to take public data about public infrastructure and make it useful to the public. This is civic technology in its purest form—not a startup looking for VC funding, just someone trying to solve a small problem in their community.

And he’s being stalled because the council doesn’t quite know how to handle RPSI requests, even though the regulations have been in force for nearly a decade.

The bigger open data problem

The UK has committed to open government principles through multiple international partnerships. We have strong transparency laws (FOI/EIR), good reuse regulations (RPSI), and a solid open licence ready to go (OGLv3). The legal framework exists.

What’s missing is consistent implementation at the local authority level.

Most council FOI officers are excellent at processing disclosure requests. They understand FOI exemptions, they know when to apply EIR instead of FOIA, they handle sensitive requests with appropriate care.

But RPSI sits at the intersection of FOI law, copyright law, data protection law, and third-party licensing agreements. It’s a completely different skill set. The result is often well-meaning but legally incomplete responses like North Warwickshire’s: officers trying to be helpful but not quite sure what’s being asked or what answer to give.

There’s no malice here. Just under-resourced teams dealing with legislation they rarely encounter, trying to do their best without clear guidance.

What should happen

For this specific request:
The council should issue a proper RPSI decision. Something like: “We license this dataset under OGLv3. Attribution should be: ‘© North Warwickshire Borough Council 2025. Licensed under the Open Government Licence v3.0.’ No charges apply. The coordinates are derived from [source], which we believe falls within permitted reuse under OS presumption to publish guidance.”

That gives Kamran legal certainty and takes five minutes to write.

For councils generally:
Someone needs to write template RPSI responses for common datasets. Bin locations, licensing registers, planning applications—anything routinely disclosed under FOI/EIR that has obvious reuse value. Templates would save officer time and give consistent answers.

The LGA or ICO could do this. It shouldn’t fall to individual councils to reinvent the wheel every time.

For requesters:
If you’re planning to republish FOI/EIR data in any public-facing format (website, app, social media, research paper), send a separate RPSI request. Spell out exactly what licence you need (usually OGLv3). Mention your intended use. Flag any potential third-party IP issues upfront.

And be prepared to wait longer than you should, then follow up politely when the deadline passes.

Where this leaves Kamran

As of publication, he’s still waiting for proper licensing terms that would let him publish his bin-finding app without legal risk.

The council has given him de facto permission (“yes you can re-use”) but hasn’t provided the licence terms that would make that permission legally meaningful. He’s in limbo: he has the data, he has vague approval, but he doesn’t have the paperwork that protects him if someone later claims IP infringement.

Which means a straightforward civic project that would take a weekend to build is now stalled in its third month over licensing bureaucracy.

Meanwhile, people in North Warwickshire are still wandering around looking for bins.


Kamran’s full request thread is available on WhatDoTheyKnow.

AI agents will save the NHS £75 million (says company selling AI agents)

OneAdvanced announced this month that their AI agents could save the NHS £75 million annually and free up the equivalent of 150,000 additional appointments per week. The Clinical Coding Agent and Clinical Summarisation Agent automate paperwork in GP surgeries—suggesting SNOMED codes and extracting key information from clinical documents so GPs can spend less time on admin.

Which sounds brilliant. Genuinely useful, even. Automating the tedious parts of clinical documentation so doctors can see patients is exactly the kind of thing AI should be doing.

Except OneAdvanced is the company that was fined £3.07 million in March 2025 for security failings after hackers accessed their systems in 2022 via a customer account that didn’t have multi-factor authentication enabled. That breach compromised personal information for 79,404 people, including details of how to gain entry into the homes of 890 people receiving care at home. It disrupted NHS 111 services. It was a disaster.

And now they’re back, selling AI automation tools that will process clinical documents at scale across GP practices.

The Cybersecurity Bill that took a patient death to pass

The UK government introduced the Cyber Security and Resilience Bill to Parliament this week. It’s designed to stop attacks like the Synnovis ransomware incident in June 2024, which cancelled 11,000 NHS appointments in London and contributed to at least one patient death.

Which is all very commendable, except this bill has been in development since at least 2022. It was announced in the King’s Speech back in July 2024. The details were published in April 2025. And it won’t actually come into force until 2027.

So we’ve got a roughly five-year gap between “we should probably do something about this” and “organisations must actually comply with these rules.” During which time: patient death. Ministry of Defence payroll breach. Jaguar Land Rover shutdown. Synnovis is still contacting NHS trusts about stolen patient data right now.

MEPs Vote to Put People First in the Fight Against Biometric Mass Surveillance

The European Parliament has voted to put people first in the AI Act, a historic step towards protecting fundamental rights against harmful AI systems.
This article delves into the implications of the Parliament’s decision and what it means for the future of AI regulation in the EU.


The Online Safety Bill: Critical Questions That Must Be Answered Before It’s Too Late

The proposed UK Online Safety Bill has significant implications for individuals and businesses. The current political climate has created a precarious situation, and there are important questions that need answers, particularly for those impacted by the Bill.
