Most people encounter “personal data” as a definition in GDPR training slides, a neat sentence in Article 4(1), or a diagram with arrows pointing at names, addresses, IPs, cookies, and maybe a silhouette of a person for good measure. It looks crisp on paper. It feels graspable.

Then you sit down in front of a real Subject Access Request and discover the definition is only the outline: the shape inside it is something you have to fill in yourself.

In practice, deciding what is and isn’t personal data is less like checking items off a list and more like handling smoke. It gets everywhere. It curls into places you didn’t expect. And just when you think you’ve pinned it down, another strand drifts out of frame and forces you back to the definition again.

For years I treated these decisions as admin tasks. You read the request, pull the data, redact anything that belongs to someone else, and send the rest back. Straightforward. Bureaucratic. Maybe even a little boring.

Then I had to deal with a DSAR that turned that entire surface-level understanding inside out.

It was one of those requests that looks simple until you realise it isn’t. Tens of thousands of records. Old systems that predate GDPR’s fashionably neat abstractions. Internal identifiers that sit innocently in a database column but, when stitched together at scale, tell a completely different story. And an organisation with mixed instincts: good intentions, inconsistent governance muscle memory, a fondness for doing things informally because “none of this has ever been a problem before”.

If you work in information governance, you probably just felt a chill run down your spine.

That’s the bit no one tells you: the law is clean. The lived reality is messy.

So this is an attempt to bridge those worlds — not by quoting doctrine at people, but by describing the actual work of figuring out what personal data is, when it isn’t, and why you sometimes end up arguing about an ID number that looks meaningless right up until it isn’t.

The definition is simple until it isn’t

GDPR gives you two tests:

  1. Does the data relate to someone?
  2. Can the person be identified by it, directly or indirectly?

Everything else — the Recital 26 test of “means reasonably likely to be used”, the WP29’s four building blocks, the case law — is basically the regulatory community trying to fill in the gaps humans create by being inventive, inconsistent and context-dependent.
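The structure of those two tests is worth making explicit, because it is a conjunction: failing either limb takes data out of scope. A toy sketch (field names invented, and only the skeleton of the test — the hard part is the contextual judgement each boolean hides):

```python
from dataclasses import dataclass

# A toy decision aid, not legal advice. The two attributes stand in for
# the two limbs of the Article 4(1) definition; in practice each one is
# a contextual judgement, not a flag you can read off a schema.

@dataclass
class Field:
    name: str
    relates_to_a_person: bool  # content, purpose or result concerns someone
    identifiable: bool         # directly, or indirectly via available means

def is_personal_data(f: Field) -> bool:
    """Both limbs must hold; failing either takes the field out of scope."""
    return f.relates_to_a_person and f.identifiable

print(is_personal_data(Field("server_uptime", False, False)))  # False
print(is_personal_data(Field("staff_badge_id", True, True)))   # True
```

The point of writing it down this way is only to show where the arguments happen: almost every real dispute is about which boolean is true, not about the conjunction itself.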

Take the “relates to” question. The YS, M and S case is the one everyone drags out: legal analysis about a person isn’t personal data because the document is really about the authority’s reasoning, not the applicant. Fine. Sensible.

But when you put that next to an operational system — say, a moderation log, an access audit, a workflow report — nothing slots neatly into either category. Real systems mix content, purpose and result like sediment layers.

A timestamp: is it about the person?
An action they took: yes, but does the identifier they touched then become about them too?
A record they interacted with: only sometimes.
And sometimes: absolutely not.

The law expects you to make that call, without turning everything into personal data just because a person breathed near it, and without accidentally releasing someone else’s information because you treated a linking field as meaningless.

You have to look at the substance, not the surface. It is astonishing how often that’s forgotten.

Identifiability isn’t hypothetical

One of the things GDPR got right — and which the CJEU’s Breyer judgment hammered home — is that identifiability is not a theoretical parlour game. It’s not “could a nation-state reidentify this with unlimited resources?” It’s: what could realistically happen, using means that are actually available?

That should be simple, but many organisations still get tangled up in the wrong direction. They tend to respond to identifiability questions by waving at how “public” the data already is, as if public equals harmless.

Public data is not safe when you rearrange it into new shapes.
Ask anyone who’s ever worked with open registers, scraped datasets or historic archives. Combinations matter. Context matters. Structure matters.
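The combinations point can be made concrete with a toy join. All the data below is invented; neither dataset puts a name next to the sensitive attribute, and each looks harmless in isolation. Linked on shared quasi-identifiers, the extract stops being anonymous:

```python
# Invented illustrative data. Each dataset looks harmless alone; joined
# on shared quasi-identifiers (postcode + birth year), the "anonymised"
# extract links a named person to a sensitive outcome.

public_register = [
    {"name": "A. Jones", "postcode": "AB1 2CD", "birth_year": 1980},
    {"name": "B. Smith", "postcode": "EF3 4GH", "birth_year": 1975},
]

extract = [  # no names here, so it "isn't personal data", right?
    {"postcode": "AB1 2CD", "birth_year": 1980, "case_outcome": "upheld"},
]

reidentified = [
    {**person, **row}
    for person in public_register
    for row in extract
    if (person["postcode"], person["birth_year"])
       == (row["postcode"], row["birth_year"])
]

print(reidentified)  # A. Jones is now linked to the case outcome
```

Nothing in either input changed; only the structure of the output did. That is exactly the shift the next paragraph describes.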

A hundred thousand identifiers sitting quietly in a relational database don’t behave the same way once they’re handed over in a single structured extract.

The identifiability shifts.
The risk shifts.
Your obligation shifts with it.

You cannot skip that step because someone insists “but they already see this on the website”.

Mixed data isn’t exotic; it’s daily

Most guidance treats mixed data like a rare species — the kind of thing you only encounter in HR disputes, tribunal files or messy inboxes.

In reality, mixed data is the default state of almost every operational system. Especially ones built long before data protection was fashionable.

A single row can contain:

  • a timestamp relating to Person A
  • an action taken by Person B
  • a link to something about Person C
  • and a system note by Person D

If you treat all of it as personal data about A or B or C, you’ve misunderstood the test. If you treat none of it as personal data because it’s “just how the system works”, you’ve misunderstood the test in the other direction.
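One way to picture the per-field nature of the test is a toy sketch of that row. The field names and the one-subject-per-field mapping are invented and deliberately over-simple (a single field can relate to several people, and third-party data may still be disclosable after the balancing test), but the shape of the question survives: “whose data is this?” is answered per field, not per row.

```python
# Hypothetical row and subject map, mirroring the bullet list above.
# A DSAR from Person A releases only the fields that are A's personal
# data; the rest is redacted rather than dropped, so structure survives.

row = {
    "timestamp": "2023-05-01T09:14:00Z",    # relates to Person A's activity
    "actor": "person_b",                    # an action taken by Person B
    "target_record": "record_about_c",      # links to something about Person C
    "system_note": "escalated per policy",  # written by Person D
}

subject_map = {
    "timestamp": {"A"},
    "actor": {"B"},
    "target_record": {"C"},
    "system_note": {"D"},
}

def extract_for(requester: str) -> dict:
    """Keep only the requester's own personal data; redact the rest."""
    return {
        field: (value if requester in subject_map[field] else "[REDACTED]")
        for field, value in row.items()
    }

print(extract_for("A"))
```

Treating everything as A’s data means leaking B, C and D; treating nothing as personal data means unlawfully withholding A’s. The per-field map is the middle path the test actually asks for.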

The balancing test in DPA 2018 for mixed data cases is famously context-heavy. What people rarely mention is the emotional reality: you will probably disappoint someone. DSARs involving mixed data are frequently exercises in expectations management, not customer service.

There’s a part of the story nobody writes down in case studies:
The moment when you’re sitting in a meeting, explaining — for the third time — that who the data you’re releasing is about matters, not just how much of it the requester once had access to. Or that public availability is not the same thing as lawful disclosure in a structured format. Or that “trust” is not an exemption under Article 15(4).

These conversations take longer than the redaction work.

Governance is only governance if you write it down

The quickest way to undermine a compliant DSAR response is to make all your decisions in Slack threads, reference your “general sense” of past practice, and call it a day.

GDPR cares less about the outcomes we feel comfortable with and more about the processes we can demonstrate. Article 24 is quietly brutal about this: accountability is not a vibe.

What the law expects is boring:

  • a documented necessity assessment
  • a clear record of third-party considerations
  • a reasoning trail that would survive daylight
  • the alternatives you considered
  • the minimisation steps you took
  • and why you rejected or accepted them

In other words, the kind of record that means you can answer a regulator with something more substantial than: “We talked about it and we think it’s probably fine.”
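Sketched as a structure — with every field name invented, and the content fabricated purely for illustration — that boring record might look like this. The substance is the shape: each item in the list above becomes something you can hand over, not something you recollect.

```python
from dataclasses import dataclass, asdict

# Hypothetical audit-trail entry for a single disclosure decision.
# All field names and values are invented for illustration.

@dataclass
class DisclosureDecision:
    request_ref: str
    necessity_assessment: str
    third_party_considerations: list[str]
    alternatives_considered: list[str]
    minimisation_steps: list[str]
    reasoning: str
    decision: str   # e.g. "disclose", "redact", "withhold"
    decided_by: str
    decided_on: str  # ISO date

record = DisclosureDecision(
    request_ref="DSAR-2024-017",
    necessity_assessment="Full extract needed to answer scope of request.",
    third_party_considerations=["Person B's actions appear in audit rows"],
    alternatives_considered=["summary table instead of raw rows"],
    minimisation_steps=["dropped free-text notes", "masked internal IDs"],
    reasoning="Identifiers redacted; balancing test favoured disclosure.",
    decision="disclose",
    decided_by="IG officer",
    decided_on="2024-03-01",
)

print(asdict(record))  # serialisable, dateable, attributable
```

Whether it lives in a dataclass, a form, or a register matters far less than the fact that every field is filled in before the response goes out, not reconstructed afterwards.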

The gap between those two mindsets is where most DSAR pain lives.

The weird thing you discover when you finally decide

The most surprising part of wrestling with a complex DSAR is this: the hardest decisions aren’t about what the law says. They’re about what you’re willing to stand behind.

It’s easy to talk about proportionality when everything is abstract.
It’s harder when you’re looking at a dataset that affects real people, or when senior staff want a certain outcome because it feels right, or when a volunteer questions something you’ve been arguing internally for days.

You learn very quickly that good governance isn’t about being right — it’s about being consistent. It’s about being able to say, with a straight face and a full audit trail: “We did this because the law required it, not because it was convenient.”

You also learn that “public interest” is a phrase people like to use when they simply want something to happen. And that minimisation is the principle most organisations forget exists until it becomes inconvenient.

Why this matters beyond any one case

People think DSARs are about disclosure. They’re not. They’re about the organisation’s self-understanding. They expose everything:

  • how well you know your systems
  • how shaky your access controls are
  • how much institutional memory you rely on
  • how you handle disagreement
  • how allergic your senior leadership is to slow, structured decision-making
  • how much your processes depend on personalities rather than principles

A DSAR is like turning on the lights in the server room you’ve been walking through by instinct. Suddenly you notice the tangle of cables, the unlabelled switches, the access someone still has from five years ago, and the realisation that your governance posture is basically hope with a spreadsheet.

So what counts as personal data?

After months of wrestling with it, my answer is painfully simple:

Personal data is whatever forces you to confront the fact that context is doing more work than you realised.

The law gives you the frame.
The organisation gives you the mess.
Your job is to bridge the gap without pretending one can be flattened to fit the other.

If there’s one thing I wish more organisations understood, it’s this:
DSARs are not admin.
They’re not customer service.
They’re not something you can rush out the door because the deadline is looming.

They’re governance in miniature.
And if you do them properly, they show you exactly where your weak points are — long before the ICO has to.