syndicated · dev.to / @markyu

Bad Data Quality Costs More Than a Slow Query

A practical data quality guide for engineers: validation, ownership, schema drift, observability, and fixing bad data before dashboards lie.

Published Oct 16 '24
Tags: data, backend, database, engineering

Bad data usually does not explode.

It quietly poisons reports, recommendations, billing, search, and AI features until nobody trusts the system.

That loss of trust is expensive.

What Bad Data Looks Like

Not all bad data is obviously broken.

| Problem | Example |
| --- | --- |
| Missing value | email = null for active users |
| Invalid format | phone number stored three different ways |
| Duplicate entity | same customer created twice |
| Stale value | subscription says active after cancellation |
| Conflicting source | CRM and billing disagree |
| Schema drift | event payload changes without warning |

The worst bugs are the ones that still produce a dashboard.
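
Of the problems listed above, schema drift is the easiest to miss because the pipeline keeps running. Here is a minimal sketch of a drift check, assuming a hypothetical event payload with a hand-written expected shape. `expectedFields`, `detectDrift`, `hasDrift`, and the field names are illustrative and not taken from any library.

```typescript
type FieldType = "string" | "number" | "boolean";

// Hand-maintained contract for a hypothetical event payload (assumed fields).
const expectedFields: Record<string, FieldType> = {
  user_id: "string",
  plan: "string",
  seats: "number",
};

interface DriftReport {
  missing: string[];    // expected fields absent from the payload
  unexpected: string[]; // payload fields the contract does not know about
  wrongType: string[];  // fields present with a different primitive type
}

// Own-property check, so payload keys like "toString" are not mistaken for known fields.
function hasField(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function detectDrift(
  payload: Record<string, unknown>,
  expected: Record<string, FieldType>,
): DriftReport {
  const report: DriftReport = { missing: [], unexpected: [], wrongType: [] };

  for (const field of Object.keys(expected)) {
    if (!hasField(payload, field)) {
      report.missing.push(field);
    } else if (typeof payload[field] !== expected[field]) {
      report.wrongType.push(field);
    }
  }
  for (const field of Object.keys(payload)) {
    if (!hasField(expected, field)) report.unexpected.push(field);
  }
  return report;
}

// True when the payload differs from the contract in any way.
function hasDrift(report: DriftReport): boolean {
  return report.missing.length + report.unexpected.length + report.wrongType.length > 0;
}
```

A producer that starts sending `seats` as a string, or adds a field nobody announced, shows up in the report instead of silently changing a dashboard.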

Add Validation at the Boundary

Do not let obviously invalid data enter the system.

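import { z } from "zod";

// The only shape a signup request is allowed to have.
const SignupSchema = z.object({
  email: z.string().email(),
  plan: z.enum(["free", "pro", "team"]),
});

// parse() throws a ZodError on invalid input, so nothing past this line sees bad data.
// `request` is the incoming HTTP request from your framework of choice.
const input = SignupSchema.parse(request.body);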

This is cheaper than cleaning the warehouse later.
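
If a thrown exception is not what you want at the boundary, the same rule can return a result for the handler to inspect. Here is a dependency-free sketch of that idea, assuming the same two fields as the schema above. `validateSignup`, `SignupInput`, and the deliberately loose email pattern are hypothetical and not part of zod.

```typescript
const PLANS = ["free", "pro", "team"] as const;
type Plan = (typeof PLANS)[number];

interface SignupInput {
  email: string;
  plan: Plan;
}

type ValidationResult =
  | { ok: true; value: SignupInput }
  | { ok: false; errors: string[] };

// Intentionally loose: one "@", no whitespace, a dot in the domain part.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isPlan(value: unknown): value is Plan {
  return typeof value === "string" && PLANS.some((p) => p === value);
}

function validateSignup(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be an object"] };
  }
  const { email, plan } = body as Record<string, unknown>;
  const errors: string[] = [];

  // Collect every problem instead of stopping at the first one.
  if (typeof email !== "string" || !EMAIL_PATTERN.test(email)) {
    errors.push("email must be a valid email address");
  }
  if (!isPlan(plan)) {
    errors.push("plan must be one of: " + PLANS.join(", "));
  }
  if (errors.length > 0) return { ok: false, errors };

  // Only the validated fields are returned; extra keys on the body are dropped.
  return { ok: true, value: { email: email as string, plan: plan as Plan } };
}
```

Returning all errors at once lets the caller answer with a single 400 response that lists every problem, rather than making the client fix fields one at a time.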

Add Constraints in the Database

App validation is not enough. Migrations, backfills, and other services can write to the same tables without passing through it.

-- One account per email address.
ALTER TABLE users
ADD CONSTRAINT users_email_unique UNIQUE (email);

-- Only known subscription states can be stored.
ALTER TABLE subscriptions
ADD CONSTRAINT subscriptions_status_check
CHECK (status IN ('trial', 'active', 'canceled'));

The database should reject impossible states.
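
When a constraint does reject a write, the application still has to turn that into something a caller can act on. Here is a minimal sketch, assuming PostgreSQL (the post does not name a database) and a driver that exposes the SQLSTATE as `code` and the constraint name as `constraint` on the thrown error, as node-postgres does. `describeConstraintError`, the HTTP statuses, and the messages are hypothetical choices.

```typescript
// SQLSTATE codes from PostgreSQL's integrity constraint violation class (23xxx).
const UNIQUE_VIOLATION = "23505";
const CHECK_VIOLATION = "23514";

// The subset of a driver error this sketch relies on (assumed shape).
interface DbErrorLike {
  code?: string;
  constraint?: string;
}

interface ConstraintFailure {
  httpStatus: number;
  message: string;
}

function describeConstraintError(err: DbErrorLike): ConstraintFailure | null {
  if (err.code === UNIQUE_VIOLATION && err.constraint === "users_email_unique") {
    return { httpStatus: 409, message: "An account with this email already exists." };
  }
  if (err.code === CHECK_VIOLATION && err.constraint === "subscriptions_status_check") {
    return { httpStatus: 422, message: "Subscription status is not an allowed value." };
  }
  // Not a constraint this sketch knows how to explain: the caller should rethrow.
  return null;
}
```

Matching on the constraint name is why the explicit names in the `ALTER TABLE` statements are worth having: they give application code a stable identifier to react to.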

Track Data Quality Like Production Health

Useful checks:

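-- Missing values: users without an email.
SELECT COUNT(*) FROM users WHERE email IS NULL;

-- Duplicate entities: emails that appear on more than one user.
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Stale values: subscriptions still marked active after cancellation.
SELECT COUNT(*)
FROM subscriptions
WHERE status = 'active'
  AND canceled_at IS NOT NULL;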

These are not glamorous, but they catch real problems.
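
To treat the queries above like health checks, each one needs a name and a threshold so that a scheduler can report pass or fail. Here is a minimal sketch of the evaluation step, assuming the counts have already been fetched by whatever job runs the SQL. The check labels, thresholds, and `evaluateChecks` are illustrative.

```typescript
interface QualityCheck {
  label: string;
  maxAllowed: number; // a count above this fails the check
}

interface CheckOutcome {
  label: string;
  count: number | null; // null when no measurement was available
  passed: boolean;
}

// One entry per query; zero tolerance is an assumed starting point.
const qualityChecks: QualityCheck[] = [
  { label: "users_missing_email", maxAllowed: 0 },
  { label: "users_duplicate_email", maxAllowed: 0 },
  { label: "active_but_canceled", maxAllowed: 0 },
];

function evaluateChecks(
  checks: QualityCheck[],
  counts: Record<string, number>,
): CheckOutcome[] {
  return checks.map((check) => {
    const count = counts[check.label];
    // A check that produced no number is treated as failing, not silently skipped.
    if (typeof count !== "number" || Number.isNaN(count)) {
      return { label: check.label, count: null, passed: false };
    }
    return { label: check.label, count, passed: count <= check.maxAllowed };
  });
}
```

Failing a check that returned nothing is a deliberate choice here: a monitoring query that stopped running is itself a data quality problem.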

Ownership Matters

Every important field needs an owner.

If nobody owns customer_status, nobody knows whether billing, CRM, support, or the product database is allowed to change it.
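
One lightweight way to make ownership explicit is a registry that names a single writer per field and is consulted before any update. Here is a minimal sketch, assuming hypothetical system names and an arbitrary choice of billing as the owner of `customer_status`. In a real system that assignment is an organizational decision, not a code one.

```typescript
type SystemName = "billing" | "crm" | "support" | "product_db";

// Exactly one owning system per field; every other system only reads.
// The assignments below are assumptions for illustration.
const fieldOwners: Record<string, SystemName> = {
  customer_status: "billing",
  customer_email: "crm",
};

function canWrite(field: string, system: SystemName): boolean {
  // Unowned fields are writable by nobody: the gap should be fixed, not worked around.
  if (!Object.prototype.hasOwnProperty.call(fieldOwners, field)) return false;
  return fieldOwners[field] === system;
}

// Guard to call at the top of any code path that updates a field.
function assertCanWrite(field: string, system: SystemName): void {
  if (!canWrite(field, system)) {
    throw new Error(system + " is not the owner of " + field);
  }
}
```

With this in place, a support tool that tries to flip `customer_status` fails loudly instead of creating a second source of truth.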

Visual map:

source system -> validation -> database -> event/log -> analytics

If quality breaks at the source, downstream tools only make the wrong answer prettier.

Why This Matters More With AI

AI agents and retrieval systems make bad data more visible.

If your internal docs, tickets, metrics, and customer records are messy, an AI assistant will confidently retrieve messy context.

In 2026, data quality is not just a BI problem. It is also an AI reliability problem.

Final Thought

Poor data quality is engineering debt with a business disguise.

Fix it close to where the data enters the system, give fields clear ownership, and monitor quality like you monitor latency.

What data quality issue created the most confusion in one of your projects?

originally published

This post first ran on dev.to. Comments and reactions live there.

© 2026 ZhenXiao Mark Yu · markyu0615@gmail.com