Syndicated from dev.to / @markyu

Bad Data Quality Costs More Than a Slow Query

A practical data quality guide for engineers: validation, ownership, schema drift, observability, and fixing bad data before dashboards lie.

Published: Oct 16 '24 · 2 min read
Tags: data, backend, database, engineering


Bad data usually does not explode.

It quietly poisons reports, recommendations, billing, search, and AI features until nobody trusts the system.

That loss of trust is expensive.

What Bad Data Looks Like

Not all bad data is obviously broken.

| Problem | Example |
| --- | --- |
| Missing value | email = null for active users |
| Invalid format | phone number stored three different ways |
| Duplicate entity | same customer created twice |
| Stale value | subscription says active after cancellation |
| Conflicting source | CRM and billing disagree |
| Schema drift | event payload changes without warning |

The worst bugs are the ones that still produce a dashboard.
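
Several of the problems in the table above can be detected mechanically. The following is a minimal sketch, not a complete checker: the `UserRecord` shape, the `findProblems` helper, and the assumption that phone numbers should be stored in E.164 form are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical record shape, used only for illustration.
interface UserRecord {
  id: number;
  email: string | null;
  phone: string | null;
  status: "trial" | "active" | "canceled";
  canceledAt: string | null; // ISO date, or null if never canceled
}

type Problem =
  | "missing_email"
  | "duplicate_email"
  | "unnormalized_phone"
  | "stale_status";

// Assumed convention: phones are stored in E.164 form, e.g. +14165550123.
const E164 = /^\+[1-9]\d{1,14}$/;

function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

function findProblems(records: UserRecord[]): Record<number, Problem[]> {
  // First pass: count how often each normalized email appears.
  const emailCounts = new Map<string, number>();
  records.forEach((r) => {
    if (r.email !== null) {
      const key = normalizeEmail(r.email);
      emailCounts.set(key, (emailCounts.get(key) || 0) + 1);
    }
  });

  // Second pass: collect the problems found on each record.
  const report: Record<number, Problem[]> = {};
  records.forEach((r) => {
    const problems: Problem[] = [];
    if (r.email === null) {
      problems.push("missing_email");
    } else if ((emailCounts.get(normalizeEmail(r.email)) || 0) > 1) {
      problems.push("duplicate_email");
    }
    if (r.phone !== null && !E164.test(r.phone)) {
      problems.push("unnormalized_phone");
    }
    if (r.status === "active" && r.canceledAt !== null) {
      problems.push("stale_status");
    }
    if (problems.length > 0) {
      report[r.id] = problems;
    }
  });
  return report;
}

const sample: UserRecord[] = [
  { id: 1, email: "ana@example.com", phone: "+14165550123", status: "active", canceledAt: null },
  { id: 2, email: "Ana@Example.com", phone: "(416) 555-0123", status: "active", canceledAt: null },
  { id: 3, email: null, phone: null, status: "active", canceledAt: "2024-09-01" },
];

const problemReport = findProblems(sample);
console.log(problemReport);
```

Every record in the sample would still show up in a dashboard count of active users, which is why these problems tend to go unnoticed.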

Add Validation at the Boundary

Do not let obviously invalid data enter the system.

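import { z } from "zod";

// Shape of a valid signup payload.
const SignupSchema = z.object({
  email: z.string().email(),
  plan: z.enum(["free", "pro", "team"]),
});

// `request` is the incoming HTTP request from your framework.
// parse() throws a ZodError on invalid input, so bad payloads stop here.
const input = SignupSchema.parse(request.body);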

This is cheaper than cleaning the warehouse later.
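
If you would rather return errors to the caller than throw, the same boundary check can be written as a function that returns a result object. The sketch below is dependency-free so it runs on its own; `validateSignup`, `ValidationResult`, and the deliberately loose email pattern are assumptions for illustration. (zod offers `safeParse` for the same non-throwing style.)

```typescript
type Plan = "free" | "pro" | "team";

interface Signup {
  email: string;
  plan: Plan;
}

type ValidationResult =
  | { ok: true; value: Signup }
  | { ok: false; errors: string[] };

const PLANS: Plan[] = ["free", "pro", "team"];

// Deliberately loose email check; an assumption for illustration,
// not a full address parser.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateSignup(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be an object"] };
  }
  const candidate = body as Record<string, unknown>;
  const email = candidate.email;
  const plan = candidate.plan;

  // Collect every error so the caller can report them all at once.
  const errors: string[] = [];
  if (typeof email !== "string" || !EMAIL_PATTERN.test(email)) {
    errors.push("email is invalid");
  }
  if (typeof plan !== "string" || PLANS.indexOf(plan as Plan) === -1) {
    errors.push("plan must be free, pro, or team");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, value: { email: email as string, plan: plan as Plan } };
}

const good = validateSignup({ email: "ana@example.com", plan: "pro" });
const bad = validateSignup({ email: "not-an-email", plan: "enterprise" });
console.log(good, bad);
```

Collecting all errors in one pass lets the API respond with a complete list instead of making the client fix one field at a time.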

Add Constraints in the Database

App validation is not enough. Scripts, migrations, and other services can write to the same tables without going through it.

-- One account per email address.
ALTER TABLE users
ADD CONSTRAINT users_email_unique UNIQUE (email);

-- Only known subscription states are allowed.
ALTER TABLE subscriptions
ADD CONSTRAINT subscriptions_status_check
CHECK (status IN ('trial', 'active', 'canceled'));

The database should reject impossible states.
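
One way app-level checks fall short is the check-then-insert race: two requests both confirm that an email is unused, then both insert it. The sketch below simulates that interleaving in memory. `UserTable` is a hypothetical stand-in, not a real database client, and its `enforceUnique` flag only imitates what a UNIQUE constraint would do.

```typescript
// In-memory stand-in for a users table; hypothetical, not a real database client.
class UserTable {
  private emails: string[] = [];
  private enforceUnique: boolean;

  constructor(enforceUnique: boolean) {
    this.enforceUnique = enforceUnique;
  }

  // The app-level check: "is this email already taken?"
  exists(email: string): boolean {
    return this.emails.indexOf(email) !== -1;
  }

  // Returns false when the simulated UNIQUE constraint rejects the row.
  insert(email: string): boolean {
    if (this.enforceUnique && this.exists(email)) {
      return false;
    }
    this.emails.push(email);
    return true;
  }

  count(email: string): number {
    return this.emails.filter((e) => e === email).length;
  }
}

function simulateRace(enforceUnique: boolean): number {
  const table = new UserTable(enforceUnique);
  const email = "ana@example.com";

  // Both requests run the app-level check before either one inserts.
  const firstSawExisting = table.exists(email);
  const secondSawExisting = table.exists(email);

  if (!firstSawExisting) table.insert(email);
  if (!secondSawExisting) table.insert(email);

  return table.count(email);
}

const withoutConstraint = simulateRace(false); // 2 rows: a duplicate slipped in
const withConstraint = simulateRace(true); // 1 row: the second insert was rejected
console.log({ withoutConstraint, withConstraint });
```

Both requests pass the application check either way. Only the store-level rule stops the second row.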

Track Data Quality Like Production Health

Useful checks:

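-- Missing values: users with no email.
SELECT COUNT(*) FROM users WHERE email IS NULL;

-- Duplicates: emails that appear on more than one user.
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Stale state: subscriptions still active after cancellation.
SELECT COUNT(*)
FROM subscriptions
WHERE status = 'active'
  AND canceled_at IS NOT NULL;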

These are not glamorous, but they catch real problems.
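
To treat these counts like production health, compare each one against a budget and alert when it is exceeded. This is a minimal sketch under stated assumptions: the `QualityCheck` shape, the `evaluate` function, the check names, and every number in the sample are made up for illustration, standing in for whatever the queries above return in your system.

```typescript
interface QualityCheck {
  name: string;
  failing: number; // rows that violate the rule
  total: number; // rows inspected
  maxFailureRate: number; // hypothetical budget, e.g. 0.001 = 0.1%
}

interface CheckStatus {
  name: string;
  failureRate: number;
  healthy: boolean;
}

function evaluate(checks: QualityCheck[]): CheckStatus[] {
  return checks.map((c) => {
    // An empty table has nothing to violate, so treat it as a 0% failure rate.
    const failureRate = c.total === 0 ? 0 : c.failing / c.total;
    return {
      name: c.name,
      failureRate,
      healthy: failureRate <= c.maxFailureRate,
    };
  });
}

// Illustrative counts only; in practice these come from scheduled queries.
const statuses = evaluate([
  { name: "users.email_null", failing: 12, total: 48000, maxFailureRate: 0.001 },
  { name: "users.email_duplicate", failing: 0, total: 48000, maxFailureRate: 0 },
  { name: "subscriptions.active_but_canceled", failing: 35, total: 9000, maxFailureRate: 0.001 },
]);

const unhealthy = statuses.filter((s) => !s.healthy).map((s) => s.name);
console.log(unhealthy);
```

Expressing the budget as a rate rather than a raw count keeps the threshold meaningful as tables grow.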

Ownership Matters

Every important field needs an owner.

If nobody owns customer_status, nobody knows whether billing, CRM, support, or the product database is allowed to change it.
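
Ownership can be made explicit in code with a small registry that maps each field to the one system allowed to write it. The sketch below is hypothetical throughout: `FIELD_OWNERS`, `canWrite`, and the choice of billing as the owner of customer_status are assumptions for illustration, not a recommendation for any particular system.

```typescript
type SystemName = "billing" | "crm" | "support" | "product_db";

// Hypothetical ownership registry: each important field has exactly one owner.
const FIELD_OWNERS: Record<string, SystemName> = {
  customer_status: "billing",
  email: "product_db",
};

interface WriteDecision {
  allowed: boolean;
  reason: string;
}

function canWrite(system: SystemName, field: string): WriteDecision {
  // A field with no registered owner is treated as unwritable until someone claims it.
  if (!Object.prototype.hasOwnProperty.call(FIELD_OWNERS, field)) {
    return { allowed: false, reason: field + " has no owner" };
  }
  const owner = FIELD_OWNERS[field];
  if (owner !== system) {
    return { allowed: false, reason: field + " is owned by " + owner };
  }
  return { allowed: true, reason: system + " owns " + field };
}

const billingWrite = canWrite("billing", "customer_status");
const crmWrite = canWrite("crm", "customer_status");
const orphanWrite = canWrite("support", "signup_source");
console.log(billingWrite, crmWrite, orphanWrite);
```

Rejecting writes to unowned fields is a design choice: it forces the ownership conversation to happen before the data diverges, not after.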

Visual map:

source system -> validation -> database -> event/log -> analytics

If quality breaks at the source, downstream tools only make the wrong answer prettier.

Why This Matters More With AI

AI agents and retrieval systems make bad data more visible.

If your internal docs, tickets, metrics, and customer records are messy, an AI assistant will confidently retrieve messy context.

In 2026, data quality is not just a BI problem. It is also an AI reliability problem.

Final Thought

Poor data quality is engineering debt with a business disguise.

Fix it close to where the data enters the system, give fields clear ownership, and monitor quality like you monitor latency.

What data quality issue created the most confusion in one of your projects?


Originally published on dev.to; comments and likes remain on the original site.
