Most content schemas start with good intentions and end up as archaeology. The original structure made sense for the first dozen articles. Then exceptions crept in. Fields got added without removing old ones. Two articles use the same field for different purposes. The schema that was supposed to be the system became the thing you work around.

A clean schema does not happen by accident. It comes from a few deliberate decisions made early and maintained consistently.

What every content record needs

At minimum, every content record in a structured publishing system should have:

  • A unique slug. Lowercase, hyphenated, stable. The slug is the identity of the record. Changing it breaks links.
  • A canonical URL. Explicit. Not derived at render time. The URL that all other references point to.
  • A status field. Published, draft, or archived. Only published records should appear in routing and sitemaps.
  • Timestamps. At minimum publish_date and updated_at. Used for sitemaps, sort orders, and freshness signals.
  • A template field. Which template renders this record. Makes rendering logic explicit rather than inferred from context.
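As a rough illustration, the base fields above could be checked with a small function like the one below. This is a sketch under assumptions, not a reference implementation: the key names slug, canonical_url, status and template, the example.com URL, and the choice of ISO 8601 strings for timestamps are all hypothetical. Only publish_date and updated_at are named in the list above.

```python
import re
from datetime import datetime

# Hypothetical key names for the five base fields described above.
BASE_FIELDS = ("slug", "canonical_url", "status", "publish_date", "updated_at", "template")
ALLOWED_STATUSES = {"published", "draft", "archived"}
# Lowercase letters and digits, separated by single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def base_record_errors(record: dict) -> list[str]:
    """Return the problems found in the base fields; an empty list means the record passes."""
    errors = [f"missing field: {name}" for name in BASE_FIELDS if name not in record]
    if errors:
        return errors
    if not isinstance(record["slug"], str) or not SLUG_PATTERN.fullmatch(record["slug"]):
        errors.append("slug must be lowercase and hyphenated")
    if record["status"] not in ALLOWED_STATUSES:
        errors.append(f"unknown status: {record['status']}")
    for name in ("publish_date", "updated_at"):
        try:
            # Assumption: timestamps are stored as ISO 8601 strings.
            datetime.fromisoformat(record[name])
        except (TypeError, ValueError):
            errors.append(f"{name} must be an ISO 8601 date or datetime")
    return errors


def routable(records: list[dict]) -> list[dict]:
    """Keep only published records, the ones that belong in routing and sitemaps."""
    return [r for r in records if r.get("status") == "published"]


example = {
    "slug": "designing-a-content-schema",
    "canonical_url": "https://example.com/articles/designing-a-content-schema/",
    "status": "published",
    "publish_date": "2024-01-15",
    "updated_at": "2024-02-01",
    "template": "article",
}
```

Calling base_record_errors(example) returns an empty list. Changing the slug to something like "Bad Slug" produces a single error message about the slug.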

What article records specifically need

Beyond the base fields, an article record should declare:

  • Category membership. A primary category path and optional secondary paths. The primary path drives breadcrumbs and routing. Secondary paths support cross-listing.
  • An excerpt. One to three sentences. Used in list views, sidebars, and machine summaries. Write it yourself; do not let it be auto-generated from the first paragraph.
  • Tags. Flat list. Lowercase, hyphenated. For cross-linking and filtering, not for replacing category structure.
  • Related content. An explicit list of related article slugs. Supplement with algorithmic matching, but give the system explicit relationships to start from.
  • SEO fields. Title and description as a nested object. Separate from the display title so you can optimise them independently.
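One way to picture the article-specific fields is as a typed record. The sketch below uses hypothetical key names (category, primary, secondary, related, seo) and made-up example values. The breadcrumbs helper is just one plausible reading of "the primary path drives breadcrumbs", assuming category paths are slash-separated.

```python
from typing import TypedDict


class Seo(TypedDict):
    title: str
    description: str


class Category(TypedDict):
    primary: str          # assumed slash-separated, e.g. "guides/content-modelling"
    secondary: list[str]  # optional cross-listings; may be empty


class ArticleFields(TypedDict):
    category: Category
    excerpt: str
    tags: list[str]       # flat list of lowercase, hyphenated strings
    related: list[str]    # slugs of related articles
    seo: Seo              # kept separate from the display title


def breadcrumbs(primary_path: str) -> list[str]:
    """Expand a primary category path into one cumulative path per level."""
    parts = [p for p in primary_path.split("/") if p]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


# Illustrative values only.
article: ArticleFields = {
    "category": {"primary": "guides/content-modelling", "secondary": ["reference"]},
    "excerpt": "What a content record needs and which patterns to avoid.",
    "tags": ["schema-design", "structured-content"],
    "related": ["validating-content-records"],
    "seo": {
        "title": "Designing a Content Schema That Lasts",
        "description": "The fields every content record needs.",
    },
}
```

With this record, breadcrumbs(article["category"]["primary"]) yields "guides" followed by "guides/content-modelling".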

What to avoid

A few patterns consistently create problems:

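Derived fields stored as data. If a field can be calculated from other fields, do not store it. Store the source data and calculate at render time. Stored derived fields go stale. Identity fields such as the slug and canonical URL are the exception: they are stored explicitly so that they stay stable.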
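A reading-time estimate is a typical derived value. The sketch below computes it when the render context is built instead of storing it on the record. The body field, the reading_minutes name, and the 200 words per minute figure are assumptions made for the example.

```python
import math

WORDS_PER_MINUTE = 200  # assumed reading speed, purely illustrative


def reading_minutes(body: str) -> int:
    """Derive the reading time from the body text each time it is needed."""
    words = len(body.split())
    return max(1, math.ceil(words / WORDS_PER_MINUTE))


def render_context(record: dict) -> dict:
    """Build the data handed to a template; derived values exist only here."""
    return {**record, "reading_minutes": reading_minutes(record["body"])}
```

Because the stored record never contains reading_minutes, editing the body cannot leave a stale value behind.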

Inconsistent field naming. If some records use publish_date and others use date or published_at, every consumer of the data has to handle all three. Pick one and enforce it.
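Enforcement can be as simple as a check that rejects the known variants. The sketch below treats publish_date as the chosen name and the two alternatives mentioned above as errors. The function name and message wording are hypothetical.

```python
CANONICAL = "publish_date"
ALIASES = ("date", "published_at")  # variants to reject, taken from the example above


def naming_errors(record: dict) -> list[str]:
    """Flag alias field names and a missing canonical name."""
    errors = [f"use '{CANONICAL}' instead of '{alias}'" for alias in ALIASES if alias in record]
    if CANONICAL not in record:
        errors.append(f"missing '{CANONICAL}'")
    return errors
```

Run over every record before publishing, a check like this keeps consumers from ever having to handle more than one name.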

Deeply nested structures for things that are actually flat. Tags do not need to be objects with IDs and labels; a flat array of strings is enough. Save nesting for things that genuinely require it.
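A flat tag list is also easy to check. This minimal sketch assumes tags follow the lowercase, hyphenated convention described earlier; the function name and error messages are made up for illustration.

```python
import re

# Lowercase letters and digits, separated by single hyphens.
TAG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def tag_errors(tags: object) -> list[str]:
    """Accept only a flat list of lowercase, hyphenated strings."""
    if not isinstance(tags, list):
        return ["tags must be a list"]
    return [
        f"invalid tag: {tag!r}"
        for tag in tags
        if not (isinstance(tag, str) and TAG_PATTERN.fullmatch(tag))
    ]
```

A nested entry such as {"id": 1, "label": "JSON"} is reported as invalid, as is an uppercase string like "JSON".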

Optional fields that become required in practice. If every article needs an excerpt to render correctly, declare the excerpt field as required and have your validation script enforce it. Do not leave it optional in the schema.
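In a validation script, that rule might look like the sketch below. The REQUIRED_ARTICLE_FIELDS tuple and the function name are hypothetical. The point is only that a blank excerpt fails the same way a missing one does.

```python
# Extend this tuple with every field that rendering actually depends on.
REQUIRED_ARTICLE_FIELDS = ("excerpt",)


def missing_required(record: dict) -> list[str]:
    """Return the names of required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_ARTICLE_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

A build step could call missing_required on each article and stop the build when the result is not empty.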

The test of a good schema

Give a new AI coding assistant a single JSON record and ask it to create another one in the same format. If it can do that without looking at any other documentation, your schema is clear enough. If it needs to ask clarifying questions, your schema has ambiguity worth resolving.

That is not a trick. That is the practical standard for a schema that will hold up over time.