Managing 130+ Blog Posts with a Separate Content Repository

When we started CODERCOPS, our blog had 12 posts. They lived in the same repository as the website code. Life was simple.

Today we have 130+ blog posts, 11 project case studies, and 5 team member profiles. The single-repo approach broke down around post 40. Content commits polluted the Git history, merge conflicts became frequent, and non-technical team members needed access to content files without risking changes to production code.

This post documents how we solved the problem by splitting content into a separate repository, connected via Git submodules. It is a pattern we now recommend to any content-heavy site.

Figure: Content repository architecture. Separating content from code is one of the best architectural decisions we have made.

The Problem with a Single Repository

What Broke at 40 Posts

Issue	Impact
Git history noise	Content commits outnumbered code commits 5:1. Finding code changes in `git log` required filtering.
Merge conflicts	Two people editing different blog posts on different branches still conflicted on index files.
CI/CD waste	Every typo fix in a blog post triggered a full site rebuild with all linting, testing, and deployment steps.
Access control	We wanted content editors to push changes without access to API keys, deployment configs, or source code.
Repository size	130+ MDX files with image references made the repo clone time noticeable.

The tipping point was when a content update accidentally triggered a deployment that failed because of an unrelated linting issue in the codebase. A blog typo fix should never be blocked by a code problem.

The Two-Repository Architecture

Repository Structure

codercops-agency-website/          (Code repo)
├── src/
│   ├── pages/
│   ├── components/
│   ├── layouts/
│   └── styles/
├── public/
├── codercops-agency-content/      (Git submodule → Content repo)
│   ├── blog/
│   │   ├── post-one.mdx
│   │   ├── post-two.mdx
│   │   └── ... (130+ files)
│   ├── projects/
│   │   ├── the-venting-spot.md
│   │   ├── colleatz.md
│   │   └── ... (11 files)
│   └── team/
│       ├── anurag-verma.md
│       └── ... (5 files)
├── astro.config.mjs
└── package.json

codercops-agency-content/          (Content repo)
├── blog/
├── projects/
├── team/
├── .github/
│   └── workflows/
│       └── validate.yml
└── README.md

The content repo is included in the website repo as a Git submodule. This means:

The website repo points to a specific commit of the content repo
Content editors work in the content repo independently
The website repo updates its submodule reference when content is ready to deploy
Both repos have their own CI/CD pipelines

Why Git Submodules (Not a CMS)

We evaluated several alternatives before settling on submodules:

Approach	Pros	Cons	Why We Rejected
Headless CMS (Sanity, Strapi)	Visual editor, media management	Runtime dependency, API costs, vendor lock-in	We write in MDX with code blocks and tables, and none of the CMSs we evaluated handled that well.
Git monorepo with CODEOWNERS	Simple setup	Still one repo, still polluted history	Does not solve the core problem.
npm package for content	Clean separation, versioning	Overhead of publishing, slower iteration	Too heavy for content that changes daily.
Git submodule	Clean separation, familiar Git workflow, no runtime dependency	Submodule learning curve	Best fit for our needs.
Copy files at build time (rsync)	Simple	Fragile, no version tracking	Too brittle for production.

The key insight: our content is MDX files, not database records. MDX with frontmatter, code blocks, tables, and custom components does not fit neatly into any CMS. Git is the natural version control system for text files.

The Git Submodule Setup

Initial Setup

# In the website repo
git submodule add https://github.com/codercops/codercops-agency-content.git codercops-agency-content
git commit -m "Add content as submodule"

Updating Content

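# Pull latest content changes
cd codercops-agency-content
git checkout main   # a freshly initialized submodule is in detached HEAD state
git pull origin main
cd ..
# Record the new content commit in the website repo
git add codercops-agency-content
git commit -m "Update content submodule"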

Astro Configuration

In astro.config.mjs, we point the content collections to the submodule directory:

// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  // Content collections read from the submodule
  // Astro's content directory is configured to include
  // the submodule path
});

The content collections in Astro 5 handle the MDX parsing, frontmatter validation, and type safety. The submodule is transparent to Astro — it just sees a directory of MDX files.
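In Astro 5, a collection's source directory is usually declared with the glob() loader in a content config file rather than in astro.config.mjs. The following is a minimal sketch of how the three submodule folders could be wired up, not our exact configuration. The file name, the collection names, and the file patterns are assumptions based on the directory tree shown earlier, and MDX files also need the @astrojs/mdx integration to be installed.

```js
// src/content.config.js (assumed location; Astro 5 also accepts .ts and .mjs)
import { defineCollection } from 'astro:content';
import { glob } from 'astro/loaders';

// Each collection reads files straight from the submodule checkout.
// The base paths mirror the repository tree shown earlier in this post.
const blog = defineCollection({
  loader: glob({ pattern: '**/*.mdx', base: './codercops-agency-content/blog' }),
});

const projects = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './codercops-agency-content/projects' }),
});

const team = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './codercops-agency-content/team' }),
});

// Astro picks up collections from this named export.
export const collections = { blog, projects, team };
```

Because the loader only takes a path, nothing in this file needs to know that the directory is a submodule.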

Content Validation with GitHub Actions

The content repo has its own validation pipeline that runs on every push:

# .github/workflows/validate.yml
name: Validate Content
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check frontmatter
        run: |
          # Validate required frontmatter fields
          for file in blog/*.mdx; do
            if ! grep -q "^title:" "$file"; then
              echo "Missing title in $file"
              exit 1
            fi
            if ! grep -q "^pubDate:" "$file"; then
              echo "Missing pubDate in $file"
              exit 1
            fi
            if ! grep -q "^author:" "$file"; then
              echo "Missing author in $file"
              exit 1
            fi
          done

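      - name: Check for broken image references
        run: |
          # Fail the job if any image URL does not return HTTP 200
          failed=0
          while read -r url; do
            # -L follows redirects; "|| true" keeps going when curl itself errors
            status=$(curl -L -o /dev/null -s -w "%{http_code}" "$url" || true)
            if [ "$status" -ne 200 ]; then
              echo "Broken image: $url (status: $status)"
              failed=1
            fi
          done < <(grep -rh "!\[" blog/ | grep -oP 'https?://[^\s)]+' | sort -u)
          exit "$failed"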

      - name: Lint markdown
        uses: DavidAnson/markdownlint-cli2-action@v14
        with:
          globs: '**/*.mdx'

This catches problems before they reach the website build:

Missing required frontmatter fields (title, pubDate, author)
Broken image URLs
Markdown formatting issues
Invalid date formats

The Content Workflow

For Blog Posts

Writer creates post
    ↓
Push to content repo (branch)
    ↓
GitHub Actions validates frontmatter and formatting
    ↓
Pull request reviewed
    ↓
Merge to main
    ↓
Website repo updates submodule reference
    ↓
Vercel deploys updated site

For Project Case Studies

Project files follow the same flow but with additional frontmatter fields (client, tech stack, timeline, live URL).

Deployment Trigger

We have two deployment paths:

Code changes trigger a full build from the website repo
Content changes trigger a submodule update in the website repo, which triggers a Vercel deployment

This means a blog post typo fix deploys in under 2 minutes without touching any code.

What We Learned After 130+ Posts

File Naming Convention Matters

We settled on this pattern: {slug}-{year}.mdx

ai-chatbot-build-vs-buy-cost-breakdown-2026.mdx
vibe-coding-revolution-2026.mdx
web-development-costs-pricing-guide-2026.mdx

The year suffix prevents slug collisions when we update topics annually. "web-development-trends" could exist for 2025 and 2026 without conflict.
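A convention like this is easy to check mechanically. The sketch below is one way to do it, assuming slugs are lowercase letters and digits separated by hyphens, which is what the examples above suggest but is not a rule stated anywhere in our tooling. The function name is hypothetical.

```js
// Matches "{slug}-{year}.mdx": hyphen-separated lowercase words, then a 4-digit year.
const POST_FILENAME = /^([a-z0-9]+(?:-[a-z0-9]+)*)-(\d{4})\.mdx$/;

// Split a file name into its slug and year, or return null if it does not fit.
function parsePostFilename(filename) {
  const match = POST_FILENAME.exec(filename);
  if (!match) return null;
  return { slug: match[1], year: Number(match[2]) };
}

// The same topic can exist for two years without a collision.
const older = parsePostFilename('web-development-trends-2025.mdx');
const newer = parsePostFilename('web-development-trends-2026.mdx');
console.log(older.slug === newer.slug, older.year, newer.year);
```

A file without the year suffix, such as web-development-trends.mdx, returns null and could be flagged in CI.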

Frontmatter Schema Enforcement

Every blog post requires this frontmatter:

---
title: "String"           # Required
description: "String"     # Required, used for meta description and OG
pubDate: YYYY-MM-DD       # Required, ISO date
author: "String"          # Required
image: "URL"              # Required, Unsplash or custom
tags: ["Array"]           # Required, 3-6 tags
category: "String"        # Required (Web Development, AI Integration, etc.)
subcategory: "String"     # Required (Guide, Tutorial, News, etc.)
featured: boolean         # Required (true shows on homepage)
draft: boolean            # Optional (true hides from production)
---

Astro 5's content collections enforce this schema at build time. If a field is missing or the wrong type, the build fails with a clear error message.
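For illustration, the frontmatter above could be expressed as a Zod schema along these lines. This is a sketch rather than our exact schema: the export name is hypothetical, the URL check on image and the 3 to 6 bound on tags are our reading of the comments in the listing, and z.coerce.date() is used so that both quoted and unquoted dates are accepted.

```js
// Schema sketch for the blog collection (z is re-exported by Astro).
import { z } from 'astro:content';

export const blogSchema = z.object({
  title: z.string(),
  description: z.string(),                  // meta description and OG text
  pubDate: z.coerce.date(),                 // accepts YYYY-MM-DD strings or Date values
  author: z.string(),
  image: z.string().url(),
  tags: z.array(z.string()).min(3).max(6),  // 3-6 tags
  category: z.string(),
  subcategory: z.string(),
  featured: z.boolean(),
  draft: z.boolean().optional(),            // the only optional field
});
```

The schema would be passed as the schema option of defineCollection for the blog collection, so a post with a missing or mistyped field stops the build.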

Content Organization by Topic, Not Date

We keep every post in one flat blog/ directory, named by topic slug, rather than nesting posts in date-based folders:

blog/
├── ai-chatbot-build-vs-buy-2026.mdx
├── ai-powered-web-development-2026.mdx
├── astro-5-agency-websites-2026.mdx
├── ... (alphabetical by slug)

Date-based folders (2026/03/post.mdx) create unnecessary nesting and make it harder to find posts. The pubDate frontmatter field handles chronological sorting at the application level.
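Sorting at the application level can be as small as the sketch below. The entry shape (an id plus a data object holding pubDate) follows what Astro's getCollection returns, but the function name is hypothetical and the dates in the sample data are made up for the example.

```js
// Return a new array with the newest post first.
function sortByPubDateDesc(posts) {
  // Copy before sorting so the caller's array is left untouched.
  return [...posts].sort(
    (a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime()
  );
}

// Sample entries in alphabetical (file system) order, with invented dates.
const samplePosts = [
  { id: 'ai-chatbot-build-vs-buy-2026', data: { pubDate: new Date('2026-01-10') } },
  { id: 'ai-powered-web-development-2026', data: { pubDate: new Date('2026-03-02') } },
  { id: 'astro-5-agency-websites-2026', data: { pubDate: new Date('2026-02-14') } },
];

console.log(sortByPubDateDesc(samplePosts).map((post) => post.id));
```

The order on disk stays alphabetical while the rendered blog index is chronological.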

Image Strategy

We use Unsplash URLs with dimension parameters instead of storing images in the repo:

![Alt text](https://images.unsplash.com/photo-123?w=800&h=400&fit=crop)

Benefits:

Zero image storage in the repo
Unsplash CDN handles optimization and delivery
Consistent dimensions via URL parameters
No build-time image processing needed
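To keep dimensions consistent across posts, the parameters can be applied by a small helper instead of being typed by hand. This is a minimal sketch using the w, h, and fit parameters from the example URL above. The function name and the default size are assumptions.

```js
// Set width, height, and crop mode on an image URL, replacing existing values.
function withDimensions(imageUrl, width = 800, height = 400) {
  const url = new URL(imageUrl);
  url.searchParams.set('w', String(width));
  url.searchParams.set('h', String(height));
  url.searchParams.set('fit', 'crop');
  return url.toString();
}

console.log(withDimensions('https://images.unsplash.com/photo-123'));
```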

For project screenshots and custom graphics, we use a separate asset hosting approach.

Performance Numbers

Metric	Single Repo (Before)	Two Repos (After)
Clone time	45 seconds	12 seconds (code) + 8 seconds (content)
Average build time	3.2 minutes	2.1 minutes
Content-only deploy	3.2 minutes (full build)	1.8 minutes
Git log noise	80% content commits	Clean separation
Merge conflicts	Weekly	Rare

The biggest win is not speed — it is cognitive clarity. When we look at the website repo's Git history, we see code changes. When we look at the content repo, we see content changes. Each history tells a coherent story.

When This Pattern Makes Sense

Use a Separate Content Repo When:

You have 20+ content files and growing
Multiple people contribute content
Content updates are more frequent than code changes
You want content validation independent of code builds
Your content is text-based (Markdown, MDX, YAML) not database records

Stick with a Single Repo When:

You have fewer than 20 content files
Only developers touch content
Content and code change at the same rate
You are using a headless CMS anyway

Use a Headless CMS When:

Non-technical users need a visual editor
Your content includes complex media management
You need role-based content workflows (draft, review, publish)
Content does not include code blocks or technical formatting

Common Pitfalls

Forgetting to update the submodule. New content sits in the content repo but the website still points to the old commit. Solution: automate submodule updates via webhook or scheduled CI job.
Submodule confusion for new developers. git clone does not automatically initialize submodules. New team members need to run git submodule update --init after cloning, or clone with git clone --recurse-submodules. We document this in the README.
Branch divergence. If the content repo has branches, the website repo needs to know which branch to track. We keep it simple: always track main.
Build failures from content changes. Even with validation in the content repo, a valid MDX file might break the Astro build (e.g., using an undefined component). We run a test build in the website repo's CI before deploying.

The Numbers

Our content repo today:

Metric	Count
Blog posts	130+
Project case studies	11
Team profiles	5
Total MDX/MD files	146+
Average post length	1,800 words
Total content	~260,000 words
Repo size	2.1 MB (text only, no images)

At 2.1 MB for 260,000 words of content, Git handles this effortlessly. We could 10x the content volume before needing to reconsider the architecture.

Running a content-heavy site and struggling with repository management? We have built this pattern for ourselves and for clients. Talk to us about your content architecture.
