When we started CODERCOPS, our blog had 12 posts. They lived in the same repository as the website code. Life was simple.
Today we have 130+ blog posts, 11 project case studies, and 5 team member profiles. The single-repo approach broke down around post 40. Content commits polluted the Git history, merge conflicts became frequent, and non-technical team members needed access to content files without risking changes to production code.
This post documents how we solved the problem by splitting content into a separate repository, connected via Git submodules. It is a pattern we now recommend to any content-heavy site.
Separating content from code is one of the best architectural decisions we have made
The Problem with a Single Repository
What Broke at 40 Posts
| Issue | Impact |
|---|---|
| Git history noise | Content commits outnumbered code commits 5:1. Finding code changes in git log required filtering. |
| Merge conflicts | Two people editing different blog posts on different branches still conflicted on index files. |
| CI/CD waste | Every typo fix in a blog post triggered a full site rebuild with all linting, testing, and deployment steps. |
| Access control | We wanted content editors to push changes without access to API keys, deployment configs, or source code. |
| Repository size | 130+ MDX files with image references made the repo clone time noticeable. |
The tipping point was when a content update accidentally triggered a deployment that failed because of an unrelated linting issue in the codebase. A blog typo fix should never be blocked by a code problem.
The Two-Repository Architecture
Repository Structure
codercops-agency-website/           (Code repo)
├── src/
│   ├── pages/
│   ├── components/
│   ├── layouts/
│   └── styles/
├── public/
├── codercops-agency-content/       (Git submodule → Content repo)
│   ├── blog/
│   │   ├── post-one.mdx
│   │   ├── post-two.mdx
│   │   └── ... (130+ files)
│   ├── projects/
│   │   ├── the-venting-spot.md
│   │   ├── colleatz.md
│   │   └── ... (11 files)
│   └── team/
│       ├── anurag-verma.md
│       └── ... (5 files)
├── astro.config.mjs
└── package.json

codercops-agency-content/           (Content repo)
├── blog/
├── projects/
├── team/
├── .github/
│   └── workflows/
│       └── validate.yml
└── README.md

The content repo is included in the website repo as a Git submodule. This means:
- The website repo points to a specific commit of the content repo
- Content editors work in the content repo independently
- The website repo updates its submodule reference when content is ready to deploy
- Both repos have their own CI/CD pipelines
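To make the first point concrete: Git records the pinned commit as a "gitlink" entry (mode 160000) in the website repo's tree, and `git ls-tree HEAD` prints it alongside ordinary files. The following is a minimal sketch that extracts those pins from that output. The function name and the sample SHAs are illustrative assumptions, not values from our repos.

```typescript
// Sketch: find submodule pins in `git ls-tree HEAD` output.
// Each line has the form "<mode> <type> <sha>\t<path>"; submodules use mode 160000.
interface SubmodulePin {
  path: string;
  commit: string;
}

function findSubmodulePins(lsTreeOutput: string): SubmodulePin[] {
  const pins: SubmodulePin[] = [];
  for (const line of lsTreeOutput.split("\n")) {
    const match = /^160000 commit ([0-9a-f]{40,64})\t(.+)$/.exec(line);
    if (match) {
      const [, commit = "", path = ""] = match;
      pins.push({ path, commit });
    }
  }
  return pins;
}

// Illustrative input; the SHAs are made up.
const sample = [
  "100644 blob 89abcdef0123456789abcdef0123456789abcdef\tpackage.json",
  "160000 commit 0123456789abcdef0123456789abcdef01234567\tcodercops-agency-content",
  "040000 tree fedcba9876543210fedcba9876543210fedcba98\tsrc",
].join("\n");

console.log(findSubmodulePins(sample));
```

Only the gitlink line is returned, which is why a content push has no effect on the website until that entry is updated and committed.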
Why Git Submodules (Not a CMS)
We evaluated several alternatives before settling on submodules:
| Approach | Pros | Cons | Why We Rejected |
|---|---|---|---|
| Headless CMS (Sanity, Strapi) | Visual editor, media management | Runtime dependency, API costs, vendor lock-in | We write in MDX with code blocks and tables. No CMS handles this well. |
| Git monorepo with CODEOWNERS | Simple setup | Still one repo, still polluted history | Does not solve the core problem. |
| npm package for content | Clean separation, versioning | Overhead of publishing, slower iteration | Too heavy for content that changes daily. |
| Git submodule | Clean separation, familiar Git workflow, no runtime dependency | Submodule learning curve | Best fit for our needs. |
| Copy files at build time (rsync) | Simple | Fragile, no version tracking | Too brittle for production. |
The key insight: our content is MDX files, not database records. MDX with frontmatter, code blocks, tables, and custom components does not fit neatly into any CMS. Git is the natural version control system for text files.
The Git Submodule Setup
Initial Setup
# In the website repo
git submodule add https://github.com/codercops/codercops-agency-content.git codercops-agency-content
git commit -m "Add content as submodule"

Updating Content
# Pull latest content changes
cd codercops-agency-content
git pull origin main
cd ..
git add codercops-agency-content
git commit -m "Update content submodule"

Astro Configuration
In astro.config.mjs, we point the content collections to the submodule directory:
// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  // Content collections read from the submodule
  // Astro's content directory is configured to include
  // the submodule path
});

The content collections in Astro 5 handle the MDX parsing, frontmatter validation, and type safety. The submodule is transparent to Astro: it just sees a directory of MDX files.
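As a sketch of what that wiring might look like: with Astro 5's Content Layer API, collections are typically defined in `src/content.config.ts`, and the `glob` loader's `base` option can point at any directory, including a submodule. The collection names and paths below are assumptions based on the folder layout shown earlier, and schemas are omitted for brevity.

```typescript
// src/content.config.ts (sketch; paths assume the submodule sits at the repo root)
import { defineCollection } from 'astro:content';
import { glob } from 'astro/loaders';

// Each collection reads directly from a folder inside the submodule.
const blog = defineCollection({
  loader: glob({ pattern: '**/*.mdx', base: './codercops-agency-content/blog' }),
});

const projects = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './codercops-agency-content/projects' }),
});

const team = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './codercops-agency-content/team' }),
});

export const collections = { blog, projects, team };
```

Loading `.mdx` files this way also requires the `@astrojs/mdx` integration to be installed.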
Content Validation with GitHub Actions
The content repo has its own validation pipeline that runs on every push:
# .github/workflows/validate.yml
name: Validate Content
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check frontmatter
        run: |
          # Validate required frontmatter fields
          for file in blog/*.mdx; do
            if ! grep -q "^title:" "$file"; then
              echo "Missing title in $file"
              exit 1
            fi
            if ! grep -q "^pubDate:" "$file"; then
              echo "Missing pubDate in $file"
              exit 1
            fi
            if ! grep -q "^author:" "$file"; then
              echo "Missing author in $file"
              exit 1
            fi
          done
      - name: Check for broken image references
        run: |
          # Ensure all image URLs are valid
          grep -rh "!\[" blog/ | grep -oP 'https?://[^\s)]+' | while read url; do
            status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
            if [ "$status" -ne 200 ]; then
              echo "Broken image: $url (status: $status)"
            fi
          done
      - name: Lint markdown
        uses: DavidAnson/markdownlint-cli2-action@v14
        with:
          globs: '**/*.mdx'

This catches problems before they reach the website build:
- Missing required frontmatter fields (title, pubDate, author)
- Broken image URLs
- Markdown formatting issues
- Invalid date formats
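The grep checks shown above only confirm that a key is present, so a date-format check needs a little more logic. Here is a minimal sketch of one way to do it, assuming frontmatter sits between two `---` lines and `pubDate` is written unquoted as YYYY-MM-DD. `validateFrontmatter` and `isIsoDate` are hypothetical helpers for illustration, not the content repo's actual validator.

```typescript
// Sketch: check required frontmatter keys and the pubDate format.
const REQUIRED_KEYS = ["title", "pubDate", "author"];

function validateFrontmatter(source: string): string[] {
  // Frontmatter is the block between the first two "---" lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!match) {
    return ["Missing frontmatter block"];
  }
  const fields: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const pair = /^([A-Za-z]+):\s*(.*)$/.exec(line);
    if (pair) {
      fields[pair[1] ?? ""] = (pair[2] ?? "").trim();
    }
  }
  const errors: string[] = [];
  for (const key of REQUIRED_KEYS) {
    if (!fields[key]) {
      errors.push(`Missing ${key}`);
    }
  }
  const pubDate = fields["pubDate"];
  if (pubDate && !isIsoDate(pubDate)) {
    errors.push(`Invalid pubDate: ${pubDate}`);
  }
  return errors;
}

// True only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    return false;
  }
  const parsed = new Date(`${value}T00:00:00Z`);
  // Round-tripping rejects dates the parser silently rolls over (e.g. Feb 30).
  return !isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}
```

Returning a list of messages rather than failing on the first problem lets a CI step report every issue in a file at once.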
The Content Workflow
For Blog Posts
Writer creates post
↓
Push to content repo (branch)
↓
GitHub Actions validates frontmatter and formatting
↓
Pull request reviewed
↓
Merge to main
↓
Website repo updates submodule reference
↓
Vercel deploys updated site

For Project Case Studies
Project files follow the same flow but with additional frontmatter fields (client, tech stack, timeline, live URL).
Deployment Trigger
We have two deployment paths:
- Code changes trigger a full build from the website repo
- Content changes trigger a submodule update in the website repo, which triggers a Vercel deployment
This means a blog post typo fix deploys in under 2 minutes without touching any code.
What We Learned After 130+ Posts
File Naming Convention Matters
We settled on this pattern: {slug}-{year}.mdx
ai-chatbot-build-vs-buy-cost-breakdown-2026.mdx
vibe-coding-revolution-2026.mdx
web-development-costs-pricing-guide-2026.mdx

The year suffix prevents slug collisions when we update topics annually. "web-development-trends" could exist for 2025 and 2026 without conflict.
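A convention like this is easy to check mechanically. The sketch below assumes lowercase, hyphen-separated slugs and a four-digit year; `parsePostFilename` is an illustrative name rather than part of our tooling.

```typescript
// Sketch: validate the {slug}-{year}.mdx convention and split it into parts.
interface PostFilename {
  slug: string;
  year: number;
}

function parsePostFilename(filename: string): PostFilename | null {
  // Slug: lowercase words or numbers joined by single hyphens; year: four digits.
  const match = /^([a-z0-9]+(?:-[a-z0-9]+)*)-(\d{4})\.mdx$/.exec(filename);
  if (!match) {
    return null;
  }
  const [, slug = "", year = ""] = match;
  return { slug, year: Number(year) };
}

console.log(parsePostFilename("vibe-coding-revolution-2026.mdx"));
console.log(parsePostFilename("Untitled Post.mdx"));
```

A `null` result signals a filename that breaks the pattern, which a validation step could turn into a failed check.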
Frontmatter Schema Enforcement
Every blog post requires this frontmatter:
---
title: "String" # Required
description: "String" # Required, used for meta description and OG
pubDate: YYYY-MM-DD # Required, ISO date
author: "String" # Required
image: "URL" # Required, Unsplash or custom
tags: ["Array"] # Required, 3-6 tags
category: "String" # Required (Web Development, AI Integration, etc.)
subcategory: "String" # Required (Guide, Tutorial, News, etc.)
featured: boolean # Required (true shows on homepage)
draft: boolean # Optional (true hides from production)
---

Astro 5's content collections enforce this schema at build time. If a field is missing or the wrong type, the build fails with a clear error message.
Content Organization by Topic, Not Date
We keep every post in one flat directory and let the slug carry the topic, rather than nesting by date:
blog/
├── ai-chatbot-build-vs-buy-2026.mdx
├── ai-powered-web-development-2026.mdx
├── astro-5-agency-websites-2026.mdx
└── ... (alphabetical by slug)

Date-based folders (2026/03/post.mdx) create unnecessary nesting and make it harder to find posts. The pubDate frontmatter field handles chronological sorting at the application level.
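Sorting at the application level can be as small as the sketch below. It assumes each post exposes `pubDate` as a `Date` and an optional `draft` flag, mirroring the frontmatter; the `PostMeta` shape, the `sortPublished` name, and the sample dates are hypothetical.

```typescript
// Sketch: newest-first ordering driven purely by frontmatter, not folder layout.
interface PostMeta {
  slug: string;
  pubDate: Date;
  draft?: boolean;
}

function sortPublished(posts: PostMeta[]): PostMeta[] {
  return posts
    .filter((post) => !post.draft) // drafts stay out of production listings
    .sort((a, b) => b.pubDate.getTime() - a.pubDate.getTime());
}

// Illustrative data; the dates and the draft entry are made up.
const examplePosts: PostMeta[] = [
  { slug: "astro-5-agency-websites-2026", pubDate: new Date("2026-01-10") },
  { slug: "vibe-coding-revolution-2026", pubDate: new Date("2026-03-02") },
  { slug: "work-in-progress-2026", pubDate: new Date("2026-03-05"), draft: true },
];

console.log(sortPublished(examplePosts).map((post) => post.slug));
```

Because `filter` returns a new array, the original list keeps its on-disk (alphabetical) order.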
Image Strategy
We use Unsplash URLs with dimension parameters instead of storing images in the repo.
Benefits:
- Zero image storage in the repo
- Unsplash CDN handles optimization and delivery
- Consistent dimensions via URL parameters
- No build-time image processing needed
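For illustration, a helper along these lines keeps dimensions consistent across posts. The `w`, `h`, `fit`, and `q` query parameters are ones Unsplash supports for dynamically resized images. The photo ID, the default quality, and the `unsplashUrl` name are placeholders, not values from our posts.

```typescript
// Sketch: append consistent sizing parameters to an Unsplash image URL.
interface ImageSize {
  width: number;
  height: number;
  quality?: number;
}

function unsplashUrl(baseUrl: string, size: ImageSize): string {
  const params = [
    `w=${size.width}`,
    `h=${size.height}`,
    "fit=crop", // fill the requested box and crop any excess
    `q=${size.quality ?? 80}`,
  ];
  // Preserve any query string the base URL already carries.
  const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return baseUrl + separator + params.join("&");
}

// Placeholder photo ID, for illustration only.
console.log(
  unsplashUrl("https://images.unsplash.com/photo-0000000000000", { width: 1200, height: 630 })
);
```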
For project screenshots and custom graphics, we use a separate asset hosting approach.
Performance Numbers
| Metric | Single Repo (Before) | Two Repos (After) |
|---|---|---|
| Clone time | 45 seconds | 12 seconds (code) + 8 seconds (content) |
| Average build time | 3.2 minutes | 2.1 minutes |
| Content-only deploy | 3.2 minutes (full build) | 1.8 minutes |
| Git log noise | 80% content commits | Clean separation |
| Merge conflicts | Weekly | Rare |
The biggest win is not speed — it is cognitive clarity. When we look at the website repo's Git history, we see code changes. When we look at the content repo, we see content changes. Each history tells a coherent story.
When This Pattern Makes Sense
Use a Separate Content Repo When:
- You have 20+ content files and growing
- Multiple people contribute content
- Content updates are more frequent than code changes
- You want content validation independent of code builds
- Your content is text-based (Markdown, MDX, YAML) not database records
Stick with a Single Repo When:
- You have fewer than 20 content files
- Only developers touch content
- Content and code change at the same rate
- You are using a headless CMS anyway
Use a Headless CMS When:
- Non-technical users need a visual editor
- Your content includes complex media management
- You need role-based content workflows (draft, review, publish)
- Content does not include code blocks or technical formatting
Common Pitfalls
Forgetting to update the submodule. New content sits in the content repo but the website still points to the old commit. Solution: automate submodule updates via webhook or scheduled CI job.
Submodule confusion for new developers. git clone does not automatically initialize submodules. New team members need to run git submodule update --init. We document this in the README.
Branch divergence. If the content repo has branches, the website repo needs to know which branch to track. We keep it simple: always track main.
Build failures from content changes. Even with validation in the content repo, a valid MDX file might break the Astro build (e.g., using an undefined component). We run a test build in the website repo's CI before deploying.
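Some of these states can be detected mechanically. `git submodule status` prefixes each line with `-` when a submodule is not initialized (the fresh-clone case), `+` when the checked-out commit differs from the one the parent repo has pinned (for example, content pulled locally but the pointer not yet committed), and `U` on merge conflicts. The sketch below classifies those lines; the function name, state labels, and sample SHA are illustrative assumptions.

```typescript
// Sketch: classify lines of `git submodule status` by their one-character prefix.
type SubmoduleState = "in-sync" | "not-initialized" | "pointer-differs" | "merge-conflict";

interface SubmoduleStatus {
  path: string;
  commit: string;
  state: SubmoduleState;
}

function parseSubmoduleStatus(output: string): SubmoduleStatus[] {
  const results: SubmoduleStatus[] = [];
  for (const line of output.split("\n")) {
    // Format: "<prefix><sha> <path>" with an optional " (<describe>)" suffix.
    const match = /^([ +\-U])([0-9a-f]{40,64}) (.+?)(?: \(.*\))?$/.exec(line);
    if (!match) {
      continue;
    }
    const [, prefix = " ", commit = "", path = ""] = match;
    const state: SubmoduleState =
      prefix === "-" ? "not-initialized"
      : prefix === "+" ? "pointer-differs"
      : prefix === "U" ? "merge-conflict"
      : "in-sync";
    results.push({ path, commit, state });
  }
  return results;
}

// Illustrative output for a fresh clone made without --recurse-submodules.
const freshClone = "-0123456789abcdef0123456789abcdef01234567 codercops-agency-content";
console.log(parseSubmoduleStatus(freshClone));
```

A setup script could use a result like this to prompt for `git submodule update --init` before the first build. Cloning with `git clone --recurse-submodules` avoids the uninitialized state in the first place.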
The Numbers
Our content repo today:
| Metric | Count |
|---|---|
| Blog posts | 130+ |
| Project case studies | 11 |
| Team profiles | 5 |
| Total MDX/MD files | 146+ |
| Average post length | 1,800 words |
| Total content | ~260,000 words |
| Repo size | 2.1 MB (text only, no images) |
At 2.1 MB for 260,000 words of content, Git handles this effortlessly. We could 10x the content volume before needing to reconsider the architecture.
Running a content-heavy site and struggling with repository management? We have built this pattern for ourselves and for clients. Talk to us about your content architecture.