Claude Training Program · Week 4 · Build Track · Final Tutorial

Advanced Tool Builds &
Capstone Projects

📖 Tutorial 8 of 8 ⏱ 3–4 hours 👥 Build Track — intermediate focus 🎯 Prerequisites: Tutorials 3, 4 & 6

This is the final and most technically ambitious tutorial in the programme. You'll build three professional-grade SEO tools: a log file analyser that diagnoses crawl budget waste, a redirect chain auditor that maps and flags redirect problems programmatically, and a content brief generator that uses the Claude API itself to produce AI-powered briefs from live SERP data. By the end, you'll have a toolkit genuinely capable of replacing hours of manual agency work every week.

Learning Objectives
  • Parse and analyse large server log files with Python
  • Build a redirect chain follower that detects loops, long chains, and bad destinations
  • Call the Claude API programmatically from within a Python script
  • Combine live web scraping with AI analysis to generate content briefs
  • Understand how to structure and maintain a professional tools repository
  • Know where to go next after completing the programme

1. Your Completed Tools Repository

Before building the final three tools, here's what your seo-tools repository should look like by the end of this tutorial — a complete, professional toolkit any member of your team can use:

seo-tools/
├── .env # API keys — never commit this
├── .gitignore
├── config/
│ └── gsc_credentials.json # GSC OAuth credentials
├── scripts/
│ ├── check_status.py # Tutorial 3: HTTP status checker
│ ├── internal_links.py # Tutorial 4: Internal link finder
│ ├── pagespeed_checker.py # Tutorial 6: PageSpeed batch checker
│ ├── gsc_puller.py # Tutorial 6: GSC declining queries
│ ├── log_analyser.py # Tutorial 8: Log file analyser ← new
│ ├── redirect_auditor.py # Tutorial 8: Redirect chain auditor ← new
│ └── brief_generator.py # Tutorial 8: AI content brief generator ← new
├── data/ # Input files go here
├── output/ # All generated reports
└── README.md # How to use each tool (Claude Code can write this)

Ask Claude Code to write your README: Once all tools are built, open Claude Code and say: "Read all the Python scripts in the scripts/ folder and write a comprehensive README.md that explains what each tool does, what input files it needs, how to run it, and what output it produces." It will do this automatically.

Tool 1 of 3
Log File Analyser
🕷️ Crawl budget ⏱ ~60 min build time 📊 Large file parsing

What it does

Server log files contain every request made to your site — including every Googlebot visit. Analysing them reveals what Google is actually crawling, how often, and where it's wasting budget on low-value URLs. This tool parses a raw Apache or Nginx access log, filters for search engine bot requests, and produces a prioritised report of crawl budget waste.

Key concepts

📄
Log file format

Access logs are plain text, one line per request. Each line contains: IP, date, HTTP method, URL, status code, bytes, referrer, and user agent. We filter for lines where the user agent contains "Googlebot".

🗜️
Large file handling

Log files can be gigabytes. We read them line-by-line rather than loading the whole file into memory — essential for files over ~100MB.

📉
Crawl budget waste

URLs crawled frequently that return 4xx, 5xx, or redirect responses are wasting crawl budget. So are low-value URL patterns like faceted navigation, session IDs, and print pages.

📊
Crawl distribution

We look at how Googlebot's crawls are distributed across the site — are key commercial pages being crawled as often as low-value pages?
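
To make the line-by-line approach concrete, here is a minimal sketch of the kind of parsing loop the finished script will contain. It assumes the Apache combined log format; the regex, field names, and summary shown are illustrative, and the version Claude Code generates will differ in detail.

Line-by-line bot filtering · illustrative sketch · Python
import re
from collections import Counter

# Apache combined log format (assumed): ip identd user [datetime] "METHOD path HTTP/x" status bytes "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def bot_requests(log_path):
    """Yield parsed Googlebot/bingbot requests one line at a time, never loading the whole file."""
    skipped = 0
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)
            if not match:
                skipped += 1  # malformed line: count it rather than crash
                continue
            fields = match.groupdict()
            if "googlebot" in fields["agent"].lower() or "bingbot" in fields["agent"].lower():
                yield fields
    print(f"Skipped {skipped} malformed lines")

# Example use: status code breakdown for bot traffic only
status_counts = Counter(req["status"] for req in bot_requests("data/access.log"))
print(status_counts.most_common())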

Building the tool

Build a Python script at scripts/log_analyser.py that analyses server access log files for SEO crawl budget insights.

INPUT
- Accept a log file path as a command-line argument: python scripts/log_analyser.py --log data/access.log
- Support both Apache combined log format and Nginx default log format
- Read the file line-by-line (do not load the whole file into memory — logs can be very large)

PARSING
For each line, extract: IP address, datetime, HTTP method, URL path, status code, bytes transferred, user agent.
Filter to keep only lines where user agent contains "Googlebot" or "bingbot" (case-insensitive).
Parse the URL to extract: path, query string parameters (as a list of param names).

ANALYSIS — generate these insights:
1. CRAWL OVERVIEW
   - Total bot requests in the log
   - Date range covered
   - Unique URLs crawled
   - Requests per day (average)
2. STATUS CODE BREAKDOWN
   - Count and percentage of 200, 301, 302, 404, 500, other
   - List top 20 URLs returning 404 (sorted by crawl frequency)
   - List top 10 URLs returning 5xx errors
3. CRAWL BUDGET WASTE — URLs crawled more than 5 times that return non-200 status
   - Also flag URL patterns that indicate low-value crawling:
     * Contains query parameters: ?page=, ?sort=, ?filter=, ?sessionid=, ?ref=
     * Contains /tag/, /author/, /feed/, /search?, /wp-login, /xmlrpc
   - For each waste pattern: count of URLs affected, total crawl requests wasted
4. TOP CRAWLED URLS — top 50 most-crawled URLs with their status codes and crawl count
5. CRAWL FREQUENCY DISTRIBUTION
   - Bin URLs by crawl frequency: crawled once, 2-5 times, 6-20 times, 20+ times
   - What % of total crawl budget goes to each bin?

OUTPUT
- output/log_analysis_report.html — styled HTML report with all 5 sections
- output/log_analysis_waste.csv — just the waste URLs for easy remediation
- Print a brief summary to terminal when complete

Handle malformed log lines gracefully — skip and count them rather than crashing.
After building, test with a sample log. I'll provide the real log file path separately.

Sample output

Terminal summary on completion
Log File Analysis Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File: data/access.log  |  Size: 847 MB
Date range: 2026-01-01 to 2026-02-19  (50 days)

Googlebot requests:  24,847  (avg 497/day)
Unique URLs crawled:  8,312

Status breakdown:
200 OK           18,204  (73.3%)
301 Redirect      3,891  (15.7%)  ← budget waste
404 Not Found     2,196   (8.8%)  ← budget waste
500 Server Error    556    (2.2%)  ← budget waste

⚠ Crawl budget waste: 6,643 requests (26.7% of total)
⚠ Faceted nav URLs crawled: 1,204 unique URLs, 3,812 requests

Reports saved to output/log_analysis_report.html
Waste URLs saved to output/log_analysis_waste.csv
    

How to get a log file from a client

Log files live on the web server. For clients on shared hosting, they can download them from cPanel → Logs → Raw Access. For VPS/dedicated servers, they're typically at /var/log/apache2/access.log or /var/log/nginx/access.log. For clients on cloud hosting (WP Engine, Kinsta, etc.), check their dashboard for a log download option. A week's worth of logs is usually sufficient for analysis.

Tool 2 of 3
Redirect Chain Auditor
🔗 Redirect chains ⏱ ~45 min build time 🔄 Loop detection

What it does

Takes a CSV of URLs (from a crawl, a sitemap, or a manual list), follows every redirect chain for each URL, and produces a report flagging: chains longer than 2 hops, redirect loops, chains that end in non-200 responses, and HTTP-to-HTTPS redirect opportunities. This is one of the most-requested tools in SEO migrations and site audits.

Why redirect chains matter: Each hop in a redirect chain adds latency, dilutes PageRank, and risks Googlebot abandoning the chain before reaching the destination. Google's John Mueller has stated Google will follow up to 10 redirects, but best practice is a maximum of 1–2 hops. Chains from old site migrations often go undetected for years.

Build a Python script at scripts/redirect_auditor.py that audits redirect chains for a list of URLs.

INPUT
- Read URLs from data/urls.csv (column: "url")
- Accept optional --input flag: python scripts/redirect_auditor.py --input data/my_urls.csv

REDIRECT FOLLOWING
For each URL, follow the full redirect chain manually using requests with allow_redirects=False so we can record each hop individually.
For each hop record: the URL, HTTP status code, Location header (where it redirects to), and response time in ms.
Continue following until: a non-redirect status code is reached, more than 10 hops occur (flag as potential loop), or the same URL appears twice in the chain (definitive loop — stop immediately).
Set a per-request timeout of 8 seconds.

ANALYSIS — classify each URL as:
- OK: single hop or no redirect, destination returns 200
- CHAIN: 2 hops — acceptable but worth reviewing
- LONG_CHAIN: 3+ hops — should be fixed
- LOOP: same URL appears twice in chain — must be fixed
- BAD_DESTINATION: chain ends in 404, 410, 500, or other non-200 — must be fixed
- HTTP_UPGRADE: URL starts with http:// and redirects to https:// — flag as fixable at source

OUTPUT
1. output/redirect_audit.html — styled HTML report with:
   - Summary stats at top: counts of each classification
   - Full chains table: for each URL, show each hop as URL → [status] → URL → [status] → final URL
   - Colour-coded severity: red for LOOP/BAD_DESTINATION, orange for LONG_CHAIN, grey for OK
   - Sort by severity (worst first)
2. output/redirect_audit.csv — flat CSV for spreadsheet analysis:
   Columns: source_url, classification, hop_count, final_url, final_status, full_chain

Add a 0.3 second delay between URLs. Show progress: "Auditing URL X of Y..."
After building, test on 10 URLs from data/urls.csv and show the HTML report output.
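
The heart of that script is the hop-by-hop follower. Here is a minimal sketch under the same rules as the prompt (manual redirect following, loop detection, 8-second timeout); the HTTP_UPGRADE check and per-hop timing are left out for brevity, and the function name is illustrative rather than what Claude Code will generate.

Redirect chain follower · illustrative sketch · Python
import requests
from urllib.parse import urljoin

MAX_HOPS = 10
TIMEOUT = 8  # seconds per request, matching the prompt above

def follow_chain(url):
    """Follow a redirect chain manually, returning the hops and a classification."""
    hops, seen, current = [], set(), url
    while len(hops) <= MAX_HOPS:
        if current in seen:
            return hops, "LOOP"  # same URL seen twice: definitive loop
        seen.add(current)
        try:
            resp = requests.get(current, allow_redirects=False, timeout=TIMEOUT)
        except requests.RequestException:
            return hops, "BAD_DESTINATION"
        hops.append((current, resp.status_code))
        if resp.status_code in (301, 302, 303, 307, 308):
            location = resp.headers.get("Location")
            if not location:
                return hops, "BAD_DESTINATION"
            current = urljoin(current, location)  # Location may be relative
            continue
        redirects = len(hops) - 1  # number of redirect hops before the final response
        if resp.status_code != 200:
            return hops, "BAD_DESTINATION"
        if redirects >= 3:
            return hops, "LONG_CHAIN"
        if redirects == 2:
            return hops, "CHAIN"
        return hops, "OK"
    return hops, "LOOP"  # more than 10 hops: flag as potential loop

chain, verdict = follow_chain("http://example.com/old-page")
print(verdict, " -> ".join(f"{u} [{s}]" for u, s in chain))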

Useful follow-up improvements

  • Generate fix recommendations: "For each LONG_CHAIN URL, add a 'Recommended Fix' column showing the direct URL the source should redirect to (skipping intermediate hops)."
  • Bulk import from Screaming Frog: "Add support for a Screaming Frog redirect export CSV as an alternative input format — the column is called 'Address'."
  • Check redirect type consistency: "Flag any chains that mix 301 and 302 redirects — all hops should use 301 for permanent redirects."
Tool 3 of 3 — Capstone Build
AI-Powered Content Brief Generator
🤖 Claude API ⏱ ~75 min build time 🔍 SERP scraping + AI analysis

What it does & why it's different

This is the most sophisticated tool in the programme. Unlike the previous tools that use Python to process and report on data, this one calls the Claude API directly to perform AI analysis as part of the script itself. The result is a tool that: fetches the top 10 SERP results for a target keyword, extracts each page's heading structure, feeds everything into Claude, and receives back a full, structured content brief — all automatically.

Understanding the Claude API call

What's the Claude API? The same Claude you've been using in your browser is accessible via an API — meaning your Python script can send it a message and receive a reply, exactly like you do in the chat interface. The difference is it happens inside your code, automatically, as part of a larger workflow. You need an Anthropic API key for this (separate from your Claude for Teams subscription — see below).

API key for this tool: The Claude API requires an Anthropic API key from console.anthropic.com. This has a cost component based on usage, but content brief generation uses a modest amount of tokens — typically a few pence per brief. Add the key to your .env file as ANTHROPIC_API_KEY.
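
If you are wondering how a key sitting in .env reaches the script: the usual pattern, and the one assumed throughout this tutorial, is to load the file with the python-dotenv library before creating the client. The generated script may structure this differently.

Loading API keys from .env · illustrative sketch · Python
# .env (never commit this file)
#   ANTHROPIC_API_KEY=sk-ant-...
#   SERP_API_KEY=...

from dotenv import load_dotenv
import anthropic

load_dotenv()                   # copies the .env entries into environment variables
client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment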

How the tool works — build phases

1

SERP scraping

Fetch top 10 Google results for the target keyword. Extract each result's URL, title, and meta description.

2

Page heading extraction

Fetch each of the 10 pages, extract H1–H3 headings, and the first sentence of each major section.

3

Claude API analysis

Send all extracted data to Claude with a detailed system prompt asking it to synthesise a content brief.

4

Brief generation

Claude returns a structured brief. The script formats and saves it as a clean HTML file.
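
Phase 2 is ordinary requests-plus-BeautifulSoup work. Here is a minimal sketch of the extraction step, assuming plain HTML pages; the function name, User-Agent string, and return structure are illustrative rather than the exact shape the generated script will use.

Heading extraction · illustrative sketch · Python
import requests
from bs4 import BeautifulSoup

def extract_page_structure(url):
    """Return title, meta description, ordered H1-H3 headings, and a rough word count."""
    resp = requests.get(url, timeout=8, headers={"User-Agent": "brief-generator/0.1"})
    soup = BeautifulSoup(resp.text, "html.parser")

    title = soup.title.get_text(strip=True) if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content", "") if meta else ""

    # H1-H3 in document order; hierarchy is preserved via the tag name
    headings = [(tag.name.upper(), tag.get_text(strip=True))
                for tag in soup.find_all(["h1", "h2", "h3"])]

    # Rough word count: words inside paragraph and list-item tags only
    word_count = sum(len(tag.get_text().split()) for tag in soup.find_all(["p", "li"]))

    return {"url": url, "title": title, "description": description,
            "headings": headings, "word_count": word_count}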

Build a Python script at scripts/brief_generator.py that generates AI-powered content briefs by analysing the top SERP results for a target keyword.

CONFIGURATION (at the top of the script)
- TARGET_KEYWORD: set by the user before running, or accept --keyword CLI argument
- TARGET_AUDIENCE: who the content is for
- CONTENT_GOAL: e.g. "rank for this keyword and generate leads"
- SERP_RESULTS_TO_ANALYSE: 8 (top 8 results)

PHASE 1 — SERP FETCHING
Use the SerpAPI or ValueSERP API to fetch the top organic results for the keyword. Load the API key from .env as SERP_API_KEY.
Extract for each result: position, URL, title, meta description.
Skip any results from the same domain as TARGET_DOMAIN (loaded from .env).

PHASE 2 — HEADING EXTRACTION
For each result URL, fetch the page and extract using BeautifulSoup:
- Page title and meta description
- All H1, H2, H3 tags in order, preserving hierarchy
- Word count estimate (count words in p, li tags)
Skip any URL that takes more than 8 seconds to load. Add a 1 second delay between fetches.

PHASE 3 — CLAUDE API ANALYSIS
Using the anthropic Python library, send the following to claude-sonnet-4-6:

SYSTEM PROMPT: "You are a senior SEO content strategist. Analyse the provided SERP data and produce a detailed content brief."

USER MESSAGE:
"Target keyword: [keyword]
Target audience: [audience]
Content goal: [goal]

Here are the top [n] ranking pages and their content structure:
[formatted list of each page: URL, title, headings hierarchy]

Produce a content brief with these sections:
1. SEARCH INTENT ANALYSIS — what is the user actually looking for? (100 words)
2. RECOMMENDED CONTENT FORMAT — article, guide, comparison, tool page, etc. With rationale.
3. SUGGESTED TITLE TAG — 3 options, each under 60 characters
4. SUGGESTED META DESCRIPTION — 2 options, each under 155 characters
5. RECOMMENDED HEADING STRUCTURE — a full H1, H2, H3 outline for the article. Include: topics covered by all top pages (table stakes), topics covered by 3+ pages (strong signals), and any gap topics missing from current results (differentiation opportunity).
6. WORD COUNT RECOMMENDATION — with rationale based on competitor lengths
7. KEY ENTITIES TO INCLUDE — important named concepts, tools, brands that appear across results
8. INTERNAL LINKING OPPORTUNITIES — [leave blank with placeholder: 'To be completed by SEO team']
9. CONTENT NOTES — any specific advice on tone, depth, or angle to differentiate"

PHASE 4 — OUTPUT
Save the brief to output/brief_[keyword-slug]_[date].html as a clean, styled HTML document. The HTML should be formatted for easy reading and suitable for sending to a content writer.
Also save a plain text version to output/brief_[keyword-slug]_[date].txt.
Print a one-paragraph summary to the terminal.

The Claude API call in plain Python

For reference, here's the core API call Claude Code will generate. You don't need to write this yourself — it's shown here so you understand what's happening:

The Claude API call — reference only (Python)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env automatically

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    system="You are a senior SEO content strategist...",
    messages=[
        {"role": "user", "content": f"Target keyword: {keyword}\n\n{serp_data}"}
    ],
)

brief_text = message.content[0].text  # The full brief as a string

Choosing the right model: Use claude-sonnet-4-6 for brief generation — it's the best balance of quality and cost for this task. For simpler tasks like summarising a single page's headings, claude-haiku-4-5-20251001 is faster and cheaper. Claude Code will use whichever model you specify.

Sample brief output snippet

output/brief_technical-seo-audit_2026-02-19.html — excerpt
CONTENT BRIEF — "technical seo audit"
Generated: 19 Feb 2026 | Analysed 8 competitor pages

1. SEARCH INTENT ANALYSIS
Mixed intent: primarily informational (users learning what a technical SEO
audit involves) with a secondary commercial layer (users evaluating whether
to hire an agency vs. do it themselves). Content must serve both: explain
the process credibly while positioning the brand as the expert to trust
with the work.

2. RECOMMENDED FORMAT
Comprehensive guide with embedded tool/checklist. All top-ranking pages are
long-form guides (2,400–4,800 words). A checklist component would provide
differentiation and increase time-on-page.

5. RECOMMENDED HEADING STRUCTURE
H1: What Is a Technical SEO Audit? (Complete Guide for 2026)

H2: What Does a Technical SEO Audit Cover?
  H3: Crawlability and indexation
  H3: Site architecture and internal linking
  H3: Page speed and Core Web Vitals
  H3: Structured data and schema markup
  H3: Mobile usability
  H3: International SEO (hreflang)    ← gap: only 2/8 pages cover this

H2: How to Run a Technical SEO Audit: Step-by-Step
...
    

2. Where to Go From Here

Completing this programme means your team can build, maintain, and iterate on professional SEO tools. But this is a starting point, not a finish line. Here are the most valuable directions to explore next:

🗄️
Add a database layer

Store results in SQLite so you can track metrics over time. Ask Claude Code: "Add SQLite storage so each run's results are saved with a timestamp."
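
A minimal sketch of what that storage layer could look like; the database path, table, and column names here are only examples, not what Claude Code will necessarily produce:

SQLite results storage · illustrative sketch · Python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("output/seo_tools.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tool_runs (
        tool   TEXT,
        run_at TEXT,
        metric TEXT,
        value  REAL
    )
""")
conn.execute(
    "INSERT INTO tool_runs VALUES (?, ?, ?, ?)",
    ("log_analyser", datetime.now(timezone.utc).isoformat(), "crawl_waste_pct", 26.7),
)
conn.commit()
conn.close()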

🌐
Build web interfaces

Wrap your scripts in a simple Flask or FastAPI web app so non-technical team members can run them via a browser form without touching the terminal.
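
As a taste of what such a wrapper might look like, here is a hypothetical Flask route that shells out to the status checker and returns its terminal output; the route name, form field, and --input flag are assumptions, not the actual interface of your script:

Flask wrapper · illustrative sketch · Python
import subprocess
from flask import Flask, request

app = Flask(__name__)

@app.route("/run/status-check", methods=["POST"])
def run_status_check():
    # Runs the existing script unchanged and returns whatever it prints
    result = subprocess.run(
        ["python", "scripts/check_status.py", "--input", request.form["input_csv"]],
        capture_output=True, text=True, timeout=600,
    )
    return f"<pre>{result.stdout or result.stderr}</pre>"

if __name__ == "__main__":
    app.run(debug=True)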

📅
Full automation

Schedule the GSC puller, PageSpeed checker, and log analyser to run weekly via cron. Email or Slack the summaries automatically every Monday morning.

🧩
Build a unified dashboard

Combine all tool outputs into a single HTML dashboard that gives a weekly health overview for each client — one page, all key metrics.

🔌
More API integrations

Add Ahrefs, Semrush, or GA4 API connections using the pattern from Tutorial 6. Each new data source multiplies the value of the existing tools.

📦
Package for the team

Add a simple requirements.txt and setup instructions so any new team member can get all tools running in under 10 minutes. Claude Code can write both.

3. Practice Exercises

✏️ Exercise 1 — Build the log file analyser

Build Tool 1 and run it on a real client log file:

  1. Use the Claude Code build prompt from the Tool 1 section to build the script
  2. Obtain a week's worth of access logs from one client (see "How to get a log file from a client" in the Tool 1 section)
  3. Run the analyser and review the HTML report
  4. Identify the top 3 crawl budget waste issues in the report
  5. Follow up: "For each waste pattern identified, write a one-paragraph recommendation I can include in the client's audit report"
✏️ Exercise 2 — Build the redirect auditor

Run the redirect auditor on a real migration or audit project:

  1. Build the script using the Claude Code build prompt from the Tool 2 section
  2. Create a URLs CSV from a recent Screaming Frog crawl (export redirect URLs)
  3. Run the auditor and review classifications in the HTML report
  4. Identify any LOOP or BAD_DESTINATION cases — these need immediate attention
  5. Follow up in the same session: "Generate a redirect fix plan CSV with columns: source_url, current_chain, recommended_direct_target"
✏️ Exercise 3 — Build the content brief generator (capstone)

Build Tool 3 and generate a real brief for a client keyword:

  1. Get an Anthropic API key from console.anthropic.com and add it to .env
  2. Get a SerpAPI or ValueSERP key (both offer free trial credits) and add to .env
  3. Use the Claude Code build prompt from the Tool 3 section to build the script
  4. Run it for a target keyword from one of your client's wish lists
  5. Review the generated brief — is the heading structure reasonable? Compare it to what you'd have written manually. What did it miss?
  6. Send the brief to a content writer and ask for their feedback on usefulness

4. Programme Complete

Claude Training Programme · Technical SEO Agency Edition
🏆 Programme Complete
Your team has covered all 8 tutorials across 4 weeks
🎯 Advanced Prompting · 📚 Shared Prompt Library · 🔍 Deep SEO Analysis · 🔧 Claude Code · 🔌 API Integrations · 🏗️ Team Standards & SOPs · 🤖 AI-Powered Tools · 📈 Production Toolkit

Tools built: URL Status Checker · Internal Link Finder · PageSpeed Batch Checker · GSC Data Puller · Log File Analyser · Redirect Chain Auditor · AI Content Brief Generator

🛠️ 7 Tools Built: a complete SEO toolkit that replaces hours of manual work every week
🤖 AI Inside Your Tools: the Claude API turns your scripts into AI-powered workflows, not just data processors
♾️ Compound Improvement: every tool you build, every prompt you refine, every SOP you define makes the next one easier

Final thought: The most important thing you've built over these eight tutorials isn't any individual tool — it's the habit of reaching for Claude Code when you hit a repetitive task, the discipline of maintaining a shared prompt library, and the confidence to say "we can build that" when a client need arises. Those compound over months and years in ways that a single script never will.