Building an Internal Linking Opportunity Finder
Internal linking is one of the highest-impact, most under-executed areas of technical SEO — and manually identifying opportunities across large sites is tedious and inconsistent. In this tutorial you'll build a tool that reads a sitemap and a set of target keywords, fetches each page, and produces a prioritised report of internal linking opportunities. It's a genuinely useful tool your team will use on real client work.
By the end of this tutorial, you will:
- Understand how to plan a multi-step tool before writing any code
- Learn how to work with XML sitemaps programmatically
- Understand basic keyword-in-content matching techniques
- Build a tool that fetches live pages and analyses their content
- Produce a formatted HTML report suitable for client sharing
- Practice breaking complex tools into phases with Claude Code
1. What the Tool Does
Before writing a single line of code, it's worth being precise about what we're building. A vague brief leads to a vague tool. Here's the exact specification:
For each page on the site, the tool asks: "Does this page mention any of our target keywords — but doesn't already link to the target page for that keyword?" If yes, it's an internal linking opportunity. The report shows which page, which keyword was found, the suggested anchor text, and the target URL to link to.
The two input files you'll need
You'll need to prepare two files before building the tool. Here's what they look like:
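Here's a small keywords.csv to illustrate. The exact column names are an assumption for this tutorial; what matters is that each row pairs a keyword phrase with the page it should link to, plus an optional priority:

```csv
keyword,target_url,priority
core web vitals,https://example.com/blog/core-web-vitals-guide,high
technical seo audit,https://example.com/services/technical-seo-audit,high
crawl budget,https://example.com/blog/crawl-budget-optimisation,medium
page speed optimisation,https://example.com/services/page-speed,low
```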
What is a sitemap? An XML sitemap is a file, usually at yourdomain.com/sitemap.xml, that lists all the pages on a website. It's primarily for search engines, but it's also a convenient way for our tool to know which pages to check. View one by opening it in your browser.
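If you haven't seen one before, a trimmed-down sitemap looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/site-speed-guide</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/services/seo-consultancy</loc>
  </url>
</urlset>
```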
2. Key Concepts Before We Build
Sitemaps are XML files. Python can read them using the built-in xml.etree.ElementTree library to extract all the URLs listed inside.
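A minimal sketch of that step, roughly what Claude Code is likely to write; the only non-obvious part is the sitemaps.org namespace, which the tag search has to include:

```python
import xml.etree.ElementTree as ET
import requests

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def get_sitemap_urls(sitemap: str) -> list[str]:
    """Return every <loc> URL from a sitemap (local file path or live URL)."""
    if sitemap.startswith("http"):
        xml_bytes = requests.get(sitemap, timeout=30).content
    else:
        with open(sitemap, "rb") as f:
            xml_bytes = f.read()
    root = ET.fromstring(xml_bytes)  # pass bytes so the XML encoding declaration is handled for us
    # Every page is a <url><loc>...</loc></url> entry in the sitemaps.org namespace.
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
```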
We fetch each page with requests and use BeautifulSoup to extract the visible text and existing links from the page's HTML.
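In code, that step looks roughly like this (a sketch; the User-Agent string is arbitrary, and the real script should add error handling for pages that time out or return errors):

```python
import requests
from bs4 import BeautifulSoup

def fetch_page(url: str) -> tuple[str, set[str]]:
    """Return (visible text, set of outbound link hrefs) for one page."""
    response = requests.get(url, timeout=30, headers={"User-Agent": "internal-link-finder"})
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style blocks so their contents don't count as "visible text".
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    links = {a["href"] for a in soup.find_all("a", href=True)}
    return text, links
```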
We check if a keyword phrase appears in a page's text content (case-insensitive). If it does, and the page doesn't already link to the target URL, we flag it.
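The check itself is only a few lines. Normalising URLs down to their path is an assumption added here (it isn't spelled out above), but without it relative links like /blog/... would never match absolute target URLs:

```python
from urllib.parse import urlparse

def normalise(url: str) -> str:
    """Reduce a URL to its path so relative and absolute links compare equally."""
    return urlparse(url).path.rstrip("/") or "/"

def find_opportunities(page_url, page_text, page_links, keywords):
    """keywords is a list of dicts with 'keyword' and 'target_url' keys (format assumed)."""
    existing = {normalise(link) for link in page_links}
    opportunities = []
    for kw in keywords:
        mentions_keyword = kw["keyword"].lower() in page_text.lower()
        already_linked = normalise(kw["target_url"]) in existing
        is_target_itself = normalise(page_url) == normalise(kw["target_url"])
        if mentions_keyword and not already_linked and not is_target_itself:
            opportunities.append({"page": page_url, **kw})
    return opportunities
```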
Instead of a plain CSV, we'll generate a styled HTML file — more readable for sharing with clients or colleagues, and openable in any browser.
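Generating the report can be as simple as building one HTML table row per opportunity. A stripped-down sketch (the real report would carry more styling and the suggested anchor text column):

```python
import os
from html import escape

def write_report(opportunities, path="output/internal_links_report.html"):
    """Write the opportunities to a simple HTML table with a summary line."""
    if os.path.dirname(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
    rows = "".join(
        f"<tr><td>{escape(o['page'])}</td><td>{escape(o['keyword'])}</td>"
        f"<td>{escape(o['target_url'])}</td><td>{escape(o.get('priority', ''))}</td></tr>"
        for o in opportunities
    )
    html = (
        "<html><head><style>"
        "body{font-family:sans-serif} table{border-collapse:collapse}"
        "td,th{border:1px solid #ccc;padding:6px 10px;text-align:left}"
        "</style></head><body>"
        f"<p><strong>{len(opportunities)}</strong> internal linking opportunities found.</p>"
        "<table><tr><th>Source Page</th><th>Keyword Found</th><th>Link To</th><th>Priority</th></tr>"
        f"{rows}</table></body></html>"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
```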
Why BeautifulSoup? It's the standard Python library for parsing HTML. It handles messy, real-world HTML gracefully. Claude Code will install it automatically when it builds the script. You don't need to do anything manually.
3. Planning the Build in Phases
Complex tools are much easier to build — and debug — when you break them into phases. Rather than asking Claude Code to build everything in one go, we'll work through four phases. Each phase produces something testable before we move on.
Phase 1: Build a function that reads a sitemap XML (either a local file or a live URL), extracts all page URLs, and prints them. Verify it works before continuing.
Phase 2: Build a function that fetches a single URL, strips HTML tags, and returns: (a) the visible text content, (b) all outbound links already on the page. Test on one URL before scaling.
Phase 3: Build the matching logic: for each page, for each keyword, check whether the keyword appears in the page text but the target URL is not in the existing links. Collect all matches.
Phase 4: Take the collected matches and write them to a styled HTML file, sorted by priority. Include a summary at the top showing total opportunities found.
4. Setting Up Your Project Files
Add two new files to your existing seo-tools folder from Tutorial 3:
Create your keywords.csv now using the format shown in Section 1. Use real keywords and target URLs from one of your clients, or create a fictional example to test with. Aim for 5–15 keywords to start.
5. Building the Tool — Phase by Phase
Opening Claude Code
Phase 1 prompt — Sitemap parser
Large sitemaps: Some sites have sitemaps with tens of thousands of URLs. For testing, it's fine — but when building Phase 3, we'll add a limit parameter so you don't accidentally fetch thousands of live pages in one run.
Once Phase 1 runs successfully and you see a URL count printed, move to Phase 2:
Phase 2 prompt — Page fetcher
Phase 3 prompt — Keyword matching
Phase 4 prompt — HTML report
6. What the Finished Script Looks Like
After all four phases, Claude Code will have built a script structured roughly like this. You don't need to type this — it's here for reference so you can understand what was built:
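Here's a rough sketch of that structure, assuming the four-phase breakdown above; the function names are illustrative and the bodies correspond to the snippets sketched in Section 2:

```python
"""scripts/internal_links.py - find internal linking opportunities (sketch of the structure)."""
import csv
import time

def get_sitemap_urls(sitemap):
    """Phase 1: parse the sitemap (local file or URL) and return all page URLs."""
    ...

def fetch_page(url):
    """Phase 2: return (visible text, set of existing outbound links) for one page."""
    ...

def find_opportunities(page_url, page_text, page_links, keywords):
    """Phase 3: return keyword matches where the target URL is not already linked."""
    ...

def write_report(opportunities, path):
    """Phase 4: write a styled HTML report, sorted by priority, with a summary line."""
    ...

def load_keywords(path="keywords.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def main(sitemap, max_pages=50):
    keywords = load_keywords()
    urls = get_sitemap_urls(sitemap)[:max_pages]   # cap the run so you don't fetch thousands of pages
    opportunities = []
    for url in urls:
        text, links = fetch_page(url)
        opportunities.extend(find_opportunities(url, text, links, keywords))
        time.sleep(1)                              # be polite between requests
    write_report(opportunities, "output/internal_links_report.html")

if __name__ == "__main__":
    main("https://example.com/sitemap.xml", max_pages=50)
```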
7. Sample Output
The HTML report will open in any browser and look something like this:
| Source Page | Keyword Found | Suggested Anchor Text | Link To | Priority |
|---|---|---|---|---|
| /blog/site-speed-guide | core web vitals | core web vitals | /blog/core-web-vitals-guide | ● High |
| /services/seo-consultancy | technical seo audit | technical SEO audit | /services/technical-seo-audit | ● High |
| /blog/crawling-best-practices | crawl budget | crawl budget | /blog/crawl-budget-optimisation | ● Medium |
| /about | page speed optimisation | page speed optimisation | /services/page-speed | ● Low |
8. Useful Follow-Up Improvements
Once your base tool is working, here are valuable additions to ask Claude Code for in the same session:
| Improvement | What to say to Claude Code |
|---|---|
| Add context snippets to report | "Update the report to show a short text snippet (the sentence containing the keyword) in a tooltip or expandable row." |
| Skip non-content pages | "Add a filter to skip URLs containing /tag/, /category/, /author/, /page/, or /feed/ — these are archive pages we don't want to analyse." |
| Export to CSV as well | "In addition to the HTML report, also save the opportunities as a CSV at output/internal_links_report.csv." |
| Accept command-line arguments | "Add argparse support so I can run: python scripts/internal_links.py --sitemap https://example.com/sitemap.xml --max-pages 100" |
| Respect robots.txt | "Before fetching any pages, check the site's robots.txt and skip any URLs that are disallowed." |
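For example, if you ask for the command-line arguments improvement, the entry point will probably end up looking something like this (a sketch using the flags from the suggested prompt):

```python
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Find internal linking opportunities from a sitemap.")
    parser.add_argument("--sitemap", required=True, help="Sitemap URL or local file path")
    parser.add_argument("--max-pages", type=int, default=50, help="Maximum number of pages to fetch")
    args = parser.parse_args()
    main(args.sitemap, max_pages=args.max_pages)
```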
9. Practice Exercises
Work through all four phases using the prompts in Section 5:
- Create your keywords.csv with 8–12 real or realistic keywords
- Run Phase 1 and confirm you can parse a live sitemap
- Run Phase 2 and confirm you can extract text and links from a page
- Run Phase 3 and confirm opportunities are being found
- Run Phase 4 and open the HTML report in your browser
- If any phase fails, paste the error message into Claude Code and let it fix it
Use the finished tool on a live client site (with permission):
- Update keywords.csv with real target keywords and pages for the client
- Run the script with max_pages=30 to start — check how long it takes
- Open the HTML report and review the findings manually — do they look accurate?
- Flag any false positives (keyword appears in an irrelevant context) to Claude Code and ask it to improve the matching logic
Pick any one improvement from the table above and add it:
- Choose the improvement most useful to your workflow
- Use the suggested prompt as a starting point, but adapt it if needed
- Test the updated script and confirm the improvement works
- If you added CLI arguments, practice running the script with different flags
10. Summary
Key takeaway: The phased approach is the right way to build any non-trivial tool with Claude Code. Each phase is testable and self-contained — if something breaks, you know exactly which phase to look at. Never ask Claude Code to build an entire complex tool in one shot.