On Building a System to Solve AI Hallucination

Last Updated: December 1, 2025 11 min read Tags: #ai integration #system design #content automation

    If you’re responsible for a brand’s reputation, you’re likely facing a challenging paradox. The pressure to scale content and improve efficiency has never been higher, and generative AI is often sold as the silver bullet. Yet, the risk is significant.

    The idea of publishing a plausible-sounding AI hallucination is a high-stakes concern. A single, confident-sounding error can erode years of trust.

    This isn’t a theoretical problem. I faced this challenge while architecting the content pipeline for GetViajo.com. The engineering constraint was severe: I needed to produce high-volume, expert-level content to seed the platform, but I could not compromise on accuracy or risk flooding the domain with low-quality, hallucinated “slop.”

    I ended up building my own solution. This article details the architecture for the system I built, explaining the decisions I made to significantly de-risk AI content. It’s one component of the four-part marketing system I outlined in my main blueprint.

    The Risk of AI Hallucinations

    A common and dangerous flaw in many AI content workflows is their tendency to produce fabricated sources. You give a tool a prompt, and a finished article comes out, but you have little visibility into how it sourced its facts. Worse yet, it might cite claims you can’t verify or produce links to pages that don’t even exist.

    The financial and legal risks are well-documented. Deloitte famously botched a $290,000 government report due to hallucinations, and courts are now penalizing lawyers who cite fake cases.

    My solution was to design a repeatable process where the workflow itself guarantees a more trustworthy result. I designed the system to make quality an engineered requirement from the start.

    This approach shifts AI from being a questionable author to being a highly efficient, verifiable research assistant. It’s the content-side solution to the foundational problems of marketing data integrity.

    A Four-Script Pipeline for Trustworthy Content

    Instead of single-shot prompting (or even two- or three-shot prompting), my system is a disciplined, multi-stage pipeline: a series of Python scripts orchestrated by one “main” script whose job is to run the entire process, one stage at a time. The main script runs the first step, which produces an output file (a JSON file of research data). Once that file exists, the main script passes it to the second script as input, and so on down the chain.

    Each stage has a specific job, and the whole pipeline is designed to “fail fast”: if any script hits an error, the entire process stops immediately rather than continuing to produce a broken or inferior article. It logs the failure in Terminal so I can fix the problem before trying again.
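To make the fail-fast idea concrete, here is a minimal sketch of what such an orchestrator could look like. The function name (`run_pipeline`), the step tuples, and the file-handoff convention are my illustrative assumptions, not the author's actual code:

```python
import sys
from pathlib import Path

def run_pipeline(steps):
    """Run each (name, func, output_path) stage in order, failing fast.

    Each stage function receives the previous stage's output path and
    writes its own output file. Any exception, or a missing output file,
    halts the entire pipeline immediately.
    """
    prev_output = None
    for name, func, output_path in steps:
        try:
            func(prev_output, output_path)
        except Exception as exc:
            # Fail fast: report the broken stage and stop the whole run.
            print(f"[FAIL] {name}: {exc}", file=sys.stderr)
            raise SystemExit(1)
        if not Path(output_path).exists():
            print(f"[FAIL] {name}: expected output {output_path} missing",
                  file=sys.stderr)
            raise SystemExit(1)
        prev_output = output_path  # hand the file to the next stage
    return prev_output
```

The key design choice is that each stage only communicates through files on disk, so a failed run leaves an inspectable trail of intermediate JSON.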

    Here’s the flow.

    Script 1: Find Real, Verifiable URLs

    The process doesn’t start with the AI writing. It starts with the live, current internet.

    • What I Built: The pipeline’s first action is to run sophisticated boolean searches on Google (using SerpAPI) based on a templatized daily note, written in Obsidian.

    • The Business Outcome: This decision grounds the entire workflow in reality from step one. The system draws on real-world data from verifiable, high-quality sources, just as a human researcher would, and provides a verifiable link for every claim, each from an article ranking in the top 3 Google results. With my agency background in SEO, I trust this approach: Google puts a lot of effort into keeping baseless claims out of those top spots.

    An example of one of the hundreds of boolean searches the system runs, scraping the top 3 results of each
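A rough sketch of how those boolean searches might be generated from a templatized note. The tag list and query templates here are hypothetical stand-ins; the real system sends each query to SerpAPI and keeps only the top 3 organic results:

```python
# Hypothetical boolean-search templates; the actual system's templates
# come from a templatized Obsidian daily note.
QUERY_TEMPLATES = [
    '"{tag}" statistics "according to"',
    '"{tag}" survey OR study OR report',
    'intitle:"{tag}" "percent of"',
]

def build_queries(tags):
    """Cross every content tag with every boolean template."""
    return [tpl.format(tag=tag) for tag in tags for tpl in QUERY_TEMPLATES]
```

Each query string would then be passed to a search API client (such as SerpAPI's Google Search endpoint), and only the top-ranked organic URLs are kept for the next stage.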

    Script 2: Scrape and Extract Actual Text Snippets

    Once the system has a list of high-quality URLs, it needs to gather the raw data.

    • What I Built: The system visits the top 3 results for each of those boolean searches and uses BeautifulSoup to parse the content. The logic is that if a sentence includes a phrase like “according to,” “X percent of,” or “Statistics show that” (and dozens of others), it might be something worth citing. The script logs those phrases, plus the sentences before and after for context, into a new JSON file, along with the source URL.
    • The Business Outcome: I use a custom density scoring logic based on my specific content tags to rank these snippets. The result is a large, highly-structured JSON file containing all potentially useful snippets and citations, each one paired with its source URL. This file becomes the verifiable foundation for the entire article.

    In the example above (the top 3 results of a marketing-related Google search), these are the articles that would get scraped for potential citations.
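Here is a simplified sketch of the trigger-phrase extraction and density scoring described above. It operates on already-extracted plain text (the real system uses BeautifulSoup to pull that text from each page first), and the trigger list and scoring formula are illustrative assumptions:

```python
import re

# A tiny stand-in for the real list, which has dozens of trigger phrases.
TRIGGERS = ["according to", "percent of", "statistics show"]

def extract_snippets(text, url, tags):
    """Find sentences containing a trigger phrase, keep them with one
    sentence of context on either side, and score them by tag density."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    snippets = []
    for i, sentence in enumerate(sentences):
        lower = sentence.lower()
        if any(trigger in lower for trigger in TRIGGERS):
            context = " ".join(sentences[max(0, i - 1): i + 2])
            # Crude density score: how often my content tags appear nearby.
            score = sum(context.lower().count(tag.lower()) for tag in tags)
            snippets.append({"snippet": context, "source": url, "score": score})
    return sorted(snippets, key=lambda s: s["score"], reverse=True)
```

Pairing every snippet with its source URL at this stage is what makes every downstream claim traceable.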

    Script 3: Use AI as a Research Assistant

    Now that I have a (potentially massive) JSON file of candidate citations, statistics, and quotes, the next script’s job is to filter it.

    • What I Built: This Python script feeds the large JSON file of snippets into a new LLM session. Its job is not to write, but to analyze this pool of facts against the “goals” from my daily note. It selects the 6-10 most relevant, diverse, and high-quality snippets for mandatory citation.

    What my Obsidian Daily Note looks like. These are the directives I give the initial prompt, including a section on my goals for the article and my unique angle, labeled “My Take”

    • The Business Outcome: This script creates a new, much shorter “final” JSON file containing only the best, prioritized citations that support the objectives I’ve laid out in my Obsidian Daily Note. The AI is forced to act as a research assistant, operating within the boundaries of pre-vetted, real-world data. This builds a foundation of trust before a single sentence of the draft is written.
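A sketch of how the Script 3 prompt might be assembled. The LLM call itself is omitted; the function name and prompt wording are my assumptions about one reasonable way to frame the "select, don't write" constraint:

```python
import json

def build_filter_prompt(goals, my_take, snippets, n_min=6, n_max=10):
    """Assemble the research-assistant prompt for the filtering stage.

    The model is asked only to SELECT citations, never to write prose,
    and it can only choose from the pre-vetted snippet pool.
    """
    return "\n\n".join([
        "You are a research assistant. Do not write any article text.",
        f"Article goals: {goals}",
        f"My take / unique angle: {my_take}",
        (f"From the candidate snippets below, select the {n_min}-{n_max} "
         "most relevant, diverse, high-quality citations. Return ONLY a "
         "JSON list of their `id` values."),
        json.dumps(snippets, indent=2),
    ])
```

Because the model can only return IDs from the supplied pool, it has no opportunity to invent a source at this stage.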

    Script 4: Build a Grounded Narrative

    Only now, with a set of mandatory, pre-vetted citations, is the AI instructed to construct an outline, using the verifiable citations, and then begin writing.

    • What I Built: The system uses an entirely new prompt that instructs the AI to construct a narrative. This narrative must align with the “Goals” and “My Take” from my originating daily note of instructions, and it must build its argument using the findings from that final, definitive JSON of sources.
    • The Business Outcome: The final article is a trustworthy asset where every major claim can be traced back to its origin. The risk of hallucination is substantially reduced, and the output is a well-researched, properly cited piece that is credible by design.
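One way to enforce that traceability mechanically is a post-draft check that every URL cited in the article exists in the approved source list. This verifier is my illustrative addition, not a confirmed part of the author's pipeline, and it assumes markdown-style `[text](url)` citations:

```python
import re

def verify_citations(draft_markdown, approved_sources):
    """Fail-fast check after drafting: every URL cited in the draft must
    come from the approved, pre-vetted source list. Anything else is a
    potential hallucination."""
    cited = set(re.findall(r"\((https?://[^)\s]+)\)", draft_markdown))
    unapproved = cited - set(approved_sources)
    if unapproved:
        raise ValueError(f"Unverified citations found: {sorted(unapproved)}")
    return True
```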

    [!warning] The Pitfall of Single-Prompt Engineering Many teams try to “fix” AI content by writing complex, 10-page prompts. I found that this can be a difficult battle. My architectural choice was to avoid this by focusing on the system. My prompts at each stage are deliberately short and manageable.

    Anyone who has worked heavily with LLMs has seen what happens when they get overwhelmed. They don’t admit it, but their outputs get progressively worse until you’re banging your head against a wall. These short, deliberate prompts, each building on the last output, reduce the likelihood of that happening.

    Solving the “Robot Voice” Problem

    Even if the facts are correct, default LLM outputs suffer from a generic “machine tone” that degrades brand perception. Common linguistic markers include excessive use of em-dashes, semicolon-heavy headers, and repetitive “not X, but Y” sentence structures.

    This is where the second key component of my system comes in. It’s a custom model that I fine-tuned on my entire archive of human-written articles from my consulting days. This ensures the voice is authentic and avoids those generic, robotic tones.

    I very much hope this article doesn’t read like the first image (a side-by-side comparison of AI vs. human writing styles)

    • The Business Outcome: This is a key component for authenticity. The model learns to sound like me, which is a level above just sounding “human”. For a brand, this is the real value. The system can be trained on your team’s best writing, ensuring the final output is a genuine extension of your unique brand voice.
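For teams curious what "training on your best writing" involves in practice, here is a sketch of preparing that data. The chat-format JSONL shown is the training format OpenAI's fine-tuning endpoint expects; the system prompt, topic framing, and function name are my hypothetical choices:

```python
import json

def articles_to_jsonl(articles, system_prompt="Write in the brand's voice."):
    """Convert (topic, article_text) pairs into chat-format JSONL lines
    suitable for uploading as a fine-tuning training file."""
    lines = []
    for topic, body in articles:
        example = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Write an article about: {topic}"},
            {"role": "assistant", "content": body},  # the human-written target
        ]}
        lines.append(json.dumps(example))
    return "\n".join(lines)
```

The resulting `.jsonl` file would then be uploaded and referenced when creating the fine-tuning job.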

    Tools like GPTZero are notoriously unreliable, often flagging 100% human-written content as artificial. Still, the fact that my system’s output consistently bypasses detection is a useful proxy metric for quality: it suggests the fine-tuning successfully mimics natural human syntax patterns.

    A screenshot of an article my system produced being scored as human-written by GPTZero. The article was created almost entirely by my AI system (with only a small amount of editing), yet it’s flagged as human-written.

    Engineered for E-E-A-T (And Why It Matters)

    Having spent the last 8 years specializing in SEO and Content Strategy with my (recently shut-down) agency, I view Google’s E-E-A-T standards (Experience, Expertise, Authoritativeness, Trustworthiness) as more than an SEO checklist; they are a reliable proxy for the brand trust I am responsible for protecting. My system was purpose-built to align with these standards by default.

    • Trustworthiness, Expertise, & Authoritativeness: The workflow solves these simultaneously by guaranteeing verifiable output. The system constructs a narrative around 6-10 diverse, high-quality sources, ensuring every claim is backed by a citation. This “verification-first” architecture provides the critical trust signal required by both users and search algorithms.
    • Experience: This is the one part AI cannot supply on its own, and it’s where the “My Take” and “Goals” sections of my daily note earn their keep. They let me feed the LLM my personal opinions and guide the article’s direction, so the model can draft first-person perspectives that align with my strategy. The system delivers a 95% complete draft, leaving the final 5% for high-leverage “human-in-the-loop” review focused on nuance rather than drafting.

    My outcome is a de-risked AI system that allows me to scale my content efforts without sacrificing quality. I can produce more high-quality, on-brand content while maintaining (and even enhancing) the trust that is important to my success.


    How This Applies to a Growth Team

    Here are some of the principles from my system that can be applied to almost any workflow.

    • Mandate a “Research-First” Brief: Before your team ever opens an AI writer, require them to fill out a simple brief with 3-5 verifiable, high-quality URLs that will serve as the “ground truth” for the article.
      • Pro Tip: LLMs respond very well to structured data. I highly recommend formatting your sources in JSON, including the specific facts you want to use and the links for citation.
    • Define Your “Human-in-the-Loop” Pass: Clearly define what your human polishing step is for. Is it to add personal anecdotes? Is it to verify technical accuracy? This person is responsible for adding the “Experience” (the ‘E’ in E-E-A-T) that AI cannot.
    • Audit Your Process, Not Just Your Tools: Instead of asking if your AI tool is reliable, audit your internal process. Is your team fact-checking against the original source links, or are they just fact-checking the AI’s output? The first method is far more reliable.
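As a starting point for that research-first brief, here is one hypothetical shape it could take, built as a Python dict and dumped to the JSON that LLMs handle well. Every field name, URL, and value below is an illustrative placeholder:

```python
import json

# Hypothetical "research-first" brief: fill this out before ever
# opening an AI writer. All values are placeholders.
brief = {
    "topic": "Email deliverability best practices",
    "goals": ["Explain SPF/DKIM basics", "Argue for list hygiene"],
    "my_take": "Most teams over-invest in tooling and under-invest in hygiene.",
    "sources": [
        {
            "url": "https://example.com/deliverability-report",
            "fact": "A specific statistic you verified yourself at that URL",
        },
    ],
}

print(json.dumps(brief, indent=2))  # paste the JSON output into your prompt
```

Pairing each URL with the exact fact you want cited keeps the model from drifting to claims you haven't verified.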

    Future-Facing Questions

    [!faq]- Can’t you just use a tool like Perplexity to get non-hallucinated answers? Only kind of. Tools like Perplexity are fantastic answer engines. They are designed to take a prompt and give you a cited answer. That IS useful in its own right.

    The fundamental difference is control vs. convenience.

    • An answer engine is not transparent. You ask a question, and it autonomously decides which sources to use to build its answer. This is great for quick, one-off research.
    • My “content factory” is transparent. It is not designed to simply answer a question; it’s designed to manufacture a strategic, on-brand, SEO-optimized content asset at scale, based on my explicit instructions.

    My system not only avoids hallucinations, but it automates a workflow that guarantees a specific, high-quality output:

    1. Strategic Input: It’s driven by a detailed strategic brief (with “Goals” and “My Take”), not just a simple prompt.
    2. Verifiable Workflow: I control the entire research process. The system runs my specific boolean searches, scrapes only the top 3 Google results, extracts snippets based on my logic, and then uses AI to prioritize from that pre-vetted corpus.
    3. Authentic Voice: It uses my custom fine-tuned GPT model, trained on my own writing, to ensure the final article sounds like me, not like a generic AI.

    So while an answer engine is a useful tool for getting answers, this pipeline is a system for building strategic brand assets.

    [!faq]- How does a system like this adapt to new AI models (like GPT-5, Gemini, Claude etc)? This is a resilient part of the design. The “Content Factory” is a process, which makes it model-agnostic. I can swap in any new, more powerful model, and the workflow (Research -> Extract -> Prioritize -> Generate) remains the same. The system’s safeguards are independent of the AI.

    [!faq]- How much time does the “Human-in-the-Loop” part really take? I spend about 2 minutes writing a daily note with my objectives and goals. Then I run my Python script, which kicks off the series of processes and finishes the first draft in about 5 minutes. I use text-expander snippets to run a few saved prompts that add internal links and clean up redundancies, which takes another 2 minutes. Then, I manually read through the final draft and “humanize it,” which takes between 10 and 20 minutes. Overall, the system reduces the production time for a well-researched, publishable asset to approximately 30 minutes. The architecture transforms content creation from a manual bottleneck into a scalable engineering process.

    [!faq]- Can this system be integrated with an existing CMS? Yes. The system is built with Python, APIs, and Markdown files. It’s designed to be the “engine” that plugs into any front-end, similar to the four-part marketing system I outlined in my main blueprint. My own system, for example, integrates with Airtable and my website’s Git repository to automatically schedule and publish content, streamlining the final steps.


    By Justin Borge

    Marketing Growth Engineer. I design integrated systems (analytics, content, AI) to solve marketing chaos. This website is my 'backstage pass,' documenting what I'm working on, implementing, and learning.