My AI critic systems cut proposal revision cycles by more than 40 percent over 18 months, shortened blog and social production time by over 50 percent, and measurably improved the effectiveness of our business writing. If you create more than ten similar deliverables per quarter, you will benefit from a structured set of critics that evaluates your work the same way every time. Here is how to build them.
What Is a Critics System?
A critics system is a set of structured files that defines multiple evaluation perspectives, each with weighted scoring criteria and explicit failure conditions. It replaces subjective judgment with systematic evaluation. You externalize the way your sharpest editors think. Instead of relying on intuition or hoping a colleague catches a buried flaw, you encode the evaluation criteria of your toughest critics into a repeatable workflow.
Weighted scoring forces discipline. Technical accuracy might count for 30 percent of the total score while style accounts for 10 percent. This prevents you from accidentally “passing” a weak deliverable because it sounds good. The system rewards what matters and penalizes what does not.
Three Advantages
Comprehensiveness. You never skip an essential evaluation dimension. If value articulation matters, the value critic always runs. Weak ROI language will be caught every time.
Calibration. Weighted scoring forces you to state what you care about. A factual error matters more than a stylistic choice. A vague scope definition in a contract matters more than a clever turn of phrase in the introduction.
Adversarial perspective. You may write a strong technical document that fails the procurement test. A well-designed critic system includes perspectives you would otherwise ignore. Critiques happen before clients see the work.
Note: These systems only pay off when applied to deliverables you produce repeatedly. One-off projects rarely justify the effort.
The Architecture
You don’t need to know any programming languages or how to code to do this. Your AI platform will help you create the actual files. (I’ve included a prompt below that will do most of this for you.)
You’re going to ask your AI platform to create your AI critic system using JSON (JavaScript Object Notation), a structured text format that organizes your critic definitions in key-value pairs that both humans and AI systems can read.
I prefer JSON over XML (too verbose), YAML (whitespace errors break everything), and plain English (too ambiguous for consistent evaluation) because it balances readability with the structure needed for systematic scoring. But you can use any format an LLM can read. The basic architecture contains three layers:
Layer 1: Identity and perspective. Each critic needs a role such as “Executive Readability Critic,” “Technical Accuracy Critic,” or “Client Pushback Predictor.” The role determines what the critic sees and what it ignores.
Layer 2: Scoring criteria. A 1-to-5 scale (or 1-to-10) means nothing without anchors. Define what separates a 3 from a 4, what triggers a 1, and what excellence requires for a 5. Precision matters.
Layer 3: Failure conditions and weights. Some problems are dealbreakers. A proposal with the wrong client name fails. An SOW with unlimited scope language fails. These auto-fail triggers are kill switches for quality. Weights force you to prioritize what actually matters.
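To make the three layers concrete, here is a minimal sketch of a single critic definition. The critic name, anchors, and weight are invented for illustration; the Python dict mirrors the JSON file your AI platform would generate.

```python
import json

# Hypothetical single-critic definition showing all three layers:
# identity, scoring anchors, and failure conditions with a weight.
critic = {
    "name": "Value Articulation Critic",                   # Layer 1: identity
    "role": "Checks that each deliverable ties to a measurable client outcome",
    "scoring_anchors": {                                   # Layer 2: anchors
        "1": "No business value stated anywhere",
        "3": "Value stated but not quantified or tied to client outcomes",
        "4": "Value quantified for most deliverables",
        "5": "Every deliverable maps to a CFO-ready, measurable outcome",
    },
    "auto_fail": [                                         # Layer 3: kill switches
        "wrong client name",
        "unlimited scope language",
    ],
    "weight": 0.30,                                        # 30% of the total score
}

# Serialize to the JSON your critic file would actually contain.
print(json.dumps(critic, indent=2))
```

The same structure repeats once per critic; a full system is simply a JSON array of these objects.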
Building Your First System
Start with one deliverable you create often: proposals, SOWs, blog posts, or client communications. Budget three to four hours for your first build. ChatGPT, Gemini, and Claude all handle this well.
Step 1: Identify five to nine perspectives that matter. For a proposal: strategic clarity, value articulation, scope definition, technical credibility, client-centricity, executive usability, and client pushback. For a blog post: audience fit, insight density, structure, voice consistency, and competitive differentiation.
Use your past ten deliverables. Where did you fail? What required revisions? What created client questions? Each failure mode becomes a critic.
Step 2: Write evaluation prompts with specificity. “Evaluate whether the proposal articulates business value” is vague. Instead: “Evaluate whether each deliverable ties to a measurable client outcome that an executive could present to a CFO without additional context.”
Step 3: Define a scoring scale with concrete anchors. On a 1-to-10 scale, a 7 means “Meets professional standards with minor gaps.” An 8 means “Ready for client review.” A 9 means “Exceptional; could serve as a template.”
Step 4: Assign weights. Some dimensions matter more than others. For proposals, client-centricity and value should outweigh stylistic polish. Set minimum thresholds. My proposal critics require an overall score of 7.5 with no critic below 6.
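The pass/fail logic behind those weights and thresholds is simple arithmetic. Here is a sketch with invented critic names and scores, using the thresholds mentioned above (overall at least 7.5, no critic below 6):

```python
# Hypothetical per-critic scores on a 1-to-10 scale for one proposal draft.
scores = {
    "client_centricity": 8.0,
    "value_articulation": 7.5,
    "scope_definition": 6.5,
    "stylistic_polish": 9.0,
}

# Weights must total 1.0; client-centricity and value outweigh polish.
weights = {
    "client_centricity": 0.35,
    "value_articulation": 0.30,
    "scope_definition": 0.25,
    "stylistic_polish": 0.10,
}

# Weighted overall score.
overall = sum(scores[name] * weights[name] for name in scores)

# Thresholds: overall >= 7.5 AND no single critic below 6.
passes = overall >= 7.5 and min(scores.values()) >= 6.0

print(f"overall={overall:.2f}, passes={passes}")
```

Note that the minimum-score rule can fail a draft even when the weighted average looks healthy; that is the point of setting both thresholds.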
Or Just Vibe-Create It
If you do not want to design the system manually, upload three or four of your best deliverables and copy this prompt into your LLM:
Create an AI critic system for evaluating my [TYPE OF DELIVERABLE]. Analyze the attached examples and build a structured evaluation framework with 5-7 critic perspectives, weighted scoring criteria, and failure conditions.
Requirements:
Identify Quality Patterns: Review the attached examples and identify what makes them effective. Look for patterns in structure, argument flow, evidence use, clarity, and business impact.
Define Critic Perspectives: Create 5-7 distinct evaluation perspectives based on what matters most for this deliverable type. Each critic should evaluate one specific dimension (strategic clarity, technical accuracy, client-centricity, etc.).
Build Scoring Criteria: For each critic, define a 1-5 scoring scale with explicit anchors:
What separates a 3 from a 4?
What conditions trigger a 1?
What does excellence look like at 5?
Assign Weights: Determine relative importance of each critic (totaling 100%). What matters most for success? What are the fatal flaws worth catching?
Set Failure Conditions: Define auto-fail triggers for critical flaws that make the work unpublishable regardless of other scores.
Output Format: Provide the complete critic system as a structured JSON file with:
Critic name and role
Evaluation prompt with specific questions to ask
Scoring scale definitions (1-5 with concrete examples)
Weight (as decimal, e.g., 0.18 for 18%)
Minimum acceptable score threshold
Make the evaluation criteria specific enough that two different people running the same content through the system would produce similar scores. Avoid vague language like “good quality” or “clear writing.” Use concrete, measurable criteria.
Advanced Application: Research-Enhanced Critics
The most sophisticated critic I built is the “Prospective Client Pushback Predictor.” Before scoring, this critic researches the actual client through earnings calls, press releases, and stakeholder LinkedIn profiles. It evaluates whether the proposal aligns with what the client has publicly stated about their strategic priorities.
For example: a proposal that ignores a client’s publicly disclosed AI initiatives and impact targets fails the alignment test. The critic catches what I would miss because it forces an external reality check against real client data. This approach saved several proposals from rejection by identifying strategic misalignments that would have cost us the engagements.
Note: My Prospective Client Pushback Predictor critic requires either manual research or web scraping capabilities. Budget additional development time and cost (subscription fees, token charges, etc.) for API integrations with LinkedIn, earnings call transcription services, or press release aggregators.
System Design Forces Intellectual Honesty
The very best part about building a critic system is that it forces intellectual honesty. You must articulate what good looks like before you can evaluate whether you have achieved it. You must decide what matters before you can weight it. You must define failure before you can avoid it.
The process of building the system may prove as valuable as using it. When you write explicit scoring criteria, you surface assumptions you did not know you held. When you assign weights, you discover misalignments between stated priorities and actual behavior. I learned more about my own quality standards in four hours of building critics than in four years of receiving ad hoc feedback. I revise my critics often. You should too.
How to Use Your Critics
Once you build a critic system, you need a workflow.
Command line. I run critics through Python scripts that call the Claude API. It requires basic coding but offers automation and control. The workflow reads my JSON critic files, sends content plus evaluation criteria to the API, and returns structured feedback with scores.
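A minimal version of that workflow might look like the following sketch. The file names and model string are assumptions you would substitute with your own; the prompt-assembly step is the part that matters.

```python
import json


def build_critique_prompt(critics: list, content: str) -> str:
    """Combine the JSON critic definitions with the draft to be evaluated."""
    return (
        "Evaluate the content below against each critic. "
        "Return a score and specific feedback per critic.\n\n"
        f"CRITICS:\n{json.dumps(critics, indent=2)}\n\n"
        f"CONTENT:\n{content}"
    )


if __name__ == "__main__":
    # Hypothetical file names; keep one JSON critic file per deliverable type.
    with open("proposal_critics.json") as f:
        critics = json.load(f)
    with open("draft_proposal.md") as f:
        draft = f.read()

    # Requires `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
    from anthropic import Anthropic

    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name; use a current one
        max_tokens=2048,
        messages=[{"role": "user", "content": build_critique_prompt(critics, draft)}],
    )
    print(response.content[0].text)
```

For more reliable parsing, you can extend the prompt to request the feedback itself as JSON and load it with `json.loads` before logging scores.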
ChatGPT. Create a Custom GPT specifically for your critic system. Upload your JSON critic file as knowledge, set the instructions to “Evaluate uploaded content according to the critics framework in your knowledge base,” and configure it to output structured scores and feedback. Each deliverable type can have its own Custom GPT for quick access.
Claude. If you’re not using Claude Code, a good way to use critics with Claude is to create a Claude “skill” for each deliverable type. The skill embeds your JSON critic file and evaluation logic, then you invoke it with a slash command like /critique-blog or /critique-proposal. This provides repeatability and consistency. Alternatively, open a new project, upload or copy your JSON critic file, then paste your content and ask Claude to “evaluate this content using this critics file.”
Gemini. Use Gemini’s chat interface with uploaded files. Start a new conversation, upload your JSON critic file using the attachment feature, then paste your content and prompt: “Using the critics system I uploaded, evaluate this content and provide scores for each critic.” Gemini Advanced handles larger context windows for comprehensive critic frameworks.
Testing
Run the same deliverable through your critics at least twice. If you get inconsistent scores, your criteria need more precision. It takes time to calibrate a critic system. You will probably change the weight of each criterion several times. You will always be tweaking some parameters to improve performance. This is all part of the process.
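One way to quantify that consistency check is to compare per-critic deltas between two runs and flag any critic whose score drifts by more than a point. The scores below are illustrative:

```python
# Scores from two independent runs of the same deliverable (illustrative values).
run_1 = {"strategic_clarity": 7.0, "value_articulation": 8.0, "scope_definition": 6.5}
run_2 = {"strategic_clarity": 7.5, "value_articulation": 8.0, "scope_definition": 5.0}

# Any critic that drifts by more than one point between runs is a sign
# that its scoring anchors are too vague and need tightening.
TOLERANCE = 1.0
unstable = [
    name for name in run_1
    if abs(run_1[name] - run_2[name]) > TOLERANCE
]

print("Critics needing tighter anchors:", unstable)
```

In this example only scope_definition exceeds the tolerance, so that critic’s anchors are the ones to rewrite first.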
Housekeeping
File management is very important. Store your JSON critics in a dedicated folder. (I keep mine in a private repo on GitHub.) Version them as you refine your criteria. Consistency matters, as does version control. Then, be sure to track your improvement. You can add a workflow to measure revision cycles before and after implementing critics. Document time saved. Review your results quarterly. You know the old saying, “What gets measured, gets managed.”
My Disclaimer
At the bottom of every blog post, I include the line: “This work was created with the assistance of various generative AI models.” Now you know what that really means. If you’ve read this far and you are interested in seeing what my “Nine Critics Master Framework” looks like, please reach out. I’m happy to share it with you.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.