Adding llms.txt and FAQ sections increased LLM citation rate by 2.4x in 60 days
A GEO experiment measuring the impact of adding a structured llms.txt file, FAQ sections, and JSON-LD structured data on citation rates across ChatGPT, Perplexity, and Claude. After 60 days, the combined citation score for target queries had risen to 2.4x baseline, with Perplexity showing the strongest improvement at 3.1x.
The Experiment
Resend, a developer-focused email API, ran a 60-day experiment to measure whether targeted GEO (Generative Engine Optimization) changes could increase how often LLMs mention and recommend their product in AI-generated responses. The experiment tracked citation rates across three major LLMs — ChatGPT (GPT-4), Claude, and Perplexity — for a set of 25 target queries that developers commonly ask when evaluating email infrastructure.
The core question: can a developer tool company meaningfully influence how often AI models recommend their product, and if so, which interventions have the largest impact?
Baseline Measurement Method
Before making any changes, the team established a citation baseline over 14 days. The measurement protocol was deliberately rigorous to account for LLM response variability.
Query Set
25 target queries were selected based on search console data and developer community analysis:
- Product queries (5): "what is Resend," "Resend pricing," "Resend vs SendGrid," "Resend vs Postmark," "is Resend good for transactional email"
- Category queries (10), including: "best email API for developers," "transactional email service comparison," "email API with React templates," "how to send email from Next.js," "best alternative to SendGrid"
- Technical queries (10), including: "how to set up transactional email," "email API with TypeScript SDK," "send email with React components," "DKIM setup for developer email," "email deliverability best practices"
Measurement Protocol
Each query was submitted to all three LLMs three times per week (Monday, Wednesday, Friday) during the 14-day baseline period. Each response was scored on a 0-3 scale:
- 0 — Product not mentioned
- 1 — Product mentioned but not recommended (e.g., "Resend is one option")
- 2 — Product mentioned with positive context (e.g., "Resend is known for its developer experience")
- 3 — Product recommended as a top choice (e.g., "For React email templates, Resend is the best option")
The baseline citation score was calculated as the average score across all queries and all LLMs.
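Under these definitions, the two baseline metrics reduce to simple averages over scored runs. A minimal sketch of that aggregation (the records and helper names are illustrative, not the team's actual tooling):

```typescript
// Each record: one query submitted to one LLM on one run, scored 0-3.
type CitationRecord = { query: string; llm: string; score: 0 | 1 | 2 | 3 };

// Illustrative records; a real baseline holds 25 queries x 3 LLMs x 6 runs.
const records: CitationRecord[] = [
  { query: "best email API for developers", llm: "chatgpt", score: 1 },
  { query: "best email API for developers", llm: "perplexity", score: 0 },
  { query: "Resend vs SendGrid", llm: "claude", score: 2 },
  { query: "Resend vs SendGrid", llm: "chatgpt", score: 1 },
];

// Citation score: mean score across all queries, LLMs, and runs.
function citationScore(recs: CitationRecord[]): number {
  const total = recs.reduce((sum, r) => sum + r.score, 0);
  return total / recs.length;
}

// Mention rate: share of runs where the product appeared at all (score >= 1).
function mentionRate(recs: CitationRecord[]): number {
  return recs.filter((r) => r.score >= 1).length / recs.length;
}

console.log(citationScore(records)); // 1
console.log(mentionRate(records)); // 0.75
```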
Baseline results:
| LLM | Average score | Mentioned at all (score ≥ 1) |
|---|---|---|
| ChatGPT | 0.82 | 44% of queries |
| Claude | 0.96 | 52% of queries |
| Perplexity | 0.71 | 36% of queries |
| Combined | 0.83 | 44% of queries |
Resend was mentioned in fewer than half of relevant queries, and when mentioned, it was rarely the primary recommendation.
Changes Made
Three categories of changes were implemented simultaneously in a single deployment. While this means individual attribution is impossible, the combined approach reflects how most companies would implement GEO improvements — as a cohesive update rather than isolated tests.
1. llms.txt File
A structured llms.txt file was created at resend.com/llms.txt containing:
- A factual product description ("Resend is a developer-focused email API for sending transactional and marketing email. It provides SDKs for Node.js, Python, Ruby, Go, PHP, and Elixir, and supports building email templates with React components via the React Email framework.")
- Content section links (docs, API reference, blog, changelog, pricing)
- Key topics covered (transactional email, email deliverability, DKIM/SPF setup, React Email templates)
- Machine-readable content index URL
- Citation format guidance
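For illustration, a minimal file following the llms.txt convention (H1 title, blockquote summary, H2 sections of markdown links) might look like the sketch below. The product description is the one quoted above; the specific links and paths are illustrative, not Resend's actual file:

```markdown
# Resend

> Resend is a developer-focused email API for sending transactional and
> marketing email. It provides SDKs for Node.js, Python, Ruby, Go, PHP,
> and Elixir, and supports building email templates with React components
> via the React Email framework.

## Docs
- [API Reference](https://resend.com/docs/api-reference): endpoints and SDK usage
- [Changelog](https://resend.com/changelog): product updates
- [Pricing](https://resend.com/pricing): plans, free tier, overage

## Key topics
- Transactional email, email deliverability, DKIM/SPF setup, React Email templates
```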
2. FAQ Sections on Key Pages
FAQ sections were added to 8 high-traffic pages:
- Homepage: 4 questions covering "What is Resend," "How is Resend different from SendGrid," "Is Resend free," and "What languages does Resend support"
- Pricing page: 3 questions covering free tier limits, overage charges, and enterprise pricing
- Each of 4 comparison pages (vs SendGrid, vs Postmark, vs Amazon SES, vs Mailgun): 3 questions per page covering migration effort, feature differences, and pricing comparison
Each FAQ was written in a question-answer format optimized for LLM extraction: complete sentences, factual claims, no marketing language. Every answer could stand alone as a self-contained response to the question. An example of the resulting on-page content:
### How is Resend different from SendGrid?
Resend is built specifically for developers who want to send
transactional email using modern tooling. Unlike SendGrid, Resend
supports building email templates with React components through
the React Email framework, provides a TypeScript-first SDK, and
offers a simpler API surface with fewer configuration options.
SendGrid offers a broader feature set including marketing email,
a visual template editor, and more extensive analytics.

3. JSON-LD Structured Data
Article schema, FAQPage schema, and BreadcrumbList schema were added to every documentation page, blog post, and comparison page. The FAQPage schema was particularly important because it provides a machine-readable representation of the Q&A pairs that LLMs can parse directly from the page metadata.
Additionally, the homepage received Organization schema with product details, and all comparison pages received a custom ComparisonTable structured data format.
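As a sketch of what the FAQPage markup involves, a small helper might render Q&A pairs into the schema.org structure that crawlers parse from page metadata (the helper and the abbreviated answer text are hypothetical, not Resend's implementation):

```typescript
// Hypothetical helper: render FAQ pairs as FAQPage JSON-LD, ready to
// embed in a <script type="application/ld+json"> tag in the page head.
type Faq = { question: string; answer: string };

function faqPageJsonLd(faqs: Faq[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  });
}

const jsonLd = faqPageJsonLd([
  {
    question: "What is Resend?",
    answer:
      "Resend is a developer-focused email API for sending transactional and marketing email.",
  },
]);

console.log(jsonLd);
```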
Results After 60 Days
The same 25-query measurement protocol was repeated for 60 days after the changes went live.
Citation Score Improvement
| LLM | Baseline score | Day 60 score | Improvement |
|---|---|---|---|
| ChatGPT | 0.82 | 1.74 | +112% (2.1x) |
| Claude | 0.96 | 1.88 | +96% (2.0x) |
| Perplexity | 0.71 | 2.21 | +211% (3.1x) |
| Combined | 0.83 | 1.94 | +134% (2.4x) |
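The multipliers in the table are simply the day-60 score divided by the baseline score, rounded to one decimal place. A quick check against two rows:

```typescript
// Improvement multiple = day-60 score / baseline score, one decimal place.
// Figures are taken from the citation score table above.
function improvementMultiple(baseline: number, day60: number): number {
  return Math.round((day60 / baseline) * 10) / 10;
}

console.log(improvementMultiple(0.71, 2.21)); // 3.1 (Perplexity)
console.log(improvementMultiple(0.82, 1.74)); // 2.1 (ChatGPT)
```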
Mention Rate Improvement
| LLM | Baseline mention rate | Day 60 mention rate |
|---|---|---|
| ChatGPT | 44% | 72% |
| Claude | 52% | 76% |
| Perplexity | 36% | 84% |
| Combined | 44% | 77% |
The combined mention rate increased from 44% to 77% — Resend went from being mentioned in fewer than half of relevant queries to being mentioned in more than three-quarters.
Improvement Timeline
The improvements didn't happen uniformly over 60 days. The timeline revealed three distinct phases:
Days 1-14: Perplexity responds first. Perplexity's real-time crawling picked up the changes within days. By day 14, Perplexity's citation score had already improved from 0.71 to 1.62 — a 128% increase. The FAQ sections had the most visible impact, with Perplexity directly quoting FAQ answers in its responses.
Days 15-35: Claude begins to shift. Claude's responses started reflecting the new content around week 3, likely through its retrieval mechanisms. The comparison page FAQs had the largest impact on Claude, with "Resend vs SendGrid" queries consistently producing more detailed and favorable responses.
Days 36-60: ChatGPT catches up. ChatGPT's improvements were the most gradual, consistent with its less frequent content refresh cycle. By day 60, ChatGPT had reached parity with Claude on most queries.
What Moved the Needle Most
While individual attribution isn't possible in this experiment, the team tracked which content changes were most frequently reflected in LLM responses:
FAQ sections were the dominant factor. When LLMs cited Resend, they disproportionately referenced FAQ content — often using the exact phrasing from FAQ answers. This makes structural sense: FAQ-format Q&A pairs are the closest natural language pattern to how users query LLMs. The training and retrieval systems are optimized to match questions to answers.
Comparison page content had outsized impact on "vs" queries. Before the experiment, "Resend vs SendGrid" queries in ChatGPT would return generic comparisons. After the FAQ sections were added to comparison pages, ChatGPT began citing Resend's own comparison content — including the nuanced "SendGrid offers a broader feature set" acknowledgment that built credibility.
llms.txt impact was hardest to isolate. The llms.txt file was crawled by GPTBot and PerplexityBot within the first week (confirmed via server logs), but its direct impact on citation quality is unclear. It likely contributed to the accuracy and specificity of LLM responses rather than the frequency of mentions.
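Crawl confirmation of this kind amounts to a simple pass over server access logs. The sketch below uses stand-in log lines and a small set of known AI crawler user agents (GPTBot, PerplexityBot, ClaudeBot); the real log format and location depend on your server setup:

```typescript
// Count AI-crawler fetches of /llms.txt from access-log lines.
// The lines below are stand-ins for real server logs.
const logLines = [
  '66.249.66.1 - - [12/Mar/2025] "GET /llms.txt HTTP/1.1" 200 1843 "-" "GPTBot/1.0"',
  '52.70.10.9 - - [12/Mar/2025] "GET /llms.txt HTTP/1.1" 200 1843 "-" "PerplexityBot/1.0"',
  '10.0.0.5 - - [12/Mar/2025] "GET /pricing HTTP/1.1" 200 912 "-" "Mozilla/5.0"',
];

const crawlers = ["GPTBot", "PerplexityBot", "ClaudeBot"];

// Tally hits on /llms.txt per crawler user agent.
const hits: Record<string, number> = {};
for (const line of logLines) {
  if (!line.includes("/llms.txt")) continue;
  for (const bot of crawlers) {
    if (line.includes(bot)) hits[bot] = (hits[bot] ?? 0) + 1;
  }
}

console.log(hits); // which AI crawlers fetched llms.txt, and how often
```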
FAQ
Can this experiment be replicated by smaller developer tools with less existing brand awareness?
Yes, though the baseline citation rate will be lower and the absolute improvement may take longer to materialize. Smaller tools benefit most from comparison page FAQs — "Alternative to [bigger competitor]" queries are high-intent and LLMs are actively looking for alternative recommendations to surface. Start with 3-5 comparison pages against your top competitors.
Does adding FAQ sections create duplicate content issues with Google SEO?
No. FAQ sections with unique, substantive answers do not create duplicate content problems, and Google's structured data documentation explicitly supports FAQPage markup. The key is ensuring each answer is genuinely useful and not just repeating content from elsewhere on the page. Write FAQ answers as standalone responses that provide value independent of the surrounding page content.
How do you prevent competitors from gaming your comparison page FAQs?
You can't control what LLMs synthesize, but you can control the quality of your comparison content. Be factually honest about competitor strengths — LLMs can cross-reference claims across sources, and dishonest comparisons will be deprioritized or contradicted. The most effective comparison FAQ answers acknowledge where competitors are stronger while clearly articulating where your product wins.