llms.txt is a proposed open standard for helping large language models (LLMs) better understand, navigate, and cite content from websites.
Placed at the root of a website at /llms.txt, the file acts as a curated, AI-friendly guide to a site’s most important content—written in plain Markdown so both machines and humans can read it easily.
Think of it as the modern evolution of robots.txt, but instead of telling crawlers what to avoid, it tells AI systems where to find your best content.
Historical Origins
The llms.txt standard was first proposed on September 3, 2024, by Jeremy Howard, an Australian technologist, co-founder of fast.ai, and founder of Answer.AI.
Howard originally published the proposal as a blog post at answer.ai, framing it not as a finalized standard but as a starting point for community discussion and experimentation.
His motivation was practical: as LLMs increasingly rely on website content to assist users in real time—during inference, not just during training—the process of assembling relevant context from a site was ambiguous and inefficient.
Do you crawl the entire sitemap? Include external links? Include source code?
Howard’s answer was simple: let the site author decide, and give them a standardized file format to communicate that decision.
The Problem llms.txt Solves
Large language models face a fundamental technical constraint: context windows are too small to handle most websites in their entirety. Converting complex HTML pages filled with navigation menus, advertisements, JavaScript, and boilerplate into clean, usable text is both difficult and imprecise.
The result is that AI tools often pull from whatever they can parse fastest—which may include outdated pages, duplicate content, or low-signal sources rather than your most authoritative, carefully crafted content.
Without a guiding file, LLMs are essentially navigating your website blindfolded, guessing at what matters most.
llms.txt solves this by giving site owners a standardized way to say:
“Here are the pages that matter. Here’s what my site is about. Here’s how to understand it.”
Modern Definition
At its core, llms.txt is a plain text file written in Markdown, placed at the root directory of a website (yourdomain.com/llms.txt).
It provides:
- A concise description of the website or project
- Curated links to the most important and LLM-readable pages
- Optional contextual notes explaining what each linked resource covers
- Guidance on how to interpret the site’s content and structure
The file is specifically designed for inference-time use—meaning it helps LLMs give better answers to users right now, rather than being primarily aimed at training future models.
How llms.txt Differs from robots.txt and sitemap.xml
Many people initially compare llms.txt to existing web standards, but the differences are significant.
| File | Purpose | Format | Audience | Content |
|---|---|---|---|---|
| robots.txt | Control crawler access | Custom syntax | Search engine bots | Allow/disallow rules |
| sitemap.xml | List all URLs for indexing | XML | Search engine bots | Full URL list with metadata |
| llms.txt | Curate key content for LLMs | Markdown | LLMs and AI agents | Structured links with descriptions |
Unlike robots.txt, llms.txt contains no blocking or disallow directives—it is purely a positive, affirmative guide to your best content.
Unlike sitemap.xml, which lists every indexable page, llms.txt is a curated subset of the most important content, designed to fit within an LLM’s context window.
The File Format Explained
The llms.txt specification uses Markdown because it is the format most widely and easily understood by language models, while also remaining human-readable and parseable by standard programming tools.
Required and Optional Sections
A valid llms.txt file follows a specific structure in this order:
- An H1 heading — The name of the project or site (only required element)
- A blockquote — A short summary containing key information necessary for understanding the file
- Body sections — Paragraphs or lists with more detailed background information
- H2-delimited file lists — Sections containing curated URLs with optional descriptions
- An “Optional” section — Links that can be skipped if a shorter context is needed
Example llms.txt File
Here is a simplified example based on the official specification:
# My Company Name
> We build project management tools for remote teams.
Our platform integrates with Slack, Notion, and GitHub.
## Docs
- [Getting Started](https://example.com/docs/start.md): Setup guide for new users
- [API Reference](https://example.com/docs/api.md): Full API documentation
## About
- [Company Overview](https://example.com/about.md): Mission and team
## Optional
- [Case Studies](https://example.com/case-studies.md): Customer success stories
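Because the format is just structured Markdown, it is straightforward to parse programmatically. The sketch below (an illustration only, not the official llms_txt2ctx implementation) extracts the H1 title, blockquote summary, and H2 link sections from an llms.txt string like the example above:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt document into its title, summary, and link sections."""
    title_m = re.search(r"^# (.+)$", text, re.MULTILINE)
    summary_m = re.search(r"^> (.+)$", text, re.MULTILINE)
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            # Each H2 heading opens a new file-list section.
            current = line[3:].strip()
            sections[current] = []
        elif current and line.startswith("- ["):
            # Markdown link items: "- [title](url): optional description"
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?$", line)
            if m:
                sections[current].append(
                    {"title": m.group(1), "url": m.group(2), "desc": m.group(3) or ""}
                )
    return {
        "title": title_m.group(1) if title_m else None,
        "summary": summary_m.group(1) if summary_m else None,
        "sections": sections,
    }

example = """\
# My Company Name
> We build project management tools for remote teams.

## Docs
- [Getting Started](https://example.com/docs/start.md): Setup guide for new users

## Optional
- [Case Studies](https://example.com/case-studies.md): Customer success stories
"""

parsed = parse_llms_txt(example)
```

An LLM tool could use the parsed result to fetch only the "Docs" links when context is tight, dropping the "Optional" section entirely.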
The .md Companion Convention
The llms.txt proposal also includes a companion convention: making clean Markdown versions of web pages available at the same URL as the original page, but with .md appended.
For example, yoursite.com/blog/post would have a corresponding yoursite.com/blog/post.md that strips away HTML, navigation, and other non-content elements to deliver clean, LLM-digestible text.
llms-full.txt
A common variation is llms-full.txt, which expands the llms.txt index into a single large file containing the complete flattened text of the entire website rather than just links and descriptions.
This approach trades compactness for completeness, and some sites use both files simultaneously—the standard llms.txt for quick context assembly and the full version for deep site analysis.
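One way to picture the relationship between the two files: llms-full.txt is roughly what you get by fetching every page linked from llms.txt and concatenating the results under their section headings. A sketch, with the fetch function injected so any HTTP client (or a test stub) can be plugged in:

```python
def build_llms_full(sections, fetch):
    """Flatten an llms.txt index into a single llms-full.txt string.

    `sections` maps section names to lists of (title, url) pairs; `fetch` is
    any callable returning page text for a URL. A sketch, not official tooling.
    """
    parts = []
    for name, links in sections.items():
        parts.append(f"## {name}")
        for title, url in links:
            parts.append(f"### {title}")
            parts.append(fetch(url).strip())
    return "\n\n".join(parts) + "\n"

# Demo with a stand-in fetcher; in practice fetch would wrap an HTTP client.
full_text = build_llms_full(
    {"Docs": [("Getting Started", "https://example.com/docs/start.md")]},
    fetch=lambda url: f"(contents of {url})",
)
```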
Relationship to Generative Engine Optimization (GEO)
As AI-powered search engines like Perplexity, ChatGPT Search, and Google’s AI Overviews become primary discovery channels, a new discipline called Generative Engine Optimization (GEO) has emerged alongside traditional SEO.
While SEO focuses on ranking in search results, GEO focuses on being the source that AI systems cite when generating answers. llms.txt directly supports GEO strategy by:
- Clarifying canonical sources: Specifying which URLs represent your definitive content, reducing the risk of AI citing outdated or duplicate pages
- Prioritizing high-quality content: Ensuring thought leadership, case studies, and cornerstone content are visible to AI systems
- Protecting brand narrative: Directing LLMs to preferred content to reduce inaccurate or generic AI-generated descriptions of your business
- Supporting AI search indexes: Helping your content surface in Perplexity, ChatGPT, and other AI-driven search layers
Who Is Using llms.txt?
Adoption has grown rapidly since the proposal’s release in September 2024, though it remains concentrated in the developer and technology community.
Companies like Perplexity, Anthropic, and various developer documentation platforms have created llms.txt files for their own documentation and internal use.
As of June 2025, a scan of the top 1,000 most visited global websites showed approximately 0.3% adoption (3 out of 1,000 sites), suggesting the standard is still in early-adopter territory among mainstream web properties.
However, adoption among developer tools, SaaS documentation, and AI-adjacent companies is significantly higher.
Notable early adopters include:
- Anthropic: Uses llms.txt in internal documentation for agent-building
- FastHTML and nbdev projects: All fast.ai and Answer.AI software projects using nbdev automatically generate .md versions of all pages
- Perplexity: Has developed llms.txt files for its own documentation
- Mintlify, GitBook, and other documentation platforms: Have built native llms.txt generation into their platforms
Current Debate and Honest Limitations
It is important to approach llms.txt with measured expectations.
The standard remains a proposal, not an adopted protocol, and there are legitimate questions about its real-world impact.
Key Criticisms
LLMs may not actually read it: Critics point out that there is limited verified evidence that major AI systems—including ChatGPT, Claude, and Gemini—actively read and prioritize llms.txt files during inference. Google’s John Mueller has stated he doesn’t know of any search systems that use the file.
No enforcement mechanism: Like robots.txt, llms.txt can be obeyed or ignored by any AI agent—there is no technical enforcement. Its effectiveness depends entirely on voluntary adoption by LLM providers.
Not a ranking signal: llms.txt does not guarantee that a brand will appear in AI-generated answers. AI search does not operate based on a single file.
The illusory truth effect: The standard has spread rapidly through SEO and marketing communities, and some argue that this repetition has built belief in its effectiveness faster than actual evidence of that effectiveness has accumulated.
The Measured Case for Adopting It Anyway
Despite these limitations, there are practical reasons to implement llms.txt:
- Google crawls it: Google has been observed crawling llms.txt files weekly, and in December 2025 stated that adding one would not harm a website.
- Legitimate AI crawlers check for it: Server logs show GPTBot, ClaudeBot, and PerplexityBot do access the file.
- Future-proofing: As AI search evolves, having a well-structured llms.txt positions you ahead of adoption curves.
- Content audit value: The process of creating llms.txt forces a useful exercise in identifying your most important content.
- Low implementation cost: Creating a basic llms.txt file takes minutes and costs nothing.
How to Create an llms.txt File
Step 1: Audit Your Most Important Content
Identify the pages that best represent your brand, products, services, and expertise. Focus on quality over quantity—the goal is a curated guide, not an exhaustive sitemap.
Step 2: Write Clear, Neutral Descriptions
LLMs perform best with content that clearly defines terms, avoids emotional language, and does not rely on context-free marketing claims. Instead of “an innovative, groundbreaking platform,” use “an analytics platform for monitoring and analyzing user behavior.”
Step 3: Structure the File in Markdown
Create your H1 title, add a blockquote summary, and build your H2 sections with linked file lists. Keep descriptions concise and informative.
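Once Steps 1 and 2 have produced a curated structure, Step 3 can be automated. A minimal sketch that renders a title, summary, and sections mapping into the Markdown layout described above (the names and URLs are placeholders, not real endpoints):

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document: H1 title, blockquote summary, H2 link lists.

    `sections` maps H2 names to lists of (link_title, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        for link_title, url, desc in links:
            lines.append(f"- [{link_title}]({url}): {desc}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)

doc = render_llms_txt(
    "My Company Name",
    "We build project management tools for remote teams.",
    {"Docs": [("Getting Started", "https://example.com/docs/start.md",
               "Setup guide for new users")]},
)
```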
Step 4: Place It at Your Domain Root
Host the completed file at yourdomain.com/llms.txt where AI crawlers expect to find it.
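After deploying, it is worth confirming the file is actually reachable at the root and structurally plausible. A quick sketch using only the standard library; the structural check is deliberately cheap, since the only element the spec requires is the H1:

```python
import urllib.request

def fetch_llms_txt(domain, timeout=10):
    """Fetch /llms.txt from a domain root; returns the text, or None on failure."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:  # covers HTTPError, URLError, timeouts
        return None

def looks_valid(text):
    """Cheap structural check: the first non-blank line should be an H1."""
    lines = [ln for ln in (text or "").splitlines() if ln.strip()]
    return bool(lines) and lines[0].startswith("# ")
```

For example, `looks_valid(fetch_llms_txt("yourdomain.com"))` should return True for a correctly deployed file.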
Step 5: Optionally Create .md Page Versions
For maximum AI readability, create Markdown versions of your key pages by appending .md to their URLs.
Step 6: Test and Monitor
Use a tool like llms_txt2ctx to expand your file into a full LLM context file and test whether AI systems can accurately answer questions about your content. Monitor server logs for bot access from GPTBot, ClaudeBot, and PerplexityBot.
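Monitoring can be as simple as scanning access logs for those crawler names. The sketch below assumes the common Apache/nginx "combined" log format and uses plain substring matching, which is enough for a first look; the sample traffic is hypothetical:

```python
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def count_ai_bot_hits(log_lines, path="/llms.txt"):
    """Count requests for `path` by each AI crawler across access-log lines."""
    counts = {bot: 0 for bot in AI_BOTS}
    for line in log_lines:
        if path not in line:
            continue  # only count hits on the llms.txt file itself
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
    return counts

# Two sample lines in combined log format (hypothetical traffic).
sample_lines = [
    '1.2.3.4 - - [10/Jun/2025:12:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '5.6.7.8 - - [10/Jun/2025:12:01:00 +0000] "GET /about HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
hits = count_ai_bot_hits(sample_lines)
```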
Tools and Plugins for llms.txt
A growing ecosystem of tools supports llms.txt creation and management:
- llms_txt2ctx: Official CLI and Python module for parsing llms.txt files and generating expanded LLM context
- vitepress-plugin-llms: VitePress plugin that auto-generates LLM-friendly documentation
- docusaurus-plugin-llms: Docusaurus plugin for LLM-friendly docs following the llms.txt standard
- llms-txt-php: A PHP library for reading and writing llms.txt files
- Drupal LLM Support: A Drupal Recipe providing full llms.txt support for Drupal 10.3+ sites
- GitBook: Native llms.txt generation built into the documentation platform
- VS Code PagePilot Extension: Automatically loads external context from llms.txt files for enhanced responses
Use Cases by Industry
Software and Developer Documentation
The most natural fit for llms.txt—developers often use AI assistants while coding and need accurate, up-to-date library documentation. llms.txt ensures AI coding tools reference current API documentation rather than outdated information.
E-commerce
llms.txt can outline product categories, return policies, shipping information, and FAQ content—ensuring AI assistants give accurate answers when customers ask questions about a store.
Professional Services and B2B
Agencies, consultancies, and SaaS companies can use llms.txt to ensure AI systems accurately describe their services, expertise, and differentiators.
Publishing and Media
Content publishers can curate their most authoritative editorial content, helping AI systems cite original reporting rather than aggregated or republished versions.
Personal and Portfolio Sites
Individuals can use llms.txt to help AI systems accurately answer questions about their background, work, and expertise.
Future Outlook
The llms.txt standard sits at the intersection of several major trends reshaping the web: the rise of AI-powered search, the shift from click-based to answer-based information retrieval, and the growing importance of structured content for machine comprehension.
As AI search adoption accelerates—with OpenAI reporting roughly 700 million weekly active users and Google’s Gemini reaching 400 million monthly active users—the strategic importance of being accurately represented in AI-generated answers will only increase. Whether llms.txt becomes the definitive standard for AI content curation or is superseded by a more formalized protocol, the underlying principle—giving site owners a voice in how AI understands their content—is likely here to stay.
llms.txt represents a practical, low-cost step that website owners can take today to improve how AI systems understand and represent their content. While it is not a magic bullet for AI search visibility and its adoption by major LLMs remains inconsistent, the combination of minimal implementation cost, growing crawler interest, and the strategic importance of AI content optimization makes it a worthwhile addition to any modern web content strategy.
For web developers, marketers, and content strategists navigating the shift from traditional SEO to Generative Engine Optimization, llms.txt is less about immediate, measurable impact and more about positioning—ensuring your most important content is clearly organized, accurately described, and ready for the AI-driven web that is already here.