llms.txt directory — Technical overview
The llms.txt directory is a curated, searchable index of websites publishing machine-readable /llms.txt and /llms-full.txt files. It collects metadata (category, description, main domain), token counts for both short and full manifests, availability status, and direct links to each site's llms endpoints.
Key features
- Aggregated registry of sites exposing /llms.txt and /llms-full.txt, including token counts and availability flags.
- Short and full manifest tracking: records both /llms.txt and /llms-full.txt where available, so context budgets can be planned per file.
- Filtering & search: search bar, category filters and sorting to find sites by domain, category or token footprint.
- Useful metadata: category, description, mainDomain, down flag, skipCheck, and token-size metrics to assist ingestion planning.
- Submission flow: self-service "Submit" form for site owners to add or update entries.
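The search, filter and sort behavior described above can be sketched in a few lines. This is a minimal illustration, not the directory's actual code: the fields category, description, mainDomain, down and skipCheck come from the metadata list above, while the token-count field names (llms_tokens, llms_full_tokens), the search_entries helper and the sample entries are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    # Field names taken from the metadata list above.
    mainDomain: str
    category: str
    description: str
    down: bool = False
    skipCheck: bool = False
    # Hypothetical names: the real token-count fields are not documented here.
    llms_tokens: Optional[int] = None
    llms_full_tokens: Optional[int] = None


def search_entries(entries, query="", category=None, include_down=False):
    """Filter by substring and category, then sort by short-manifest token count."""
    needle = query.lower()
    matches = [
        e for e in entries
        if (include_down or not e.down)
        and (category is None or e.category == category)
        and (needle in e.mainDomain.lower() or needle in e.description.lower())
    ]
    # Smallest token footprint first; entries without a count sort last.
    return sorted(matches, key=lambda e: (e.llms_tokens is None, e.llms_tokens or 0))


# Made-up sample data on reserved example domains.
entries = [
    DirectoryEntry("docs.example.com", "developer-tools", "API docs",
                   llms_tokens=1200, llms_full_tokens=48000),
    DirectoryEntry("shop.example.org", "ecommerce", "Storefront", llms_tokens=300),
    DirectoryEntry("old.example.net", "developer-tools", "Legacy docs",
                   down=True, llms_tokens=100),
    DirectoryEntry("sdk.example.io", "developer-tools", "SDK reference"),
]

for entry in search_entries(entries, category="developer-tools"):
    print(entry.mainDomain, entry.llms_tokens)
```

Entries flagged as down are hidden by default here; whether the directory itself hides or merely labels them is not stated, so treat that default as a design choice of the sketch.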
Technical use cases
- Data ingestion planning for RAG pipelines: pick domains and files by token counts to control context size and cost.
- Automated crawlers and indexers: integrate the directory as a source of canonical llms endpoints to fetch site manifests.
- AI agents and browsers: discover sites that explicitly publish machine-readable usage instructions and context, which can improve prompt safety and fidelity at inference time.
- SEO and GEO (generative engine optimization) tooling: identify sites that surface content specifically for LLMs, as a reference when optimizing visibility in generative search.
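As an illustration of the ingestion-planning use case, the sketch below picks one manifest per site under a fixed token budget. It is a simple greedy pass in list order, not an optimal packing, and it assumes each entry is available as a dict with the hypothetical keys "domain", "down", "llms_tokens" and "llms_full_tokens"; the directory's real export format may differ.

```python
def plan_ingestion(entries, token_budget):
    """Greedily pick one manifest per site without exceeding token_budget.

    Each entry is a dict with hypothetical keys: "domain", "down",
    "llms_tokens", "llms_full_tokens" (token counts may be None or missing).
    Returns (plan, remaining_tokens), where plan is a list of
    (domain, path, tokens) tuples.
    """
    plan = []
    remaining = token_budget
    for entry in entries:
        if entry.get("down"):
            continue  # skip sites flagged as unavailable
        # Prefer the full manifest; fall back to the short one if it does not fit.
        for path, key in (("/llms-full.txt", "llms_full_tokens"),
                          ("/llms.txt", "llms_tokens")):
            tokens = entry.get(key)
            if tokens is not None and tokens <= remaining:
                plan.append((entry["domain"], path, tokens))
                remaining -= tokens
                break
    return plan, remaining


# Made-up candidates on reserved example domains.
candidates = [
    {"domain": "docs.example.com", "llms_tokens": 1200, "llms_full_tokens": 48000},
    {"domain": "api.example.org", "llms_tokens": 800, "llms_full_tokens": 9000},
    {"domain": "old.example.net", "down": True, "llms_tokens": 100, "llms_full_tokens": 500},
    {"domain": "blog.example.io", "llms_tokens": 2500, "llms_full_tokens": None},
]

plan, left = plan_ingestion(candidates, token_budget=12000)
for domain, path, tokens in plan:
    print(f"{domain}{path}: {tokens} tokens")
print("unused budget:", left)
```

Because the pass is order-dependent, sorting candidates by priority before calling plan_ingestion changes which sites make the cut.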
Target users
- LLM engineers building retrieval-augmented systems
- API and platform engineers who need canonical site manifests for automated crawlers
- SEO and growth engineers optimizing for generative search engines
- Site owners who want to expose structured guidance to AI systems
Implementation notes
- Each entry includes direct links to /llms.txt and /llms-full.txt; many entries include precomputed token counts to aid selection.
- The directory is built with Astro + shadcn/ui + Tailwind CSS and offers an open submission link for additions.
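For readers who want to derive the two links themselves, here is a small sketch that builds both manifest URLs from a site's main domain. It assumes the files are served over HTTPS from the site root; the llms_urls helper is hypothetical and not part of the directory.

```python
from urllib.parse import urlsplit


def llms_urls(main_domain):
    """Return (short_url, full_url) for a site's llms manifests.

    Assumption: both files are served over HTTPS from the site root.
    Accepts a bare host ("example.com") or a full URL.
    """
    # Prefix "//" so urlsplit treats a bare host as the network location.
    target = main_domain if "://" in main_domain else "//" + main_domain
    host = urlsplit(target).netloc
    if not host:
        raise ValueError(f"cannot extract host from {main_domain!r}")
    base = f"https://{host}"
    return base + "/llms.txt", base + "/llms-full.txt"


short_url, full_url = llms_urls("example.com")
print(short_url)
print(full_url)
```

A crawler would still need to confirm that each URL actually responds, since a site may publish only one of the two files.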
Unique selling points
- Focused on llms.txt, an emerging proposed standard: aims to provide a single index of LLM-aware site manifests.
- Token-count-aware: helps teams design cost-efficient retrieval and context injection strategies.
- Community-driven: easy submission and curated entries help maintain quality and relevance.