# convertintomp4.com — robots.txt
# Content Signals (contentsignals.org, draft-romm-aipref-contentsignals)

# Pure offline scrapers / content harvesters — no indexing value
User-agent: MJ12bot
User-agent: DotBot
User-agent: BLEXBot
User-agent: AspiegelBot
User-agent: HTTrack
User-agent: WebCopier
User-agent: WebZIP
Disallow: /

# AI discovery bots — allow everything so content appears in ChatGPT, Perplexity, Claude, Gemini
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Bytespider
User-agent: Google-Extended
Allow: /
Disallow: /api/auth/
Disallow: /api/admin/
Disallow: /api/webhooks/
Disallow: /admin/
Disallow: /account/
Disallow: /auth/

# SEO audit tools
User-agent: SemrushBot
User-agent: AhrefsBot
User-agent: PetalBot
User-agent: DataForSeoBot
User-agent: SiteAuditBot
User-agent: ZoominfoBot
Allow: /
Disallow: /api/auth/
Disallow: /api/admin/
Disallow: /api/webhooks/
Disallow: /admin/
Disallow: /account/
Disallow: /auth/
Crawl-delay: 10

# Wildcard — all legitimate crawlers + AI content-use signals
# NB: /api/{video,audio,...} are indexable MARKETING pages (api-products
# sitemap) — only the real endpoint groups under /api/ are disallowed.
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
Disallow: /api/auth/
Disallow: /api/admin/
Disallow: /api/webhooks/
Disallow: /api/health
Disallow: /api/md-view
Disallow: /_next/
Disallow: /private/
Disallow: /admin/
Disallow: /account/
Disallow: /auth/
Disallow: /dashboard/
# Low-traffic locales — English-only blog content, translated chrome = thin duplicates
Disallow: /ar/blog/
Disallow: /cs/blog/
Disallow: /da/blog/
Disallow: /el/blog/
Disallow: /fi/blog/
Disallow: /hu/blog/
Disallow: /nb/blog/
Disallow: /ro/blog/
Disallow: /sv/blog/
Disallow: /th/blog/
Disallow: /uk/blog/
Disallow: /vi/blog/
Crawl-delay: 10

Sitemap: https://convertintomp4.com/s-m4p/index.xml