Robots.txt Generator
Create customized robots.txt files to control how search engines crawl your website. Use templates or build your own rules.
Quick Templates
Start with a common template and customize as needed
Configure Rules
Customize crawling rules for your website
What is robots.txt?
A robots.txt file tells search engine crawlers which pages or files they can or can't request from your site.
- Controls crawler access
- Prevents overload on server
- Specifies sitemap location
- Part of Robots Exclusion Protocol
Complete Guide to Robots.txt Files
The robots.txt file is a simple but powerful text file that sits in your website's root directory and tells search engine crawlers which parts of your site they can and cannot access. It's one of the first files crawlers check when visiting your website.
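You can see this crawler-side logic in action with Python's standard-library robots.txt parser. This is a quick sketch; the sample rules and URLs are illustrative only:

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt, supplied inline instead of fetched from a site
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler asks before fetching each URL
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # True
```

In a real crawler you would call `parser.set_url("https://example.com/robots.txt")` and `parser.read()` to fetch the live file instead of parsing a string.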
Why Robots.txt Matters
🎯 Control Crawling
Prevent search engines from crawling sensitive pages, duplicate content, or resource-heavy pages.
⚡ Save Crawl Budget
Direct crawlers to your most important pages by blocking low-value URLs.
🔒 Protect Resources
Prevent server overload by controlling how frequently bots can crawl your site.
📍 Sitemap Discovery
Provide a direct link to your XML sitemap for faster page discovery.
⚠️ Important Warning:
Robots.txt does NOT provide security. Blocked pages can still be indexed if linked from other sites, and malicious bots may ignore your rules. Use proper authentication for sensitive content.
Robots.txt Syntax Explained
User-agent: *
Specifies which crawler the rules apply to. * means all crawlers.
Disallow: /private/
Blocks crawlers from accessing the specified path. Use Disallow: / to block everything.
Allow: /public/
Explicitly allows crawling of a path. Useful for overriding broader Disallow rules.
Sitemap: https://example.com/sitemap.xml
Tells crawlers where to find your XML sitemap. A file can include multiple Sitemap directives.
Crawl-delay: 10
Sets the minimum delay (in seconds) between requests. Note: not supported by Google, but respected by Bing and others.
Common Robots.txt Examples
Allow All Crawling (Default)
User-agent: *
Disallow:
Block All Crawling
User-agent: *
Disallow: /
Block Specific Folders
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
WordPress Site
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Sitemap: https://example.com/sitemap.xml
Best Practices for Robots.txt
Keep It Simple
Only block what you need to. Overly complex rules can cause issues and confusion.
Test Before Deploying
Validate your file before uploading. Google retired its standalone robots.txt Tester; use Search Console's robots.txt report or a third-party validator to check syntax and confirm which URLs are blocked.
Include Your Sitemap
Always add a Sitemap directive pointing to your XML sitemap for better crawling efficiency.
Don't Block CSS/JS Files
Google needs to access these files to render pages properly. Blocking them can hurt SEO.
Don't Use It for Security
Robots.txt is publicly accessible. Use proper authentication and meta robots tags for security.
Don't Block Pages You Want Indexed
If you block a page in robots.txt, search engines can't crawl it. The page may still be indexed from external links, but it will appear with no description and won't rank well. To keep a page out of search results entirely, allow crawling and use a noindex meta tag instead.
How to Deploy Your Robots.txt File
Save the File
Download or copy the generated robots.txt content and save it as a plain text file named "robots.txt".
Upload to Root Directory
Upload the file to your website's root directory (public_html, www, or htdocs) so it's accessible at https://yoursite.com/robots.txt
Test Accessibility
Visit https://yoursite.com/robots.txt in your browser to confirm it's publicly accessible.
Validate in Search Console
Check Google Search Console's robots.txt report to confirm the file was fetched successfully and has no flagged errors.
Monitor & Update
Check Search Console regularly for crawl errors and update your robots.txt as your site structure changes.
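The root-directory rule from the steps above can be expressed in a few lines: whatever page you start from, the robots.txt location is derived only from the scheme and host. A small sketch (`robots_url` is an illustrative helper name, not a library function):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the canonical robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Path, query, and fragment are discarded: robots.txt always
    # lives at the root of the scheme + host combination.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?id=1"))
# https://yoursite.com/robots.txt
print(robots_url("https://blog.example.com/any/page"))
# https://blog.example.com/robots.txt
```

Note that each subdomain resolves to its own file, which is why blog.example.com and example.com need separate robots.txt files.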
Robots.txt Examples by Website Type
Real-world robots.txt configurations for different types of websites
🛒 E-commerce Store
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /admin/
Disallow: /search?
Allow: /products/
Sitemap: https://shop.com/sitemap.xml
Why: Blocks transactional pages that create duplicate content, allows product pages, and prevents crawling of internal search results with parameters.
✍️ Blog/Content Site
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /*?replytocom
Sitemap: https://blog.com/sitemap.xml
Why: Standard WordPress configuration that blocks admin areas, search results, and comment reply URLs while allowing necessary AJAX functionality.
💼 SaaS Application
User-agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /login/
Disallow: /signup/
Allow: /docs/
Allow: /api/docs/
Sitemap: https://saas.com/sitemap.xml
Why: Blocks the actual application interface and auth pages while allowing public-facing documentation to be crawled for SEO benefit.
📰 News/Magazine Site
User-agent: *
Disallow: /print/
Disallow: /amp/*/print
Disallow: /newsletter-signup/
Allow: /amp/

User-agent: Googlebot-News
Allow: /

Sitemap: https://news.com/sitemap.xml
Sitemap: https://news.com/news-sitemap.xml
Why: Allows AMP pages but blocks print versions, specifically allows Google News bot full access, and references both standard and news-specific sitemaps.
💬 Forum/Community Site
User-agent: *
Disallow: /members/
Disallow: /messages/
Disallow: /search/
Disallow: /ucp.php
Disallow: /*?sort=
Disallow: /*page=*
Crawl-delay: 10
Sitemap: https://forum.com/sitemap.xml
Why: Blocks user profiles and private messaging, prevents crawling of search and sorted/paginated views, adds crawl delay to prevent server overload from deep crawling.
🌍 Multi-language Website
User-agent: *
Allow: /en/
Allow: /es/
Allow: /fr/
Allow: /de/
Disallow: /admin/
Disallow: /*?lang=

Sitemap: https://global.com/sitemap-en.xml
Sitemap: https://global.com/sitemap-es.xml
Sitemap: https://global.com/sitemap-fr.xml
Sitemap: https://global.com/sitemap-de.xml
Why: Explicitly allows language-specific directories, blocks language parameters to prevent duplicates, and provides separate sitemaps for each language.
Common Robots.txt Mistakes to Avoid
Learn from these critical errors that can damage your SEO
❌ Accidentally Blocking Your Entire Site
One of the most catastrophic mistakes: using Disallow: / when you didn't intend to block everything. This prevents all search engines from crawling your site.
Always test your robots.txt in Google Search Console before deploying to production!
❌ Blocking CSS and JavaScript Files
Blocking /css/ or /js/ directories prevents Google from rendering your pages correctly, which can severely hurt your rankings.
❌ Using Robots.txt Instead of Noindex
If you block a page with robots.txt, Google can't crawl it to see the noindex tag. This means the page might still appear in search results with no description. For pages you don't want indexed, use meta robots noindex instead.
Instead of Disallow: /old-page/, put this on the page itself:
<meta name="robots" content="noindex, follow">
❌ Incorrect File Placement
Robots.txt MUST be in your root directory. It won't work in subdirectories or with different names.
The file name is case-sensitive on some servers - always use lowercase "robots.txt"
❌ Using Wildcards Incorrectly
Not all bots support wildcards (*) the same way. Test your patterns carefully.
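Google-style wildcards translate cleanly into regular expressions: * matches any run of characters and a trailing $ anchors the end of the URL. The sketch below (`pattern_to_regex` is an illustrative helper, not how any particular crawler is implemented) lets you test your patterns locally before deploying:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a Google-style robots.txt path pattern to a regex.
    '*' matches any sequence of characters; a trailing '$' anchors
    the match at the end of the URL path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

rule = pattern_to_regex("/*?sort=")
print(bool(rule.match("/products?sort=price")))  # True
print(bool(rule.match("/products")))             # False

pdf_rule = pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))      # True
print(bool(pdf_rule.match("/files/report.pdf?v=2")))  # False
```

Crawlers that don't support wildcards treat * literally, so test the fallback behavior too if those bots matter to you.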
❌ Not Including Sitemap Reference
Forgetting to add your sitemap URL means bots might not discover all your pages efficiently. Always include a Sitemap directive.
❌ Blocking Pages with Sensitive Information
Robots.txt is PUBLIC and can draw attention to pages you want to hide. Malicious actors often check robots.txt to find interesting targets.
Advanced Robots.txt Strategies
Professional techniques for complex websites
1. Bot-Specific Rules for Different Crawlers
Create different rules for different bots to optimize crawling based on bot capabilities and your needs.
# Default rules for all bots
User-agent: *
Disallow: /admin/
Crawl-delay: 5

# Google can crawl more aggressively
User-agent: Googlebot
Disallow: /admin/
Crawl-delay: 1

# Allow Google News unrestricted access
User-agent: Googlebot-News
Allow: /

# Slow down aggressive crawlers
User-agent: Bingbot
Crawl-delay: 10
2. Block Bad Bots While Allowing Good Ones
Block known scraper bots and content thieves while keeping legitimate search engines.
# Block known bad bots
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

# Allow legitimate search engines
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
Allow: /
Note: Bad bots often ignore robots.txt, so also use server-side blocking and rate limiting.
3. Handling URL Parameters
Prevent crawling of infinite URL combinations created by parameters, filters, and sorting.
User-agent: *
# Block tracking parameters
Disallow: /*?utm_source=
Disallow: /*?utm_medium=
Disallow: /*?fbclid=
# Block sorting and filtering
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
# Block internal search results
Disallow: /*?s=
Disallow: /*?search=
Disallow: /search?
4. Using Allow to Override Disallow
Block a broad directory but allow specific subdirectories or files within it.
User-agent: *
# Block entire directory
Disallow: /wp-admin/
# But allow specific file needed for functionality
Allow: /wp-admin/admin-ajax.php
# Block all user directories
Disallow: /users/
# But allow specific public profile pages
Allow: /users/*/profile/
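Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule, with Allow winning ties. The sketch below models that precedence for literal path prefixes only, with no wildcard handling (`is_allowed` is an illustrative helper, not a library function):

```python
def is_allowed(path: str, rules: list) -> bool:
    """Longest-match precedence: the rule with the longest matching
    path prefix wins; on a length tie, Allow beats Disallow.
    No matching rule means the path is allowed by default."""
    best_len, allowed = -1, True
    for directive, rule_path in rules:
        if path.startswith(rule_path):
            length = len(rule_path)
            if length > best_len or (length == best_len and directive == "Allow"):
                best_len, allowed = length, directive == "Allow"
    return allowed

rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed("/wp-admin/admin-ajax.php", rules))  # True: Allow rule is longer
print(is_allowed("/wp-admin/options.php", rules))     # False: only Disallow matches
print(is_allowed("/blog/", rules))                    # True: no rule matches
```

This is why the admin-ajax.php exception works even though it sits inside a blocked directory: its Allow path is more specific than the Disallow.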
5. Multiple Sitemaps for Large Sites
Reference multiple sitemaps to help search engines discover all your content efficiently.
User-agent: *
Disallow: /admin/

# Multiple sitemaps for different content types
Sitemap: https://example.com/sitemap-index.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-blog.xml
Sitemap: https://example.com/sitemap-images.xml
Sitemap: https://example.com/sitemap-videos.xml
6. Staging and Development Environment Protection
Completely block search engines from indexing your staging, development, or testing sites.
# Block everything on staging
User-agent: *
Disallow: /

# Also add this meta tag to all staging pages:
# <meta name="robots" content="noindex, nofollow">
Pro tip: Also password-protect staging sites and use different subdomains. Remember that with crawling fully blocked, the noindex meta tag will never be seen, so HTTP authentication is the only reliable safeguard.
Testing and Monitoring Your Robots.txt
Ensure your robots.txt is working correctly
Use Google Search Console's robots.txt Report
Before deploying, validate your robots.txt file:
- Go to Google Search Console
- Open the robots.txt report (it replaced the retired robots.txt Tester)
- Check the fetch status and any flagged syntax errors
- Confirm the version Google fetched matches what you deployed
- Use a third-party robots.txt tester to check whether specific URLs are blocked
Verify File Accessibility
After deployment, immediately check:
- https://yoursite.com/robots.txt loads in a browser (HTTP 200, not a redirect or error page)
- The file is served as plain text and contains exactly what you uploaded
Monitor Crawl Stats in Search Console
Track how your robots.txt affects crawling:
- Check "Crawl Stats" report weekly
- Monitor for sudden drops in crawl rate
- Look for robots.txt fetch errors
- Verify important pages are still being crawled
Set Up Monitoring Alerts
Prevent catastrophic mistakes:
- Monitor https://yoursite.com/robots.txt with an uptime or change-detection service
- Alert on any change that introduces a blanket Disallow: /
- Re-check the file after every deployment or CMS/plugin update
Regular Audits
Review your robots.txt quarterly or when making major site changes:
- Are all blocked paths still relevant?
- Have you added new sections that should be blocked?
- Is your sitemap URL still correct?
- Are crawl delays still appropriate?
- Do rules align with current SEO strategy?
Frequently Asked Questions
Does robots.txt affect my search rankings?
Not directly, but it affects what pages get crawled and indexed. Blocking important pages will hurt rankings. However, blocking low-value pages (like admin areas, duplicate content, or parameter URLs) can actually help by letting crawlers focus on your important content - this is called "crawl budget optimization."
What happens if I don't have a robots.txt file?
If robots.txt is missing (returns 404), crawlers will assume everything is allowed. This is fine for small sites with all public content, but larger sites benefit from explicitly controlling crawler access to optimize crawl budget and avoid indexing low-value pages.
Can I use robots.txt to completely hide pages from Google?
No. Robots.txt prevents crawling, but pages can still appear in search results if linked from other sites (though without descriptions). To truly deindex pages, use noindex meta tags. To hide content entirely, use password protection or server-side authentication.
Do all search engines respect robots.txt?
Legitimate search engines (Google, Bing, Yahoo, etc.) respect robots.txt. However, it's not enforced - malicious bots, scrapers, and hackers often ignore it. Consider robots.txt as polite requests to good bots, not a security mechanism. For actual protection, use authentication, IP blocking, and rate limiting.
Should I block my images folder?
Generally no. Blocking images prevents them from appearing in Google Images search, which is a significant traffic source for many sites. Only block images if you have a specific reason (copyright protection, bandwidth concerns, or they're not relevant for discovery).
How long does it take for robots.txt changes to take effect?
Crawlers typically cache robots.txt for up to 24 hours (Google's documented maximum), though busy sites may be refetched sooner. If you make critical changes, you can request a recrawl of the file from Google Search Console's robots.txt report, but it may still take hours to days for the full effect as crawlers gradually re-crawl your site.
What's the difference between Disallow and Noindex?
Disallow (in robots.txt) prevents crawling but pages may still be indexed from external links. Noindex (meta tag on page) requires crawling to be seen and tells search engines not to index. For pages you don't want in search results: allow crawling but add noindex meta tag. Never use both together as noindex won't be seen if crawling is blocked.
Can I have multiple robots.txt files for different sections?
No. Only one robots.txt file per domain/subdomain, and it must be in the root directory. Each subdomain can have its own robots.txt (blog.example.com/robots.txt is separate from example.com/robots.txt), but you cannot have example.com/section/robots.txt.
Should I use crawl-delay?
Only if your server is struggling with crawler load. Google ignores crawl-delay (use Search Console to adjust crawl rate instead). Bing and some other crawlers respect it. A delay of 5-10 seconds is reasonable. Too high (30+) might cause slow indexing. Most modern sites don't need crawl-delay at all.
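Python's standard-library parser exposes the Crawl-delay value, which a polite crawler can feed into its request loop. A minimal sketch (the inline rules and 10-second value are illustrative):

```python
import time
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the directive's value, or None if absent
delay = parser.crawl_delay("*")
print(delay)  # 10

# A polite crawler sleeps between requests (0 if no delay declared):
# time.sleep(delay or 0)
```

Since Google ignores this directive, treat it as a hint for Bing and smaller crawlers, and rely on server-side rate limiting for bots that ignore it.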
What if my robots.txt accidentally blocks everything?
Fix it immediately! Upload a corrected version and request a recrawl in Google Search Console. It may take days to weeks for Google to fully re-crawl your site, so monitor the "Index Coverage" and "Crawl Stats" reports. This is why validating your rules before deployment (via Search Console's robots.txt report or a third-party tester) is absolutely critical: one typo can deindex your entire site.