Robots.txt Generator
Create customized robots.txt files to control how search engines crawl your website. Use templates or build your own rules.
Quick Templates
Start with a common template and customize as needed
Configure Rules
Customize crawling rules for your website
What is robots.txt?
A robots.txt file tells search engine crawlers which pages or files they can or can't request from your site.
- Controls crawler access
- Prevents overload on server
- Specifies sitemap location
- Part of Robots Exclusion Protocol
Complete Guide to Robots.txt Files
The robots.txt file is a simple but powerful text file that sits in your website's root directory and tells search engine crawlers which parts of your site they can and cannot access. It's one of the first files crawlers check when visiting your website.
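You can see this crawler-side logic in action with Python's standard-library robots.txt parser. This is a quick sketch; the sample rules and URLs are illustrative only:

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt, supplied inline instead of fetched from a site
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler asks before fetching each URL
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # True
```

In a real crawler you would call `parser.set_url("https://example.com/robots.txt")` and `parser.read()` to fetch the live file instead of parsing a string.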
Why Robots.txt Matters
🎯 Control Crawling
Prevent search engines from crawling sensitive pages, duplicate content, or resource-heavy pages.
⚡ Save Crawl Budget
Direct crawlers to your most important pages by blocking low-value URLs.
🔒 Protect Resources
Prevent server overload by controlling how frequently bots can crawl your site.
📍 Sitemap Discovery
Provide a direct link to your XML sitemap for faster page discovery.
⚠️ Important Warning:
Robots.txt does NOT provide security. Blocked pages can still be indexed if linked from other sites, and malicious bots may ignore your rules. Use proper authentication for sensitive content.
Robots.txt Syntax Explained
User-agent: *
Specifies which crawler the rules apply to. * means all crawlers.
Disallow: /private/
Blocks crawlers from accessing the specified path. Use Disallow: / to block everything.
Allow: /public/
Explicitly allows crawling of a path. Useful for overriding broader Disallow rules.
Sitemap: https://example.com/sitemap.xml
Tells crawlers where to find your XML sitemap. A file can include multiple Sitemap directives.
Crawl-delay: 10
Sets the minimum delay (in seconds) between requests. Note: not supported by Google, but respected by Bing and others.
Common Robots.txt Examples
Allow All Crawling (Default)
User-agent: *
Disallow:
Block All Crawling
User-agent: *
Disallow: /
Block Specific Folders
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
WordPress Site
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Sitemap: https://example.com/sitemap.xml
Best Practices for Robots.txt
Keep It Simple
Only block what you need to. Overly complex rules can cause issues and confusion.
Test Before Deploying
Validate your file before uploading. Google retired its standalone robots.txt Tester; use Search Console's robots.txt report or a third-party validator to check syntax and confirm which URLs are blocked.
Include Your Sitemap
Always add a Sitemap directive pointing to your XML sitemap for better crawling efficiency.
Don't Block CSS/JS Files
Google needs to access these files to render pages properly. Blocking them can hurt SEO.
Don't Use It for Security
Robots.txt is publicly accessible. Use proper authentication and meta robots tags for security.
Don't Block Pages You Want Indexed
If you block a page in robots.txt, search engines can't crawl it. The page may still be indexed from external links, but it will appear with no description and won't rank well. To keep a page out of search results entirely, allow crawling and use a noindex meta tag instead.
How to Deploy Your Robots.txt File
Save the File
Download or copy the generated robots.txt content and save it as a plain text file named "robots.txt".
Upload to Root Directory
Upload the file to your website's root directory (public_html, www, or htdocs) so it's accessible at https://yoursite.com/robots.txt
Test Accessibility
Visit https://yoursite.com/robots.txt in your browser to confirm it's publicly accessible.
Validate in Search Console
Check Google Search Console's robots.txt report to confirm the file was fetched successfully and has no flagged errors.
Monitor & Update
Check Search Console regularly for crawl errors and update your robots.txt as your site structure changes.
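The root-directory rule from the steps above can be expressed in a few lines: whatever page you start from, the robots.txt location is derived only from the scheme and host. A small sketch (`robots_url` is an illustrative helper name, not a library function):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the canonical robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Path, query, and fragment are discarded: robots.txt always
    # lives at the root of the scheme + host combination.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?id=1"))
# https://yoursite.com/robots.txt
print(robots_url("https://blog.example.com/any/page"))
# https://blog.example.com/robots.txt
```

Note that each subdomain resolves to its own file, which is why blog.example.com and example.com need separate robots.txt files.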
Robots.txt Examples by Website Type
Real-world robots.txt configurations for different types of websites
🛒 E-commerce Store
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /admin/
Disallow: /search?
Allow: /products/
Sitemap: https://shop.com/sitemap.xml
Why: Blocks transactional pages that create duplicate content, allows product pages, and prevents crawling of internal search results with parameters.
✍️ Blog/Content Site
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /*?replytocom
Sitemap: https://blog.com/sitemap.xml
Why: Standard WordPress configuration that blocks admin areas, search results, and comment reply URLs while allowing necessary AJAX functionality.
💼 SaaS Application
User-agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /login/
Disallow: /signup/
Allow: /docs/
Allow: /api/docs/
Sitemap: https://saas.com/sitemap.xml
Why: Blocks the actual application interface and auth pages while allowing public-facing documentation to be crawled for SEO benefit.
📰 News/Magazine Site
User-agent: *
Disallow: /print/
Disallow: /amp/*/print
Disallow: /newsletter-signup/
Allow: /amp/

User-agent: Googlebot-News
Allow: /

Sitemap: https://news.com/sitemap.xml
Sitemap: https://news.com/news-sitemap.xml
Why: Allows AMP pages but blocks print versions, specifically allows Google News bot full access, and references both standard and news-specific sitemaps.
💬 Forum/Community Site
User-agent: *
Disallow: /members/
Disallow: /messages/
Disallow: /search/
Disallow: /ucp.php
Disallow: /*?sort=
Disallow: /*page=*
Crawl-delay: 10
Sitemap: https://forum.com/sitemap.xml
Why: Blocks user profiles and private messaging, prevents crawling of search and sorted/paginated views, adds crawl delay to prevent server overload from deep crawling.
🌍 Multi-language Website
User-agent: *
Allow: /en/
Allow: /es/
Allow: /fr/
Allow: /de/
Disallow: /admin/
Disallow: /*?lang=

Sitemap: https://global.com/sitemap-en.xml
Sitemap: https://global.com/sitemap-es.xml
Sitemap: https://global.com/sitemap-fr.xml
Sitemap: https://global.com/sitemap-de.xml
Why: Explicitly allows language-specific directories, blocks language parameters to prevent duplicates, and provides separate sitemaps for each language.
Common Robots.txt Mistakes to Avoid
Learn from these critical errors that can damage your SEO
❌ Accidentally Blocking Your Entire Site
One of the most catastrophic mistakes: using Disallow: / when you didn't intend to block everything. This prevents all search engines from crawling your site.
Always test your robots.txt in Google Search Console before deploying to production!
❌ Blocking CSS and JavaScript Files
Blocking /css/ or /js/ directories prevents Google from rendering your pages correctly, which can severely hurt your rankings.
❌ Using Robots.txt Instead of Noindex
If you block a page with robots.txt, Google can't crawl it to see the noindex tag. This means the page might still appear in search results with no description. For pages you don't want indexed, use meta robots noindex instead.
Instead of Disallow: /old-page/, put this on the page itself:
<meta name="robots" content="noindex, follow">
❌ Incorrect File Placement
Robots.txt MUST be in your root directory. It won't work in subdirectories or with different names.
The file name is case-sensitive on some servers - always use lowercase "robots.txt"
❌ Using Wildcards Incorrectly
Not all bots support wildcards (*) the same way. Test your patterns carefully.
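Google-style wildcards translate cleanly into regular expressions: * matches any run of characters and a trailing $ anchors the end of the URL. The sketch below (`pattern_to_regex` is an illustrative helper, not how any particular crawler is implemented) lets you test your patterns locally before deploying:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a Google-style robots.txt path pattern to a regex.
    '*' matches any sequence of characters; a trailing '$' anchors
    the match at the end of the URL path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

rule = pattern_to_regex("/*?sort=")
print(bool(rule.match("/products?sort=price")))  # True
print(bool(rule.match("/products")))             # False

pdf_rule = pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))      # True
print(bool(pdf_rule.match("/files/report.pdf?v=2")))  # False
```

Crawlers that don't support wildcards treat * literally, so test the fallback behavior too if those bots matter to you.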
❌ Not Including Sitemap Reference
Forgetting to add your sitemap URL means bots might not discover all your pages efficiently. Always include a Sitemap directive.
❌ Blocking Pages with Sensitive Information
Robots.txt is PUBLIC and can draw attention to pages you want to hide. Malicious actors often check robots.txt to find interesting targets.
Advanced Robots.txt Strategies
Professional techniques for complex websites
1. Bot-Specific Rules for Different Crawlers
Create different rules for different bots to optimize crawling based on bot capabilities and your needs.
# Default rules for all bots
User-agent: *
Disallow: /admin/
Crawl-delay: 5

# Google can crawl more aggressively
User-agent: Googlebot
Disallow: /admin/
Crawl-delay: 1

# Allow Google News unrestricted access
User-agent: Googlebot-News
Allow: /

# Slow down aggressive crawlers
User-agent: Bingbot
Crawl-delay: 10
2. Block Bad Bots While Allowing Good Ones
Block known scraper bots and content thieves while keeping legitimate search engines.
# Block known bad bots
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

# Allow legitimate search engines
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
Allow: /
Note: Bad bots often ignore robots.txt, so also use server-side blocking and rate limiting.
3. Handling URL Parameters
Prevent crawling of infinite URL combinations created by parameters, filters, and sorting.
User-agent: *
# Block tracking parameters
Disallow: /*?utm_source=
Disallow: /*?utm_medium=
Disallow: /*?fbclid=
# Block sorting and filtering
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
# Block internal search results
Disallow: /*?s=
Disallow: /*?search=
Disallow: /search?
4. Using Allow to Override Disallow
Block a broad directory but allow specific subdirectories or files within it.
User-agent: *
# Block entire directory
Disallow: /wp-admin/
# But allow specific file needed for functionality
Allow: /wp-admin/admin-ajax.php
# Block all user directories
Disallow: /users/
# But allow specific public profile pages
Allow: /users/*/profile/
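Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule, with Allow winning ties. The sketch below models that precedence for literal path prefixes only, with no wildcard handling (`is_allowed` is an illustrative helper, not a library function):

```python
def is_allowed(path: str, rules: list) -> bool:
    """Longest-match precedence: the rule with the longest matching
    path prefix wins; on a length tie, Allow beats Disallow.
    No matching rule means the path is allowed by default."""
    best_len, allowed = -1, True
    for directive, rule_path in rules:
        if path.startswith(rule_path):
            length = len(rule_path)
            if length > best_len or (length == best_len and directive == "Allow"):
                best_len, allowed = length, directive == "Allow"
    return allowed

rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed("/wp-admin/admin-ajax.php", rules))  # True: Allow rule is longer
print(is_allowed("/wp-admin/options.php", rules))     # False: only Disallow matches
print(is_allowed("/blog/", rules))                    # True: no rule matches
```

This is why the admin-ajax.php exception works even though it sits inside a blocked directory: its Allow path is more specific than the Disallow.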
5. Multiple Sitemaps for Large Sites
Reference multiple sitemaps to help search engines discover all your content efficiently.
User-agent: *
Disallow: /admin/

# Multiple sitemaps for different content types
Sitemap: https://example.com/sitemap-index.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-blog.xml
Sitemap: https://example.com/sitemap-images.xml
Sitemap: https://example.com/sitemap-videos.xml
6. Staging and Development Environment Protection
Completely block search engines from indexing your staging, development, or testing sites.
# Block everything on staging
User-agent: *
Disallow: /

# Also add this meta tag to all staging pages:
# <meta name="robots" content="noindex, nofollow">
Pro tip: Also password-protect staging sites and use different subdomains. Remember that with crawling fully blocked, the noindex meta tag will never be seen, so HTTP authentication is the only reliable safeguard.
Testing and Monitoring Your Robots.txt
Ensure your robots.txt is working correctly
Use Google Search Console's robots.txt Report
Before deploying, validate your robots.txt file:
- Go to Google Search Console
- Open the robots.txt report (it replaced the retired robots.txt Tester)
- Check the fetch status and any flagged syntax errors
- Confirm the version Google fetched matches what you deployed
- Use a third-party robots.txt tester to check whether specific URLs are blocked
Verify File Accessibility
After deployment, immediately check:
- https://yoursite.com/robots.txt loads in a browser (HTTP 200, not a redirect or error page)
- The file is served as plain text and contains exactly what you uploaded
Monitor Crawl Stats in Search Console
Track how your robots.txt affects crawling:
- Check "Crawl Stats" report weekly
- Monitor for sudden drops in crawl rate
- Look for robots.txt fetch errors
- Verify important pages are still being crawled
Set Up Monitoring Alerts
Prevent catastrophic mistakes:
- Monitor https://yoursite.com/robots.txt with an uptime or change-detection service
- Alert on any change that introduces a blanket Disallow: /
- Re-check the file after every deployment or CMS/plugin update
Regular Audits
Review your robots.txt quarterly or when making major site changes:
- Are all blocked paths still relevant?
- Have you added new sections that should be blocked?
- Is your sitemap URL still correct?
- Are crawl delays still appropriate?
- Do rules align with current SEO strategy?
Frequently Asked Questions
Does robots.txt affect my search rankings?
Not directly, but it affects what pages get crawled and indexed. Blocking important pages will hurt rankings. However, blocking low-value pages (like admin areas, duplicate content, or parameter URLs) can actually help by letting crawlers focus on your important content - this is called "crawl budget optimization."
What happens if I don't have a robots.txt file?
If robots.txt is missing (returns 404), crawlers will assume everything is allowed. This is fine for small sites with all public content, but larger sites benefit from explicitly controlling crawler access to optimize crawl budget and avoid indexing low-value pages.
Can I use robots.txt to completely hide pages from Google?
No. Robots.txt prevents crawling, but pages can still appear in search results if linked from other sites (though without descriptions). To truly deindex pages, use noindex meta tags. To hide content entirely, use password protection or server-side authentication.
Do all search engines respect robots.txt?
Legitimate search engines (Google, Bing, Yahoo, etc.) respect robots.txt. However, it's not enforced - malicious bots, scrapers, and hackers often ignore it. Consider robots.txt as polite requests to good bots, not a security mechanism. For actual protection, use authentication, IP blocking, and rate limiting.
Should I block my images folder?
Generally no. Blocking images prevents them from appearing in Google Images search, which is a significant traffic source for many sites. Only block images if you have a specific reason (copyright protection, bandwidth concerns, or they're not relevant for discovery).
How long does it take for robots.txt changes to take effect?
Crawlers typically cache robots.txt for up to 24 hours (Google's documented maximum), though busy sites may be refetched sooner. If you make critical changes, you can request a recrawl of the file from Google Search Console's robots.txt report, but it may still take hours to days for the full effect as crawlers gradually re-crawl your site.
What's the difference between Disallow and Noindex?
Disallow (in robots.txt) prevents crawling but pages may still be indexed from external links. Noindex (meta tag on page) requires crawling to be seen and tells search engines not to index. For pages you don't want in search results: allow crawling but add noindex meta tag. Never use both together as noindex won't be seen if crawling is blocked.
Can I have multiple robots.txt files for different sections?
No. Only one robots.txt file per domain/subdomain, and it must be in the root directory. Each subdomain can have its own robots.txt (blog.example.com/robots.txt is separate from example.com/robots.txt), but you cannot have example.com/section/robots.txt.
Should I use crawl-delay?
Only if your server is struggling with crawler load. Google ignores crawl-delay (use Search Console to adjust crawl rate instead). Bing and some other crawlers respect it. A delay of 5-10 seconds is reasonable. Too high (30+) might cause slow indexing. Most modern sites don't need crawl-delay at all.
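Python's standard-library parser exposes the Crawl-delay value, which a polite crawler can feed into its request loop. A minimal sketch (the inline rules and 10-second value are illustrative):

```python
import time
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the directive's value, or None if absent
delay = parser.crawl_delay("*")
print(delay)  # 10

# A polite crawler sleeps between requests (0 if no delay declared):
# time.sleep(delay or 0)
```

Since Google ignores this directive, treat it as a hint for Bing and smaller crawlers, and rely on server-side rate limiting for bots that ignore it.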
What if my robots.txt accidentally blocks everything?
Fix it immediately! Upload a corrected version and request a recrawl in Google Search Console. It may take days to weeks for Google to fully re-crawl your site, so monitor the "Index Coverage" and "Crawl Stats" reports. This is why validating your rules before deployment (via Search Console's robots.txt report or a third-party tester) is absolutely critical: one typo can deindex your entire site.