Robots.txt Checker

Fetch, validate, and analyze robots.txt files from any domain. Our free robots.txt checker helps you identify crawling issues, syntax errors, and SEO problems that could affect your search rankings.

Check Robots.txt

Enter a domain to fetch and analyze its robots.txt file

Enter a domain name - we'll automatically fetch /robots.txt

What This Tool Checks

Our robots.txt checker fetches and analyzes your robots.txt file for:

User-agent Rules: Which bots can/cannot crawl
Allow/Disallow: Paths permitted or blocked
Sitemap References: Linked XML sitemaps
Crawl-delay: Requested delay between requests
Issues: Syntax errors, blocking problems

Common Robots.txt Directives

User-agent
Specifies which crawler the rules apply to
Disallow
Blocks specified paths from crawling
Allow
Permits crawling of specified paths (overrides Disallow)
Sitemap
Points to your XML sitemap location
Crawl-delay
Requests delay between crawler requests

How to Use This Robots.txt Checker Tool

Using our robots.txt checker is simple and straightforward. Follow these steps to analyze any website's robots.txt file:

1. Enter a Domain

Type or paste the domain you want to check. You can enter it with or without "https://" - we'll automatically fetch the /robots.txt file from the root domain.

2. Click "Check Robots.txt"

Our robots.txt validator fetches the file, parses all directives, and analyzes for potential SEO issues, syntax errors, and configuration problems.

3. Review the Analysis

See all User-agent rules, Allow/Disallow directives, Sitemap references, and any issues found. View the raw robots.txt content and export results as CSV.

For bulk checking, switch to the "Bulk Check" tab and enter multiple domains (one per line) to analyze up to 10 robots.txt files at once. Perfect for auditing multiple websites or comparing competitor configurations.

Why Robots.txt Matters for SEO

The robots.txt file is one of the most critical files for SEO and crawl management. It tells search engine crawlers which parts of your website they can and cannot access. A misconfigured robots.txt can lead to pages being excluded from search results or wasted crawl budget.

Crawl

Controls which pages search engines can crawl

Sitemap

Points crawlers to your XML sitemap

Budget

Helps manage crawl budget efficiently

Key SEO Impacts of Robots.txt:

  • Search Visibility: Accidentally blocking Googlebot can remove your entire site from Google search results. Our robots.txt tester helps you catch these critical errors.
  • Crawl Budget Optimization: For large sites, blocking unimportant pages (like admin areas, duplicate content, or parameters) preserves crawl budget for your most important pages.
  • Sitemap Discovery: Including Sitemap directives helps search engines find and crawl all your important pages faster, especially for new or updated content.
  • Security by Obscurity: While not true security, blocking admin panels and private areas keeps them out of search engine crawls. Just remember that robots.txt itself is public, so never list paths you need to keep secret.

Use our robots.txt validator regularly to ensure your crawling configuration supports rather than hinders your SEO efforts.

Robots.txt Syntax Explained

Understanding robots.txt syntax is essential for proper implementation. Here's a complete breakdown of each directive:

User-agent

Specify Target Crawler

The User-agent directive specifies which crawler the following rules apply to. Use * for all crawlers.

# Apply to all crawlers
User-agent: *
# Apply to Googlebot only
User-agent: Googlebot
# Apply to Bingbot only
User-agent: Bingbot

Disallow

Block Paths from Crawling

The Disallow directive prevents crawlers from accessing specified paths. An empty value allows all crawling.

# Block entire site
Disallow: /
# Block specific directory
Disallow: /admin/
# Block files with extension
Disallow: /*.pdf$
# Allow everything (empty value)
Disallow:

Allow

Override Disallow Rules

The Allow directive explicitly permits crawling of specific paths, even if a parent directory is disallowed. Googlebot supports this directive.

# Block /private/ but allow /private/public/
Disallow: /private/
Allow: /private/public/
# Allow specific file in blocked directory
Disallow: /downloads/
Allow: /downloads/catalog.pdf

Sitemap

Point to XML Sitemap

The Sitemap directive tells crawlers where to find your XML sitemap. Use absolute URLs. You can include multiple sitemaps.

# Single sitemap
Sitemap: https://example.com/sitemap.xml
# Multiple sitemaps
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-images.xml

Crawl-delay

Request Crawl Speed Limit

The Crawl-delay directive requests crawlers wait a specified number of seconds between requests. Note: Googlebot ignores this directive.

# Wait 10 seconds between requests
Crawl-delay: 10
# Note: Googlebot ignores Crawl-delay
# Use Google Search Console instead

Bingbot and Yandex respect Crawl-delay, but Google does not. Configure Google's crawl rate in Search Console.
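
If you do want to slow down crawlers that honor the directive, you can scope Crawl-delay to specific user-agents. A minimal sketch (the delay values are illustrative, not recommendations):

# Ask Bingbot to wait 5 seconds between requests
User-agent: Bingbot
Crawl-delay: 5

# Ask Yandex to wait 10 seconds between requests
User-agent: Yandex
Crawl-delay: 10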

Best Practices for Robots.txt

Follow these best practices to ensure your robots.txt supports your SEO goals:

Always Include a Sitemap Reference

Add your XML sitemap URL to robots.txt so search engines can easily discover it. This is especially important for new sites or after major content updates.

Block Admin and Login Pages

Prevent crawlers from accessing /wp-admin/, /admin/, /login/, and similar administrative areas. These pages don't need to appear in search results, and crawling them wastes crawl budget.
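
For example, a typical block for administrative areas might look like this (the paths are placeholders; adjust them to your platform):

User-agent: *
# Administrative and login areas
Disallow: /wp-admin/
Disallow: /admin/
Disallow: /login/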

Don't Block CSS and JavaScript

Google needs to render your pages to understand their content. Blocking CSS and JS files can prevent proper rendering and hurt your rankings.
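
A common WordPress-style pattern, for example, blocks the admin area while keeping the AJAX endpoint and asset directories crawlable (paths are illustrative):

User-agent: *
# Block the admin area...
Disallow: /wp-admin/
# ...but keep the AJAX endpoint reachable for page rendering
Allow: /wp-admin/admin-ajax.php
# Do NOT add Disallow rules for /wp-content/ or /wp-includes/ - Google needs those assets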

Use Specific User-agent Rules When Needed

If you need different rules for different crawlers, create specific User-agent blocks. For example, you might allow Googlebot-Image to access images but block other image bots.
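
For instance, here is a sketch that lets Googlebot-Image crawl your images while shutting out another image crawler (the bot name "ExampleImageBot" and the /images/ path are hypothetical):

# Let Google's image crawler access images
User-agent: Googlebot-Image
Allow: /images/

# Block a different image bot entirely
User-agent: ExampleImageBot
Disallow: /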

Block Duplicate Content Paths

Prevent crawling of URL parameters, print pages, and other duplicate content. This helps focus crawl budget on your canonical pages and prevents duplicate content issues.
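
For example (the parameter names and paths are placeholders for your own duplicate-content patterns):

User-agent: *
# Block sorted/filtered duplicates of listing pages
Disallow: /*?sort=
Disallow: /*?filter=
# Block print-friendly duplicates
Disallow: /print/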

Test Before Deploying Changes

Always test robots.txt changes before deploying. Use this robots.txt checker and Google Search Console's robots.txt tester to verify rules work as expected.

Place Robots.txt in the Root Directory

Robots.txt must be at the root of your domain (e.g., example.com/robots.txt). It won't work in subdirectories like example.com/blog/robots.txt.

Use Noindex for True Blocking

Remember: robots.txt blocks crawling, not indexing. If a blocked page has inbound links, it may still appear in search results. To reliably keep a page out of the index, leave it crawlable and use a noindex meta tag or X-Robots-Tag header.

Keep It Simple and Well-Commented

Add comments (#) to explain why each rule exists. This helps future maintainers understand the configuration and prevents accidental changes that break SEO.
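
Putting it together, a small, well-commented robots.txt might look like this sketch (the domain and paths are placeholders):

# Default rules for all crawlers
User-agent: *
# Admin area adds no search value and wastes crawl budget
Disallow: /admin/
# Parameterized sort pages duplicate the main listings
Disallow: /*?sort=
# Help crawlers discover the sitemap
Sitemap: https://example.com/sitemap.xml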

Common Robots.txt Mistakes to Avoid

These are the most common robots.txt mistakes that can seriously harm your SEO:

Blocking Googlebot Entirely

Using Disallow: / for User-agent: * or User-agent: Googlebot blocks Google from crawling your entire site, and your pages will eventually drop out of Google's search results.

# WRONG - Blocks everything!
User-agent: *
Disallow: /

Forgetting the Sitemap Directive

Not including a Sitemap directive means search engines have to discover your sitemap through other means. Always add it for faster and more complete indexing.

Blocking CSS and JavaScript Files

Many older guides recommend blocking /wp-content/ or /assets/. This prevents Google from rendering your pages properly, harming rankings.

# WRONG - Breaks rendering!
Disallow: /wp-content/
Disallow: /assets/

Using Relative Sitemap URLs

Sitemap URLs must be absolute (full URLs). Relative paths won't work and will be ignored.

# WRONG
Sitemap: /sitemap.xml

# CORRECT
Sitemap: https://example.com/sitemap.xml

Syntax Errors and Typos

Common typos like "Useragent" (no hyphen), "Dissallow" (double s), or missing colons will cause rules to be ignored.

# WRONG - typos!
Useragent: *
Dissallow: /private/

# CORRECT
User-agent: *
Disallow: /private/

Leaving robots.txt from Staging

Staging environments often use "Disallow: /" to prevent indexing. If you copy this to production, your live site becomes invisible to search engines. Always check after launching!
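
As a quick illustration, this is the leftover to look for, and what a production file might use instead (the domain and paths are placeholders):

# Staging leftover - blocks the entire site
User-agent: *
Disallow: /

# Production - crawlable, with only private areas blocked
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml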

Thinking Robots.txt Provides Security

Robots.txt is publicly accessible and only a "request" to crawlers. Malicious bots ignore it. Never rely on robots.txt to hide sensitive content - use proper authentication instead.

Frequently Asked Questions About Robots.txt

What is a robots.txt file?

A robots.txt file is a text file placed in the root directory of a website that tells web crawlers (like Googlebot) which pages or sections they can or cannot access. It follows the Robots Exclusion Protocol standard and is one of the first files crawlers check when visiting your site.

Does every website need a robots.txt file?

No, it's not required. If you don't have a robots.txt file, search engines will crawl all accessible pages on your site. However, having one is strongly recommended because it lets you specify your sitemap location, block unnecessary pages from crawling, and optimize your crawl budget.

What's the difference between robots.txt and noindex?

Robots.txt blocks crawling - it prevents search engines from accessing the page. Noindex blocks indexing - it allows crawling but tells search engines not to show the page in results. Important: If a page is blocked by robots.txt, crawlers can't see the noindex tag, so the page might still appear in search results (showing just the URL without a snippet).

Where should robots.txt be located?

Robots.txt must be in the root directory of your domain. For example: https://example.com/robots.txt. It won't work in subdirectories. For subdomains, each needs its own robots.txt (e.g., https://blog.example.com/robots.txt).

How long does it take for robots.txt changes to take effect?

Google caches robots.txt for up to 24 hours. You can request a refresh in Google Search Console under the robots.txt tester tool. For critical changes, submit the updated robots.txt URL through Search Console for faster processing. Other search engines may have different caching periods.

Can I use wildcards in robots.txt?

Yes, Googlebot supports wildcards. Use * to match any sequence of characters and $ to mark the end of a URL. Examples: Disallow: /*.pdf$ blocks all PDFs, and Disallow: /*? blocks any URL containing a query string. Note: not all crawlers support wildcards.
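
A few illustrative patterns (the paths are placeholders):

# Block every PDF anywhere on the site
Disallow: /*.pdf$
# Block any URL containing a query string
Disallow: /*?
# Block exactly /search ($ anchors the end of the URL)
Disallow: /search$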

Why does Google ignore my Crawl-delay directive?

Googlebot does not respect the Crawl-delay directive. To control Google's crawl rate, use Google Search Console's Crawl Rate Settings instead. Other crawlers like Bingbot and Yandex do respect Crawl-delay. This is why our robots.txt validator specifically warns about Crawl-delay and Google.

How do I check if my robots.txt is working correctly?

Use this robots.txt checker tool to validate syntax and check for common issues. Additionally, use Google Search Console's robots.txt Tester to test specific URLs against your rules and see how Googlebot interprets them. Regularly monitor your Search Console coverage report for crawling errors.

What happens if robots.txt has errors or is unreachable?

If robots.txt returns a 4xx error, crawlers assume no restrictions and crawl everything. If it returns a 5xx error, Googlebot may temporarily stop crawling to avoid potentially accessing blocked content. A missing or empty robots.txt is treated as "allow all."

Can robots.txt stop pages from appearing in Google?

Not reliably. Robots.txt blocks crawling, but if other sites link to a blocked page, Google may still show it in search results (without a snippet). To truly prevent indexing, allow crawling and use a noindex meta tag or X-Robots-Tag header instead. Use robots.txt for crawl budget management, not for hiding content.

How do I allow Googlebot but block other crawlers?

Create specific User-agent blocks for each crawler. Rules are applied based on the most specific match. Example: User-agent: Googlebot followed by Disallow: (empty, allows all), then User-agent: * followed by Disallow: / blocks all other bots.
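
A minimal sketch of that configuration:

# Googlebot: empty Disallow means nothing is blocked
User-agent: Googlebot
Disallow:

# Every other crawler: block the whole site
User-agent: *
Disallow: /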

What is the maximum size for a robots.txt file?

Google limits robots.txt to 500 KiB (kibibytes); content beyond that limit is ignored. If you need extensive rules, consider consolidating paths, using wildcards, or restructuring your site's URL architecture. Most well-structured sites rarely exceed a few kilobytes.
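
For example, wildcards can collapse many near-identical rules into one (the paths are illustrative):

# Instead of listing every color variation separately:
#   Disallow: /products?color=red
#   Disallow: /products?color=blue
# ...one prefix rule covers them all (robots.txt rules are prefix matches):
Disallow: /products?color=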

Tool Limitations

This robots.txt checker is a syntax validation and analysis tool. While powerful, it has some limitations you should be aware of:

Cannot Test Actual Crawler Behavior

This tool validates syntax and identifies common issues, but cannot simulate exactly how Googlebot or other crawlers will interpret every rule. Use Google Search Console's robots.txt tester for definitive Google behavior testing.

Cannot Verify Sitemap Accessibility

We check if Sitemap URLs are included and properly formatted, but don't verify if the sitemap files are actually accessible or valid XML. Test sitemaps separately.

Bot Protection May Block Requests

Some websites use Cloudflare, reCAPTCHA, or other protection that may block our requests. If you see connection errors, the robots.txt may still be accessible to actual search engine crawlers.

What This Tool DOES Check

Syntax validation, directive parsing, common SEO mistakes (blocking important bots, missing sitemaps, high crawl delays), typos, and overall configuration analysis. This covers the most important aspects of robots.txt validation.

Tip: For comprehensive testing, combine this tool with Google Search Console's robots.txt Tester, which can test specific URLs against your rules and shows exactly how Googlebot interprets them.

Related Tools