About this tool
Generate a robots.txt file to control how search engines crawl and index your website. Essential for SEO, preventing duplicate content issues, protecting private pages, managing crawl budget, and communicating with search engine bots about which pages to index.
The robots.txt file sits at your domain root (yoursite.com/robots.txt) and tells search engines like Google, Bing, and others which pages they can and cannot access. Proper configuration prevents indexing of admin pages, duplicate content, and resource-intensive pages.
Perfect for SEO specialists, web developers, site owners, and digital marketers who need to control search engine behavior. This generator creates properly formatted robots.txt files with common rules, sitemap declarations, and crawl-delay directives.
Used by websites of all sizes to manage search engine crawling, block sensitive pages (login, admin, checkout), prevent indexing of staging sites, and declare XML sitemap locations for better discoverability.
Practical Usage Examples
Basic site protection
Block admin and private sections from all search engines.
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/
Sitemap: https://site.com/sitemap.xml

E-commerce site
Allow products but block checkout and cart pages.
User-agent: *
Allow: /products/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/

Block specific bots
Allow Google but block aggressive scrapers.
User-agent: Googlebot
Allow: /
User-agent: BadBot
Disallow: /
Crawl-delay: 10

Step-by-Step Instructions
Enter the user agent — use * to target ALL search engines, or type a specific bot name (e.g., Googlebot, Bingbot).
In "Disallow Paths", enter one directory or URL path per line that you want to block from crawlers (e.g., /admin/).
In "Allow Paths", enter paths within a blocked directory that you want to explicitly permit (overrides Disallow).
Optionally enter a Crawl Delay in seconds — use only if your server struggles with crawler traffic.
Enter your Sitemap URL so crawlers can immediately discover all your site pages.
Click Run — copy the generated robots.txt and upload it to your website root at yoursite.com/robots.txt.
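For readers who want to see how these inputs fit together, here is a minimal Python sketch (an illustration, not the tool's actual implementation) that assembles a robots.txt file from the same fields the generator collects:

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(),
                     crawl_delay=None, sitemap=None):
    """Assemble a robots.txt string from the generator's inputs.

    Allow lines are emitted before Disallow lines so that more
    specific Allow exceptions are not shadowed by order-sensitive
    parsers; the Sitemap line goes last by convention.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(disallow=["/admin/", "/private/"],
                       sitemap="https://yoursite.com/sitemap.xml"))
# User-agent: *
# Disallow: /admin/
# Disallow: /private/
# Sitemap: https://yoursite.com/sitemap.xml
```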
Core Benefits
Prevents search engines from indexing sensitive pages (admin, login, checkout)
Controls crawl budget by blocking low-value pages
Declares sitemap location for better search engine discovery
Blocks aggressive bots and scrapers
Prevents duplicate content issues
Properly formatted and validated syntax
Essential for every website's SEO strategy
Frequently Asked Questions
What is robots.txt and why do I need one?
Robots.txt is a plain text file at your website root (yoursite.com/robots.txt) that tells search engine crawlers which pages to access or skip. It's essential for controlling crawl budget on large sites, blocking admin/private pages from appearing in search results, preventing duplicate content from being indexed, and guiding crawlers to your sitemap for complete page discovery.
Where does the robots.txt file go?
Always in the root directory of your website, accessible at the exact URL https://yourdomain.com/robots.txt. It cannot be in subdirectories — /blog/robots.txt, /public/robots.txt, etc. will not work. Most hosting control panels let you access your root directory via FTP, cPanel File Manager, or CMS-specific settings (WordPress: upload to /public_html/).
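Because the file must live at the root, its URL can be derived mechanically from any page URL by discarding everything after the host. A small Python sketch (the helper name robots_url is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the only valid robots.txt location for the given page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path/query/fragment with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?id=1"))
# https://yourdomain.com/robots.txt
```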
What does the asterisk (*) in User-agent mean?
The asterisk (*) is a wildcard meaning "all bots" — rules under User-agent: * apply to every search engine and crawler that respects robots.txt. To target a specific bot, use its exact name: User-agent: Googlebot, User-agent: Bingbot, User-agent: GPTBot, etc. Bot-specific rules take precedence over wildcard rules for that specific bot.
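You can observe this precedence with Python's standard urllib.robotparser, using a hypothetical two-group file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the wildcard group blocks /private/, but
# Googlebot gets its own group that only blocks /tmp/.
ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Googlebot matches its own group, so the wildcard rules don't apply to it.
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/tmp/file"))      # False
# Any other bot falls back to the wildcard group.
print(rp.can_fetch("Bingbot", "https://example.com/private/page"))    # False
```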
Does Disallow guarantee a page won't appear in Google?
Not necessarily. Disallow in robots.txt prevents Google from crawling the page, but the page can still appear in search results if other sites link to it. Google can infer a page exists from links without crawling it. For complete removal from search results, use a "noindex" meta tag on the page OR the URL removal tool in Google Search Console.
What's the difference between Disallow and noindex?
Disallow in robots.txt prevents crawling (bots skip the page entirely). Noindex is a tag on the page itself that tells bots "don't add this to your index." Critical: if you Disallow a URL, Google cannot read the noindex tag on that page. Use noindex for pages you want deindexed but crawled; use Disallow only to save crawl budget on pages you don't care about at all.
Can I block specific bots, such as AI crawlers?
Yes. Add a separate rule group with the specific bot name: User-agent: Bingbot / Disallow: /private/. To block OpenAI's training crawler: User-agent: GPTBot / Disallow: /. To block Common Crawl (used for AI datasets): User-agent: CCBot / Disallow: /. Well-behaved bots from major companies follow these rules; malicious scrapers typically ignore them.
How do I declare my sitemap in robots.txt?
Add this line anywhere in your robots.txt file (conventionally at the end): Sitemap: https://yoursite.com/sitemap.xml. Use the full absolute URL including https://. You can add multiple Sitemap: lines for different site sections (Sitemap: https://example.com/news-sitemap.xml). This is recommended as it helps all search engines discover your pages, independent of which user-agent section they read.
Does Google respect Crawl-delay?
Google has officially documented that Googlebot ignores the Crawl-delay directive in robots.txt; Googlebot adjusts its crawl rate automatically based on how your server responds, and the old Search Console crawl-rate limiter has been retired. Other bots (Bing, Yandex) do respect Crawl-delay. Setting it to 2-5 seconds helps with bandwidth-limited servers for non-Google crawlers.
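Python's urllib.robotparser also reads Crawl-delay, which is useful if you are writing a polite crawler of your own. A small sketch with a hypothetical file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a 10-second delay for Bingbot, no delay for others.
ROBOTS = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# A polite crawler would sleep for this many seconds between requests.
print(rp.crawl_delay("Bingbot"))        # 10
print(rp.crawl_delay("SomeOtherBot"))   # None (no delay declared)
```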
How do I test my robots.txt file?
Check your robots.txt in Google Search Console: the robots.txt report under Settings shows whether Google can fetch the file and flags any parse errors. Also visit your URL directly (yoursite.com/robots.txt) to verify it's publicly accessible and has correct syntax. Bing Webmaster Tools offers a similar robots.txt tester. For quick validation, tools like ryte.com/free-tools/robots-txt can analyze your file for syntax errors.
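Before uploading, you can also sanity-check your rules locally with Python's standard urllib.robotparser (paste your generated file into the ROBOTS string below). One caveat: Python applies rules in file order (first match wins), while Google prefers the most specific rule, so listing Allow exceptions before the broader Disallow gives the same result in both:

```python
from urllib.robotparser import RobotFileParser

# Paste your generated robots.txt here to test it before uploading.
ROBOTS = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "https://yoursite.com/admin/secret"))       # False
print(rp.can_fetch("*", "https://yoursite.com/admin/public/page"))  # True
print(rp.can_fetch("*", "https://yoursite.com/blog/post"))          # True
```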
Should I block CSS and JavaScript files?
No — never block CSS, JavaScript, or image directories in robots.txt. Google needs to access and render these files to understand your page layout, design, and functionality. Blocking them prevents Google from properly understanding your pages, which can hurt rankings. Only block directories you truly don't want crawled: admin panels, private user areas, duplicate content paths.