
Updated 4 months ago

How do I create or edit the "robots.txt"?

At a glance

The post asks if there is a way to create or edit the robots.txt file in Webstudio, as the documentation does not specify how to do it. The comments indicate that currently, there is no way to add or edit the robots.txt file in Webstudio. Some community members express a need for this feature, particularly to block AI crawlers from accessing their content. However, other community members suggest that robots.txt is not a reliable way to block crawlers, as they may not respect the protocol. The discussion also touches on using Cloudflare as an alternative solution to block AI crawlers. In the end, the Webstudio team confirms that they do not plan to add the ability to edit robots.txt, but users can use Cloudflare to rewrite the URL and point it to a custom robots.txt file.

I found in the documentation that Webstudio creates robots.txt, but it doesn't specify how to create or edit it. Is there currently a way of doing it?
30 comments
@briv Right now, there is no way to add your own robots.txt or edit the generated one. @TrySound do you plan to add this? I cannot find any issue for it.
What do you want to specify in a custom robots.txt?
I know it's not a priority, but for some technical SEO reasons, the ability to make some edits to robots.txt would be very nice.

Here are some examples I'm not sure would be possible right now in the robots.txt on Webstudio:
I've also run into wanting to edit the robots.txt file in the past. One reason I remember was to disallow different subdomains.
I know it's possible to edit it on WordPress or Webflow. It would be nice to create an issue for that, I think.
Hi everyone, thanks for checking it out.

I'll also need to do what @Milan Boisgard | Uncode School suggested as a case.
In my case, I'm developing a multilingual portal with original content, and I will also need to block all AI crawlers, as you can see here in the list below. I want to allow only search engine crawlers.

AI Crawlers:
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: OmgiliBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /
Yes! It's super important. Nowadays, it's not good to create a blog, spend time creating original content (which requires a lot of investment), and have it crawled without permission by OpenAI, Claude, Perplexity, You.com, and so on.

Webstudio's CMS is super flexible and great (I love it so far), and I'm using it instead of Webflow or Framer because their CMS systems are not as flexible for multilingual websites and are super expensive.

As you can see here: https://www.lemonde.fr/robots.txt
They are also blocking AI crawlers:
User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /
If blocking AI I would just use Cloudflare... they keep up to date with that so you don't have to. They also restrict access to the site for the bots. With robots.txt you're just asking the bots not to crawl, but it's up to them to respect it (idk if any don't respect it or not, just technically speaking)

https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
Do you have any specific use cases that are solved by robots.txt?
@Oleg Isonen I think @briv's use case above is one particular case where we need to edit robots.txt
Blocking AI crawlers?
I think, like John said, this is probably not a protection. This is just "I wish you didn't crawl me but since you don't give a f. go ahead and crawl"
Yep, that's right. That's why I said it's really not a priority to have this feature! I think OpenAI doesn't care about that info (they just take the data no matter what the robots.txt says 😅)
Maybe someone with more SEO experience can help with that
Would be great to have context there, otherwise it's pointless
If blocking AI I would just use Cloudflare... they keep up to date with that so you don't have to. They also restrict access to the site for the bots. With robots.txt you're just asking the bots not to crawl,

Hey @John Siciliano, thanks for the info and tip. It's great to hear about a solution where I don't need to update the list of AI crawlers manually. It'd solve my problems.

Question:
I've been using the Pro Plan (Cloud) hosted by Webstudio, and as stated on the website, Webstudio uses Cloudflare.
• Does that mean Webstudio is already taking care of that?
• The website doesn't specify this; it isn't clear in this area, and I wasn't able to find that info.

Cloudflare page:
"Security: Projects automatically benefit from Cloudflare's security features, including DDoS (L7) protection, bot mitigation, and certificate management."
Pricing Page:

Bot Mitigation: Uses machine learning and behavioral analysis to accurately detect and block harmful bots, ensuring your website remains fast, secure, and unnecessary usage is minimized.*


but it's up to them to respect it (idk if any don't respect it or not, just technically speaking)

That's for sure. However, the perspective isn't about what they would or wouldn't do; it's from a legal standpoint. I can only be lawfully backed by asking any AI crawler not to crawl, as Google, OpenAI, and others state on their websites. If they crawl anyway, that's on them.
Webstudio is not taking care of that, as I don't think users would universally say "yes, we want this"... it's the same reason Cloudflare doesn't have it on by default.

I guess our copy is ambiguous, as "harmful bots" refers more to bots trying to exploit the website, such as looking for WordPress security issues.

As for pricing, the intention there is to say that those bots don't count towards your usage.
Blocking AI crawlers?
I think, like John said, this is probably not a protection. This is just "I wish you didn't crawl me but since you don't give a f. go ahead and crawl"
Would be great to have context there, otherwise it's pointless

Hi @Oleg Isonen ,
Thanks for creating the ticket and for your attention.

As I pointed out to John, it's about being lawfully backed. I'm not interested in personal thinking like, "since you don't give a f., go ahead and crawl".

To be clear, I'm interested in what is required to create a professional project. As you may already know, some projects need to consider many factors, including legal issues regarding data usage.

It's okay if it's not the target audience of your product, not your priority, or even your company's direction; it was just a simple question, already answered by John. So, to be even more clear, it does not make what I need less important or pointless because you think, "since you don't give a f., go ahead and crawl"

As a customer, I came here to ask for help, not to hear, "Since you don't give a f., go ahead and crawl," whatever your reason.

That was my second message as a customer, with a simple and friendly question. It was not meant to raise more pointless questions about whether I should trust the seriousness of Webstudio, "since you don't give a f., go ahead and crawl."

Thank you for your attention to this matter.
Got it, thanks for the clarification.
Woah woah I think there was a misunderstanding
This comment:

I think, like John said, this is probably not a protection. This is just "I wish you didn't crawl me but since you don't give a f. go ahead and crawl"

Was Oleg speaking as the robots.txt.

He was saying "I'm a robots.txt and asking you AI crawlers not to crawl; whether you AI crawlers respect my request is up to you"
Exactly, @briv you misunderstood my comment. It was meant as a conversation between robots.txt and a crawler. It was meant as a satirical quote.
Just did a quick search about the legal force of robots.txt: there is a Robots Exclusion Protocol, but it is 100% on a good-faith basis. There is no legal component to it. As I assumed, it doesn't ensure NOT crawling; it's more like asking not to crawl.
What we could do in the future is create a feature, either by making robots.txt editable or even with a dedicated settings UI, to actually exclude different types of crawlers and enforce it too by not letting them in.
I think this actually deserves a dedicated settings UI instead of text-based robots.txt.
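For anyone who wants that enforcement today, here's a minimal sketch of the idea as a Cloudflare Worker in front of the site. This is an illustration, not existing Webstudio functionality; the blocklist is an example and not exhaustive.

// Sketch: enforce crawler exclusion at the edge. Unlike robots.txt,
// this refuses matching requests instead of asking politely.
// The blocklist below is illustrative, not exhaustive.
const BLOCKED_BOTS = ["GPTBot", "CCBot", "anthropic-ai", "ClaudeBot", "PerplexityBot"];

export default {
  async fetch(request: Request): Promise<Response> {
    const userAgent = request.headers.get("User-Agent") ?? "";
    if (BLOCKED_BOTS.some((bot) => userAgent.includes(bot))) {
      return new Response("Forbidden", { status: 403 });
    }
    // Everything else passes through to the origin unchanged.
    return fetch(request);
  },
};

Note that matching on the User-Agent header is best effort: a scraper can always lie about its identity, which is why Cloudflare's managed bot blocking relies on more signals than the header alone.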
Exactly, @briv you misunderstood my comment. It was meant as a conversation between robots.txt and a crawler. It was meant as a satirical quote.

@Oleg Isonen I understand your clarification, but your message needed more context and clarity to support the thread's focus. The tone was out of context, the "pointless" was also disproportionate, and the language used was inappropriate for a simple and friendly question. You haven't offered an apology, which makes me believe I don't need to offer one either.


Just did a quick search about the legal force of robots.txt: there is a Robots Exclusion Protocol, but it is 100% on a good-faith basis. There is no legal component to it. As I assumed, it doesn't ensure NOT crawling; it's more like asking not to crawl.

The legal issue is one of many and more complicated than it seems. There are other factors to consider, including terms, policies, copyrights, and different laws in various countries and states. It appears that your search results are from the first page of Google, which is fine; I've been there, too. I recommend that each person conduct their own research, not just a search, and seek advice from experts in this field based on their specific needs. It's also important to understand the purpose of "robots.txt."

In addition, I don't know how your search results are relevant to the problem discussed in this thread, posted in the "help" section. The focus of the thread was not "why I need robots.txt" or "why robots.txt" but rather "how to edit or create robots.txt." I already understand why I need it.

I appreciate everybody's input and concerns, but I believe we all have the answers we need to continue our work. I don't want to continue this conversation any further. Thank you for understanding.
I am sorry for the misunderstanding.
We are always use-case driven; we prioritize tasks based on how useful I estimate them to be.
Right now Webstudio will not offer the ability to edit robots.txt, but you can use Cloudflare to rewrite the URL and point it to a custom robots.txt file.
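For reference, a minimal sketch of that workaround as a Cloudflare Worker, assuming the domain is proxied through your own Cloudflare account; the directives in the example file are illustrative only:

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === "/robots.txt") {
      // Serve a custom robots.txt instead of the generated one.
      const body = [
        "User-agent: GPTBot",
        "Disallow: /",
        "",
        "User-agent: *",
        "Allow: /",
      ].join("\n");
      return new Response(body, { headers: { "Content-Type": "text/plain" } });
    }
    // All other paths fall through to the site as usual.
    return fetch(request);
  },
};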