The Creative Commons logo has long been a global symbol of the open web, signaling that knowledge is free to share, remix, and reuse. That era may be ending. In a recent blog post, Creative Commons (CC) announced tentative support for “pay-to-crawl” systems. You can read this as a policy update. You can also read it as a signal that the fair use defense for commercial AI training is under pressure.
CC acknowledged that while open sharing is vital for human culture, industrial-scale scraping of content to build commercial AI models creates a different set of economics. I don’t know whether this is the work of a fringe group inside the free and open source crowd or the will of the majority, but the stewards of the open licensing movement are now arguing that payment gateways and licensing protocols are necessary to protect the value of human creation. It’s a pretty big shift.
We must adopt ways to compensate content creators and IP holders when their content is used for commercial purposes, so it’s encouraging to see CC take a strong position here (even if it’s couched as “tentative”).
This vision of a “pay-to-crawl” system is far from being reduced to practice. While we wait, start aggregating and quantifying the scrapable data you want to get paid for. When systems like the one CC is describing go live, properly identified IP will effectively have a machine-readable “cash register” attached to it. Even small websites like mine will benefit.
Most (if not all) of the public web has already been ingested by the builders of foundation AI models. For many developers, the working assumption was that “publicly available” meant “free to exploit.” With a working “pay-to-crawl” protocol in place, rights holders will get to decide whether to claim their fair share or keep feeding the machine for free. This can’t happen soon enough.
Author’s note: This is not a sponsored post. I am the author of this article, and it expresses my own opinions. Neither I nor my company is receiving compensation for it. This work was created with the assistance of various generative AI models.