The Spotify catalog download sparked a new debate about who controls the world’s music
An activist group called Anna’s Archive claimed this week that it had extracted content from Spotify on a massive scale and posted torrents to download 86 million files. In addition to the songs, the group says it holds 256 million metadata records, including artist, album and song names, adding up to some 300 terabytes of information.
Anna’s Archive is a so-called shadow library and describes itself as an “archivist” collective. It is the best-known book piracy site in the world, and this time the download was carried out through so-called scraping: the automated extraction of information, a practice also used systematically by generative artificial intelligence companies, such as OpenAI with ChatGPT, to train their chatbots. In this case, the technique was applied to songs.
The conflict over copyright and digital piracy has been going on for decades. Since the late 1990s, with the emergence of file-sharing platforms such as Napster, eMule, Kazaa and Ares, the cultural sector has seen recurring disputes over the unauthorized distribution of content. One of the most significant milestones of that period was the lawsuit the band Metallica filed against Napster in 2000, which became a landmark case in litigation between artists, record labels and digital distribution services.
Today, however, the problem is different: it is no longer just about users who want to avoid paying, but about mega-corporations that use the information to train their models. That is what happened with Meta, Mark Zuckerberg’s multibillion-dollar company, which used pirated books without paying royalties to train Llama, its artificial intelligence model.
In the world of music, Suno is a platform that generates complete songs, from lyrics to melody and harmony, based on a user’s “prompt.” The availability of this much music was celebrated by many users who do not want to pay for Spotify, but the case also sparked a debate about who benefits most from the download: artificial intelligence companies that feed their models with large amounts of information.
Who will benefit most from this download? How do companies like Meta, OpenAI, Google and Amazon actually extract all the information they use to build their models?
“Scraping,” downloading and how a music AI is trained
Scraping (or “data scraping”) is a technique that generally consists of automatically extracting large amounts of information from a digital platform without the express permission of the service concerned. It is done with programs that simulate a user’s behavior and systematically browse websites or databases to copy content, metadata or entire datasets. A symbolic case of this era is the lawsuit The New York Times has brought against OpenAI, accusing Sam Altman’s company of using its journalistic articles to train ChatGPT.
In the case of Spotify, such a procedure can be used to collect not only songs but also related information such as titles, artists, playlists, release dates and other data that are part of its digital infrastructure.
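To make the idea concrete, here is a minimal Python sketch of a crawler-scraper that walks from page to page and copies metadata records. It is purely illustrative: the in-memory `PAGES` site, its URLs and the `data-title`/`data-artist` attributes are hypothetical, and a real scraper would download pages over the network instead of reading a dictionary. It does not reflect how Spotify’s platform is structured or how this particular download was carried out.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website" (URL -> HTML). A real scraper would fetch
# each page over HTTP; this sketch never touches the network.
PAGES = {
    "/artists": '<a href="/artist/1">A</a> <a href="/artist/2">B</a>',
    "/artist/1": ('<li class="track" data-title="Song One" data-artist="Artist A"></li>'
                  '<a href="/artists">Back</a>'),
    "/artist/2": ('<li class="track" data-title="Song Two" data-artist="Artist B"></li>'
                  '<li class="track" data-title="Song Three" data-artist="Artist B"></li>'),
}


class TrackParser(HTMLParser):
    """Collects outgoing links and track metadata from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tracks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "li" and attributes.get("class") == "track":
            self.tracks.append({
                "title": attributes.get("data-title"),
                "artist": attributes.get("data-artist"),
            })


def crawl(start_url):
    """Visit every page reachable from start_url (breadth-first) and copy its track records."""
    seen = {start_url}
    queue = deque([start_url])
    records = []
    while queue:
        url = queue.popleft()
        parser = TrackParser()
        parser.feed(PAGES.get(url, ""))
        parser.close()
        records.extend(parser.tracks)
        for link in parser.links:
            if link not in seen:  # avoid revisiting pages, e.g. via the "Back" link
                seen.add(link)
                queue.append(link)
    return records


if __name__ == "__main__":
    for record in crawl("/artists"):
        print(record)
```

The breadth-first queue plus the `seen` set is what makes the extraction “systematic”: every reachable page is visited exactly once, and every record found on it is copied.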
“First, we must make it clear that most of the leaked material is protected by copyright law, which legally restricts its copying, reproduction and use. It is neither trivial nor legally simple to access such a volume and variety of commercial music without licenses or permission from the rights holders,” Hernán Ordiales, an engineer, teacher and specialist in audio and AI, explained to Clarín.
What makes this case unusual is that “the amount of data involved is unprecedented and, from a technical perspective, it could serve as a basis for training generative music models,” such as those used by Suno or Udio, “which have already been sued by recording industry associations on suspicion of unlawfully using this kind of material and are now in the middle of negotiations,” continues the specialist, who belongs to the Open Artificial Intelligence Laboratory (LAIA).
The songs serve as the basis for these models. “These types of models ‘learn’ from examples, extracting structural, rhythmic, harmonic and even timbral patterns directly from large amounts of real audio. The greater the number and variety of examples, the better the model can capture and reproduce complex structures. In its most basic form, this approach is called ‘machine learning,’ and when it relies on neural-network architectures trained on large amounts of data, it falls specifically within the field of deep learning,” Ordiales continues.
To do this, the models work with audio fragments converted into “tokens,” small basic units of sound information. “There are different architectures for developing music generation models. They all start from representative examples of what the model is expected to generate. Transformer-based language models split text into ‘tokens,’ which are words or parts of words. When generating music, similar processes can be applied, but at the level of audio fragments rather than words,” adds David Coronel, also from LAIA.
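A toy Python sketch of how audio can be turned into tokens: the waveform is cut into fixed-size fragments, and each fragment is replaced by the index of the closest entry in a small codebook (a technique known as vector quantization). The four-entry codebook and the four-sample frame size are made-up assumptions for illustration; systems that tokenize audio generally learn far larger codebooks with neural networks. A transformer would then be trained to predict the next token id, much as a language model predicts the next word.

```python
import math


def frame_audio(samples, frame_size):
    """Split a waveform into consecutive, non-overlapping fragments; a short tail is dropped."""
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to the frame (squared Euclidean distance)."""
    best_index, best_distance = 0, math.inf
    for index, code in enumerate(codebook):
        distance = sum((a - b) ** 2 for a, b in zip(frame, code))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def tokenize(samples, codebook, frame_size):
    """Map each audio fragment to a discrete token id."""
    return [nearest_code(frame, codebook) for frame in frame_audio(samples, frame_size)]


def detokenize(tokens, codebook):
    """Rebuild an approximate waveform by concatenating each token's codebook vector."""
    waveform = []
    for token in tokens:
        waveform.extend(codebook[token])
    return waveform


if __name__ == "__main__":
    # Hypothetical 4-entry codebook for 4-sample fragments:
    # silence, positive plateau, negative plateau, fast oscillation.
    codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1]]
    samples = [0.1, 0, -0.1, 0, 0.9, 1.1, 1, 0.8, 1, -0.9, 1, -1, -1, -1, -0.8, -1.1, 0.3]
    tokens = tokenize(samples, codebook, frame_size=4)
    print(tokens)  # [0, 1, 3, 2]; the trailing 0.3 does not fill a frame and is dropped
    print(detokenize(tokens, codebook))
```

Decoding is lossy: `detokenize` returns only the codebook’s approximation of each fragment, which is why larger, learned codebooks matter for audio quality.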
This is why it matters that Anna’s Archive uploaded so-called “metadata” in addition to the songs: a kind of label with information about the artist, song, album, year of release and so on. “A fundamental step is labeling. Each audio sample should be accompanied by clear descriptions that tell the model what type of music it is: the genre, the instruments, the mood. Without these labels, the model would not be able to associate the audio patterns with the instructions that users will later give it,” he continues.
On a technical level, Coronel explains how AI generates music from other music: “In diffusion-type models, the process consists of ‘breaking’ the examples by adding noise to them. The model then tries to predict what that added noise looks like. It is also capable of the reverse process: creating clean music from random noise. In short, the training phase involves processing many examples to extract patterns that are musically meaningful and coherent, and then using those patterns to compose similar pieces,” he concludes.
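Labeling can be pictured as pairing each audio fragment with a text description. The Python sketch below assumes hypothetical `genre`, `instruments` and `mood` fields and a made-up `LabeledClip` record; real datasets use their own schemas. Clips without any usable description are dropped, reflecting the point that unlabeled audio cannot be linked to a user’s instructions.

```python
from dataclasses import dataclass

# Hypothetical descriptive fields; real datasets define their own schema.
CAPTION_FIELDS = ("genre", "instruments", "mood")


@dataclass
class LabeledClip:
    audio_id: str  # identifier of an audio fragment (hypothetical)
    caption: str   # text the model learns to associate with that audio


def make_caption(metadata):
    """Turn a metadata dict into a short text description, skipping missing fields."""
    parts = []
    for field in CAPTION_FIELDS:
        value = metadata.get(field)
        if not value:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{field}: {value}")
    return "; ".join(parts)


def build_dataset(clips):
    """Pair each clip with its caption; clips without any description are left out."""
    dataset = []
    for audio_id, metadata in clips:
        caption = make_caption(metadata)
        if caption:
            dataset.append(LabeledClip(audio_id, caption))
    return dataset


if __name__ == "__main__":
    clips = [
        ("clip-001", {"genre": "tango", "instruments": ["bandoneon", "piano"], "mood": "melancholic"}),
        ("clip-002", {"artist": "Unknown"}),  # no descriptive labels, so it is skipped
    ]
    for example in build_dataset(clips):
        print(example)
```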
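The “breaking” step Coronel describes matches the standard denoising-diffusion formulation, in which a clean example x0 is mixed with Gaussian noise eps as x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, and the network is trained to predict eps from x_t and t. The Python sketch below shows only that forward process, the training loss and its algebraic inverse. The linear schedule values are commonly used defaults, not anything attributed to Suno or Udio, and a placeholder stands in for the neural network.

```python
import math
import random


def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    values, product = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        values.append(product)
    return values


def add_noise(x0, t, schedule, rng):
    """Forward ("breaking") process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = schedule[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def mse(a, b):
    """Mean squared error, a common training loss between predicted and true noise."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def estimate_clean(xt, eps_pred, t, schedule):
    """Solve the forward formula for x0 given a noise prediction."""
    a_bar = schedule[t]
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar) for x, e in zip(xt, eps_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    schedule = alpha_bars()
    # Toy "audio": one cycle of a sine wave sampled at 16 points.
    x0 = [math.sin(2 * math.pi * i / 16) for i in range(16)]
    t = rng.randrange(len(schedule))
    xt, eps = add_noise(x0, t, schedule, rng)
    # Placeholder for a trained network that would predict eps from (xt, t).
    eps_pred = [0.0] * len(eps)
    print("training loss:", mse(eps_pred, eps))
    # With a perfect noise prediction, the clean signal is recovered (up to rounding).
    print("recovery error:", mse(estimate_clean(xt, eps, t, schedule), x0))
```

Generating new music would run this in reverse: start from pure noise at the last step and, step by step down to t = 0, subtract a little of the noise the trained network predicts. That sampling loop is omitted here.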
Spotify responds, Anna’s Archive defends itself: the debate over cultural preservation
Spotify, which has more than 700 million users around the world, confirmed that it is investigating the incident and said it has already taken action against the accounts involved. “We identified and disabled malicious accounts that were involved in illegal scraping activities,” the company said. In a statement, it added that the investigation found that “a third party collected public metadata and used illicit tactics to circumvent DRM (Digital Rights Management) and access some of the platform’s audio files.”
For its part, Anna’s Archive, known for providing links to copyrighted books and texts, defended the initiative as a cultural preservation project. In a blog post, the group explained that the files account for “99.6% of all music listened to by Spotify users” and that they are being distributed via torrents.
“Of course, Spotify doesn’t have all the music in the world, but it’s a great start,” said the collective, which describes its mission as “preserving the knowledge and culture of humanity.” It added: “With your help, humanity’s musical heritage will be forever protected from natural disasters, wars, budget cuts and other catastrophes.”
On the surface, then, this is a commercial dispute with economic motivations. But the underlying issue runs deeper: “From the perspective of cultural preservation, the problem with closed platforms is obvious. It already happened with the Kindle and is happening today with streaming: existing works disappear from one day to the next and cannot be found anywhere else. When a few private platforms become the only route of access to cultural heritage (books, music, films), culture becomes subject to commercial decisions rather than to criteria of preservation or public access,” Carolina Martínez Elebi, a communication sciences graduate and professor at the University of Buenos Aires (UBA), told this outlet.
“In this context, many copyright laws no longer seem to fulfill their original purpose. Instead of encouraging the creation of works and ensuring their preservation, they act as tools of blocking and persecution, preventing this cultural heritage from circulating through other channels. The actions of some activist groups can be read as a provocation: a way of putting on the agenda that if culture is locked inside walled gardens, it risks disappearing when the owner decides to close the door,” concludes the specialist, who is also the author of the DHyTecno site.
The case of Spotify and Anna’s Archive thus reveals a tension that runs through today’s digital economy: who controls access to culture, under what rules and for what purposes. In a landscape dominated by closed platforms such as Spotify, Apple Music and YouTube, and by artificial intelligence models that devour terabytes of data, music is also becoming a strategic resource.
It is no longer just about not paying for music, but about defining the future of cultural heritage in the digital age. The discussion is no longer only technical; it is also cultural.