Beyond absolute safety… Why did Anthropic abandon its more conservative approach?

“The more able we become to build intelligent systems, the less certain we become that we actually understand them.” This idea, voiced in various forms by a growing number of AI researchers, sums up the deep tension surrounding the new generation of models: the ability to build them does not necessarily mean the ability to fully understand or predict their behavior.

The shift at the artificial intelligence company Anthropic clearly embodies this tension. The company moved from treating its “Mythos” model as high-risk and subject to strict restrictions to releasing a public version of it under the name “Fable 5”. The move raised questions that go beyond the model’s technical capabilities to the risk-management philosophy that governs the entire AI industry.

From commitment to resilience

The debate within Anthropic is no longer about whether more advanced AI models should be developed, but about what rules should govern that development in an environment that is changing faster than policies can keep up with.

According to statements by company officials to Time magazine, Anthropic has abandoned one of the core pledges of its pioneering safety policy. In its first version, that policy stipulated that the company would not train an AI system unless it could guarantee in advance that its protective measures were adequate and that risks had been reduced.

This pledge, which had formed the cornerstone of the Responsible Scaling Policy (RSP) since 2023, was presented as a tough commitment that put safety before speed of development.

The Responsible Scaling Policy is a voluntary safety framework, of a kind that several major AI companies have adopted, for managing the catastrophic risks of advanced models. It rests on the principle of “linking the strength of the model to its level of security”: the capabilities of systems are continually evaluated and classified into graded risk levels.

In recent updates, however, the company has eliminated the requirement to assure safety before training, replacing it with a more flexible approach built on transparency and on comparing its level of risk with that of competitors. Under this approach, development would be delayed only if the company were leading the race and catastrophic risks were high.

Jared Kaplan, chief scientist at Anthropic, said that insisting on halting training no longer makes sense in an environment where progress is accelerating.

Safety through restriction rather than prevention

If Anthropic has redefined the way it deals with risk, the question arises: has the risk itself changed, or only the way it is managed? To understand this shift more precisely, we must move from the policy level to the level of technical implementation within the model.

In this context, Eleanor Watson, a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and a doctoral researcher in engineering at the University of Gloucestershire in the United Kingdom, believes that what happened cannot be explained as a mere administrative reclassification. Rather, real technical interventions were introduced between the “Mythos” and “Fable 5” phases, and they focus more on controlling access than on changing capabilities.

Speaking to Al Jazeera Net, Watson explains that “Fable 5” sits behind an additional layer of classifiers that monitor sensitive queries. When requests related to high-risk areas such as offensive cybersecurity or molecular biology are detected, they are transferred to a less capable model. She describes this solution as not ideal from an engineering standpoint, but says it makes it possible to benefit from a powerful model while limiting its sensitive capabilities.
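To make the mechanism Watson describes more concrete, here is a minimal Python sketch of a classifier gate in front of two models. Everything in it is a hypothetical illustration: the keyword lists, the `classify` and `route` functions, and the names `full_model` and `restricted_model` are assumptions, not Anthropic's design, which has not been published. A real deployment would rely on trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical high-risk domains, loosely based on the examples in the article
# (offensive cybersecurity, molecular biology). A keyword list is only a toy
# stand-in for a trained classifier.
HIGH_RISK_KEYWORDS = {
    "offensive_cyber": ("exploit", "zero-day", "payload", "privilege escalation"),
    "molecular_biology": ("pathogen", "gain-of-function", "toxin synthesis"),
}


@dataclass
class RoutingDecision:
    model: str                    # which (hypothetical) model answers the request
    flagged_domain: Optional[str]  # None when no sensitive domain was detected


def classify(prompt: str) -> Optional[str]:
    """Return the first high-risk domain whose keywords appear in the prompt."""
    text = prompt.lower()
    for domain, keywords in HIGH_RISK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return None


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the less capable model, everything else to the full one."""
    domain = classify(prompt)
    if domain is None:
        return RoutingDecision(model="full_model", flagged_domain=None)
    return RoutingDecision(model="restricted_model", flagged_domain=domain)


print(route("Summarize this paper on protein folding").model)    # full_model
print(route("Write a working exploit for this zero-day").model)  # restricted_model
print(route("Write a working expl0it").model)                    # full_model: rewording evades the toy gate
```

The last call illustrates the weakness Watson goes on to raise: a lightly reworded request slips past this toy gate, so the protection is only as strong as the classifier behind it.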

She adds that the system, which previously raised concerns because of its advanced capabilities, including the discovery of about 2,000 zero-day vulnerabilities, still exists at its core; the company has added layers of controls that limit access to some of its sensitive capabilities.

However, according to Watson, the success of this approach depends on the effectiveness of the classifiers themselves, which have not been fully tested independently. She points out that indications found by Amazon researchers that some of the restrictions could be circumvented were enough to raise concerns within the US administration.

The moral and geopolitical dimension

For her part, Almira Zinutdinova, a Spanish researcher in the ethics and technologies of responsible artificial intelligence and vice president of VigIA, the Spanish non-profit association concerned with the ethical and responsible use of artificial intelligence, believes that the issue goes beyond the boundaries of the model itself.

She told Al Jazeera Net, “The entire story related to Anthropic’s latest developments and publications reveals the failures of current artificial intelligence governance in the United States of America more than it reveals anything else.”

On the other hand, she believes that Anthropic tried to adopt the concept of “shared responsibility” through the “Glasswing” project, which brought together a number of major technology companies and institutions with the aim of using the model’s capabilities for cyber defense, discovering vulnerabilities, and securing open-source software.

But she warns of the dual-use nature of these systems: the same system that can discover and fix vulnerabilities can also help discover new ways to exploit them. From this standpoint, the debate is no longer only about the safety of the model itself, but also about its strategic and geopolitical repercussions.

She adds that fears are also increasing about the possibility of utilizing the model’s outputs to train local models through “model distillation” techniques, and concludes that what we are witnessing “is beginning to resemble a new type of arms race, similar to the space race in the past, but today it revolves around artificial intelligence.”

Have safety rules changed?

Although Anthropic asserts that the amendment to its safety policy reflects a new reading of the technical and regulatory reality, it is difficult to ignore the competitive factor in explaining this shift. Understanding this complexity requires considering the interplay between technology, regulation, and competition.

In this context, Watson believes that “it is naive to claim that competitive dynamics do not play any role,” noting that only about two months passed between the model being considered “too dangerous to be launched” and its release to the public.

At the same time, she warns against reducing the issue to market pressures alone. Major AI companies face a real dilemma: withholding a powerful model does not prevent the same capabilities from emerging at competitors, while releasing it sets a new precedent for the acceptable length of time between identifying a model’s potential risks and actually releasing it to the public.

From this standpoint, she believes that the assumption that “someone else will build it anyway” may gradually turn into a justification that weakens the culture of caution across the entire sector. On the other hand, it cannot be denied that fierce competition between companies has helped keep the pace of progress at unprecedented levels, with many ideas and capabilities quickly spreading to cheaper models offered by competitors.

She adds that the leaders of AI companies now face a kind of “trilemma”: they want to maintain their image as innovators, protect their competitive advantages, and avoid a regulatory backlash that could hinder their business.

Zinutdinova partly agrees with this assessment, but she adds another dimension to the discussion: the symbolic and cultural pressures surrounding the AI race. In her view, it cannot be ruled out that some ethical discourse in the sector turns into what is known as “ethics washing”, in which principles of governance and responsibility are used to enhance companies’ public image without always being reflected in their final commercial decisions.

She says that today’s AI leaders “want to be seen as the modern version of Neil Armstrong or Yuri Gagarin,” indicating that the competition is no longer just economic or technical, but has also become a race for historical status and leadership.

On the other hand, Watson points out that communication about safety has itself become a source of risk. When Anthropic described the “Mythos” model as very dangerous, that language later became part of the evidentiary basis used in discussions and procedures related to export controls.

AI labs thus found themselves facing an unexpected paradox: the more transparent they were about risks, the more likely they were to face regulatory restrictions that may not be proportional to the actual level of risk once safeguards and protective measures are applied.

From prevention to gradual release

Watson believes that what is happening is not limited to the re-evaluation of a particular model, but reflects the emergence of a new pattern in how advanced AI systems are deployed. Instead of choosing between a full release and an outright ban, companies have begun moving towards what they describe as a “multi-level release model”: a public version is made available with strong restrictions and controls, while more advanced versions can be accessed by researchers or by verified, approved entities.

According to her, this approach represents an attempt to reconcile caution with the requirements of practical deployment, and it also reflects a growing realization that traditional binary options are no longer sufficient to deal with systems with this level of capabilities. She adds that the first interaction with the public sometimes reveals problems that even security researchers may not be able to discover during internal tests.

But the most important question, in her view, is whether monitoring infrastructure and regulatory frameworks are ready to detect failures and respond to them with the required speed.

She points out that the US government’s decision, days after the launch of “Fable 5”, to subject it to restrictions similar to those imposed on some sensitive military materials suggests that some institutions still doubt the adequacy of these mechanisms, or fear that the rapid spread of advanced technologies will cost them important geopolitical and strategic advantages.

Beyond the safety narrative

For her part, Zinutdinova believes that judging decisions to launch models should not be based on companies’ statements alone, but rather on their practical track record, their network of partnerships, and the extent of their actual commitment to ethical standards. She emphasizes that collaborations linked to mass surveillance, data misuse, or autonomous systems may be an important indicator of how companies deal with responsibility and risk.

She adds that commercial pressures will always be present in the AI race, and may sometimes trump ethical considerations, even as safety frameworks gradually mature.
