Understanding OpenAI's Rumored Humanity-Ending Algorithm

Illustration created by DALL-E with the prompt “A 16×9 fantasy image of the rumored Q* algorithm ruling over humanity with its inanimate hand.”

By now, I’m sure you’ve seen dozens of headlines like, “OpenAI may have discovered AI so powerful it could end humanity.” While such headlines are sensational, it’s important to approach them with a balanced perspective. Humanity is as safe from AI today as it was yesterday. And, it is highly unlikely that (considering all of the other threats to human safety) AI is going to be humanity’s undoing any time soon.

That said, there are credible rumors that OpenAI has developed an algorithmic methodology called “Q*” (pronounced Q-star) that will bring it (and, by proxy, us) closer to Artificial General Intelligence (AGI), a form of AI capable of understanding, learning, and applying intelligence across a wide range of tasks, akin to human capabilities. I’ve spoken to several engineers and AI/ML specialists since the Sam Altman saga began. Here’s my best guess at the component algorithms and machine learning principles that OpenAI may be using to move us closer to AGI.

Introduction to Q*: Enhancing AI Capabilities

According to the rumors, Q* is rooted in two ideas: Q-learning, a form of reinforcement learning, and the Q* search algorithm from the Maryland Refutation Proof Procedure system. Combined, these elements could redefine the capabilities of AI systems.

Q-learning allows AI to learn decision-making autonomously, mirroring the trial-and-error process inherent in human learning. Unlike reinforcement learning from human feedback (RLHF), which OpenAI uses to tune its models, Q-learning needs no human in the loop: the agent learns from reward signals generated by its environment. For instance, imagine a robot autonomously navigating a maze, developing its strategy (a “Q table”) by trial and error, without external guidance. This contrasts sharply with RLHF, where human input shapes the AI’s decisions. The ultimate goal of Q-learning is to converge on Q*, the optimal action-value function, which satisfies the Bellman optimality equation and tells the agent the best action to take in every situation. This method could significantly enhance an AI’s native problem-solving abilities and extend its reach to new domains, such as business analytics and strategic planning.
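For readers who want to see the Bellman condition mentioned above, here it is in standard textbook notation, followed by the update rule that tabular Q-learning uses to approach it. The symbols are the usual conventions and are not taken from anything OpenAI has published: s and s' are the current and next states, a and a' are actions, r is the reward, γ is a discount factor between 0 and 1, and α is a learning rate.

```latex
Q^{*}(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]

Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]
```

The first line says that the value of the best possible strategy equals the immediate reward plus the discounted value of acting optimally from the next state onward. The second line nudges the current estimate toward that target after each step the agent takes.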

Q* Algorithm Boost

Q* also refers to a key aspect of the Maryland Refutation Proof Procedure system, an AI theorem-proving technique that integrates semantic and syntactic information. If this is what the name refers to, the method, akin to solving a complex puzzle, would suggest that OpenAI is making progress toward AI systems that possess a deeper understanding of reality, moving beyond text prompts to a human-like comprehension.

The implications of a functional Q* are vast and varied. If it represents an advanced form of Q-learning, we could see AI systems capable of autonomous learning and adaptation in complex environments. This advancement could revolutionize fields like autonomous driving and complex task management in various industries. If it aligns with the Maryland Refutation Proof Procedure’s Q* algorithm, we could see a surge in AI’s analytical and problem-solving capabilities, impacting deep reasoning fields such as legal analysis and medical diagnostics.

What The Rumor Mill Thinks OpenAI’s Q* Might Include

OpenAI’s Q* may combine the decision-making prowess of Q-learning with the efficiency of the A* search algorithm. A* finds the lowest-cost path to a target by ranking each candidate step on the sum of two numbers: the actual cost of reaching it from the start point and a heuristic estimate of the remaining cost to the goal. Unlike traditional large language models (like GPT-4), which are trained on static datasets and can struggle to understand context, Q* may offer dynamic learning, allowing continuous adaptation and specific goal achievement.
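To make the A* idea concrete, here is a minimal sketch in Python. It is a textbook A* on a small grid, not anything known about OpenAI’s system. The grid layout, the function name a_star, and the choice of Manhattan distance as the heuristic are all assumptions made for illustration.

```python
import heapq

def a_star(grid, start, goal):
    """Return the number of steps on a shortest path, or None if unreachable.

    grid is a list of equal-length strings where '#' marks a wall.
    Moves are up/down/left/right and each move costs 1 (illustrative choice).
    """
    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_cost = {start: 0}                      # cheapest known cost from start
    frontier = [(heuristic(start), 0, start)]   # entries are (g + h, g, cell)

    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    priority = new_cost + heuristic((nr, nc))
                    heapq.heappush(frontier, (priority, new_cost, (nr, nc)))
    return None

# A 3x4 example map with a wall splitting the top two rows.
GRID = [
    "..#.",
    "..#.",
    "....",
]
```

Calling a_star(GRID, (0, 0), (0, 3)) searches from the top-left cell to the top-right cell. The heuristic steers the search toward the goal, so fewer cells are explored than in an uninformed search.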

Critical Roles in a Q* Algorithm

Here’s an outline of the key components of a system using a Q* algorithm, particularly in the context of reinforcement learning:

Environment and Agent: This is a fundamental aspect of reinforcement learning where the “agent” (the AI algorithm) interacts with an “environment.” The environment could be any defined space or scenario, like navigating through a maze or playing a video game. The agent learns to make decisions based on its interactions within this environment.
States and Actions: In the context of reinforcement learning, the environment is characterized by “states” and the agent has various “actions” it can take in each state. The quality and outcome of these actions determine the learning process of the agent.
The Q Table: This is a critical component in Q-learning, a type of reinforcement learning. The Q table helps the agent determine which action to take in each state. The table stores the estimated “value” or “quality” (hence the “Q”) of taking a specific action in a specific state, based on past experience.
Learning by Doing: Reinforcement learning is characterized by this trial-and-error process. The agent learns from the consequences of its actions – receiving rewards for successful actions and penalties for unsuccessful ones. This feedback mechanism is essential for the agent’s learning and decision-making process.
Updating the Q Table: The Q table is not static; it gets updated as the agent learns more about the environment. This updating process involves considering both the immediate reward of an action and the potential future rewards, allowing the agent to make decisions that are beneficial in the long term.
Refining the Q Table: Over time, as the agent continues to interact with the environment and receive feedback, the Q table becomes more refined and accurate. This refinement process enhances the agent’s ability to make effective decisions and navigate the environment more successfully, potentially leading to breakthroughs in various fields.
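
The components listed above can be sketched in a few lines of Python. This is ordinary tabular Q-learning on a toy problem, offered as an illustration under stated assumptions rather than a description of what OpenAI has built. The five-cell corridor, the reward of 1 at the right-hand end, and the values of ALPHA, GAMMA, and EPSILON are all hypothetical choices.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed toy world)
ACTIONS = (-1, +1)    # move left, move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative learning settings

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_table, state, rng):
    """Epsilon-greedy: usually exploit the Q table, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q_table[state].values())
    # Break ties at random so an all-zero table does not favor one direction.
    return rng.choice([a for a in ACTIONS if q_table[state][a] == best])

def train(episodes=500, seed=0):
    """Learn by doing: run episodes and update the Q table after every move."""
    rng = random.Random(seed)
    q_table = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q_table, state, rng)
            next_state, reward, done = step(state, action)
            # Immediate reward plus discounted best future value.
            future = 0.0 if done else max(q_table[next_state].values())
            target = reward + GAMMA * future
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
    return q_table

def greedy_policy(q_table):
    """Best-known action for each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

Running greedy_policy(train()) shows the refined table in action: after enough episodes the agent should prefer moving right in every cell, because the stored values for that action grow as the reward at the goal propagates backward through the table.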

The Future and AGI

The development of the rumored Q* as described above might mark a significant step toward artificial general intelligence (AGI). It addresses current LLM limitations by enabling more dynamic, goal-oriented, and efficient AI systems. With applications ranging from autonomous vehicles to complex task management, Q* could redefine AI capabilities. But again, this is a hypothetical application.

Comparing OpenAI’s Q* with Google DeepMind’s Project Gemini

While we’re on the subject of hypothetical near-AGI systems, Google DeepMind’s Project Gemini, another major AI initiative, aims to surpass GPT-4 using advanced techniques, reportedly including tree search and reinforcement learning methods related to Q-learning, to enhance decision-making and creativity. As you know, Google has been objectively behind OpenAI in the deployment of consumer-usable AI tools. I am expecting the launch of Gemini to flip the script.

Why The Hyperbole About AI Causing the End of Humanity?

The potential benefits of Q* include enhanced problem-solving across various sectors, improved human-AI collaboration, and advancements in automation. However, these advancements come with risks, including ethical challenges, privacy concerns, economic impact, and the potential for AI misalignment with human interests.

Perhaps the biggest concern is that, like humans, a fully functional Q* system would be able to train itself on a relatively small amount of data. We can watch a demonstration of something, work out how to do it ourselves, practice until we get better, and then apply that knowledge and those skills to other goals, whether they build on what we learned or seem unrelated to it. This is thought to be a uniquely human ability. The thought that a machine or some hypothetical “free-floating” algorithm could have the power to learn and evolve its skills across a wide range of seemingly unrelated tasks has more than a few people very scared.

There are several good reasons to fear such a system. First and foremost, if it is sufficiently powerful (and the history of technology teaches us that it will increase in power exponentially) it will ultimately be able to learn more quickly than humans. Since we’re imagining here (this is all conjecture), we can assume that it will be able to evolve its knowledge and improve its cross-disciplinary capabilities in ways that we (humans) would not be able to understand.

The good news is that such a system might be able to discover things that are outside of our abilities – cures for diseases, solutions for environmental problems, etc. On the downside, if weaponized, such a system would be able to do unimaginable harm.

Balancing Optimism and Caution

If you’re wondering where I stand on this subject – when it comes to technology, I am an optimist. I believe that technological progress is, on the whole, a force for good. I am also a pragmatist and I know that every tool (and technology is just a fancy word for tool) can be used as intended, but also in unintended ways. As we navigate this nascent stage of an evolutionary process that allows us to outsource not only our calculative but also our cognitive abilities, it’s crucial to balance our excitement with a healthy dose of caution. Personally, it simultaneously thrills me and scares me to death. If you’re not already feeling both glee and fear, you might not yet fully grasp the immense potential and challenges of this evolving technology.

If you’re interested in learning more about the practical applications of AI, sign up for our free online course Generative AI for Execs. It will help you shape your thoughts about the future.

Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.
