Wednesday, August 27, 2025

Scientists Just Developed a New AI Modeled On The Human Brain: It’s Outperforming LLMs Like ChatGPT At Reasoning Tasks

Image credit: Eugene Mymrin/Getty Images

Scientists have developed a new type of artificial intelligence (AI) model that can reason differently from most large language models (LLMs) like ChatGPT, resulting in much better performance on key benchmarks. The new reasoning AI, called a hierarchical reasoning model (HRM), is inspired by the hierarchical and multi-timescale processing in the human brain: the way different brain regions integrate information over varying durations (from milliseconds to minutes)… Continue reading…


Source: Live Science


Critics: 

As machine learning algorithms process numbers rather than text, the text must be converted to numbers. In the first step, a vocabulary is decided upon; then integer indices are arbitrarily but uniquely assigned to each vocabulary entry; and finally, an embedding is associated with each integer index. Algorithms include byte-pair encoding (BPE) and WordPiece.

There are also special tokens serving as control characters, such as [MASK] for a masked-out token (as used in BERT) and [UNK] (“unknown”) for characters not appearing in the vocabulary. Some special symbols are also used to denote text formatting: for example, “Ġ” denotes a preceding whitespace in RoBERTa and GPT, while “##” denotes continuation of a preceding word in BERT.
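A minimal sketch of this pipeline, using a toy five-entry vocabulary and random stand-in embeddings (real tokenizers such as BPE or WordPiece learn their vocabularies from data, and real embeddings are trained parameters):

```python
import random

# Toy vocabulary: each entry gets a unique integer index.
vocab = {"[UNK]": 0, "[MASK]": 1, "the": 2, "cat": 3, "sat": 4}

def tokenize(text: str) -> list[int]:
    # Words outside the vocabulary map to the [UNK] control token.
    return [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]

# Each integer index is associated with an embedding vector; these are
# random stand-ins here, learned parameters in a real model.
embedding_dim = 4
embeddings = {i: [random.random() for _ in range(embedding_dim)]
              for i in vocab.values()}

ids = tokenize("The cat sat quietly")   # "quietly" is out of vocabulary
print(ids)                              # [2, 3, 4, 0]
print(embeddings[ids[0]])               # embedding of "the"
```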

For example, the BPE tokenizer used by GPT-3 (Legacy) splits common words into single tokens and rarer words into several subword tokens. Tokenization also compresses the datasets. Because LLMs generally require input to be an array that is not jagged, shorter texts must be “padded” until they match the length of the longest one. The average number of words per token depends on the language; in English, the ratio is typically around 0.75 words per token, with 4 characters per token on average.
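A minimal sketch of the padding step, assuming an illustrative pad ID of 0 and a toy batch of token IDs:

```python
# Pad every tokenized text to the length of the longest one so the
# batch forms a rectangular (non-jagged) array.
PAD_ID = 0

batch = [[17, 42, 7], [5, 9], [23, 4, 18, 31]]

max_len = max(len(seq) for seq in batch)
padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in batch]

print(padded)
# [[17, 42, 7, 0], [5, 9, 0, 0], [23, 4, 18, 31]]
```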

In the context of training LLMs, datasets are typically cleaned by removing low-quality, duplicated, or toxic data. Cleaned datasets can increase training efficiency and lead to improved downstream performance. A trained LLM can be used to clean datasets for training a further LLM. With the increasing proportion of LLM-generated content on the web, data cleaning in the future may include filtering out such content.
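A minimal sketch of one common cleaning step, exact-duplicate removal via hashing plus a crude length filter; real pipelines layer many more quality and toxicity heuristics on top:

```python
import hashlib

def clean(documents: list[str], min_chars: int = 50) -> list[str]:
    seen: set[str] = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen or len(doc) < min_chars:
            continue  # drop duplicates and very short documents
        seen.add(digest)
        kept.append(doc)
    return kept

docs = ["short", "A long enough document that passes the length filter." * 2]
print(len(clean(docs + docs)))  # 1: the short text and the duplicates are dropped
```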

LLM-generated content can pose a problem if the content is similar to human text (making filtering difficult) but of lower quality (degrading the performance of models trained on it). Training the largest language models may require more linguistic data than is naturally available, or the naturally occurring data may be of insufficient quality. In these cases, synthetic data might be used. Microsoft’s Phi series of LLMs is trained on textbook-like data generated by another LLM.

Before being fine-tuned, most LLMs are next-token predictors. Fine-tuning adjusts the output of an LLM to seem more conversational via techniques like reinforcement learning from human feedback (RLHF) or constitutional AI. Instruction fine-tuning is a form of supervised learning used to teach LLMs to follow user instructions. In 2022, OpenAI demonstrated InstructGPT, a version of GPT-3 fine-tuned in this way to follow instructions.

Reinforcement learning from human feedback (RLHF) involves training a reward model to predict which text humans prefer. Then, the LLM can be fine-tuned through reinforcement learning to better satisfy this reward model. Since humans typically prefer truthful, helpful and harmless answers, RLHF favors such answers.
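A minimal sketch of the pairwise preference loss typically used to train such a reward model; the scores below are stand-in numbers rather than real model outputs:

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): small when the human-preferred
    # answer scores higher, large when the ranking is inverted.
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # ~0.049 (ranking correct, low loss)
print(preference_loss(-1.0, 2.0))  # ~3.049 (ranking inverted, penalized)
```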

A mixture of experts (MoE) is a machine learning architecture in which multiple specialized neural networks (“experts”) work together, with a gating mechanism that routes each input to the most appropriate expert(s). Mixtures of experts can reduce inference costs, as only a fraction of the parameters are used for each input. The approach was introduced in 2017 by Google researchers.
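A minimal sketch of top-k gating, with toy functions standing in for the expert networks:

```python
import numpy as np

# Four toy "experts"; the default argument freezes each weight w.
experts = [lambda x, w=w: x * w for w in (0.5, 1.0, 2.0, 4.0)]

def moe_forward(x: np.ndarray, gate_logits: np.ndarray, k: int = 2) -> np.ndarray:
    top_k = np.argsort(gate_logits)[-k:]   # route to the k best-scored experts
    weights = np.exp(gate_logits[top_k])
    weights /= weights.sum()               # softmax over the selected experts
    # Only k of the experts are evaluated, which is the source of the
    # inference savings relative to a dense model.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

print(moe_forward(np.array([1.0, 2.0]), np.array([0.1, 2.0, -1.0, 1.5])))
```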

Typically, LLMs are trained with single- or half-precision floating point numbers (float32 and float16). One float16 has 16 bits, or 2 bytes, and so one billion parameters require 2 gigabytes. The largest models typically have 100 billion parameters, requiring 200 gigabytes to load, which places them outside the range of most consumer electronics. Post-training quantization aims to decrease the space requirement by lowering precision of the parameters of a trained model, while preserving most of its performance.
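The arithmetic behind these figures, as a small sketch:

```python
# Bytes per parameter times parameter count gives the load size.
def model_size_gb(num_params: float, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9

print(model_size_gb(1e9, 2))    # float16, 1B params   -> 2.0 GB
print(model_size_gb(100e9, 2))  # float16, 100B params -> 200.0 GB
print(model_size_gb(100e9, 1))  # 8-bit quantized      -> 100.0 GB
```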

Quantization can be further classified as static quantization if the quantization parameters are determined beforehand (typically during a calibration phase), and dynamic quantization if the quantization is applied during inference. The simplest form of quantization truncates all parameters to a given number of bits; this is applicable to static as well as dynamic quantization, but loses much precision.

Dynamic quantization allows for a different quantization codebook per layer, either a lookup table of values or a linear mapping (scaling factor and bias), at the cost of forgoing the possible speed improvements from using lower-precision arithmetic. Quantized models are typically treated as frozen, with modifications of weights (e.g. fine-tuning) applied only to the original model. It is possible to fine-tune quantized models using low-rank adaptation.
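A minimal sketch of per-layer linear quantization to 8 bits, where each layer stores its own scaling factor and bias so that weight ≈ scale * q + min_val for integers q in [0, 255]:

```python
import numpy as np

def quantize(weights: np.ndarray):
    min_val, max_val = weights.min(), weights.max()
    scale = (max_val - min_val) / 255.0          # linear mapping per layer
    q = np.round((weights - min_val) / scale).astype(np.uint8)
    return q, scale, min_val                     # 1 byte per parameter + metadata

def dequantize(q, scale, min_val):
    return q.astype(np.float32) * scale + min_val

layer = np.random.randn(4, 4).astype(np.float32)
q, scale, min_val = quantize(layer)
print(np.abs(layer - dequantize(q, scale, min_val)).max())  # small reconstruction error
```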

In 2020, OpenAI researchers demonstrated that their new model GPT-3 could work out what format to use when given a few rounds of Q&A (or another type of task) as examples in the input. This technique, called few-shot prompting, allows LLMs to be adapted to new tasks without requiring fine-tuning. In 2022, it was also found that the base GPT-3 model can generate an instruction based on user input.

The generated instruction, along with the user input, is then used as input to another instance of the model under an “Instruction: […], Input: […], Output:” format. The other instance is able to complete the output and often produces the correct answer in doing so. This ability to “self-instruct” lets LLMs bootstrap themselves toward a correct answer. An LLM can be turned into a chatbot or “dialog assistant” by specializing it for conversation.
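A minimal sketch of that format; the instruction text here is illustrative, where a real pipeline would have a first model instance generate it:

```python
user_input = "3, 5, 8, 13"
generated_instruction = "Continue the sequence with the next two numbers."

# The second model instance is asked to complete the text after "Output:".
prompt = (
    f"Instruction: {generated_instruction}\n"
    f"Input: {user_input}\n"
    f"Output:"
)
print(prompt)
```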

In essence, user input is prefixed with a marker such as “Q:” or “User:”, and the LLM is asked to predict the output after a fixed “A:” or “Assistant:”. This type of model became commercially available in 2022 with ChatGPT, a sibling model of InstructGPT based on GPT-3.5 and fine-tuned to accept and produce dialog-formatted text. It could similarly follow user instructions. Before the stream of User and Assistant lines, a chat context usually starts with a few lines of overarching instructions from a role called “developer” or “system”, which conveys a higher authority than the user’s input. This is called a “system prompt”.
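A minimal sketch of such a dialog-formatted context, with an illustrative system prompt:

```python
system_prompt = "You are a helpful assistant. Answer concisely."

# The system prompt comes first, then alternating User/Assistant turns;
# the model predicts the text following the final "Assistant:" marker.
transcript = (
    f"System: {system_prompt}\n"
    "User: What is tokenization?\n"
    "Assistant: Splitting text into units a model can map to numbers.\n"
    "User: Why is it needed?\n"
    "Assistant:"
)
print(transcript)
```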

Retrieval-augmented generation (RAG) is an approach that enhances LLMs by integrating them with document retrieval systems. Given a query, a document retriever is called to retrieve the most relevant documents. This is usually done by encoding the query and the documents into vectors, then finding the documents with vectors (usually stored in a vector database) most similar to the vector of the query. The LLM then generates an output based on both the query and context included from the retrieved documents.
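A minimal sketch of the retrieval step, using random stand-in embeddings and cosine similarity; a real system would use a trained encoder and a vector database:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = ["Doc A ...", "Doc B ...", "Doc C ..."]
doc_vecs = rng.normal(size=(3, 8))   # stand-in document embeddings
query_vec = rng.normal(size=8)       # stand-in query embedding

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank documents by similarity to the query and prepend the best match.
scores = [cosine(query_vec, v) for v in doc_vecs]
best = int(np.argmax(scores))
prompt = f"Context: {docs[best]}\n\nQuestion: <query>\nAnswer:"
print(prompt)
```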

Tool use is a mechanism that enables LLMs to interact with external systems, applications, or data sources. It allows, for example, fetching real-time information from an API or executing code. A program separate from the LLM watches the model’s output stream for a special tool-calling syntax. When these special tokens appear, the program calls the tool accordingly and feeds its output back into the LLM’s input stream.
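A minimal sketch of such a watcher loop; the <tool>name(args)</tool> syntax and the "add" tool are invented for illustration, as real systems define their own calling conventions:

```python
import re

TOOLS = {"add": lambda a, b: str(int(a) + int(b))}   # hypothetical tool
CALL = re.compile(r"<tool>(\w+)\((\w+),\s*(\w+)\)</tool>")

def run_with_tools(llm_output: str) -> str:
    match = CALL.search(llm_output)   # watch for the tool-calling syntax
    if match:
        name, a, b = match.groups()
        result = TOOLS[name](a, b)
        # The tool result is fed back into the LLM's input stream.
        return f"{llm_output}\nTool result: {result}"
    return llm_output

print(run_with_tools("The sum is <tool>add(2, 40)</tool>"))
```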

Early tool-using LLMs were fine-tuned on the use of specific tools, but fine-tuning LLMs to read API documentation and call APIs correctly has greatly expanded the range of tools accessible to them. Describing available tools in the system prompt can also make an LLM able to use tools. A system prompt instructing ChatGPT (GPT-4) to use multiple types of tools can be found online.

An LLM is typically not an autonomous agent by itself, as it lacks the ability to interact with dynamic environments, recall past behaviors, and plan future actions. But it can be transformed into an agent by adding supporting elements: the role (profile) and the surrounding environment of an agent can be additional inputs to the LLM, while memory can be integrated as a tool or provided as additional input. Instructions and input patterns are used to make the LLM plan actions, and tool use is used to carry out these actions.

The ReAct pattern, a portmanteau of “Reason + Act”, constructs an agent out of an LLM, using the LLM as a planner. The LLM is prompted to “think out loud”. Specifically, the language model is prompted with a textual description of the environment, a goal, a list of possible actions, and a record of the actions and observations so far. It generates one or more thoughts before generating an action, which is then executed in the environment.
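A minimal sketch of such a loop, with stand-in functions for the model and the environment:

```python
def fake_llm(prompt: str) -> str:
    # Stand-in model output: one thought, then one action.
    return "Thought: the door is closed.\nAction: open_door"

def environment(action: str) -> str:
    # Stand-in environment: executes the action, returns an observation.
    return "Observation: the door is now open."

record = ["Goal: leave the room.", "Actions available: open_door, wait."]
for _ in range(2):                       # a couple of reasoning steps
    prompt = "\n".join(record)           # environment, goal, history so far
    output = fake_llm(prompt)            # model "thinks out loud", then acts
    action = output.split("Action: ")[-1]
    record.append(output)
    record.append(environment(action))   # execute and record the observation
print("\n".join(record))
```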

In the DEPS (“Describe, Explain, Plan and Select”) method, an LLM is first connected to the visual world via image descriptions. It is then prompted to produce plans for complex tasks and behaviors based on its pretrained knowledge and the environmental feedback it receives. The Reflexion method constructs an agent that learns over multiple episodes.

At the end of each episode, the LLM is given the record of the episode and prompted to think up “lessons learned”, which would help it perform better in a subsequent episode. These “lessons learned” are stored as a form of long-term memory and given to the agent in subsequent episodes. Monte Carlo tree search can use an LLM as a rollout heuristic. When a programmatic world model is not available, an LLM can also be prompted with a description of the environment to act as a world model.
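A minimal sketch of Reflexion-style episodic memory, again with a stand-in for the model:

```python
def fake_llm(prompt: str) -> str:
    # Stand-in reflection; a real agent would query an actual model.
    return "Lesson: check the inventory before crafting."

lessons: list[str] = []                  # long-term memory across episodes

for episode in range(3):
    preamble = "\n".join(lessons)        # inject lessons from prior episodes
    episode_record = f"{preamble}\nEpisode {episode}: ...actions/observations..."
    # At the end of the episode, ask the model to reflect on the record.
    lessons.append(fake_llm(f"{episode_record}\nWhat lessons were learned?"))

print(lessons)
```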

For open-ended exploration, an LLM can be used to score observations for their “interestingness”, which can be used as a reward signal to guide a normal (non-LLM) reinforcement learning agent. Alternatively, it can propose increasingly difficult tasks for curriculum learning. Instead of outputting individual actions, an LLM planner can also construct “skills”, or functions for complex action sequences. The skills can be stored and later invoked, allowing increasing levels of abstraction in planning.

Multiple agents with memory can interact socially.

LLMs can handle programming languages similarly to how they handle natural languages. No special change in token handling is needed, as code, like human language, is represented as plain text. LLMs can generate code based on problems or instructions written in natural language. They can also describe code in natural language or translate between programming languages.

They were originally used as a code completion tool, but advances have moved them toward automatic programming. Services such as GitHub Copilot offer LLMs specifically trained, fine-tuned, or prompted for programming.

LLM architectures have also proven useful in analyzing biological sequences: protein, DNA, and RNA. With proteins they appear able to capture a degree of “grammar” from the amino-acid sequence, condensing a sequence into an embedding.

On tasks such as structure prediction and mutational outcome prediction, a small model using an embedding as input can approach or exceed much larger models using multiple sequence alignments (MSA) as input. ESMFold, Meta Platforms’ embedding-based method for protein structure prediction, runs an order of magnitude faster than AlphaFold2 thanks to the removal of an MSA requirement and a lower parameter count due to the use of embeddings.

Meta hosts ESM Atlas, a database of 772 million structures of metagenomic proteins predicted using ESMFold. An LLM can also design proteins unlike any seen in nature. Nucleic acid models have proven useful in detecting regulatory sequences, sequence classification, RNA-RNA interaction prediction, and RNA structure prediction.
