This year, we couldn’t stop hearing about how ChatGPT was going to change the world. But has it? Humans have finally done the work we’ve only read about in sci-fi novels. Computers can now write poetry (sort of) like Emily Dickinson, create works of art (sort of) like Vincent van Gogh, and write books on Marxist philosophy. So why do our lives feel relatively unchanged?
OpenAI chief scientist Ilya Sutskever describes the system that powers ChatGPT as a “digital brain.” Artificial intelligence was modeled after human intelligence, after all. Did you ever have a brilliant friend who simply lacked the motivation to apply themselves? … Story continues…
By: Maxwell Zeff
Source: How will GPTs change our worlds?
Critics:
OpenAI introduced the first GPT model (GPT-1) in 2018, publishing a paper called “Improving Language Understanding by Generative Pre-Training.” It was based on the transformer architecture and trained on a large corpus of books. The next year, OpenAI introduced GPT-2, a larger model that could generate coherent text. In 2020, it introduced GPT-3, a model with over 100 times as many parameters as GPT-2, which could perform various tasks with only a few examples.
GPT-3 was further improved into GPT-3.5, which was used to create the chatbot product ChatGPT. Rumors claim that GPT-4 has 1.76 trillion parameters, a figure first estimated by George Hotz based on the speed at which the model was running. OpenAI stated that GPT-4 is “more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.”
They produced two versions of GPT-4, with context windows of 8,192 and 32,768 tokens, a significant improvement over GPT-3.5 and GPT-3, which were limited to 4,096 and 2,049 tokens respectively. Some of the capabilities of GPT-4 were predicted by OpenAI before training it, although other capabilities remained hard to predict due to breaks in downstream scaling laws.
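To make those context-window figures concrete, here is a minimal sketch of checking whether a prompt fits a given model’s limit, using tiktoken, OpenAI’s open-source tokenizer. The limits are the ones quoted above; treating “cl100k_base” as the encoding for all the listed models is a simplifying assumption (older GPT-3 models used different encodings).

```python
# Minimal sketch: count tokens against the context windows quoted above.
# Assumes the "cl100k_base" encoding for all models (a simplification).
import tiktoken

CONTEXT_LIMITS = {
    "gpt-3": 2_049,
    "gpt-3.5": 4_096,
    "gpt-4": 8_192,
    "gpt-4-32k": 32_768,
}

def fits_in_context(prompt: str, model: str) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(prompt)) <= CONTEXT_LIMITS[model]

print(fits_in_context("How many tokens is this?", "gpt-4"))  # True
```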
Unlike its predecessors, GPT-4 is a multimodal model: it can take images as well as text as input; this gives it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams. It can now interact with users through spoken words and respond to images, allowing for more natural conversations and the ability to provide suggestions or answers based on photo uploads.
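In API terms, this multimodality means a user message can mix text and image parts. The following is a minimal sketch using the OpenAI Python SDK; the image URL is a placeholder, and the model name is assumed to be a vision-capable GPT-4 variant.

```python
# Minimal sketch of GPT-4's image input via the OpenAI Python SDK.
# The image URL is a placeholder for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumed: any vision-capable GPT-4 variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Explain what is unusual or funny about this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)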
To gain further control over GPT-4, OpenAI introduced the “system message”, a directive in natural language given to GPT-4 in order to specify its tone of voice and task. For example, the system message can instruct the model to “be a Shakespearean pirate”, in which case it will respond in rhyming, Shakespearean prose, or request it to “always write the output of [its] response in JSON“, in which case the model will do so, adding keys and values as it sees fit to match the structure of its reply.
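As a minimal sketch, the system message is simply the first entry in the conversation sent to the API; this example reuses the two directives quoted above, with the user turn invented for illustration.

```python
# Minimal sketch of steering GPT-4 with a system message.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Be a Shakespearean pirate. Always write the output "
                    "of your response in JSON."},
        {"role": "user", "content": "How do I hoist a mainsail?"},
    ],
)
print(response.choices[0].message.content)
```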
In the examples provided by OpenAI, GPT-4 refused to deviate from its system message despite user requests to do otherwise during the conversation. When instructed to do so, GPT-4 can interact with external interfaces. For example, the model could be instructed to enclose a query within <search></search> tags to perform a web search, the result of which would be inserted into the model’s prompt to allow it to form a response.
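A minimal sketch of that tag-interception loop follows. Here `web_search` is a hypothetical helper standing in for a real search backend, and `ask_model` is any callable that sends a prompt to the model and returns its text; neither is part of the OpenAI API.

```python
# Minimal sketch of the <search></search> pattern described above.
# `web_search` is a hypothetical stand-in for a real search backend.
import re

SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def web_search(query: str) -> str:
    raise NotImplementedError  # hypothetical: plug in a real search API

def run_with_search(ask_model, prompt: str) -> str:
    """`ask_model` is any callable mapping a prompt string to model text."""
    reply = ask_model(prompt)
    match = SEARCH_TAG.search(reply)
    if match:
        # Perform the search and feed the result back into the prompt
        # so the model can form its final response.
        result = web_search(match.group(1))
        reply = ask_model(f"{prompt}\n\nSearch result: {result}")
    return reply
```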
This allows the model to perform tasks beyond its normal text-prediction capabilities, such as using APIs, generating images, and accessing and summarizing webpages. A 2023 article in Nature stated that programmers have found GPT-4 useful for assisting in coding tasks (despite its propensity for error), such as finding errors in existing code and suggesting optimizations to improve performance.
The article quoted a biophysicist who found that the time he required to port one of his programs from MATLAB to Python went down from days to “an hour or so”. On a test of 89 security scenarios, GPT-4 produced code vulnerable to SQL injection attacks 5% of the time, an improvement over GitHub Copilot from 2021, which produced vulnerabilities 40% of the time (the sketch below illustrates this vulnerability class). In November 2023, OpenAI announced the GPT-4 Turbo and GPT-4 Turbo with Vision models, which feature a 128K context window and significantly cheaper pricing.
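For readers unfamiliar with the vulnerability class being measured, here is a short illustration using Python’s built-in sqlite3 module; the table, column, and input values are made up for the example.

```python
# Short illustration of SQL injection, the vulnerability class measured
# in the 89-scenario test above. Table and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"

# Vulnerable: user input is spliced directly into the SQL string,
# so the injected OR clause matches every row.
rows = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'").fetchall()
print(rows)  # [('admin',)] -- the injection succeeded

# Safe: a parameterized query treats the input as a literal value.
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] -- no user is literally named "alice' OR '1'='1"
```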
On May 13, 2024, OpenAI introduced GPT-4o (“o” for “omni”), a model that marks a significant advancement by processing and generating outputs across text, audio, and image modalities in real time. GPT-4o exhibits response times comparable to human reaction times in conversation, substantially improved performance on non-English languages, and enhanced understanding of vision and audio.
GPT-4o integrates its various inputs and outputs under a unified model, making it faster, more cost-effective, and efficient than its predecessors. GPT-4o achieves state-of-the-art results in multilingual and vision benchmarks, setting new records in audio speech recognition and translation.
OpenAI plans to immediately roll out GPT-4o’s image and text capabilities to ChatGPT, including its free tier, with voice mode becoming available to ChatGPT Plus users in the coming weeks. It also plans to make the model’s audio and video capabilities available to a limited set of API partners in the coming weeks. In its launch announcement, OpenAI noted that GPT-4o’s capabilities present new safety challenges, and described the resulting mitigations and limitations.
GPT-4 demonstrates aptitude on several standardized tests. OpenAI claims that in its own testing the model received a score of 1410 on the SAT (94th percentile), 163 on the LSAT (88th percentile), and 298 on the Uniform Bar Exam (90th percentile). In contrast, OpenAI claims that GPT-3.5 received scores for the same exams in the 82nd, 40th, and 10th percentiles, respectively. GPT-4 also passed an oncology exam, an engineering exam, and a plastic surgery exam.
In the Torrance Tests of Creative Thinking, GPT-4 scored within the top 1% for originality and fluency, while its flexibility scores ranged from the 93rd to the 99th percentile. Researchers from Microsoft tested GPT-4 on medical problems and found “that GPT-4, without any specialized prompt crafting, exceeds the passing score on USMLE by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B).”
Despite GPT-4’s strong performance on tests, the report warns of “significant risks” of using LLMs in medical applications, as they may provide inaccurate recommendations and hallucinate major factual errors. Researchers from Columbia University and Duke University have also demonstrated that GPT-4 can be utilized for cell type annotation, a standard task in the analysis of single-cell RNA-seq data.
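As a rough sketch of how such cell type annotation can work: list each cluster’s top marker genes and ask the model to name the cell type. The marker genes and prompt wording below are illustrative, not the exact protocol from the cited study.

```python
# Minimal sketch of GPT-4-based cell type annotation: prompt the model
# with each cluster's top marker genes. Genes and wording are
# illustrative, not the cited study's exact protocol.
from openai import OpenAI

client = OpenAI()

cluster_markers = {
    "cluster_0": ["CD3D", "CD3E", "IL7R"],    # example marker sets
    "cluster_1": ["MS4A1", "CD79A", "CD79B"],
}

for cluster, genes in cluster_markers.items():
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "In human PBMC single-cell RNA-seq data, which cell type "
                f"do these marker genes suggest? {', '.join(genes)}. "
                "Answer with the cell type only."
            ),
        }],
    )
    print(cluster, "->", response.choices[0].message.content)
```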
In April 2023, Microsoft and Epic Systems announced that they would provide healthcare providers with GPT-4-powered systems to assist in responding to questions from patients and analyzing medical records. Like its predecessors, GPT-4 has been known to hallucinate, meaning that its outputs may include information not in the training data or that contradicts the user’s prompt.
GPT-4 also lacks transparency in its decision-making processes. If requested, the model can explain how and why it reached its decisions, but these explanations are formed post hoc, and it is impossible to verify whether they reflect the actual process. In many cases, when asked to explain its logic, GPT-4 gives explanations that directly contradict its previous statements.
In 2023, researchers tested GPT-4 against a new benchmark called ConceptARC, designed to measure abstract reasoning, and found it scored below 33% on all categories, while models specialized for similar tasks scored 60% on most, and humans scored at least 91% on all. Sam Bowman, who was not involved in the research, said the results do not necessarily indicate a lack of abstract reasoning abilities, because the test is visual, while GPT-4 is a language model.
A January 2024 study conducted by researchers at Cohen Children’s Medical Center found that GPT-4 had an accuracy rate of 17% when diagnosing pediatric medical cases.