#4: The Language Advantage: Understanding and Mastering ChatGPT
LLMs like ChatGPT are a big deal in AI's evolution. I discuss why I think so, how ChatGPT works, & begin an exploration of how to be a power user
This is a newsletter on AI, entrepreneurship, creativity, and mindfulness. Season 1 breaks down Generative AI and its impact on creative jobs and industries. Post #1 is here.
Steven Pinker calls language “the jewel in the crown of (human) cognition.” In a similar vein, I believe that large language models (LLMs) like ChatGPT represent a huge milestone for AI. Below, I will explain why and also show what’s under the hood of ChatGPT with the goal of figuring out how to use ChatGPT as an exceptional writing companion.
Language and the Evolution of (Artificial) Intelligence
Language has played a significant role in the evolution of human intelligence, enabling us to communicate complex ideas, share knowledge, and engage in collective problem-solving. Researchers believe that the enhanced social coordination language makes possible (the ability to plan collectively and coordinate activities) likely gave humans an evolutionary advantage by facilitating cooperation. Language also allowed knowledge to be transmitted across generations. Unlike other species, in which skills and information are acquired primarily through individual experience or genetic inheritance, humans could pass down accumulated knowledge through language. This ability to build on the knowledge of previous generations likely accelerated the development of technology and culture. Other species can communicate to some degree, but none has language as complex as ours, and that may well be what separates Homo sapiens from the rest.
These observations lead me to believe that LLMs represent a significant milestone for AI. LLMs can “understand” (we’ll debate this later) and process vast amounts of natural language data, enabling them to grasp the intricacies of human communication. This language understanding equips LLMs to coordinate and communicate better with human counterparts, allowing more effective collaboration toward shared goals. It is also why they are so versatile, performing tasks ranging from drafting legal contracts and summarizing text to completing code and writing fiction. No previous AI model has come remotely close to the versatility of LLMs, and I believe that comes down to the importance of mastering language.
The versatility of LLMs aligns, directionally, with the defining characteristic of Artificial General Intelligence (AGI): the ability to understand and perform tasks across different domains with human-like proficiency. While LLMs are clearly not AGI, I believe they are a crucial stepping stone in the pursuit.
How do LLMs work?
ChatGPT exemplifies large language models, which are characterized by their substantial number of parameters. And by substantial, we don't mean 40 or 50 parameters, as you would find in a typical regression model. With ChatGPT, we're talking about 175 billion parameters; OpenAI's newer system, GPT-4, boasts even more. To estimate such a vast number of parameters, you need exceptionally large datasets. In the case of ChatGPT, these consist of resources like Wikipedia, online books, massive collections of internet websites, and more.
Fundamentally, the model's core function revolves around a specific type of prediction task: it predicts the next “token” in a sequence given the tokens that precede it (think of a token as a word for the sake of this discussion). In other words, when generating text, ChatGPT anticipates the most likely next word within a given sequence of words. For instance, when faced with the incomplete sentence "Elephants are the ___," the model's objective is to predict the next word, then the word after that, and so forth. To select the most suitable next token, ChatGPT takes into account factors such as the frequency of the candidate word/token in English, its relevance to the sentence's context, and the coherence of the generated text. Based on these considerations, ChatGPT generates a list of candidate tokens with associated probabilities, such as the following:
Let's say the options for the next word are "largest" with an 11% probability, "smartest" with a 7% probability, and so on. The system then selects one of these choices. Suppose it opts for "largest" as the next word; it then generates a fresh set of candidates to follow it, such as "land animals," "pachyderms," "mammals," and "descendants," each with its own probability. The process repeats: the model selects a word, generates new candidates to follow it, selects again, and so on. A toy version of this loop is sketched below.
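To make the loop concrete, here is a toy sketch in Python. The candidate words and their probabilities are invented for the elephant example (a real model scores a huge vocabulary at every step), but the select-then-extend loop is the same idea.

```python
import random

# Toy illustration of the iterative next-token loop. The candidate lists and
# probabilities below are invented for the "Elephants are the ___" example;
# a real model scores tens of thousands of tokens at every step.
candidate_tables = {
    "Elephants are the": [("largest", 0.11), ("smartest", 0.07), ("most", 0.05)],
    "Elephants are the largest": [("land", 0.30), ("living", 0.12), ("mammals", 0.04)],
    "Elephants are the largest land": [("animals", 0.65), ("mammals", 0.20)],
}

def next_token(context):
    """Sample the next token from the (toy) candidate distribution for this context."""
    candidates = candidate_tables.get(context)
    if not candidates:
        return None  # no continuation defined in this toy table
    tokens, probs = zip(*candidates)
    return random.choices(tokens, weights=probs, k=1)[0]

sentence = "Elephants are the"
while (token := next_token(sentence)) is not None:
    sentence = f"{sentence} {token}"

print(sentence)  # e.g. "Elephants are the largest land animals"
```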
The user can also influence the output of ChatGPT by adjusting the "temperature" setting, which governs the level of randomness in the generated output. Lower temperature values yield more predictable responses (the model sticks to the most probable tokens/words), while higher temperature values foster more creative and unpredictable responses. For instance, at a temperature of 0, ChatGPT simply picks "largest," the most probable next token, completing the sentence as "Elephants are the largest land animals." Opting for the most probable token seems like a logical approach, but it can render the text relatively uninteresting, since what we perceive as creativity often lies in the unexpected or unlikely. Conversely, a higher temperature might lead to the selection of a less probable token like "most impressive," giving rise to the sentence "Elephants are the most impressive creatures on Earth." As the temperature increases, we venture into more subjective and creatively unbounded territory. The sketch below shows how temperature reshapes the candidate probabilities.
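Here is a minimal sketch of how temperature reshapes a set of candidate probabilities, again with made-up numbers for the elephant example. Real systems divide the model's raw scores by the temperature before converting them to probabilities; dividing log-probabilities by the temperature, as done here, has the same effect.

```python
import math

# Hypothetical next-token probabilities for "Elephants are the ___"
# (made-up numbers; they need not sum to 1 because we renormalize below).
candidates = {"largest": 0.11, "smartest": 0.07, "most impressive": 0.03}

def apply_temperature(probs, temperature):
    """Rescale a distribution by temperature: divide log-probabilities by T,
    then renormalize. Low T sharpens the distribution toward the most likely
    token; high T flattens it, making unlikely tokens more competitive."""
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    z = sum(math.exp(s) for s in scaled.values())
    return {tok: round(math.exp(s) / z, 3) for tok, s in scaled.items()}

for t in (0.2, 1.0, 2.0):
    print(t, apply_temperature(candidates, t))
# At t=0.2 "largest" dominates; at t=2.0 the three options are much closer together.
```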
When it comes to writing a news article about a sports game, the primary objective is usually to present facts in a straightforward manner. In such cases, opting for the most probable word makes sense, so a lower temperature setting suits the task. When crafting creative fiction, however, unconventional choices make the storytelling more intriguing, so we might want to raise the temperature and push the system to occasionally select less probable words. There is a tradeoff: as the temperature increases, so does the risk of mistakes, such as producing the sentence "Elephants are the largest mammals," which is factually incorrect (blue whales hold that title).
This process of predicting and selecting next tokens, as opposed to “looking up” answers to a question, is also why ChatGPT hallucinates. For example, I recently asked ChatGPT to help surface relevant papers on human-AI collaboration for one of my research studies. I was excited to find the following paper in ChatGPT's literature review:
"In their paper "Human-AI Collaboration for Creative Tasks," Li et al. (2020) propose a framework for allocating tasks between humans and AI based on their respective strengths and weaknesses. The authors argue that AI systems are better suited to tasks that require high computational power and that can be easily automated, while humans are better suited to tasks that require creativity, judgement, and empathy."
Li, Z., Huang, X., Jiang, Y., & Liu, X. (2020). Human-AI Collaboration for Creative Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 13672-13679.
I was eager to read this paper because it was closely related to our study. But after an exhaustive search, I realized that ChatGPT had simply made up this excellent paragraph and citation. In the future, I’d instruct ChatGPT to use a lower temperature when answering such factual questions.
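The temperature setting isn't something I can dial directly in the chat interface, but it is an explicit parameter in the API. Here is a minimal sketch, assuming the official openai Python package (v1+ client); the model name and prompt are placeholders, and it's still wise to verify any citation the model returns.

```python
# Minimal sketch: requesting a low-temperature, more deterministic response
# via the OpenAI API (assumes the `openai` Python package, v1+ client, and
# an OPENAI_API_KEY environment variable; the model name is illustrative).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # illustrative model choice
    temperature=0,           # favor the most probable tokens for factual queries
    messages=[
        {"role": "user",
         "content": "List three published papers on human-AI collaboration, with full citations."}
    ],
)
print(response.choices[0].message.content)
```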
This discussion about temperature provides a first clue on how to use ChatGPT like a pro. More on that in my next post.
Very interesting!
Language(s) - we take them for granted - are critical to human intelligence and now to AI. Don't know anything about artificial general intelligence (AGI) - except an idea that it is perhaps still in the making?
How ChatGPT (are there many other LLMs?) works is mind boggling! The volume of data that can be handled here and that the data is in the form of natural language (only English or.......and only written data or also spoken/oral? ) is fascinating. Fascinating too is how the processing to output/product is done through a method of prediction. Maybe it will move up to "natural selection" at some future point! That chatGPT can hallucinate is pretty amazing actually. After all hallucination can be viewed as extreme imagination.
Your next book can be titled something like - "Raising temperatures for creative AI - Fact or Fiction" ☺
This is really interesting as it can be - to some extent - compared to Kahneman's availability heuristic in humans. At low temperatures, AI looks for the most predictable and easily found information (as determined by its frequency in the data set) resulting in more "available" words being recalled. Loved the bit at the end where ChatGPT made up the research paper 😀. Makes me wonder if it can eventually make one up on the spot.