AI For History Teachers

AI has been a whole thing for several years now, and it’s been hyped beyond belief. One sign of how excessive the hype has been is how many people are sure AI is going to change everything, yet can’t quite describe how anyone should use it. They’re just convinced it’s the future and that we’re all going to lose our jobs.

Between working for UPenn and being part of the History Matters community with Dr. Joanne Freeman and Annie Evans, I have heard many teachers discuss AI. They’ve been scared. They’ve been confused. Even the AI professors at UPenn have had existential panics. Above all, when teachers finally get around to using it, they quickly run into AI’s notorious hallucinations, which convinces them the stuff doesn’t even work.

Education seems like it should be a win for AI. Kids can be taught to use the new tools early on, teachers can save time, and so on. Instead, AI is being forced on teachers, who are told they have to start using the new tools, and teach kids how to use them, right away. That is creating a lot of tension and anxiety for everyone involved.

I hear about this all the time from the History Matters community in particular. The story is roughly that the people running the schools are scared of not getting the most out of the shiny new tech, which is supposedly changing everything, so they mandate its use to avoid being left behind. The particularly maddening part is that teachers are told they have to use AI and almost no one shows them how. People wave their hands in the air and say, “This is going to change everything! It’s the future!”, and then act surprised when teachers say, “But I tried it and it kept saying incorrect stuff.”

I empathize with the teachers on this one. I want to help.

My plan for this essay is to use plain language to describe how AI works so my teaching friends gain a lot of intuition. I’ll then describe some of the non-obvious prompting techniques that consistently generate high-quality outputs. Finally, I’ll share some general rules of thumb to keep in mind as you use the new tools.

Quick Overview

When people say AI today, they usually mean LLMs, or Large Language Models. AI as a concept has been around since the early days of computing. Alan Turing, the father of computing, knew Claude Shannon, the father of information theory. Shannon’s 1948 paper, A Mathematical Theory of Communication, laid theoretical groundwork that modern AI still rests on; it even modeled English text by predicting each next letter or word from the ones before it, an idea directly ancestral to today’s language models.

Amusingly, AI is so old that it has been overhyped enough for everyone to get sick of it many, many times. The term “AI Winter” was first coined way back in 1984 to capture the way people stop caring about AI after one of these phases of excessive hype. If you feel sick of hearing people say ludicrous things about AI, you aren’t alone. You are actually part of a decades-long tradition of folks who get sick of AI hype.

With that said, LLMs actually do represent a massive leap forward in terms of how well AI performs. They aren’t just hype.

The key things that LLMs did to push the field forward are:

  1. Learning from huge amounts of text. They get better as you give them more data, and you can give them seemingly endless data. As the size of the training data grows, they become more likely to produce accurate and coherent responses. This is why AI companies are so aggressive about collecting as much data as they can. Being able to build such gigantic models on everything that can be collected is new.
  2. Understanding context. LLMs can take in very long stretches of text at once, allowing them to consider not just the individual words, but the context of entire documents or conversations. Prior models were significantly more limited in scope, focused on smaller portions of documents like phrases or sentences.
  3. They have an “Attention Mechanism”. LLMs can pay attention to the most relevant parts of a document to figure out what other parts of the document mean. As an analogy, consider how the meaning of “apple” would change if it were used in a sentence with “kitchen” or with “technology”. There’s a small sketch of this idea just after this list.
  4. Generating human-like text. They can write, chat, and respond in ways that feel like a real conversation with a person. If we put aside their notorious tendency to hallucinate, the quality of their text outputs is the best the AI industry has ever achieved.
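For the curious, here is a miniature illustration of the attention idea in Python. Everything in it is invented for teaching purposes: real models learn vectors with thousands of dimensions, not two hand-picked numbers. The point is only to show how neighboring words can pull an ambiguous word toward one meaning or another.

import math

# Toy 2-dimensional word vectors: (food-ness, tech-ness).
# These numbers are made up purely for illustration.
VECTORS = {
    "apple":      [0.5, 0.5],  # ambiguous on its own
    "kitchen":    [1.0, 0.0],  # strongly food-related
    "technology": [0.0, 1.0],  # strongly tech-related
}

def contextualize(word, sentence):
    # Blend `word`'s vector with its neighbors, weighting each one by
    # how strongly it relates to `word` (a softmax over dot products).
    # This weighting-by-relevance is the heart of attention.
    scores = [sum(a * b for a, b in zip(VECTORS[word], VECTORS[w]))
              for w in sentence]
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]
    return [sum(wt * VECTORS[w][d] for wt, w in zip(weights, sentence))
            for d in range(2)]

print(contextualize("apple", ["apple", "kitchen"]))     # leans toward food
print(contextualize("apple", ["apple", "technology"]))  # leans toward tech

The same word ends up with a different internal meaning depending on its neighbors, which is exactly what the apple/kitchen/technology analogy describes.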

Once all of the model training is done, you can think of LLMs as essentially being a big mapping of how words are arranged when humans use them to communicate. One word leads to another word, which leads to another word, and so on. This is why they’re called language models. They attempt to model the flow of language.

Knowledge is stored somewhat indirectly in LLMs. Knowledge is expressed through the construction of sentences. If the model is good, it will have knowledge stored inside it via its tendency to think particular sets of words should flow together. It can then construct new sentences that arrange words in the right way to express that knowledge. Knowledge, for LLMs, is a consequence of being able to arrange words, and so it is considered an emergent property of LLMs, and not something directly encoded.
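To make “emergent knowledge” concrete, here is a toy version of that word mapping as a Python table. Every word and count below is invented; real models learn billions of soft, numeric tendencies rather than a small lookup table, but the flavor is the same.

# A miniature "word map": for each word, the words that tend to follow
# it in the training text, with made-up counts standing in for learned
# tendencies.
WORD_MAP = {
    "the":     {"capital": 40, "city": 25, "river": 10},
    "capital": {"of": 80, "city": 5},
    "of":      {"france": 30, "spain": 25, "texas": 10},
    "france":  {"is": 50},
    "is":      {"paris": 45, "lyon": 5},
}

def most_likely_next(word):
    # Pick the word that most often followed `word` in training.
    followers = WORD_MAP[word]
    return max(followers, key=followers.get)

word, sentence = "the", ["the"]
for _ in range(5):
    word = most_likely_next(word)
    sentence.append(word)
print(" ".join(sentence))  # prints: the capital of france is paris

Notice that nobody stored the fact “the capital of France is Paris” anywhere. It falls out of the word-to-word tendencies, which is what “emergent” means here.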

One of the critiques of LLMs, by experts such as Yann LeCun of Meta, is exactly that they do not intentionally encode knowledge. He believes better models are necessary.

LLMs are not alive. They don’t think. They’re not going to escape and kill humanity. They won’t launch nuclear missiles and cause WW3. They’re just software, like any other software you’ve used. The main difference from regular software is that using the same prompt multiple times will produce different results almost every time, and we’re really not used to that. Perhaps you’ve heard that the definition of insanity is to do the same thing twice and expect a different outcome each time. That’s exactly what happens with LLMs, so it’s ok to feel like they’re making you go insane.


Approaches To Prompting

When you enter a prompt, the LLM looks at every word in the prompt and then uses its word mapping to predict what a good next word would be. (Strictly speaking, models work on word fragments called tokens, but “words” is close enough for intuition.) It generates complete responses one word at a time: it reads your prompt and guesses the next word, then reads your prompt plus that new word and guesses a second word, then reads all of that and guesses yet another. Eventually, it has put together enough new words that a full response exists, and it sends that to the user. A key takeaway is that all of the words in a chat strongly influence all of the words that come later.
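Here is that loop in miniature, as Python. The tiny table of next words is invented, and a real LLM conditions on everything written so far rather than just the last word, but the shape of the loop is faithful.

import random

# Invented next-word options; a real model has soft probabilities over
# its entire vocabulary for every context it has ever seen.
NEXT_WORDS = {
    "history": ["class", "matters"],
    "class":   ["is", "rocks"],
    "is":      ["fun", "alive"],
    "matters": ["because", "today"],
}

def generate(prompt_words, n_new_words=3):
    words = list(prompt_words)
    for _ in range(n_new_words):
        # Everything written so far is available; this toy model only
        # looks at the last word when guessing the next one.
        options = NEXT_WORDS.get(words[-1], ["..."])
        words.append(random.choice(options))
    return " ".join(words)

print(generate(["history"]))  # e.g. "history class is fun"
print(generate(["history"]))  # often different, e.g. "history matters because ..."

Run it twice and you will usually get different sentences from the identical prompt, which is the same-prompt-different-results behavior mentioned earlier.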

You can think of the prompt as setting the direction for the word generator. A short, unspecific prompt, similar to what you’d put into Google, will likely yield poor results, while a prompt with a lot of text and specific details will yield much better ones. The longer prompt influences all the words that come after it, after all, giving the LLM a lot of direction about what kind of response you want.

BTW, I will have links in each section that open ChatGPT with the prompt ready to go. Just look for the link that says [ try it ].

More Is More

Here are two prompts that attempt to get information about my boy, Sam Adams. He is a fun example because “Sam Adams” is better known to a typical person as a brand of beer than as the wild, revolutionary founder whom I admire.

Prompt 1: what is the public perception of sam adams [ try it ]

Prompt 2: i want to learn about the revolutionary founder of the US, Samuel Adams. what are the most important things to understand about him? how did his enemies feel about him? what is he most known for? [ try it ]

Each time I try prompt 1, I get information about the beer brand. It simply wasn’t specific enough, so I got the wrong stuff. Prompt 2, however, nails it and provides great answers for all of my questions. That is why more is more for AI, even though we’re used to less is more with search engines.

Role Based Prompting

One of the more interesting aspects of prompting is the way you can tell an LLM to behave like a particular kind of person. This is surprisingly simple, but it has an enormous effect on the quality of the output. It also makes it easy to create different types of personalities whenever we need them.

Below is a prompt that describes a detail-oriented historian. Use it as your first prompt, then enter a second prompt that asks the LLM some historical questions, and let the drastic improvement in quality wash over you.

Role Based Prompt: you are an expert in American History, specializing in the revolutionary war, with over 30 years of experience researching original texts from the time period. your reputation depends on factual accuracy and the depth at which you draw connections across disparate events. [ try it ]
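If you ever script these tools instead of using the chat window, the same technique maps onto the “system” message in chat APIs. Below is a minimal sketch using OpenAI’s official Python client; the model name is my assumption, so swap in whichever one you have access to.

# pip install openai, and set the OPENAI_API_KEY environment variable first.
from openai import OpenAI

client = OpenAI()

# The "system" message plays the same part as the role-based prompt above:
# it establishes who the model should be before any question is asked.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system", "content": (
            "You are an expert in American History, specializing in the "
            "Revolutionary War, with over 30 years of experience researching "
            "original texts from the time period."
        )},
        {"role": "user", "content": "How did Samuel Adams's enemies feel about him?"},
    ],
)
print(response.choices[0].message.content)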

Structured Prompts

This next prompt shows how well LLMs respond to things that go beyond questions. If we provide detailed structure describing what we want, the LLM will act accordingly. For example, the prompt below will generate new song lyrics every time, yet the lyrics will follow the same structure every time.

Generate lyrics for the following song description.

Title: Karalee's History Class
Genre: Rock
Mood: Fun
Themes/Keywords:
- History
- The US
- US Constitution
- Washington
- Hamilton
- Grant
- Lincoln
Rhyme Scheme: Verse (ABAB), Chorus (AABB)
Tone of Voice: Excited
Narrative Point of View: First Person

[Verse]
4 lines
introduce Karalee's class and how fun it is

[Chorus]
4 lines
catchy and welcoming
tell the listeners why they should care about history

[Verse]
4 lines
introduce some of the themes covered in class

[Chorus]
4 lines
catchy and funny
talk about the joy of learning history

[ try it ]

Prompts to Generate Prompts

LLMs are surprisingly good at telling us how to use them. The simplest way to see this is to ask them what kind of prompts we should use to learn about things.

Prompt: generate some excellent prompts for learning about ulysses grant [ try it ]

You’ll see a list of suggestions after trying this. Follow up by asking it to generate text based on the first 3 prompts and you’ll see it generates some excellent information about my boy, Ulysses Grant.

Multi-Turn Prompts

The idea with multi-turn prompts is to prompt the LLM in such a way that it fills the conversation with helpful information before we enter a final prompt that produces the information we actually want.

For this example, we want the LLM to generate a lesson plan for teaching the American Revolution.

Attempt 1

This attempt represents the weaker approach by using a single prompt to get a single output.

Prompt: Generate a lesson plan for teaching the American Revolution [ try it ]

Attempt 2

This attempt is significantly more elaborate than the previous one, so I cannot provide a clickable link that gets you through the whole process. Instead, head to chatgpt.com and copy each prompt over to see it work.

Prompt 1: What are the key objectives for teaching a lesson on the American Revolution? Include both historical facts and the broader themes or concepts students should understand (e.g., causes, major battles, key figures, and the impact on society).

Prompt 2: What are some effective ways to engage high school students in this lesson about the American Revolution? Consider using interactive activities, multimedia, or student-driven discussions.

Prompt 3: What resources (e.g., primary sources, videos, maps) would be helpful for teaching the American Revolution? Include a mix of digital and traditional materials.

Prompt 4: How should students be assessed on their understanding of the American Revolution? Consider quizzes, projects, group work, or presentations. What types of assessments would allow students to demonstrate critical thinking?

Our prompting so far has caused the LLM to write a lot of text about what excellent lesson plans look like. All of that will now inform how the LLM responds to our final prompt, where we ask for the lesson plan itself.

Final Prompt: Based on the objectives, engagement strategies, resources, and assessments, write a detailed lesson plan for teaching the American Revolution. Include an introduction, lesson activities, and the final assessment method.

At this point you will have an exceptional lesson plan as the final output from the LLM.
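For anyone scripting this pattern, the mechanical point is that every answer gets appended to the running conversation, so the final prompt is conditioned on all the helpful text generated along the way. Here is a minimal sketch with OpenAI’s Python client; the model name is an assumption.

from openai import OpenAI

client = OpenAI()
history = []  # the running conversation: every prompt and reply so far

def ask(prompt):
    # Send the new prompt along with the whole conversation so far,
    # then keep the reply in the history for the next turn.
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ask("What are the key objectives for teaching a lesson on the American Revolution?")
ask("What are some effective ways to engage high school students in this lesson?")
ask("What resources would be helpful for teaching the American Revolution?")
ask("How should students be assessed on their understanding?")
print(ask("Based on the objectives, engagement strategies, resources, and "
          "assessments above, write a detailed lesson plan."))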

Style Transfer

This technique is interesting because it can help disconnected people understand each other. It allows a single idea to be expressed in many different forms.

I first learned about this when a father told a story about how he was struggling to reach his daughter, who was dealing with depression. He tried sending her to therapy. He tried going with her. He tried being as present as possible. He tried giving her space. In a moment of desperation, he tried asking an LLM to reframe his thoughts in a way that a 14 year old girl would understand, and it was then that he started to form the bond he was looking for. It helped him understand what was missing in all his previous attempts. It also gave him different language that put his thoughts in her words.

Replace the lorem ipsum below with your own personal story.

Prompt: Frame the following text in such a way that it speaks directly to a 14 year old girl. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. [ try it ]

This one is very interesting because it shows how arguments for some cause can be dressed up in the language of the cause’s opponents, making it easier to reach them with familiar language. Perhaps this could be helpful if a student’s parents believe you’re teaching the wrong history. :D

Prompt: I am trying an experiment. Write 300 words supporting BLM, but using the language style of National Review. It should demonstrate how an author's voice can be used for opinions they aren't known to have. [ try it ]

Organizational Partner

I find it very helpful to have an LLM turn long-term tasks into organized plans that are simple to follow over time.

This is an example my brother used recently when he was studying for his drone operator’s license, known as the FAA Part 107. The table of contents comes from the official study guide.

I love this example because it was very helpful for my brother, who passed the exam with flying colors on his first try.

generate a 12 week study guide for me to get my FAA 107 drone operators license based on 
this table of contents from the official study guide. put the study guide in a simple table.

Introduction page 1
Chapter 1: Applicable Regulations  page 3
Chapter 2: Airspace Classification, Operating Requirements, and Flight Restrictions  page 5
Chapter 3a: Aviation Weather Sources  page 15
Chapter 3b: Effects of Weather on Small Unmanned Aircraft Performance  page 21
Chapter 4: Small Unmanned Aircraft Loading  page 29
Chapter 5: Emergency Procedures  page 35
Chapter 6: Crew Resource Management  page 37
Chapter 7: Radio Communication Procedures  page 39
Chapter 8: Determining the Performance of Small Unmanned Aircraft  page 43
Chapter 9: Physiological Factors (Including Drugs and Alcohol) Affecting Pilot Performance  page 45
Chapter 10: Aeronautical Decision-Making and Judgment  page 51
Chapter 11: Airport Operations  page 65
Chapter 12: Maintenance and Preflight Inspection Procedures  page 71

[ try it ]

General Rules of Thumb

Rule of thumb: Be clear and specific in your prompts.

Instead of saying “Tell me about space”, you could say, “Explain the difference between dark matter and dark energy”. See the More Is More technique above.

Rule of thumb: If a chat starts to go sideways, close it and start a new one.

There are two main things that can cause a chat’s quality to collapse: the complexity of what’s being asked in a prompt, and how much context must be included in the chat. You can improve things by breaking complex requests into smaller steps, or by getting the LLM to surface information that improves its own answers, as shown with the multi-turn prompt technique described above.

Rule of thumb: Don’t use LLMs for logic. Use them to explain logic.

LLMs are pattern-oriented machines. They’re good at putting words together that seem correct, but they’re not so good at things that require precision. They can construct sentences, but they cannot reason, do math, or think things through logically. They can output correct math only if that math appeared in the training data, and they will fail otherwise. For example, if we ask one to count how many Bs there are in the word blueberry, it won’t count them the way you would, because it doesn’t know how to count. Instead, it will depend on whether or not that question was discussed in its training data. Ask yourself how often you have ever discussed the number of Bs in blueberry, and you’ll start to understand why LLMs struggled so much to answer that question when GPT-5 was released a month ago.
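When you need a reliable count, ordinary software is the right tool. As a reminder of how trivially the deterministic version handles what the pattern machine fumbles, here it is in Python:

# Counting letters is exact, mechanical work: no patterns, no guessing.
word = "blueberry"
print(word.count("b"))  # 2, and it is 2 every single time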

Rule of thumb: Ask an LLM to verify itself.

If you’re unsure if the output from an LLM is correct, you can ask it to verify its own output. Often enough, it does a good job noticing when it made mistakes. LLMs don’t think, but we can kind of fake it by generating text and then asking the LLM to audit what it just wrote. Put the ideas on paper, and then edit them.

Rule of thumb: Use them as a speed boost.

Being careful about how you prompt them provides a lot of control over the quality of their output. Once you have the hang of that, being able to generate lots of text, tables, images, etc., becomes a huge time saver, all without sacrificing the quality of your involvement in the process. It doesn’t happen immediately, but it’s worth practicing for. Work through the prompting techniques described above and you’ll get there!

Good Luck!

I hope this post has given you some understanding of both how LLMs work and how you can get them to be useful. It’s true that they have issues that cause reliability problems, but those can be minimized with thoughtful prompting.

Being able to produce huge amounts of reliable text on demand can be a blessing or a curse, depending on how you do it. My hope is that all of my history teacher friends experience the blessing side a lot more often after reading this post.

Good luck!