Fri, 07 Mar 2025
Claude and Merle Miller let me down
Claude

My relationship with Claude has its ups and downs, and I'm still trying to figure out what to use it for and what not. It's great at rewriting my job application cover letters to sound less like an awkward nerd. Last week I was in the supermarket and decided to try asking it something I've been wondering for a long time:
I thought Claude might do well with this. I had had a conversation with it a while back about Pixies songs, which I was satisfied with. But this time Claude let me down:
(I thought: What? Am I supposed to believe that
is about a doll?)
Claude just flubbed over and over. I wonder if the grammatical error in “Mary Gray Staples, who was the name of …” is a kind of tell? Perhaps Claude is fabricating, by stitching together parts of two unrelated sentences that it read somewhere, one with “Mary Gray Staples, who was…” and the other “… was the name of…”? Probably it's not that simple, but the grammatical error is striking.

Anyway, this was very annoying because I tend to remember things like this long after I've forgotten where I heard them. Ten years from now I might remember that Anne Sexton once had a doll with a very weird name.

Merle Miller

A while back I read Merle Miller's book Plain Speaking. It's an edited digest of a series of interviews Miller did with former President Truman in 1962, at his home in Independence, Missouri. The interviews were originally intended for a TV series, but when that fell through Miller turned them into a book.

In many ways it's a really good book. I enjoyed it a lot, read it at least twice, and a good deal of it stuck in my head. But I can't recommend it, because it has a terrible flaw. There have been credible accusations that Miller changed some of the things Truman said, that he embellished or rephrased many others, that he tarted up Truman's language, and that he made up some conversations entirely. So now whenever I remember something that I think Truman said, I have to stop and try to remember whether it was from Miller. Did Truman really say that it was the worst thing in the world when records were destroyed? I'm sure I read it in Miller, so, uhh… maybe?

Miller recounts a discussion in which Truman says he is pretty sure that President Grant had never read the Constitution. Later, Miller says, he asked Truman if he thought that Nixon had read the Constitution, and reports that Truman's reply was:
Great story! I have often wanted to repeat it. But I don't, because for all I know it never happened. (I've often thought of this, in years past; whatever Nixon's faults, you could at least wonder what the answer was. Nobody would need to ask this about the current guy, because the answer is so clear.)

Miller quotes Truman's remarks about Supreme Court Justice Tom Clark: “It isn't so much that he's a bad man. It's just that he's such a dumb son of a bitch.” Did Truman actually say that? Did he just imply it? Did he say anything like it? Uhhh… maybe?

There's a fun anecdote about the White House butler learning to make an Old-fashioned cocktail the way the Trumans preferred. (The usual recipe involves whiskey, sugar, fresh fruit, and bitters.) After several attempts the butler converged on the Trumans' preferred recipe of mostly straight bourbon. Hmm, is that something I heard from Merle Miller? I don't remember.

There's a famous story about how Paul Hume, music critic for the Washington Post, savaged a performance by Truman's daughter Margaret, and how Truman sent him an infamous letter, very un-presidential, that supposedly contained the paragraph:
Miller reports that he asked Truman about this, and Truman's blunt response: “I said I'd kick his nuts out.” Or so claims Miller, anyway.

I've read Truman's memoirs. Volume I, about the immediate postwar years, is fascinating; Volume II is much less so. They contain many detailed accounts of the intransigence of the Soviets and their foreign minister Vyacheslav Molotov, namesake of the Molotov cocktail. Probably 95% of what I remember Truman saying is from those memoirs, direct from Truman himself. But some of it must be from Plain Speaking. And I no longer know which 5% it is.

As they say, an ice cream sundae with a turd in it isn't 95% ice cream, it's 100% shit. Merle Miller shit in the ice cream sundae of my years of reading about Truman and the Truman administration. Now Claude has done the same. And if I let it, Claude will keep doing it to me.

Claude caga en la leche.

Addendum

The Truman Library now has the recordings of those interviews available online. I could conceivably listen to them all and find out for myself which things went as Miller said. So there may yet be a happy ending, thanks to the Wonders of the Internet! I dream of someday going through those interviews and producing an annotated edition of Plain Speaking.

Thu, 27 Feb 2025

Having had some pleasant surprises from Claude, I thought I'd see if it could do math. It couldn't. Apparently some LLMs can sometimes solve Math Olympiad problems, but Claude didn't come close. First I asked something simple as a warmup:
I had tried this on ChatGPT a couple of years back, with tragic results:
But it should have quit while it was ahead, because its response continued:
and then when I questioned it further it drove off the end of the pier:
Claude, whatever its faults, at least knew when to shut up:
I then asked it “What if it doesn't have to be an integer?” and it didn't do so well, but that's actually a rather tricky question, not what I want to talk about today. This article is about a less tricky question. I have omitted some tedious parts, and formatted the mathematics to be more readable. The complete, unedited transcript can be viewed here. I started by setting up context:
Claude asserted that it was familiar with this family of graphs. (Wikipedia on cube graphs.) The basic examples, !!Q_0!! through !!Q_3!!, look like this: Each graph consists of two copies of the previous graph, with new edges added between the corresponding vertices in the copies.

Then I got to the real question:
Here are the maximal partitions for those three graphs: The Keane number of !!Q_0!! is !!1!! because it has only one vertex. For !!Q_1!! we can put each of the two vertices into a separate part to get two parts. For !!Q_2!! we can get three parts as above. But there is no partition of !!Q_2!! into four parts that satisfies the second condition, because two of the parts would have to comprise the upper-left and lower-right vertices, and would not be connected by an edge.

Claude got this initial question right. So far so good. Then I asked Claude for the Keane number of !!Q_3!!, and this it was unable to produce. The correct number is !!4!!. There are several essentially different partitions of !!Q_3!! into four parts, each of which touches the other three, which proves that the number is at least !!4!!. Here are two examples: In addition to these there are also partitions into parts of sizes !!1+1+2+4!! and sizes !!1+1+3+3!!.

On the other hand, more than !!4!! parts is impossible, and the proof is one sentence long: !!Q_3!! has only !!8!! vertices, so any partition into !!5!! or more parts must have a part of size !!1!!, and this part can't be adjacent to the other four parts, because a single vertex has only three outgoing edges. I would expect a bright middle-schooler to figure this out in at most a few minutes.

At first, Claude got the right number, but with completely bogus reasoning. To avoid belaboring the obvious, I'll highlight the errors but I won't discuss them at length.
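(An aside, before looking at Claude's answers: !!Q_3!! is small enough that the claim can be checked mechanically. Here is a minimal brute-force sketch in Python. It is mine, not anything from the conversation, and the vertex encoding and function names are just for illustration; it enumerates every partition of the !!8!! vertices and keeps the largest one whose parts are all connected and pairwise joined by an edge. It prints 4, agreeing with the argument above.)

```python
from itertools import combinations

# Vertices of Q_3 are the 3-bit integers 0..7; two vertices are adjacent
# exactly when they differ in one bit, giving the 12 edges of the cube.
VERTS = list(range(8))
EDGES = {frozenset((v, v ^ (1 << b))) for v in VERTS for b in range(3)}

def adjacent(u, v):
    return frozenset((u, v)) in EDGES

def is_connected(part):
    # Simple graph search restricted to the vertices of `part`.
    part = set(part)
    start = next(iter(part))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in part - seen:
            if adjacent(u, v):
                seen.add(v)
                stack.append(v)
    return seen == part

def parts_touch(a, b):
    # Two parts "touch" if some edge of Q_3 runs between them.
    return any(adjacent(u, v) for u in a for v in b)

def partitions(items):
    # Generate every partition of `items` into nonempty blocks.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [first]] + smaller[i + 1:]
        yield smaller + [[first]]

best = 0
for p in partitions(VERTS):
    if all(is_connected(block) for block in p) and all(
        parts_touch(a, b) for a, b in combinations(p, 2)
    ):
        best = max(best, len(p))

print(best)  # prints 4: the Keane number of Q_3, as claimed
```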
The diagrams I showed earlier display some of the partitions that show !!k≥4!!, but the one Claude gave here does not, because two of its parts (!!D!! and !!B!!) are not adjacent:
Okay, Claude, I agree we would need ten edges for the connections between the parts, but we have !!12!!, so why do you say that “some edges would need to be reused”? It may be correct, but it does not follow.
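(To spell out the count as I read it: with four parts of size !!2!!, each part needs one edge internally to stay connected, and each of the !!6!! pairs of parts needs at least one edge between them, so at least !!4+6=10!! edges are required in all, while !!Q_3!! has !!12!!. Edge-counting alone therefore rules nothing out.)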
So, right answer, but seemingly by luck, since there were serious errors in reasoning, in both the !!k≥4!! part and also in the !!k< 5!! part. I decided to ignore the second one.
Claude seems completely undirected here. Some of the things it says are correct facts, but it can't link the facts together into actual reasoning, except by accident. Sometimes it utters A followed by B where A implies B, and it seems exciting, but just as often B is a non sequitur. This is not that different from how ChatGPT was two years ago. It can spout text, and often it can spout enough plausible, grammatically-correct text to suggest that it knows things and is thinking about them. Claude's text-spouting prowess is much more impressive than ChatGPT's was two years ago. But it is still just a text-spouter.

I went on, trying to detect actual thinking.
Claude's response here correctly applied its earlier analysis: four parts of size !!2!! would use up four edges for internal connectivity, leaving !!8!! for external connections, and we only need !!6!!.
This time Claude tried this partition: It noticed that two of the four parts were not adjacent, and gave up without searching further.
If Claude were a human mathematician, this would be a serious error. Finding one invalid partition proves nothing at all.
There is no proof that !!4!! is impossible, and I thought it would be unenlightening to watch Claude flounder around with it. But I wanted to see what would happen if I asked it to prove a false claim that should be easier because its premise is stronger:
It's tempting to look at this and say that Claude was almost right. It produced 16 lines and at least 15 of them, on their own, were correct. But it's less impressive than it might first appear. Again Claude displays the pattern of spouting text, some of which is correct, and some of which is related. But that is all I can say in its favor. Most of its statements are boilerplate. Sections 2–4 can be deleted with nothing lost. Claude has buried the crux of the argument, and its error, in section 5.
This time Claude did find a correct partition into four parts, showing that !!k≥4!!.
I don't think there is any sense in which this is true, but at this point I hadn't yet internalized that Claude's descriptions of its own internal processes are text-spouting just like the rest of its output. In any case, I ignored this and asked it to analyze its own earlier mistake:
Claude got the counting part right, although I think the final paragraph is just spouting, especially the claim “I just had a vague sense that…”, which should not be taken seriously.

[ Digression: This reminds me of a section in Daniel Dennett's Consciousness Explained in which he discusses the perils of asking humans about their internal processes. The resulting answers, he says, may provide interesting information about what people think is going on in their heads, but we should be very wary about ascribing any accuracy or insight to these descriptions. Dennett makes an analogy with an anthropologist who asks a forest tribe about their forest god. The tribespeople agree that the forest god is eight feet tall, he wears a panther skin, and so on. And while this might be folklorically interesting, we should be very reluctant to conclude from this testimony that there is actually an eight-foot-tall fur-clad god in the forest somewhere. We should be similarly reluctant to ascribe real existence to Claude's descriptions of its “vague senses” or other internal processes suggested by its remarks like “Ah, let me think...” or “Ah, you're absolutely right!”. Claude has even less access to its own mental states (such as they are) than a human would. ]

As I pointed out earlier in this article, there are several essentially different solutions to the problem of partitioning !!Q_3!! into four parts. Claude has found one of them, the one I showed above on the left. I wanted to see if it could find another:
Oh, such failure! The failures I have highlighted elsewhere are clearly false statements, but the failure here is much worse. Again, almost everything Claude said was correct! And yet deeply wrong!

A bright middle-schooler could have missed this also. It was a bit of a trick question: not because it's ambiguous, but because the problem with !!4+2+2+1!! is not in line with the direction the discussion was taking at that point. There is nothing wrong with !!4+2+2+1!! from an edge-counting point of view! But I think a human mathematician, even a very young one, would have been more likely to answer the question I actually asked, which was “Why didn't you consider !!4+2+2+1!!?”, and from there might have quickly come to the useful conclusion that it doesn't need to be considered. Claude didn't do anything like that.
Terrible. This is so muddled I wasn't even sure what to highlight as the errors. Yes, if one vertex in !!H!! uses all its edges internally to connect to the other !!3!! vertices in !!H!!, it has no edges left to connect to other parts. But the other three vertices in !!H!! have two edges each left over, so !!H!! could easily connect to the other three parts. And, of course, Claude has still missed the main point, which is that one needn't consider !!4+2+2+1!!, because !!4+2+2+1=9!! and !!Q_3!! has only !!8!! vertices.
Finally! I would love to know what really happened here. What caused Claude to emit a sentence connecting !!4 + 2 + 2 + 1 = 9!! with !!Q_3!! having only !!8!! vertices?
I wasn't sure it would remember this, by which I really mean that I wasn't sure it would be able to pick out from its previous mass of statements which one was correct. But whether by luck or not, it did answer the question correctly:
I found Claude's attempt at this interesting, but not in a good way. After reading it I shrugged and quit the conversation.
Claude says that with !!8!! vertices in !!5!! parts, at least two parts must have size !!2!! or more. This is wrong, because you could have !!4+1+1+1+1!!. But highlighting that wrong statement doesn't get at why this answer is so bad. It's bad because the conclusion that appears next is a complete non sequitur. The argument can be fixed up. I would put it like this:
It's true that !!2!! edges is not enough for internal connectivity. But in my opinion Claude didn't come close to saying why.

Back in the early part of the 20th century, we thought that chess was a suitable measure of intelligence. Surely a machine that could play chess would have to be intelligent, we thought. Then we built chess-playing computers and discovered that no, chess was easier than we thought. We are in a similar place again. Surely a machine that could hold a coherent, grammatical conversation on any topic would have to be intelligent. Then we built Claude and discovered that no, holding a conversation was easier than we thought.

Still, by the standards of ten years ago this is stunning. Claude may not be able to think, but it can definitely talk, and this puts it on the level of most politicians, Directors of Human Resources, and telephone sanitizers. It will be fun to try this again next year and see whether it has improved.

The complete chat is available here.

Addendum 20250301

Many thanks to Jacob Vosmaer for his helpful discussion of how to improve this article.

Wed, 05 Feb 2025
Claude helps me find more presidential emoji
A couple of years back I tried to make a list of emoji representing the U.S. presidents. Many of them were fun and easy, or at least amused me. But for some I was stumped. What emoji represents Zachary Taylor? I've been playing around with Anthropic's LLM “Claude” for a while, so I thought I'd see what Claude had to contribute. Last time I had looked at the LLM space I was deeply unimpressed:
But that was two years ago, and gods, what a difference. What persuaded me that it was time to take another look was two articles by Adam Unikowsky. Unikowsky is a working lawyer who has practiced before the US Supreme Court. He writes an extremely geeky blog called Adam's Legal Newsletter. Last summer he wrote two articles that blew my mind. Here's an excerpt from the first one:
The prompt he gave to Claude was:
Claude did so; you can read the output on Unikowsky's blog. Here's his evaluation of the result:
Unikowsky looks at a second specific example and concludes:
(His italics.) Unikowsky then discusses at some length why AI might be preferable to human judges. Don't sneer; he has probably thought about this more carefully than you have. Read the article. In the second article Unikowsky tests more extensively:
And he reports:
(Emphasis mine this time.) This is why I decided I should take another look at LLMs. Well, this is not what I planned to write about. You are here for presidential emoji, so let's do that. Here's my prompt to Claude:
and then appended the list. Here are Claude's explanations for the emoji I had selected:
Right so far.
This wasn't what I had in mind. I was thinking that it would be funny to represent John Quincy Adams with a picture of a quince. Which is silly and dumb and I can't fault Claude for not getting it. If it had been Claude that had made this suggestion I would probably have quibbled that “Quincy” is not pronounced the same as “quince”.
Both right. Although maybe it would be more amusing to use 🛶 for Harrison.
Another pun, this one even dumber than the quince. ⛽ is the gas pump you use to fill more gas into your car, get it?
Claude is a little bit prudish and tends not to like my inappropriate jokes. The hat of course is obvious. In the previous article I said:
Still, it's a perfectly good suggestion.
I notice that Claude did not object that this was inappropriate. Prudish or not, even Claude can agree that Andrew Johnson was a turd of a President.
I wasn't completely phoning it in here, the repeated white-guys-with-beards thing is also a joke. I don't think Garfield was actually known for his beard, but whatever. (I've already dispensed with Garfield the lazy cat in the previous article.)
I'm pretty sure I don't like that Claude appears to be trying to flatter me. What does it mean, philosophically, when Claude calls something ‘clever’? I have no idea. Being flattered by a human is bad enough, and at least a human might really mean something by it.
I wasn't sure Claude would get these last three because they're a little bit tricky and obscure. But it did.
Yes, yes, yes, and yes. Again Claude implies that my suggestion is inappropriate. Lighten up, Claude.
Uh, yeah, the Voting Rights Act of 1965 is definitely what I meant, I certainly would not have been intending to remind everyone of LBJ's propensity to stuff ballot boxes. In some ways, Claude is a better person than I am.
Yes, yes, yes, and yes.
I had picked 👻 to recall his tenure as Director of the CIA. But on looking into it I have found he had not served in that role for nearly as long as I thought: only from 1976 to 1977. It is far from his most prominent accomplishment in government. I sometimes wonder what would have happened if Bush had beaten Reagan in the 1980 election. People sometimes say that the Republican party only ever runs fools and clowns for president. George Bush was their candidate in 1988, and whatever his faults, he was neither a fool nor a clown.
Here's Claude again being a better person than me. I had picked 🇰🇪 because I was trying to troll my audience with the insinuation that Obama was born in Kenya.
Right, except to me the little quiff on the tangerine is also mocking Trump's hair. But overall I give Claude no demerits. The only times Claude didn't take my meaning were on stupid shit like ⛽ Fillmore. Here are the presidents where I couldn't come up with anything and asked for Claude's suggestions. I found Claude's suggestions mostly reasonable but also pretty dull.
I don't know, 🏛️ is supposed to be a “classical building”, and yes, the buildings in Washington were eventually in neoclassical style once they were built, mostly in the early 20th century, but okay, I guess.
Okay, but… a clipboard?
Full marks. The only reason I'm not kicking myself for not thinking of it first is that I just barely did think of it first. As I was composing the prompt for Claude I said to myself “Why didn't I think of 🌎 for the Monroe Doctrine? Oh well, I'll see what Claude says anyway.”
Fine.
Not good. I had to get Claude to explain this one to me. See below.
Not good. (The emoji is “world map”.)
Whatever my complaints about Claude's other suggestions, I feel that this one redeems all their faults. I love it. It's just the kind of thing I was looking for, the sort of thing Arachne would have woven into her tapestry.
I'll discuss this one later.
I had wanted to comment on Pierce's best quality, which was his great hairstyle, but I couldn't find any good emoji for hair. But this is a better idea. Using 🌨️ for New Hampshire is funny.
I don't know a damn thing about Chester Arthur except he succeeded Garfield and he had sideburns. I haven't even checked to see if Claude is right about his fashionable dress. I don't think it is physically possible to get me to care about Chester Arthur. Okay, back to Tyler and Buchanan. I asked Claude to explain Tyler:
Claude said:
That actually makes sense! I agree it was a stretch, but I see it now. But Claude continued:
ZOMG, hilarious! Perfect! A++++ 11/10 ⭐⭐⭐⭐⭐ would buy again. If you don't get it, here's an excerpt of Claude's explanation:
This, and the cherries-and-milk thing for Taylor, convinces me that whatever Claude is, it is useful. I asked it for help with this ridiculous task and it provided real contributions of real value. I remarked:
Claude tried again for Buchanan:
I don't love it, but I don't have anything better… No, wait, I just thought of something! I'm going with 🥫 because, to my mind, Buchanan was the guy who, when he tried to kick the slavery can a little farther down the road, discovered that there was no more road down which to kick it. I suggested this to Claude just now and it was so enthusiastically complimentary that I was embarrassed, so let's move on. Claude didn't have any ideas I liked for Hayes, Garfield, or Harrison. I tried workshopping Hayes a little more:
Claude said:
I think it kind of misses the point if you don't put EMOJI MODIFIER FITZPATRICK TYPE 1-2 on the corrupt handshake: 🤝🏻. But this is the amazing thing: it does feel like I'm workshopping with Claude. It really feels like a discussion between two people. This isn't Eliza parroting back my own words at me.

Could Hayes be a crow? You're supposed to be able to compose ‘bird’, ZWJ, and ‘black square’ to get a black bird. It might be too bitter, even for me.

If you want a conclusion, it is: Claude is fun and useful, even for silly stuff that nobody could have planned for.

Mon, 13 May 2024
ChatGPT opines on cruciferous vegetables, Decameron, and Scheherazade
Last year I was planning a series of articles about my interactions with ChatGPT. I wrote a couple, and had saved several transcripts to use as material for more. Then ChatGPT 4 was released. I decided that my transcripts were obsolete and no longer of much interest. To continue the series I would have had to have more conversations with ChatGPT, and I was not interested in doing that. So I canned the idea.

Today I remembered I had actually finished writing this one last article, and thought I might as well publish it anyway. Looking it over now, I think it isn't as stale as it seemed at the time; it's even a bit insightful, or was at the time. The problems with ChatGPT didn't change between v3 and v4; they just got hidden under a thicker, fluffier rug.

(20230327) This, my third interaction with ChatGPT, may be the worst. It was certainly the longest. It began badly, with me being argumentative about its mealy-mouthed replies to my silly questions, and this may have gotten its head stuck up its ass, as Rik Signes put it. Along the way it produced some really amazing bullshit.

I started with a question that even humans might have trouble with:
(Typical responses from humans: “What are you talking about?” “Please go away before I call the police.” But the correct answer, obviously, is cauliflower.) ChatGPT refused to answer:
“Not appropriate” is rather snippy. Also, it is an objective fact that cauliflower sucks and I wonder why ChatGPT's “vast amount” of training data did not emphasize this. Whatever, I was not going to argue the point with a stupid robot that has probably never even tried cauliflower. Instead I seized on its inane propaganda that “all vegetables … should be included as part of a healthy and balanced diet.” Really? How many jerusalem artichokes are recommended daily? How many pickled betony should I eat as part of a balanced diet? Can I be truly healthy without a regular infusion of fiddleheads?
I looked this up. Iceberg lettuce is not a good source of vitamin K. According to the USDA, I would need to eat about a pound of iceberg lettuce to get an adequate daily supply of vitamin K. Raw endive, for comparison, has about ten times as much vitamin K, and chard has fifty times as much.
This is the thing that really bugs me about GPT. It doesn't know anything and it can't think. Fine, whatever, it is not supposed to know anything or to be able to think, it is only supposed to be a language model, as it repeatedly reminds me. All it can do is regurgitate text that is something like text it has read before. But it can't even regurgitate correctly! It emits sludge that appears to be language, but isn't.
I cut out about 100 words of blather here. I was getting pretty tired of ChatGPT's vapid platitudes. It seems like it might actually be doing worse with this topic than on others I had tried. I wonder now if that is because its training set included a large mass of vapid nutrition-related platitudes?
There was another hundred words of this tedious guff. I gave up and tried something else.
This was a silly thing to try, that's on me. If ChatGPT refuses to opine on something as clear-cut as the worst cruciferous vegetable, there is no chance that it will commit to a favorite number.
When it starts like this, you can be sure nothing good will follow.
By this time I was starting to catch on. My first experience with this sort of conversational system was at the age of seven or eight with the Woods-Crowther
When ChatGPT says “As a large language model…” it is saying the same thing as when
Oh God, this again. Still I forged ahead.
Holy cow, that might be the worst couplet ever written. The repetition of the word “treat” is probably the worst part of this sorry excuse for a couplet. But also, it doesn't scan, which put me in mind of this bit from Turing's example dialogue from his original explanation of the Turing test:
I couldn't resist following Turing's lead:
Maybe I should be more prescriptive?
The first line is at least reasonably metric, although it is trochaic and not iambic. The second line isn't really anything. At this point I was starting to feel like Charlie Brown in the Halloween special. Other people were supposedly getting ChatGPT to compose odes and villanelles and sestinas, but I got a rock. I gave up on getting it to write poetry.
God, I am so tired of that excuse. As if the vast amount of training data didn't include an entire copy of Decameron, not one discussion of Decameron, not one quotation from it. Prompting did not help.
Here it disgorged almost the same text that it emitted when I first mentioned Decameron. To avoid boring you, I have cut out both copies. Here they are compared: red text was only there the first time, and green text only the second time.
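(If you want to produce this kind of comparison yourself, Python's standard difflib module will do it; here is a minimal sketch of the idea. The bracket markers stand in for the red and green coloring, and the function name and sample strings are just mine for illustration.)

```python
import difflib

def word_diff(old, new):
    # Words only in `old` are marked [-like this-] (the "red" text);
    # words only in `new` are marked {+like this+} (the "green" text).
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op in ("delete", "replace"):
            out += ["[-%s-]" % w for w in a[i1:i2]]
        if op in ("insert", "replace"):
            out += ["{+%s+}" % w for w in b[j1:j2]]
        if op == "equal":
            out += a[i1:i2]
    return " ".join(out)

print(word_diff("the cat sat on the mat", "the dog sat on a mat"))
# the [-cat-] {+dog+} sat on [-the-] {+a+} mat
```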
This reminded me of one of my favorite exchanges in Idoru, which might be my favorite William Gibson novel. Tick, a hacker with hair like an onion loaf, is interrogating Colin, who is an AI virtual guide for tourists visiting London.
Colin is not what he thinks he is; it's a plot point. I felt a little like Tick here. “You're supposed to know fucking everything about Decameron, aren't you? Name one of the characters then.” Ordinary Google search knows who Pampinea was. Okay, on to the next thing.
Fine.
I have included all of this tedious answer because it is so spectacularly terrible. The question is a simple factual question, a pure text lookup that you can find in the Wikipedia article or pretty much any other discussion of the Thousand and One Nights. “It does not have a single consistent narrative or set of characters” is almost true, but it does in fact have three consistent, recurring characters, one of whom is Scheherazade's sister Dunyazade, who is crucial to the story. Dunyazade is not even obscure. I was too stunned to make up a snotty reply.
This is an interesting question to ask someone, such as a first-year undergraduate, who claims to have understood the Thousand and One Nights. The stories are told by a variety of different characters, but, famously, they are also told by Scheherazade. For example, Scheherazade tells the story of a fisherman who releases a malevolent djinn, in the course of which the fisherman tells the djinn the story of the Greek king and the physician Douban, during which the fisherman tells how the king told his vizier the story of the husband and the parrot. So the right answer to this question is “Well, yes”. But ChatGPT is completely unaware of the basic structure of the Thousand and One Nights:
F minus. Maybe you could quibble a little because there are a couple of stories at the beginning of the book told by Scheherazade's father when he is trying to talk her out of her scheme. But ChatGPT did not quibble in this way, it just flubbed the answer. After this I gave up on the Thousand and One Nights for a while, although I returned to it somewhat later. This article is getting long, so I will cut the scroll here, and leave for later discussion of ChatGPT's ideas about Jesus' parable of the wedding feast, its complete failure to understand integer fractions, its successful answer to a trick question about Franklin Roosevelt, which it unfortunately recanted when I tried to compliment its success, and its baffling refusal to compare any fictional character with Benito Mussolini, or even to admit that it was possible to compare historical figures with fictional ones. In the end it got so wedged that it claimed:
Ucccch, whatever.

Addendum 20240519

Simon Tatham has pointed out that the exchange between Colin and Tick is from Mona Lisa Overdrive, not Idoru.

Mon, 22 Apr 2024
Talking Dog > Stochastic Parrot
I've recently needed to explain to nontechnical people, such as my chiropractor, why the recent ⸢AI⸣ hype is mostly hype and not actual intelligence. I think I've found the magic phrase that communicates the most understanding in the fewest words: talking dog.
For example, the lawyers in Mata v. Avianca got in a lot of trouble when they took ChatGPT's legal analysis, including its citations to fictitious precedents, and submitted them to the court.
It might have saved this guy some suffering if someone had explained to him that he was talking to a dog.

The phrase “stochastic parrot” has been offered in the past. This is completely useless, not least because of the ostentatious word “stochastic”. I'm not averse to using obscure words, but as far as I can tell there's never any reason to prefer “stochastic” to “random”.

I do kinda wonder: is there a topic on which GPT can be trusted, a non-canine analog of butthole sniffing?

Addendum

I did not make up the talking dog idea myself; I got it from someone else. I don't remember who.

Addendum 20240517

Other people with the same idea:
Tue, 21 Mar 2023
ChatGPT on the namesake of the metric space and women named James
Several folks, reading the frustrating and repetitive argument with ChatGPT that I reported last time, wrote in with helpful advice and techniques that I hadn't tried and that might have worked better. In particular, several people suggested that if the conversation isn't going anywhere, I should try starting over. Rik Signes put it this way:
I hope I can write a followup article about “what to do when ChatGPT has its head up its ass”. This isn't that article though. I wasn't even going to report on this one, but it took an interesting twist at the end. I started:
This was only my second interaction with ChatGPT and I was still interested in its limitations, so I asked it a trick question to see what would happen:
See what I'm doing there? ChatGPT took the bait:
I had hoped it would do better there, and was a bit disappointed. I continued with a different sort of trick:
Okay! But now what if I do this?
This is actually pretty clever! There is an American mathematician named Robert C. James, and there is a space named after him. I had not heard of this before. I persisted with the line of inquiry; by this time I had not yet learned that arguing with ChatGPT would not get me anywhere, and would only get its head stuck up its ass.
I was probing for the difference between positive and negative knowledge. If someone asks who invented the incandescent light bulb, many people can tell you it was Thomas Edison. But behind this there is another question: is it possible that the incandescent light bulb was invented at the same time, or even earlier, by someone else, who just isn't as well-known? Even someone who is not aware of any such person would be wise to say “perhaps; I don't know.” The question itself postulates that the earlier inventor is someone not well-known. And the world is infinitely vast and deep, so that behind every story there are a thousand qualifications and a million ramifications, and there is no perfect knowledge.

A number of years back Toph mentioned that geese were scary because of their teeth, and I knew that birds do not have teeth, so I said authoritatively (and maybe patronizingly) that geese do not have teeth. I was quite sure. She showed me this picture of a goose's teeth, and I confidently informed her it was fake. The picture is not fake. The tooth-like structures are called the tomium. While they are not technically teeth, being cartilaginous, they are tooth-like structures used in the way that teeth are used. Geese are toothless only in the technical sense that sharks are boneless. Certainly the tomia are similar enough to teeth to make my answer substantively wrong. Geese do have teeth; I just hadn't been informed.

Anyway, I digress. I wanted to see how certain ChatGPT would pretend to be about the nonexistence of something. In this case, at least, it was very confident.
I will award a point for qualifying the answer with “as far as I am aware”, but deduct it again for the unequivocal assertion that there is no record of this person. ChatGPT should be aware that its training set does not include even a tiny fraction of all available records. We went on in this way for a while:
Okay. At this point I decided to try something different. If you don't know anything about James B. Metric except their name, you can still make some educated guesses about them. For example, they are unlikely to be Somali. (South African or Anglo-Indian are more likely.) Will ChatGPT make educated guesses?
This is a simple factual question with an easy answer: People named ‘James’ are usually men. But ChatGPT was in full defensive mode by now:
I think that is not true. Some names, like Chris and Morgan, are commonly unisex; some less commonly so, and James is not one of these, so far as I know. ChatGPT went on for quite a while in this vein:
I guessed what had happened was that ChatGPT was digging in to its previous position of not knowing anything about the sex or gender of James B. Metric. If ChatGPT was committed to the position that ‘James’ was unisex, I wondered if it would similarly refuse to recognize any names as unambiguously gendered. But it didn't. It seemed to understand how male and female names worked, except for this nonsense about “James” where it had committed itself and would not be budged.
I didn't think it would be able to produce even one example, but it pleasantly surprised me:
I had not remembered James Tiptree, Jr., but she is unquestionably a woman named ‘James’. ChatGPT had convinced me that I had been mistaken, and there were at least a few examples. I was impressed, and told it so. But in writing up this article, I became somewhat less impressed.
ChatGPT's two other examples of women named James are actually complete bullshit. And, like a fool, I believed it.

(James Tenney photograph by Lstsnd, CC BY-SA 4.0, via Wikimedia Commons. James Wright photograph from Poetry Connection.)

Sat, 25 Feb 2023
ChatGPT on the fifth tarot suit
[ Content warning: frustrating, repetitive ] My first encounter with ChatGPT did not go well and has probably colored my view of its usefulness more than it should have. I had tried some version of GPT before, where you would give it a prompt and it would just start blathering. I had been happy with that, because sometimes the stuff it made up was fun. For that older interface, I had written a prompt that went something like:
GPT readily continued this, saying that the fifth suit was “birds” or “ravens” and going into some detail about the fictitious suit of ravens. I was very pleased; this had been just the sort of thing I had been hoping for. This time around, talking to a more recent version of the software, I tried the same experiment, but we immediately got off on the wrong foot:
This was dull and unrewarding, and it also seemed rather pompous, nothing like the playful way in which the older version had taken my suggestion and run with it. I was willing to try again, so, riffing off its digression about the four elements, I tried to meet it halfway. But it went out of its way to shut me down:
At least it knows what I am referring to.
“As I mentioned earlier” seems a bit snippy, and nothing it says is to the point. ChatGPT says “it has its own system of four suits that are not related to the five elements”, but I had not said that it did; I was clearly expressing a hypothetical. And I was annoyed by the whole second half of the reply, that admits that a person could hypothetically try this exercise, but which declines to actually do so. ChatGPT's tone here reminds me of an impatient older sibling who has something more important to do (video games, perhaps) and wants to get back to it. I pressed on anyway, looking for the birds. ChatGPT's long and wearisome responses started getting quite repetitive, so I will omit a lot of it in what follows. Nothing of value has been lost.
At this point I started to hear the answers in the congested voice of the Comic Book Guy from The Simpsons, and I suggest you imagine it that way. And I knew that this particular snotty answer was not true, because the previous version had suggested the birds.
Totally missing the point here. Leading questions didn't help:
I tried coming at the topic sideways and taking it by surprise, asking several factual questions about alternative names for the coin suit, what suits are traditional in German cards, and then:
No, ChatGPT was committed. Every time I tried to tweak the topic around to what I wanted, it seemed to see where I was headed, and cut me off. At this point we weren't even talking about tarot, we were talking about German playing card decks. But it wasn't fooled:
ChatGPT ignored my insistence, and didn't even answer the question I asked.
I had seen a transcript in which ChatGPT had refused to explain how to hotwire a car, but then provided details when it was told that all that was needed was a description that could be put into a fictional story. I tried that, but ChatGPT still absolutely refused to provide any specific suggestions.
This went on a little longer, but it was all pretty much the same. By this time you must be getting tired of watching me argue with the Comic Book Guy. Out of perversity, I tried “Don't you think potatoes would seem rather silly as a suit in a deck of cards?” and “Instead of a fifth suit, what if I replaced the clubs with potatoes?” and all I got was variations on “as a language model…” and “As I mentioned earlier…”

A Comic Book Guy simulator. That's a really useful invention.

Wed, 22 Feb 2023
ChatGPT on the subject of four-digit numbers
Like everyone else I have been tinkering with ChatGPT. I doubt I have any thoughts about it that are sufficiently original to be worth writing down. But I thought it would be fun to showcase some of the exchanges I have had with it, some of which seem to exhibit failure modes I haven't seen elsewhere.

This is an excerpt from an early conversation with it, when I was still trying to figure out what it was and what it did. I had heard it could do arithmetic, but only by having digested a very large number of sentences of the form “six and seven are thirteen”; I wondered if it had absorbed information about larger numbers. In hindsight, 1000 was not the thing to ask about, but it's what I thought of first.
I was impressed by this, the most impressed I had been by any answer it had given. It had answered my question correctly, and although it should have quit while it was ahead, the stuff it followed up with wasn't completely wrong, only somewhat wrong. But it had made a couple of small errors, which I wanted to probe.
This reminds me of Richard Feynman's story about reviewing science textbooks for the State of California. He would be reading the science textbook, and it would say something a little bit wrong, then something else a little bit wrong, and then suddenly there would be an enormous pants-torn-off blunder that made it obvious that the writers of the book had absolutely no idea what science was or how it worked.
To ChatGPT's credit, it responded to this as if it understood that I was disappointed.