Thanks a lot for this post! As someone who has been doing some work on the use of AI tools for research in the social and behavioral sciences (resulting in a couple of workshops/talks and publications), I find this very interesting.
Regarding your hunch about the word "delve(d)", a recently published preprint provides some systematic empirical evidence for this:
Kobak, D., Márquez, R. G., Horvát, E.-Á., & Lause, J. (2024). Delving into ChatGPT usage in academic writing through excess vocabulary (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2406.07016
Thanks for the citation!
Examples in (2) seem to kind of miss the point: those all seem like pejoratives, so of course they're annoying.
IMO they should be something like "patriot", "family values", "taxpayer", "personal responsibility", etc.
Sorry, but I think you’re anthropomorphising text prediction too much here, though I appreciate your vignette about your uses.
Thanks. Do you really think that the anthropomorphic language is a problem? I don't think people take it literally.
Just a heads up - the footnotes aren't present in the app and don't work properly on the web version. I think they're linked to the paid version of the post or something.
Appreciate the insight into your workflow. The way you're using it matches my own intuitions about what sort of stuff it could be useful for. Claude's response to the article suggested a thought to me. The part:
> 1. Writing capability: While you mentioned not liking my writing style, this might be an overgeneralization. AI writing capabilities can vary significantly depending on the specific instructions and context provided. With proper guidance, I can often adjust my style to better match a user's preferences.
I feel like this is a good demonstration of what LLMs are not good at/valuable for, and why a lot of it feels like such empty blather. Everything it wrote is true, but, like you wrote about using it for critique in general, it's vibes-level. It's going to say something like this regardless of *how much* its suggestions will actually help in practice. What's missing is that a (helpful) human would only write that if the advice crossed some threshold of genuine usefulness to me given my query. If it's just Claude writing that, I have no idea how seriously to take its critique.
This is a really good summary and pretty much matches my experience with LLMs (mostly ChatGPT).
I work with text from non-native writers a lot (either translating or editing) and I find that VERY considered prompting really helps, particularly getting the level of intervention right: tell it to merely proofread and it misses a lot of stuff, tell it to "revise" and it adds content.
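A minimal sketch of what I mean by pinning down the intervention level, assuming the OpenAI Python client (the model name and the exact prompt wording are just illustrative, not a recipe):

```python
# Minimal sketch: two explicit "intervention levels" as system prompts.
# Assumes the OpenAI Python client (v1.x); the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROOFREAD = (
    "Proofread the following text. Fix typos, grammar, and punctuation only. "
    "Do not rephrase sentences and do not add or remove content."
)
REVISE = (
    "Revise the following text for clarity and flow. You may rephrase, "
    "but do not add new information or change the author's meaning."
)

def edit_text(text: str, instruction: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

# edit_text(draft, PROOFREAD) and edit_text(draft, REVISE) give very
# different levels of intervention on the same draft.
```

In my experience, spelling out what the model must *not* do seems to matter as much as saying what it should do.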
As to writing new text, "get it to write a draft" vs "get it to edit or fill in an outline": I feel it very much depends on the topic. The more obscure the topic and/or the higher the level, the worse they are at writing compared to editing. My attempts so far were useful as initial kernels which I ended up completely rewriting anyway.
They seem very good at reducing word count. Very useful for anyone who is bound by limits.
On the "not so good to really poor" side, I constantly come across confabulations, to the point that I only use LLM for anything with a factual element if I KNOW the facts already and otherwise just Google traditionally. I don't know if I ask weird questions but the amount of simply incorrect very simple information I get is astonishingly high, and not just on obscure niche topics but on practical things like "can you use product X in baking". On the other hand, I have not so far received dangerous suggestions unless you count "agony aunt" advice that counsels prioritising the asker's emotional comfort over pretty much any other social obligation, fully subscribing to the pop psychology gospel of selfishness. But I guess this is the type of material it's trained on and the biases of these models' human trainers.
On the plus side, they're REALLY good at bouncing ideas off and clarifying my own thinking. Very useful for someone who thinks by talking and is both socially and cognitively lonely a lot of the time on a lot of topics.
The most surprising fail (apart from making up completely nonexistent papers and occasionally books as references) is, at least from ChatGPT, its inability to create citations in the required style. It will do it roughly, but in many cases not QUITE right, especially for MHRA. It's better at Harvard, and excellent at APA.
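To illustrate the kind of tiny, exact differences involved, here's a toy sketch using the Kobak et al. reference from upthread. These templates are deliberately simplified stand-ins for APA and Harvard (the real styles have many more rules):

```python
# Toy sketch: the same reference under two simplified style templates.
# These are rough approximations of APA and Harvard, not the full specs.
ref = {
    "authors": "Kobak, D., Márquez, R. G., Horvát, E.-Á., & Lause, J.",
    "year": 2024,
    "title": "Delving into ChatGPT usage in academic writing through excess vocabulary",
    "venue": "arXiv",
}

apa = "{authors} ({year}). {title}. {venue}.".format(**ref)
harvard = "{authors} ({year}) '{title}', {venue}.".format(**ref)

print(apa)      # period placement and title punctuation differ...
print(harvard)  # ...in small ways an LLM tends to blur across styles
```

A deterministic formatter (or a reference manager) gets these details right every time, which is exactly where a probabilistic model wobbles.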
I've just started using Copilot (cos it's free and convenient). So far I've been very impressed. It's not primarily an LLM AI: think of it as Bing on steroids. It's been very effective at finding half-forgotten papers. And since it mainly functions as a search engine, it shouldn't generate fictional papers. And unlike ChatGPT it has no cut-off date.
You presented a challenge. Here's its reply:
Here are some quotes from prominent scholars discussing the replication crisis in social psychology:
Daniel Kahneman: Nobel Prize-winning psychologist Daniel Kahneman referred to the situation in social psychology as a "mess" and called on social psychologists to clean up their act.
Brian D. Earp and David Trafimow: In their paper, they discuss the crisis in confidence in social psychology, highlighting the importance of replication and falsification. They state, "The (latest) crisis in confidence in social psychology has generated much heated discussion about the importance of replication, including how it should be carried out as well as interpreted by scholars in the field."
Wojciech Świątkowski and Benoît Dompnier: They wrote about the replicability crisis in social psychology, noting that the debates have reached a climax with the discovery of a relatively weak rate of replicability of published literature. They argue that social psychology could benefit from adopting a perspectivist approach to knowledge construction.
These quotes reflect the growing concern among scholars about the reliability and validity of research findings in social psychology. The replication crisis has sparked important discussions about research practices and the need for more rigorous methods.
It gave three references, and they all led to real papers that looked relevant.
I use Copilot as a search engine too and it's very good. Rarely has it not found what I was after.
Great post. I've been dabbling with AI recently, most often Claude, and have reached similar conclusions.
I've also learned how effective it can be to ask if it has any questions or needs any more detail from me to follow a prompt.
Finally, being polite seems to help. I might be imagining this but at the very least, it makes me feel like I'm having a conversation which I'm sure affects my mindset and makes the process more enjoyable!
Thank you for such an illuminating and useful piece. I’ve not used ChatGPT all that much, but I have already switched to using it for research purposes, before or instead of using Google, and several times it’s produced far better results. For example, I asked for information on Rosa Luxemburg’s relationship with Lenin, and it came up with exactly the kind of info I needed, while Google didn’t. And no, you aren’t wrong about “delve” - LLMs do seem to be fond of it. Another word they seem to like is “elevate”.
Interesting! I’ve gotten into the habit of pseudo-debating AI for my writing. I agree that it’s good at explaining things and can be provocative and lead me towards new insights when approached correctly. I too find it somewhat lacking when getting into the nitty-gritty of the logic of an argument. But it’s been great as a sounding board, and I’ve been worried about being misled by misinformation, so it’s nice to hear that someone I trust uses AI in similar ways.
Hell of an ethical dilemma hidden in those Baldwin explanations for the increasingly youthful and - non-human?
I thought I'd try something more adventurous with Copilot: a demo of the Mercator projection.
Copilot seemed to understand what I wanted:
Here’s the drawing that illustrates the Mercator projection step by step:
This should help you visualize the process:
- Transparent Cylinder: Representing the cylindrical surface.
- Wire-frame Globe Inside: Showing Earth with latitude and longitude lines.
- Light Source at the Center: Projecting the globe's features onto the cylinder.
- Projected Continents: Displaying how the continents would appear on the cylindrical surface.
- Unwrapped Cylinder: Showing the Mercator map when the cylinder is flattened out.
I hope this makes the concept clearer! If you need any further adjustments or details, let me know. 😊
It took several iterations to get this far, but the resulting image was nowhere near what was described. (And it was my concept that it was claiming to explain to me, in a very patronizing way!) I'd show the result if I knew how to insert an image; it's so different from what it described as to be rather funny!
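For what it's worth, the "light source at the center" picture Copilot painted actually describes a different map, the central cylindrical projection; true Mercator stretches latitudes logarithmically rather than by perspective. A minimal sketch of the two formulas on a unit sphere (the function names are mine):

```python
# Mercator vs the "light at the centre of the globe" (central cylindrical)
# projection, on a unit sphere. In both, x is just the longitude in radians;
# only the y (latitude) mapping differs.
import math

def mercator_y(lat_deg: float) -> float:
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def central_cylindrical_y(lat_deg: float) -> float:
    # what the "light source at the centre" picture actually gives
    return math.tan(math.radians(lat_deg))

for lat in (0, 30, 60, 80):
    print(f"lat {lat:>2}°: Mercator y = {mercator_y(lat):.3f}, "
          f"light-at-centre y = {central_cylindrical_y(lat):.3f}")
```

Both blow up toward the poles, but at different rates; only the logarithmic one is the actual Mercator.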