Materials of Fraud
ChatGPT, please generate a blog post about how AI is enabling academic cheating
Over the last three weeks there have been three major stories about AI: first, and most hilariously, that someone at “xAI” (Elon Musk’s AI venture) fat-fingered Grok, the anti-woke AI, into inserting talking points about “White genocide” in South Africa into every single query. xAI does not confirm or deny that it was Elon Musk himself. Second, a second-year PhD student at MIT was forced to retract a paper (and was almost certainly expelled) when it came out that he had forged some or most of the data involved, in a paper that, funnily enough, was also about AI. And lastly, everyone was talking about articles in New York Magazine and the New York Times about AI and cheating in college classes1. In a particularly grim excerpt from the NYMag piece, a student admits that she uses ChatGPT to write an essay on how learning is what makes people human. But if nobody likes it, then why is cheating still rampant?
The Talented Doctor Ripley
Last December, a second-year PhD student at MIT (the most prestigious economics department on the planet) named Aidan Toner-Rodgers released a pre-print (basically a final draft) of a paper that was well into the submission process at the Quarterly Journal of Economics, or QJE (the most prestigious economics journal on the planet). The paper got coverage in the Wall Street Journal, The Atlantic, and Nature, among other prestigious publications, and was co-signed by Nobel Laureate Daron Acemoğlu and by David Autor, both among the most important researchers of the impact of technology on labor market outcomes, and on AI more particularly.
The paper used quite advanced and novel techniques to evaluate the impact of AI tools on scientists at the R&D lab of an unnamed materials science company, which, based on the specifications (over 1,000 working scientists distributed roughly evenly between biomaterials, ceramics/glasses, metals, and plastics), can realistically apply only to a few companies, including giants such as DuPont, Dow, and Corning. The paper looks at whether having access to AI tools impacts the number of materials discovered, the number of patents filed, the number of downstream products built on those materials, how scientists split their time between experimentation, judgment, and ideation (of materials, not suicide), as well as how scientists felt about AI. The study used some pretty advanced techniques to study each question (it’s 78 pages long) and finds clear, substantial increases in all metrics: a 44% increase in materials discovered, a 39% increase in patents, and a 17% rise in downstream product innovation, as well as a 57% increase in “idea generation” tasks - which lets the top researchers increase their output substantially while bottom researchers benefit little. However, 82% of scientists reported lower satisfaction, since they had to focus on tediously testing AI’s new ideas rather than coming up with them.
The paper, however, was not without its detractors: Robert Palgrave, a professor of materials chemistry at University College London, had a quite lengthy Twitter thread about the paper: basically, he was skeptical that Toner-Rodgers was actually categorizing the quality of innovations in four (rather different) categories of materials correctly, especially considering it was “just one student”. In a follow-up thread after Toner-Rodgers was exposed, he elaborated on what in hindsight were red flags: why the study was set up in 2022 but the data was only handed out to a PhD student who wasn’t even working there at the time, why the firm would give any data at all to a PhD student at MIT and not to a real high-level (or internal) economist, and a lot of questions about the specifics of the methodology - in particular, why a lab would have 1,000 people just rotely testing out new materials, and a lot of issues with how the materials were classified. Ben Shindel, a materials scientist himself, wrote an excellent blog post about it where he also points out that some data seems outright copied from a famous paper. And Corning filed a complaint with the World Intellectual Property Organization against Toner-Rodgers for registering a website called “corning research” in January 2025, which seems to point to a heck of a lot of fraud. Anyways, MIT publicly stated that the paper was bunk and asked arXiv (a repository for working papers) to take it down due to ethical violations - the MIT statement seems to imply that it concluded the data was made up to some extent, and if you read between the lines a little bit it also points toward Toner-Rodgers being expelled from the program. It all unraveled when a computer scientist with experience in the field (I heard it was Ethan Mollick, who is a great follow on Twitter, but can’t confirm) asked basic questions of Acemoglu and Autor, who raised the topic to MIT.
The obvious comparison for the paper is the work of Francesca Gino (a Harvard Business School professor who forged extensive amounts of data for her papers on nudges, tying her for “most fraudulent business school paper” with every single other business school paper anyways), but I think a better comparison is the Freakonomics School of Economics, which focused more heavily on publishing empirical takedowns of intuitive opinions using economic tools but on non-economic topics. The key example is the PhD thesis of Emily Oster, who sought to debunk the idea that sex-selective abortions and infanticide were responsible for Asia having more men than women (a very politically important fact!) - Oster explained this with hepatitis B reducing the odds of having a girl in subsequent pregnancies, except an actual subject-matter expert immediately clocked that rates of hepatitis infection were just way lower than those necessary for Oster’s argument to be even remotely plausible. Just to clarify, Oster immediately copped to her mistake, but the incident points to economists being just not capable of evaluating whether major publications contain grievous errors in the domains of other disciplines.
Harder, Better, Faster, Stronger
Okay, but why did everyone go nuts (go apeshit) for the Toner-Rodgers paper? Well, because it’s a clean, high-quality paper with a finding that is surprising but intuitive: that AI boosts productivity by a lot by making top performers better. Is this actually true? Well, it’s useful (and ironic) that this instance of AI-enabled cheating happened in a paper about whether AI makes people better at work, because its conclusion is basically the opposite of what every other paper on the topic finds, and because it lets us look into an important question for AI-based cheating: does AI actually improve performance?
Well, what does the literature say?
A Danish study of LLM adoption in companies finds that there’s basically zero impact on earnings or recorded hours, and that productivity gains are modest.
A controlled experiment with GitHub Copilot finds that software engineers can complete a standard task in almost half the time with access to AI assistance.
An experimental study that randomly assigned tasks to professional workers finds large productivity gains from shifting raw worker effort away from rough-drafting and toward idea generation and editing, primarily among low performers - and improved employee satisfaction.
Another experiment with consulting workers found that AI can easily do some tasks but not others without a clear linear relationship to difficulty - and AI was really good at improving productivity and quality in the first group of tasks, but mostly for below-average consultants.
A paper examining randomized trials of an AI product found that common tasks like looking up information in data banks, catching up after meetings, and writing documents became significantly faster even while controlling for quality.
A paper looking at the introduction of customer-support AI assistants found that workers became significantly more productive, but mostly by “catching up” less experienced and less skilled workers, while more experienced workers work more quickly but make more mistakes - and the gains are largest for rare problems which very few workers know how to handle, but which are in the training data.
A study of Kenyan entrepreneurs did find that AI advice widened gaps between low and high performers but mostly because the high performers were better at following the correct advice from AI.
A paper examining workers with access to AI tools finds that they reduce the time spent on rote, repetitive tasks like email significantly, seem to draft documents only slightly faster, and don’t gain at all in time spent in meetings.
A lot of studies focus on freelancers (mostly because of ease of measurement) and find that ChatGPT significantly reduced demand even for high-quality service providers; demand for easily substitutable skills (such as translation) declined by 20-50%, while demand for complementary skills (like programming) increased by up to 24%.
So overall, I think the picture painted is one where AI can help equalize highly skilled and less skilled employees, in large part by reducing the amount of time spent on the tedious parts of work, and that there are a lot of capabilities that AI still doesn’t have. This explains why AI use has grown quickly since its introduction (more quickly than the personal computer did), but not evenly: multiple studies find that younger, more educated, and higher-earning workers use AI more frequently.
While a large number of tasks can be affected by AI, it’s not at all clear whether this will have a simple impact on employment: most users tend to automate specific tasks, and staff at companies with high AI adoption tend to be trained on how to use their time more efficiently. So far, a large reason why AI hasn’t been deployed as widely as expected appears to be organizational: firms with remote work also have higher AI adoption rates, in large part because they invested heavily in technology skills and in the managerial capabilities to adapt to new technology. Since the quality of corporate management is important for business dynamism and therefore performance, it’s also important to keep in mind how equipped a firm’s managers are to decide when and how to incorporate new technology across the firm.
We must imagine Raskolnikov happy
The profit of the crime is the force which urges man to delinquency: the pain of the punishment is the force employed to restrain him from it. If the first of these forces be the greater, the crime will be committed; if the second, the crime will not be committed.
Jeremy Bentham, “An Introduction to the Principles of Morals and Legislation”
I think that to understand why someone as promising as Toner-Rodgers would stoop to the level of forging every single data point in his paper just to get a publication, we can look at the economics of cheating more broadly, which also links up with the other instance of cheating using AI: the articles following college students and professors talking about how AI-generated work is taking over classrooms and letting kids get excellent grades without actually doing any work.
A lot of explanations have been cultural: that permissiveness and “softness” on the left, or the devaluation of education to its market value on the right, have disincentivized young kids from putting in any work. But I think that’s, frankly, a load of horseshit. Let’s take an example that is substantially similar to cheating in exams: cheating the law, also known as “crime”. For the longest time, theories of criminal behavior were mostly centered on either the psychological (or, in more sinister theories, the biological) aspects of the criminal or the sociological aspects of a given environment that produced criminality, but not really on the decision-making process, which is the sort of thing that economists love to insert themselves into (that, and their grad students). Economists usually say that economics is the study of rational choice - where rational means, broadly, “decisions made strategically to account for costs and benefits”. I don’t like this definition, and agree with the obvious comeback that economics is the study of rational choice in the same way that architecture and urbanism are the study of bricks. But regardless, the basic takeaway is that to study a behavior using the “economics approach” as outlined above, you have to consider the expected payoffs and the expected costs of each course of action.
The foundational paper in what we call “the economics of crime” comes from Nobel Laureate Gary Becker, who outlined his thoughts on the matter in 1968 in “Crime and Punishment: An Economic Approach”. Becker argues that we have to treat the criminal as an individual with rational preferences on how to spend their time - by rational he means, basically, that the criminal considers the costs and benefits of the crime and decides whether to commit it. The costs and benefits can be monetary, but they can also be social (for example, people who believe in the “American Dream” commit fewer crimes), or just the thrill of the act itself, as well as the expected cost of being punished, which means that different people can be somewhat more prone to committing various crimes depending on their preferences - for example, young people commit more crimes for various hard-to-parse reasons. An easier way to think about it is that if you’re poor, you might be more inclined to steal, because the benefits are higher than for someone who has money, and if you get caught and go to prison, it’s not that big a deal, because being poor sucks, compared to someone who would be losing a good career. As a result, for example, under high economic inequality, economic booms translate less into lower crime, because the boom boosts the incomes of the poor by less.
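Becker’s decision rule can be boiled down to a one-line inequality: commit the act if and only if the expected benefit exceeds the expected punishment (the penalty weighted by the odds of getting caught). Here’s a minimal sketch of that framing - the function names and all the numbers are hypothetical, purely for illustration, not from Becker’s paper:

```python
def expected_net_payoff(benefit: float, p_caught: float, punishment: float) -> float:
    """Becker-style expected payoff: collect the benefit for sure,
    but pay the punishment only with probability p_caught."""
    return benefit - p_caught * punishment

def commits(benefit: float, p_caught: float, punishment: float) -> bool:
    """The rational actor commits the act iff the expected payoff is positive."""
    return expected_net_payoff(benefit, p_caught, punishment) > 0

# A student weighing AI-assisted cheating (made-up numbers):
# same grade benefit and same penalty, but different odds of detection.
print(commits(benefit=100, p_caught=0.05, punishment=500))  # True: low detection odds
print(commits(benefit=100, p_caught=0.50, punishment=500))  # False: high detection odds
```

The point of the toy model is that you can change behavior by moving any of the three levers - the benefit, the penalty, or the probability of detection - which is exactly the logic behind the LoJack and synagogue-policing examples below.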
Of course, it’s not especially easy to test this, because the data is really hard to untangle - for example, areas with more crime might have harsher sentences as a consequence of the crime rates, which might make it seem like harsher punishments don’t matter. One example of a paper that tries to test this comes from Stephen Levitt (of Freakonomics fame), who looked at the spread of LoJack, a device that makes it easier to track stolen cars: when LoJack became more common in an area, car thefts went down for all cars, because the expected payoff was just way lower. Or, for example, when Argentina suffered two devastating terrorist attacks against Jewish institutions in 1992 and 1994, police presence around synagogues was heightened, which resulted in fewer crimes recorded around those synagogues compared to other similar areas.
Once a cheater, always a cheater?
So how does the stuff about the economics of crime explain academic cheating? Well, we have to look at the costs, the benefits, and the odds of being caught in each situation, and at how AI plays into each.
For mister (not doctor) Toner-Rodgers, I think the situation is pretty clear: the economics job market is not very good. Last year, it was all the rage talking about how competitive it had gotten, to the point where high school students were doing RA work to bump up their PhD chances - and it’s gotten even worse, since for example the Federal Reserve (the biggest employer in economics by far) will cut its staff by 10%. Economics PhD students have way worse mental health than other doctoral wannabes, in large part due to the very high pressure on them. Citations are extremely important for the profession, and especially citations in the “top five” journals (QJE, Econometrica, AER, JPE, and REStud) - so the pressure to publish there is very strong; for example, economists used to pretend “AER P&P” (a lower tier of paper) were real citations to pad their resumes. A lot of people stated that they thought the peer-review process at QJE would catch the fraud, but I don’t think so. Economics academia is driven by prestige a lot more than people realize: 60% of professors in the top 100 schools are from the top fifteen schools, advisors are really important for career progression and so are professional networks, and listing a Nobel Laureate as a coauthor increases the odds that a paper will be published by a factor of ten (from 2% to 20%). The profession has also become more prestige-bound as time passes (as measured by the growing seniority of authors and the assortative seniority in the field) and more focused on empirical data. Social sciences (including economics) have longstanding data quality and code quality issues, and it’s fairly common to make claims in the abstract that are not supported by the paper’s methodology.
While economics publishing is very slow, this is largely a result of procrastination on comments and not necessarily of improved quality or thoroughness, at the same time as the discipline has a cultural issue with not asking important questions - resulting in economists largely not really listening to comments or critiques of their papers, and in retracted papers being cited at best 10% less. So the odds of Toner-Rodgers getting caught weren’t especially high even without him generating tailor-made data to make him appear smart as hell, especially when two of the most prestigious economists on the planet were backing him up, and since the key aspects of his fraud were in fields that even top economists know next to nothing about2.
Let’s look at students. First of all, we’ve all heard the stories about how competitive college admissions have gotten. Going to college by itself is obviously important: college graduates make more money than non-graduates, their wages grow faster over the life cycle, and the skills developed in higher education have enormous economic value, although these benefits are decreasing due to higher wage inequality, and other factors like regional sorting. It’s also important to note that colleges can (in many cases) help increase the socioeconomic mobility of low income students, and that these moves can help the students’ children go up the income ladder too - but not by getting into elite schools, which are mostly used to help induct children into the elite in general (meaning that elite school admissions have super-high stakes, since they’re a ticket into the elite). Higher grades are very important: many kids have the scores to get into top schools, but don’t have other opportunities (or are, for example, discriminated against). Additionally, getting higher grades in school is associated with higher wages, although this also comes from the fact that grades are a strong predictor of ability - meaning that students could try to hijack the signal by cheating their way to higher grades and pretending to have higher ability. On the contrary, there aren’t many benefits to going to college and flunking out: most dropouts leave school due to not caring about their education very much and also because they don’t think they have the ability, both of which are helped by AI-powered cheating (remember that for tedious routine tasks, AI helps the lowest performers the most).
While it’s hard to estimate the economic impact of dropping out (because the lowest performers are also likeliest to drop out), having unpaid student loan debt tends to be bad for your finances (and forgiveness enables borrowing more on other items), while letting students who dropped out finish their degrees boosts their economic performance, which all points to college dropouts not having a big wage advantage relative to people who didn’t attend school.
So students are weighing the benefits of doing well at school (money) against the costs of failing (having a lot of debt), and AI-enabled cheaters are students who don’t especially care about their degrees and who wouldn’t get them otherwise, which tracks with the literature: people consider college an economic decision, and weigh it for example against the minimum wage, and at the same time students tend to consider very carefully which colleges to go to based on the expected value. A similar case is that high-income students who aren’t very smart face more competition when public schools have more investment, so they go to private schools and increase their educational attainment, but with a lower average wage premium.
But why aren’t schools doing anything? The answer is that schools have longstanding issues with how demanding they are: college graduation rates and grades have increased consistently since the 1990s without any real increase in student quality, institutional resources, or changes in the composition of the education sector. This phenomenon, known as grade inflation, has a large number of fairly complex explanations: some are cultural (for example, that schools are “coddling” their students to try and raise their self-esteem), while others focus on facts such as top schools having gotten more selective, changes to student practices (fewer classes taken outside of the major and fewer classes taken in general, withdrawing to avoid a fail), better teaching practices, but also just more generous grading. Well, but why have grading practices changed? When asked why, one economist posed the simple question of cui bono: “Who is the constituency for tougher grades? I can tell you as a professor, my students are definitely interested in higher grades”. In particular, since getting good grades is so important, students tend to choose classes where they can get higher grades; at the same time, universities are incentivized to improve numbers (retention, graduation, etc.) for basically financial reasons - especially because “inflated” grades are not actually easy to notice. The reason, then, why nobody is doing anything is that this is a coordination problem: any professor who gets more demanding would hurt their own career by getting lower evaluations (which are systematically linked to strictness), which can be seen in the fact that schools with more untenured faculty also have worse outcomes.
But the obvious question is: even if grade inflation is distasteful, is it actually a disservice to students? Well, yes and no. The chance to retake failed classes is linked with better college outcomes, and grade retention doesn’t seem to improve student performance that much, so there’s a case that “being tough” doesn’t do much. However, the truth is that higher grading standards actually do help: they improve student performance both by benefiting top students and by incentivizing low performers to “catch up”, without reducing overall graduation rates. However, the relationship between grade inflation and achievement is not linear: having your grades inflated reduces future educational achievement, but having teachers who pass students more generously is linked to higher future achievement. This is reflected in the fact that teachers who grade-inflate have negative value added on test scores, but positive value added on other qualities like engagement with the class - for example, because students have more incentives to engage with the material, and because hard classes are less engaging for average students. The saddest part is that teachers being incentivized to teach “badly” is also bad for students: teaching practices matter a lot, and high-quality educators have strong impacts on student achievement and therefore on their future economic outcomes.
Conclusion
So, cheating is an economic phenomenon just like any other, and everyone, from PhD economists to lowly undergrads asking ChatGPT to churn out their bottom-feeding essays, is responding to economic incentives: the high stakes of all educational institutions, the low odds of being caught, and the high benefits from better performance. Obviously, this is bad because cheating is bad. But it’s also bad because, much like Elon Musk pushing his AI to share Holocaust denial propaganda, it reduces trust in a valuable technology and in valuable institutions - it makes college degrees less valuable as signals of ability and merit (which means a return to more discretion-based forms of selection, which are also more discriminatory), and it makes AI-based information less useful, leaving a world with less prosocial behavior, less verifiable information, and less social trust all around. Artificial intelligence could also really help economics research, so it’s a bad idea to let it be taken over by fraudsters.
I am actually not very pessimistic (or optimistic, depending on how you count it) on the impact of AI on employment, and I’m also not one of the people who think that AI is going to worsen discrimination or gender/race/etc disparities. Technologies are tools, and many tools just reflect the societies that produce them - if we can make sure that our world has less discrimination, less fraud, less cheating, and more trust, then AI is only going to keep it that way. But you have to try.
Note: made some slight edits a day after (21/5) because a sentence I added a bit before sending wasn’t showing up.
I am not going to try to make any sort of case for why it is cheating. But the students are handing in assignments that they clearly did not do themselves, which counts as cheating by any reasonable definition of the word. Pulling a God and Man at Yale here.
An example of how little scrutiny papers face is that a study “found” that AI could ace MIT exams, a bunch of students tested it and found it was bunk based on their first-hand experience “taking those exact classes”, and then it turned out that a professor had just bamboozled his coauthor into publishing it under false pretenses. The “professor” wasn’t even an actual professor at MIT, and his coauthor had only asked him to help write a paper about how AI could detect which mandatory classes weren’t actually necessary.