If you incentivize researchers to publish papers, individuals will find ways to game the system: meeting the minimum quality bar while taking the least effort to produce the most papers, and thereby receiving the greatest reward.
Similarly, if you reward content creators based on views, you will get view maximization behaviors. If you reward ad placement based on impressions, you will see gaming for impressions.
Bad metrics or bad rewards cause bad behavior.
We see this over and over because the reward issuers are designing systems to optimize for their upstream metrics.
Put differently, the online world is optimized for algorithms, not humans.
Blame people, bad actors, systems of incentives, the gods, the devils, but never broach the fault of LLMs and their widespread abuse.
Blaming LLMs is unproductive. They are not going anywhere (especially since open-source LLMs are so good).
If we want to achieve real change, we need to accept that they exist, understand how that changes the scientific landscape, and work out our options from here.
I'm old enough to remember when GANs were going to be used to scam millions of people and flood social media with fake profiles.
I think such statements are likely projections of people's own unwillingness to part with such tools given their own personal perceived utility.
I, for one, wouldn't give up LLMs. Too useful to me personally. So, I will always seek them out.
LLMs are not submitting these papers on their own, people are. As far as I'm concerned, whatever blame exists rests on those people and the system that rewards them.
Guns are entirely inert objects, devoid of free will or volition; they have no rights and no responsibilities.
LLMs likewise.
To every man is given the key to the gates of heaven. The same key opens the gates of hell.
- Richard Feynman (https://www.goodreads.com/quotes/421467-to-every-man-is-give...)
LLMs are not the root of the problem here.
I heard someone say something similar about the “homeless industrial complex” on a podcast recently. I think it was San Francisco that pays NGOs funds for homeless aid based on how many homeless people they serve. So the incentive is to keep as many homeless around as possible, for as long as possible.
https://www.cnbc.com/2018/04/11/goldman-asks-is-curing-patie...
Ditto for views, etc. Really, what you care about as, e.g., YouTube is conversions for the products that are advertised. Not impressions. But there's an attribution problem there.
What many people don’t realize is just how many normal life hurdles are significantly easier to overcome with a stable housing environment, even if the client is willing and available to work. Employment, for example, has several precursors that you need. Often you need an address. You need an ID. For that you need a birth certificate. To get the birth certificate you need to have the resources and know how to contact the correct agency. All of these things are much harder to achieve without a stable housing environment for the client.
> rewarding people for the volume ... rather than the quality.
I suspect this is a major part of the appeal of LLMs themselves. They produce lines very fast, so it appears as if work is being done fast. But that's very hard to know, because line count is a zero signal for code quality, and so is commit count. It's already a bit insane that we use lines and commits as measures in the first place: they're trivial to hack. You end up rewarding that annoying dude who keeps changing the file so the diff is the entire file and not the 3 lines they edited... I've been thinking we're living in "Goodhart's Hell", where metric hacking has become the intent: we've decided metrics are all that matter and are perfectly aligned with our goals.
But hey, who am I to critique. I'm just a math nerd. I don't run a multi-trillion-dollar business that lays off tons of workers because the current ones are so productive due to AI that they created one of the largest outages in the history of their platform (and you don't even know which of the two I'm referencing!). Maybe when I run a multi-trillion-dollar business I'll have the right to an opinion about data.
What would an online world that is optimized for humans, not algorithms, look like?
Should content creators get paid?
Hiring and tenure review based on a candidate’s selected 5 best papers.
Already standard practice at a few enlightened places, I think. (of course this also probably increases the review workload for top venues)
To a lesser extent, bean-counting metrics like citations and h-index are an attempt to quantify non-volume-based metrics. (for non-academics, h-index is the largest N such that your N-th most cited paper has >= N citations)
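For the non-academics, a quick illustrative sketch of that definition in code (the citation counts here are made up, purely for the example):

```python
def h_index(citations):
    """Largest N such that the N-th most cited paper has >= N citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's papers:
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```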
Note that most approaches like this have evolved to counter “salami-slicing”, where you divide your work into “minimum publishable units”. LLMs are a different threat - from my selfish point of view, one of the biggest risks is that it takes less time to write a bogus paper with an LLM than it does for a single reviewer to review it. That threatens to upend the entire peer reviewing process.
I don't think so. YouTube was a better place when it was just amateurs posting random shit.
Everybody "creates content" (like me when I take a picture of a beautiful sunset).
There is no such thing as "quality". There is quality for me and quality for you. That is part of the problem: we can't just relate to some external, predefined scale. We (the sum of people) are the approximate, chaotic, inefficient scale.
Be my guest to propose a "perfect system", but - just in case there is no such system - we should make sure each of us "rewards" what we find of quality (be it people or content creators), and hope it will prevail. Seems to have worked so far.
After 1989, most academics complained that the system was not merit-based or practical (applied) enough. So we changed it to grants and publication metrics (modeled after the West). For a while, it worked... until people found the bureaucracy too overbearing and some learned how to game the system again.
I would say, both systems have failure modes of a similar magnitude, although the first one is probably less hoops and less stress on each individual. (During communism, academia - if you could get there, especially technical sciences - was an oasis of freedom.)
Sure, publishing important papers has its weight, but not as much as getting cited.
arXiv believes that there are position papers and review articles that are of value to the scientific community, and we would like to be able to share them on arXiv. However, our team of volunteer moderators do not have the time or bandwidth to review the hundreds of these articles we receive without taking time away from our core purpose, which is to share research articles.
From TFA. The problem exists. Now.
Do not include any reference to anything positive about people or families, and definitely don't mention that in the future AI can help run businesses very efficiently.[1] "
[0] https://medium.com/@rviragh/life-as-a-victim-of-someone-else...
[1]
> Technically, no! If you take a look at arXiv’s policies for specific content types you’ll notice that review articles and position papers are not (and have never been) listed as part of the accepted content types.
You cannot upload the journal’s version, but you can upload the text as accepted (so, the same content minus the formatting).
Why not? I don't know about in CS, but, in math, it's increasingly common for authors to have the option to retain the copyright to their work.
I think every project more or less deviates from its original goal given enough time. There are a few exceptions in CS, like GNU coreutils: cd, ls, pwd, ... they do one thing and do it well, very likely for another 50 years.
It's only supposed to check for obvious errors and omissions, and to confirm that the claimed method and results appear to be sound and congruent with the stated aims.
Not as gate-keepy as journals and not as anarchic as purely open publishing. Should be cheap, too.
Fundamentally, we want research that offers something new (“what did we learn?”) and presents it in a way that at least plausibly has a chance of becoming generalizable knowledge. You call it gate-keeping, but I call it keeping published science high-quality.
It's related to the same problems you have with e.g. Sybil attacks: https://en.wikipedia.org/wiki/Sybil_attack
I'm not saying it wouldn't be worthwhile to try, just that I expect there to be a lot of very difficult problems to solve there.
That is to say I also think it would be worthwhile to try.
Also, look at how frequently they publish. Do you really think it's reasonable to produce a paper every week or two? Even if you have a team of grad students? I'll put it this way: I had a paper have difficulty getting through review for "not enough experiments" when several of my experiments took weeks of wall time to run and one took a month (could not run that a second time lol).
We don't do a great job at ousting frauds in science. It's actually difficult to do because science requires a lot of trust. We could alleviate some of these issues if we'd allow publication or some reward mechanism for replication, but the whole system is structured to reward "new" ideas. Utility isn't even that much of a factor in some areas. It's incredibly messy.
Most researchers are good actors. We all make mistakes, and that's why it's hard to detect fraud. But there's also usually a high reward for committing it. Though most of that reward is actually getting a stable job and the funding to do your research. Which is why you can see how it might be easy to slip into cheating a little here and there. There are ways to solve that that don't include punishing anyone...
Wouldn’t most people subscribe to a default set of trusted citers?
Sure. This happens with ad blockers, for example. I imagine Elsevier or Wikipedia would wind up creating these lists. And then you’d have the same incentives as you have now for fooling that authority.
> or people just don't care very much
This is my hypothesis. If you’re an expert, you have your web of trust. If you’re not, it isn’t that hard to start from a source of repute.
And to bring this back to the original arXiv topic: I think a reputation system is going to face problems because some people outside CS lack the technical ability to use it. It also introduces biases, in that you would endorse people you like for other reasons. Some of these problems are solvable, but you would need a careful proposal. And the change to the publishing scheme needs a push from institutions and funding agencies. Authors don't oppose changes, but there is a lobby of the parasitic publishing cartel that will oppose them.
I don't think publishing a PGP key with your work does anything. There's no problem identifying the author of the work. The problem is identifying _untrustworthy_ authors. Especially in the face of many other participants in the system claiming the work is trusted.
As I understand it, the current system (in some fields) is essentially to set up a bunch of sockpuppet accounts to cite the main account and publish (useless) derivative works using the ideas from the main account. Someone attempting to use existing research for its intended purpose has no idea that the whole method is garbage / flawed / not reproducible.
If you can only trust what you, yourself verify, then the publications aren't nearly as useful and it is hard to "stand on the shoulders of giants" to make progress.
Is it though? Should we care about authors or about the work? Yes, many experiments are hard to reproduce, but isn't that something we should work towards, rather than just "trusting" someone? People change. People make mistakes. I think more open data, open access, and open tools will solve a lot, but my guess is that generally people do not like that because it can show their weaknesses - even if they are well intentioned.
Edit: For clarification I’m agreeing with OP
Loosely speaking, the "received wisdom" has generally been that if you have a .edu address, you can probably publish fairly freely. But my understanding is that the rules are a little more nuanced than that. And I think there are other, non .edu domains, where you will also get auto-endorsed. But they don't publish a list of such things for obvious reasons.
[0]: Unless things have changed since I created my account, which was originally created with my personal email address. That was quite some time ago, so I guess it's possible changes have happened that I'm not aware of.
Which includes some very large ones like @google.com
Her suggestion was simple: Kick out all non-ivy league and most international researchers. Then you have a working reputation system.
Make of that what you will ...
[1] https://en.wikipedia.org/wiki/Grigori_Perelman [2] https://www.ams.org/notices/200808/tx080800930p.pdf
Treat everyone equally. After 10 years of consistent quality you get a chance to get back in. Before that, tough luck.
(1) because ivy league also produces a lot of work that's not so great (i.e. wrong (looking at you, Ariely) or un-ambitious) and
(2) because from time to time, some really important work comes out of surprising places.
I don't think we have a good verdict on the Ortega hypothesis yet, but I'm not a professional meta scientist.
That said, your proposal seems like a really good idea, I like it! Except I'd apply it to individuals and/or labs.
Asking for a small amount of money would probably help. The issue with requiring peer-reviewed journals or conferences is the severe lag: it takes a long time, and part of the advantage of arXiv was that you could have the paper instantly as a preprint. Also, these conferences and journals are themselves receiving enormous quantities of submissions (29,000 for AAAI), so we are just pushing the problem around.
The papers could also be categorized as unreviewed, quick check, fully reviewed, or fully reproduced. They could pay for this to be done or verified. Then, we have a reputational problem to deal with on the reviewer side.
You might be vastly underestimating the cost of such a feature
That's if anyone wants the publishing to be closer to the scientific method. Arxiv themselves might not attempt all of that. We can still hope for volunteers to review papers in a field with little peer review. I just don't think we can call most of that science anymore.
> Before being considered for submission to arXiv’s CS category, review articles and position papers must now be accepted at a journal or a conference and complete successful peer review.
Edit: original title was "arXiv No Longer Accepts Computer Science Position or Review Papers Due to LLMs"
ArXiv CS requires peer review for surveys amid flood of AI-written ones
- nothing happened to preprints
- "summarization" articles always required it; they are just now saying it out loud
"In the past few years, arXiv has been flooded with papers. Generative AI / large language models have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write."
"Fast forward to present day – submissions to arXiv in general have risen dramatically, and we now receive hundreds of review articles every month. The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues."
Surely a lot of them are also about LLMs: LLMs are the hot computing topic and where all the money and attention is, and they're also used heavily in the field. So that could at least partially account for why this policy is for CS papers only, but the announcement's rationale is about LLMs as producing the papers, not as their subject.
As someone commented, due to the increasing volume, we would actually need and benefit from more reviews -- with a fixed cycle preferably, and I do not mean LLM slop but SLRs. And contrary to someone's post, it is actually nice to read things from the industry, and I would actually want more of that.
And not only are they taking a stance on science, but they also make this allegation:
"Please note: the review conducted at conference workshops generally does not meet the same standard of rigor of traditional peer review and is not enough to have your review article or position paper accepted to arXiv."
In fact -- and this is supposedly related to the peer review crisis -- the situation is exactly the opposite. That is, reviews today are usually of much higher quality at specialized workshops organized by experts in a particular, often niche area.
Maybe arXiv people should visit PubPeer once in a while to see what kind of fraud is going on with conferences (i.e., not workshops and usually not review papers) and their proceedings published by all notable CS publishers? The same goes for journals.
Even if AI writes the paper for you, it's still kind of a pain in the ass to go through the submission process, get the LaTeX to compile on their servers, etc., there is a small cost to you. Why do this?
"One specific criterion is the ‘authorship of scholarly articles in professional or major trade publications or other major media’. The quality and reputation of the publication outlet (e.g., impact factor of a journal, editorial review process) are important factors in the evaluation”
I've never seen arXiv papers counted towards your publications anywhere that the number of your publications are used as a metric. Is USCIS different?
PaperMatch [1] helps solve this problem (large influx of papers) by running a semantic search on top of abstracts, for all of arXiv.
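PaperMatch's actual pipeline isn't described here, but the general idea of semantic search over abstracts is easy to sketch. A minimal illustration, assuming the sentence-transformers library; the model name, IDs, and abstracts are placeholders, not PaperMatch's code:

```python
# Minimal sketch of semantic search over arXiv abstracts (illustrative only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

abstracts = {
    "1706.03762": "...abstract text...",  # placeholder abstracts
    "2406.19108": "...abstract text...",
}

corpus_ids = list(abstracts)
corpus_emb = model.encode(list(abstracts.values()), convert_to_tensor=True)

query = "reward hacking in reinforcement learning from human feedback"
query_emb = model.encode(query, convert_to_tensor=True)

# Rank abstracts by cosine similarity to the query.
hits = util.semantic_search(query_emb, corpus_emb, top_k=5)[0]
for hit in hits:
    print(corpus_ids[hit["corpus_id"]], round(hit["score"], 3))
```

A real deployment would presumably precompute and index the embeddings for all of arXiv (e.g. in an approximate-nearest-neighbor index) rather than encoding the corpus per query.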
That said, AI-generated papers have already been spotted in other disciplines besides cs, and some of them are really obvious (arXiv:2508.11634v1 starts with a review of a non-existing paper). I really hope arXiv won't react by narrowing its scope to "novel research only"; in fact there is already AI slop in that category and it is harder to spot for a moderator.
("Peer-reviewed papers only" is mostly equivalent to "go away". Authors post on the arXiv in order to get early feedback, not just to have their paper openly accessible. And most journals at least formally discourage authors from posting their papers on the arXiv.)
Beyond hosting cost, there is some prestige to seeing an arXiv link versus rando blog post despite both having about the same hurdle to publishing.
The idea is the site is for academic preprints. Academia has a long history of circulating preprints or manuscripts before the work is finished. There are many reasons for this, the primary one is that scientific and mathematical papers are often in the works for years before they get officially published. Preprints allow other academics in the know to be up to date on current results.
If the service is used heavily by non-academics to lend an aura of credibility to any kind of white paper then the service is less usable for its intended purpose.
It's similar to the use of question/answer sites like Quora to write blog posts and ads under questions like "Why is Foobar brand soap the right soap for your family?"
It is a bit different in other fields, where interpretations or know-how might be communicated in a review paper format in a way that is otherwise not possible. For example, in biology, relating to a new phenomenon or function.
1) new grad students to end up with something nice to publish after reviewing the literature or,
2) older professors to write a big overview of everything that happened in their field as sort of a “bible” that can get you up to speed
The former is useful as a social construct; I mean, hey, new grad students, don’t skimp on your literature review. Finding out a couple years in that folks had already done something sorta similar to my work was absolutely gut-wrenching.
For the latter, I don’t think LLMs are quite ready to replace the personal experiences of a late-career professor, right?
I don't understand the appeal of a (majorly-)LLM-generated review paper. A good review paper is a hard task to write well, and frankly the only good ones I've read have come from authors who are at the apex of their field (and are, in particular, strong writers). The 'lossy search' of an LLM is probably an outstanding tool for _refining_ a review paper, but for fully generating it? At least not with current LLMs.
As one of those practitioners, I've found good review/survey papers to be incredibly valuable. They call my attention to the important publications and provide at least a basic timeline that helps me understand how the field has evolved from the beginning and what aspects people are focusing on now.
At the same time, I'll confess that I don't really see why most such papers couldn't be written by LLMs. Ideally by better LLMs than we have now, of course, but that could go without saying.
The problem is you can’t. Not without careful review of the output. (Certainly not if you’re writing about anything remotely novel and thus useful.)
But not everyone knows that, which turns private ignorance into a public review problem.
If you’re an expert. If you’re not, you’ll publish, best case, bullshit. (Worst case lies.)
LLMs are good at plainly summarizing from the public knowledge base. Scientists should invest their time in contributing new knowledge to that base instead of doing the summarization.
These things will ruin everything good, and that is before we even start talking about audio or video.
It is also turning people into spammers because it makes bluffers feel like experts.
ChatGPT is so revealing about a person's character.
Are you saying that there's an automated method for reliably verifying that something was created by an LLM?
No, not really. From the blog post:
> In the past few years, arXiv has been flooded with papers. Generative AI / large language models have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write. While categories across arXiv have all seen a major increase in submissions, it’s particularly pronounced in arXiv’s CS category.
>
> [...]
>
> Fast forward to present day – submissions to arXiv in general have risen dramatically, and we now receive hundreds of review articles every month. The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues.
If so, I think the solution is obvious.
(But I remind myself that all complex problems have a simple solution that is wrong.)
That's without even being able to backprop through the annotator, and also with me actively trying to avoid reward hacking. If arxiv used an open model for review, it would be trivial for people to insert a few grammatical mistakes which cause them to receive max points.
Doubt
LLMs are experts in generating junk. And generally terrible at anything novel. Classifying novel vs junk is a much harder problem.
1. Require LLM produced papers to be attributed to the relevant LLM and not the person who wrote the prompt.
2. Treat submissions that misrepresent authorship as plagiarism. Remove the article, but leave an entry for it so that there is a clear indication that the author engaged in an act of plagiarism.
Review papers are valuable. Writing one is a great way to gain, or deepen, mastery over a field. It forces you to branch out and fully assimilate papers that you may have only skimmed, and then place them in their proper context. Reading quality review papers is also valuable. They're a great way for people new to a field to get up to speed and they can bring things that were missed to the fore, even for veterans of the field.
While the current generation of AI does a poor job of judging significance and highlighting what is actually important, they could improve in the future. However, there's no need for arXiv to accept hundreds of review papers written by the same model on the same field, and readers certainly don't want to sift through them all.
Clearly marking AI submissions and removing credit from the prompters would adequately future-proof things for when, and if, AI can produce high quality review papers. Clearly marking authors who engage in plagiarism as plagiarists will, hopefully, remove most of the motivation to spam arXiv with AI slop that is misrepresented as the work of humans.
My only concern would be for the cost to arXiv of dealing with the inevitable lawsuits. The policy arXiv has chosen is worse for science, but is less likely to get them sued by butt-hurt plagiarists or the very occasional false positive.
If you want to blame someone, blame all the people LARPing as AI researchers.
Meanwhile, banning review articles written by humans would be harmful in many fields. I'm not in CPSC, but I'd hate to see this policy become the norm for all disciplines.
I have to agree with their justification. Since "Attention Is All You Need" (2017) I have seen maybe four papers with similar impact in the AI/ML space. The signal to noise ratio is really awful. If I had to pick a semi-related paper published since 2020 that I actually found interesting, it would have to be this one: https://arxiv.org/abs/2406.19108 I cannot think of a close second right now.
All of the machine learning papers are pure slop to me now. The last one I looked at had an abstract that was so long it put me to sleep. Many of these papers aren't attempting basic decorum anymore. Mandatory peer review would fix a lot of this. I don't think it is acceptable for the staff at arXiv to have to endure a Sisyphean mountain of LLM shit. They definitely need to push back.
Not every paper can be a world-changing breakthrough. Which doesn't mean that more modest papers are noise (although some definitely are). What Kuhn calls "normal science" is also needed for science to work.
Let's say a €50,000 fine, or 1 year in prison. :)
I don't understand why it's restricted to one category when the problem spans multiple categories.
So many "research papers" by "AI companies" that are blog posts or marketing dressed up as research. They contribute nothing and exist so the dudes running the company can point to all their "published research".
What’s the new method?
For example: https://prereview.org/en-us
Anecdotally, a lot of researchers will run their paper pdfs through an AI iteration or two during drafting which also (kinda but not really) counts as a self-review. Although that is not comparable to peer review ofc.
Sorry folks but we lost.
There are far more ways to produce expensive noise with LLMs than signal. Most non-psychopathic humans tend to want to produce veridical statements. (Except salespeople, who have basically undergone forced sociopathy training.) At the point where a human has learned to produce coherent language, he's also learned lots of important things about the world. At the point where a human has learned academic jargon and mathematical nomenclature, she has likely also learned a substantial amount of math. Few people want to learn the syntax of a language with little underlying understanding. Alas, this is not the case with statistical models of papers!
How will journals or conferences handle AI slop?
And it's an unequal arms race, in which generating endless slop is way cheaper than storing it, because slop generators are subsidised (by operating at a loss) but arXiv has to pay the full price for their hosting.
so the LLM detection problem is (theoretically) impossible for SOTA LLMs; in practice, it could be easier due to the RLHF stage inserting idiosyncrasies.
Anecdotal: A few weeks ago, I came across a story on HN where many commenters immediately recognized that an LLM had written the article, and the author had actually released his prompts and iterations. So it was not a one-shot prompt but more like 10 iterations, and still, many people saw that an LLM wrote it.
And anyway, those accuracies tend to be measured on 100% human-generated vs. 100% machine-generated texts by a single LLM... good luck with texts that contain a mix of human and LLM contents, mix of contents by several LLMs, or an LLM asked to "mask" the output of another.
I think detection is a lost cause.
They should solve the real problem of obtaining more funding and volunteers so that they can take on the increased volume of submissions. Especially now that AI's here and we can all be 3 times as productive for the same effort.
Huh, I guess it's only a subset of papers, not all of them. My brain doesn't work that way, because I don't like assigning custom rules for special cases (edit: because I usually view that as a form of discrimination). So sometimes I have a blind spot around the realities of a problem that someone is facing, that don't have much to do with its idealization.
What I mean is, I don't know that it's up to arXiv to determine what a "review article and position paper" is. Because of that, they must let all papers through, or have all papers face the same review standards.
When I see someone getting their fingers into something, like muddying/dithering concepts, shifting focus to something other than the crux of an argument (or using bad faith arguments, etc), I view it as corruption. It's a means for minority forces to insert their will over the majority. In this case, by potentially blocking meaningful work from reaching the public eye on a technicality.
So I admit that I was wrong to jump to conclusions. But I don't know that I was wrong in principle or spirit.
Those are terms of art, not arbitrary categories. They didn't make them up.
This does not seem like a win even if your “fight AI with AI plan works.”