LLMs are much better at plausibly summarizing content than they are at doing long sequences of reasoning, so they're much better at generating believable reviews than believable papers. Plus reviews are pretty tedious to do, giving an incentive to half-ass it with an LLM. Plus reviews are usually not shared publicly, taking away some of the potential embarrassment.
On the one hand, if you submit to a conference, you are forced to "volunteer" for that cycle. Which is a good idea from a "justice" point of view, but it's also a sure way of generating unmotivated reviewers. Not only because a person might be unmotivated in general, but because the (rather short) reviewing period may coincide with your vacation (this happened to many people with EMNLP, whose reviewing period was in the summer), and you're not given any alternative but to "volunteer" and deal with it.
On the other hand, even regular reviewers aren't treated too well. Recently they implemented a minimum max load of 4 (which can push people towards choosing uncomfortable loads; in fact, that seems to be the purpose), and loads aren't even respected (IIRC there have been mails to the tune of "some people set a max load but we got a lot of submissions so you may get more submissions than your load, lololol").
While I don't condone using LLMs for reviewing and I would never do such a thing, I am not too surprised that these things happen given that ARR makes the already often thankless job of reviewing even more annoying.
To be honest, lately, I have gotten better quality reviews from the supposedly second-tier conferences that haven't joined ARR (e.g. this year's LREC-COLING) than from ARR. Although sample size is very small, of course.
A consequence of that is that there are not sufficient numbers of reviewers available who are qualified to review these manuscripts.
Conference organizers might be keen to accept many or most of those who offer to volunteer, but clearly there is now a large pool of people who have never done this before and were never taught how to do it. Add some time pressure, and people will try out some tool, just because it exists.
GPT-generated docs have a particular tone that you can detect if you've played a bit with ChatGPT and if you have a feel for language. Such reviews should be kicked out. I would be interested to view this review (anonymized if you like - by taking out bits that reveal too narrowly what it's about).
The "rolling" model of ARR is a pain, though, because instead of slaving for a month you feel like slaving (conducting scientific peer review free of charge = slave labor) all year round. Last month, I got contacted by a book editor to review a scientific book for $100. I told her I'm not going to read 350 pages, to write two pages worth of book review; to do this properly one would need two days, and I quoted my consulting day rate. On top of that, this email came in the vacation month of August. Of course, said person was never heard of again.
Pretty hard to combat. We just rebutted as if it were a real review - maybe it was - and hope that the chairs see it. Speaking to other folks, opinions are split over whether this sort of review should be flagged. I know some people who tried to query a review and it didn't help.
There were other small cues - the English was perfect, while other reviewers made small slips indicative of non-native speakers. Another was simply the discrepancy between the tone of the review (generally very positive) and the middle-of-the-road rating and confidence. The structure of the review was very "The authors do X, Y, Z. This is important because A, B, C.", and the reviewer didn't bother to fill out any of the other review sections (they just wrote single-word answers to all of them).
The kicker was actually putting our paper into 4o, asking it to write a review, and seeing the same keywords pop up.
I see "dogfooding" has now been taken to its natural conclusion.
Not defending LLM papers at all, but these people can go to hell. If "scientometrics" was ever a good idea, it for sure isn't anymore now that the measure has become the target. A longer, carefully written, comprehensive paper is rated worse than many short, incremental, hastily written papers.
Right now there is no incentive to do a high-quality review unless the reviewer is already motivated.
One example was that GPT took an explicit geographic location from a figure caption and used it as a reference point when suggesting improvements (along the lines of "location X is under-represented on this map"), I assume because it assigns a high degree of relevance to figures and the abstract when summarising papers. I think you might be able to combat this by writing defensively; in our case we might have avoided that by saying "more information about geographic diversity may be found in X and the supplementary information".
If you can, please publish it and maybe post it here on HN or on Reddit.
Researchers get massive CVs, reviewers and editors get off easy, admins get to show great output numbers from their institutions, and of course the publishers continue making money hand over fist.
It's a rather broken system.
They just need to mimic the appearance of reason, follow the same pattern of progression. Ingesting enough of what amounts to executed templates will teach it to generate its own results as if output from the same template.
For example, I recently experimented with using ChatGPT to translate a Wikipedia article, on the grounds that it might maintain all the formatting and that Transformer models are also used by Google Translate.
As it was an experiment, I did actually check the results before submitting the translated article.
Roughly the first 3/4 was fine. The final quarter was completely invented but plausible, including references.
LLMs are very useful tools; I'll gladly use them to help with various tasks, and they can (with low reliability, but it has happened) even manage a whole project. Right now, though, they should be treated with caution and not left unsupervised: the Peter principle, being promoted beyond one's competence, still applies even though they're not human employees.
By writing it down (yourself) you understand what claims each piece of related work discussed has made (and can realistically make - as there sometimes are inflationary lists of claims in papers), and this helps you formulate your own claim as it relates to them (new task, novel method for a known task, like an older method but works better, nearly as good as a past method but runs faster, etc.).
If you outsource it to a machine you no longer see it through, and the result will be poor unless you are a very bad writer.
I can, however, see a role for LLMs in an electronic "learn how to write better" tutoring system.
The bug happens if the ‘bib’ key doesn’t exist in the API response. That leads to the urls array having more rows than the paper_data array, so the columns could become mismatched in the final data frame. It seems they made a third array called flag that could be used to detect and remove the bad results, but it’s not used anywhere in the posted code.
It’s not clear to me how this would affect their analysis; it does seem like something they would catch when manually reviewing the papers. But perhaps the bibliographic data wasn’t reviewed and was only used to calculate the summary stats etc.
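To make the mechanism concrete, here is a minimal sketch of how such a mismatch can arise, assuming a scraping loop roughly like the one described above. The array names (urls, paper_data, flag) come from the comment; everything else is hypothetical and may differ from the authors' actual script.

    import pandas as pd

    # Stand-in for the scraped results: the second hit has no 'bib' key.
    results = [
        {"pub_url": "https://example.org/a", "bib": {"title": "Paper A"}},
        {"pub_url": "https://example.org/b"},  # 'bib' missing in the API response
        {"pub_url": "https://example.org/c", "bib": {"title": "Paper C"}},
    ]

    urls, paper_data, flag = [], [], []
    for result in results:
        urls.append(result["pub_url"])                 # appended for every result
        if "bib" in result:
            paper_data.append(result["bib"]["title"])  # appended only when 'bib' exists
            flag.append(False)
        else:
            flag.append(True)                          # recorded, but never consulted later

    # pandas pads the shorter column with NaN, so every URL after the first
    # missing-'bib' hit is silently paired with the bib data of a later paper.
    df = pd.DataFrame({"url": pd.Series(urls), "bib": pd.Series(paper_data)})
    print(df)

The unused flag list would be enough to drop or re-fetch the bad rows, which is presumably what it was intended for.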
""" Dear XXXX,
My name is Kristofer, I’m one of the co-authors for the GPT paper. I also wrote the script for the data collection. Jutta forwarded your email regarding the possible bug.
First of all, let me apologise for the late response. Apparently your email made its way to the spam folder, which of course is regrettable. I would also like to thank you for reaching out to us. We are pleased to see the interest of the HN community in transparent and reliable research.
We looked at the comment and the concern around the bug. We’d like to point out that the original commenter was right in saying “it does seem like something they would catch when manually reviewing the papers”. We in fact reviewed the output manually and carefully for any potential errors. In other words, we opened and searched for the query string manually, which also helped determine whether the use of LLMs was declared in some form or other. This is of course a sensitive topic and we took great care to be thorough.
Nevertheless, we once more did a manual review of the code and the data, in light of this potential bug, and we’re glad to say no row-column mismatch is present. You can find the data here: https://doi.org/10.7910/DVN/WUVD8X
Please don’t hesitate if you have any more questions.
All the best, Kristofer """
Contact info for the first author
With meticulous version records it should at least be possible to ascertain what the code did by reconstructing that exact version (assuming stored back versions exist).
For any who haven't seen/heard, this makes for some entertaining and eye-opening viewing!
Now, technology is a human artifact and always ends up resembling its creators or financiers or both: I’d most likely have nice fonts on my computer in 2024 either way, but it’s directly because of Jobs that they were available to a household budget in 1984.
If someone other than Altman, or some insight other than “this thing can lie in a newly scalable way”, had been the escape-velocity moment for LLMs, then we’d still have test sets and metrics and just plain science going on in the Commanding Heights of the S&P 500. But these people are a symptom of our apathy around any noble instinct. If we had stuck firm to our values, no effective-altruism cult-leader type would even make the press.
Now this sounds like a story worth hearing!
Academia is a lot about barriers, which, while sometimes unpleasant and malfunctioning, nevertheless serve a purpose (unfortunately, it is impossible to evaluate everything fully on a per-case basis, so humans need shortcuts to filter out noise and determine more quickly whether something is worth spending attention on). One of those barriers is the form of the paper itself. The fall of this barrier (notably through often unauthorised use of others’ IP) would likely bring about not a sudden idyllic meritocracy but increased noise and/or the strengthening of other barriers.
If there are instances where the ability to make such distinctions is lost, it is most likely to be so because the content lacks novelty, i.e. it simply regurgitates known and established facts. In which case it is a pointless effort, even if it might inflate the supposed author's list of publications.
As to the integrity of researchers, this is a known issue. The temptation to fabricate data existed long before the latest innovations in AI, and is very easy to do in most fields, particularly in medicine or biosciences which constitute the bulk of irreproducible research. Policing this kind of behavior is not altered by GPT or similar.
The bigger problem, however, is when non-experts attempt to become informed and are unable to distinguish between plausible and implausible sources of information. This is already a problem even without AI; consider the debates over the origins of SARS-CoV-2, for example. The solution to this is the cultivation and funding of sources of expertise, e.g. in universities and similar institutions.
It seems to be kind of a new thing for laymen to be reading scientific papers. 20 years ago, they just weren't accessible. You had to physically go to a local university library and work out how to use the arcane search tools, which wouldn't really find what you wanted anyway. And even then, you couldn't take it home and half the time you couldn't even photocopy it because you needed a student ID card to use the photocopier.
For example, the authors’ statement that “[GPT’s] undeclared use—beyond proofreading—has potentially far-reaching implications for both science and society” suggests that, for them, using LLMs for “proofreading” is okay. But “proofreading” is understood in various ways. For some people, it would include only correcting spelling and grammatical mistakes. For others, especially for people who are not native speakers of English, it can also include changing the wording and even rewriting entire sentences and paragraphs to make the meaning clearer. To what extent can one use an LLM for such revision without declaring that one has done so?
Hope this research is better.
A third risk: ChatGPT has no understanding of "truth" in the sense of facts reported by established, trusted sources. I'm doing a research project related to use of data lakes and tried using ChatGPT to search for original sources. It's a shitshow of fabricated links and pedestrian summaries of marketing materials.
This feels like an evolutionary dead end.
That's a bad idea, do not do that. Regardless of the knowledge contained in ChatGPT, it's completely the wrong tool/tech for this - like using a jackhammer as a screwdriver. If you want original sources, then services like https://perplexity.ai can do it. It's not even an issue with ChatGPT as such; it was never intended for that - that's why they're also trying to build search: https://openai.com/index/searchgpt-prototype/
(edited: typo)
The human effort idea is even a bit morally objectionable. You can feel that you're worth more than others because more of the lives of others were consumed to create your possessions. It's a zero sum game where poor people can never afford high-care art because their time is worth less than the artist's.
> create a picture of scrabble pieces strewn on a table, with a closeup of a line of scrabble letters spelling "CHATGPT" on top of them. photographic, realistic quality, maintain realism and believability
https://ideogram.ai/assets/image/lossless/response/vF81gKjHS...
https://ideogram.ai/assets/image/lossless/response/EcRpDLumS...
Almost the same prompt.
https://replicate.com/p/xm41nvz05drm00chsywb6am7f0
https://replicate.com/p/kdw8bnkj39rm40chsyzbyg5e04
But of course anyone who has even a passing familiarity with scrabble is going to be able to tell that something's off.
AI-generated images are clearly identifiable as such, and it just gets annoying to continually see those desultory fabrications.
Would be better in those cases for people to write their paper in their native language and let readers translate it for themselves.
> “as of my last knowledge update” and/or “I don’t have access to real-time data”
which suggests no human (they don’t even need to be a researcher) read every sentence of these damn “papers”. That’s a pretty low bar to clear; if you can’t even be bothered to read the generated crap before including it in your paper, your academic integrity is negative and not a word from you can carry any weight.
Which also suggests none of the so-called reviewers or editors read the entire paper before including it in their journal...
If the papers are incorrect, then the reviewers should catch them.
I wonder what they were thinking submitting the paper.
I am also hearing that a lot of reviewers and readers use it, though. So we often joke that PhD students (in CS) nowadays only write bullet points from their research: they generate prose from the bullet points, and that prose is then used to generate bullet points again.
> We searched and scraped Google Scholar using the Python library Scholarly (Cholewiak et al., 2023) for papers that included specific phrases known to be common responses from ChatGPT and similar applications with the same underlying model (GPT3.5 or GPT4): “as of my last knowledge update” and/or “I don’t have access to real-time data” (see Appendix A).
If no one bothered to even spot and remove these, you can be pretty sure that no human ever read the whole paper before publication.
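For reference, the phrase search quoted above can be sketched roughly like this, assuming scholarly's search_pubs interface; it is only an illustration and may differ from the authors' actual collection script.

    from itertools import islice
    from scholarly import scholarly

    TELLTALE_PHRASES = [
        '"as of my last knowledge update"',
        '"I don\'t have access to real-time data"',
    ]

    hits = []
    for phrase in TELLTALE_PHRASES:
        # Take only a handful of results per phrase; Google Scholar rate-limits scrapers hard.
        for result in islice(scholarly.search_pubs(phrase), 20):
            hits.append({
                "phrase": phrase,
                "title": result.get("bib", {}).get("title"),  # 'bib' can be absent, as noted earlier in the thread
                "url": result.get("pub_url"),
            })

    print(f"{len(hits)} candidate papers to check by hand")

Each hit then still has to be opened and read by a human to confirm the phrase is an undeclared LLM response rather than, say, a quotation in a paper about chatbots.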
Perhaps "unreviewed scholarship" would be a more concerning claim, but I don't yet see the evidence for it being a major concern.
For example, suppose you wish to back up switch configs or dump a file or whatever, and tftp is so easy and simple to set up. You'll tear it down later or firewall it or whatever.
So a quick search for "linux tftp server" gets you to, say: https://thelinuxcode.com/install_tftp_server_ubuntu/
All good until you try to use the --create flag, which should allow you to upload to the server. That flag is not valid for tftp-hpa; it is valid on tftpd (another tftp daemon).
That's a hallucination. Hallucinations are fucking annoying and increasingly prevalent. In Windows land the humans hallucinate: C:\ SFC /SCANNOW does not fix anything except for something really madly self-imposed.
Seems valid upstream too https://github.com/Distrotech/tftp-hpa/blob/5e95f248e8435eb3...
Much more work is needed to show that this means anything.
The authors use fraud in a specific sense here: "using ChatGPT fraudulently or undeclared" where they proved that the produced text was included without proper review. They also never accused those papers of misinformation, so they don't need to show evidence of that.
In a sense we need to go back two steps and websites need to be much stronger curators of knowledge again, and we need some reliable ways to sign and attribute real authorship to publications. So that when someone publishes a fake paper there is always a human being who signed it and can be held accountable. There's a practically unlimited number of automated systems, but only a limited number of people trying to benefit from it.
In the same way https went from being rare to being the norm, because the assumption that things are default-authentic doesn't hold, the same just needs to happen to publishing. If you have a functioning reputation system and you can put a price on fake information, 99% of it is disincentivized.
(And if that doesn't work, how is what you're suggesting meaningfully different?)
Having a machine-verifiable, cryptographic identity system that renders these kinds of things transparent (basically the equivalent of a ledger, but used for identity instead of get-rich schemes) would probably make verification enforceable.
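As a very rough sketch of what "signing" a publication could look like (the workflow here is hypothetical, not an existing system): an author key signs a hash of the manuscript, and anyone can later verify that a specific, accountable identity vouched for that exact file.

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    # In a real system the author's key would be bound to a verified identity
    # (e.g. via an institution or an ORCID-like registry); here we just generate one.
    author_key = Ed25519PrivateKey.generate()
    public_key = author_key.public_key()

    manuscript = b"... full text or PDF bytes of the paper ..."
    digest = hashlib.sha256(manuscript).digest()

    # The author signs the digest; the signature is published alongside the paper.
    signature = author_key.sign(digest)

    # Anyone can check that this exact file was vouched for by that identity.
    try:
        public_key.verify(signature, digest)
        print("Signature valid: the listed author signed this exact manuscript.")
    except InvalidSignature:
        print("Signature invalid: the file was altered or not signed by this author.")

The cryptography is the easy part; the hard part the thread is pointing at is binding keys to accountable humans and getting publishers to require and check the signatures.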
Ouch!