- The frontpage should directly show the list of papers, like with HN. You shouldn't have to click on "trending" first. (When you are logged in, you see a list of featured papers on the homepage, which isn't as engaging as the "trending" page. Again, compare HN: Same homepage whether you're logged in or not.)
- Ranking shouldn't be based on comment activity, which boosts controversial papers; instead, papers should be voted on directly, the way comments are.
- It's slightly confusing that usernames allow spaces. It will also make it harder to implement some kind of @ functionality in the comments.
- Use HTML rather than PDF. Something that could be trivial with HTML, like clicking on an image to show a bigger version, requires you to awkwardly zoom in with PDF. With HTML, you would also have one column, which would fit better with the split paper/comments view.
The PDF is the original paper, as it appears on arXiv, so using PDF is natural.
In general academics prefer PDF to HTML. In part, this is just because our tooling produces PDFs, so this is easiest. But also, we tend to prefer that the formatting be semi-canonical, so that "the bottom of page 7" or "three lines after Theorem 1.2" are meaningful things to say and ask questions about.
That said, the arXiv is rolling out an experimental LaTeX-to-HTML converter for those who prefer HTML, for those who usually prefer PDF but may be just browsing on their phone at the time, or for those who have accessibility issues with PDFs. I just checked this out for one of my own papers; it is not perfect, but it is pretty good, especially given that I did absolutely nothing to ensure that our work would look good in this format:
https://arxiv.org/html/2404.00541v1
So it looks like we're converging towards having the best of both worlds.
The tooling producing PDF by default absolutely makes the preference for PDF justifiable. However, tooling is driven by usage - if more papers come with rendered HTML (e.g. through Pandoc if necessary), and people start preferring to consume HTML, then tooling support for HTML will improve.
> But also, we tend to prefer that the formatting be semi-canonical, so that "the bottom of page 7" or "three lines after Theorem 1.2" are meaningful things to say and ask questions about.
Couldn't you replace references like "the bottom of page 7" with layout-independent ones like "two sentences after Theorem 1.2"? This would also make it easier to rewrite parts of the paper without having to go back and fix all of your layout-dependent references when the layout shifts.
HTML has strong advantages for both paper and electronic reading, so I think it's worth making an effort to adopt.
When I print out a paper to take notes, the margins are usually too narrow for my note-taking, and I additionally have a preference for a narrow margin on one side and a wide margin on the other (on the same side, not alternating with page parity like a book), which virtually no paper has in its PDF representation. When I read a paper electronically, I want to eliminate pagination and read the entire thing as a single long page. Both of these things are significantly easier to do with HTML than LaTeX (and, in the case of eliminating pagination, I've never found a way to do it with LaTeX at all).
(also, in general, HTML is just far more flexible and accessible than PDF for most people to modify to suit their preferences - I think most on HN would agree with that)
I'm guessing there are few web pages of any significance which need to stay exactly the same for a long time. Here is one example which I've seen trotted out from time to time on HN:
https://www.dolekemp96.org/main.htm
This is clearly the exception. It seems that maintainers of web pages usually expect that they'll need to maintain and update them for as long as they want them to be accessible, and that's definitely not something I'd care to do for research papers.
I generally stick to PDF myself, but I do sometimes wish it would be more ergonomic to reflow a 2-column paper for reading on mobile on the go, for example. Also, ePub is easier to read in night mode than PDF recoloring, and seems easier to search through (try searching for a Greek letter in a PDF…).
EDIT: How is the math support in ePub though? Are people embedding KaTeX/MathJax or just relying on MathML, and how is the quality compared to TeX?
Yes, but I think such references are inherently harder to locate. Personally I try to just avoid making references to specific locations in the document and instead name anything that needs to be referenced (e.g. Figure 5, Theorem 3.2).
Just thinking about having to change layout-dependent references, every time I add two sentences to the introduction, gives me a migraine.
I never do anything like this in the paper itself, nor does anyone else that I'm aware of. I'm thinking of informal discussions, where I ask another mathematician about something specific in a paper.
The HTML version is seriously buggy; and the worst part is, a lot of those bugs take the form of silently dropping or hiding content. It's bad enough when half the paper is gone, because at least you notice that quickly, but it'll also do things like silently drop sections or figures, and you won't realize that until you hit a reference like 'as discussed in Section 3.1' and you wonder how you missed that. I filed like 25 bugs on their HTML pages, concentrating on the really big issues (minor typographic & styling issues are too legion to try to report), and AFAIK, not a single one has been fixed in a year+. Whatever resources they're devoting to it, it's apparently totally inadequate to the task.
But there is another problem: It takes too long to load on mobile and doesn't reflow. I thought mobile was one of the reasons people wanted HTML in the first place!
You can convert a lot of formulas into either Mathjax/Katex-style fonts or MathML, or even just HTML+Unicode. (I get a very long way with pure HTML+Unicode+CSS on Gwern.net, and didn't even have to write a TeX-to-HTML compiler - just a long LLM prompt: https://github.com/gwern/gwern.net/blob/master/build/latex2u... )
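To make the HTML+Unicode idea concrete, here is a toy sketch of the kind of rewriting involved. This is my own illustration, not Gwern's actual converter (which handles vastly more cases via an LLM pass); it only covers a few Greek letters and simple superscripts:

```python
# Toy sketch: rewrite simple TeX fragments into plain Unicode.
# Real inputs need a proper parser or an LLM pass; this handles
# only a handful of common cases for illustration.
import re

GREEK = {r"\alpha": "α", r"\beta": "β", r"\gamma": "γ", r"\lambda": "λ"}
SUPERSCRIPTS = str.maketrans("0123456789+-n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻ⁿ")

def tex_to_unicode(s: str) -> str:
    for tex, uni in GREEK.items():
        s = s.replace(tex, uni)
    # x^2 -> x², x^{10} -> x¹⁰ (digits, signs, and 'n' only)
    s = re.sub(r"\^\{([0-9+\-n]+)\}|\^([0-9+\-n])",
               lambda m: (m.group(1) or m.group(2)).translate(SUPERSCRIPTS), s)
    return s

print(tex_to_unicode(r"e = \lambda x^2"))  # -> "e = λ x²"
```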
But that's missing the point. Who cares about all of the refinements like reflow or pretty equations, when you are routinely serving massively corrupted and silently incomplete HTML versions? I don't care how good the typography is in your book if it's missing 5% of pages at random and also doesn't have any page numbers or table of contents...
Some history: https://www.arxiv-vanity.com/
And I prefer this over discussions on 'X'.
We don't lack places where the public can engage with researchers and experts. What we do lack are places where researchers/experts can communicate with one another __and expect the other person to be a peer__. The bar to arXiv is (absurdly) low, and I think that's fine.
Not everything has to be for everyone.
My longer comment: https://news.ycombinator.com/item?id=41484123
I'm going to go crazy if I get more GitHub issues asking where the source code is or how to fine-tune a model. My research project page is neither a search engine nor ChatGPT...
Hmm, this is an interesting point.
I don't think this is an inherently better approach, but maybe there should be an option for different ranking mechanisms. You could also rank by things like cite-frequency, cite-recency, "cite pagerank", etc.
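As a rough illustration of how those mechanisms could be blended, here is a hypothetical scoring sketch. The field names, weights, and decay constants are all invented for the example, not taken from any existing site:

```python
# Hypothetical ranking sketch blending cite-frequency and cite-recency.
import math
from dataclasses import dataclass

@dataclass
class Paper:
    citations: int         # total citation count ("cite-frequency")
    recent_citations: int  # citations in, say, the last 90 days ("cite-recency")
    age_days: float        # days since the preprint appeared

def score(p: Paper, gravity: float = 1.5) -> float:
    # Log damping keeps mega-cited classics from drowning out new work;
    # the age penalty favours papers that are being cited *now*.
    base = math.log1p(p.citations) + 2.0 * math.log1p(p.recent_citations)
    return base / (1.0 + p.age_days / 30.0) ** gravity

papers = [Paper(5000, 10, 2000), Paper(40, 25, 60)]
papers.sort(key=score, reverse=True)  # the fresh, actively cited paper ranks first
```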
Citations are probably not the best metric for discovery, but this also really makes me wonder whether papers are the best unit for discovery at all. An academic produces ideas, not papers; papers are just a side effect. The path is something like:
* come up with an idea
* write short conference papers about it
* present it in conferences
* write journal papers about it
* maybe somebody writes a thesis about it
(Talking to people about it throughout).
If we want to discover ideas as they are being worked on, I guess we’d want some proxy that captures whether all that stuff is progressing, and if anybody has noticed…
Finding that proxy seems incredibly difficult, maybe impossible.
Agreed that it makes it more complicated though.
Regarding HTML, our original site actually only supported HTML (because it was easier to build an annotator for an HTML page). The issue is that a good ~25% of these papers don't render properly, which pisses off a lot of academics. Academics spend a lot of time making their papers look nice as PDF, so when someone comes along and refactors their entire paper into HTML, not everyone is a fan.
That being said, I do think long term HTML makes a lot of sense for papers. It allows researchers to embed videos and other content (think, robotics papers!). At some point we do want to incorporate HTML papers back into the site (perhaps as a toggle).
Did you bulk download the arXiv metadata, PDF, and/or LaTeX files?
I am trying to figure out what the required space is for just the most recent version of the PDFs.
I can find mentions of the total size of their S3 bucket, but it's unclear whether that also includes older versions of the PDFs.
I also wonder if the Kaggle dataset is kept up to date, since it states merely 1.7M articles instead of the 2.4M I read elsewhere.
Edit: I just found the answers to my question here: https://info.arxiv.org/help/bulk_data_s3.html
I disagree. There are numerous times where I have browsed the comments on an HN post where people haven't read the article and are just responding to the comment thread. The workflow here seems a bit different: a person would have already read a paper and want to read through or respond to existing discussions. With that, having the search front and center would follow as the natural next step for a person who read a paper and wanted to "search" for discussions related to that paper in particular.
HN is more about aimless browsing, which is a bit different from researching a specific area or topic.
How about not ranking things at all? I don't feel like things like this should be a popularity/"like" contest and instead let the content of the paper/comments speak for themselves. Yes, there will be some chaff to sort through when reading, but humanity will manage.
Just sort things by updated/created/timestamp and all the content will be equal.
People can't read everything, and have to rely on others to filter up the good stuff. If you read something random, based on no recommendation, it's charity work (the odds are extremely good that it is bad) and you should recommend that thing to other people if it turns out to be useful. Ultimately, that's the entire point of any of this design: if we don't care about any of the metadata on the papers, they could just be numbered text files on an FTP site.
The fewer things I have to read to find out they're shit, the longer life I have.
I say the opposite: put a lot of thought into how papers are organized and categorized, how comments on papers are organized and categorized, the means through which papers can be suggested to users who may be interested in them, and the methods by which users can inject their opinions and comment into those processes. Figure out how to thwart ways this process can be gamed.
Treat the content equally, don't force the content to be equal. Hacker News shouldn't just be the unfiltered new page.
Most articles are not interesting, and most of the interesting ones are interesting only to a niche of a few researchers. The front page will be flooded with uninteresting stuff.
But don't we want people's attention drawn to controversial, conversation-generating papers? The whole point of the platform is to drive conversation.
I'm all for casting wide nets and making things available to everyone, but a little gatekeeping is not bad (just don't gatekeep by race, class, or those sorts of things). But I'm sorry, research is hard. There's a reason people spend decades researching things that at face value look trivial. Rabbit holes are everywhere, and just because you don't know about them doesn't mean your opinion has equal weight.
We seriously lack areas where experts can talk to other experts.
Unfortunately for those of us pre-tenure, it's difficult to balance these, as I'm sure you're aware. We're evaluated by people who may have the best intentions but don't work directly in our field. They then determine whether we keep our jobs. It's difficult not to consider prestige as a factor when you know those evaluating you will.
> Unfortunately, academics are still hindered by institutional inertia
As an ABD this has been a real pain point for me. Maybe I came into academia thinking what mattered most was the research. But now I'm the stereotypical PhD who passionately despises academia for its lack of being academic. I'm happy to have competition, but at the end of the day are we all not on the same team? How the hell did we create a system where it is the norm that an advisor does not read a thesis, does not read papers, does not mentor? For that to be the norm among a committee? When I've had issues getting work through review (even when it has high citations thanks to arXiv), I don't understand why it's acceptable for a response to be "keep trying" instead of "here, I read the paper and reviewer responses, let me help"[0]. It seems inefficient that we throw students into the deep end and watch them sink or learn to swim. I think there'd be a lot fewer dejected PhD students if there was a stronger focus on academics, mentorship, and collaboration over churning out ̶w̶i̶d̶g̶e̶t̶s̶ papers.
I think what pisses me off the most is thinking that research significance and success can be measured __purely__ through metrics like citation counts, H-indices, i10s, awards, etc. I'm not saying those are useless, but can we really evaluate without looking at the content (as you say, the actual quality of the work)? It's like we learned about Goodhart's Law and decided it was a feature, not a bug.
(I know this is not always the case and there are many amazing advisors, but I'd be impressed if someone didn't know this is happening at least somewhere within their department.)
[0] If it takes a village to raise a child, it takes a department to mint a PhD. These types of things should come from committees, not just advisors. Our annual meetings and review shouldn't just be going through the motions.
Yeah, but every pre-tenure or postdoc is like “I can’t fight the system right now, I need to publish enough to still have a job two years from now”
Your comment doesn't read like one from anyone with any relationship with academia. If you had one, you'd know that the issue is not a vacuous "prestige" but funding being dependent on hard metrics such as impact factor, and in some cases with metrics being collected exclusively from a set of established peer-reviewed journals that must be whitelisted.
And ArXiv is not one of them.
This means that a big share of academia has their professional future, as well as their institution's ability to raise funding, dependent on publishing in a small set of non-open peer-reviewed journals.
Reading your post, you make it sound like anyone can just upload a random PDF to a random file server and call it a publication. That ain't it. If you fail to understand the problem, you certainly ain't the solution.
Your comment reads likewise.
He didn't say he publishes them exclusively on arXiv. It's quite common for professors to post papers there as well as submit them to journals. Many (most?) journals allow for it - they don't insist that the ones on arXiv be taken down - as long as what's posted is a preprint and not the final (copyrighted) version.
As an academic, you should also know that practices vary widely with discipline. As an example:
> dependent on them publishing on a small set of non-open peer-reviewed journals.
IIRC, NIH grants require publishing in open peer-reviewed journals.
Also, lots of disciplines are not heavily reliant on funding. In both universities I attended, the bulk of math professors did not even apply for grants! It's not required to get tenure (unlike engineering/physics). Also often true in some economics departments.
As an aside, your comment violates a number of HN guidelines.
> Your comment reads likewise.
FWIW I thought it read like reviewer 2, which actually makes me think they do have a relationship with academia. The problem I have with their comment is that it rejects the critique by pointing to a different issue, as if there were a singular issue with academia that leads to the mess. So it comes off as a typical "reviewer 2" comment to me, where there is more complaint and disagreement than critique.
FWIW, I think we in academia need to embrace the fuzziness and noise of evaluation. I think the issue is that by trying to make sense out of an extremely noisy process we placed too much value in the (still noisy) signals we can use. It is a problem to think that these are objective answers and deny the existence of Goodhart's Law (this is especially ironic in ML where any intro class discusses reward hacking). And in this, I think there's a strong coupling between cgshep's and chipdart's complaints.
As for publishing, I think we also lost sight of the main reason we publish: to communicate with our peers. Publishers played an important role, since not even a century ago we could not make our works trivially available to our peers. But now the problem is information overload, not lack of information. And I think the review process is getting worse each year, in part because we place so little value on the act of reviewing, do not hold anyone accountable for a low-quality review[0,1], do not hold ACs or metas accountable either, and have so many papers to review that I don't think we can expect high-quality reviewing even if we actually incentivized it. I mean, in ML we expect a few hundred ACs to oversee over ten thousand submissions?
My question is whether we'll learn that the noisiness of the process and the randomness of rejection create a negative feedback loop of papers, where you "roll the dice" at the next conference: you resubmit without changes, alongside your new work (publish or perish, baby). If we had quality reviewing, this would at least put pressure on papers to get incrementally better instead of just being recycled. But recycling is a very efficient strategy right now, and we've seen plenty of data suggesting it is being used.
[0] I understand the reasons for this. It is a tricky problem
[1] I'd actually argue we incentivize bad reviews. No one questions you when you reject a work, and finding reasons to reject a work is far easier than finding reasons to accept one. There are always legitimate reasons to reject any work. Not to mention that the whole process is zero-sum, since venue prestige is for some reason based on the percentage of papers rejected. As if there were no variance in year-to-year quality.
But putting your papers on the arXiv, as your parent said, doesn't mean you only put them on the arXiv. I put all my papers on the arXiv, but I also submit them for publication in journals that will help me make the case for funding and promotion.
Yes, academia has tried to quantify prestige via impact factor and peer-reviewed journals. Yes, lots of people (even in academia) feel that the system is being gamed, with the publishing houses that own the journals being a common scapegoat.
The system isn't broken, but it also keeps its integrity through some dynamic tension: a bit of criticism is a good thing.
> the issue is not a vacuous "prestige" but funding being dependent on hard metrics such as impact factor
These things are not in contention. There is no singular problem to be solved, which is why it is so difficult. No smoking gun.
> And ArXiv is not one of them.
ArXiv has a large impact on metrics and so-called impact factor. But let's also not be delusional about the fact that a paper from a prestigious institution will always receive more citations (or score higher on any other metric) than an equal-quality paper from a less prestigious institution. All our metrics can be hacked through publicity.
> Reading your post
Reading yours, it sounds like you stand in the way of resolving issues in academia. Not because you don't have issues with it that you want to solve, but because you have already found the answer. Your comment reads like someone who has a relationship with academia.
When I was doing peer reviews, it would often take a day or more to read a paper, think it through, and then write up something thoughtful and constructive.
If you introduce a mechanism to delay comments (e.g., holding all messages for 24-72 hours before publishing, or only releasing new comments on Monday mornings; see the sketch after this list), it would:
- encourage commenters to write longer thoughtful responses rather than short quick comment threads
- reduce back and forth flame wars
- ease the burden on moderators and give them time to do batches of work
- reveal whether multiple commenters independently come to the same conclusions/critiques, minimizing bandwagon effects
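A minimal sketch of that batching mechanism; the class, method names, and 48-hour hold are all invented for illustration:

```python
# Sketch: comments are queued on submission and only become visible
# once their hold period expires, so replies can't snowball in real time.
import datetime as dt

HOLD = dt.timedelta(hours=48)

class CommentQueue:
    def __init__(self):
        self._pending = []  # list of (visible_at, comment_text)

    def submit(self, text: str, now: dt.datetime) -> None:
        self._pending.append((now + HOLD, text))

    def release(self, now: dt.datetime) -> list[str]:
        """Return (and publish) everything whose hold period has expired."""
        ready = [c for t, c in self._pending if t <= now]
        self._pending = [(t, c) for t, c in self._pending if t > now]
        return ready

q = CommentQueue()
q.submit("Thoughtful critique...", dt.datetime(2024, 9, 9, 12))
q.release(dt.datetime(2024, 9, 10, 12))  # [] - still held
q.release(dt.datetime(2024, 9, 12, 12))  # ['Thoughtful critique...']
```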
I think it'll be hard growing a discussion platform if there are barriers to entry like that just to populate your profile.
While the grandparent is understandably disappointed with the current implementation, relying on emails was always doomed from the start.
W3C TR did-core: "Decentralized Identifiers (DIDs) v1.0": https://www.w3.org/TR/did-core/
W3C TR did-use-cases: "Use Cases and Requirements for Decentralized Identifiers" https://www.w3.org/TR/did-use-cases/
"Email addresses are not good 'permanent' identifiers for accounts" (2024) https://news.ycombinator.com/item?id=38823817#38831952
A person can generate (and optionally register) additional DIDs if they please.
A person can request additional ORCIDs if they please.
I don't think Google Scholar has this fully solved either; I've seen many misattributed papers there.
It is a good idea in general to make sure that your papers contain up-to-date contact information. One way of doing this is to use an ORCID iD.
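An ORCID iD even has a verifiable structure: per ORCID's documentation, the last character is a check digit computed with ISO 7064 MOD 11-2 over the first 15 digits. A small sketch (0000-0002-1825-0097 is the sample iD from their docs):

```python
# Check-digit validation for an ORCID iD (ISO 7064 MOD 11-2,
# as described in ORCID's documentation).
def orcid_checksum(base15: str) -> str:
    """Compute the final check character from the first 15 digits."""
    total = 0
    for d in base15:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    digits = orcid.replace("-", "")
    return len(digits) == 16 and orcid_checksum(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
```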
Who sends emails to paper authors? How often do they respond? How fast do the email addresses go out of date? I lost access to the email address included in most of my papers within 2 years of publication.
I see little to no value to have it included in the paper.
Also, I don't think we are yet at the point where human-to-human communication is not possible.
I do, when I'd like to read a paper that's locked behind a paywall and not available on sci-hub. Authors of scientific papers are much like any other authors... they want to be read. The more enlightened among them understand that obscurity is a problem rather than a perk. They also tend to appreciate engagement in the form of follow-up questions (at least from people who actually read the paper.)
Obviously it's not a major concern on arxiv, but in a larger historical sense, this type of communication was a key original application of email.
I do when the paper is not easily available or the publisher charges some outrageous fee (have seen $50 for a paper in the past).
Authors typically despise the publishers and are happy to share their work to anybody interested.
I think you're failing to understand the basics of the problem, and even the whole problem domain.
Email addresses are not created/maintained for life. You can have an email address, then have your org change its name and switch email providers; not to mention that researchers leave research institutions and thus lose access to their accounts.
You have multiple scenarios where papers can be published with authors using email addresses which they lose access to.
Btw, why is it considered normal? I think it would be much better to list an e-mail address to which you will have (more-or-less) permanent access.
I place the blame here entirely on Google for misusing forms of identification. Two-factor authentication is having two locks on the same door, where recovery addresses are having two doors with separate locks. Using a recovery address for 2FA is absurd, and caused me to be locked out of my permanent email address.
When Google switched from offering 2FA to requiring 2FA, it would have been acceptable for them to require a second form of authentication to be added on the next log-in. It is not acceptable for Google to pretend that they have a second form of authentication when they do not.
Second, up until the moment it was needed, I had access to my recovery address. Google locked me out of my primary address and my recovery address simultaneously.
What leads you to believe it isn't normal? I mean, do you have an eternal email address? Have you ever switched jobs?
Most papers are authored/co-authored by graduate students. Do you think all of them will hold onto their institutional address after they graduate? A big chunk of them will not even continue in the field.
Dumb example: you might have published a paper while working at a company, but years later the company went bankrupt and ceased to operate. Now somebody else owns the domains, and they will not do you the favour of giving you an email address.
Notable example: Sun Microsystems. But there are many more, of course.
Or you just moved from one university to another. Or you published while in grad school and then moved somewhere else.
This would be a security nightmare for them. It is pretty normal for universities to have some sort of identity management system that automatically provisions emails when you are employed there and deprovisions them once you are gone.
Most universities use a portal of some sort for easy access to personal information and preferences anyway, so it shouldn't be too difficult to limit access for alumni to only allow them to change a few personal details like name / address / phone number and the like, plus email forwarding settings. I think the extra cost is negligible compared to what universities already spend on alumni like newsletters, conferences, dinners, etc.
Additionally, making people who want to cold email work a little to acquire the current email address is actually a good thing, especially if they want to talk about something years old. I’ve generally had a lot more pleasant and engaging correspondence with people who worked out my email (say from a side project I develop pseudonymously) than ones who directly lifted my email from my professional profiles. So, expiring emails in papers generally isn’t a real problem anyway, and it’s basically never a hurdle if your target is still in academic circles. It only becomes a problem in this specific context of automated authentication (based on something not intended for that purpose).
If you forward emails automatically then you'd lose this accreditation. I suppose the solution would be an accreditation domain that forwards to your uni address only, but that's extra work now.
Obviously our university isn't gonna make a 10k€/month contract just because some prof wants their mail forwarded to gmail. Especially not if they are not working here anymore.
Should I have waited until the startup had more revenue? We were profitable at the time (we were B2B and the layoffs did us in)
I cannot understand how what's written there could have been confidently construed as a statement.
Then don't pretend that it is an email address.
I mean, it's true that email addresses are not guaranteed to be assigned for life, but putting a fake email address on a paper is misleading.
Suppose John publishes a paper while at XYZ Corp, listing his XYZ Corp address as his contact. John has since moved on and is earning more at ABC Corp instead. XYZ Corp has duly reclaimed John's old email address, and John cannot receive emails at said address any longer.
This is the situation the OP is in. It was never a "fake email address". They did not literally type "first.last@org", that was an example suitable for using in their comment.
[edit: I'm actually wrong with that last statement, as it turns out. While it wasn't a fake email address, the situation is slightly more nuanced in that OP actually did say "{first}.{last}@hhi.fraunhofer.de" in the paper, as there were multiple authors who all had the same email address format - see https://news.ycombinator.com/item?id=41479618. I still think this is a valid method, though, and it's certainly not fake. Besides, the problem I outlined sounds like it probably remains an issue even if it's not the exact problem OP is experiencing.]
It sounded as if they were using "john.doe@hhi.fraunhofer.de" while in reality it was an invalid email address ("because there’s no match with the email address"), and that he would have tried to claim co-authorship via his "real" address, which might be something like "j.doe2@hhi.fraunhofer.de" (but luckily is not).
It's all clear now. Thank you for your explanation.
I think you don't know what an email address is, and how they are used.
> (...) but putting a fake email address (...)
This nonsense of "fake email address" was only brought up as a baseless accusation. There is zero substance to it, and it's been used as a red herring in this discussion.
Focus on the problem: do you expect any and all email addresses you publish somewhere years ago to continue to work?
I understood it this way: org is not handing out first.last@org to the employee, but using an email format in order to clarify that "first last" is working at org and collaborated on the paper not in private, but as an employee.
He might have been assigned last.f@org as a valid email address by the org, but that one is not being used on the paper, while first.last@org is invalid.
> I think you don't know what an email address is, and how they are used.
You should know that this kind of comment should not be made on HN, see the guidelines [0] ("Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.").
> do you expect any and all email addresses you publish somewhere years ago to continue to work?
No. But that is irrelevant to this conversation.
This is more like a mismatch between "fully edited open-access papers" and "trying to use arXiv preprints as an approximation of the former".
Of course most of that is moot for professional scientists, because you likely already know (or at least can find out about) the authors. For example, some papers have old non-working email addresses for authors who have since moved institution. It's not a problem, since I'll just look them up by name if necessary and use their current email.
They're no longer at that institute, and that email no longer exists (while some institutions give some leeway, I know of at least one major university which removes them the day the contract ends).
This is a common problem if you're providing services to academics and you've tied yourself to using emails as identifiers.
In our case it's for saving space in the paper, and also for reducing spam. This small change may now seem silly in the age of LLMs, but the papers that have full email addresses in them get a considerable amount of fake conference and journal participation emails, which is annoying.
I still think this is valid (and certainly not the fake email address that people are calling it), but yeah, it's not what I thought it was.
OpenReview has:
- preprints
- versioning
- reviewing, with threads and LaTeX support
- the ability to link websites, repos, datasets, etc.
- BibTeX generation
But it's not as popular as arXiv, though it is very popular for reviewing (conferences often do not use the full feature set).
One thing I dislike about this is that it is open to all. Arxiv doesn't have a hard filter (you just need someone with an account to vet you, which stakes their reputation), but the existence of a filter is critical.
I don't want a place to engage with the public. We have Reddit, hacker news, Twitter, mastodon, and countless other places. I want a place for academics to talk to academics [0], researchers to researchers. There is a serious lack of spaces where serious low level in-the-weeds discussions can happen. Even fucking GitHub and hugging face are swamped by people asking dumb questions on research projects like how to install pytorch, fine tune a model, or where the source code is.
I'm really happy to include a lot of people, but I think we also need spaces where experts know they're talking to peers. Without that, you have to assume you're not talking to peers, because they outnumber us a few thousand to one. So that doesn't encourage engaging in research or technical discussions; it encourages talking down to people and misinterpretation.
[0] a degree is not what makes you "an academic"
For some more context, we are a group of 3 students with a background in AI research, and this site was initially built as an internal tool to discuss ai papers at Stanford. We've been dealing with a lot of growing pains/infra issues over the past month that we are in the process of hashing out. From there we would love to make a more concerted effort to share this in areas outside of AI. Happy to hear your thoughts here, or more formally via contact@alphaxiv.org.
I do want to highlight that our site has a team of reviewers/moderators, and having folks from different subject areas is critical to making sure the site doesn't end up a cesspool. Apply here: https://docs.google.com/forms/d/11ve-4cL0axTDcqnHF66zX6greFV....
My main recommendation was going to be organizational: to cooperate and work with arXiv itself, rather than risking a potentially adversarial or competitive relationship.
Now that I think about it however, I am convinced by a peer comment that was basically "leave arxiv the way it is and don't mess it up." So carry on then.
I worry that fragmentation of this space might not be beneficial, so it would be nice if these services could collaborate in some way, perhaps using ActivityPub or something.
Tenured prof here. Academics don't use HTML, despite its obvious advantages. The incentive system is deeply broken. No big-name journal or conference will accept well-formatted HTML over their proprietary LaTeX/Word formats. And LaTeX-to-HTML converters generally suck.
1) Zoom buttons just for the paper - the article text is often tiny, and zooming in with the browser messes up the page layout and makes the page practically unusable.
OR
2) A simple button to download the PDF directly. This would alleviate the zoom problem since I can view it in my local PDF reader with the best settings for me. Having to go to arXiv to download the PDF for every paper would be a nuisance over time though, so a button in the top bar would make the experience a lot better.
Edit: The above applies to arXiv itself; I got confused. alphaxiv.org indeed opens the PDF in a frame with no option to download.
I ask because every system I've ever come across for discussing and ranking content without human moderation is always, sooner or later, gamed.
Perhaps they think they are reputable now just because Perelman's proof papers were published there, and they want to maintain their 'reputation' [2]. The irony is that Perelman would most probably not publish on arXiv were it in its current pseudo-journal state.
[1] Editorial Advisory Council:
https://info.arxiv.org/about/people/editorial_advisory_counc...
[2] Reclusive mathematician rejected honors for solving 100-year-old math problem, but he relied on Cornell's arXiv to publish:
https://news.cornell.edu/stories/2006/09/proof-100-year-old-...
Great idea though; I would love to use something like this if it existed on a federated protocol.
I could see myself using alphaxiv for that, and then, if there's a comment section, I might even read it, and, who knows, leave a comment. But there's no way I'm going to be changing the address or going to some other site to search for papers just to see whether there are some comments.
ps: I see the extension adds a "discussion" link to arxiv, it is a pity that it is only available for Chrome.
Sadly, the last comments in HEP are more than 2 years old (which explains why I had never heard about it; it seems it never gained any traction).
Why wasn't it possible to contact arXiv and do this in collaboration with them?
While it says "a Stanford project" at the top, this could mean anything. It could in particular allude to the fact that Stanford professors are advisors.
The advisors are well-known professors, but some, like Sebastian Thrun, have an entrepreneurial background, so a company probably sits behind this.
For me, the entire project gets a NO CONFIDENCE vote because I can't tell who they are or who pays for the moderation they claim to do, etc.
I'd feel much more at ease if this was done by arXiv, or at least endorsed by arXiv.
I always love any idea for curating a high-IQ internet community.
Interestingly, the sites have grown at a similar rate. Going back to 2020, arXiv had 25+ million downloads/month and bioRxiv had ~2.1 million.
[1] https://arxiv.org/stats/monthly_downloads [2] https://api.biorxiv.org/reports/usage
Monthly submissions:
1. arXiv, ~21k/month
2. bioRxiv, ~5k/month
3. medRxiv, ~1.3k/month
It would be interesting to see usage of other _xiv's and other pre-print publishers!
[1] https://arxiv.org/stats/monthly_submissions [2] https://api.biorxiv.org/reports/content_summary [3] https://api.medrxiv.org/reports/content_summary
Which means they will. Sciencing is hard, especially because gatekeeping is necessary to keep the spammers (the ones that flood my email with fake journal offers) at bay.
Writing a paper and dealing with the inane requests from three reviewers was already frustrating and stressful. Now you open up a never-ending review process of random people making demands (“unless you do X follow-up analysis, this is worthless”).
Also, you need to be able to say a paper/project is done and let yourself move on. If your job turns into “respond to feedback on every paper you’ve ever written, every day,” you’ll never start anything new.
It’s one thing to optimize an abstract pursuit of knowledge, but you also gotta remember that you need this to be a job people are willing/able to do.
To put it another way, if it’s just a job to you, dealing with criticism is part of what you’re getting paid for anyway, same as everyone else. And if that’s still too much to ask, you can point the finger at the lack of research that isn’t profit-driven.
Peer reviewing is hard work. Give people a readily available shortcut for it, and some people, on some occasions, will take it. Which may in turn force conferences to adopt policies forbidding posting on arXiv.
I don't know how this page organizes moderation, but I imagine there will be some kind of moderation like on most online discussion forums.
Remember all that hype around LK-99 room-temperature superconductivity a few months back? The substantive scientific discussion would absolutely be drowned out by curious laymen and/or grifters.
And to be fair, few scientific stories get even remotely near the attention that the superconductivity announcement got.