- The frontpage should directly show the list of papers, like with HN. You shouldn't have to click on "trending" first. (When you are logged in, you see a list of featured papers on the homepage, which isn't as engaging as the "trending" page. Again, compare HN: Same homepage whether you're logged in or not.)
- Ranking shouldn't be based on comment activity, which boosts controversial papers; instead, papers should be voted on directly, the way comments are.
- It's slightly confusing that usernames allow spaces. It will also make it harder to implement some kind of @ functionality in the comments.
- Use HTML rather than PDF. Something that could be trivial with HTML, like clicking on an image to show a bigger version, requires you to awkwardly zoom in with PDF. With HTML, you would also have one column, which would fit better with the split paper/comments view.
The PDF is the original paper, as it appears on arXiv, so using PDF is natural.
In general academics prefer PDF to HTML. In part, this is just because our tooling produces PDFs, so this is easiest. But also, we tend to prefer that the formatting be semi-canonical, so that "the bottom of page 7" or "three lines after Theorem 1.2" are meaningful things to say and ask questions about.
That said, the arXiv is rolling out an experimental LaTeX-to-HTML converter for those who prefer HTML, for those who usually prefer PDF but may be just browsing on their phone at the time, or for those who have accessibility issues with PDFs. I just checked this out for one of my own papers; it is not perfect, but it is pretty good, especially given that I did absolutely nothing to ensure that our work would look good in this format:
https://arxiv.org/html/2404.00541v1
So it looks like we're converging towards having the best of both worlds.
The tooling producing PDF by default absolutely makes the preference for PDF justifiable. However, tooling is driven by usage - if more papers come with rendered HTML (e.g. through Pandoc if necessary), and people start preferring to consume HTML, then tooling support for HTML will improve.
> But also, we tend to prefer that the formatting be semi-canonical, so that "the bottom of page 7" or "three lines after Theorem 1.2" are meaningful things to say and ask questions about.
Couldn't you replace references like "the bottom of page 7" with layout-independent ones like "two sentences after Theorem 1.2"? This would also make it easier to rewrite parts of the paper without having to go back and fix all of your layout-dependent references when the layout shifts.
HTML has strong advantages for both paper and electronic reading, so I think it's worth making an effort to adopt.
When I print out a paper to take notes, the margins are usually too narrow for my note-taking, and I additionally have a preference for a narrow margin on one side and a wide margin on the other (on the same side, not alternating with page parity like a book), which virtually no paper has in its PDF representation. When I read a paper electronically, I want to eliminate pagination and read the entire thing as a single long page. Both of these things are significantly easier to do with HTML than LaTeX (and, in the case of eliminating pagination, I've never found a way to do it with LaTeX at all).
(also, in general, HTML is just far more flexible and accessible than PDF for most people to modify to suit their preferences - I think most on HN would agree with that)
I'm guessing there are few web pages of any significance which need to stay exactly the same for a long time. Here is one example which I've seen trotted out from time to time on HN:
https://www.dolekemp96.org/main.htm
This is clearly the exception. It seems that maintainers of web pages usually expect that they'll need to maintain and update them for as long as they want them to be accessible, and that's definitely not something I'd care to do for research papers.
I generally stick to PDF myself, but I do sometimes wish it would be more ergonomic to reflow a 2-column paper for reading on mobile on the go, for example. Also, ePub is easier to read in night mode than PDF recoloring, and seems easier to search through (try searching for a Greek letter in a PDF…).
EDIT: How is the math support in ePub though? Are people embedding KaTeX/MathJax or just relying on MathML, and how is the quality compared to TeX?
Yes, but I think such references are inherently harder to locate. Personally I try to just avoid making references to specific locations in the document and instead name anything that needs to be referenced (e.g. Figure 5, Theorem 3.2).
Just thinking about having to change layout-dependent references, every time I add two sentences to the introduction, gives me a migraine.
I never do anything like this in the paper itself, nor does anyone else that I'm aware of. I'm thinking of informal discussions, where I ask another mathematician about something specific in a paper.
The HTML version is seriously buggy; and the worst part is, a lot of those bugs take the form of silently dropping or hiding content. It's bad enough when half the paper is gone, because at least you notice that quickly, but it'll also do things like silently drop sections or figures, and you won't realize that until you hit a reference like 'as discussed in Section 3.1' and you wonder how you missed that. I filed like 25 bugs on their HTML pages, concentrating on the really big issues (minor typographic & styling issues are too legion to try to report), and AFAIK, not a single one has been fixed in a year+. Whatever resources they're devoting to it, it's apparently totally inadequate to the task.
But there is another problem: It takes too long to load on mobile and doesn't reflow. I thought mobile was one of the reasons people wanted HTML in the first place!
You can convert a lot of formulas into either Mathjax/Katex-style fonts or MathML, or even just HTML+Unicode. (I get a very long way with pure HTML+Unicode+CSS on Gwern.net, and didn't even have to write a TeX-to-HTML compiler - just a long LLM prompt: https://github.com/gwern/gwern.net/blob/master/build/latex2u... )
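To make the HTML+Unicode idea concrete, here is a toy sketch of the kind of rewriting involved. This is my own illustration, not Gwern's actual converter (which handles vastly more cases via an LLM pass); it only covers a few Greek letters and simple superscripts:

```python
# Toy sketch: rewrite simple TeX fragments into plain Unicode.
# Real inputs need a proper parser or an LLM pass; this handles
# only a handful of common cases for illustration.
import re

GREEK = {r"\alpha": "α", r"\beta": "β", r"\gamma": "γ", r"\lambda": "λ"}
SUPERSCRIPTS = str.maketrans("0123456789+-n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻ⁿ")

def tex_to_unicode(s: str) -> str:
    for tex, uni in GREEK.items():
        s = s.replace(tex, uni)
    # x^2 -> x², x^{10} -> x¹⁰ (digits, signs, and 'n' only)
    s = re.sub(r"\^\{([0-9+\-n]+)\}|\^([0-9+\-n])",
               lambda m: (m.group(1) or m.group(2)).translate(SUPERSCRIPTS), s)
    return s

print(tex_to_unicode(r"e = \lambda x^2"))  # -> "e = λ x²"
```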
But that's missing the point. Who cares about all of the refinements like reflow or pretty equations, when you are routinely serving massively corrupted and silently incomplete HTML versions? I don't care how good the typography is in your book if it's missing 5% of pages at random and also doesn't have any page numbers or table of contents...
Some history: https://www.arxiv-vanity.com/
And I prefer this over discussions on 'X'.
We don't lack places where the public can engage with researchers and experts. What we do lack are places where researchers/experts can communicate with one another __and expect the other person to be a peer__. The bar to arXiv is (absurdly) low, and I think that's fine.
Not everything has to be for everyone.
My longer comment: https://news.ycombinator.com/item?id=41484123
I'm going to go crazy if I get more GitHub issues asking where the source code is or how to fine-tune a model. My research project page is neither a search engine nor ChatGPT...
Hmm, this is an interesting point.
I don't think this is an inherently better approach, but maybe there should be an option for different ranking mechanisms. You could also rank by things like cite-frequency, cite-recency, "cite pagerank", etc.
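As a rough illustration of how those mechanisms could be blended, here is a hypothetical scoring sketch. The field names, weights, and decay constants are all invented for the example, not taken from any existing site:

```python
# Hypothetical ranking sketch blending cite-frequency and cite-recency.
import math
from dataclasses import dataclass

@dataclass
class Paper:
    citations: int         # total citation count ("cite-frequency")
    recent_citations: int  # citations in, say, the last 90 days ("cite-recency")
    age_days: float        # days since the preprint appeared

def score(p: Paper, gravity: float = 1.5) -> float:
    # Log damping keeps mega-cited classics from drowning out new work;
    # the age penalty favours papers that are being cited *now*.
    base = math.log1p(p.citations) + 2.0 * math.log1p(p.recent_citations)
    return base / (1.0 + p.age_days / 30.0) ** gravity

papers = [Paper(5000, 10, 2000), Paper(40, 25, 60)]
papers.sort(key=score, reverse=True)  # the fresh, actively cited paper ranks first
```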
Citations are probably not the best metric for discovery, but this also really makes me wonder whether papers are the best unit for discovery at all. An academic produces ideas, not papers; papers are just a side effect. The path is something like:
* come up with an idea
* write short conference papers about it
* present it in conferences
* write journal papers about it
* maybe somebody writes a thesis about it
(Talking to people about it throughout).
If we want to discover ideas as they are being worked on, I guess we’d want some proxy that captures whether all that stuff is progressing, and if anybody has noticed…
Finding that proxy seems incredibly difficult, maybe impossible.
Agreed that it makes it more complicated though.
Regarding HTML, our original site actually only supported HTML (because it was easier to build an annotator for an HTML page). The issue is that a good ~25% of these papers don't render properly, which pisses off a lot of academics. Academics spend a lot of time making their papers look nice as PDF, so when someone comes along and refactors their entire paper into HTML, not everyone is a fan.
That being said, I do think long term HTML makes a lot of sense for papers. It allows researchers to embed videos and other content (think, robotics papers!). At some point we do want to incorporate HTML papers back into the site (perhaps as a toggle).
Did you bulk download the arXiv metadata, PDF, and/or LaTeX files?
I am trying to figure out what the required space is for just the most recent version of the PDFs.
I can find mentions of the total size of their S3 bucket, but it's unclear whether that also includes older versions of the PDFs.
I also wonder if the Kaggle dataset is kept up to date, since it states merely 1.7M articles instead of the 2.4M I read elsewhere.
Edit: I just found the answers to my question here: https://info.arxiv.org/help/bulk_data_s3.html
I disagree. There are numerous times where I have browsed the comments on an HN post where people haven't read the article and are just responding to the comment thread. The workflow here seems a bit different: a person would have already read a paper and want to read through or respond to existing discussions. With that, having the search front and center would follow as the natural next step for a person who read a paper and wanted to "search" for discussions related to that paper in particular.
HN is more about aimless browsing, which is a bit different from researching a specific area or topic.
How about not ranking things at all? I don't feel like things like this should be a popularity/"like" contest and instead let the content of the paper/comments speak for themselves. Yes, there will be some chaff to sort through when reading, but humanity will manage.
Just sort things by updated/created/timestamp and all the content will be equal.
People can't read everything, and have to rely on others to filter up the good stuff. If you read something random, based on no recommendation, it's charity work (the odds are extremely good that it is bad) and you should recommend that thing to other people if it turns out to be useful. Ultimately, that's the entire point of any of this design: if we don't care about any of the metadata on the papers, they could just be numbered text files on an FTP site.
The fewer things I have to read to find out they're shit, the longer life I have.
I say the opposite: put a lot of thought into how papers are organized and categorized, how comments on papers are organized and categorized, the means through which papers can be suggested to users who may be interested in them, and the methods by which users can inject their opinions and comment into those processes. Figure out how to thwart ways this process can be gamed.
Treat the content equally, don't force the content to be equal. Hacker News shouldn't just be the unfiltered new page.
Most articles are not interesting, and most of the interesting ones are interesting only to a niche of a few researchers. The front page will be flooded with uninteresting stuff.
But don't we want people's attention drawn to controversial, conversation-generating papers? The whole point of the platform is to drive conversation.
I'm all for casting wide nets and making things available to everyone, but a little gatekeeping is not bad (just don't gatekeep by race, class, or those sorts of things). But I'm sorry, research is hard. There's a reason people spend decades researching things that at face value look trivial. Rabbit holes are everywhere, and just because you don't know about them doesn't mean your opinion has equal weight.
We seriously lack areas where experts can talk to other experts.
Unfortunately for those of us pre-tenure, it's difficult to balance these, as I'm sure you're aware. We're evaluated by people who may have the best intentions but don't work directly in our field. They then determine whether we keep our jobs. It's difficult not to consider prestige as a factor when you know those evaluating you will.
> Unfortunately, academics are still hindered by institutional inertia
As an ABD this has been a real pain point for me. Maybe I came into academia thinking what mattered most was the research. But now I'm the stereotypical PhD who passionately despises academia for its lack of being academic. I'm happy to have competition, but at the end of the day are we all not on the same team? How the hell did we create a system where it is the norm that an advisor does not read a thesis, does not read papers, does not mentor? For that to be the norm among a committee? When I've had issues getting work through review (even when it has high citations thanks to arXiv), I don't understand why it's acceptable for a response to be "keep trying" instead of "here, I read the paper and reviewer responses, let me help"[0]. It seems inefficient that we throw students into the deep end and watch them sink or learn to swim. I think there'd be a lot fewer dejected PhD students if there was a stronger focus on academics, mentorship, and collaboration over churning out ̶w̶i̶d̶g̶e̶t̶s̶ papers.
I think what pisses me off the most is thinking that research significance and success can be measured __purely__ through metrics like citation counts, H-indices, i10s, awards, etc. I'm not saying those are useless, but can we really evaluate without looking at the content (as you say, the actual quality of the work)? It's like we learned about Goodhart's Law and decided it was a feature, not a bug.
(I know this is not always the case and there are many amazing advisors, but I'd be impressed if someone didn't know this is happening at least somewhere within their department.)
[0] If it takes a village to raise a child, it takes a department to mint a PhD. These types of things should come from committees, not just advisors. Our annual meetings and review shouldn't just be going through the motions.
Yeah, but every pre-tenure or postdoc is like “I can’t fight the system right now, I need to publish enough to still have a job two years from now”
Your comment doesn't read like one from anyone with any relationship with academia. If you had one, you'd know that the issue is not a vacuous "prestige" but funding being dependent on hard metrics such as impact factor, and in some cases with metrics being collected exclusively from a set of established peer-reviewed journals that must be whitelisted.
And ArXiv is not one of them.
This means that a big share of academia has their professional future, as well as their institution's ability to raise funding, dependent on publishing in a small set of non-open peer-reviewed journals.
Reading your post, you make it sound like anyone can just upload a random PDF to a random file server and call it a publication. That ain't it. If you fail to understand the problem, you certainly ain't the solution.
Your comment reads likewise.
He didn't say he publishes them exclusively on arXiv. It's quite common for professors to post papers there as well as submit them to journals. Many (most?) journals allow for it - they don't insist that the ones on arXiv be taken down - as long as what's posted is a preprint and not the final (copyrighted) version.
As an academic, you should also know that practices vary widely with discipline. As an example:
> dependent on them publishing on a small set of non-open peer-reviewed journals.
IIRC, NIH grants require publishing in open peer-reviewed journals.
Also, lots of disciplines are not heavily reliant on funding. In both universities I attended, the bulk of math professors did not even apply for grants! It's not required to get tenure (unlike engineering/physics). Also often true in some economics departments.
As an aside, your comment violates a number of HN guidelines.
> Your comment reads likewise.
FWIW I thought it read like reviewer 2, which actually makes me think they do have a relationship with academia. The problem I have with their comment is that it rejects the critique by pointing to a different issue, as if there were a singular issue with academia that leads to the mess. So it comes off as a typical "reviewer 2" comment to me, where there is more complaint and disagreement than critique.
FWIW, I think we in academia need to embrace the fuzziness and noise of evaluation. I think the issue is that by trying to make sense out of an extremely noisy process we placed too much value in the (still noisy) signals we can use. It is a problem to think that these are objective answers and deny the existence of Goodhart's Law (this is especially ironic in ML where any intro class discusses reward hacking). And in this, I think there's a strong coupling between cgshep's and chipdart's complaints.
As for publishing, I think we also lost sight of the main reason we publish: to communicate with our peers. Publishers played an important role, since not even a century ago we could not make our works trivially available to our peers. But now the problem is information overload, not lack of information. And I think the review process is getting worse each year, in part because we place so little value on the act of reviewing, do not hold anyone accountable for a low-quality review[0,1], do not hold ACs or metas accountable either, and have so many papers to review that I don't think we can expect high-quality reviewing even if we actually incentivized it. I mean, in ML we expect a few hundred ACs to oversee over ten thousand submissions?
My question is whether we'll learn that the noisiness of the process and the randomness of rejection create a negative feedback loop of papers, where you "roll the dice" at the next conference: you resubmit without changes, alongside your new work (publish or perish, baby). If we had quality reviewing, this would at least put pressure on papers to get incrementally better instead of just being recycled. But recycling is a very efficient strategy right now, and we've seen plenty of data suggesting it is being used.
[0] I understand the reasons for this. It is a tricky problem
[1] I'd actually argue we incentivize bad reviews. No one questions you when you reject a work, and finding reasons to reject a work is far easier than finding reasons to accept one. There are always legitimate reasons to reject any work. Not to mention that the whole process is zero-sum, since venue prestige is for some reason based on the percentage of papers rejected. As if there were no variance in year-to-year quality.
But putting your papers on the arXiv, as your parent said, doesn't mean you only put them on the arXiv. I put all my papers on the arXiv, but I also submit them for publication in journals that will help me make the case for funding and promotion.
Yes, academia has tried to quantify prestige via impact factor and peer-reviewed journals. Yes, lots of people (even in academia) feel that the system is being gamed, with the publishing houses that own the journals being a common scapegoat.
The system isn't broken, but it also keeps its integrity through some dynamic tension: a bit of criticism is a good thing.
> the issue is not a vacuous "prestige" but funding being dependent on hard metrics such as impact factor
These things are not in contention. There is no singular problem to be solved, which is why it is so difficult. No smoking gun.
> And ArXiv is not one of them.
ArXiv has a large impact on metrics and so-called impact factor. But let's also not be delusional about the fact that a paper from a prestigious institution will always receive more citations (or score higher on any other metric) than an equal-quality paper from a less prestigious institution. All our metrics can be hacked through publicity.
> Reading your post
Reading yours, it sounds like you stand in the way of resolving issues in academia. Not because you don't have issues with it that you want to solve, but because you have already found the answer. Your comment reads like someone who has a relationship with academia.
When I was doing peer reviews, it would often take a day or more to read a paper, think it through, and then write up something thoughtful and constructive.
If you introduce a mechanism to delay comments (e.g., holding all messages for 24-72 hours before publishing, or only releasing new comments on Monday mornings; see the sketch after this list), it would:
- encourage commenters to write longer thoughtful responses rather than short quick comment threads
- reduce back and forth flame wars
- ease the burden on moderators and give them time to do batches of work
- reveal whether multiple commenters independently come to the same conclusions/critiques, minimizing bandwagon effects
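A minimal sketch of that batching mechanism; the class, method names, and 48-hour hold are all invented for illustration:

```python
# Sketch: comments are queued on submission and only become visible
# once their hold period expires, so replies can't snowball in real time.
import datetime as dt

HOLD = dt.timedelta(hours=48)

class CommentQueue:
    def __init__(self):
        self._pending = []  # list of (visible_at, comment_text)

    def submit(self, text: str, now: dt.datetime) -> None:
        self._pending.append((now + HOLD, text))

    def release(self, now: dt.datetime) -> list[str]:
        """Return (and publish) everything whose hold period has expired."""
        ready = [c for t, c in self._pending if t <= now]
        self._pending = [(t, c) for t, c in self._pending if t > now]
        return ready

q = CommentQueue()
q.submit("Thoughtful critique...", dt.datetime(2024, 9, 9, 12))
q.release(dt.datetime(2024, 9, 10, 12))  # [] - still held
q.release(dt.datetime(2024, 9, 12, 12))  # ['Thoughtful critique...']
```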
I think it'll be hard growing a discussion platform if there are barriers to entry like that just to populate your profile.
While the grandparent is understandably disappointed with the current implementation, relying on emails was always doomed from the start.
W3C TR did-core: "Decentralized Identifiers (DIDs) v1.0": https://www.w3.org/TR/did-core/
W3C TR did-use-cases: "Use Cases and Requirements for Decentralized Identifiers" https://www.w3.org/TR/did-use-cases/
"Email addresses are not good 'permanent' identifiers for accounts" (2024) https://news.ycombinator.com/item?id=38823817#38831952
A person can generate (and optionally register) additional DIDs if they please.
A person can request additional ORCIDs if they please.
I don't think Google Scholar has this fully solved either; I've seen many misattributed papers there.
It is a good idea in general to make sure that your papers contain up-to-date contact information. One way of doing this is to use an ORCID iD.
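An ORCID iD even has a verifiable structure: per ORCID's documentation, the last character is a check digit computed with ISO 7064 MOD 11-2 over the first 15 digits. A small sketch (0000-0002-1825-0097 is the sample iD from their docs):

```python
# Check-digit validation for an ORCID iD (ISO 7064 MOD 11-2,
# as described in ORCID's documentation).
def orcid_checksum(base15: str) -> str:
    """Compute the final check character from the first 15 digits."""
    total = 0
    for d in base15:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    digits = orcid.replace("-", "")
    return len(digits) == 16 and orcid_checksum(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
```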
Who sends emails to paper authors? How often do they respond? How fast do the email addresses go out of date? I lost access to the email address included in most of my papers within 2 years of publication.
I see little to no value to have it included in the paper.
Also, I don't think we are yet at the point where human-to-human communication is not possible.
I do, when I'd like to read a paper that's locked behind a paywall and not available on sci-hub. Authors of scientific papers are much like any other authors... they want to be read. The more enlightened among them understand that obscurity is a problem rather than a perk. They also tend to appreciate engagement in the form of follow-up questions (at least from people who actually read the paper.)
Obviously it's not a major concern on arxiv, but in a larger historical sense, this type of communication was a key original application of email.
I do when the paper is not easily available or the publisher charges some outrageous fee (have seen $50 for a paper in the past).
Authors typically despise the publishers and are happy to share their work to anybody interested.
I think you're failing to understand the basics of the problem, and even the whole problem domain.
Email addresses are not created/maintained for life. You can have an email address, then have your org change its name and switch email providers; not to mention that researchers leave research institutions and thus lose access to their accounts.
You have multiple scenarios where papers can be published with authors using email addresses which they lose access to.
Btw, why is it considered normal? I think it would be much better to list an e-mail address to which you will have (more-or-less) permanent access.
I place the blame here entirely on Google for misusing forms of identification. Two-factor authentication is having two locks on the same door, where recovery addresses are having two doors with separate locks. Using a recovery address for 2FA is absurd, and caused me to be locked out of my permanent email address.
When Google switched from offering 2FA to requiring 2FA, it would have been acceptable for them to require a second form of authentication to be added on the next log-in. It is not acceptable for Google to pretend that they have a second form of authentication when they do not.
Second, up until the moment it was needed, I had access to my recovery address. Google locked me out of my primary address and my recovery address simultaneously.
What leads you to believe it isn't normal? I mean, do you have an eternal email address? Have you ever switched jobs?
Most papers are authored/co-authored by graduate students. Do you think all of them will hold onto their institutional address after they graduate? A big chunk of them will not even continue in the field.
Dumb example: you might have published a paper while working at a company, but years later the company went bankrupt and ceased to operate. Now somebody else owns the domains, and they will not do you the favour of giving you an email address.
Notable example: Sun Microsystems. But there are many more, of course.
Or you just moved from one university to another. Or you published while in grad school and then moved somewhere else.
This would be a security nightmare for them. It is pretty normal for universities to have some sort of identity management system that automatically provisions emails when you are employed there and deprovisions them once you are gone.
Most universities use a portal of some sort for easy access to personal information and preferences anyway, so it shouldn't be too difficult to limit access for alumni to only allow them to change a few personal details like name / address / phone number and the like, plus email forwarding settings. I think the extra cost is negligible compared to what universities already spend on alumni like newsletters, conferences, dinners, etc.
Additionally, making people who want to cold email work a little to acquire the current email address is actually a good thing, especially if they want to talk about something years old. I’ve generally had a lot more pleasant and engaging correspondence with people who worked out my email (say from a side project I develop pseudonymously) than ones who directly lifted my email from my professional profiles. So, expiring emails in papers generally isn’t a real problem anyway, and it’s basically never a hurdle if your target is still in academic circles. It only becomes a problem in this specific context of automated authentication (based on something not intended for that purpose).
If you forward emails automatically then you'd lose this accreditation. I suppose the solution would be an accreditation domain that forwards to your uni address only, but that's extra work now.
Obviously our university isn't gonna make a 10k€/month contract just because some prof wants their mail forwarded to gmail. Especially not if they are not working here anymore.
Should I have waited until the startup had more revenue? We were profitable at the time (we were B2B and the layoffs did us in)
I cannot understand how what's written there could have been confidently construed as a statement.
Then don't pretend that it is an email address.
I mean, it's true that email addresses are not guaranteed to be assigned for life, but putting a fake email address on a paper is misleading.
Suppose John publishes a paper while at XYZ Corp, listing his XYZ Corp address as his contact. John has since moved on and is earning more at ABC Corp instead. XYZ Corp has duly reclaimed John's old email address, and John cannot receive emails at said address any longer.
This is the situation the OP is in. It was never a "fake email address". They did not literally type "first.last@org", that was an example suitable for using in their comment.
[edit: I'm actually wrong with that last statement, as it turns out. While it wasn't a fake email address, the situation is slightly more nuanced in that OP actually did say "{first}.{last}@hhi.fraunhofer.de" in the paper, as there were multiple authors who all had the same email address format - see https://news.ycombinator.com/item?id=41479618. I still think this is a valid method, though, and it's certainly not fake. Besides, the problem I outlined sounds like it probably remains an issue even if it's not the exact problem OP is experiencing.]
It sounded as if they were using "john.doe@hhi.fraunhofer.de" while in reality it was an invalid email address ("because there’s no match with the email address"), and that he would have tried to claim co-authorship via his "real" address, which might be something like "j.doe2@hhi.fraunhofer.de" (but luckily is not).
It's all clear now. Thank you for your explanation.
I think you don't know what an email address is, and how they are used.
> (...) but putting a fake email address (...)
This nonsense of "fake email address" was only brought up as a baseless accusation. There is zero substance to it, and it's been used as a red herring in this discussion.
Focus on the problem: do you expect any and all email addresses you publish somewhere years ago to continue to work?
I understood it this way: org is not handing out first.last@org to the employee, but using an email format in order to clarify that "first last" is working at org and collaborated on the paper not in private, but as an employee.
He might have been assigned last.f@org as a valid email address by the org, but that one is not being used on the paper, while first.last@org is invalid.
> I think you don't know what an email address is, and how they are used.
You should know that this kind of comment should not be made on HN, see the guidelines [0] ("Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.").
> do you expect any and all email addresses you publish somewhere years ago to continue to work?
No. But that is irrelevant to this conversation.
This is more like a mismatch between "fully edited open-access papers" and "trying to use arXiv preprints as an approximation of the former".
Of course most of that is moot for professional scientists, because you likely already know (or at least can find out about) the authors. For example, some papers have old non-working email addresses for authors who have since moved institution. It's not a problem, since I'll just look them up by name if necessary and use their current email.
They're no longer at that institute, and that email no longer exists (while some institutions give some leeway, I know of at least one major university which removes them the day the contract ends).
This is a common problem if you're providing services to academics and you've tied yourself to using emails as identifiers.
In our case it's for saving space in the paper, and also for reducing spam. This small change may now seem silly in the age of LLMs, but the papers that have full email addresses in them get a considerable amount of fake conference and journal participation emails, which is annoying.
I still think this is valid (and certainly not the fake email address that people are calling it), but yeah, it's not what I thought it was.
OpenReview has:
- preprints
- versioning
- reviewing, with threads and LaTeX support
- the ability to link websites, repos, datasets, etc.
- BibTeX generation
But it's not as popular as arXiv, though it is very popular for reviewing (conferences often do not use the full feature set).
One thing I dislike about this is that it is open to all. Arxiv doesn't have a hard filter (you just need someone with an account to vet you, which stakes their reputation), but the existence of a filter is critical.
I don't want a place to engage with the public. We have Reddit, hacker news, Twitter, mastodon, and countless other places. I want a place for academics to talk to academics [0], researchers to researchers. There is a serious lack of spaces where serious low level in-the-weeds discussions can happen. Even fucking GitHub and hugging face are swamped by people asking dumb questions on research projects like how to install pytorch, fine tune a model, or where the source code is.
I'm really happy to include a lot of people, but I think we also need spaces where experts know they're talking to peers. Without that, you have to assume you're not talking to peers, because they outnumber us a few thousand to one. So that doesn't encourage engaging in research or technical discussions; it encourages talking down to people and misinterpretation.
[0] a degree is not what makes you "an academic"
For some more context, we are a group of 3 students with a background in AI research, and this site was initially built as an internal tool to discuss ai papers at Stanford. We've been dealing with a lot of growing pains/infra issues over the past month that we are in the process of hashing out. From there we would love to make a more concerted effort to share this in areas outside of AI. Happy to hear your thoughts here, or more formally via contact@alphaxiv.org.
I do want to highlight that our site has a team of reviewers/moderators, and having folks from different subject areas is critical to making sure the site doesn't end up a cesspool. Apply here: https://docs.google.com/forms/d/11ve-4cL0axTDcqnHF66zX6greFV....
My main recommendation was going to be organizational: to cooperate and work with arXiv itself, rather than risking a potentially adversarial or competitive relationship.
Now that I think about it however, I am convinced by a peer comment that was basically "leave arxiv the way it is and don't mess it up." So carry on then.
I worry that fragmentation of this space might not be beneficial, so it would be nice if these services could collaborate in some way, perhaps using ActivityPub or something.
Tenured prof here. Academics don't use HTML, despite its obvious advantages. The incentive system is deeply broken. No big-name journal or conference will accept well-formatted HTML over their proprietary LaTeX/Word formats. And LaTeX-to-HTML converters generally suck.
1) Zoom buttons just for the paper - the article text is often tiny, and zooming in with the browser messes up the page layout and makes the page practically unusable.
OR
2) A simple button to download the PDF directly. This would alleviate the zoom problem since I can view it in my local PDF reader with the best settings for me. Having to go to arXiv to download the PDF for every paper would be a nuisance over time though, so a button in the top bar would make the experience a lot better.
Edit: The above applies to arXiv itself; I got confused. alphaxiv.org indeed opens the PDF in a frame with no option to download.
I ask because every system I've ever come across for discussing and ranking content without human moderation is always, sooner or later, gamed.
Perhaps they think they are reputable now just because Perelman's proof papers were published there, and they want to maintain their 'reputation' [2]. The irony is that Perelman would most probably not publish on arXiv were it in its current pseudo-journal state.
[1] Editorial Advisory Council:
https://info.arxiv.org/about/people/editorial_advisory_counc...
[2] Reclusive mathematician rejected honors for solving 100-year-old math problem, but he relied on Cornell's arXiv to publish:
https://news.cornell.edu/stories/2006/09/proof-100-year-old-...
Great idea though; I would love to use something like this if it existed on a federated protocol.
I could see myself using alphaxiv for that, and then, if there's a comment section, I might even read it, and, who knows, leave a comment. But there's no way I'm going to be changing the address or going to some other site to search for papers just to see whether there are some comments.
ps: I see the extension adds a "discussion" link to arxiv, it is a pity that it is only available for Chrome.
Sadly, the last comments in HEP are more than 2 years old (which explains why I had never heard about it; it seems it never gained any traction).
Why wasn't it possible to contact arXiv and do this in collaboration with them?
While it says "a Stanford project" at the top, this could mean anything. It could in particular allude to the fact that Stanford professors are advisors.
The advisors are well-known professors, but some, like Sebastian Thrun, have an entrepreneurial background, so a company probably sits behind this.
For me, the entire project gets a NO CONFIDENCE vote because I can't tell who they are or who pays for the moderation they claim to do, etc.
I'd feel much more at ease if this was done by arXiv, or at least endorsed by arXiv.
I always love any idea for curating a high-IQ internet community.
Interestingly, the sites have grown at a similar rate. Going back to 2020, arXiv had 25+ million downloads/month and bioRxiv had ~2.1 million.
[1] https://arxiv.org/stats/monthly_downloads [2] https://api.biorxiv.org/reports/usage
Monthly submissions:
1. arXiv, ~21k/month
2. bioRxiv, ~5k/month
3. medRxiv, ~1.3k/month
It would be interesting to see usage of other _xiv's and other pre-print publishers!
[1] https://arxiv.org/stats/monthly_submissions [2] https://api.biorxiv.org/reports/content_summary [3] https://api.medrxiv.org/reports/content_summary
Which means they will. Sciencing is hard, especially because gatekeeping is necessary to keep the spammers (the ones that flood my email with fake journal offers) at bay.
Writing a paper and dealing with the inane requests from three reviewers was already frustrating and stressful. Now you open up a never-ending review process of random people making demands (“unless you do X follow-up analysis, this is worthless”).
Also, you need to be able to say a paper/project is done and let yourself move on. If your job turns into “respond to feedback on every paper you’ve ever written, every day,” you’ll never start anything new.
It’s one thing to optimize an abstract pursuit of knowledge, but you also gotta remember that you need this to be a job people are willing/able to do.
To put it another way, if it’s just a job to you, dealing with criticism is part of what you’re getting paid for anyway, same as everyone else. And if that’s still too much to ask, you can point the finger at the lack of research that isn’t profit-driven.
Peer reviewing is hard work. Give people a readily available shortcut for it, and some people, on some occasions, will take it. Which may in turn force conferences to adopt policies forbidding posting on arXiv.
I don't know how this page organizes moderation, but I imagine there will be some kind of moderation like on most online discussion forums.
Remember all that hype around LK-99 room-temperature superconductivity a few months back? The substantive scientific discussion would absolutely be drowned out by curious laymen and/or grifters.
And to be fair, few scientific stories get even remotely near the attention that the superconductivity announcement got.