> For the first time, Liquid uncovers a scaling law that performance drop unavoidably brought by the unified training of visual and language tasks diminishes as the model size increases...No prior work has explored whether LLMs retain the power-law scaling laws observed in language tasks when extended to visual generation tasks. We prove this alignment and further show that vision can be effectively learned by LLMs as a form of language.
Does this really show much that https://arxiv.org/abs/2301.03728#facebook (Aghajanyan et al 2023, "Scaling Laws for Generative Mixed-Modal Language Models"; uncited by Liquid) and other earlier work did not already show?
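For reference, the "power-law scaling laws" at issue are the standard parametric form from the language-modeling literature (here written in the Kaplan et al 2020 notation, which is an assumption about what Liquid means, not Liquid's own equation), where loss falls as a power of parameter count $N$, with $N_c$ and $\alpha_N$ as fitted constants:

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}$$

Aghajanyan et al 2023 already fit power laws of this form to mixed-modal training runs spanning text, image, and other modalities, which is what makes the "no prior work" framing hard to credit.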