When you install the numpy wheel through `uv`, you are likely installing a pre-compiled binary that bundles OpenBLAS inside it. When you install numpy through conda-forge, it dynamically links against a dummy BLAS package that can be substituted with MKL, OpenBLAS, Accelerate, or whatever you prefer on your system. It's a much better solution to be able to rely on a separate package rather than having to bundle every dependency.
Then let's say you install scipy. SciPy also has to bundle OpenBLAS in its wheel, and now you have two copies of OpenBLAS sitting around. They don't conflict, but this quickly becomes an odd thing to have to do.
With a background in scientific computing, where many of the dependencies I manage are compiled, conda packages give me much more control.
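If you want to see the bundling for yourself, something like this tends to show it for a manylinux numpy wheel (exact paths and library names vary by platform and wheel version, so treat the grep as illustrative):
$ pip download numpy --no-deps -d /tmp/wheels           # fetch just the numpy wheel
$ unzip -l /tmp/wheels/numpy-*.whl | grep -i openblas   # the bundled libopenblas shared object, typically under numpy.libs/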
P.S. I'd like to point out the distinction between package indexes and package managers. PyPI is an index (it hosts packages in a predefined format), while pip, Poetry, and uv are package managers that resolve and build your environments using the index.
Similarly, but a bit more confusingly, conda can be understood as the index, hosted by Anaconda but also hostable elsewhere, with different "channels" (kind of like a GitHub organization), of which conda-forge is a popular one built by the community. conda is also a reference implementation of a package manager that uses anaconda channels to resolve. Mamba is an independent, performant, drop-in replacement for conda. And pixi is a different one with a different interface, by the author of mamba.
Even more confusingly, there are distributions. Distributions come with a set of predefined packages together with the package manager, such that you can just start running things immediately (sort of like a TeX Live distribution in relation to the package manager tlmgr). There are Anaconda distributions (if you installed Anaconda instead of just conda, that's what you get), but also Intel's distribution for Python, Miniforge, Mambaforge, etc.
Is this beyond what the pyproject.toml spec supports?
But Anaconda and conda-forge are general package repositories; they are not Python-specific but are happy to be used for R, Julia, C/C++/Fortran binaries, etc. It's primarily a binary-based repository. For example, you can `conda install python` but you can't `pip install python`.
I don’t know if there is any technical barrier or just a philosophical barrier. Clearly, Pip handles binary blobs inside of Python packages fine, so I would guess the latter but am happy to be corrected :).
Domain Name: pyherald.com
Registry Domain ID: 2663190918_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.namesilo.com
Registrar URL: https://www.namesilo.com/
Updated Date: 2024-12-21T07:00:00Z
Creation Date: 2021-12-21T07:00:00Z
Registrar Registration Expiration Date: 2024-12-21T07:00:00Z
https://web.archive.org/web/20241220211119/https://pyherald.... is the most recent snapshot.
Before conda, getting a usable scipy install up and running on MS Windows was a harrowing experience. And having two independent installations was basically impossible. The real hard work that went into conda was reverse engineering all the nooks and crannies of the DLL loading heuristics, to allow it to ensure that you loaded what you intended.
If you are working on macOS and deploying to some *nix in the cloud, you are unlikely to find any value in this. But in ten years as lead on a large tool that was deployed to personal (Windows) laptops in a corporate environment, I did not find anything that beat conda.
Today you can just "pip install scipy" on Windows and it will just work.
I can't recall the library, but there was another major project that just deprecated TensorFlow because it was the cause of so many build problems.
I create a virtual environment for every project. I install almost all packages with pip, except for any binaries or CUDA-related things, which come from conda. I always export the conda yaml file and have managed to reproduce the code/environment, including the Python version. I've seen a lot of posts over time praising poetry and other tools and complaining about conda, but I could never relate to any of them.
Am I doing something wrong? Or something right?
Poetry isn't perfect, but it's working in an imperfect universe and at least gets the basics (lockfiles) correct to where packages can be semi-reproducible.
There's another rant to be had at the very existence of venvs as part of the solution, but that's neither Poetry's nor Anaconda's fault.
It is entirely possible to use poetry to determine the precise set of packages to install and write a requirements.txt, and then shotgun-install those packages in parallel. I used a stupidly simple fish shell for loop that ran every requirements line as a pip install with an "&" to background the job and a "wait" after the loop (IIRC). You could use xargs or parallel too.
This is possible at least. Maybe it breaks in some circumstances but I haven’t hit it.
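For the record, a rough bash rendering of that idea (assuming a plain requirements.txt with one pinned requirement per line, no comment or option lines):
$ while read -r req; do pip install "$req" & done < requirements.txt; wait
$ xargs -n1 -P8 pip install < requirements.txt   # roughly equivalent xargs variant, 8 installs at a time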
Not as an excuse for bad behavior but rather to consider infrastructure and expectations:
The packages might be cached locally.
There might be many servers – a CDN and/or mirrors.
Each server might have connection limits.
(The machine downloading the packages miiiiiight be able to serve as a mirror for others.)
If these are true, then it’s altruistically self-interested for everyone that the downloader gets all the packages as quickly as possible to be able to get stuff done.
I don’t know if they are true. I’d hope that local caching, CDNs and mirrors as well as reasonable connection limits were a self-evident and obviously minimal requirement for package distribution in something as arguably nation-sized as Python.
And… just… everywhere, really.
We've used different combinations of pipx+lockfiles or poetry, which have been OK-ish so far. But we recently discovered uv and are wondering about existing experience with it so far across the industry.
At the same time, poetry still uses a custom format and is pretty slow.
I was so amazed by the speed that I moved all my projects to uv and have not looked back yet.
uv replaces all of pip, pipx and poetry for me. It does not do more than these tools, but it does it right and fast.
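For anyone who hasn't tried it, the rough mapping looks like this (subcommand names from memory, so double-check against the uv docs):
$ uv venv                  # create a virtual environment (the venv/virtualenv role)
$ uv pip install requests  # pip-compatible installer interface (the pip role)
$ uv add requests          # manage pyproject.toml plus a lockfile (the poetry role)
$ uv tool install ruff     # isolated CLI tool installs (the pipx role)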
If you're at liberty to try uv, you should try it someday, you might like it. (nothing wrong with staying with poetry or pyenv though, they get the job done)
Or worse: imagine being a longtime user of shells but not Python, and then being presented with a venv as the solution to the problem that, for some reason, Python doesn't stash deps in a subdirectory of your project.
You just need to have some sort of wrapper/program that knows how to figure out which dependencies to use for a project. With bundler, you just wrap everything in "bundle exec" (or use binstubs).
There are many dependency managers that use a project-local flat storage, and a global storage was really frowned upon until immutable versions and reliable identifiers became popular some 10 years ago.
Ruby and Perl certainly didn't have it - although Ruby did subsequently add Bundler to gems and gems supported multiversioning.
I feel like venv is one such solution. A workaround that doesn’t solve the problem at its root, so much as make the symptoms manageable. But there is (at least for me) a big difference between things like that and the cool ideas that underlie shell tooling like Unix pipes. Things like jq or fzf are awesome examples of new tooling that fit beautifully in the existing paradigm but make it more powerful and useful.
For some libraries, it is not acceptable to stash the dependencies for every single toy app you use. I don't know how much space TensorFlow or PyQt use but I'm guessing most people don't want to install those in many venvs.
Every google for help I do is useless. Each page is full of terms I don't understand at *all*. They're like "Oh solving that error is simple, just take the library and shove it into the jenga package loader so you can execute the lab function with a pasta variation".
She probably would have been better off being pointed towards Jupyter, but that's neither here nor there.
Also, installing everything with pip is a great way to enjoy unexplainable breakage when A doesn't work with v1 and B doesn't work with v2.
It also leads to breaking Linux systems where a large part of the system is Python code, especially when a user upgrades the system Python for no reason.
- Set up custom kernels in Jupyter Notebook
- Hardlink the environments, then install the same packages via pip in one and conda in the others
- Install conda inside conda (!!!) and enter the nested environment
- Use tox within conda
I believe as long as you treat the environments as "cattle" (if one goes bad, remove it and re-create it from the yaml file), you should not have any problems. That's clearly not the case for the post's author, though.
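Concretely, the "cattle" workflow is just this (environment name made up):
$ conda env export -n myproj > environment.yml   # snapshot the environment
$ conda env remove -n myproj                     # it went bad? put it down
$ conda env create -f environment.yml            # recreate it from the yaml file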
(I do agree pip is still pretty lackluster, but the proposed replacements don't really get to the heart of the problem and seem to lack staying power. I'm in 'wait and see' mode on most of them)
`pixi` basically covers `conda` while using the same solver as `uv` and is written in Rust like `uv`.
Now, is it a good idea to have Python's package management tool handle non-Python packages? I think that's debatable. I personally am in favor of a world where `uv` is simply the final Python package management solution.
Wrote an article on it here: https://dublog.net/blog/so-many-python-package-managers/
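For the curious, the basic pixi flow looks roughly like this (from memory, so treat the exact subcommands as an assumption and check the pixi docs):
$ pixi init myproj        # writes a project manifest
$ cd myproj
$ pixi add python numpy   # resolves from conda channels and records a lockfile
$ pixi run python         # runs inside the project's environment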
It's fast, takes yml files as input (which is super convenient), and is super intuitive.
Quite surprised it isn’t more popular
Also crossing fingers that uv ends up being the last one standing when the comprehensive amounts of dust here settle. But until then, I'll look into pixi, on the off chance it minimizes some of my workplace sorrows.
> except for any binaries or CUDA related things from conda
doing the default thing with cuda related python packages used to often result in "fuck it, reinstall linux". admittedly, i don't know how it is now. i have one machine that runs python with a gpu and it runs only one python program.
From about 2014-17 you are correct, but it appears (on ubuntu at least), that it mostly works now. Maybe I've just gotten better at dealing with the pain though...
People like to complain about node packages, but I've never seen people have the trouble with them that they have with Python.
You can just give up and say that "The proper way to do this is to use the Nvidia CUDA toolkit to write your cuda app in C++ and then invoke it as a separate process from node" [0]. That apparently works for node, but Python wants much more.
If you actually want to use high-performance native code from your slow interpreted language, then no solution is going to be very good; that's because the problem is inherently hard.
You can rely on the host OS as much as possible - if the OS is known, provide binaries; if it's unknown, provide source code and hope the user has C/C++/Rust/Fortran compilers to build it. That's what uv, pip, etc. do.
You can create your own parallel OS, bringing your own copy of every math library, as well as CUDA, even if there are perfectly good versions installed on the system - that's what conda/miniconda does.
You can implement as much as possible in your own language, so there is much less need to use "high-performance native language" - that's what Rust and Go do. Sadly, that's not an option for Python.
[0] https://stackoverflow.com/questions/20875456/how-can-i-use-c...
If you aren't precise, you're gonna get different versions of your dependencies on different machines. Oops.
Pinning concrete versions is of course better, but then there isn't a clear and easy way to upgrade all dependencies and check whether CI still passes.
The only difference from one language to another is that some make this mandatory, while in others it's only something that you should really do and there isn't any other real option you should consider.
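One common way in Python to get both exact pins and an easy "upgrade everything, then let CI judge" step is the pip-tools split between loose and compiled requirements (the file names here are the conventional ones, not mandated):
$ pip install pip-tools
$ pip-compile requirements.in -o requirements.txt            # loose ranges in, exact pins out
$ pip-compile --upgrade requirements.in -o requirements.txt  # bump every pin, then re-run CI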
That means you don't use Windows.
Which is great. Keep not using it. But most people will have a different experience.
- DS/compiled-libs users (mostly Fortran/CUDA/C++)
- Anyone with dependencies on native/non python libraries.
Conda definitely helps with 2 and 3 above, and uv is at least a nice, fast API over pip (which is better since it started doing dependency checking and binary wheels).
More generally, lots of the issues come from the nature of python as a glue language over compiled libraries, which is a relatively harder problem in general.
i think there was a significant step change improvement in python packaging around 2012, when the wheel format was introduced, which standardised distributing prebuilt platform-specific binary packages. for packages with gnarly native library dependencies / build toolchains (e.g. typical C/fortran numeric or scientific library wrapped in a layer of python bindings), once someone sets up a build server to bake wheels for target platforms, it becomes very easy to pip install them without dragging in that project's native build-from-source toolchain.
venv + pip (+ perhaps maintaining a stack of pre-built wheels for your target platform, for a commercial project where you want to be able to reproduce builds) gets most of the job done, and those ingredients have been in place for over 10 years.
around the time wheel was introduced, i was working at a company that shipped desktop software to windows machines, we used python for some of the application components. between venv + pip + wheels, it was OK.
where there were rough edges were things like: we have a dep on python wrapper library pywhatever, which requires a native library libwhatever.dll built from the c++ whatever project to be installed -- but libwhatever.dll has nothing to do with python, maybe its maintainers kindly provide an msi installer, so if you install it into a machine, it gets installed into the windows system folder, so venv isn't able to manage it & offer isolation if you need to install multiple versions for different projects / product lines, as venv only manages python packages, not arbitrary library dependencies from other ecosystems
but it's a bit much blame python for such difficulties: if you have a python library that has a native dependency on something that isnt a python package, you need to do something else to manage that dep. that's life. if you're trying to do it on windows, which doesn't have an O/S level package manager.. well, that's life.
Oh, and if some package you are using has a bug or something that requires you to vendor it in your repo, well then good luck because again, PEP 508 does not support installing another package from a relative link. You either need to put all the code inside the same package, vendored dependency included, and do some weird stuff to make sure that the module you wanted to vendor is used first, or... you just have to use the broken package, again for some sort of security reasons apparently.
Again, all of that might even work when using pip from the cli, but good luck trying to make a requirements.txt or define dependencies in a standard way that is even slightly outside of a certain workflow.
Adding index URLs is explicitly not supported in the requirements.txt in setuptools or the default python build tool.
For development I use venv and pip, sometimes pyenv if I need a specific Python version. For production, I install Python packages with apt. The operating system can deal with upgrading minor library versions.
I really hate most other package managers; they are all too confusing and too hard to use. You need to remember to pull in library updates, rebuild and release. Poetry sucks too, it's way too complicated to use.
The technical arguments against Python package managers are completely valid, but when people bring up Maven, NPM or even Go as role models I check out. The ergonomics of those tools are worse than venv and pip. I also think that's why we put up with pip and venv: they are so much easier to use than the alternatives (maybe excluding uv). If a project uses Poetry, I just know that I'm going to be spending half a day upgrading dependencies, because someone locked them down a year ago and there are now 15 security holes that need to be plugged.
No, what Python needs is to pull in requests and a web framework into the standard library, and then we can start building 50% of our projects without any dependencies at all. They could pull in Django, it only has two or three dependencies anyway.
Being new to the ecosystem I have no clue why people would use Conda and why it matters. I tried it, but was left bewildered, not understanding the benefits.
The big thing to realise is that when Conda first was released it was the only packaging solution that truly treated Windows as a first class citizen and for a long time was really the only way to easily install python packages on Windows. This got it a huge following in the scientific community where many people don't have a solid programming/computer background and generally still ran Windows on their desktops.
Conda also not only manages your python interpreter and python libraries, it manages your entire dependency chain down to the C level in a cross platform way. If a python library is a wrapper around a C library then pip generally won't also install the C library, Conda (often) will. If you have two different projects that need two different versions of GDAL or one needs OpenBLAS and one that needs MKL, or two different versions of CUDA then Conda (attempts to) solve that in a way that transparently works on Windows, Linux and MacOS. Using venv + requirements.txt you're out of luck and will have to fall back on doing everything in its own docker container.
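As a sketch (the package versions here are just placeholders, not recommendations):
$ conda create -n proj-a python=3.10 "gdal=3.4"            # one project pinned to an older native GDAL
$ conda create -n proj-b python=3.12 "gdal=3.8"            # another on a newer one, side by side
$ conda create -n dl-legacy python=3.9 "cudatoolkit=11.2"  # CUDA toolkit pinned per environment, too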
Conda lets you mix private and public repos as well as mirror public packages on-prem in a transparent way, much more smoothly than pip, and it has tools for things like audit logging, fine-grained access control, package signing, and centralised controls and policy management.
Conda also has support for managing multi-language projects. Does your Python project need nodejs installed to build the front-end? Conda can also manage your nodejs install. Using R for some statistical analysis in some part of your data pipeline? Conda will manage your R install. Using a Java library for something? Conda will make sure everybody has the right version of Java installed.
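That polyglot case is essentially a one-liner (channel and versions are illustrative):
$ conda create -n pipeline -c conda-forge python=3.11 nodejs "r-base=4.3" openjdk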
Also, it at least used to be common for people writing numeric and scientific libraries to release Conda packages first and only eventually publish on PyPI once the library was 'done' (which could very well be never). So if you wanted the latest cutting-edge packages in many fields, you needed Conda.
Now there are obviously a huge class of projects where none of these features are needed and mean nothing. If you don't need Conda, then Conda is no longer the best answer. But there are still a lot of niche things Conda still does better than any other tool.
I love conda, but this isn't true. You need to opt-in to a bunch of optional compiler flags to get a portable yml file, and then it can often fail on different OS's/versions anyway.
I haven't done too much of this since 2021 (gave up and used containers instead) but it was a nightmare getting windows/mac builds to work correctly with conda back then.
As user of the modules, venv is sufficient.
Coming from C++, IMO, it is vastly better.
Nowadays, thanks to wheels being numerous and robust, the appeal of anaconda is disappearing for most users except for some exotic mixes.
conda itself now causes more trouble than it solves as it's slow, and lives in its own incompatible world.
But anaconda solves a different problem now that nobody else solves, and that's managing Python for big corporations. This is worth a lot of money to big structures that need to control package origins, permissions, updates, and so on, at scale.
So it thrives there.
In my experience conda is enormously superior to the standard Python packaging tools.
Mind you, glad it works for you. Warms my grey heart to know there's some balance in this universe. :)
I avoid it as much as possible.
Why go through all this trouble? Because originally it was meant to be a basic "scientific Python" distribution, and needed to be strict around what's installed for reproducibility reasons.
It's IMO overkill for most users, and I suspect most scientific users don't care either - most of the time I see grads and researchers just say "fuck it" and use Pip whenever Conda refuses to get things done in a timely fashion.
And the ones who do care about reproducibility are using R anyway, since there's a perception those libraries are "more correct" (read: more faithful to the original publication) than Pythonland. And TBH I can't blame them when the poster child of it is Sklearn's RandomForestRegressor not even being correctly named - it's bagged trees under the default settings, and you don't get any indication of this unless you look at that specific kwarg in the docs.
Personally, I use Conda not for reproducibility, but so all of my projects have independent environments without having to mess with containers
I worked in a pharma company with lots of R code and this comment is bringing up some PTSD. One time we spent weeks trying to recreate an "environment" to reproduce a set of results. Try installing a specific version of a package, and all the dependencies it pulls in are the latest version, whether or not they are compatible. Nobody actually records the package versions they used.
The R community are only now realising that reproducible environments are a good thing, and not everybody simply wants the latest version of a package. Packrat was a disaster, renv is slightly better.
A perfectly reasonable goal, yup! Thankfully not one that, in fact, requires conda. Automated per-project environments are increasingly the default way of doing things in Python, thank goodness. It's been a long time coming.
Neat idea, but sounds like a lot of work.
I've seen so many issues with different Python venvs from different Python project directories stepping on each others' dependencies somehow (probably because there are some global ones) that the fact that I can now just stick a basic and barely-modified-per-project Python flake.nix file in each one and be always guaranteed to have the entirely of the same dependencies available when I run it 6 months later is a win.
I'll offer mine: I won't say that Python packaging is generally excellent, but it's gotten much better over the years. The pyproject.toml is a godsend, there's the venv module built in to Python, and pip will by default no longer install packages outside of a venv. Dependency groups are being added, meaning that requirements.txt files can also be specified in the pyproject.toml. Documentation is pretty good, especially if you avoid blog posts from 5+ years ago.
I don’t mind conda. It has a lot of caveats and weird quirks
But, I think this illustrates the problem very well.
Conda isn't just used for Python. It's used for general tools and libraries that Python scripts depend on. They could be C/C++ that needs to be compiled. It could be a Cython library. It could be...
When you're trying to be a package manager that operates on top of the operating system's package manager, you're always going to have issues. And that is why Conda is such a mess: it's trying to do too much. Installation issues are one of the reasons why I stopped writing so many projects in Python. For now, I'm only doing smaller scripts in Python. Anything larger than a module gets written in something else.
People here have mentioned Rust as an example of a language with a solid dependency toolchain. I've used more Go, which similarly has had dependency management tooling from the beginning. By and large, these languages aren't trying to bring in C libraries that need to be compiled and linked into Python-accessible code (it's probably possible, but not the main use-case).
For Python code though, when I do need to import a package, I always start with a fresh venv virtual environment, install whatever libraries are needed in that venv, and then always run the python from that absolute path (ex: `venv/bin/python3 script.py`). This has solved 99% of my dependency issues. If you can separate yourself from the system python as much as possible, you're 90% of the way there.
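Spelled out, that per-project pattern is just:
$ python3 -m venv venv                                  # fresh, project-local environment
$ venv/bin/python3 -m pip install -r requirements.txt   # install into it, never into the system python
$ venv/bin/python3 script.py                            # run via the venv's interpreter, no activation needed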
Side rant: Which, is why I think there is a problem with Python to begin with -- *nix OSes all include a system level Python install. Dependencies only become a problem when you're installing libraries in a global path. If you can have separate dependency trees for individual projects, you're largely safe. It's not very storage efficient, but that's a different issue.
How would you do this otherwise? I find `conda list` to be terribly helpful.
As a tool developer for bioinformaticians, I can't imagine trying to work with OS package managers, so that would leave vendoring multiple languages and libraries in a home-grown scheme slightly worse and more brittle than conda.
I also don't think it's realistic to imagine that any single language (and thus language-specific build tools or package manager) is sufficient, since we're still using Fortran deep in the guts of many higher-level libraries (recent tensor stuff is disrupting this a bit, but it's not like OpenBLAS isn't still there as a default backend).
I think you might be surprised as to how long this has been going on (or maybe you already know...). When I started with HPC and bioinformatics, Modules were already well established as a mechanism for keeping track of versioning and multiple libraries and tools. And this was over 20 years ago.
The trick to all of this is to be meticulous in how data and programs are organized. If you're organized, then all of the tracking and trails are easy. It's just soooo easy to be disorganized. This is especially true with non-devs who are trying to use a Conda installed tool. You certainly can be organized and use Conda, but more often than not, for me, tools published with Conda have been a $WORKSFORME situation. If it works, great. If it doesn't... well, good luck trying to figure out what went wrong.
I generally try to keep my dependency trees light and if I need to install a tool, I'll manually install the version I need. If I need multiple versions, modules are still a thing. I generally am hesitant to trust most academic code and pipelines, so blindly installing with Conda is usually my last resort.
I'm far more comfortable with Docker-ized pipelines though. At least then you know that when the dev says $WORKSFORME, it will also $WORKSFORYOU.
> A single Anaconda distribution may have multiple NumPy versions installed at the same time, although only one will be available to the Python process (note that this means that sub-processes created in this Python process won’t necessarily have the same version of NumPy!).
I'm pretty sure there's not, but maybe there is some insane way to cause subprocesses to do this. Besides that, under the author's definition, different Python virtualenvs also install multiple copies of libraries in the same way conda does.
The comments about Jupyter also seem very confused. It’s hard to make heads or tails of exactly what the author is saying. There might be some misunderstandings of how Jupyter kernels select environments.
> Final warning: no matter how ridiculous this is: the current directory in Python is added to the module lookup path, and it precedes every other lookup location. If, accidentally, you placed a numpy.py in the current directory of your Python process – that is going to be the numpy module you import.
This has nothing to do with conda.
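It's standard Python path behaviour (for `-c` and interactive sessions the current directory is on sys.path) and trivial to reproduce anywhere (throwaway file, obviously):
$ echo 'raise RuntimeError("this is ./numpy.py, not the real numpy")' > numpy.py
$ python3 -c "import numpy"   # fails with the RuntimeError above: the local file shadowed the installed package
$ rm numpy.py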
uv is here to kick ass and chew bubblegum. And it’s all out of gum.
These days, when I absolutely have to use it because some obscure piece of software can't run unless Conda, I install it in a VM so that:
- I protect my working system from the damage of installing Conda on it
- I can throw the whole garbage fire away without long term brain damage to my system once I'm done
Wait, what? In what situation would that ever happen? Especially given the directories for packages are not versioned, so setuptools should never do two different versions in any way.
poetry has been working well enough for me as of late, but it'd be nice if I didn't have to pick.
What I don't understand is what makes this so difficult to solve in Python. It seems that many other platforms solved this a long time ago - Maven 2.0 was released almost 20 years ago. While it wasn't / isn't perfect by any means, its fundamentals were decent even back then.
One thing which I think messed this up from the beginning was applying the Unix philosophy with several/many individual tools as opposed to one cohesive system - requirements.txt, setuptools, pip, pipx, pipenv, venv... were always woefully inadequate, but produced a myriad of possible combinations to support. It seems like simplicity was the main motivation for such a design, but these certainly seem like examples of being too simplistic for the job.
I recently tried to run a Python app (after having a couple of years break from Python) which used conda and I got lost there quickly. Project README described using conda, mamba, anaconda, conda-forge, mini-forge, mini-conda ... In the end, nothing I tried worked.
Python creates the perfect storm for package management hell:
- Most of the valuable libraries are natively compiled (so you get all the fun of distributing binaries for every platform without any of the traditional benefits of native compilation)
- The dynamic nature makes it challenging to understand the non-local impacts of changes without a full integration test suite (library developers break each other all the time without realizing it, semantic versioning is a farce)
- Too many fractured packaging solutions, not a single one well designed. And they all conflict.
- A bifurcated culture of interactive use vs production code - while they both ostensibly use the same language, they have wildly different sub-cultures and best practices.
- Churn: a culture that largely disavows strong backwards compatibility guarantees, in favor of the "move fast and break things" approach. (Consequence: you have to move fast too just to keep up with all the breakage)
- A culture that values ease of use above simplicity of implementation. Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system. The quite obvious consequence is an ever-growing backlog of complexity.
Some of the issues are technical. But I'd argue that the final bullet is why all of the above problems are getting worse, not better.
100% this.
Last 4 years, one of the most frustrating parts of SWE that I need to deal with on a daily basis is packaging data science & machine learning applications and APIs in Python.
Maybe this is a very mid solution, but one that I found was to use dockerized local environments with all dependencies pinned via Poetry [1]. The initial setup is not easy, but now, with a Makefile on top, it's something that takes me only 4 hours with a DS to explain and run together, and it saves tons of hours in debugging and dependency conflicts.
> Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system.
It sounds odd, but in several projects that I worked on, folks brought in the entire Scikit-Learn dependency just for the train_test_split function [2], because the team thought that would be simpler and easier than writing a function that splits the dataset.
[1] - https://github.com/orgs/python-poetry/discussions/1879 [2] - https://scikit-learn.org/1.5/modules/generated/sklearn.model...
Has anyone managed to make a viable P#, a clean break which retains most of what most people love about the language and environment, and cheerfully asserts new and immutable change in things like <the technical parts of the above>?
When I have looked into this it seems people can't help but improve one-more-thing or one-other-thing and end up just enjoying vaguely-pythonic language design.
I also think some of the criticisms in the GP comment are not accurate. Most of the valuable libraries are natively compiled? Some important ones are, but not all.
I think a lot of the problem is that Python's usage has changed. It's great for a wide range of uses (scripting, web apps and other server stuff, even GUIs), but it's really not a great match for scientific computing and the like. Yet it has become widely used there because it is easy to learn (and has lots of libraries for that now!).
If Python leadership had true visionaries they would sit down, analyze every publicly available Python project and build a single set of tools that could gradually and seamlessly replace the existing clusterfuck.
Python developers will pretend the language is all about simplicity and then hand you over to the most deranged ecosystem imaginable. It sure is easy to pretend that you have a really simple ecosystem when you cover your eyes and focus on a small segment of the overall experience.
I'm not sure how Rust is doing it, but the problem is hardly insurmountable.
I think there are many answers to this, and there are many factors contributing to it, but if I had to pick one: the setup.py file. It needs to be executed to determine the dependencies of a project. Since it's a script, that allows any maintainer of any package you are using to do arbitrarily complex/dumb stuff in it, like conditionally adding dependencies based on host-system-specific environment markers, or introducing assumptions about the environment it is being installed into. That makes trying to achieve all the things you'd want from a modern package manager so much harder.
This also means that the problem isn't just concentrated in 1-2 central package management projects, but scattered throughout the ecosystem (and some of the worst offenders are some of Python's most popular sub-ecosystems).
There is some light with the introduction of the pyproject.toml, and now uv as a tool taking advantage of it.
Yes, this should never have been allowed. It solved a problem in the short term but in the long term has caused no end of pain.
Also, I think the fact that Python packages are sometimes distributed as shared libraries is a problem. When I think about Conan or vcpkg (package managers for C and C++), they usually suck because some dependencies are available on some platforms and not on others, or even in one version on one platform and another version on another, and you get messes all around if you need to support multiple platforms.
I think generally binary package managers are almost always bad* and source based package managers almost always work well (I think those are essentially easy mode).
*: unless they maintain a source package of their own that they actually support and have a fixed set of well-supported platforms (like system package managers on most Linux distros do).
This is exactly the reason I've moved from pip to conda for some projects: "pip" was acting as a source-based package manager, and thus asking for C tools, libraries and dev headers to be installed - but not providing them, as they were non-Python and thus declared out of scope. Especially on older Linux distributions, getting dependencies right can be quite a task.
Were your issues recent or from several years ago?
1. A good python solution needs to support native extensions. Few other languages solve this well, especially across unix + windows.
2. Python itself does not have a package manager included.
I am not sure solving 2 alone is enough, because it will be hard to fix 1 then. And of course 2 would need to have a solution for older Python versions.
My guess is that we're stuck in a local maximum for a while, with uv looking like a decent contender.
Is that a new feature? Pretty sure it didn't a few years ago. If the thing I need needed the libfoo C library then I first had to install libfoo on my computer using apt/brew/etc. If a new version of the PHP extension comes out that uses libfoo 2.0, then it was up to me to update libfoo first. There was no way for composer to install and manage libfoo.
> Php-yaml can be installed using PHP's PECL package manager. This extension requires the LibYAML C library version 0.1.0 or higher to be installed.
$ sudo apt-get install libyaml-dev
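and then, presumably, something like this for the extension itself (the pecl command is from memory, so treat it as an assumption):
$ sudo pecl install yaml   # builds the PHP extension against the libyaml headers installed above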
This is basically how "pip" works, and while it's fine for basic stuff, it gets pretty bad if you want to install a fancy numerical or cryptography package on an LTS Linux system that's at the end of its support period.
I am guessing that PHP might simply have less need for native packages, being more web-oriented.
But with Python it seems completely fractured - everyone tries to solve it their own way, with nothing becoming a truly widely used solution. More involvement from the Python project could make a difference. From my perspective, this mess is currently Python's biggest problem and should be prioritized accordingly.
Even the CLI workflow is identical: dotnet add package / cargo add (.NET had it earlier too, it's nice that Cargo now also has it).
> Python community itself needs to address this.
The Python community can't address it, really, because that would make the Python community responsible for a general-purpose package management system not at all limited to Python, but including packages written in C, C++, and Rust to start, and also Fortran, maybe Haskell and Go, too.
The only role the Python community can realistically play in such a solution is making Python packages well-behaved (i.e., no more arbitrary code at build time or install time) and standardizing a source format rich with metadata about all dependencies (including non-Python dependencies). There seems to be some interest in this in the Python community, but not much.
The truth, perhaps bitter, is that for languages whose most important packages all have dependencies foreign to the ecosystem, the only sane package management strategy is slotting yourself into polyglot software distributions like Nix, Guix, Spack, Conda, Pkgsrc, MacPorts, MSYS2, your favorite Linux distro, whatever. Python doesn't need a grand, unifying Python package manager so much as a limited, unified source package format.
So another tool isn't meaningfully different (and it can be the answer): if "the community" migrates to the new tool it wouldn't matter that there's a dozen of other unused tools.
Same thing if "the community" fixes an existing tool and migrates to it: other unused tools will still exist
That parasitism is also Docker's strength: bring along whatever knowledge you have of your favorite language ecosystem's toolchain; it'll not only apply but it'll likely be largely sufficient.
Build systems like Buck and Bazel are more like Nix in this respect: they take over the responsibilities of some tools in your language's toolchain (usually high-level build tools, sometimes also dependency managers) so they can impose a certain discipline and yield certain benefits (crucially, fine-grained incremental compilation).
Anyway, Docker doesn't fetch or resolve the dependencies of Python packages. It leaves that to other tools (Nix, apt-get, whatever) and just does you the favor of freezing the result as a binary artifact. Immensely useful, but solves a different problem than the main one here, even if it eases some of the same burdens.
Also, it doesn't always work; I got stuck with some dependencies. When it works, it's amazing.
At least uv is nice! https://docs.astral.sh/uv/
I think the answer is the same thing that makes it difficult to make a good package manager for C++.
When a language doesn't start with decent package management, it becomes really hard to retrofit a good one later in the lifespan of that language. Everyone can see "this sucks" but there's simply no good route to change the status quo.
I think Java is the one language I've seen that has successfully done the switch.
Conda suffers from the virtual environment syndrome. Virtual environments are always imperfect and confusing. System libraries sometimes leak through. The "scientific" Python stack has horrible mixtures of C/C++/Cython etc., all poorly written and difficult to build.
Projects deteriorated in their ability to build from source due to the availability of binary wheels and the explosion of build systems. In 2010 there was a good chance that building a C project worked. Now you fight with meson versions, meson-python, cython versions, libc versions and so forth.
There is no longer any culture of correctness and code cleanliness in the Python ecosystem. A lot of good developers have left. Some current developers work for the companies who sell solutions for the chaos in the ecosystem.
Don't forget a whole lot of FORTRAN :)
Most of these aspects have significantly improved over the last decade, at least for the standard packaging ecosystem. I don’t know about Conda, which has always been its own separate thing.
Not saying packaging doesn't have faults, but on its own, on a good Python setup, it's actually better than average. But few people have a good setup. In fact, most people don't know what a good setup looks like.
And here is why bootstrapping is broken: https://www.bitecode.dev/p/why-is-the-python-installation-pr...
Well, Unix IS the cohesive system..
Python's strength (and weakness) is an emphasis on quick scripts, data science and statistics.
There’s simply not the right people with the right mindset.
yay -S python-virtualenv  # I'm on arch, do not confuse with 12 similarly named alternatives
pyenv virtualenv 3.10 random-python-crap
pyenv local 3.10.6/envs/random-python-crap
pip install -r requirements.txt
and it works (sometimes deps are in some other places, or you have to pass -c constraints.txt or there is no file and you need to create it in various ways)
At least by not using local .env directories, I always know where to find them.
I install a lot of AI projects, so I have around 1TB just for the same Python dependencies installed over and over again.
Sometimes I can get away with trying to use the same venv for two different projects but 8/10 deps get broken.
Though I agree with the premise, Conda is an absolute pest when you start customising an environment with a number of packages. Dependency resolution hell.
Now why you would do that … IDK.
Weirdly, it's the second post on HN this quarter to do it, from a completely different site. Makes me wonder if there's some viral piece of CSS advice out there …? (And nobody looks at their site…?) Bad LLM output?
When 30% of your page is junk, it makes you wonder about the other 30%...
https://mail.python.org/pipermail/python-list/2024-May/91230...
[oof, never mind, it's worse in some ways]