guix-devel

Re: PYTHONPATH issue explanation


From: Chris Marusich
Subject: Re: PYTHONPATH issue explanation
Date: Sat, 24 Mar 2018 21:47:04 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Hi Hartmut,

Awesome analysis!  Thank you for taking point on this.  I will offer
some feedback.  I hope it is useful.

The short version is: I think Python should let us explicitly tell it
where its system site directory is.  If Python provided such a feature,
then I think we could use it and avoid all these problems.  I think this
would be better than modifying the heuristics that Python uses for
finding its system site during start-up (although I think that is a good
back-up plan), since those heuristics are complicated and difficult to
control.  It would just be simpler if we could explicitly tell Python
where its site directory is, instead of indirectly arranging for Python
to find its site directory via its module-lookup Rube-Goldberg machine.

Hartmut Goebel <address@hidden> writes:

> This python interpreter does not find the site-packages in GUIX_PROFILE
> since site-packages are searched for relative to "sys.base_prefix" (which is
> the same as "sys.prefix" except in virtual environments).
> "sys.base_prefix" is determined based on the executable's path (argv[0])
> by resolving all symlinks.

I am familiar with this problem.  Any time you want to deploy Python and
its libraries by building up a symlink tree, and you put Python in a
part of the file system that lives far away from the libraries
themselves, Python will punish you cruelly with this behavior.  It is no
fun at all.  :-( You always have to come up with silly hacks to work
around it, and those hacks don't work in every case.

Question: Why does Python insist on canonicalizing its executable path?
It always seemed to me like if Python just used the original path, these
problems would not occur.  People who use symlink trees to deploy Python
would be happy.  Perhaps I am missing some information.  What is the
intent behind Python's decision to canonicalize the executable path?
What problems occur if Python doesn't do that?
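To make the heuristic concrete, here is a deliberately simplified model in Python. It assumes the prefix is just the canonicalized executable's grandparent directory. CPython's real start-up path computation also searches for landmark files such as lib/pythonX.Y/os.py, and this model ignores that. The helper name is hypothetical.

```python
import os
import sys


def naive_base_prefix(executable: str) -> str:
    """Simplified model of the prefix heuristic (hypothetical helper).

    Canonicalize the executable, resolving every symlink, then take the
    directory two levels up: /x/bin/python3.6 -> /x.
    """
    real = os.path.realpath(executable)
    return os.path.dirname(os.path.dirname(real))


if __name__ == "__main__":
    # Compare the path we were started from with what Python derived from it.
    print("sys.executable  :", sys.executable)
    print("canonicalized   :", os.path.realpath(sys.executable))
    print("sys.base_prefix :", sys.base_prefix)
    print("naive model     :", naive_base_prefix(sys.executable))
```

If the executable is a profile's bin/python that links into a package's store directory, the model returns the package directory rather than the profile.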

> The python interpreter assumes "site-packages" to be relative to "where
> python is installed" - called "sys.base_prefix" (which is the same as
> "sys.prefix" except in virtual environments). "sys.base_prefix" is
> determined based on the executable's path (argv[0]) by resolving all
> symlinks. For Guix this means: "sys.base_prefix" will always point to
> /gnu/store/…-python-X.Y, not to GUIX_PROFILE. Thus the site-packages
> installed into the guix profile will not be found.

Yes.  This is a problem.  As you know, this heuristic fails
spectacularly when you try to deploy Python in a symlink tree.
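The standard sysconfig module shows how the install layout depends on the prefix. Under the posix_prefix scheme, both the standard library and site-packages are computed from the base prefix. The sketch below uses a hypothetical helper name and an illustrative prefix. Given a store path for the python package, it places site-packages under that package and not under the profile.

```python
import sysconfig


def scheme_paths(prefix: str) -> dict:
    """Return the posix_prefix stdlib and purelib paths for `prefix`."""
    paths = sysconfig.get_paths(
        scheme="posix_prefix",
        vars={
            "base": prefix,
            "platbase": prefix,
            "installed_base": prefix,
            "installed_platbase": prefix,
        },
    )
    return {"stdlib": paths["stdlib"], "purelib": paths["purelib"]}


if __name__ == "__main__":
    import sys

    # Show where this interpreter's own base prefix puts both directories.
    for name, path in scheme_paths(sys.base_prefix).items():
        print(name, "->", path)
```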

Question: Why does Python not supply a way to "inject" the system site
directory?  In Guix-deployed systems, we are the masters of reality.  We
control ALL the paths.  We can tell Python exactly where its "system
site" is - we can build a symlink tree of its system site in the store
and then tell Python to use that site specifically.  For example, if
Python would let us specify this path via a PYTHON_SYSTEM_SITE
environment variable, then I think it would solve many (all?) of our
problems.  Perhaps this is similar to what you are suggesting regarding
GUIX_PYTHON_X.Y_SITE_PACKAGES and GUIX_PYTHONHOME_X.Y.
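Here is a hypothetical sketch of what honoring such a variable could look like. CPython does not read PYTHON_SYSTEM_SITE. The code is Python-level, for example something a sitecustomize module could run. A real fix would belong in Python's start-up code instead.

```python
import os
import site

# Hypothetical variable from the proposal above; CPython does not read it.
SYSTEM_SITE_VAR = "PYTHON_SYSTEM_SITE"


def add_system_site(environ=os.environ):
    """Register each directory listed in PYTHON_SYSTEM_SITE as a site dir.

    site.addsitedir() appends the directory to sys.path and also
    processes any .pth files found in it.
    """
    added = []
    for directory in environ.get(SYSTEM_SITE_VAR, "").split(os.pathsep):
        if directory and os.path.isdir(directory):
            site.addsitedir(directory)
            added.append(directory)
    return added
```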

> This is why we currently (mis-) use PYTHONPATH: To make the
> site-packages installed into the guix profile available.

I agree that this is a mis-use.  People do it because Python doesn't
provide any better way.  And then people find out about all its terrible
down-sides, like for example the fact that .pth files will not be
processed if they appear on the PYTHONPATH.  And then they do stuff like
hack site.py to walk the PYTHONPATH and evaluate all the .pth files,
which is gross but sort of works.  Just thinking about the pain I have
experienced with this stuff makes my blood boil.
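The .pth limitation is easy to reproduce. The sketch below builds a scratch lib/ directory whose demo.pth names a sibling extra/ directory. The file and helper names are made up. With PYTHONPATH pointing at lib/, a child interpreter does not pick up extra/. Registering lib/ with site.addsitedir() does pick it up.

```python
import os
import site
import subprocess
import sys
import tempfile


def make_site_dir():
    """Create lib/ containing a .pth file that points at a sibling extra/."""
    root = tempfile.mkdtemp()
    libdir = os.path.join(root, "lib")
    extra = os.path.join(root, "extra")
    os.makedirs(libdir)
    os.makedirs(extra)
    with open(os.path.join(libdir, "demo.pth"), "w") as f:
        f.write(extra + "\n")
    return libdir, extra


def visible_via_pythonpath(libdir, extra):
    """Start a child interpreter with PYTHONPATH=libdir; is extra on sys.path?"""
    env = dict(os.environ, PYTHONPATH=libdir)
    code = "import sys; print({!r} in sys.path)".format(extra)
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


def visible_via_addsitedir(libdir, extra):
    """Register libdir as a site directory in this process instead."""
    site.addsitedir(libdir)
    return extra in sys.path
```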

> no. 2
> suggests using a mechanism already implemented in python: Setting
> "PYTHONHOME" will make the interpreter use this as "sys.base_prefix"
> unconditionally. Again there is only one PYTHONHOME variable for all
> versions of python (designed by upstream). We could work around this
> easily (while keeping upstream compatibility) by using
> GUIX-PYTHONHOME-X.Y, to be evaluated just after PYTHONHOME.

Are there legitimate use cases where a user wants to set their own
PYTHONHOME?  If so, would our use of PYTHONHOME prevent them from doing
that?  If so, that seems bad.
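Here is a sketch of the lookup order proposed in no. 2, written in Python for readability. It assumes the variable is spelled GUIX_PYTHONHOME_X_Y. POSIX shells accept only letters, digits and underscores in variable names, so the hyphenated, dotted form could not be set with a plain `export`. In this ordering, a user's own PYTHONHOME still wins. The function and variable names are hypothetical.

```python
import os
import sys


def effective_home(environ=os.environ, version=sys.version_info[:2]):
    """Sketch of the proposed lookup order (names hypothetical).

    A user-supplied PYTHONHOME wins; only when it is unset does the
    version-specific Guix variable get consulted.
    """
    home = environ.get("PYTHONHOME")
    if home:
        return home
    guix_var = "GUIX_PYTHONHOME_{}_{}".format(*version)
    return environ.get(guix_var)
```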

In the past, I have used PYTHONUSERBASE (or maybe it was PYTHONUSERSITE,
I can't remember exactly which) to make Python find libraries in a
symlink tree.  However, because that is intended for users to use, I
don't think it's a good solution for us here.  If we co-opt these
environment variables, then users would not be able to use them.
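PYTHONUSERBASE does relocate the per-user site directory, which explains why it works for this purpose. The small check below asks a child interpreter for site.getusersitepackages() with the variable set. The helper name is hypothetical. There is only one such variable, so a profile that set it would replace whatever value the user chose.

```python
import os
import subprocess
import sys


def user_site_for(userbase):
    """Ask a child interpreter where its user site-packages would be
    when PYTHONUSERBASE points at `userbase`."""
    env = dict(os.environ, PYTHONUSERBASE=userbase)
    code = "import site; print(site.getusersitepackages())"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```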

> The drawback is: This is implemented using an environment variable,
> which might not give the expected results in all cases. E.g. running
> /gnu/store/…-profile/bin/python will not load the site-packages of that
> profile. Also there might be issues implementing virtual environments.
> (Thinking about this, I'm quite sure there will. Ouch!)

I wouldn't be surprised if that's true, but right now, I can't think of
any specific virtualenv-related problems that would occur by using
PYTHONHOME.
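This reference point may help when weighing the virtualenv concern. Per PEP 405, a venv is recognized by a pyvenv.cfg file, not by PYTHONHOME. Python looks for the file next to the executable or one directory above it, and a symlinked executable is not dereferenced for this check. Its `home` key names the directory of the interpreter the venv was created from. The sketch below models only that lookup. The helper names are hypothetical.

```python
import os


def find_pyvenv_cfg(executable):
    """Return the pyvenv.cfg marking `executable` as part of a venv, or None.

    Looks next to the executable and one directory above it, without
    resolving the executable if it is a symlink (as PEP 405 describes).
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for directory in (exe_dir, os.path.dirname(exe_dir)):
        cfg = os.path.join(directory, "pyvenv.cfg")
        if os.path.isfile(cfg):
            return cfg
    return None


def read_home(cfg):
    """Return the value of the `home = ...` line in a pyvenv.cfg, if any."""
    with open(cfg) as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "home":
                return value.strip()
    return None
```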

> no.3
> suggests changing the way the python interpreter is resolving symlinks
> when searching for "sys.base_prefix". The idea is to stop "at the profile".
>
> The hard part of this is to determine "at the profile". Also this needs
> a larger patch. But if we manage to implement this, it would be perfect.
> I could contribute a draft for this implemented in Python. The
> C-implementation needs to be done by some C programmer.

This seems a little tricky, mainly because it's going to rely again on
heuristics that may not always be accurate.  As I mentioned above, in
Guix we are the masters of reality, so why can't we just tell Python
exactly where its system site path is?  If Python needs to be taught how
to be informed of such things, perhaps that is the patch we should
write: a patch that enables us to tell Python exactly where its system
site directory will be found.

> Which way should we go?

I think we should figure out a way to tell Python EXACTLY where its
system site directory is.  If that isn't viable, then I think the next
best thing will be to adjust the site-finding heuristics (your proposal
No. 3).

Hartmut Goebel <address@hidden> writes:

> As it stands now, the venv-hack is not a valid solution. It may be the basis
> for another solution, though.

I agree.  We need a solution that allows users to use virtualenv the way
they would normally on any other foreign distro, if they want to.

> 1. How could GUIX-PYTHON-X.Y-SITE-PACKAGES be implemented?
> =============================================================
>
> [...]
>
> 2. How could GUIX-PYTHONHOME-X.Y be implemented?
> =================================================

How do these two methods (GUIX-PYTHON-X.Y-SITE-PACKAGES
vs. GUIX-PYTHONHOME-X.Y) differ?  They seem to serve basically the same
purpose.


> 3. How to avoid GUIX-PYTHONHOME[23]?
> =========================================
>
> We could avoid GUIX-PYTHONHOME[23] if we stop resolving the symlinks at
> the correct point in iteration.
>
> [...]
> 
> Drawbacks:
>
> - More complicated patch.
>
> - More comparisons within a loop; this will slow down start-up a bit.
>
> Open questions:
>
> - Which are the correct paths to check to stop iteration?
> - How to handle the "pythonX" -> "pythonX.Y" link?
> - How to handle "python-wrapper", which links python -> python3

Instead of modifying Python's heuristics for finding its site, it'd be
better if Python just exposed a way for us to explicitly tell it where
its site directory is.

However, if we really want to modify the heuristics, I can think of some
possible ideas for how to do it:

* Don't canonicalize the path in the first place.
* Stop just before the first path that is in the store.
* Stop at the first path that is in the store.
* Stop at a path that matches a special pattern that we control,
  like "guix-python-site" or something.  We could create 
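The option "stop at the first path that is in the store" can be modelled in a few lines of Python. The sketch assumes that directory components such as ~/.guix-profile may be canonicalized freely, so only the executable's own symlinks are walked, one hop at a time. Assuming the profile's bin/ is a real directory in the store, the walk stops at /gnu/store/…-profile/bin/python instead of continuing to the python package. The helper names are hypothetical. The store prefix is a parameter so the sketch can be tried on a scratch tree.

```python
import os


def _canonical_dir(path):
    """Resolve symlinks in the directory part of `path`, not in the leaf."""
    return os.path.join(os.path.realpath(os.path.dirname(path)),
                        os.path.basename(path))


def resolve_until_store(executable, store="/gnu/store", max_hops=40):
    """Follow the executable's symlinks, stopping at the first store path."""
    store_prefix = store.rstrip("/") + "/"
    path = _canonical_dir(os.path.abspath(executable))
    for _ in range(max_hops):
        if path.startswith(store_prefix) or not os.path.islink(path):
            return path
        # A relative link target is relative to the link's own directory.
        target = os.path.join(os.path.dirname(path), os.readlink(path))
        path = _canonical_dir(target)
    raise OSError("too many levels of symbolic links: " + executable)


def prefix_of(executable):
    """…/bin/python -> …"""
    return os.path.dirname(os.path.dirname(executable))
```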

> 4. Path-handling in Python's start-up sequence

As you've shown, the way Python handles paths when it starts up is quite
complicated.  This is another reason why I would prefer not to change
the heuristics, but instead to expose a way for us to explicitly tell
Python where its site is.

-- 
Chris
