Install nltk and WordNet into the Janus-linked Python

For Windows, see the quick summary in the Windows screenshot at the end. The macOS steps and a more detailed explanation are given immediately below.

At the Terminal, after installing SWI-Prolog, type swipl. You should see the Prolog welcome banner and its prompt ?-. Type py_version. and hit Enter. Don't forget the period (.) at the end of the command; Prolog requires it. In the transcript below, Janus embeds Python 3.9.6 (YMMV: you will probably see a different version).

$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.2.5)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.

?- py_version.
% Interactive session; added `.` to Python `sys.path`
% Janus 1.3.0 embeds Python 3.9.6 (default, Feb  3 2024, 15:58:28) 
[Clang 15.0.0 (clang-1500.3.9.4)]
true.

Next, we can enter the linked Python interpreter directly from SWI-Prolog. Type py_shell. and hit Enter at the prompt. You should see the Python welcome banner and a >>> prompt. Here the Python version is 3.9.6; the important thing is that it's the same one that was reported above.

?- py_shell.
Warning: Janus: py_shell/0: Importing janus into the Python shell requires Python 3.10 or later.
Warning: Run "from janus import *" in the Python shell to import janus.
Python 3.9.6 (default, Feb  3 2024, 15:58:28) 
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
(InteractiveConsole)

If you type sys.executable and hit Enter (notice no period is required in Python), you can see the underlying program is swipl.

>>> sys.executable
'/Applications/SWI-Prolog.app/Contents/MacOS/swipl'
>>>
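
To double-check which Python you are in, the same facts can be printed in one go. The following is only a minimal sketch: describe_interpreter is a made-up helper name, and the 3.10 threshold is taken from the py_shell warning shown above.

```python
import sys

def describe_interpreter():
    """Return (version string, executable path, auto_import flag)."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # The py_shell warning above says automatic "import janus" needs Python 3.10 or later.
    auto_import = sys.version_info >= (3, 10)
    return version, sys.executable, auto_import

version, executable, auto_import = describe_interpreter()
print("Python", version, "running from", executable)
if not auto_import:
    print('Older than 3.10: run "from janus import *" by hand if you need janus.')
```

The version printed here should match the one reported by py_version.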

Assuming you haven't already installed nltk into this particular Python, the command import nltk will raise an error.

>>> import nltk
Traceback (most recent call last):
  File "<console>", line 1, in <module>
ModuleNotFoundError: No module named 'nltk'
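
If you'd rather check without triggering a traceback, the standard library can tell you whether a package is visible to the running interpreter. A small sketch (is_installed is a hypothetical helper name, not part of nltk or Janus):

```python
import importlib.util

def is_installed(package_name):
    """True if this interpreter can locate a top-level package, without importing it."""
    return importlib.util.find_spec(package_name) is not None

print("nltk available:", is_installed("nltk"))
```

This prints False before the install step and should print True afterwards, provided the install directory is on sys.path.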

pip is the command that downloads and installs Python packages such as nltk. The problem is that each install of Python has its own pip, and a package installed with one pip is not visible to other installs. The solution is to use the pip that belongs to the Python we are running now, which guarantees nltk goes to the right place. First run import pip, then pip.main(['install','nltk']).

>>> import pip
>>> pip.main(['install','nltk'])
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
Collecting nltk
  Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 9.2 MB/s 
Collecting regex>=2021.8.3
  Downloading regex-2024.5.15-cp39-cp39-macosx_10_9_x86_64.whl (281 kB)
     |████████████████████████████████| 281 kB 30.4 MB/s 
Collecting joblib
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
     |████████████████████████████████| 301 kB 26.8 MB/s 
Collecting tqdm
  Downloading tqdm-4.66.4-py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 10.3 MB/s 
Collecting click
  Downloading click-8.1.7-py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 11.2 MB/s 
Installing collected packages: tqdm, regex, joblib, click, nltk
  WARNING: The script tqdm is installed in '/Users/sandiway/Library/Python/3.9/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script nltk is installed in '/Users/sandiway/Library/Python/3.9/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed click-8.1.7 joblib-1.4.2 nltk-3.8.1 regex-2024.5.15 tqdm-4.66.4
WARNING: You are using pip version 21.2.4; however, version 24.1.2 is available.
You should consider upgrading via the '/Applications/SWI-Prolog.app/Contents/MacOS/swipl -m pip install --upgrade pip' command.
0

The above, largely ignorable, output tells you that version 3.8.1 of nltk is being installed. Where does all that go? It appears to go into /Users/sandiway/Library/Python/3.9/, which makes sense as we're running embedded Python 3.9.6. What it doesn't tell you is that nltk actually went into /Users/sandiway/Library/Python/3.9/lib/python/site-packages for me. Running the shell command ls (not Python, not Prolog) on this directory will show you something like this:

$ ls
click				nltk				tqdm
click-8.1.7.dist-info		nltk-3.8.1.dist-info		tqdm-4.66.4.dist-info
joblib				regex
joblib-1.4.2.dist-info		regex-2024.7.24.dist-info
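
The pip output said "Defaulting to user installation", which means the per-user site-packages directory. Python's standard site module can report where that directory is, so you don't have to guess. This is a sketch only; whether the embedded interpreter reports the same directory that pip used on your machine is an assumption you should verify.

```python
import site
import sys

# Per-user install directory for the running interpreter.
user_site = site.getusersitepackages()
print("user site-packages:", user_site)

# If this prints False, packages installed there will not be found by import.
print("on sys.path:", user_site in sys.path)
```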

How does Python 3.9.6 know where on your computer to look for the nltk package? It searches the directories listed in the variable sys.path. Typing it will show something like:

>>> sys.path
['', '/Applications/SWI-Prolog.app/Contents/swipl/library/ext/swipy/python', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python39.zip', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/lib-dynload', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages']

If you are not confident that Python can find nltk (that is, if import nltk still fails), you can add the install directory to sys.path as follows:

>>> sys.path.append('/Users/sandiway/Library/Python/3.9/lib/python/site-packages')
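
If you run the append more than once, the same directory ends up in sys.path several times. A slightly more careful variant is sketched below; ensure_on_path is a hypothetical helper name, and the path is the one from my machine, so substitute your own.

```python
import sys

def ensure_on_path(directory):
    """Append directory to sys.path unless already present; return True if it was added."""
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True

# Path taken from the example above; yours will differ.
ensure_on_path('/Users/sandiway/Library/Python/3.9/lib/python/site-packages')
```

Note that changes to sys.path only last for the current Python session.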

At this point, we've installed nltk for Python 3.9. It resides somewhere in your user directory structure on this computer, and we've added that location to sys.path. So now import nltk followed by nltk.download() (for corpora including WordNet) should work.

>>> import nltk
>>> nltk.download()
DEPRECATION WARNING: The system version of Tk is deprecated and may be removed in a future release. Please don't rely on it. Set TK_SILENCE_DEPRECATION=1 to suppress this warning.
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

The warning above appears because the Tk graphics library on this laptop is out of date. It still worked and popped up a menu where you can select the corpora you want. I recommend selecting all, as you don't really want to do this again, do you?
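
If the popup menu doesn't appear (for example, no working Tk), nltk.download also accepts a package name and downloads it without the menu. The sketch below wraps that call; fetch_corpora is a hypothetical helper, and the downloader argument exists only so the function can be tried out without nltk or a network connection.

```python
def fetch_corpora(names, downloader=None):
    """Download each named NLTK data package; return {name: succeeded}."""
    if downloader is None:
        import nltk  # imported lazily: only needed for a real download
        downloader = nltk.download
    return {name: bool(downloader(name)) for name in names}

# Example (needs nltk installed and a network connection):
# fetch_corpora(['wordnet'])
```

This fetches only what you name, so it is the opposite of the select-all approach recommended above.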

After it's done, WordNet should be somewhere on your computer, although the downloader doesn't tell you where. It's in the directory nltk_data/corpora under your home directory. The following is the result of running the shell command ls on ~/nltk_data (the tilde ~ is shorthand for your home directory).

$ ls
chunkers	grammars	misc		sentiment	taggers
corpora		help		models		stemmers	tokenizers
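
You can do the same check from inside Python. The sketch below assumes the default location ~/nltk_data described above; find_wordnet is a hypothetical helper name. Depending on the nltk version, the corpus may show up as a directory, a .zip file, or both.

```python
from pathlib import Path

def find_wordnet(data_dir=None):
    """List entries under <data_dir>/corpora whose names start with 'wordnet'."""
    base = Path(data_dir) if data_dir is not None else Path.home() / 'nltk_data'
    corpora = base / 'corpora'
    if not corpora.is_dir():
        return []
    return sorted(p.name for p in corpora.iterdir() if p.name.startswith('wordnet'))

print(find_wordnet())
```

An empty list means nothing was found in the default location.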

We're done. Inside Python, you should be able to load and run WordNet as follows:

>>> from nltk.corpus import wordnet as wn
>>> wn.morphy('did','v')
'do'

The SMT parser uses wn.morphy to do stemming (regular and irregular). Above, 'did' as a verb 'v' is stemmed down to 'do'. Below is an example with a plural noun.

>>> wn.morphy('corridors','n')
'corridor'
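
One thing to watch for: wn.morphy returns None when it can't find a base form in WordNet. A small wrapper that falls back to the original word is sketched below. base_form is a hypothetical helper for illustration, not how the SMT parser does it, and the morphy argument is there only so the function can be tried without WordNet installed.

```python
def base_form(word, pos, morphy=None):
    """Return morphy's base form for word, or word itself when morphy finds none."""
    if morphy is None:
        from nltk.corpus import wordnet as wn  # requires nltk plus the WordNet corpus
        morphy = wn.morphy
    result = morphy(word, pos)
    return word if result is None else result

# Example (needs nltk and WordNet installed):
# base_form('corridors', 'n')
```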

You can also call morphy from SWI-Prolog on the command line with a slightly different syntax once you've installed the parser.


Windows

An example of running Python via the py command in PowerShell, then installing nltk (using pip from inside Python) and nltk_data (via the popup menu), following the instructions above.


Sandiway Fong, University of Arizona.
Last modified: Tue Aug 13 17:00:20 MST 2024