How to fix ModuleNotFoundError - No module named pymongo in Notebook
A common issue when working with Python or Spark is a "module not found" error. It means the required library is not installed or configured in the environment where the code runs. Once the library is installed, the application can find the module.
For example, running the code in this post in a Fabric notebook, which connects to a MongoDB database and displays records, fails with a module-not-found error.
Running the code throws the error: ModuleNotFoundError: No module named 'pymongo'
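To see exactly which module Python could not find, one option is to catch the exception and read its name attribute. The sketch below is only an illustration; the helper try_import is a hypothetical name and is not part of the original notebook.

```python
import importlib


def try_import(module_name):
    """Return (module, None) on success or (None, missing_name) on failure."""
    try:
        return importlib.import_module(module_name), None
    except ModuleNotFoundError as exc:
        # exc.name holds the name of the module that could not be located
        return None, exc.name


module, missing = try_import("pymongo")
if missing is not None:
    print(f"Module not installed: {missing}")
```

If pymongo itself is present but one of its dependencies is not, the reported name is the dependency, which shows which package actually needs installing.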
To fix the above issue, install the required libraries.
#install the required packages
!pip install pymongo
!pip install certifi
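As an alternative to the shell commands above, the install can be done from Python so that pip targets the same interpreter the notebook kernel is using. This is a minimal sketch, not the method used in this post: ensure_installed is a hypothetical helper, and whether a given notebook service picks up the new package without a session restart is an assumption that may not hold everywhere.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, package_name=None):
    """Install a package with pip only if its module is not importable.

    Returns True if pip was invoked, False if the module was already present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # sys.executable is the interpreter running this notebook kernel
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", package_name or module_name]
    )
    # Make the import system notice the newly installed files
    importlib.invalidate_caches()
    return True


# Example calls (left commented out so nothing is installed by accident):
# ensure_installed("pymongo")
# ensure_installed("certifi")
```

The optional package_name argument covers cases where the pip package name differs from the import name.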
Once the installation completes, pip prints a log like the following, and the notebook can import pymongo and connect to the MongoDB database.
Collecting pymongo
Downloading pymongo-4.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (676 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 676.9/676.9 kB 17.3 MB/s eta 0:00:00a 0:00:01
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.7/307.7 kB 102.1 MB/s eta 0:00:00
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.6.3
Requirement already satisfied: certifi in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (2023.7.22)
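The versions reported in the log can also be checked from Python using the standard library's importlib.metadata (available in Python 3.8 and later). The following is a small sketch under that assumption; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Print the version of each package from the install log (None if absent)
for name in ("pymongo", "dnspython", "certifi"):
    print(name, installed_version(name))
```

A result of None for pymongo suggests the install went into a different environment than the one the notebook is running in.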
The code below now runs successfully.
import pymongo
import certifi
import re
import pandas
mongo_connection_string = "<your connection string>"
# Connect to the database using known good certificates
client = pymongo.MongoClient(mongo_connection_string, tlsCAFile=certifi.where())
print(f"Using MongoDB version {client.server_info()['version']}.")
# Check what databases exist on this server
all_databases = client.list_database_names()
print(f"This MongoDB server has the databases {all_databases}")
# If we know the correct database to talk to, we connect like this:
video_database = client['video']
# Here is the list of collections within my database
all_collections = video_database.list_collection_names()
print(f"This database has the collections {all_collections}")
The code below reads a collection from the MongoDB database into a pandas DataFrame and displays the records in the notebook.
# Retrieve records from movie collection matching a query
cursor = video_database["movies"].find({"year": 1975})
# Convert this information into a Pandas dataframe
records = pandas.DataFrame(cursor)
records.head()