How to fix ModuleNotFoundError - No module named pymongo in Notebook

A common issue when working with Python or Spark is an error message saying that a module was not found. This error means the library has not been installed or configured in the environment where the code runs. Once the library is installed, the application can find the required module.

For example, running code in a Fabric notebook that connects to a MongoDB database and displays records produces a module-not-found error.




Running the code throws the error: ModuleNotFoundError: No module named 'pymongo'
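To see which module is missing and which Python interpreter the notebook is using, a small check like the one below can help. This is only a sketch; the helper name `describe_import` is made up for illustration and is not part of Fabric or pymongo.

```python
import importlib
import sys

def describe_import(module_name):
    """Try to import a module and describe the outcome as a short string."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name holds the name of the module that could not be located
        return f"missing: {exc.name}"
    return f"found: {module.__name__}"

# The interpreter path shows which environment packages must be installed into
print("Interpreter:", sys.executable)
print(describe_import("json"))     # part of the standard library
print(describe_import("pymongo"))  # result depends on the environment
```

If the last line reports the module as missing, the package needs to be installed into the environment that the interpreter path points to.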


To fix the above issue, install the required libraries.

#install the required packages
!pip install pymongo
!pip install certifi
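As an alternative to running the install every time, the check and the install can be combined so that pip runs only when something is actually missing. The sketch below assumes the import name and the pip package name are the same, which holds for pymongo and certifi but not for every package. The helper names are hypothetical.

```python
import importlib.util
import subprocess
import sys

def missing_packages(import_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]

def ensure_installed(import_names):
    """Install only the missing packages and return the list that was missing."""
    missing = missing_packages(import_names)
    if missing:
        subprocess.check_call(pip_install_command(missing))
    return missing

# Show the command that would be used, without running it
print(pip_install_command(["pymongo", "certifi"]))
```

In a notebook cell, calling `ensure_installed(["pymongo", "certifi"])` would then install whatever is not yet available. Using `sys.executable -m pip` ties the install to the same interpreter that later runs the import.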


Once this is done, the notebook can connect to the MongoDB database. The installation produces the following confirmation log:
Collecting pymongo
  Downloading pymongo-4.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (676 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 676.9/676.9 kB 17.3 MB/s eta 0:00:00
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.7/307.7 kB 102.1 MB/s eta 0:00:00
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.6.3
Requirement already satisfied: certifi in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (2023.7.22)




The code below now runs successfully.

import pymongo
import certifi
import re
import pandas

mongo_connection_string = "<your connection string>"
# Connect to the database using known good certificates
client = pymongo.MongoClient(mongo_connection_string, tlsCAFile=certifi.where())
print(f"Using MongoDB version {client.server_info()['version']}.")

# Check what databases exist on this server
all_databases = client.list_database_names()
print(f"This MongoDB server has the databases {all_databases}")

# If we know the correct database to talk to, we connect like this:
video_database = client['video']

# Here is the list of collections within the database
all_collections = video_database.list_collection_names()
print(f"This database has the collections {all_collections}")
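The connection string above is a placeholder. One possible way to avoid pasting a real one into the notebook is to read it from an environment variable and do a basic sanity check on the scheme. This is a sketch under assumptions: the variable name `MONGO_CONNECTION_STRING` and the helper `get_connection_string` are hypothetical, and the check only looks at the standard `mongodb://` and `mongodb+srv://` prefixes.

```python
import os

# Connection string schemes accepted by MongoDB drivers
VALID_SCHEMES = ("mongodb://", "mongodb+srv://")

def get_connection_string(env_var="MONGO_CONNECTION_STRING"):
    """Read the connection string from an environment variable (name is an assumption)."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    if not value.startswith(VALID_SCHEMES):
        raise ValueError("Connection string must start with mongodb:// or mongodb+srv://")
    return value

# Example with a dummy value; a real notebook would have the variable set beforehand
os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://user:password@example.invalid/"
print(get_connection_string().split("@")[-1])  # print only the host part, not the credentials
```

The returned value could then be passed to `pymongo.MongoClient` in place of the hard-coded string. Connection strings using `mongodb+srv://` rely on dnspython, which is why pip pulled it in as a dependency in the log above.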


The next step is to read from the movies collection and keep only the records where year equals 1975.


Below is the Python code (pymongo with pandas) that reads the MongoDB collection and displays the records in the notebook.

# Retrieve records from movie collection matching a query
cursor = video_database["movies"].find({"year": 1975})

# Convert this information into a Pandas dataframe
records = pandas.DataFrame(cursor)
records.head()
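To make the filter `{"year": 1975}` easier to reason about, here is a rough pure-Python illustration of a simple equality match. It does not use MongoDB at all and ignores operators, nested fields, and array matching. The sample documents and the `matches` helper are made up for illustration.

```python
def matches(document, query):
    """Return True when every key in the query equals the document's value (equality only)."""
    return all(key in document and document[key] == value for key, value in query.items())

# Hypothetical documents standing in for the movies collection
sample_movies = [
    {"title": "Movie A", "year": 1975},
    {"title": "Movie B", "year": 1980},
    {"title": "Movie C", "year": 1975},
]

query = {"year": 1975}
selected = [doc for doc in sample_movies if matches(doc, query)]
print([doc["title"] for doc in selected])  # ['Movie A', 'Movie C']
```

One practical point carries over to the real query: the match is type-sensitive, so a document storing the year as the string "1975" would not be returned by a filter on the integer 1975.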



The same can be achieved in Scala as well; keep in mind that the required library needs to be configured first. Enjoy Fabric!
