Posts

Showing posts from April, 2024

Spark settings in Microsoft Fabric

Microsoft Fabric brings many teams together on one platform, joining the Data Engineering, Data Science, and Reporting landscapes. Lakehouse is a new concept in Fabric. The question is why we would use a Lakehouse when Microsoft already has existing data storage platforms and multiple options in the Data Engineering area. I believe Microsoft is now looking to operate on a fully managed compute platform that can support Data Engineering and Data Science experiences, and Microsoft Fabric started its journey by building on Apache Spark features and services. Fabric uses starter pools. With starter pools, we can expect rapid Spark session initialization, typically within 5 to 10 seconds, with no need for manual setup. Starter pools have Spark clusters that are always on and ready for your requests, which makes them a fast and easy way to use Spark on the Microsoft Fabric platform. You can use Spark sessions right away, instead of waiting for Spark to

How to fix ModuleNotFoundError - No module named pymongo in Notebook

One common issue while working with Python or Spark is an error message saying that a module was not found. It means the required library has not been installed or configured in the current environment. Once that is done, the application will be able to find the required module. For example, when I ran code in a Fabric notebook that connects to a MongoDB database and displays records, it threw this error:

    ModuleNotFoundError - No module named pymongo

To fix the above issue, install the required libraries:

    # install the required packages
    !pip install pymongo
    !pip install certifi

Once done, the notebook is able to connect to the MongoDB database, and the install log confirms the packages were installed:

    Collecting pymongo
    Downloading pymongo-4.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (676 kB)
    676.9/676.9 kB 17.3 MB/s eta 0:00:00
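Before running the connection code, it can help to check which modules are actually missing from the notebook environment. The snippet below is a minimal sketch using only the Python standard library; the helper names `missing_modules` and `pip_install_command` are made up for illustration and are not part of Fabric or pymongo.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(names):
    """Build a pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", *names]


# Modules taken from the post; adjust the list for your own notebook.
required = ["pymongo", "certifi"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
    print("Install with:", " ".join(pip_install_command(missing)))
else:
    print("All required modules are available.")
```

Running this in a fresh session prints either the list of missing modules with a matching install command, or a confirmation that everything is already available.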

Microsoft Fabric - Use Lakehouse to upload source

Microsoft Fabric comes with multiple capabilities, and one of them is Data Engineering, where you bring your data to next-generation AI. In the Data Engineering experience, you load your data, process it, and finally display the refined data using Power BI capabilities. Tables and Files hold your data in the Data Engineering landscape. Tables hold data in a table structure, while Files holds the data files you upload, for example csv, json, or parquet. An Upload option is there to upload your file(s) into Microsoft Fabric.

Files uploading in Lakehouse
Files uploaded in Lakehouse

A simple way to display records is to create a Notebook and drag the file into it; Fabric will create the code for you :) Click on the table data, and options are there to display the data in different formats. For example, you can view your data as a chart (such as a bar or pie chart).

Bar Format
Pie Format

Notebook where you can do tricks using SQL to