Qrious Kamal

25 Followers

13 hours ago

Creating Pyspark ML model pipeline

Objective: This post outlines the end-to-end data and ML model pipeline creation process in a PySpark environment. It briefly compares PySpark with the ubiquitous pandas library (at least in Python circles). It then moves directly to a code example pipeline, which starts with training data and ends with a prediction DataFrame. …

Pyspark

5 min read


5 days ago

Exploding JSONL format with Pyspark

JSON Lines is a format used in many places on the web, and I recently came across it in a Kaggle competition. The training data in this competition is in JSONL format, with each row containing the article ID, timestamp, and event from the e-commerce site Otto. …

Kaggle

2 min read
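
As a rough illustration of what "exploding" JSONL means, here is a minimal sketch in plain Python rather than PySpark. The sample records and the field names (`session`, `events`, `aid`, `ts`, `type`) are assumptions made up for this example, not taken from the post. Each input line holds one JSON object with a nested list, and the function flattens it into one row per event.

```python
import json

# Hypothetical sample in JSON Lines form: one JSON object per line.
# Field names and values are assumptions for illustration only.
SAMPLE_JSONL = """\
{"session": 1, "events": [{"aid": 101, "ts": 1659304800, "type": "clicks"}, {"aid": 102, "ts": 1659304900, "type": "carts"}]}
{"session": 2, "events": [{"aid": 201, "ts": 1659305000, "type": "orders"}]}
"""


def explode_events(jsonl_text):
    """Return one flat dict per nested event, carrying the session id along."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for event in record["events"]:
            rows.append({"session": record["session"], **event})
    return rows


rows = explode_events(SAMPLE_JSONL)
for row in rows:
    print(row)
```

In PySpark, the comparable step would typically read the file with `spark.read.json` and apply `pyspark.sql.functions.explode` to the array column. Whether the post does exactly that is not visible from the excerpt.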


6 days ago

Data Engineering AirBnB data with Pyspark

TL;DR: The gist is that one can use an Intel i3 / 2GB RAM Linux machine for “learning Data Engineering”, even with complicated data like the AirBnB datasets. This post is a learning log of my exploration of Data Engineering over the past 4 months. My intention in writing this post is to…

AWS

9 min read


Nov 20

Data Engineer’s Tools of Pyspark

Individual tools in any trade look trivial when we first use them. As we start mastering many tools, suddenly more problems get solved easily. Such is the case with the following 5 tools in Python: map, filter, reduce, itertools.takewhile, itertools.chain, and the lambda function (anonymous function). PySpark is about map => filter…

Data Engineering

3 min read
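
The tools named in the excerpt above can be combined in a few lines of standard-library Python. This is a minimal sketch with made-up input values, not the example from the post. It chains two lists, keeps the leading run of small values, then applies a map, a filter, and a reduce, using a lambda at each step.

```python
from functools import reduce
from itertools import chain, takewhile

# Hypothetical input batches, invented for this illustration.
batch_a = [3, 8, 12]
batch_b = [5, 20, 1]

# chain: treat several iterables as one continuous stream.
readings = list(chain(batch_a, batch_b))

# takewhile: keep items only until the predicate first fails (stops at 20).
leading = list(takewhile(lambda x: x < 20, readings))

# map: transform every item.
doubled = list(map(lambda x: x * 2, leading))

# filter: keep only the items that satisfy the predicate.
large = list(filter(lambda x: x > 10, doubled))

# reduce: fold the remaining items into a single value.
total = reduce(lambda acc, x: acc + x, large, 0)

print(readings, leading, doubled, large, total)
```

Note that `takewhile` drops everything after the first failing item, so the trailing `1` is excluded even though it is below 20. That is the difference from `filter`.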


Nov 20

AWS EMR : BMW on Rent

AWS EMR, or Elastic MapReduce, is a cluster of EC2 nodes that can execute distributed computing tasks. At the level of disk operations and motherboards, the cluster is made up of servers with 2 TB (max) HDDs, 32GB of RAM, and a couple of 8-core processors running in datacenters. Multiple…

AWS

4 min read


Nov 18

How to learn Data Analysis, Visualisation and much more From ObservableHQ

I have been fascinated with data visualisation and analysis for a long time. The more I learnt, the more I found there was to learn and explore. ObservableHQ is an online platform for showcasing data visualisation and analysis skills. Exploring the platform, I found the community on Observable is kind enough to…

Data

7 min read


Nov 16

AWS Cloud 9 — Learning Log 9

That single picture is sufficient to show all the services Cloud 9 can connect to in AWS. That is because AWS spawns an EC2 instance, which in turn gives the opportunity to work with everything that is on AWS, or outside AWS, that can connect through the network interface. …

AWS

5 min read


Nov 14

AWS Glue Partition : Learning Log 8

Partitioning Tables & AWS Glue: Learning Log 8. Someone forgot to tell me that Data Engineering is really Database Engineering, or I did not read it between the lines. There is no escape from SQL queries any time soon. For most apps, at least one SQL DB is there in…

AWS

6 min read


Nov 11

AIDA Super “Charged” : Please keep Charging it

Today there is Attention and then there is Action; we have forgotten the Interest and the Decision parts of what Mitch and Murray’s top motivator told those sales guys in Glengarry Glen Ross. The Interest and Decision parts are what Data Science and Data Engineering automate. In…

Data

4 min read


Nov 11

Getting the Data in — AWS Databases LL7

There are a variety of databases under the SQL and NoSQL umbrella. They come in all flavours and colored logos inside AWS, pretty much like the EC2 instances. A database itself is just a data structure, which contains huge tables of records, arranged in rows (or columns). To operate on these data structures, there are…

AWS

5 min read
