Some cool data science tricks, mostly with Pandas

Photo courtesy: Sid Balachandran, Unsplash

This week I learned quite a few data tricks that I would like to share with you all. They might help you in your own data analysis.

For the analysis, I will be using the COVID vaccination data from Kaggle: https://www.kaggle.com/gpreda/covid-world-vaccination-progress

1st Trick: How to find the percentage of missing data in each column of the dataset?

#code
data.isnull().sum()*100/len(data)
#output
country 0.000000
iso_code 0.000000
date 0.000000
total_vaccinations 37.453558
people_vaccinated 44.355530
people_fully_vaccinated 62.360674
daily_vaccinations_raw 47.342098
daily_vaccinations 2.572163
total_vaccinations_per_hundred 37.453558
people_vaccinated_per_hundred 44.355530
people_fully_vaccinated_per_hundred 62.360674
daily_vaccinations_per_million 2.572163
vaccines 0.000000
source_name 0.000000
source_website 0.000000
dtype: float64

You can see that about 37% of the total_vaccinations values are missing, while only about 2.6% of the daily_vaccinations_per_million values are missing.
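As a small, self-contained sketch of the same idea, the snippet below builds a toy DataFrame (the values are made up for illustration and do not come from the Kaggle file) and computes the missing percentage per column. It uses isnull().mean(), which should give the same numbers as dividing the null count by the number of rows.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [100.0, np.nan, 300.0, np.nan],
    "daily_vaccinations": [10.0, 20.0, np.nan, 40.0],
})

# Share of missing values per column, as a percentage, largest first
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Sorting the result puts the columns with the most gaps at the top, which is handy when the dataset has many columns.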

2nd Trick: How to find the correlations among different variables?

# use this code for the correlations (only numeric columns can be correlated)
data.select_dtypes('number').corr()

You can also visualize the correlations using the seaborn heatmap:

# cmap and annot are optional styling arguments
import seaborn as sns
sns.heatmap(data.select_dtypes('number').corr(), cmap='summer', annot=True)
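The dataset mixes text columns (country, vaccines and so on) with numeric ones, and recent pandas versions may raise an error if corr() is called on text columns. Here is a minimal sketch, on made-up toy data, of selecting the numeric columns first and then computing the correlation matrix. The column values are assumptions chosen so the result is easy to check by eye.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],            # text column, cannot be correlated
    "total_vaccinations": [100, 200, 300, 400],
    "people_vaccinated": [90, 180, 270, 360],   # exactly 0.9 * total_vaccinations
    "daily_vaccinations": [40, 30, 20, 10],     # falls as the total rises
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr_matrix = toy.select_dtypes("number").corr()
print(corr_matrix.round(2))
```

The resulting matrix can be passed straight to sns.heatmap in the same way as above.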

3rd Trick: Visualize the correlations of a single column (total_vaccinations) with the other columns.

This is very handy for understanding the correlations between the target and the independent variables.

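#code (drop the target itself, since its correlation with itself is always 1)
data.select_dtypes('number').corr()['total_vaccinations'].drop('total_vaccinations').plot(kind='bar')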
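Before plotting, it can help to rank the columns by how strongly they relate to the target. The sketch below does this on made-up toy data; the target name and the values are assumptions for illustration only.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "total_vaccinations": [100, 200, 300, 400, 500],
    "people_vaccinated": [80, 190, 260, 390, 450],
    "daily_vaccinations": [50, 10, 40, 20, 30],
})

target = "total_vaccinations"

# Correlation of every numeric column with the target, without the target itself
target_corr = toy.select_dtypes("number").corr()[target].drop(target)

# Order by absolute strength so the most related columns come first
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)
print(ranked)
```

Calling ranked.plot(kind='bar') on the result would give the same kind of bar chart as above, with the bars already sorted.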

4th Trick: Select columns of a particular data type

# here I have selected the object data type
data.select_dtypes('object').columns
#output
Index(['country', 'iso_code', 'date', 'vaccines', 'source_name',
       'source_website'],
      dtype='object')

Here we can see that the date column is stored as object-type data. To get more information out of the dates, we need to convert the column to datetime. Let’s do that using pandas:

data['date'] = pd.to_datetime(data['date'])
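To show what the conversion buys you, here is a minimal sketch on a toy date column (the dates are made up). Once the column is datetime, the .dt accessor exposes parts such as the year, the month and the weekday name.

```python
import pandas as pd

# Toy dates, invented for illustration only
toy = pd.DataFrame({"date": ["2021-01-15", "2021-02-01", "2021-02-20"]})

# Convert the text column to datetime
toy["date"] = pd.to_datetime(toy["date"])

# Extract date parts with the .dt accessor
toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month
toy["day_name"] = toy["date"].dt.day_name()
print(toy)
```

These derived columns can then be used for grouping, for example to total vaccinations per month.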

5th Trick: Replace some data with a particular value

# replace the AFG country code with the value Afgan (this returns a new DataFrame)
data.replace('AFG','Afgan')
# to make this change permanent, pass inplace=True (or assign the result back to data)
data.replace('AFG','Afgan', inplace=True)
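Note that calling replace on the whole DataFrame looks for the value in every column. If the change should only touch one column, a possible approach is to call replace on that column and assign the result back. The sketch below uses made-up toy data, and the 'Albania' mapping is an assumption added to show that a dictionary can replace several values at once.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "iso_code": ["AFG", "ALB", "AFG"],
    "source_name": ["AFG", "Ministry of Health", "WHO"],
})

# Replace only within iso_code; the dict maps each old value to its new value
toy["iso_code"] = toy["iso_code"].replace({"AFG": "Afgan", "ALB": "Albania"})
print(toy)
```

The 'AFG' entry in source_name is left alone because only the iso_code column was changed.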

6th Trick: What are *args and **kwargs in Python functions?

We often see these in Python methods and functions. Basically, *args collects any number of positional arguments into a tuple, and **kwargs collects any number of keyword arguments into a dictionary. The examples below clarify this.

# I have declared a function with *args as an input that prints 50% (half) of the sum
def output(*args):
    print(sum(args)*.5)
# output
output(1,2,3,4)
5.0
# Now let's print out only *args
def output(*args):
    print(args)
#output
output(1,2,3,4)
(1, 2, 3, 4)
# args is a tuple. You can pass as many inputs as you want

Let’s talk about **kwargs

# define a function
def output_kwargs(**kwargs):
    print(kwargs)
# if you pass inputs here, it looks like
output_kwargs(a=1,b=2,c=3)
# output
{'a': 1, 'b': 2, 'c': 3}
# It is a Python dictionary, so we can use it like any other dictionary
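The two can also be combined in a single function. The sketch below uses a hypothetical helper named describe (not part of any library) that accepts both, and it also shows that * and ** work in the other direction, unpacking a list and a dictionary at the call site.

```python
def describe(*args, **kwargs):
    """Return a summary of the positional and keyword arguments received."""
    return {"count": len(args), "total": sum(args), "options": kwargs}

# Positional values land in args, named values land in kwargs
result = describe(1, 2, 3, scale=2, label="demo")
print(result)

# * and ** also unpack a list and a dict when calling a function
numbers = [4, 5]
settings = {"label": "unpacked"}
unpacked = describe(*numbers, **settings)
print(unpacked)
```

This is the same mechanism that lets pandas and seaborn functions accept extra styling options and pass them along to the functions they call.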

I hope you like the tricks and methods. I will catch up with you very soon.

Business Analyst/ Data Scientist at Genpact https://www.linkedin.com/in/soumyabratar/