Some cool Data Science tricks, mostly with Pandas
This week I learned quite a few data tricks that I would like to share with you all. They might help you in your own data analysis.
For the analysis, I will be using the COVID vaccination data from Kaggle: https://www.kaggle.com/gpreda/covid-world-vaccination-progress
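Every snippet below works on a DataFrame called data and uses pd (pandas) and sns (seaborn). Here is a minimal sketch of the loading step. It assumes the Kaggle download is a CSV named country_vaccinations.csv, which is my assumption, so check the file name on the dataset page. load_vaccinations is a hypothetical helper name. To keep the sketch runnable without the download, it reads a tiny in-memory CSV with made-up values. Only the column names follow the dataset.

```python
import io

import pandas as pd


def load_vaccinations(source="country_vaccinations.csv"):
    """Hypothetical helper: read the vaccination CSV into a DataFrame.

    The default file name is an assumption about the Kaggle download;
    `source` can be any path or file-like object that pd.read_csv accepts.
    """
    return pd.read_csv(source)


# tiny in-memory stand-in so the sketch runs without the real file
# (made-up values; only the column names follow the dataset)
sample_csv = io.StringIO(
    "country,iso_code,date,total_vaccinations\n"
    "Afghanistan,AFG,2021-02-22,0\n"
    "Afghanistan,AFG,2021-02-23,\n"
)
data = load_vaccinations(sample_csv)
print(data)
```

With the real file in your working directory, `data = load_vaccinations()` should be all you need. The empty field in the second row is read as NaN, which is exactly the kind of missing value the first trick counts.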
1st Trick: How to find the percentage of missing data in each column?
# code
data.isnull().sum() * 100 / len(data)

# output
country 0.000000
iso_code 0.000000
date 0.000000
total_vaccinations 37.453558
people_vaccinated 44.355530
people_fully_vaccinated 62.360674
daily_vaccinations_raw 47.342098
daily_vaccinations 2.572163
total_vaccinations_per_hundred 37.453558
people_vaccinated_per_hundred 44.355530
people_fully_vaccinated_per_hundred 62.360674
daily_vaccinations_per_million 2.572163
vaccines 0.000000
source_name 0.000000
source_website 0.000000
dtype: float64
You can see that 37% of the total_vaccinations values are missing, while daily_vaccinations_per_million is missing only about 2.6% of its values.
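A slightly shorter way to get the same percentages is to take the mean of the boolean frame returned by isnull(). The mean of True/False values is the share of True values. Here is a minimal sketch on a made-up four-row DataFrame that also sorts the result so the worst columns come first:

```python
import numpy as np
import pandas as pd

# toy DataFrame with made-up values; only the column names follow the dataset
data = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "total_vaccinations": [100.0, np.nan, np.nan, 400.0],
    "daily_vaccinations": [10.0, 20.0, np.nan, 40.0],
})

# isnull() gives a boolean frame; the mean of booleans is the fraction of True,
# so this matches data.isnull().sum() * 100 / len(data)
missing_pct = data.isnull().mean() * 100

# sort so the columns with the most missing data come first
missing_pct = missing_pct.sort_values(ascending=False)
print(missing_pct)
```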
2nd Trick: How to find the correlations among different variables?
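# use this code for the correlations between numeric columns
# numeric_only=True skips text columns such as country (pandas 2.0+ raises an error otherwise)
data.corr(numeric_only=True)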
You can also visualize the correlation matrix with a seaborn heatmap:
import seaborn as sns
# cmap (colour palette) and annot (write each value in its cell) are optional
sns.heatmap(data.corr(numeric_only=True), cmap='summer', annot=True)
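To see what data.corr() actually returns, here is a minimal sketch on a made-up DataFrame with one text column and two numeric columns. The result is a square, symmetric matrix with 1.0 on the diagonal. The text column is left out because numeric_only=True is set.

```python
import pandas as pd

# toy DataFrame with made-up values: one text column plus two numeric columns
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [100, 200, 300, 400],
    "people_vaccinated": [80, 150, 260, 310],
})

# numeric_only=True skips the text column; without it, pandas 2.0+ raises
# an error because it cannot correlate strings
corr = data.corr(numeric_only=True)
print(corr)
```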
3rd Trick: Visualize the correlations of a single column (total_vaccinations) with the other columns
This is handy for seeing how strongly the target variable correlates with each independent variable.
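# code
# drop total_vaccinations itself, since every column correlates perfectly (1.0) with itself
data.corr(numeric_only=True)['total_vaccinations'].drop('total_vaccinations').plot(kind='bar')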
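The bar chart is easier to read when the bars are ordered. Here is a minimal sketch on a made-up DataFrame. It selects the target column by name, drops the target, and sorts the correlations from most negative to most positive. Plotting is left as a comment so the snippet runs without a display.

```python
import pandas as pd

# toy DataFrame with made-up values
data = pd.DataFrame({
    "total_vaccinations": [100, 200, 300, 400, 500],
    "people_vaccinated": [90, 170, 280, 350, 480],
    "daily_vaccinations": [50, 40, 30, 20, 10],
})

target = "total_vaccinations"

# correlation of every numeric column with the target, minus the target itself,
# sorted so the bars read from most negative to most positive
target_corr = (
    data.corr(numeric_only=True)[target]
    .drop(target)
    .sort_values()
)
print(target_corr)

# target_corr.plot(kind="bar") draws the sorted bar chart
```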
4th Trick: Select columns of a particular data type
# here I have selected data type as object
data.select_dtypes('object').columns

# output
Index(['country', 'iso_code', 'date', 'vaccines', 'source_name',
       'source_website'],
      dtype='object')
Here we can see that the date column is stored as object (string) data. To extract more information from the dates, such as the month or the day of the week, we need to convert the column to datetime. Let’s do that using pandas:
data['date'] = pd.to_datetime(data['date'])
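Once the column is datetime, the .dt accessor exposes its parts. Here is a minimal sketch on a made-up three-row DataFrame. The new column names year, month and weekday are my own choices for illustration.

```python
import pandas as pd

# toy DataFrame with made-up values; dates arrive as strings (object dtype)
data = pd.DataFrame({
    "country": ["A", "A", "A"],
    "date": ["2021-01-30", "2021-01-31", "2021-02-01"],
})

data["date"] = pd.to_datetime(data["date"])

# the .dt accessor exposes the parts of each datetime value
data["year"] = data["date"].dt.year
data["month"] = data["date"].dt.month
data["weekday"] = data["date"].dt.day_name()
print(data)
```

With columns like these you can, for example, group vaccinations by month or by weekday.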
5th Trick: Replace some data with a particular value
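# replace every cell whose value is exactly 'AFG' (the iso_code for Afghanistan) with 'Afghan'
data.replace('AFG', 'Afghan')  # returns a new DataFrame; data itself is unchanged
# to make the change permanent, pass inplace=True (or assign it back: data = data.replace('AFG', 'Afghan'))
data.replace('AFG', 'Afghan', inplace=True)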
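Calling replace() on the whole DataFrame swaps every matching cell in every column, so it can be safer to target a single column. Here is a minimal sketch on a made-up two-row frame. It shows that replace() returns a new DataFrame, and that a nested dict of the form {column: {old: new}} restricts the change to iso_code.

```python
import pandas as pd

# toy DataFrame with made-up values
data = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "iso_code": ["AFG", "ALB"],
})

# replace() returns a new DataFrame and leaves `data` untouched
replaced = data.replace("AFG", "Afghan")
print(data["iso_code"].tolist())      # ['AFG', 'ALB']
print(replaced["iso_code"].tolist())  # ['Afghan', 'ALB']

# a nested dict limits the replacement to one column: {column: {old: new}}
column_only = data.replace({"iso_code": {"AFG": "Afghan"}})
print(column_only)
```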
6th Trick: What are *args and **kwargs in Python functions?
We often see these in the signatures of Python functions and methods. *args collects any number of positional arguments into a tuple, and **kwargs collects any number of keyword arguments into a dictionary. The examples below make this clearer.
# I have declared a function that takes *args and prints half (50%) of their sum
def output(*args):
    print(sum(args) * .5)

output(1, 2, 3, 4)
# output
5.0

# Now let's print *args on its own
def output(*args):
    print(args)

output(1, 2, 3, 4)
# output
(1, 2, 3, 4)

# args is a tuple, and you can pass in as many inputs as you want
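*args can also follow ordinary parameters, and the * operator works at the call site too, where it unpacks a list into separate arguments. Here is a minimal sketch. describe is a hypothetical function made up for this example.

```python
def describe(label, *args):
    """Hypothetical helper: a normal parameter followed by *args."""
    # label takes the first positional argument; *args gathers the rest into a tuple
    return f"{label}: {len(args)} values, sum {sum(args)}"


print(describe("doses", 1, 2, 3, 4))  # doses: 4 values, sum 10

# * at the call site unpacks a list into separate positional arguments
numbers = [5, 6, 7]
print(describe("more doses", *numbers))  # more doses: 3 values, sum 18
```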
Let’s talk about **kwargs
# define a function
def output_kwargs(**kwargs):
    print(kwargs)

# when you pass keyword inputs, kwargs looks like this
output_kwargs(a=1, b=2, c=3)
# output
{'a': 1, 'b': 2, 'c': 3}
# kwargs is a Python dictionary, so you can do anything with it that you can do with a dictionary
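Because kwargs is an ordinary dictionary, you can loop over it. The ** operator also works in the other direction, unpacking a dict into keyword arguments, which is handy for reusing option sets such as the heatmap's cmap and annot. Here is a minimal sketch. show_settings is a hypothetical function made up for this example.

```python
def show_settings(**kwargs):
    """Hypothetical helper: loop over keyword arguments like any dict."""
    lines = []
    for key, value in kwargs.items():
        lines.append(f"{key} = {value}")
    return lines


print(show_settings(cmap="summer", annot=True))  # ['cmap = summer', 'annot = True']

# ** at the call site unpacks a dictionary into keyword arguments
options = {"cmap": "summer", "annot": True}
print(show_settings(**options))  # ['cmap = summer', 'annot = True']
```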
I hope you found these tricks and methods useful. I will catch up with you again soon.