
## Python Pandas Working With CSV’s Hands-on Solution

### What is Python Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name “Pandas” refers to both “Panel Data” and “Python Data Analysis”. The library was created by Wes McKinney in 2008.

## Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

## What Can Pandas Do?

Pandas gives you answers about the data, such as:

- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?
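
As a minimal sketch, the questions above map onto a few Series methods. The column names `height_cm` and `weight_kg` and the values are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
})

# Pearson correlation between two columns.
corr = df["height_cm"].corr(df["weight_kg"])

# Average, maximum, and minimum of a single column.
mean_height = df["height_cm"].mean()
max_height = df["height_cm"].max()
min_height = df["height_cm"].min()

print(corr, mean_height, max_height, min_height)
```

Because the made-up columns are perfectly linear in each other, the correlation here comes out at (approximately) 1; real data will rarely be that tidy.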

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called *cleaning* the data.
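
A small sketch of what cleaning can look like, assuming a hypothetical `score` column in which a missing value and a negative value both count as wrong. The data and the "negative means invalid" rule are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing score and one impossible (negative) score.
raw = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [90.0, np.nan, 75.0, -1.0],
})

# Drop rows where the score is missing (NaN).
no_missing = raw.dropna(subset=["score"])

# Drop rows with values we consider wrong (assumed rule: scores must be >= 0).
cleaned = no_missing[no_missing["score"] >= 0]

print(cleaned)
```

`dropna` returns a new data frame and leaves `raw` unchanged, so the original data stays available for comparison.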

## Where is the Pandas Codebase?

The source code for Pandas is located in this GitHub repository: https://github.com/pandas-dev/pandas

**Fresco Play Hands-on Python Pandas Working With CSV’s**

**Task 1**

• Create a series named heights_A with values 176.2, 158.4, 167.6, 156.2, and 161.4. These values represent heights of 5 students of class A.

• Label each student as s1, s2, s3, s4, and s5.

• Create another series named weights_A with values 85.1, 90.2, 76.8, 80.4, and 78.9. These values represent weights of 5 students of class A.

• Label each student as s1, s2, s3, s4, and s5.

• Create a dataframe named df_A, which contains the height and weight of five students namely s1, s2, s3, s4 and s5.

• Label the columns as Student_height and Student_weight, respectively.

• Write the contents of df_A to a CSV file named classA.csv.

Note: Use the to_csv method associated with a data frame.

• Verify that the file classA.csv exists in the present directory using the command ls -l.

• You can also view the contents of the file using the command cat classA.csv
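
If a shell is not at hand, the same check can be sketched in Python. This is an illustrative alternative rather than part of the exercise: it writes to a temporary directory (an assumption made here to avoid leaving files behind) and uses a shortened, two-student version of the data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened, illustrative version of the class A data.
df = pd.DataFrame({"Student_height": [176.2, 158.4]}, index=["s1", "s2"])

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "classA.csv"
    df.to_csv(csv_path)

    # Rough equivalent of `ls -l`: does the file exist?
    exists = csv_path.exists()

    # Rough equivalent of `cat classA.csv`: read the raw text.
    contents = csv_path.read_text()

print(exists)
print(contents)
```

The first line of the file starts with a comma because the index column is written without a header name.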

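    # Write your code here
    import numpy as np
    import pandas as pd

    # Class A: heights and weights of five students, labeled s1 to s5
    heights_A = pd.Series([176.2, 158.4, 167.6, 156.2, 161.4], index=['s1', 's2', 's3', 's4', 's5'])
    weights_A = pd.Series([85.1, 90.2, 76.8, 80.4, 78.9], index=['s1', 's2', 's3', 's4', 's5'])
    df_A = pd.DataFrame({'Student_height': heights_A, 'Student_weight': weights_A})

    # Write df_A to classA.csv; the index labels become the first column
    df_A.to_csv("classA.csv")

    # Read it back: by default the saved index returns as an ordinary column
    df_A2 = pd.read_csv("classA.csv")
    print(df_A2)

    # index_col=0 restores the first column as the index
    df_A3 = pd.read_csv("classA.csv", index_col=0)
    print(df_A3)

    # Class B: random heights and weights, re-seeding before each draw
    np.random.seed(100)
    heights_B = pd.Series(np.random.normal(loc=170.0, scale=25.0, size=5))
    np.random.seed(100)
    weights_B = pd.Series(np.random.normal(loc=75.0, scale=12.0, size=5))

    # Write df_B to classB.csv without the index
    df_B = pd.DataFrame({'Student_height': heights_B, 'Student_weight': weights_B})
    df_B.to_csv("classB.csv", index=False)

    df_B2 = pd.read_csv("classB.csv")
    print(df_B2)

    # header=None treats the header line as a data row
    df_B3 = pd.read_csv("classB.csv", header=None)
    print(df_B3)

    # skiprows=2 additionally skips the header line and the first data row
    df_B4 = pd.read_csv("classB.csv", header=None, skiprows=2)
    print(df_B4)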
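
The solution calls `read_csv` with `index_col`, `header`, and `skiprows`. The sketch below shows the effect of each on a tiny in-memory CSV, so it runs without the files from the exercise. The two text snippets are invented for illustration, and `io.StringIO` is used only to stand in for a file.

```python
import io

import pandas as pd

# Illustrative CSV text with a header line and two data rows.
csv_text = "Student_height,Student_weight\n170.5,75.2\n165.0,68.4\n"

# Default: the first line is used as the column names.
default = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is read as data; columns are numbered 0, 1.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# skiprows=2: the header line and the first data row are skipped.
skipped = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=2)

print(default.shape)    # (2, 2)
print(no_header.shape)  # (3, 2)
print(skipped.shape)    # (1, 2)

# Illustrative CSV text written with an index, as to_csv does by default.
indexed_text = ",Student_height\ns1,176.2\ns2,158.4\n"

# Without index_col the saved index becomes a column named "Unnamed: 0".
plain = pd.read_csv(io.StringIO(indexed_text))

# index_col=0 turns the first column back into the index.
indexed = pd.read_csv(io.StringIO(indexed_text), index_col=0)

print(list(plain.columns))
print(list(indexed.index))
```

Writing with `index=False`, as the solution does for classB.csv, avoids the extra unnamed column when the index carries no meaning.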

Run Test: Will be successful