茄子的个人空间

Learning Pandas

2021/08/06

Video url: https://www.youtube.com/watch?v=vmEHCJofslg

Pandas document: https://pandas.pydata.org/pandas-docs/stable/reference/index.html#api

Practice Pandas: https://stratascratch.com/?via=keith

1. Loading data into Pandas

1.1 Loading data from a CSV file

import pandas as pd

# Read the data from a CSV file
df = pd.read_csv('pokemon_data.csv')
print(df.head(3))

1.2 Loading data from a tab-separated text file

# Read the data from a text file whose columns are separated by tabs
df_txt = pd.read_csv('pokemon_data.txt', delimiter='\t')
print(df_txt.head(3))
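
To try both loading calls without the Pokemon files on disk, here is a minimal sketch that parses the same table from in-memory strings. The two sample rows and the three-column layout are made up for illustration and are not the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for pokemon_data.csv / pokemon_data.txt
csv_text = "Name,Type 1,HP\nBulbasaur,Grass,45\nCharmander,Fire,39\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv accepts any file-like object, so StringIO works in place of a path
df_csv = pd.read_csv(io.StringIO(csv_text))
df_tsv = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")

# Both calls should produce the same table
print(df_csv.equals(df_tsv))
```

The only difference between the two calls is the delimiter argument; the default separator is a comma.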

2. Read Data in Pandas

2.1 Read the headers

# Print the column names
print(df.columns)

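2.2 Read each column

# Select one column by key and print its first 5 values
print(df['Name'][0:5])
# Attribute access also works when the column name is a valid identifier
print(df.Name[0:5])
# Select several columns by passing a list of names
print(df[['Name', 'Type 1', 'HP']][0:5])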

2.3 Read each row

# Print the first 4 rows
print(df.head(4))

# Print a single row by its integer position
print(df.iloc[0])

# Iterate over the rows one at a time
for index, row in df.iterrows():
    print(index, row['Name'])

2.4 Read a specific location (R, C)

# Row at position 2, column at position 1 (both zero-based)
print(df.iloc[2, 1])
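
As a small sketch of what the position-based lookup means, the example below contrasts iloc (integer positions) with loc (index and column labels). The frame and its index labels 10, 20, 30 are hypothetical and chosen so that positions and labels differ.

```python
import pandas as pd

# Hypothetical three-row frame with a non-default index
df_demo = pd.DataFrame(
    {"Name": ["Bulbasaur", "Ivysaur", "Venusaur"], "HP": [45, 60, 80]},
    index=[10, 20, 30],
)

by_position = df_demo.iloc[2, 1]    # third row, second column
by_label = df_demo.loc[30, "HP"]    # row labelled 30, column "HP"
print(by_position, by_label)
```

Both lookups reach the same cell here; they diverge as soon as the index is not the default 0..n-1 range, for example after filtering.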

3. Sorting/Describing Data

3.1 View summary statistics

# Count, mean, std, min, quartiles and max for each numeric column
print(df.describe())

3.2 Sorting

# Sort by Name ascending, then by HP descending within equal names
sort_res = df.sort_values(['Name', 'HP'], ascending=[True, False])
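
To see how the per-column ascending list behaves, here is a minimal sketch on a made-up frame where two rows share a name, so the second sort key decides their order.

```python
import pandas as pd

# Hypothetical data: two rows tie on "Name"
df_demo = pd.DataFrame({"Name": ["Abra", "Abra", "Zubat"], "HP": [25, 30, 40]})

# Name ascending; within equal names, HP descending
sorted_demo = df_demo.sort_values(["Name", "HP"], ascending=[True, False])
print(sorted_demo)
```

sort_values returns a new frame and leaves df_demo untouched, which is why the result is assigned to a new variable.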

4. Making changes to the data

4.1 Create a new column

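# Option 1: add the columns explicitly
df['Total'] = df['HP'] + df['Attack']
# Option 2: sum the columns at positions 4 to 9 across each row (axis=1);
# the end of the slice (10) is excluded
df['Total'] = df.iloc[:, 4:10].sum(axis=1)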
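
Position-based slices break silently if the column order changes, so one possible alternative is to slice by column label with loc. This is a sketch on a hypothetical one-row frame with only three stat columns; note that a label slice includes its end column.

```python
import pandas as pd

# Hypothetical frame with a reduced set of stat columns
df_demo = pd.DataFrame({
    "Name": ["Bulbasaur"],
    "HP": [45],
    "Attack": [49],
    "Defense": [49],
    "Legendary": [False],
})

# Label slices are inclusive on both ends: HP, Attack and Defense are summed
df_demo["Total"] = df_demo.loc[:, "HP":"Defense"].sum(axis=1)
print(df_demo)
```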

4.2 Delete one column

df = df.drop(columns=['Total'])

5. Save data to file

# index=False leaves out the index column; sep='\t' writes tab-separated values
df.to_csv('new_data.csv', index=False, sep='\t')
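
A quick way to check the save options is a round trip: write the frame, read it back with the same separator, and compare. The sketch below uses an in-memory buffer instead of a file, and the two-row frame is made up.

```python
import io

import pandas as pd

# Hypothetical data to save
df_demo = pd.DataFrame({"Name": ["Bulbasaur", "Charmander"], "HP": [45, 39]})

# Write tab-separated text without the index column
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False, sep="\t")

# Read it back using the same separator
buffer.seek(0)
restored = pd.read_csv(buffer, sep="\t")
print(restored.equals(df_demo))
```

If the separator is left out when reading, each line is parsed as a single column, so the reader has to know that the file is tab-separated.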

6. Filtering Data

import re

# Keep only the rows that match both conditions; the filtered rows keep their original index labels
new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]

# Reset the index of the filtered rows
new_df = new_df.reset_index(drop=True)

# Keep the rows whose "Name" column does NOT contain "Mega" (~ negates the mask)
new_df = df.loc[~df['Name'].str.contains('Mega')]

# Keep the rows whose "Type 1" column contains "fire" or "grass", ignoring case
df.loc[df['Type 1'].str.contains('fire|grass', flags=re.I, regex=True)]

# Use a regular expression to keep the rows whose "Name" starts with "pi", ignoring case
df.loc[df['Name'].str.contains('^pi[a-z]*', flags=re.I, regex=True)]
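
The string filters above can be tried on a small made-up frame. This is a sketch under that assumption; it also shows case=False, which is an alternative to passing flags=re.I.

```python
import re

import pandas as pd

# Hypothetical sample rows
df_demo = pd.DataFrame({
    "Name": ["Pikachu", "Pidgey", "Charizard", "Mega Charizard X"],
    "Type 1": ["Electric", "Normal", "Fire", "Fire"],
})

# ~ inverts the boolean mask, so rows containing "Mega" are dropped
no_mega = df_demo.loc[~df_demo["Name"].str.contains("Mega")]

# Case-insensitive regex: names that start with "pi"
starts_with_pi = df_demo.loc[
    df_demo["Name"].str.contains("^pi[a-z]*", flags=re.I, regex=True)
]

# case=False is another way to ignore case
fire_any_case = df_demo.loc[df_demo["Type 1"].str.contains("fire", case=False)]

print(no_mega, starts_with_pi, fire_any_case, sep="\n\n")
```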

7. Conditional changes

# In the rows where "Type 1" equals 'Fire', change "Type 1" to 'Flamer'
df.loc[df['Type 1'] == 'Fire', 'Type 1'] = 'Flamer'
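
The same loc pattern can also set a different column from the one used in the condition. The sketch below uses a made-up frame, and the "HP above 70 means Legendary" rule is invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample rows
df_demo = pd.DataFrame({
    "Name": ["Charmander", "Bulbasaur", "Charizard"],
    "Type 1": ["Fire", "Grass", "Fire"],
    "HP": [39, 45, 78],
    "Legendary": [False, False, False],
})

# Condition and target are the same column
df_demo.loc[df_demo["Type 1"] == "Fire", "Type 1"] = "Flamer"

# Condition on one column, assignment to another (invented rule)
df_demo.loc[df_demo["HP"] > 70, "Legendary"] = True

print(df_demo)
```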

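8. Aggregate Statistics (Groupby)

# Mean of every numeric column per "Type 1", sorted by mean HP, highest first.
# numeric_only=True skips text columns such as "Name", which pandas 2.0+ would otherwise reject.
res = df.groupby(['Type 1']).mean(numeric_only=True).sort_values('HP', ascending=False)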
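
Here is a minimal sketch of the grouping step on a made-up four-row frame, with a per-group row count added as a second aggregate.

```python
import pandas as pd

# Hypothetical sample rows
df_demo = pd.DataFrame({
    "Name": ["Bulbasaur", "Oddish", "Charmander", "Vulpix"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "HP": [45, 45, 39, 38],
})

# Mean of the numeric columns per type, highest mean HP first
mean_hp = (
    df_demo.groupby("Type 1")
    .mean(numeric_only=True)
    .sort_values("HP", ascending=False)
)

# Number of rows per type, counted on a single column
counts = df_demo.groupby("Type 1")["Name"].count()

print(mean_hp)
print(counts)
```

Selecting one column before count() keeps the result to a single Series instead of repeating the same count for every column.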

9. Working with large amounts of data

# Read the data in chunks of 100 rows at a time
for chunk in pd.read_csv('pokemon_data.csv', chunksize=100):
    print(chunk)
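
Printing each chunk is only a starting point. One common pattern, sketched here under the assumption that a per-type row count is wanted, is to aggregate every chunk and then combine the partial results, so the full file never has to sit in memory. The ten-row input is generated in memory and is hypothetical.

```python
import io

import pandas as pd

# Hypothetical 10-row CSV: even rows are Fire, odd rows are Grass
rows = [f"{'Grass' if i % 2 else 'Fire'},{i}" for i in range(10)]
csv_text = "Type 1,HP\n" + "\n".join(rows)

# Aggregate each chunk separately (4 rows per chunk here)
partial_counts = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    partial_counts.append(chunk.groupby("Type 1")["HP"].count())

# Combine the per-chunk counts by summing entries with the same type
total_counts = pd.concat(partial_counts).groupby(level=0).sum()
print(total_counts)
```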