import pandas as pd
print(pd.__version__)
Output:
0.24.1
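Common data structures:
- 1-D: Series
- 2-D: DataFrame
- 3-D: Panel …
- 4-D: Panel4D …
- N-D: PanelND …
Note: Panel was deprecated in pandas 0.20 and removed in 0.25, and Panel4D and PanelND were deprecated even earlier (0.19) and have also been removed. In current pandas only Series and DataFrame remain; higher-dimensional data is usually stored in a DataFrame with a MultiIndex.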
array = ["粉条", "粉丝", "粉带"]
# No index specified, so the default index starts at 0
s1 = pd.Series(data=array)
print(s1)
# Custom index labels passed explicitly
ss1 = pd.Series(data=array, index=['A', 'B', 'C'])
print(ss1)
Output:
0    粉条
1    粉丝
2    粉带
dtype: object
A    粉条
B    粉丝
C    粉带
dtype: object
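Once a Series has custom labels, elements can be reached either by label or by position. A minimal sketch, assuming pandas 1.x or later (the names by_label and by_position are illustrative only):

```python
import pandas as pd

array = ["粉条", "粉丝", "粉带"]
ss1 = pd.Series(data=array, index=['A', 'B', 'C'])

# Label-based access uses the custom index
by_label = ss1.loc['B']
# Position-based access ignores the labels and counts from 0
by_position = ss1.iloc[1]

print(by_label, by_position)  # both refer to the second element
print(ss1.index.tolist())
```

Using .loc and .iloc explicitly avoids any ambiguity about whether a key is meant as a label or a position.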
# Create a random ndarray
n = np.random.randn(5)
s2 = pd.Series(data=n)
print(s2)
# Change the element dtype (np.int was removed in NumPy 1.24; use np.int64 or int)
ss2 = s2.astype(np.int64)
print(ss2)
Output:
0   -1.649755
1    0.607479
2    0.943136
3   -1.794060
4    1.569035
dtype: float64
0   -1
1    0
2    0
3   -1
4    1
dtype: int64
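The output shows that astype(np.int64) truncates toward zero (-1.649755 becomes -1, not -2). A small sketch, assuming the same five values are hard-coded instead of drawn from np.random.randn so the result is reproducible, compares truncation with flooring and rounding:

```python
import numpy as np
import pandas as pd

# Fixed values instead of np.random.randn so the result is reproducible
s2 = pd.Series([-1.649755, 0.607479, 0.943136, -1.794060, 1.569035])

truncated = s2.astype(np.int64)          # drops the fractional part (toward zero)
floored = np.floor(s2).astype(np.int64)  # rounds toward negative infinity
rounded = s2.round().astype(np.int64)    # rounds to the nearest integer

print(truncated.tolist())  # [-1, 0, 0, -1, 1]
print(floored.tolist())    # [-2, 0, 0, -2, 1]
print(rounded.tolist())    # [-2, 1, 1, -2, 2]
```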
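import string
# Build a dict {'a': 0, 'b': 1, ..., 'j': 9}; avoid the name dict, which shadows the built-in type
d = {string.ascii_lowercase[i]: i for i in range(10)}
# The dict keys become the index, the values become the data
s3 = pd.Series(d)
print(s3)
Output:
a    0
b    1
c    2
d    3
e    4
f    5
g    6
h    7
i    8
j    9
dtype: int64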
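When a dict is combined with an explicit index, pandas picks the matching keys in the order given and fills labels that are not in the dict with NaN. A minimal sketch with a shortened dict (the labels 'c', 'a', 'z' are arbitrary examples):

```python
import string
import pandas as pd

d = {string.ascii_lowercase[i]: i for i in range(3)}  # {'a': 0, 'b': 1, 'c': 2}

# index selects and orders keys from the dict; keys missing from the dict become NaN
s = pd.Series(d, index=['c', 'a', 'z'])
print(s)
```

Because of the NaN for 'z', the dtype changes from int64 to float64.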
Shared setup for the following examples:
import pandas as pd
import numpy as np
import string
array = ["粉条", "粉丝", "粉带"]
s1 = pd.Series(data=array)
print(s1)
Output:
0    粉条
1    粉丝
2    粉带
dtype: object
print(s1.index)  # Output: RangeIndex(start=0, stop=3, step=1)
# Replace the default index with custom labels
s1.index = ['A', 'B', 'C']
print(s1)
Output:
A    粉条
B    粉丝
C    粉带
dtype: object
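Assigning to .index replaces every label at once, so the new list must have exactly as many labels as the Series has values. A minimal sketch showing the ValueError on a length mismatch, and Series.rename with a mapping as a way to relabel only selected entries (the variable names are illustrative):

```python
import pandas as pd

s1 = pd.Series(["粉条", "粉丝", "粉带"])

try:
    s1.index = ['A', 'B']  # only 2 labels for 3 values
    error = None
except ValueError as exc:
    error = exc  # pandas reports a length mismatch

print(type(error).__name__)

# rename with a dict maps old labels to new ones and returns a new Series
renamed = s1.rename({0: 'A', 1: 'B', 2: 'C'})
print(renamed.index.tolist())
```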
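s1.index = ['A', 'B', 'C']
array = ["粉条", "粉丝", "粉带"]
# No index specified, so the default index 0, 1, 2 is used
s2 = pd.Series(data=array)
# Series.append was removed in pandas 2.0; pd.concat is the replacement
s3 = pd.concat([s1, s2])
print(s3)
Output:
A    粉条
B    粉丝
C    粉带
0    粉条
1    粉丝
2    粉带
dtype: object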
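pd.concat keeps both original indexes by default. A minimal sketch of two of its real options: ignore_index=True renumbers the result from 0, and verify_integrity=True raises ValueError if the combined index would contain duplicate labels:

```python
import pandas as pd

array = ["粉条", "粉丝", "粉带"]
s1 = pd.Series(array, index=['A', 'B', 'C'])
s2 = pd.Series(array)

combined = pd.concat([s1, s2])                       # keeps both indexes
renumbered = pd.concat([s1, s2], ignore_index=True)  # new 0..n-1 index

try:
    pd.concat([s2, s2], verify_integrity=True)       # labels 0, 1, 2 appear twice
    dup_error = None
except ValueError as exc:
    dup_error = exc

print(combined.index.tolist())
print(renumbered.index.tolist())
```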
s3 = s3.drop('C')  # Remove the value whose index label is 'C'
print(s3)
Output:
A    粉条
B    粉丝
0    粉条
1    粉丝
2    粉带
dtype: object
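drop returns a new Series and leaves the original untouched. A minimal sketch, rebuilding the concatenated Series from above, that drops several labels at once and uses errors='ignore' to skip labels that do not exist (by default a missing label raises KeyError):

```python
import pandas as pd

s3 = pd.Series(["粉条", "粉丝", "粉带", "粉条", "粉丝", "粉带"],
               index=['A', 'B', 'C', 0, 1, 2])

dropped = s3.drop(['C', 0])           # several labels at once
safe = s3.drop('Z', errors='ignore')  # missing label: silently skipped

try:
    s3.drop('Z')                      # default errors='raise'
    missing_error = None
except KeyError as exc:
    missing_error = exc

print(dropped.index.tolist())
```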
print(s3['B'])  # 粉丝
s3['B'] = np.nan  # Replace the value at label 'B' with a missing value (NaN)
print(s3)
Output:
A     粉条
B    NaN
0     粉条
1     粉丝
2     粉带
dtype: object
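After a value has been set to NaN, pandas offers isna, dropna and fillna to detect, remove or replace it. A minimal sketch reproducing the Series above (the placeholder string "unknown" is an arbitrary choice):

```python
import numpy as np
import pandas as pd

s3 = pd.Series(["粉条", "粉丝", "粉条", "粉丝", "粉带"], index=['A', 'B', 0, 1, 2])
s3['B'] = np.nan

missing = s3.isna()            # boolean Series, True where the value is missing
cleaned = s3.dropna()          # new Series without the missing entries
filled = s3.fillna("unknown")  # new Series with NaN replaced

print(missing.sum())  # number of missing values
```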
print(s3.iloc[:2])    # first two elements
print(s3.iloc[::-1])  # reversed order
print(s3.iloc[-2:])   # last two elements
Output:
A     粉条
B    NaN
dtype: object
-------------------------
2     粉带
1     粉丝
0     粉条
B    NaN
A     粉条
dtype: object
-------------------------
1    粉丝
2    粉带
dtype: object
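Slicing by position and slicing by label differ in one important detail: a positional slice (.iloc) excludes the stop position, while a label slice (.loc) includes the stop label. A minimal sketch on a small hypothetical Series with string labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

by_position = s.iloc[:2]   # positions 0 and 1; stop position excluded
by_label = s.loc['A':'C']  # labels 'A' through 'C'; stop label included
last_two = s.iloc[-2:]
reversed_s = s.iloc[::-1]

print(by_position.tolist(), by_label.tolist())
```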
First, create two Series objects:
import pandas as pd
import numpy as np
import string
s1 = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]))
s2 = pd.Series(np.arange(2, 8), index=list(string.ascii_lowercase[2:8]))
print(s1)
print(s2)
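Arithmetic between two Series is aligned on index labels. The result contains the union of both indexes, and any label that appears in only one of the Series gets NaN, which is also why the result dtype becomes float64:
print(s1 + s2)
print(s1.add(s2))  # same result as the operator form
Output (identical for both lines):
a    NaN
b    NaN
c    4.0
d    6.0
e    8.0
f    NaN
g    NaN
h    NaN
dtype: float64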
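The method forms (add, sub, mul, div) accept a fill_value argument that the operators lack: a label missing from one side is treated as that value instead of producing NaN. A minimal sketch using fill_value=0 with the same two Series:

```python
import string
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]))
s2 = pd.Series(np.arange(2, 8), index=list(string.ascii_lowercase[2:8]))

# Treat a label missing from one side as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
```

Labels present in both Series (c, d, e) are summed as before; a and b keep s1's values and f, g, h keep s2's values.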
print(s1 - s2)
print(s1.sub(s2))
Output:
a    NaN
b    NaN
c    0.0
d    0.0
e    0.0
f    NaN
g    NaN
h    NaN
dtype: float64
print(s1 * s2)
print(s1.mul(s2))
Output:
a     NaN
b     NaN
c     4.0
d     9.0
e    16.0
f     NaN
g     NaN
h     NaN
dtype: float64
print(s1 / s2)
print(s1.div(s2))
Output:
a    NaN
b    NaN
c    1.0
d    1.0
e    1.0
f    NaN
g    NaN
h    NaN
dtype: float64
print(s1.median())
Output:
2.0
print(s1.sum())
Output:
10
print(s1.max())
Output:
4
print(s1.min())
Output:
0
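The four statistics can also be computed in a single call with Series.agg, which takes a list of function names and returns a Series indexed by those names. A minimal sketch on the same s1:

```python
import string
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]))

summary = s1.agg(['median', 'sum', 'max', 'min'])
print(summary)
```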
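Series.where behaves quite differently from numpy.where. np.where(cond) returns the positions where cond is True, and np.where(cond, x, y) returns a new ndarray; Series.where(cond, other) instead keeps the values where cond is True, replaces the rest with other (NaN by default), and preserves the index.
import pandas as pd
import numpy as np
import string
s1 = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]))
print(s1)
Output:
a    0
b    1
c    2
d    3
e    4
dtype: int64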
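print(s1.where(s1 > 3))
Values greater than 3 are kept; all other values become NaN.
# Elements NOT greater than 3 (i.e. <= 3) are replaced with 10
print(s1.where(s1 > 3, 10))
# Elements greater than 3 are replaced with 10 (mask is the inverse of where)
print(s1.mask(s1 > 3, 10))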
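A side-by-side sketch of the difference described above, using the same s1 (variable names are illustrative). The key contrast is that NumPy returns positions or a bare ndarray, while pandas returns a Series that keeps the original labels:

```python
import string
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5), index=list(string.ascii_lowercase[:5]))

# NumPy, one argument: returns the positions where the condition holds
positions = np.where(s1 > 3)[0]

# NumPy, three arguments: element-wise choice, result has no index labels
chosen = np.where(s1 > 3, s1, 10)

# pandas: keep values where True, replace the rest; index is preserved
kept = s1.where(s1 > 3)          # a..d become NaN, e stays 4
replaced = s1.where(s1 > 3, 10)  # a..d become 10, e stays 4
masked = s1.mask(s1 > 3, 10)     # a..d unchanged, e becomes 10

print(positions, list(chosen))
print(replaced)
```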