Reading "Data Visualization with Python and JavaScript"

9.3 Indices and pandas Data Selection

# Column names
print(df.columns)
# Number of columns
print(len(df.columns))
Index(['born_in', 'category', 'country', 'date_of_birth', 'date_of_death',
       'gender', 'link', 'name', 'place_of_birth', 'place_of_death', 'text',
       'year'],
      dtype='object')
12
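# Specify the column to use as the DataFrame's index.
# set_index builds and returns a new DataFrame,
# so the original DataFrame is not modified.
# Here the result is assigned back to the same variable, so df now refers to the re-indexed frame.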
df = df.set_index('name')
df.head(2)
born_in category country date_of_birth date_of_death gender link place_of_birth place_of_death text year
name
César Milstein Physiology or Medicine Argentina 8 October 1927 24 March 2002 male http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein Bahía Blanca , Argentina Cambridge , England César Milstein , Physiology or Medicine, 1984 1984
Ivo Andric * Bosnia and Herzegovina Literature 9 October 1892 13 March 1975 male http://en.wikipedia.org/wiki/Ivo_Andric Dolac (village near Travnik), Austria-Hungary ... Belgrade, SR Serbia, SFR Yugoslavia (present-d... Ivo Andric *, born in then Austria–Hungary ,... 1961
# The column used as the index is displayed at the far left
#df = df.set_index('born_in')
#df.head(2)
df.reset_index(inplace=True)
df.head(2)
name born_in category country date_of_birth date_of_death gender link place_of_birth place_of_death text year
0 César Milstein Physiology or Medicine Argentina 8 October 1927 24 March 2002 male http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein Bahía Blanca , Argentina Cambridge , England César Milstein , Physiology or Medicine, 1984 1984
1 Ivo Andric * Bosnia and Herzegovina Literature 9 October 1892 13 March 1975 male http://en.wikipedia.org/wiki/Ivo_Andric Dolac (village near Travnik), Austria-Hungary ... Belgrade, SR Serbia, SFR Yugoslavia (present-d... Ivo Andric *, born in then Austria–Hungary ,... 1961
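The set_index / reset_index round trip above can be checked on a small frame. This is only a sketch: the three-row `toy` DataFrame and the variable names `indexed` and `restored` are made up for illustration and are not the book's dataset.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Nobel winners DataFrame.
toy = pd.DataFrame({
    "name": ["Marie Curie", "Albert Einstein", "Kofi Annan"],
    "category": ["Physics", "Physics", "Peace"],
    "year": [1903, 1921, 2001],
})

# set_index returns a new DataFrame; toy itself keeps its integer index.
indexed = toy.set_index("name")
print(list(toy.columns))      # 'name' is still an ordinary column here
print(list(indexed.columns))  # 'name' has moved into the index

# reset_index moves the index back into a column and restores 0..n-1 labels.
restored = indexed.reset_index()
print(restored.equals(toy))
```

Passing inplace=True, as in the cell above, modifies the frame directly instead of returning a new one; assigning the returned frame is an alternative that gives the same result.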
bi_col = df.born_in
bi_col
0                             
1       Bosnia and Herzegovina
2       Bosnia and Herzegovina
3                             
4                             
5                             
6                             
7                             
8                             
9                             
10                            
11                            
12                            
13                            
14                     Belarus
15                     Belarus
16                     Belarus
17                            
18                            
19                            
20                            
21                            
22                            
23                            
24                            
25                            
26                            
27              Czech Republic
28              Czech Republic
29              Czech Republic
                 ...          
1022                          
1023                   Austria
1024                   Austria
1025                          
1026                          
1027                   Austria
1028                          
1029                          
1030                   Austria
1031                   Austria
1032                          
1033                          
1034                   Austria
1035                 Australia
1036                          
1037                          
1038                          
1039                 Australia
1040                          
1041                 Australia
1042                          
1043                          
1044                          
1045                          
1046                 Australia
1047                          
1048                          
1049                          
1050                          
1051                          
Name: born_in, Length: 1052, dtype: object
type(bi_col)
pandas.core.series.Series
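As a side note, the same column can also be selected with brackets. A minimal sketch on a made-up two-row frame (the `toy` data and variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical two-row frame; an empty string stands in for a blank born_in.
toy = pd.DataFrame({
    "name": ["Ivo Andric", "César Milstein"],
    "born_in": ["Bosnia and Herzegovina", ""],
})

by_attr = toy.born_in      # attribute access, as in the cell above
by_key = toy["born_in"]    # bracket access; also works for names with spaces

print(type(by_key).__name__)
print(by_attr.equals(by_key))

# Passing a list of column names returns a DataFrame rather than a Series.
sub = toy[["born_in"]]
print(type(sub).__name__)
```

Bracket access is probably the safer habit, since attribute access fails for column names that are not valid Python identifiers or that clash with DataFrame methods.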
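# loc appears to be short for "location"
# loc selects rows by label and iloc selects rows by integer position; ix accepted either, but it was deprecated in pandas 0.20 and removed in pandas 1.0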
df.iloc[0]
name                                                César Milstein
born_in                                                           
category                                    Physiology or Medicine
country                                                  Argentina
date_of_birth                                       8 October 1927
date_of_death                                        24 March 2002
gender                                                        male
link              http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein
place_of_birth                           Bahía Blanca ,  Argentina
place_of_death                                 Cambridge , England
text                 César Milstein , Physiology or Medicine, 1984
year                                                          1984
Name: 0, dtype: object
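To see the difference between label-based and position-based selection side by side, here is a small sketch. The `toy` frame, indexed by name, is invented for illustration.

```python
import pandas as pd

# Hypothetical frame indexed by winner name.
toy = pd.DataFrame(
    {"category": ["Physics", "Peace", "Literature"], "year": [1921, 2001, 1961]},
    index=["Albert Einstein", "Kofi Annan", "Ivo Andric"],
)

first_by_position = toy.iloc[0]               # row at integer position 0
first_by_label = toy.loc["Albert Einstein"]   # row whose index label matches

# Both expressions pick out the same row here.
print(first_by_position.equals(first_by_label))

# Both accessors also take a column selector as a second argument.
print(toy.loc["Kofi Annan", "year"])   # label, label
print(toy.iloc[1, 1])                  # position, position
```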
# Two rows come back: Einstein's 1921 prize is recorded twice (once with country Switzerland, once with Germany), so the entry is duplicated
df.set_index('name', inplace=True)
df.loc['Albert Einstein']
born_in category country date_of_birth date_of_death gender link place_of_birth place_of_death text year
name
Albert Einstein Physics Switzerland 1879-03-14 1955-04-18 male http://en.wikipedia.org/wiki/Albert_Einstein Ulm , Baden-Württemberg , German Empire Princeton, New Jersey , U.S. Albert Einstein , born in Germany , Physics, ... 1921
Albert Einstein Physics Germany 1879-03-14 1955-04-18 male http://en.wikipedia.org/wiki/Albert_Einstein Ulm , Baden-Württemberg , German Empire Princeton, New Jersey , U.S. Albert Einstein , Physics, 1921 1921
df.reset_index(inplace=True)
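The Einstein lookup returns a two-row DataFrame rather than a single Series because the index label is not unique. A rough sketch of that behaviour on invented data (the `toy` frame and the names `dup` and `single` are assumptions):

```python
import pandas as pd

# Hypothetical frame with one duplicated name, mirroring the Einstein rows.
toy = pd.DataFrame({
    "name": ["Albert Einstein", "Albert Einstein", "Kofi Annan"],
    "country": ["Switzerland", "Germany", "Ghana"],
    "year": [1921, 1921, 2001],
}).set_index("name")

dup = toy.loc["Albert Einstein"]   # label appears twice: a DataFrame
single = toy.loc["Kofi Annan"]     # label appears once: a Series

print(type(dup).__name__, len(dup))
print(type(single).__name__)

# is_unique is a quick way to check whether an index has repeated labels.
print(toy.index.is_unique)
```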
9.3.1 Selecting Multiple Rows
mask = df.year > 2000
winners_since_2000 = df[mask]
winners_since_2000.count()
name              202
born_in           202
category          202
country           202
date_of_birth     201
date_of_death     201
gender            200
link              202
place_of_birth    201
place_of_death    201
text              202
year              202
dtype: int64
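The per-column totals differ because count() counts only non-null cells, while empty strings still count as values. A minimal sketch with made-up data (the `toy` frame is an assumption, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: None/NaN mark missing cells, "" is an empty but present value.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "born_in": ["", "", "Austria"],
    "gender": ["male", None, "female"],
    "date_of_death": [np.nan, np.nan, "1955-04-18"],
})

counts = toy.count()   # non-null cells per column
print(counts)
print(len(toy))        # number of rows, regardless of missing values
```

This would explain why born_in shows the full row count in the output above even though most of its cells look blank.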
winners_since_2000.head()
name born_in category country date_of_birth date_of_death gender link place_of_birth place_of_death text year
13 François Englert Physics Belgium 6 November 1932 male http://en.wikipedia.org/wiki/Fran%C3%A7ois_Eng... Etterbeek , Brussels , Belgium François Englert , Physics, 2013 2013
32 Christopher A. Pissarides Economics Cyprus 1948-02-20 male http://en.wikipedia.org/wiki/Christopher_A._Pi... Nicosia, Cyprus Christopher A. Pissarides , Economics, 2010 2010
66 Kofi Annan Peace Ghana 8 April 1938 male http://en.wikipedia.org/wiki/Kofi_Annan Kumasi , Ghana Kofi Annan , Peace, 2001 2001
87 Riccardo Giacconi * Italy Physics October 6, 1931 male http://en.wikipedia.org/wiki/Riccardo_Giacconi Genoa , Italy Riccardo Giacconi *, Physics, 2002 2002
88 Mario Capecchi * Italy Physiology or Medicine 6 October 1937 male http://en.wikipedia.org/wiki/Mario_Capecchi Verona , Italy Mario Capecchi *, Physiology or Medicine, 2007 2007
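The mask used above is just a boolean Series, and several conditions can be combined with & and |. A small sketch on invented rows (the `toy` frame and result names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical three-row stand-in for the winners DataFrame.
toy = pd.DataFrame({
    "name": ["Albert Einstein", "Kofi Annan", "François Englert"],
    "category": ["Physics", "Peace", "Physics"],
    "year": [1921, 2001, 2013],
})

# The comparison yields one True/False value per row.
mask = toy.year > 2000
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
since_2000 = toy[mask]

# Each condition needs its own parentheses when combined with & or |.
physics_since_2000 = toy[(toy.year > 2000) & (toy.category == "Physics")]
print(physics_since_2000.name.tolist())
```

Note that the selected rows keep their original index labels, which is why the head() output above starts at 13 rather than 0.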