「ビックデータ分析・活用のためのSQLレシピ」を読む

3-3-2 グループの中での順序を扱う

conn = psycopg2.connect("dbname=BigData host=localhost user=testuser")
df = pd.read_sql("SELECT * FROM popular_products", conn)
print(df)

f:id:bitop:20180714125641p:plain

df['row'] = df['score'].rank(
    ascending=False,
    method='first'
).astype(int)
df['rank'] = df['score'].rank(
   ascending=False,
   method='min'
).astype(int)
df['dense_rank'] = df['score'].rank(
   ascending=False,
   method='dense'
).astype(int)
    df['lag1'] = df['product_id'].shift(1)
    df['lag2'] = df['product_id'].shift(2)
    df['lead1'] = df['product_id'].shift(-1)
    df['lead2'] = df['product_id'].shift(-2)
print(df)

f:id:bitop:20180714125732p:plain

ORDER BY句と集約関数を組み合わせる

conn = psycopg2.connect("dbname=BigData host=localhost user=testuser")
df = pd.read_sql("SELECT * FROM popular_products", conn)
print(df)

f:id:bitop:20180714125846p:plain

df['row'] = df['score'].rank(
    ascending=False,
    method='first'
).astype(int)
df['cum_score'] = df['score'].cumsum()
df['loacl_avg'] = 
df['score'].rolling(window=3,center=True,min_periods=1).mean()
df['first_value'] = df['product_id'][df['score'].idxmax()]
df['last_value'] = df['product_id'][df['score'].idxmin()]
print(df)

f:id:bitop:20180714125959p:plain

PATITION BY とORDER BYを組み合わせる

いろいろ試してみたができなかった

3-3-3 縦持ちデータを横もちにする

import pandas as pd
import psycopg2
import numpy as np

conn = psycopg2.connect("dbname=BigData host=localhost user=testuser")
df = pd.read_sql("SELECT * FROM daily_kpi", conn)
print(df)

f:id:bitop:20180714130128p:plain

pd.pivot_table(df,index='dt',columns='indicator')

f:id:bitop:20180714130203p:plain

3-3-4 横もちを縦持ちにする

conn = psycopg2.connect("dbname=BigData host=localhost user=testuser")
df = pd.read_sql("SELECT * FROM quarterly_sales", conn)
print(df)

f:id:bitop:20180714130254p:plain

df.stack()

f:id:bitop:20180714130431p:plain