「ビックデータ分析・活用のためのSQLレシピ」を読む

4章 売上を把握するためのデータ抽出

4-4 時系列に沿ってデータを集約する

4-4-1 日別の売上を集計する

import pandas as pd
import psycopg2
import numpy as np

conn = psycopg2.connect("dbname=BigData host=localhost user=testuser")
df = pd.read_sql("SELECT * FROM purchase_log", conn)
print(df.head(3))

f:id:bitop:20180721075517p:plain

gb = df.groupby('dt')
df1 = pd.DataFrame({'purchase_count':gb['purchase_amount'].count(),
                'total_amount':gb['purchase_amount'].sum(),
                'avg_amount':gb['purchase_amount'].mean()})
print(df1[['purchase_count','total_amount','avg_amount']])

f:id:bitop:20180721075738p:plain

%matplotlib inline
df1['total_amount'].plot(kind='bar',color='r')
df1['avg_amount'].plot(kind='line')

f:id:bitop:20180721075824p:plain