PYTHON DATA VISUALIZATION – SEABORN #1

Differences between seaborn and pandas

The pandas plotting methods work most naturally with wide data, whereas seaborn expects tidy (long-format) data.
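To make the wide vs. tidy distinction concrete, here is a minimal sketch. The toy numbers and the names `wide`, `tidy`, and `wide_again` are made up for illustration and do not come from employee.csv.

```python
import pandas as pd

# Hypothetical toy data: mean salary per department, one column per gender (wide).
wide = pd.DataFrame(
    {"Female": [50000.0, 62000.0], "Male": [52000.0, 60000.0]},
    index=pd.Index(["Library", "Police"], name="DEPARTMENT"),
)

# Wide -> tidy: one row per (DEPARTMENT, GENDER) observation.
tidy = (
    wide
    .reset_index()
    .melt(id_vars="DEPARTMENT", var_name="GENDER", value_name="BASE_SALARY")
)

# Tidy -> wide again: one column per gender.
wide_again = tidy.pivot(index="DEPARTMENT", columns="GENDER", values="BASE_SALARY")

print(tidy)
print(wide_again)
```

The `wide` layout is the shape that `DataFrame.plot` draws directly (one bar group per row, one series per column), while the `tidy` layout is the shape that seaborn's `x`, `y`, and `hue` arguments expect.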

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
employee = pd.read_csv('data/employee.csv', parse_dates=['HIRE_DATE','JOB_DATE'])
employee
Out[2]:
      UNIQUE_ID  POSITION_TITLE               DEPARTMENT                     BASE_SALARY  RACE                       EMPLOYMENT_TYPE  GENDER  EMPLOYMENT_STATUS  HIRE_DATE   JOB_DATE
0     0          ASSISTANT DIRECTOR (EX LVL)  Municipal Courts Department    121862.0     Hispanic/Latino            Full Time        Female  Active             2006-06-12  2012-10-13
1     1          LIBRARY ASSISTANT            Library                        26125.0      Hispanic/Latino            Full Time        Female  Active             2000-07-19  2010-09-18
2     2          POLICE OFFICER               Houston Police Department-HPD  45279.0      White                      Full Time        Male    Active             2015-02-03  2015-02-03
3     3          ENGINEER/OPERATOR            Houston Fire Department (HFD)  63166.0      White                      Full Time        Male    Active             1982-02-08  1991-05-25
4     4          ELECTRICIAN                  General Services Department    56347.0      White                      Full Time        Male    Active             1989-06-19  1994-10-22
...
1995  1995       POLICE OFFICER               Houston Police Department-HPD  43443.0      White                      Full Time        Male    Active             2014-06-09  2015-06-09
1996  1996       COMMUNICATIONS CAPTAIN       Houston Fire Department (HFD)  66523.0      Black or African American  Full Time        Male    Active             2003-09-02  2013-10-06
1997  1997       POLICE OFFICER               Houston Police Department-HPD  43443.0      White                      Full Time        Male    Active             2014-10-13  2015-10-13
1998  1998       POLICE OFFICER               Houston Police Department-HPD  55461.0      Asian/Pacific Islander     Full Time        Male    Active             2009-01-20  2011-07-02
1999  1999       FIRE FIGHTER                 Houston Fire Department (HFD)  51194.0      Hispanic/Latino            Full Time        Male    Active             2009-01-12  2010-07-12

2000 rows × 10 columns

1. Plotting the number of employees in each department as a bar chart

1-1. Drawing it with seaborn countplot

In [4]:
fig, ax = plt.subplots(figsize=(8,6))
sns.countplot(y='DEPARTMENT', data=employee, ax=ax)
fig.savefig('c13-sns1.png', dpi=300, bbox_inches='tight')

1-2. Drawing it as a pandas bar plot

In [5]:
fig, ax = plt.subplots(figsize=(8,6))
(
    employee['DEPARTMENT'].value_counts()
    .plot.barh(ax=ax)
)
fig.savefig('c13-sns2.png', dpi=300, bbox_inches='tight')
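The two charts show the same counts, but the bars will usually come out in a different order. As far as I understand, seaborn keeps string categories in order of first appearance, while `value_counts()` sorts by descending count. A small sketch with a made-up `dept` Series (hypothetical, standing in for `employee['DEPARTMENT']`):

```python
import pandas as pd

# Hypothetical toy column standing in for employee['DEPARTMENT'].
dept = pd.Series(
    ["Library", "Police", "Police", "Fire", "Police", "Fire"],
    name="DEPARTMENT",
)

# Order of first appearance in the data.
appearance_order = list(dept.unique())

# Order used by value_counts(): largest count first.
counts = dept.value_counts()
count_order = list(counts.index)

print(appearance_order)
print(count_order)
```

Passing a list such as `count_order` to the `order` parameter of `sns.countplot` should line the seaborn bars up with the pandas counts. Keep in mind that `plot.barh` draws the first row at the bottom of the axis.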

2. Plotting mean salary by race

2-1. Drawing it with seaborn barplot

In [13]:
fig, ax = plt.subplots(figsize=(8,6))
sns.barplot(y='RACE', x='BASE_SALARY', data=employee, ax=ax)
fig.savefig('c13-sns3.png', dpi=300, bbox_inches='tight')

2-2. Drawing it as a pandas bar plot

In [15]:
fig, ax = plt.subplots(figsize=(8,6))
(
    employee
    .groupby('RACE', sort=False)
    ['BASE_SALARY'].mean()
    .plot.barh(rot=0, width=.8, ax=ax)
)
ax.set_xlabel('Mean Salary')
fig.savefig('c13-sns4.png', dpi=300, bbox_inches='tight')
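One visible difference: `sns.barplot` draws the mean together with an error bar (by default a bootstrapped 95% confidence interval), while the pandas version plots only the mean. A rough way to get something comparable in pandas is to compute a spread measure yourself. The sketch below uses the standard error of the mean, which is a different and simpler statistic than seaborn's bootstrap interval, on made-up toy data.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee.csv.
toy = pd.DataFrame({
    "RACE": ["White", "White", "White",
             "Hispanic/Latino", "Hispanic/Latino", "Hispanic/Latino"],
    "BASE_SALARY": [40000.0, 50000.0, 60000.0, 30000.0, 40000.0, 50000.0],
})

# Mean and standard error of the mean per group, keeping first-appearance order.
summary = (
    toy
    .groupby("RACE", sort=False)["BASE_SALARY"]
    .agg(["mean", "sem"])
)

print(summary)
```

The result could then be drawn with something like `summary['mean'].plot.barh(xerr=summary['sem'], ax=ax)`, assuming a matplotlib `ax` as in the cells above.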

3. Plotting mean salary by RACE and GENDER

3-1. Drawing it with seaborn barplot

In [16]:
fig, ax = plt.subplots(figsize=(18,6))
sns.barplot(x='RACE',y='BASE_SALARY', hue='GENDER', ax=ax, data=employee, palette='Greys',
            order=['Hispanic/Latino',
                   'Black or African American',
                   'American Indian or Alaskan Native',
                   'Asian/Pacific Islander', 'Others', 'White'])
fig.savefig('c13-sns5.png', dpi=300, bbox_inches='tight')

3-2. Drawing it as a pandas bar plot

In [23]:
fig, ax = plt.subplots(figsize=(18,6))
(
    employee
    .groupby(['RACE','GENDER'], sort=False)
    ['BASE_SALARY']
    .mean()
    .unstack('GENDER')
    .sort_values('Female')
    .plot.bar(rot=0, ax=ax, width=.8, cmap='viridis')
)

fig.savefig('c13-sns6.png', dpi=300, bbox_inches='tight')
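The `groupby(...).mean().unstack('GENDER')` chain is what turns the tidy table into the wide table that `plot.bar` needs (one row per RACE, one column per GENDER). As a sketch on made-up toy data, the same reshaping can also be written with `pivot_table`:

```python
import pandas as pd

# Hypothetical toy data, not taken from employee.csv.
toy = pd.DataFrame({
    "RACE": ["White", "White", "Others", "Others", "White"],
    "GENDER": ["Female", "Male", "Female", "Male", "Male"],
    "BASE_SALARY": [50000.0, 60000.0, 40000.0, 45000.0, 70000.0],
})

# Group, average, then move GENDER from the row index into the columns.
via_unstack = (
    toy
    .groupby(["RACE", "GENDER"])["BASE_SALARY"]
    .mean()
    .unstack("GENDER")
)

# The same wide table built in a single call.
via_pivot_table = toy.pivot_table(
    index="RACE", columns="GENDER", values="BASE_SALARY", aggfunc="mean"
)

print(via_unstack)
```

Either form gives one bar group per row and one bar color per column when passed to `.plot.bar()`.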

4. Plotting salary by RACE and GENDER as box plots

4-1. Drawing it with seaborn boxplot

In [25]:
fig, ax = plt.subplots(figsize=(8,6))
sns.boxplot(x='GENDER', y='BASE_SALARY', data=employee, hue='RACE', palette='Greys', ax=ax)
fig.savefig('c13-sns7.png', dpi=300, bbox_inches='tight')

4-2. Drawing it as a pandas box plot

In [30]:
fig, axs = plt.subplots(1, 2, figsize=(12,6), sharey=True)
for g, ax in zip(['Female', 'Male'], axs):
    (employee
    .query('GENDER == @g')
    .assign(RACE=lambda df_:df_.RACE.fillna('NA'))
    .pivot(columns='RACE')
    ['BASE_SALARY']
    .plot.box(ax=ax, rot=30)
    )
    ax.set_title(g + ' Salary')
    ax.set_xlabel('')
fig.savefig('c13-sns8.png', bbox_inches='tight')
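The `pivot(columns='RACE')` step is the least obvious part of the pandas version. It spreads the salaries into one column per race, leaving NaN everywhere else, so that `plot.box` can draw one box per column. The `fillna('NA')` beforehand presumably keeps employees with a missing RACE in a column of their own. Here is a sketch on made-up toy data; it passes `values='BASE_SALARY'` to `pivot` instead of selecting the column afterwards, which should give the same table.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee.csv; one RACE value is missing.
toy = pd.DataFrame({
    "GENDER": ["Female", "Female", "Female", "Male"],
    "RACE": ["White", "Hispanic/Latino", None, "White"],
    "BASE_SALARY": [50000.0, 42000.0, 38000.0, 61000.0],
})

g = "Female"
wide = (
    toy
    .query("GENDER == @g")                           # keep one gender
    .assign(RACE=lambda df_: df_.RACE.fillna("NA"))  # label missing races
    .pivot(columns="RACE", values="BASE_SALARY")     # one column per race
)

print(wide)
```

Each row of `wide` holds exactly one salary, in the column of that employee's race. The box plot skips the NaN cells, so every box is built only from the salaries of its own group.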