Descriptive Statistics with R and Python

여정의 기록

Chelsey 2022. 12. 9. 03:25

R

median: the ((n+1)/2)-th value of the sorted data (when n is even, the average of the two middle values)

Five-number summary: min, Q1, median, Q3, max
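To make the (n+1)/2 rule concrete, here is a minimal Python sketch; the helper names median_by_position and five_number_summary are hypothetical. For even n the position falls halfway between two values, so the two middle values are averaged. The quartiles use NumPy's default linear interpolation, which corresponds to R's default quantile() type 7 (the one summary() uses); R's fivenum() uses Tukey's hinges instead, so its Q1/Q3 can differ slightly.

```python
import numpy as np


def median_by_position(values):
    """Median via the ((n+1)/2)-th value of the sorted data (1-based)."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # odd n: (n+1)/2 is a whole number; shift it to a 0-based index
        return s[(n + 1) // 2 - 1]
    # even n: (n+1)/2 lands halfway between the (n/2)-th and (n/2+1)-th values
    return (s[n // 2 - 1] + s[n // 2]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max) using NumPy's default linear quantiles."""
    arr = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(arr, [25, 50, 75])
    return arr.min(), q1, med, q3, arr.max()


print(median_by_position([7, 1, 5]))      # 5
print(median_by_position([7, 1, 5, 3]))   # 4.0
print(five_number_summary([1, 2, 3, 4, 5]))
```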

library(psych)

describe(data)

# to call a function from a specific package explicitly (package::function)
psych::describe(data)
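psych::describe() reports, among other things, n, mean, sd, median, a 10% trimmed mean, mad, min, max, range and se. A rough pandas/SciPy counterpart is sketched below. describe_like is a hypothetical helper, not part of any library; skewness and kurtosis are left out because psych uses its own choice of estimator, and the numbers may differ slightly from R's.

```python
import numpy as np
import pandas as pd
from scipy import stats


def describe_like(df):
    """Hypothetical helper: psych::describe()-style stats for numeric columns."""
    rows = {}
    for col in df.select_dtypes("number").columns:
        x = df[col].dropna().to_numpy(dtype=float)
        n = len(x)
        sd = x.std(ddof=1)  # sample standard deviation, like R's sd()
        rows[col] = {
            "n": n,
            "mean": x.mean(),
            "sd": sd,
            "median": np.median(x),
            "trimmed": stats.trim_mean(x, 0.1),  # drops 10% from each end
            # like R's mad(): 1.4826 * median absolute deviation from the median
            "mad": 1.4826 * np.median(np.abs(x - np.median(x))),
            "min": x.min(),
            "max": x.max(),
            "range": x.max() - x.min(),
            "se": sd / np.sqrt(n),  # standard error of the mean
        }
    return pd.DataFrame(rows).T


data = pd.DataFrame({"id": ["a", "b", "c", "d", "e"], "total": [1, 2, 3, 4, 5]})
print(describe_like(data))
```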

sapply

sapply(score[ , -c(1:2)], mean, na.rm=TRUE)  # mean of every column except the first two, ignoring NAs

summary(data[ , -1])  # min, Q1, median, mean, Q3 and max of every column except the first
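For comparison, a pandas sketch of the same two calls, assuming a DataFrame named score whose first two columns are identifiers (the column names and values below are made up). pandas skips NaN by default (skipna=True), which plays the role of na.rm=TRUE.

```python
import numpy as np
import pandas as pd

score = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["kim", "lee", "park"],
    "math": [80, 90, np.nan],
    "eng": [70, 60, 95],
})

# like sapply(score[ , -c(1:2)], mean, na.rm=TRUE): column means, NaN skipped
col_means = score.iloc[:, 2:].mean()
print(col_means)

# like summary(score[ , -1]): count, mean, std, min, quartiles, max per numeric column
print(score.iloc[:, 1:].describe())
```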

Descriptive statistics by group

tapply

tapply(target variable, grouping variable, statistic function)
tapply(data$total, data$gender, mean)

aggregate

aggregate(data[c("col1","col2","col3")], list(col4=data$col4), mean)
# returns the mean of col1, col2 and col3 for each level of col4
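In pandas the same grouping is a groupby on the grouping column followed by mean() over the selected columns; reset_index() turns col4 back into an ordinary column, which resembles the shape of aggregate()'s output. The column names are the placeholders from the R example and the values are made up.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [10, 20, 30, 40],
    "col3": [5, 5, 7, 7],
    "col4": ["a", "a", "b", "b"],
})

# like aggregate(data[c("col1","col2","col3")], list(col4=data$col4), mean)
means = data.groupby("col4")[["col1", "col2", "col3"]].mean().reset_index()
print(means)
```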

by

by(data, data$col4, summary)

describeBy (psych package)

describeBy(data, data$col4)
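A close pandas counterpart of by(..., summary) and describeBy() is groupby(...).describe(), which returns count, mean, std, min, quartiles and max within each group. A small sketch with made-up data:

```python
import pandas as pd

data = pd.DataFrame({
    "total": [60, 70, 80, 90, 100],
    "gender": ["F", "F", "M", "M", "M"],
})

# one row per group, with describe() statistics for 'total'
by_group = data.groupby("gender")["total"].describe()
print(by_group)
```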

Stem-and-leaf plot

stem(data$total, scale=2)  # scale=2 makes the plot roughly twice as long (more stems)

stem.leaf.backback(data1, data2)  # back-to-back stem-and-leaf plot of two samples (aplpack package)

In the stem.leaf output, the row whose depth is shown in parentheses is the row that contains the median.
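The idea behind a stem-and-leaf plot is to split each value into a stem (the leading digits) and a leaf (the last digit). A minimal Python sketch for non-negative integers, assuming stems are the tens and leaves the ones; stem_and_leaf is a hypothetical helper and does not reproduce R's scale handling or line splitting.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers into stems (tens) and sorted leaves (ones)."""
    table = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        table[stem].append(leaf)
    # include empty stems so gaps in the data stay visible
    return {s: table.get(s, []) for s in range(min(table), max(table) + 1)}


scores = [52, 67, 61, 75, 78, 73, 88, 95]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```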

Box plot and histogram: boxplot(), hist()

table() counts how many times each distinct value occurs.

xtabs(~row1+col1, data=data): builds a contingency table with row1 as the row variable and col1 as the column variable.

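chisq.test(data): runs a chi-squared test on a contingency table (for example the output of xtabs()) and reports the X-squared statistic, the degrees of freedom and the p-value.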
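The X-squared value is the sum of (O - E)^2 / E over all cells, where O is the observed count and each expected count E is (row total x column total) / grand total; the test has (r - 1)(c - 1) degrees of freedom. A NumPy sketch on a made-up 2x3 table, checked against scipy.stats.chi2_contingency. A table larger than 2x2 is used on purpose: for 2x2 tables both R and SciPy apply Yates' continuity correction by default, so the plain formula would not match.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[10, 20, 30],
                     [20, 15, 5]])

row_tot = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_tot = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
expected = row_tot * col_tot / observed.sum()   # E = row total * col total / grand total

x_squared = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(x_squared, dof)               # upper-tail probability

print(x_squared, dof, p_value)
```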

Python

data.mean()
data.std()  # standard deviation (sample SD, ddof=1)
data.median()
data.quantile(0.75)  # third quartile (Q3)
data.quantile(0.25)  # first quartile (Q1)

# all of these (plus count, min and max) in a single call:
data.describe()

data_des = data.describe()
data_des.loc['count']  # returns the count row
data_des.loc['mean']  # returns the mean row
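A small check of what describe() returns, on made-up numbers: pandas' std() (and the std row of describe()) is the sample standard deviation with ddof=1, like R's sd(); pass ddof=0 for the population version. The 25% and 75% rows agree with quantile(0.25) and quantile(0.75).

```python
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
data_des = data.describe()

print(data_des.loc["mean"])   # 5.0
print(data.std())             # sample SD: sqrt(32 / 7)
print(data.std(ddof=0))       # population SD: 2.0
print(data_des.loc["25%"] == data.quantile(0.25))  # True
```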

Skewness and kurtosis

from scipy.stats import skew, kurtosis

skew(data)  # skewness

kurtosis(data)  # kurtosis (excess kurtosis by default, so a normal distribution gives 0)

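Skewness: how asymmetric the distribution is compared with a normal distribution. Positive skew means a long right tail with most of the data on the left; negative skew means a long left tail with most of the data on the right.
Kurtosis: describes the tails of the distribution and is linked to outliers; a large value indicates heavy tails, which suggests that outliers are present.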
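By default scipy.stats.skew and kurtosis use moment-based (biased) estimators: skewness g1 = m3 / m2^(3/2), and kurtosis is reported as excess kurtosis m4 / m2^2 - 3 (fisher=True), where m_k is the k-th central moment. A NumPy sketch comparing the hand computation with SciPy on made-up right-skewed data:

```python
import numpy as np
from scipy.stats import kurtosis, skew

x = np.array([1, 2, 2, 3, 3, 3, 4, 10], dtype=float)  # long right tail

d = x - x.mean()
m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()  # central moments

g1 = m3 / m2 ** 1.5          # skewness: > 0 means a longer right tail
excess = m4 / m2 ** 2 - 3    # excess kurtosis: 0 for a normal distribution

print(g1, skew(x))           # same value
print(excess, kurtosis(x))   # same value
```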

Descriptive statistics by group

data_g_t = data.groupby('gender')['total']
data_g_t.agg(['size','mean','std','min','max'])

researchpy: a library for computing descriptive statistics

import researchpy as rp

rp.summary_cont(data['total'])
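rp.summary_cont() reports N, mean, SD, SE and a 95% confidence interval for the mean. If researchpy is not available, roughly the same figures can be computed with SciPy. A sketch, assuming a t-based interval; summary_cont_like is a hypothetical helper and does not reproduce researchpy's output format.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summary_cont_like(series, conf=0.95):
    """Hypothetical helper: N, mean, SD, SE and a t-based CI for the mean."""
    x = series.dropna().to_numpy(dtype=float)
    n = len(x)
    mean = x.mean()
    sd = x.std(ddof=1)
    se = sd / np.sqrt(n)
    t_crit = stats.t.ppf((1 + conf) / 2, df=n - 1)  # two-sided critical value
    return pd.Series({"N": n, "Mean": mean, "SD": sd, "SE": se,
                      "CI low": mean - t_crit * se, "CI high": mean + t_crit * se})


print(summary_cont_like(pd.Series([60, 70, 80, 90, 100], name="total")))
```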

Stem-and-leaf plot

# uses the stemgraphic library

import stemgraphic

stemgraphic.stem_graphic(data.total, scale=10)

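scale: the place value of each stem (for example, 10 makes each stem represent tens), which in turn controls how many stems are drawn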

boxplot

import seaborn as sns

sns.set(style="whitegrid")
sns.boxplot(x="total", data=data)
# vertical box plot (variable on the y axis)
sns.boxplot(y="total", data=data)
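The pieces of a box plot can be computed directly. A sketch assuming the usual 1.5 x IQR whisker rule (the whis=1.5 default in matplotlib, which seaborn builds on) and NumPy's linear percentiles; box_stats is a hypothetical helper. Points beyond the whiskers are the ones drawn as individual outliers.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers of a box plot."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1, "median": med, "q3": q3,
        # whiskers stop at the most extreme data points inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < low_fence) | (x > high_fence)].tolist(),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```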

hist

import matplotlib.pyplot as plt

plt.hist(data.total)
plt.show()
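A histogram only draws bin counts. As far as I know, matplotlib computes them with np.histogram (10 equal-width bins by default), so the counts can be inspected without plotting. A sketch on made-up data with three bins:

```python
import numpy as np

total = np.array([1, 2, 2, 3, 3, 3])

# three equal-width bins over [1, 3]; the last bin also includes its right edge
counts, edges = np.histogram(total, bins=3)
print(counts)   # [1 2 3]
print(edges)
```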

pd.crosstab

The pandas counterpart of R's xtabs().

import pandas as pd

df = pd.crosstab(index=data.grade, columns=data.q1)
df.index = ["one","two","three"]  # relabel the rows (assumes exactly three grade levels)
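A small made-up example of pd.crosstab; margins=True adds row and column totals labelled All, much like addmargins() on an R table. The grade and q1 column names follow the code above; the data is invented for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "grade": [1, 1, 2, 2, 3, 3, 3],
    "q1": ["yes", "no", "yes", "yes", "no", "no", "yes"],
})

ct = pd.crosstab(index=data.grade, columns=data.q1)
print(ct)

# same table with totals in an extra row and column named 'All'
ct_tot = pd.crosstab(index=data.grade, columns=data.q1, margins=True)
print(ct_tot)
```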

chi2_contingency

from scipy.stats import chi2_contingency

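chi2_contingency(data)  # data: a contingency table (e.g. a crosstab); returns the statistic, p-value, dof and expected counts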
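chi2_contingency returns the statistic, the p-value, the degrees of freedom and the table of expected counts. For a 2x2 table (1 degree of freedom) it applies Yates' continuity correction by default, as R's chisq.test() does; correction=False gives the uncorrected statistic. A sketch on a made-up 2x2 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 20],
                  [20, 10]])

stat, p, dof, expected = chi2_contingency(table)                  # Yates-corrected
stat_raw, p_raw, _, _ = chi2_contingency(table, correction=False)  # plain formula

print(dof)               # 1
print(expected)          # every cell expects 15
print(stat, stat_raw)    # the corrected statistic is smaller
```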

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)