Introduction ¶
Les tableaux sont essentiels dans le traitement statistiques. A cet effet, plusieurs ados ont été développés:
- xtable (Weverthon Machado)
- desctable (Trenton D. Mize)
- outreg (John Gallup)
- tabout (Ian Watson)
- estout (Ben Jann)
- outreg2 (Roy Wada)
- asdoc (Attaullah Shah)
- summtab (John A. Gallis)
- etc.
Dans ce tutoriel, nous utilisons la base de mon livre, disponible dans mon github. Dans la suite, les commandes de base du logiciel (version 18 et +) seront utilisées pour illustrer les tableaux. Pour les commandes des utilisateurs, comme celles listées ci-haut, les lecteurs interessés peuvent les installer et les exploiter avec les pages d'aide associées.
In [1]:
* Lecture de la base de données
use base_imputed.dta, clear
Fonctions catégorielles ¶
Les plus utilses sont:
- condition if ;
- connecteurs logiques ;
- fonction cond() ;
- fonction cut() ;
- fonction group() ;
- command encode.
In [2]:
* Illustration des variables à utiliser
describe ACTIVITE SEXE AGE SITMAT CHIFFRE IMPOSITION
Variable Storage Display Value
name type format label Variable label
--------------------------------------------------------------------------------
ACTIVITE str15 %15s ACTIVITE
SEXE str5 %9s SEXE
AGE byte %10.0g AGE
SITMAT byte %10.0g SITMAT
CHIFFRE double %10.0g CHIFFRE
IMPOSITION long %10.0g IMPOSITION
Condition if ¶
In [3]:
* Contenu de la variable
codebook SEXE
--------------------------------------------------------------------------------
SEXE SEXE
--------------------------------------------------------------------------------
Type: String (str5)
Unique values: 2 Missing "": 0/1,172
Tabulation: Freq. Value
503 "Femme"
669 "Homme"
In [4]:
* La Condition if
generate byte gender = 1 if SEXE == "Femme"
replace gender = 2 if SEXE == "Homme" // if missing(SEXE)
(669 missing values generated) (669 real changes made)
Connecteurs logiques¶
In [5]:
* Le contenu
codebook REGIME
--------------------------------------------------------------------------------
REGIME REGIME
--------------------------------------------------------------------------------
Type: String (str12)
Unique values: 5 Missing "": 0/1,172
Tabulation: Freq. Value
117 "GIE"
343 "Individuelle"
131 "SA"
470 "SARL"
111 "SUARL"
In [6]:
summarize CHIFFRE, detail
CHIFFRE
-------------------------------------------------------------
Percentiles Smallest
1% 1204788 1028509
5% 2059102 1043389
10% 3523881 1057915 Obs 1,172
25% 6491982 1075342 Sum of wgt. 1,172
50% 2.57e+07 Mean 5.18e+08
Largest Std. dev. 1.56e+09
75% 7.34e+07 8.73e+09
90% 1.12e+09 8.74e+09 Variance 2.44e+18
95% 4.65e+09 8.78e+09 Skewness 3.643051
99% 8.09e+09 8.90e+09 Kurtosis 15.73682
In [7]:
label define ouino 0 Non 1 Oui
generate informel:ouino = (REGIME == "Individuelle" & CHIFFRE <= r(p50))
codebook informel
--------------------------------------------------------------------------------
informel (unlabeled)
--------------------------------------------------------------------------------
Type: Numeric (float)
Label: ouino
Range: [0,1] Units: 1
Unique values: 2 Missing .: 0/1,172
Tabulation: Freq. Numeric Label
854 0 Non
318 1 Oui
In [8]:
summarize AGE
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
AGE | 1,172 47.58959 16.62192 16 80
In [9]:
label define lb_age 1 Jeune 2 Adulte 3 Vieux 4 "Trop Vieux"
generate cl_age:lb_age = 1*(AGE <= 35) + 2*(AGE > 35 & AGE < 60) + ///
3*inrange(AGE, 60, 75) + 4*(AGE > 75)
codebook cl_age
--------------------------------------------------------------------------------
cl_age (unlabeled)
--------------------------------------------------------------------------------
Type: Numeric (float)
Label: lb_age
Range: [1,4] Units: 1
Unique values: 4 Missing .: 0/1,172
Tabulation: Freq. Numeric Label
312 1 Jeune
545 2 Adulte
280 3 Vieux
35 4 Trop Vieux
Fonction cond() ¶
In [10]:
label define lb_impot 1 Abordable 2 Raisonnable 3 Normal 4 Cher 5 "Trop cher"
generate cl_impot:lb_impot = cond(IMPOSITION <= 1000000, 1, ///
cond(inrange(IMPOSITION, 1000001, 2000000), 2, ///
cond(inrange(IMPOSITION, 2000001, 3000000), 3, ///
cond(inrange(IMPOSITION, 3000001, 4000000), 4, 5))))
codebook cl_impot
--------------------------------------------------------------------------------
cl_impot (unlabeled)
--------------------------------------------------------------------------------
Type: Numeric (float)
Label: lb_impot
Range: [1,5] Units: 1
Unique values: 5 Missing .: 0/1,172
Tabulation: Freq. Numeric Label
425 1 Abordable
107 2 Raisonnable
56 3 Normal
61 4 Cher
523 5 Trop cher
Fonction cut() et group()¶
In [11]:
summarize CHIFFRE, detail
CHIFFRE
-------------------------------------------------------------
Percentiles Smallest
1% 1204788 1028509
5% 2059102 1043389
10% 3523881 1057915 Obs 1,172
25% 6491982 1075342 Sum of wgt. 1,172
50% 2.57e+07 Mean 5.18e+08
Largest Std. dev. 1.56e+09
75% 7.34e+07 8.73e+09
90% 1.12e+09 8.74e+09 Variance 2.44e+18
95% 4.65e+09 8.78e+09 Skewness 3.643051
99% 8.09e+09 8.90e+09 Kurtosis 15.73682
In [12]:
return list
scalars:
r(p99) = 8088181760
r(p95) = 4652993536
r(p90) = 1119819264
r(p75) = 73353392
r(p50) = 25711547
r(p25) = 6491981.5
r(p10) = 3523881
r(p5) = 2059102
r(p1) = 1204788
r(max) = 8895640576
r(min) = 1028509
r(sum) = 607382348636.5
r(kurtosis) = 15.73681983555987
r(skewness) = 3.643050819529613
r(sd) = 1560623647.229503
r(Var) = 2.43554616829e+18
r(mean) = 518244324.7751706
r(sum_w) = 1172
r(N) = 1172
In [13]:
label define lb_ca 0 "Très petite" 1 Petite 2 Moyenne 3 Grande
egen cl_ca = cut(CHIFFRE), at(1028509 6491981.5 25711547 73353392 ///
8895640576) icodes
label values cl_ca lb_ca
(1 missing value generated)
In [14]:
egen g_bar = group(SEXE SITMAT), autotype label(lb_var) missing
Commande encode ¶
In [15]:
codebook ACTIVITE
--------------------------------------------------------------------------------
ACTIVITE ACTIVITE
--------------------------------------------------------------------------------
Type: String (str15)
Unique values: 6 Missing "": 0/1,172
Tabulation: Freq. Value
362 "Agriculture"
195 "Autres Services"
62 "BTP"
324 "Commerce"
56 "Industrie"
173 "Transport"
Warning: Variable has embedded blanks.
In [16]:
label define lb_act 1 Agriculture 2 BTP 3 Commerce 4 Industrie ///
5 Transport 6 "Autres Services"
encode ACTIVITE, generate(cl_act) label(lb_act)
describe ACTIVITE cl_act
Variable Storage Display Value
name type format label Variable label
--------------------------------------------------------------------------------
ACTIVITE str15 %15s ACTIVITE
cl_act long %15.0g lb_act ACTIVITE
Tableaux de statistiques ¶
Tableaux simples ¶
In [17]:
tabulate cl_act
ACTIVITE | Freq. Percent Cum.
----------------+-----------------------------------
Agriculture | 362 30.89 30.89
BTP | 62 5.29 36.18
Commerce | 324 27.65 63.82
Industrie | 56 4.78 68.60
Transport | 173 14.76 83.36
Autres Services | 195 16.64 100.00
----------------+-----------------------------------
Total | 1,172 100.00
In [18]:
tabulate cl_act SEXE, chi2
| SEXE
ACTIVITE | Femme Homme | Total
----------------+----------------------+----------
Agriculture | 153 209 | 362
BTP | 29 33 | 62
Commerce | 136 188 | 324
Industrie | 22 34 | 56
Transport | 78 95 | 173
Autres Services | 85 110 | 195
----------------+----------------------+----------
Total | 503 669 | 1,172
Pearson chi2(5) = 1.2265 Pr = 0.942
In [19]:
tabulate cl_act, summarize(CHIFFRE) mean
| Summary of
| CHIFFRE
ACTIVITE | Mean
------------+------------
Agricultu | 7.235e+08
BTP | 5.053e+08
Commerce | 4.610e+08
Industrie | 5.436e+08
Transport | 3.746e+08
Autres Se | 3.565e+08
------------+------------
Total | 5.182e+08
In [20]:
tabulate cl_act SEXE, summarize(CHIFFRE) mean
Means of CHIFFRE
| SEXE
ACTIVITE | Femme Homme | Total
-----------+----------------------+----------
Agricultu | 8.284e+08 6.468e+08 | 7.235e+08
BTP | 2.239e+08 7.525e+08 | 5.053e+08
Commerce | 4.008e+08 5.045e+08 | 4.610e+08
Industrie | 5.258e+08 5.551e+08 | 5.436e+08
Transport | 4.235e+08 3.344e+08 | 3.746e+08
Autres Se | 3.446e+08 3.657e+08 | 3.565e+08
-----------+----------------------+----------
Total | 5.202e+08 5.168e+08 | 5.182e+08
In [21]:
tabstat CHIFFRE CAPITAL IMPOSITION, statistic(mean median) by(ACTIVITE)
Summary statistics: Mean, p50
Group variable: ACTIVITE (ACTIVITE)
ACTIVITE | CHIFFRE CAPITAL IMPOSI~N
----------------+------------------------------
Agriculture | 7.24e+08 9.34e+07 7.14e+07
| 3.00e+07 356692.5 3295370
----------------+------------------------------
Autres Services | 3.57e+08 2.95e+07 3.00e+07
| 2.42e+07 239403 2570393
----------------+------------------------------
BTP | 5.05e+08 6.48e+07 5.11e+07
| 4.55e+07 298346 4786725
----------------+------------------------------
Commerce | 4.61e+08 4.61e+07 4.12e+07
| 2.56e+07 232292.5 2587603
----------------+------------------------------
Industrie | 5.44e+08 7.27e+07 5.99e+07
| 2.31e+07 520180 2874014
----------------+------------------------------
Transport | 3.75e+08 5.04e+07 3.28e+07
| 1.99e+07 236496 2112742
----------------+------------------------------
Total | 5.18e+08 6.08e+07 4.88e+07
| 2.57e+07 301130.5 2851304
-----------------------------------------------
Collection de Tableaux ¶
In [22]:
collect clear
collect create mycol
collect, name(mycol): summarize CHIFFRE
collect layout (result), name(mycol)
collect export Calcul.xlsx, name(mycol) as(xlsx) sheet(essai) replace
(current collection is mycol)
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
CHIFFRE | 1,172 5.18e+08 1.56e+09 1028509 8.90e+09
Collection: mycol
Rows: result
Table 1: 8 x 1
---------------------------------
Number of observations | 1172
Variance | 2.44e+18
Maximum | 8.90e+09
Mean | 5.18e+08
Minimum | 1028509
Std. dev. | 1.56e+09
Sum of variable | 6.07e+11
Sum of the weights | 1172
---------------------------------
(collection mycol exported to file Calcul.xlsx)
In [23]:
collect _r_b _r_p _r_se _r_z: regress CHIFFRE CAPITAL i.EDUCATION
collect layout (result)(var)
collect export Calcul.xlsx, as(xlsx) sheet(essai) cell(B5) modify
Source | SS df MS Number of obs = 1,172
-------------+---------------------------------- F(6, 1165) = 202.61
Model | 1.4563e+21 6 2.4272e+20 Prob > F = 0.0000
Residual | 1.3957e+21 1,165 1.1980e+18 R-squared = 0.5106
-------------+---------------------------------- Adj R-squared = 0.5081
Total | 2.8520e+21 1,171 2.4355e+18 Root MSE = 1.1e+09
------------------------------------------------------------------------------
CHIFFRE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
CAPITAL | 5.692333 .1641981 34.67 0.000 5.370176 6.01449
|
EDUCATION |
2 | -1.84e+08 1.09e+08 -1.69 0.092 -3.98e+08 3.02e+07
3 | -5.42e+07 1.08e+08 -0.50 0.617 -2.67e+08 1.59e+08
4 | -1.42e+08 1.19e+08 -1.19 0.236 -3.76e+08 9.26e+07
5 | -8.52e+07 1.65e+08 -0.52 0.605 -4.08e+08 2.38e+08
6 | -2.27e+08 3.18e+08 -0.72 0.474 -8.51e+08 3.96e+08
|
_cons | 2.79e+08 9.35e+07 2.99 0.003 9.58e+07 4.63e+08
------------------------------------------------------------------------------
(dimension var not found)
Collection: mycol
Rows: result
Columns: var
Your layout specification does not uniquely match any items. One or both of the
dimensions EDUCATION and colname might help uniquely match items.
(collection mycol exported to file Calcul.xlsx)
In [24]:
dtable, by(cl_ca) continuous(CHIFFRE CAPITAL, statistics(mean) test(kwallis)) ///
factor(gender informel, statistics(fvpercent) test(kendall)) nformat(%12.0f) ///
name(mydtab) export("Calcul.xlsx", as(xlsx) sheet(mafeuil) cell(A2) modify)
---------------------------------------------------------------
cl_ca
Très petite Petite Moyenne Grande Total
---------------------------------------------------------------
N 293 (25%) 293 (25%) 293 (25%) 292 (25%) 1171 (100%)
CHIFFRE 3855450 11830824 49182956 1984520629 511090272
CAPITAL 83261 3004533 3680084 234529440 60175564
gender
1 (45%) (47%) (40%) (39%) (43%)
2 (55%) (53%) (60%) (61%) (57%)
informel
Non (32%) (59%) (100%) (100%) (73%)
Oui (68%) (41%) (0%) (0%) (27%)
---------------------------------------------------------------
(collection mydtab exported to file Calcul.xlsx)
In [28]:
collect drop mytab
table ()(var), statistic(mean CHIFFRE CAPITAL) ///
statistic(fvfrequency informel) ///
command(regress CHIFFRE CAPITAL i.EDUCATION) name(mytab)
collect export Calcul.xlsx, name(mytab) as(xlsx) sheet(table) cell(B2) modify
-----------------------------------------------------------------------------------
| CHIFFRE CAPITAL informel
| Non Oui
------------------------------------+----------------------------------------------
Mean |
Mean |
CHIFFRE | 5.18e+08
CAPITAL | 6.08e+07
Factor-variable frequency |
Factor-variable frequency |
informel=Non | 854
informel=Oui | 318
regress CHIFFRE CAPITAL i.EDUCATION |
Coefficient |
CAPITAL | 5.692333
EDUCATION=1 | 0
EDUCATION=2 | -1.84e+08
EDUCATION=3 | -5.42e+07
EDUCATION=4 | -1.42e+08
EDUCATION=5 | -8.52e+07
EDUCATION=6 | -2.27e+08
Intercept | 2.79e+08
-----------------------------------------------------------------------------------
(collection mytab exported to file Calcul.xlsx)
In [26]:
regress CHIFFRE CAPITAL i.EDUCATION
estimates store mymod
etable, estimates(mymod) export(Calcul.xlsx, sheet(etable, replace) modify) name(myreg)
Source | SS df MS Number of obs = 1,172
-------------+---------------------------------- F(6, 1165) = 202.61
Model | 1.4563e+21 6 2.4272e+20 Prob > F = 0.0000
Residual | 1.3957e+21 1,165 1.1980e+18 R-squared = 0.5106
-------------+---------------------------------- Adj R-squared = 0.5081
Total | 2.8520e+21 1,171 2.4355e+18 Root MSE = 1.1e+09
------------------------------------------------------------------------------
CHIFFRE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
CAPITAL | 5.692333 .1641981 34.67 0.000 5.370176 6.01449
|
EDUCATION |
2 | -1.84e+08 1.09e+08 -1.69 0.092 -3.98e+08 3.02e+07
3 | -5.42e+07 1.08e+08 -0.50 0.617 -2.67e+08 1.59e+08
4 | -1.42e+08 1.19e+08 -1.19 0.236 -3.76e+08 9.26e+07
5 | -8.52e+07 1.65e+08 -0.52 0.605 -4.08e+08 2.38e+08
6 | -2.27e+08 3.18e+08 -0.72 0.474 -8.51e+08 3.96e+08
|
_cons | 2.79e+08 9.35e+07 2.99 0.003 9.58e+07 4.63e+08
------------------------------------------------------------------------------
---------------------------------
CHIFFRE
---------------------------------
CAPITAL 5.692
(0.164)
EDUCATION
2 -1.84e+08
(1.09e+08)
3 -5.42e+07
(1.08e+08)
4 -1.42e+08
(1.19e+08)
5 -8.52e+07
(1.65e+08)
6 -2.27e+08
(3.18e+08)
Intercept 2.79e+08
(9.35e+07)
Number of observations 1172
---------------------------------
(collection myreg exported to file Calcul.xlsx)
In [27]:
collect dir
Collections in memory Current: myreg ------------------ Name No. items ------------------ mycol 96 mydtab 41 myreg 92 mytab 94 ------------------