Introduction ¶

Les tableaux sont essentiels dans le traitement statistiques. A cet effet, plusieurs ados ont été développés:

  • xtable (Weverthon Machado)
  • desctable (Trenton D. Mize)
  • outreg (John Gallup)
  • tabout (Ian Watson)
  • estout (Ben Jann)
  • outreg2 (Roy Wada)
  • asdoc (Attaullah Shah)
  • summtab (John A. Gallis)
  • etc.

Dans ce tutoriel, nous utilisons la base de mon livre, disponible dans mon github. Dans la suite, les commandes de base du logiciel (version 18 et +) seront utilisées pour illustrer les tableaux. Pour les commandes des utilisateurs, comme celles listées ci-haut, les lecteurs interessés peuvent les installer et les exploiter avec les pages d'aide associées.

In [1]:
* Lecture de la base de données
use base_imputed.dta, clear

Fonctions catégorielles ¶

Les plus utilses sont:

  1. condition if ;
  2. connecteurs logiques ;
  3. fonction cond() ;
  4. fonction cut() ;
  5. fonction group() ;
  6. command encode.
In [2]:
* Illustration des variables à utiliser
describe ACTIVITE SEXE AGE SITMAT CHIFFRE IMPOSITION
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
ACTIVITE        str15   %15s                  ACTIVITE
SEXE            str5    %9s                   SEXE
AGE             byte    %10.0g                AGE
SITMAT          byte    %10.0g                SITMAT
CHIFFRE         double  %10.0g                CHIFFRE
IMPOSITION      long    %10.0g                IMPOSITION

Condition if ¶

In [3]:
* Contenu de la variable
codebook SEXE
--------------------------------------------------------------------------------
SEXE                                                                        SEXE
--------------------------------------------------------------------------------

                  Type: String (str5)

         Unique values: 2                         Missing "": 0/1,172

            Tabulation: Freq.  Value
                          503  "Femme"
                          669  "Homme"
In [4]:
* La Condition if
generate byte gender = 1 if SEXE == "Femme"
replace gender = 2 if SEXE == "Homme" // if missing(SEXE)
(669 missing values generated)

(669 real changes made)

Connecteurs logiques¶

In [5]:
* Le contenu
codebook REGIME
--------------------------------------------------------------------------------
REGIME                                                                    REGIME
--------------------------------------------------------------------------------

                  Type: String (str12)

         Unique values: 5                         Missing "": 0/1,172

            Tabulation: Freq.  Value
                          117  "GIE"
                          343  "Individuelle"
                          131  "SA"
                          470  "SARL"
                          111  "SUARL"
In [6]:
summarize CHIFFRE, detail
                           CHIFFRE
-------------------------------------------------------------
      Percentiles      Smallest
 1%      1204788        1028509
 5%      2059102        1043389
10%      3523881        1057915       Obs               1,172
25%      6491982        1075342       Sum of wgt.       1,172

50%     2.57e+07                      Mean           5.18e+08
                        Largest       Std. dev.      1.56e+09
75%     7.34e+07       8.73e+09
90%     1.12e+09       8.74e+09       Variance       2.44e+18
95%     4.65e+09       8.78e+09       Skewness       3.643051
99%     8.09e+09       8.90e+09       Kurtosis       15.73682
In [7]:
label define ouino 0 Non 1 Oui
generate informel:ouino = (REGIME == "Individuelle" & CHIFFRE <= r(p50))
codebook informel



--------------------------------------------------------------------------------
informel                                                             (unlabeled)
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: ouino

                 Range: [0,1]                         Units: 1
         Unique values: 2                         Missing .: 0/1,172

            Tabulation: Freq.   Numeric  Label
                          854         0  Non
                          318         1  Oui
In [8]:
summarize AGE
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         AGE |      1,172    47.58959    16.62192         16         80
In [9]:
label define lb_age 1 Jeune 2 Adulte 3 Vieux 4 "Trop Vieux"
generate cl_age:lb_age = 1*(AGE <= 35) + 2*(AGE > 35 & AGE < 60) + ///
 3*inrange(AGE, 60, 75) + 4*(AGE > 75)
codebook cl_age



--------------------------------------------------------------------------------
cl_age                                                               (unlabeled)
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: lb_age

                 Range: [1,4]                         Units: 1
         Unique values: 4                         Missing .: 0/1,172

            Tabulation: Freq.   Numeric  Label
                          312         1  Jeune
                          545         2  Adulte
                          280         3  Vieux
                           35         4  Trop Vieux

Fonction cond() ¶

In [10]:
label define lb_impot 1 Abordable 2 Raisonnable 3 Normal 4 Cher 5 "Trop cher"
generate cl_impot:lb_impot = cond(IMPOSITION <= 1000000, 1, ///
    cond(inrange(IMPOSITION, 1000001, 2000000), 2, ///
    cond(inrange(IMPOSITION, 2000001, 3000000), 3, ///
    cond(inrange(IMPOSITION, 3000001, 4000000), 4, 5))))
codebook cl_impot



--------------------------------------------------------------------------------
cl_impot                                                             (unlabeled)
--------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: lb_impot

                 Range: [1,5]                         Units: 1
         Unique values: 5                         Missing .: 0/1,172

            Tabulation: Freq.   Numeric  Label
                          425         1  Abordable
                          107         2  Raisonnable
                           56         3  Normal
                           61         4  Cher
                          523         5  Trop cher

Fonction cut() et group()¶

In [11]:
summarize CHIFFRE, detail
                           CHIFFRE
-------------------------------------------------------------
      Percentiles      Smallest
 1%      1204788        1028509
 5%      2059102        1043389
10%      3523881        1057915       Obs               1,172
25%      6491982        1075342       Sum of wgt.       1,172

50%     2.57e+07                      Mean           5.18e+08
                        Largest       Std. dev.      1.56e+09
75%     7.34e+07       8.73e+09
90%     1.12e+09       8.74e+09       Variance       2.44e+18
95%     4.65e+09       8.78e+09       Skewness       3.643051
99%     8.09e+09       8.90e+09       Kurtosis       15.73682
In [12]:
return list
scalars:
                r(p99) =  8088181760
                r(p95) =  4652993536
                r(p90) =  1119819264
                r(p75) =  73353392
                r(p50) =  25711547
                r(p25) =  6491981.5
                r(p10) =  3523881
                 r(p5) =  2059102
                 r(p1) =  1204788
                r(max) =  8895640576
                r(min) =  1028509
                r(sum) =  607382348636.5
           r(kurtosis) =  15.73681983555987
           r(skewness) =  3.643050819529613
                 r(sd) =  1560623647.229503
                r(Var) =  2.43554616829e+18
               r(mean) =  518244324.7751706
              r(sum_w) =  1172
                  r(N) =  1172
In [13]:
label define lb_ca 0 "Très petite" 1 Petite 2 Moyenne 3 Grande
egen cl_ca = cut(CHIFFRE), at(1028509 6491981.5 25711547 73353392 ///
 8895640576) icodes
label values cl_ca lb_ca

(1 missing value generated)

In [14]:
egen g_bar = group(SEXE SITMAT), autotype label(lb_var) missing

Commande encode ¶

In [15]:
codebook ACTIVITE
--------------------------------------------------------------------------------
ACTIVITE                                                                ACTIVITE
--------------------------------------------------------------------------------

                  Type: String (str15)

         Unique values: 6                         Missing "": 0/1,172

            Tabulation: Freq.  Value
                          362  "Agriculture"
                          195  "Autres Services"
                           62  "BTP"
                          324  "Commerce"
                           56  "Industrie"
                          173  "Transport"

               Warning: Variable has embedded blanks.
In [16]:
label define lb_act 1 Agriculture 2 BTP 3 Commerce 4 Industrie ///
 5 Transport 6 "Autres Services"
encode ACTIVITE, generate(cl_act) label(lb_act) 
describe ACTIVITE cl_act



Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
ACTIVITE        str15   %15s                  ACTIVITE
cl_act          long    %15.0g     lb_act     ACTIVITE

Tableaux de statistiques ¶

Tableaux simples ¶

In [17]:
tabulate cl_act
       ACTIVITE |      Freq.     Percent        Cum.
----------------+-----------------------------------
    Agriculture |        362       30.89       30.89
            BTP |         62        5.29       36.18
       Commerce |        324       27.65       63.82
      Industrie |         56        4.78       68.60
      Transport |        173       14.76       83.36
Autres Services |        195       16.64      100.00
----------------+-----------------------------------
          Total |      1,172      100.00
In [18]:
tabulate cl_act SEXE, chi2
                |         SEXE
       ACTIVITE |     Femme      Homme |     Total
----------------+----------------------+----------
    Agriculture |       153        209 |       362 
            BTP |        29         33 |        62 
       Commerce |       136        188 |       324 
      Industrie |        22         34 |        56 
      Transport |        78         95 |       173 
Autres Services |        85        110 |       195 
----------------+----------------------+----------
          Total |       503        669 |     1,172 

          Pearson chi2(5) =   1.2265   Pr = 0.942
In [19]:
tabulate cl_act, summarize(CHIFFRE) mean
            | Summary of
            |   CHIFFRE
   ACTIVITE |        Mean
------------+------------
  Agricultu |   7.235e+08
        BTP |   5.053e+08
   Commerce |   4.610e+08
  Industrie |   5.436e+08
  Transport |   3.746e+08
  Autres Se |   3.565e+08
------------+------------
      Total |   5.182e+08
In [20]:
tabulate cl_act SEXE, summarize(CHIFFRE) mean
                             Means of CHIFFRE

           |        SEXE
  ACTIVITE |     Femme      Homme |     Total
-----------+----------------------+----------
 Agricultu | 8.284e+08  6.468e+08 | 7.235e+08
       BTP | 2.239e+08  7.525e+08 | 5.053e+08
  Commerce | 4.008e+08  5.045e+08 | 4.610e+08
 Industrie | 5.258e+08  5.551e+08 | 5.436e+08
 Transport | 4.235e+08  3.344e+08 | 3.746e+08
 Autres Se | 3.446e+08  3.657e+08 | 3.565e+08
-----------+----------------------+----------
     Total | 5.202e+08  5.168e+08 | 5.182e+08
In [21]:
tabstat CHIFFRE CAPITAL IMPOSITION, statistic(mean median) by(ACTIVITE)
Summary statistics: Mean, p50
Group variable: ACTIVITE (ACTIVITE)

       ACTIVITE |   CHIFFRE   CAPITAL  IMPOSI~N
----------------+------------------------------
    Agriculture |  7.24e+08  9.34e+07  7.14e+07
                |  3.00e+07  356692.5   3295370
----------------+------------------------------
Autres Services |  3.57e+08  2.95e+07  3.00e+07
                |  2.42e+07    239403   2570393
----------------+------------------------------
            BTP |  5.05e+08  6.48e+07  5.11e+07
                |  4.55e+07    298346   4786725
----------------+------------------------------
       Commerce |  4.61e+08  4.61e+07  4.12e+07
                |  2.56e+07  232292.5   2587603
----------------+------------------------------
      Industrie |  5.44e+08  7.27e+07  5.99e+07
                |  2.31e+07    520180   2874014
----------------+------------------------------
      Transport |  3.75e+08  5.04e+07  3.28e+07
                |  1.99e+07    236496   2112742
----------------+------------------------------
          Total |  5.18e+08  6.08e+07  4.88e+07
                |  2.57e+07  301130.5   2851304
-----------------------------------------------

Collection de Tableaux ¶

In [22]:
collect clear
collect create mycol
collect, name(mycol): summarize CHIFFRE
collect layout (result), name(mycol)
collect export Calcul.xlsx, name(mycol) as(xlsx) sheet(essai) replace

(current collection is mycol)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     CHIFFRE |      1,172    5.18e+08    1.56e+09    1028509   8.90e+09


Collection: mycol
      Rows: result
   Table 1: 8 x 1

---------------------------------
Number of observations |     1172
Variance               | 2.44e+18
Maximum                | 8.90e+09
Mean                   | 5.18e+08
Minimum                |  1028509
Std. dev.              | 1.56e+09
Sum of variable        | 6.07e+11
Sum of the weights     |     1172
---------------------------------

(collection mycol exported to file Calcul.xlsx)
In [23]:
collect _r_b _r_p _r_se _r_z: regress CHIFFRE CAPITAL i.EDUCATION
collect layout (result)(var)
collect export Calcul.xlsx, as(xlsx) sheet(essai) cell(B5) modify

      Source |       SS           df       MS      Number of obs   =     1,172
-------------+----------------------------------   F(6, 1165)      =    202.61
       Model |  1.4563e+21         6  2.4272e+20   Prob > F        =    0.0000
    Residual |  1.3957e+21     1,165  1.1980e+18   R-squared       =    0.5106
-------------+----------------------------------   Adj R-squared   =    0.5081
       Total |  2.8520e+21     1,171  2.4355e+18   Root MSE        =    1.1e+09

------------------------------------------------------------------------------
     CHIFFRE | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     CAPITAL |   5.692333   .1641981    34.67   0.000     5.370176     6.01449
             |
   EDUCATION |
          2  |  -1.84e+08   1.09e+08    -1.69   0.092    -3.98e+08    3.02e+07
          3  |  -5.42e+07   1.08e+08    -0.50   0.617    -2.67e+08    1.59e+08
          4  |  -1.42e+08   1.19e+08    -1.19   0.236    -3.76e+08    9.26e+07
          5  |  -8.52e+07   1.65e+08    -0.52   0.605    -4.08e+08    2.38e+08
          6  |  -2.27e+08   3.18e+08    -0.72   0.474    -8.51e+08    3.96e+08
             |
       _cons |   2.79e+08   9.35e+07     2.99   0.003     9.58e+07    4.63e+08
------------------------------------------------------------------------------

(dimension var not found)

Collection: mycol
      Rows: result
   Columns: var

Your layout specification does not uniquely match any items. One or both of the
dimensions EDUCATION and colname might help uniquely match items.

(collection mycol exported to file Calcul.xlsx)
In [24]:
dtable, by(cl_ca) continuous(CHIFFRE CAPITAL, statistics(mean) test(kwallis)) ///
 factor(gender informel, statistics(fvpercent) test(kendall)) nformat(%12.0f) ///
 name(mydtab) export("Calcul.xlsx", as(xlsx) sheet(mafeuil) cell(A2) modify)
---------------------------------------------------------------
                                  cl_ca                        
         Très petite   Petite   Moyenne    Grande      Total   
---------------------------------------------------------------
N          293 (25%) 293 (25%) 293 (25%)  292 (25%) 1171 (100%)
CHIFFRE      3855450  11830824  49182956 1984520629   511090272
CAPITAL        83261   3004533   3680084  234529440    60175564
gender                                                         
  1            (45%)     (47%)     (40%)      (39%)       (43%)
  2            (55%)     (53%)     (60%)      (61%)       (57%)
informel                                                       
  Non          (32%)     (59%)    (100%)     (100%)       (73%)
  Oui          (68%)     (41%)      (0%)       (0%)       (27%)
---------------------------------------------------------------
(collection mydtab exported to file Calcul.xlsx)
In [28]:
collect drop mytab
table ()(var), statistic(mean CHIFFRE CAPITAL) ///
 statistic(fvfrequency informel) ///
 command(regress CHIFFRE CAPITAL i.EDUCATION) name(mytab)
collect export Calcul.xlsx, name(mytab) as(xlsx) sheet(table) cell(B2) modify


-----------------------------------------------------------------------------------
                                    |   CHIFFRE    CAPITAL    informel             
                                    |                         Non   Oui            
------------------------------------+----------------------------------------------
Mean                                |                                              
  Mean                              |                                              
    CHIFFRE                         |  5.18e+08                                    
    CAPITAL                         |             6.08e+07                         
Factor-variable frequency           |                                              
  Factor-variable frequency         |                                              
    informel=Non                    |                         854                  
    informel=Oui                    |                               318            
regress CHIFFRE CAPITAL i.EDUCATION |                                              
  Coefficient                       |                                              
    CAPITAL                         |                                      5.692333
    EDUCATION=1                     |                                             0
    EDUCATION=2                     |                                     -1.84e+08
    EDUCATION=3                     |                                     -5.42e+07
    EDUCATION=4                     |                                     -1.42e+08
    EDUCATION=5                     |                                     -8.52e+07
    EDUCATION=6                     |                                     -2.27e+08
    Intercept                       |                                      2.79e+08
-----------------------------------------------------------------------------------

(collection mytab exported to file Calcul.xlsx)
In [26]:
regress CHIFFRE CAPITAL i.EDUCATION
estimates store mymod
etable, estimates(mymod) export(Calcul.xlsx, sheet(etable, replace) modify) name(myreg)

      Source |       SS           df       MS      Number of obs   =     1,172
-------------+----------------------------------   F(6, 1165)      =    202.61
       Model |  1.4563e+21         6  2.4272e+20   Prob > F        =    0.0000
    Residual |  1.3957e+21     1,165  1.1980e+18   R-squared       =    0.5106
-------------+----------------------------------   Adj R-squared   =    0.5081
       Total |  2.8520e+21     1,171  2.4355e+18   Root MSE        =    1.1e+09

------------------------------------------------------------------------------
     CHIFFRE | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     CAPITAL |   5.692333   .1641981    34.67   0.000     5.370176     6.01449
             |
   EDUCATION |
          2  |  -1.84e+08   1.09e+08    -1.69   0.092    -3.98e+08    3.02e+07
          3  |  -5.42e+07   1.08e+08    -0.50   0.617    -2.67e+08    1.59e+08
          4  |  -1.42e+08   1.19e+08    -1.19   0.236    -3.76e+08    9.26e+07
          5  |  -8.52e+07   1.65e+08    -0.52   0.605    -4.08e+08    2.38e+08
          6  |  -2.27e+08   3.18e+08    -0.72   0.474    -8.51e+08    3.96e+08
             |
       _cons |   2.79e+08   9.35e+07     2.99   0.003     9.58e+07    4.63e+08
------------------------------------------------------------------------------



---------------------------------
                         CHIFFRE 
---------------------------------
CAPITAL                     5.692
                          (0.164)
EDUCATION                        
  2                     -1.84e+08
                       (1.09e+08)
  3                     -5.42e+07
                       (1.08e+08)
  4                     -1.42e+08
                       (1.19e+08)
  5                     -8.52e+07
                       (1.65e+08)
  6                     -2.27e+08
                       (3.18e+08)
Intercept                2.79e+08
                       (9.35e+07)
Number of observations       1172
---------------------------------
(collection myreg exported to file Calcul.xlsx)
In [27]:
collect dir
Collections in memory
Current: myreg
------------------
  Name   No. items
------------------
 mycol   96       
mydtab   41       
 myreg   92       
 mytab   94       
------------------