Introduction

Assumptions of the linear model:

  1. continuous dependent variable;
  2. two or more explanatory variables (if both are categorical: two-way ANOVA or factorial ANOVA);
  3. independence of the observations or of the residuals (Durbin-Watson test);
  4. a linear relationship between the dependent variable and the explanatory variables, both jointly and individually;
  5. homoscedasticity of the residuals (scatterplot of the studentized residuals against the unstandardized predicted values);
  6. no multicollinearity (correlation coefficients or the Tolerance/VIF values);
  7. no outliers;
  8. normally distributed residuals (histogram, normal P-P plot, or normal Q-Q plot of the studentized residuals).
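Several of these assumptions can be checked with Stata's built-in postestimation commands. The sketch below is illustrative only: it assumes the `auto` dataset shipped with Stata rather than effectdata.dta, and the index `t` is a hypothetical ordering variable created solely so that the Durbin-Watson statistic can be computed.

```stata
* Illustrative sketch on Stata's bundled auto data (not effectdata.dta)
sysuse auto, clear

* The Durbin-Watson statistic needs an ordering of the observations;
* t is a hypothetical index used only for this illustration
generate t = _n
tsset t

regress price mpg weight length

* Assumption 6: variance inflation factors (1/VIF is the tolerance)
estat vif

* Assumption 5: Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
scalar bp_p = r(p)

* Assumption 3: Durbin-Watson statistic for first-order serial correlation
estat dwatson
scalar dw = r(dw)
```

For cross-sectional survey data the Durbin-Watson statistic is only meaningful if the row order carries information (for example, collection order), so it should be read with that caveat in mind.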
In [7]:
use effectdata.dta, clear
(Base fictive pour l'évaluation d'impact (Par Ibrahima TALL))
In [8]:
notes: Simulation pour l'évaluation d'impact
In [9]:
describe
Contains data from effectdata.dta
 Observations:           602                  Base fictive pour l'évaluation
                                                d'impact (Par Ibrahima TALL)
    Variables:            15                  7 Nov 2023 14:10
                                              (_dta has notes)
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
ID              str4    %9s                   Identifiant du questionnaire
REG             float   %9.0g      REG        Région administrative
CAMP            float   %9.0g                 Campagne (Années) agricole
EDU             float   %9.0g      EDU        Education du CM
AGE             float   %9.0g                 Age du CM
MEN             float   %9.0g                 Taille du ménage
PARC            float   %9.0g                 Nombre de parcelles agricoles
MAT             float   %9.0g                 Nombre de matériels agricoles
CRED            float   %9.0g      OK         Crédit agricole
TYPSEM          float   %9.0g      TYPSEM     Type de semence utilisé
FORM            float   %9.0g      OK         Formation en pratiques agricoles
SITMAT          float   %9.0g      SITMAT     Situation matrimoniale du CM
SEM             float   %9.0g                 Quant de semence reçue de l'Etat
PROD            float   %9.0g                 Production agricole en tonnes
REV             float   %9.0g                 Revenu Moyen (en milliers)
--------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

I. Linear model

In [46]:
regress PROD SEM REV i.TYPSEM i.EDU i.SITMAT AGE MEN PARC MAT CRED FORM
      Source |       SS           df       MS      Number of obs   =       602
-------------+----------------------------------   F(17, 584)      =     13.44
       Model |  312023.262        17  18354.3095   Prob > F        =    0.0000
    Residual |  797540.107       584  1365.65087   R-squared       =    0.2812
-------------+----------------------------------   Adj R-squared   =    0.2603
       Total |  1109563.37       601  1846.19529   Root MSE        =    36.955

------------------------------------------------------------------------------
        PROD | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         SEM |   .0427787   .0063856     6.70   0.000     .0302373    .0553202
         REV |  -.0329684   .0193828    -1.70   0.089    -.0710369    .0051001
             |
      TYPSEM |
  Améliorée  |   .9009478   3.641197     0.25   0.805    -6.250488    8.052384
      Mixte  |  -2.322426   3.785714    -0.61   0.540    -9.757698    5.112846
             |
         EDU |
   Primaire  |  -13.19697   4.246306    -3.11   0.002    -21.53686   -4.857075
      Moyen  |   -11.2447   4.361401    -2.58   0.010    -19.81064   -2.678756
 Secondaire  |   .7141498   4.253914     0.17   0.867    -7.640683    9.068983
             |
      SITMAT |
   Monogame  |   2.554427   3.750966     0.68   0.496      -4.8126    9.921453
   Polygame  |  -6.709478   5.067001    -1.32   0.186    -16.66124    3.242287
       Veuf  |   3.068596   7.076381     0.43   0.665    -10.82966    16.96685
    Divorcé  |  -13.34076   11.60691    -1.15   0.251    -36.13714    9.455615
             |
         AGE |   -.257942   .2145425    -1.20   0.230    -.6793107    .1634268
         MEN |   .7636438   .7424602     1.03   0.304    -.6945736    2.221861
        PARC |   1.433503   1.074874     1.33   0.183     -.677586    3.544593
         MAT |  -.7227274   .9731831    -0.74   0.458    -2.634093    1.188638
        CRED |   6.110468   3.667312     1.67   0.096     -1.09226     13.3132
        FORM |   -1.63625   3.036183    -0.54   0.590    -7.599419    4.326919
       _cons |   147.6565   14.60293    10.11   0.000     118.9758    176.3371
------------------------------------------------------------------------------
In [20]:
predict prod_xb, xb
predict prod_r, residual
predict prod_rstand, rstandard
predict prod_rstud, rstudent

I.1 Unusual values

Three types of values:

  • outliers: values that fall outside the usual range;
  • high-leverage values: values that can change the explanatory power of the regressors;
  • influential values: values that change the regression results.
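The three categories can be flagged side by side from the same fitted model. The following is a minimal sketch, assuming Stata's bundled `auto` dataset in place of effectdata.dta; the cutoffs (|studentized residual| > 2, leverage > (2k+2)/n, Cook's D > 4/n) are conventional rules of thumb and are not taken from this notebook, and the flag variable names are hypothetical.

```stata
* Illustrative sketch on Stata's bundled auto data (not effectdata.dta)
sysuse auto, clear
regress price mpg weight length
scalar n = e(N)
scalar k = e(df_m)          // number of regressors, excluding the constant

predict rstud, rstudent     // studentized residuals
predict lev, leverage       // diagonal of the hat matrix
predict cook, cooksd        // Cook's distance

* Rule-of-thumb flags (conventional cutoffs, assumed for illustration)
generate byte is_outlier  = abs(rstud) > 2    if !missing(rstud)
generate byte is_leverage = lev > (2*k + 2)/n if !missing(lev)
generate byte is_influent = cook > 4/n        if !missing(cook)

* Cross-tabulate outliers against high-leverage points, then list influential ones
tabulate is_outlier is_leverage
list make rstud lev cook if is_influent == 1
```

An observation can be an outlier without being influential, and vice versa, which is why the three flags are kept separate.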
In [34]:
graph matrix PROD SEM REV AGE
graph export matgr.png, as(png) replace

file matgr.png saved as PNG format

No particularly unusual values stand out.

In [35]:
* Outliers in the residuals
stem prod_r
Stem-and-leaf plot for prod_r (Residuals)

prod_r rounded to integers

 -8* | 410
 -7. | 8776
 -7* | 433321110
 -6. | 9998877776665
 -6* | 443332200000
 -5. | 987655
 -5* | 44433332111110
 -4. | 998766655555
 -4* | 444433221110000
 -3. | 99998888777666665555555
 -3* | 4444333333222100000
 -2. | 999999888877776666555555
 -2* | 444443333333333222222221111111000000000
 -1. | 99999999888888887777766666666655555555
 -1* | 44433333222221110000
 -0. | 9999999998887777776666655
 -0* | 4444444433333222111111
  0* | 000001111111111222333333344444
  0. | 55556666677777777888899
  1* | 0001111111222222333334444
  1. | 55566666677777777888889999
  2* | 000001112222333334444444
  2. | 555555556778888888888999999999
  3* | 0000000001222223333444444
  3. | 5555555666677777788888999
  4* | 00000111111222334444444
  4. | 555556667888999
  5* | 00111222223333444
  5. | 55555666778888899
  6* | 0011222333
  6. | 5566688
  7* | 23444
  7. | 9
  8* | 4
In [36]:
* High-leverage values
predict prod_lev, leverage
stem prod_lev


Stem-and-leaf plot for prod_lev (Leverage)

prod_lev rounded to nearest multiple of .001
plot in units of .001

   1. | 56666666666667777777777778888888888888888888888899999999999999 ... (77)
   2* | 0000000000000000000000000000000011111111111111111111111111111 ... (174)
   2. | 5555555555555555555555555555555555556666666666666666666666666 ... (130)
   3* | 00000000000000000000000000111111111111111111122222222222222233 ... (87)
   3. | 555555555555555556666666666666677777777777788888999999
   4* | 000000000111111122222233344
   4. | 666777788899
   5* | 0011111222224444
   5. | 6667777899
   6* | 2444
   6. | 
   7* | 
   7. | 
   8* | 
   8. | 
   9* | 
   9. | 
  10* | 
  10. | 579
  11* | 1134
  11. | 79
  12* | 4
  12. | 8
In [38]:
lvr2plot, mlabel(ID)
graph export lvr2.png, as(png) replace

file lvr2.png saved as PNG format

In [41]:
* Cook's D and DFITS: distance measures for assessing influential observations
predict prod_d, cooksd
In [44]:
sum prod_d
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      prod_d |        602    .0017268     .002373   2.07e-09   .0253942
In [45]:
gsort -prod_d
list ID prod_d if prod_d > 1/100


     +-----------------+
     |   ID     prod_d |
     |-----------------|
  1. | Z54P   .0253942 |
  2. | F67G    .021861 |
  3. | F14G   .0135174 |
  4. | S26T   .0112482 |
  5. | C33Q   .0103544 |
     +-----------------+
In [48]:
predict prod_dfits, dfits
gsort -prod_dfits
list ID prod_dfits if abs(prod_dfits) > 2*sqrt(17/602)


     +------------------+
     |   ID   prod_df~s |
     |------------------|
  1. | Z54P    .6773206 |
  2. | E57P    .4163871 |
  3. | Z35W    .4144422 |
  4. | O85E    .4009565 |
  5. | O80M    .3929912 |
     |------------------|
  6. | B78N    .3768462 |
  7. | Z21U    .3530808 |
  8. | A95L     .341955 |
  9. | R51T    .3390989 |
 10. | A46U    .3375764 |
     |------------------|
589. | F51Z   -.3423429 |
590. | A99E   -.3560463 |
591. | P45U   -.3574846 |
592. | J42J    -.361938 |
593. | T27M   -.3812206 |
     |------------------|
594. | O32F   -.3861555 |
595. | M14R   -.4055046 |
596. | I47A   -.4097126 |
597. | M58X   -.4139673 |
598. | D97K    -.417295 |
     |------------------|
599. | C33Q   -.4319074 |
600. | S26T   -.4508269 |
601. | F14G    -.494814 |
602. | F67G   -.6285325 |
     +------------------+
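The cutoff in the `list` command above is the usual 2*sqrt(k/n) rule for DFITS. A short worked calculation, using n = 602 and k = 17 from the regression output, makes the number explicit; the 4/n value for Cook's D is shown only as a common rule of thumb for comparison, since this notebook uses 1/100 instead. The scalar names are hypothetical.

```stata
* Values taken from the regression output: 602 observations, 17 model df
scalar n = 602
scalar k = 17

scalar dfits_cut  = 2*sqrt(k/n)   // cutoff used in the list command
scalar cooksd_cut = 4/n           // common rule of thumb (this notebook uses 1/100)

display "DFITS cutoff:    " %6.4f dfits_cut
display "Cook's D cutoff: " %6.4f cooksd_cut
```

The DFITS cutoff comes to roughly 0.336. The listing shows 10 observations above it and 14 below its negative, 24 in total, which is consistent with the 578 observations (602 − 24) in the regression that excludes them.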
In [49]:
* how each coefficient is changed by deleting the observation
dfbeta
Generating DFBETA variables ...

    _dfbeta_1: DFBETA SEM
    _dfbeta_2: DFBETA REV
    _dfbeta_3: DFBETA 2.TYPSEM
    _dfbeta_4: DFBETA 3.TYPSEM
    _dfbeta_5: DFBETA 2.EDU
    _dfbeta_6: DFBETA 3.EDU
    _dfbeta_7: DFBETA 4.EDU
    _dfbeta_8: DFBETA 2.SITMAT
    _dfbeta_9: DFBETA 3.SITMAT
   _dfbeta_10: DFBETA 4.SITMAT
   _dfbeta_11: DFBETA 5.SITMAT
   _dfbeta_12: DFBETA AGE
   _dfbeta_13: DFBETA MEN
   _dfbeta_14: DFBETA PARC
   _dfbeta_15: DFBETA MAT
   _dfbeta_16: DFBETA CRED
   _dfbeta_17: DFBETA FORM
In [50]:
scatter _dfbeta_*, ylabel(-1(.5)3) yline(.28 -.28)
graph export nuagedf.png, as(png) replace

file nuagedf.png saved as PNG format
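A frequently cited rule of thumb flags a DFBETA when its absolute value exceeds 2/sqrt(n). With 602 observations that would be roughly 0.08, noticeably tighter than the ±.28 reference lines drawn above. The sketch below shows one way to apply the rule; it assumes Stata's bundled `auto` dataset rather than effectdata.dta, and `flag_any` is a hypothetical variable name.

```stata
* Illustrative sketch on Stata's bundled auto data (not effectdata.dta)
sysuse auto, clear
regress price mpg weight
scalar n = e(N)

dfbeta                            // creates _dfbeta_1 (mpg) and _dfbeta_2 (weight)
scalar dfbeta_cut = 2/sqrt(n)     // rule-of-thumb cutoff, assumed for illustration

* Flag observations whose deletion shifts either coefficient by more than the cutoff
generate byte flag_any = (abs(_dfbeta_1) > dfbeta_cut | abs(_dfbeta_2) > dfbeta_cut) ///
    if !missing(_dfbeta_1, _dfbeta_2)
list make _dfbeta_1 _dfbeta_2 if flag_any == 1
```

Listing the flagged rows by identifier is often easier to read than a scatter of all DFBETA variables at once when the model has many regressors.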

In [51]:
* avplot: added-variable plot or partial-regression plot for identifying influential points
avplot SEM, mlabel(ID)
graph export avpsem.png, as(png) replace

file avpsem.png saved as PNG format

In [ ]:
avplots
graph export avpls.png, as(png) replace

In [56]:
regress PROD SEM REV i.TYPSEM i.EDU AGE MEN PARC MAT CRED FORM if abs(prod_dfits) <= 2*sqrt(17/602)
      Source |       SS           df       MS      Number of obs   =       578
-------------+----------------------------------   F(13, 564)      =     18.57
       Model |  294558.409        13  22658.3392   Prob > F        =    0.0000
    Residual |  688036.663       564  1219.92316   R-squared       =    0.2998
-------------+----------------------------------   Adj R-squared   =    0.2836
       Total |  982595.073       577  1702.93773   Root MSE        =    34.927

------------------------------------------------------------------------------
        PROD | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         SEM |   .0421508   .0062292     6.77   0.000     .0299155    .0543861
         REV |  -.0297814   .0186709    -1.60   0.111    -.0664544    .0068915
             |
      TYPSEM |
  Améliorée  |   1.110064   3.503327     0.32   0.751    -5.771097    7.991226
      Mixte  |  -2.100659   3.632077    -0.58   0.563    -9.234708     5.03339
             |
         EDU |
   Primaire  |  -14.62977    4.08162    -3.58   0.000     -22.6468   -6.612733
      Moyen  |   -14.3826   4.198691    -3.43   0.001    -22.62958   -6.135615
 Secondaire  |   1.478126   4.091624     0.36   0.718    -6.558555    9.514807
             |
         AGE |  -.2969653   .2096174    -1.42   0.157    -.7086914    .1147607
         MEN |   .5886134   .7216209     0.82   0.415    -.8287792    2.006006
        PARC |   1.454369   1.044087     1.39   0.164    -.5964049    3.505143
         MAT |  -.4751098   .9413538    -0.50   0.614    -2.324097    1.373878
        CRED |   7.961222   3.510189     2.27   0.024     1.066583    14.85586
        FORM |  -2.198019   2.922501    -0.75   0.452    -7.938334    3.542295
       _cons |   150.3307   14.32676    10.49   0.000     122.1903     178.471
------------------------------------------------------------------------------
In [58]:
estimates store mymodel

I.2 Normal distribution of the residuals

Histogram, normal P-P plot, or normal Q-Q plot of the studentized residuals.
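The three graphical checks named here map onto standard Stata commands. The following is a minimal sketch, assuming Stata's bundled `auto` dataset in place of effectdata.dta; the Shapiro-Wilk test at the end is an optional formal complement that this notebook does not itself use.

```stata
* Illustrative sketch on Stata's bundled auto data (not effectdata.dta)
sysuse auto, clear
regress price mpg weight length
predict rstud, rstudent     // studentized residuals

histogram rstud, normal     // histogram with a normal density overlaid
pnorm rstud                 // standardized normal probability (P-P) plot
qnorm rstud                 // quantile-normal (Q-Q) plot

* Optional formal test (an addition, not part of the original workflow)
swilk rstud                 // Shapiro-Wilk test; H0: residuals are normal
scalar sw_p = r(p)
```

The P-P plot is more sensitive to departures in the middle of the distribution, while the Q-Q plot is more sensitive in the tails, so looking at both is usually worthwhile.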

In [ ]:
kdensity prod_r, normal
graph export kdens.png, as(png) replace
In [ ]:
* residual vs fitted plot
rvfplot //avplot, rvfplot, and rvpplot
In [37]:
testparm i.agedec
 ( 1)  2.agedec = 0
 ( 2)  3.agedec = 0
 ( 3)  4.agedec = 0
 ( 4)  5.agedec = 0
 ( 5)  6.agedec = 0
 ( 6)  7.agedec = 0

       F(  6, 40976) =   20.61
            Prob > F =    0.0000
In [40]:
*limit our examination to only the nonlinear effects
contrast p(2/7).agedec
Contrasts of marginal linear predictions

Margins: asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
      agedec |
(quadratic)  |          1       26.65     0.0000
    (cubic)  |          1       52.99     0.0000
  (quartic)  |          1        1.18     0.2778
  (quintic)  |          1        1.00     0.3179
   (sextic)  |          1        0.15     0.6959
   (septic)  |          1        0.02     0.8748
      Joint  |          6       20.61     0.0000
             |
 Denominator |      40976
------------------------------------------------

--------------------------------------------------------------
             |   Contrast   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
      agedec |
(quadratic)  |  -.0384272   .0074435     -.0530165   -.0238378
    (cubic)  |   .0466045   .0064025      .0340554    .0591536
  (quartic)  |   .0056375    .005194     -.0045429    .0158178
  (quintic)  |  -.0042917   .0042968     -.0127134      .00413
   (sextic)  |   .0014878   .0038067     -.0059734    .0089489
   (septic)  |  -.0005636    .003578     -.0075765    .0064494
--------------------------------------------------------------
In [ ]:
marginsplot, recast(line) recastci(rarea)
Variables that uniquely identify margins: age
In [54]:
cap drop age_1 age_2
fp <age>: regress educ <age>

(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)

Fractional polynomial comparisons:
------------------------------------------------------------------------------
             | Test              Residual   Deviance
         age |   df   Deviance   std. dev.      diff.       P   Powers
-------------+----------------------------------------------------------------
     omitted |    4  281982.95      3.179   4174.442    0.000               
      linear |    3  279481.55      3.107   1673.048    0.000   1           
       m = 1 |    2  278692.89      3.085    884.386    0.000   3           
       m = 2 |    0  277808.51      3.060      0.000       --   -2 .5       
------------------------------------------------------------------------------
Note: Test df is degrees of freedom, and P = P > F is sig. level for tests
      comparing models vs. model with m = 2 based on deviance difference,
      F(df, 54741).

      Source |       SS           df       MS      Number of obs   =    54,746
-------------+----------------------------------   F(2, 54743)     =   2168.74
       Model |   40608.203         2  20304.1015   Prob > F        =    0.0000
    Residual |  512512.941    54,743  9.36216394   R-squared       =    0.0734
-------------+----------------------------------   Adj R-squared   =    0.0734
       Total |  553121.144    54,745   10.103592   Root MSE        =    3.0598

------------------------------------------------------------------------------
        educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       age_1 |  -2060.389   45.13987   -45.64   0.000    -2148.864   -1971.915
       age_2 |  -1.367772   .0219094   -62.43   0.000    -1.410714   -1.324829
       _cons |   23.36296   .1769524   132.03   0.000     23.01613    23.70978
------------------------------------------------------------------------------