Introduction
Assumptions of the linear model:
- continuous dependent variable;
- two or more explanatory variables (with exactly two categorical ones, the model amounts to a two-way or factorial ANOVA);
- independence of the observations or of the residuals (Durbin-Watson test);
- linear relationship between the dependent variable and the explanatory variables, both jointly and individually;
- homoscedasticity of the residuals (scatter plot of the studentized residuals against the unstandardized predicted values);
- no multicollinearity (correlation coefficients, or Tolerance/VIF values);
- no outliers;
- normally distributed residuals (histogram, normal P-P plot or normal Q-Q plot of the studentized residuals).
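As a rough illustration, several of these checks map onto standard `regress` postestimation commands. The sketch below runs on Stata's built-in auto dataset rather than on effectdata.dta, so that it can be reproduced anywhere. The model `price mpg weight foreign`, the variable `t` and the scalar names are illustrative assumptions. Using row order as a time index for the Durbin-Watson statistic is also an assumption, and it is only meaningful when the observations have a natural ordering.

```stata
preserve                          // keep whatever data are currently in memory
sysuse auto, clear                // built-in demo data (assumption: not effectdata.dta)

* Row order as a stand-in time index (assumption), needed by estat dwatson
generate t = _n
tsset t

regress price mpg weight foreign

* Homoscedasticity: Breusch-Pagan / Cook-Weisberg test (H0: constant variance)
estat hettest
scalar p_het = r(p)

* Linearity / functional form: Ramsey RESET test
estat ovtest
scalar p_reset = r(p)

* Multicollinearity: VIF and tolerance (1/VIF)
estat vif
scalar vif_first = r(vif_1)

* Independence of residuals: Durbin-Watson d statistic
estat dwatson
scalar dw = r(dw)

restore                           // bring back the original data
```

A small p-value from `estat hettest` or `estat ovtest` points to a violation of the corresponding assumption, and a Durbin-Watson statistic near 2 is consistent with uncorrelated residuals.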
In [7]:
use effectdata.dta, clear
(Base fictive pour l'évaluation d'impact (Par Ibrahima TALL))
In [8]:
notes: Simulation pour l'évaluation d'impact
In [9]:
describe
Contains data from effectdata.dta
 Observations:           602                  Base fictive pour l'évaluation
                                                d'impact (Par Ibrahima TALL)
    Variables:            15                  7 Nov 2023 14:10
                                              (_dta has notes)
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
ID              str4    %9s                   Identifiant du questionnaire
REG             float   %9.0g      REG        Région administrative
CAMP            float   %9.0g                 Campagne (Années) agricole
EDU             float   %9.0g      EDU        Education du CM
AGE             float   %9.0g                 Age du CM
MEN             float   %9.0g                 Taille du ménage
PARC            float   %9.0g                 Nombre de parcelles agricoles
MAT             float   %9.0g                 Nombre de matériels agricoles
CRED            float   %9.0g      OK         Crédit agricole
TYPSEM          float   %9.0g      TYPSEM     Type de semence utilisé
FORM            float   %9.0g      OK         Formation en pratiques agricoles
SITMAT          float   %9.0g      SITMAT     Situation matrimoniale du CM
SEM             float   %9.0g                 Quant de semence reçue de l'Etat
PROD            float   %9.0g                 Production agricole en tonnes
REV             float   %9.0g                 Revenu Moyen (en milliers)
--------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
I. Linear model
In [46]:
regress PROD SEM REV i.TYPSEM i.EDU i.SITMAT AGE MEN PARC MAT CRED FORM
Source | SS df MS Number of obs = 602
-------------+---------------------------------- F(17, 584) = 13.44
Model | 312023.262 17 18354.3095 Prob > F = 0.0000
Residual | 797540.107 584 1365.65087 R-squared = 0.2812
-------------+---------------------------------- Adj R-squared = 0.2603
Total | 1109563.37 601 1846.19529 Root MSE = 36.955
------------------------------------------------------------------------------
PROD | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
SEM | .0427787 .0063856 6.70 0.000 .0302373 .0553202
REV | -.0329684 .0193828 -1.70 0.089 -.0710369 .0051001
|
TYPSEM |
Améliorée | .9009478 3.641197 0.25 0.805 -6.250488 8.052384
Mixte | -2.322426 3.785714 -0.61 0.540 -9.757698 5.112846
|
EDU |
Primaire | -13.19697 4.246306 -3.11 0.002 -21.53686 -4.857075
Moyen | -11.2447 4.361401 -2.58 0.010 -19.81064 -2.678756
Secondaire | .7141498 4.253914 0.17 0.867 -7.640683 9.068983
|
SITMAT |
Monogame | 2.554427 3.750966 0.68 0.496 -4.8126 9.921453
Polygame | -6.709478 5.067001 -1.32 0.186 -16.66124 3.242287
Veuf | 3.068596 7.076381 0.43 0.665 -10.82966 16.96685
Divorcé | -13.34076 11.60691 -1.15 0.251 -36.13714 9.455615
|
AGE | -.257942 .2145425 -1.20 0.230 -.6793107 .1634268
MEN | .7636438 .7424602 1.03 0.304 -.6945736 2.221861
PARC | 1.433503 1.074874 1.33 0.183 -.677586 3.544593
MAT | -.7227274 .9731831 -0.74 0.458 -2.634093 1.188638
CRED | 6.110468 3.667312 1.67 0.096 -1.09226 13.3132
FORM | -1.63625 3.036183 -0.54 0.590 -7.599419 4.326919
_cons | 147.6565 14.60293 10.11 0.000 118.9758 176.3371
------------------------------------------------------------------------------
In [20]:
predict prod_xb, xb
predict prod_r, residual
predict prod_rstand, rstandard
predict prod_rstud, rstudent
I.1 Unusual values
Three types of values:
- outliers: values that fall outside the usual range;
- high-leverage values: values that can change the explanatory power of the regressors;
- influential values: values that change the results of the regression.
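The three categories can be flagged side by side with commonly quoted rules of thumb: |studentized residual| > 2 for outliers, leverage > (2k+2)/n for high-leverage points, and Cook's D > 4/n for influential points. These cutoffs are conventions rather than formal tests. The sketch below uses Stata's built-in auto dataset, and its model, variable names and scalar names are illustrative assumptions, not part of this notebook.

```stata
preserve                          // keep whatever data are currently in memory
sysuse auto, clear                // built-in demo data (assumption: not effectdata.dta)
regress price mpg weight foreign
local n = e(N)                    // number of observations used
local k = e(df_m)                 // number of regressors, constant excluded

predict double rstu, rstudent     // studentized residuals
predict double lev, leverage      // hat values
predict double cook, cooksd       // Cook's distance

* Rule-of-thumb flags (conventions, not strict tests)
generate byte is_outlier  = abs(rstu) > 2 if !missing(rstu)
generate byte is_leverage = lev > (2*`k' + 2)/`n' if !missing(lev)
generate byte is_influent = cook > 4/`n' if !missing(cook)

list make rstu lev cook if is_outlier | is_leverage | is_influent, noobs

* The mean leverage always equals (k+1)/n, a quick sanity check
quietly summarize lev
scalar mean_lev = r(mean)
quietly count if is_influent
scalar n_influent = r(N)

restore                           // bring back the original data
```

An observation can be an outlier without being influential, and the reverse, which is why the three flags are kept separate.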
In [34]:
graph matrix PROD SEM REV AGE
graph export matgr.png, as(png) replace
(file matgr.png not found)
file matgr.png saved as PNG format

No particularly unusual values stand out in the scatterplot matrix.
In [35]:
* Outliers in the residuals
stem prod_r
Stem-and-leaf plot for prod_r (Residuals)

prod_r rounded to integers

  -8* | 410
  -7. | 8776
  -7* | 433321110
  -6. | 9998877776665
  -6* | 443332200000
  -5. | 987655
  -5* | 44433332111110
  -4. | 998766655555
  -4* | 444433221110000
  -3. | 99998888777666665555555
  -3* | 4444333333222100000
  -2. | 999999888877776666555555
  -2* | 444443333333333222222221111111000000000
  -1. | 99999999888888887777766666666655555555
  -1* | 44433333222221110000
  -0. | 9999999998887777776666655
  -0* | 4444444433333222111111
   0* | 000001111111111222333333344444
   0. | 55556666677777777888899
   1* | 0001111111222222333334444
   1. | 55566666677777777888889999
   2* | 000001112222333334444444
   2. | 555555556778888888888999999999
   3* | 0000000001222223333444444
   3. | 5555555666677777788888999
   4* | 00000111111222334444444
   4. | 555556667888999
   5* | 00111222223333444
   5. | 55555666778888899
   6* | 0011222333
   6. | 5566688
   7* | 23444
   7. | 9
   8* | 4
In [36]:
* High-leverage values
predict prod_lev, leverage
stem prod_lev
Stem-and-leaf plot for prod_lev (Leverage)

prod_lev rounded to nearest multiple of .001
plot in units of .001

   1. | 56666666666667777777777778888888888888888888888899999999999999 ... (77)
   2* | 0000000000000000000000000000000011111111111111111111111111111 ... (174)
   2. | 5555555555555555555555555555555555556666666666666666666666666 ... (130)
   3* | 00000000000000000000000000111111111111111111122222222222222233 ... (87)
   3. | 555555555555555556666666666666677777777777788888999999
   4* | 000000000111111122222233344
   4. | 666777788899
   5* | 0011111222224444
   5. | 6667777899
   6* | 2444
   6. |
   7* |
   7. |
   8* |
   8. |
   9* |
   9. |
  10* |
  10. | 579
  11* | 1134
  11. | 79
  12* | 4
  12. | 8
In [38]:
lvr2plot, mlabel(ID)
graph export lvr2.png, as(png) replace
(file lvr2.png not found)
file lvr2.png saved as PNG format

In [41]:
* Cook's D and DFITS: distance measures for assessing influential observations
predict prod_d, cooksd
In [44]:
sum prod_d
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
prod_d | 602 .0017268 .002373 2.07e-09 .0253942
In [45]:
gsort -prod_d
list ID prod_d if prod_d > 1/100
+-----------------+
| ID prod_d |
|-----------------|
1. | Z54P .0253942 |
2. | F67G .021861 |
3. | F14G .0135174 |
4. | S26T .0112482 |
5. | C33Q .0103544 |
+-----------------+
In [48]:
predict prod_dfits, dfits
gsort -prod_dfits
list ID prod_dfits if abs(prod_dfits) > 2*sqrt(17/602)
+------------------+
| ID prod_df~s |
|------------------|
1. | Z54P .6773206 |
2. | E57P .4163871 |
3. | Z35W .4144422 |
4. | O85E .4009565 |
5. | O80M .3929912 |
|------------------|
6. | B78N .3768462 |
7. | Z21U .3530808 |
8. | A95L .341955 |
9. | R51T .3390989 |
10. | A46U .3375764 |
|------------------|
589. | F51Z -.3423429 |
590. | A99E -.3560463 |
591. | P45U -.3574846 |
592. | J42J -.361938 |
593. | T27M -.3812206 |
|------------------|
594. | O32F -.3861555 |
595. | M14R -.4055046 |
596. | I47A -.4097126 |
597. | M58X -.4139673 |
598. | D97K -.417295 |
|------------------|
599. | C33Q -.4319074 |
600. | S26T -.4508269 |
601. | F14G -.494814 |
602. | F67G -.6285325 |
+------------------+
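The cutoffs used above (1/100 for Cook's D and 2*sqrt(17/602) for DFITS) are typed in by hand. A possible alternative is to derive them from the results stored by `regress`, so that they follow the model automatically. For the first model in this notebook, e(N) is 602 and e(df_m) is 17, so the DFITS expression below reproduces 2*sqrt(17/602). The 4/n threshold for Cook's D is a commonly cited convention and differs from the 1/100 used above. The sketch runs on Stata's built-in auto dataset, and all variable and scalar names are illustrative assumptions.

```stata
preserve                                 // keep whatever data are currently in memory
sysuse auto, clear                       // built-in demo data (assumption)
regress price mpg weight foreign

scalar n_obs     = e(N)                  // observations used in the fit
scalar k_reg     = e(df_m)               // regressors, constant excluded
scalar cut_cook  = 4/n_obs               // common rule of thumb for Cook's D
scalar cut_dfits = 2*sqrt(k_reg/n_obs)   // same form as 2*sqrt(17/602)

predict double cook_d, cooksd
predict double dfits, dfits

gsort -cook_d                            // largest Cook's D first
list make cook_d if cook_d > cut_cook, noobs
list make dfits if abs(dfits) > cut_dfits, noobs

restore                                  // bring back the original data
```

Some texts count the constant in k, which makes the DFITS cutoff slightly larger. Either way, these thresholds only indicate which observations deserve a closer look.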
In [49]:
* how each coefficient is changed by deleting the observation
dfbeta
Generating DFBETA variables ...
_dfbeta_1: DFBETA SEM
_dfbeta_2: DFBETA REV
_dfbeta_3: DFBETA 2.TYPSEM
_dfbeta_4: DFBETA 3.TYPSEM
_dfbeta_5: DFBETA 2.EDU
_dfbeta_6: DFBETA 3.EDU
_dfbeta_7: DFBETA 4.EDU
_dfbeta_8: DFBETA 2.SITMAT
_dfbeta_9: DFBETA 3.SITMAT
_dfbeta_10: DFBETA 4.SITMAT
_dfbeta_11: DFBETA 5.SITMAT
_dfbeta_12: DFBETA AGE
_dfbeta_13: DFBETA MEN
_dfbeta_14: DFBETA PARC
_dfbeta_15: DFBETA MAT
_dfbeta_16: DFBETA CRED
_dfbeta_17: DFBETA FORM
In [50]:
* DFBETA cutoff 2/sqrt(n): with n = 602 this is about .08
generate obs_index = _n
scatter _dfbeta_* obs_index, yline(.08 -.08)
graph export nuagedf.png, as(png) replace
(file nuagedf.png not found)
file nuagedf.png saved as PNG format
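With many DFBETA variables, a count per coefficient can be easier to read than a single scatter plot. The sketch below counts, for each coefficient, the observations whose |DFBETA| exceeds the size-adjusted rule of thumb 2/sqrt(n). It runs on Stata's built-in auto dataset, and the model and the scalar names are illustrative assumptions.

```stata
preserve                                 // keep whatever data are currently in memory
sysuse auto, clear                       // built-in demo data (assumption)
regress price mpg weight foreign

scalar cut_dfb = 2/sqrt(e(N))            // size-adjusted rule of thumb
dfbeta                                   // creates _dfbeta_1, _dfbeta_2, ...

scalar n_checked = 0
foreach v of varlist _dfbeta_* {
    * Count observations whose removal shifts this coefficient noticeably
    quietly count if abs(`v') > cut_dfb & !missing(`v')
    display "`v' (`: variable label `v''): " r(N) " observation(s) above cutoff"
    scalar n_checked = n_checked + 1
}

restore                                  // bring back the original data
```

Coefficients with many flagged observations are the ones whose estimates depend most on individual data points.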


In [51]:
* avplot: added-variable plot or partial-regression plot for identifying influential points
avplot SEM, mlabel(ID)
graph export avpsem.png, as(png) replace
(file avpsem.png not found)
file avpsem.png saved as PNG format

In [ ]:
avplots
graph export avpls.png, as(png) replace

In [56]:
regress PROD SEM REV i.TYPSEM i.EDU AGE MEN PARC MAT CRED FORM if abs(prod_dfits) <= 2*sqrt(17/602)
Source | SS df MS Number of obs = 578
-------------+---------------------------------- F(13, 564) = 18.57
Model | 294558.409 13 22658.3392 Prob > F = 0.0000
Residual | 688036.663 564 1219.92316 R-squared = 0.2998
-------------+---------------------------------- Adj R-squared = 0.2836
Total | 982595.073 577 1702.93773 Root MSE = 34.927
------------------------------------------------------------------------------
PROD | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
SEM | .0421508 .0062292 6.77 0.000 .0299155 .0543861
REV | -.0297814 .0186709 -1.60 0.111 -.0664544 .0068915
|
TYPSEM |
Améliorée | 1.110064 3.503327 0.32 0.751 -5.771097 7.991226
Mixte | -2.100659 3.632077 -0.58 0.563 -9.234708 5.03339
|
EDU |
Primaire | -14.62977 4.08162 -3.58 0.000 -22.6468 -6.612733
Moyen | -14.3826 4.198691 -3.43 0.001 -22.62958 -6.135615
Secondaire | 1.478126 4.091624 0.36 0.718 -6.558555 9.514807
|
AGE | -.2969653 .2096174 -1.42 0.157 -.7086914 .1147607
MEN | .5886134 .7216209 0.82 0.415 -.8287792 2.006006
PARC | 1.454369 1.044087 1.39 0.164 -.5964049 3.505143
MAT | -.4751098 .9413538 -0.50 0.614 -2.324097 1.373878
CRED | 7.961222 3.510189 2.27 0.024 1.066583 14.85586
FORM | -2.198019 2.922501 -0.75 0.452 -7.938334 3.542295
_cons | 150.3307 14.32676 10.49 0.000 122.1903 178.471
------------------------------------------------------------------------------
In [58]:
estimates store mymodel
I.2 Normality of the residuals
Histogram, normal P-P plot or normal Q-Q plot of the studentized residuals.
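The three graphs named above, together with a formal test, could be produced along the following lines. This is a minimal sketch on Stata's built-in auto dataset. The model, the variable `rstu` and the scalar names are illustrative assumptions, and the Shapiro-Wilk test is an addition that the list above does not mention.

```stata
preserve                             // keep whatever data are currently in memory
sysuse auto, clear                   // built-in demo data (assumption)
regress price mpg weight foreign
predict double rstu, rstudent        // studentized residuals

histogram rstu, normal               // histogram with a normal density overlay
pnorm rstu                           // normal P-P plot (sensitive in the middle)
qnorm rstu                           // normal Q-Q plot (sensitive in the tails)

swilk rstu                           // Shapiro-Wilk test, H0: normality
scalar p_sw = r(p)
scalar W_sw = r(W)

restore                              // bring back the original data
```

In large samples the test tends to reject for small departures from normality, so the graphs usually carry more weight than the p-value.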
In [ ]:
kdensity prod_r, normal
graph export kdens.png, as(png) replace
In [ ]:
* residual vs fitted plot
rvfplot //avplot, rvfplot, and rvpplot
In [37]:
testparm i.agedec
( 1) 2.agedec = 0
( 2) 3.agedec = 0
( 3) 4.agedec = 0
( 4) 5.agedec = 0
( 5) 6.agedec = 0
( 6) 7.agedec = 0
F( 6, 40976) = 20.61
Prob > F = 0.0000
In [40]:
*limit our examination to only the nonlinear effects
contrast p(2/7).agedec
Contrasts of marginal linear predictions
Margins: asbalanced
------------------------------------------------
| df F P>F
-------------+----------------------------------
agedec |
(quadratic) | 1 26.65 0.0000
(cubic) | 1 52.99 0.0000
(quartic) | 1 1.18 0.2778
(quintic) | 1 1.00 0.3179
(sextic) | 1 0.15 0.6959
(septic) | 1 0.02 0.8748
Joint | 6 20.61 0.0000
|
Denominator | 40976
------------------------------------------------
--------------------------------------------------------------
| Contrast Std. err. [95% conf. interval]
-------------+------------------------------------------------
agedec |
(quadratic) | -.0384272 .0074435 -.0530165 -.0238378
(cubic) | .0466045 .0064025 .0340554 .0591536
(quartic) | .0056375 .005194 -.0045429 .0158178
(quintic) | -.0042917 .0042968 -.0127134 .00413
(sextic) | .0014878 .0038067 -.0059734 .0089489
(septic) | -.0005636 .003578 -.0075765 .0064494
--------------------------------------------------------------
In [ ]:
marginsplot, recast(line) recastci(rarea)
Variables that uniquely identify margins: age
In [54]:
cap drop age_1 age_2
fp <age>: regress educ <age>
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
------------------------------------------------------------------------------
| Test Residual Deviance
age | df Deviance std. dev. diff. P Powers
-------------+----------------------------------------------------------------
omitted | 4 281982.95 3.179 4174.442 0.000
linear | 3 279481.55 3.107 1673.048 0.000 1
m = 1 | 2 278692.89 3.085 884.386 0.000 3
m = 2 | 0 277808.51 3.060 0.000 -- -2 .5
------------------------------------------------------------------------------
Note: Test df is degrees of freedom, and P = P > F is sig. level for tests
comparing models vs. model with m = 2 based on deviance difference,
F(df, 54741).
Source | SS df MS Number of obs = 54,746
-------------+---------------------------------- F(2, 54743) = 2168.74
Model | 40608.203 2 20304.1015 Prob > F = 0.0000
Residual | 512512.941 54,743 9.36216394 R-squared = 0.0734
-------------+---------------------------------- Adj R-squared = 0.0734
Total | 553121.144 54,745 10.103592 Root MSE = 3.0598
------------------------------------------------------------------------------
educ | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age_1 | -2060.389 45.13987 -45.64 0.000 -2148.864 -1971.915
age_2 | -1.367772 .0219094 -62.43 0.000 -1.410714 -1.324829
_cons | 23.36296 .1769524 132.03 0.000 23.01613 23.70978
------------------------------------------------------------------------------