The Sashelp.JunkMail data set comes from a study that classifies whether an e-mail is junk e-mail (coded as 1) or not (coded as 0). The data were collected at Hewlett-Packard Labs and donated by George Forman. The data set contains 4,601 observations and 59 variables. The response variable, Class, is a binary indicator of whether an e-mail is considered junk. There are 57 predictor variables that record the frequencies of some common words and characters and the lengths of uninterrupted sequences of capital letters in the e-mails. The remaining variable, Test, indicates whether an observation belongs to the training set (0) or the test set (1).
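
As a quick check on the coding described above, the following minimal sketch cross-tabulates the partition indicator Test (labeled 0 - Training, 1 - Test in the variable listing) against the response Class. It is an illustration, not part of the original example. The NOROW, NOCOL, and NOPERCENT options are there only to keep the table to raw counts.

```sas
/* Sketch: count junk and non-junk e-mails in each partition.   */
/* Test:  0 = training, 1 = test (from the variable label).     */
/* Class: 0 = not junk, 1 = junk (from the variable label).     */
title 'Junk E-mail Counts by Partition';
proc freq data=sashelp.JunkMail;
   tables Test*Class / norow nocol nopercent;
run;
```

The four cell counts should sum to the 4,601 observations in the data set.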
The following steps display information about the Sashelp.JunkMail data set and create Figure B.11:
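title 'Junk E-mail Data';
/* List the variables in creation order; show only the Position table */
proc contents data=sashelp.JunkMail varnum;
   ods select position;
run;

/* Print the first five observations with horizontal column headings */
title 'The First Five Observations Out of 4,601';
proc print data=sashelp.JunkMail(obs=5) heading=horizontal;
run;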
Figure B.11: Junk E-mail Data
# | Variable | Type | Len | Label |
1 | Test | Num | 8 | 0 - Training, 1 - Test |
2 | Make | Num | 8 | |
3 | Address | Num | 8 | |
4 | All | Num | 8 | |
5 | _3D | Num | 8 | 3D |
6 | Our | Num | 8 | |
7 | Over | Num | 8 | |
8 | Remove | Num | 8 | |
9 | Internet | Num | 8 | |
10 | Order | Num | 8 | |
11 | Mail | Num | 8 | |
12 | Receive | Num | 8 | |
13 | Will | Num | 8 | |
14 | People | Num | 8 | |
15 | Report | Num | 8 | |
16 | Addresses | Num | 8 | |
17 | Free | Num | 8 | |
18 | Business | Num | 8 | |
19 | Email | Num | 8 | |
20 | You | Num | 8 | |
21 | Credit | Num | 8 | |
22 | Your | Num | 8 | |
23 | Font | Num | 8 | |
24 | _000 | Num | 8 | 000 |
25 | Money | Num | 8 | |
26 | HP | Num | 8 | |
27 | HPL | Num | 8 | |
28 | George | Num | 8 | |
29 | _650 | Num | 8 | 650 |
30 | Lab | Num | 8 | |
31 | Labs | Num | 8 | |
32 | Telnet | Num | 8 | |
33 | _857 | Num | 8 | 857 |
34 | Data | Num | 8 | |
35 | _415 | Num | 8 | 415 |
36 | _85 | Num | 8 | 85 |
37 | Technology | Num | 8 | |
38 | _1999 | Num | 8 | 1999 |
39 | Parts | Num | 8 | |
40 | PM | Num | 8 | |
41 | Direct | Num | 8 | |
42 | CS | Num | 8 | |
43 | Meeting | Num | 8 | |
44 | Original | Num | 8 | |
45 | Project | Num | 8 | |
46 | RE | Num | 8 | |
47 | Edu | Num | 8 | |
48 | Table | Num | 8 | |
49 | Conference | Num | 8 | |
50 | Semicolon | Num | 8 | |
51 | Paren | Num | 8 | |
52 | Bracket | Num | 8 | |
53 | Exclamation | Num | 8 | |
54 | Dollar | Num | 8 | |
55 | Pound | Num | 8 | |
56 | CapAvg | Num | 8 | Capital Run Length Average |
57 | CapLong | Num | 8 | Capital Run Length Longest |
58 | CapTotal | Num | 8 | Capital Run Length Total |
59 | Class | Num | 8 | 0 - Not Junk, 1 - Junk |

The First Five Observations Out of 4,601

Variable | Obs 1 | Obs 2 | Obs 3 | Obs 4 | Obs 5 |
Test | 1 | 0 | 1 | 0 | 0 |
Make | 0.00 | 0.21 | 0.06 | 0.00 | 0.00 |
Address | 0.64 | 0.28 | 0.00 | 0.00 | 0.00 |
All | 0.64 | 0.50 | 0.71 | 0.00 | 0.00 |
_3D | 0 | 0 | 0 | 0 | 0 |
Our | 0.32 | 0.14 | 1.23 | 0.63 | 0.63 |
Over | 0.00 | 0.28 | 0.19 | 0.00 | 0.00 |
Remove | 0.00 | 0.21 | 0.19 | 0.31 | 0.31 |
Internet | 0.00 | 0.07 | 0.12 | 0.63 | 0.63 |
Order | 0.00 | 0.00 | 0.64 | 0.31 | 0.31 |
Mail | 0.00 | 0.94 | 0.25 | 0.63 | 0.63 |
Receive | 0.00 | 0.21 | 0.38 | 0.31 | 0.31 |
Will | 0.64 | 0.79 | 0.45 | 0.31 | 0.31 |
People | 0.00 | 0.65 | 0.12 | 0.31 | 0.31 |
Report | 0.00 | 0.21 | 0.00 | 0.00 | 0.00 |
Addresses | 0.00 | 0.14 | 1.75 | 0.00 | 0.00 |
Free | 0.32 | 0.14 | 0.06 | 0.31 | 0.31 |
Business | 0.00 | 0.07 | 0.06 | 0.00 | 0.00 |
Email | 1.29 | 0.28 | 1.03 | 0.00 | 0.00 |
You | 1.93 | 3.47 | 1.36 | 3.18 | 3.18 |
Credit | 0.00 | 0.00 | 0.32 | 0.00 | 0.00 |
Your | 0.96 | 1.59 | 0.51 | 0.31 | 0.31 |
Font | 0 | 0 | 0 | 0 | 0 |
_000 | 0.00 | 0.43 | 1.16 | 0.00 | 0.00 |
Money | 0.00 | 0.43 | 0.06 | 0.00 | 0.00 |
HP | 0 | 0 | 0 | 0 | 0 |
HPL | 0 | 0 | 0 | 0 | 0 |
George | 0 | 0 | 0 | 0 | 0 |
_650 | 0 | 0 | 0 | 0 | 0 |
Lab | 0 | 0 | 0 | 0 | 0 |
Labs | 0 | 0 | 0 | 0 | 0 |
Telnet | 0 | 0 | 0 | 0 | 0 |
_857 | 0 | 0 | 0 | 0 | 0 |
Data | 0 | 0 | 0 | 0 | 0 |
_415 | 0 | 0 | 0 | 0 | 0 |
_85 | 0 | 0 | 0 | 0 | 0 |
Technology | 0 | 0 | 0 | 0 | 0 |
_1999 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00 |
Parts | 0 | 0 | 0 | 0 | 0 |
PM | 0 | 0 | 0 | 0 | 0 |
Direct | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
CS | 0 | 0 | 0 | 0 | 0 |
Meeting | 0 | 0 | 0 | 0 | 0 |
Original | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 |
Project | 0 | 0 | 0 | 0 | 0 |
RE | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
Edu | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
Table | 0 | 0 | 0 | 0 | 0 |
Conference | 0 | 0 | 0 | 0 | 0 |
Semicolon | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
Paren | 0.000 | 0.132 | 0.143 | 0.137 | 0.135 |
Bracket | 0 | 0 | 0 | 0 | 0 |
Exclamation | 0.778 | 0.372 | 0.276 | 0.137 | 0.135 |
Dollar | 0.000 | 0.180 | 0.184 | 0.000 | 0.000 |
Pound | 0.000 | 0.048 | 0.010 | 0.000 | 0.000 |
CapAvg | 3.756 | 5.114 | 9.821 | 3.537 | 3.537 |
CapLong | 61 | 101 | 485 | 40 | 40 |
CapTotal | 278 | 1028 | 2259 | 191 | 191 |
Class | 1 | 1 | 1 | 1 | 1 |
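
The Test variable in the listing is labeled 0 - Training, 1 - Test, which suggests that the data are already partitioned for model building and validation. The following DATA step is one possible sketch of how the two partitions could be separated. The output data set names work.JunkTrain and work.JunkTest are hypothetical, chosen only for illustration. Dropping Test from the outputs is a design choice here, not a requirement.

```sas
/* Sketch: split Sashelp.JunkMail into training and test sets.       */
/* work.JunkTrain and work.JunkTest are illustrative names only.     */
data work.JunkTrain work.JunkTest;
   set sashelp.JunkMail;
   /* Test = 0 marks training observations, Test = 1 marks test ones */
   if Test = 0 then output work.JunkTrain;
   else output work.JunkTest;
   drop Test;   /* the indicator is constant within each output set  */
run;
```

Each resulting data set keeps the 57 predictors and the response Class, so either one can be passed directly to a modeling procedure.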