The Sashelp.JunkMail data set comes from a study that classifies whether an e-mail is junk e-mail (coded as 1) or not (coded as 0). The data were
collected at Hewlett-Packard Labs and donated by George Forman. The data set contains 4,601 observations and 59 variables.
The response variable, Class, is a binary indicator of whether an e-mail is junk (1) or not (0). The variable Test indicates whether an observation belongs to the training set (0) or the test set (1). The remaining 57 variables are continuous
variables that record the frequencies of some common words and characters and the lengths of uninterrupted sequences of capital letters
in e-mails. The following steps display information about the Sashelp.JunkMail data set and create Figure B.10:

    title 'Junk E-Mail Data';
    proc contents data=sashelp.JunkMail varnum;
       ods select position;
    run;

    title 'The First Five Observations Out of 4601';
    proc print data=sashelp.JunkMail(obs=5) heading=horizontal;
    run;

Figure B.10: Junk E-Mail Data
Junk E-Mail Data

Variables in Creation Order

| # | Variable | Type | Len | Label |
|---|---|---|---|---|
1 | Test | Num | 8 | 0 - Training, 1 - Test |
2 | Make | Num | 8 | |
3 | Address | Num | 8 | |
4 | All | Num | 8 | |
5 | _3D | Num | 8 | 3D |
6 | Our | Num | 8 | |
7 | Over | Num | 8 | |
8 | Remove | Num | 8 | |
9 | Internet | Num | 8 | |
10 | Order | Num | 8 | |
11 | Mail | Num | 8 | |
12 | Receive | Num | 8 | |
13 | Will | Num | 8 | |
14 | People | Num | 8 | |
15 | Report | Num | 8 | |
16 | Addresses | Num | 8 | |
17 | Free | Num | 8 | |
18 | Business | Num | 8 | |
19 | Email | Num | 8 | |
20 | You | Num | 8 | |
21 | Credit | Num | 8 | |
22 | Your | Num | 8 | |
23 | Font | Num | 8 | |
24 | _000 | Num | 8 | 000 |
25 | Money | Num | 8 | |
26 | HP | Num | 8 | |
27 | HPL | Num | 8 | |
28 | George | Num | 8 | |
29 | _650 | Num | 8 | 650 |
30 | Lab | Num | 8 | |
31 | Labs | Num | 8 | |
32 | Telnet | Num | 8 | |
33 | _857 | Num | 8 | 857 |
34 | Data | Num | 8 | |
35 | _415 | Num | 8 | 415 |
36 | _85 | Num | 8 | 85 |
37 | Technology | Num | 8 | |
38 | _1999 | Num | 8 | 1999 |
39 | Parts | Num | 8 | |
40 | PM | Num | 8 | |
41 | Direct | Num | 8 | |
42 | CS | Num | 8 | |
43 | Meeting | Num | 8 | |
44 | Original | Num | 8 | |
45 | Project | Num | 8 | |
46 | RE | Num | 8 | |
47 | Edu | Num | 8 | |
48 | Table | Num | 8 | |
49 | Conference | Num | 8 | |
50 | Semicolon | Num | 8 | |
51 | Paren | Num | 8 | |
52 | Bracket | Num | 8 | |
53 | Exclamation | Num | 8 | |
54 | Dollar | Num | 8 | |
55 | Pound | Num | 8 | |
56 | CapAvg | Num | 8 | Capital Run Length Average |
57 | CapLong | Num | 8 | Capital Run Length Longest |
58 | CapTotal | Num | 8 | Capital Run Length Total |
59 | Class | Num | 8 | 0 - Not Junk, 1 - Junk |
The First Five Observations Out of 4601

Obs | Test | Make | Address | All | _3D | Our | Over | Remove | Internet | Order | Mail | Receive | Will | People | Report | Addresses | Free | Business | Email | You | Credit | Your | Font | _000 | Money | HP | HPL | George | _650 | Lab | Labs | Telnet | _857 | Data | _415 | _85 | Technology | _1999 | Parts | PM | Direct | CS | Meeting | Original | Project | RE | Edu | Table | Conference | Semicolon | Paren | Bracket | Exclamation | Dollar | Pound | CapAvg | CapLong | CapTotal | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.00 | 0.64 | 0.64 | 0 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.64 | 0.00 | 0.00 | 0.00 | 0.32 | 0.00 | 1.29 | 1.93 | 0.00 | 0.96 | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0 | 0 | 0.00 | 0.000 | 0 | 0.778 | 0.000 | 0.000 | 3.756 | 61 | 278 | 1 |
2 | 0 | 0.21 | 0.28 | 0.50 | 0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.00 | 0.94 | 0.21 | 0.79 | 0.65 | 0.21 | 0.14 | 0.14 | 0.07 | 0.28 | 3.47 | 0.00 | 1.59 | 0 | 0.43 | 0.43 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.07 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0 | 0 | 0.00 | 0.132 | 0 | 0.372 | 0.180 | 0.048 | 5.114 | 101 | 1028 | 1 |
3 | 1 | 0.06 | 0.00 | 0.71 | 0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | 0.38 | 0.45 | 0.12 | 0.00 | 1.75 | 0.06 | 0.06 | 1.03 | 1.36 | 0.32 | 0.51 | 0 | 1.16 | 0.06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0.06 | 0 | 0 | 0.12 | 0 | 0.06 | 0.06 | 0 | 0 | 0.01 | 0.143 | 0 | 0.276 | 0.184 | 0.010 | 9.821 | 485 | 2259 | 1 |
4 | 0 | 0.00 | 0.00 | 0.00 | 0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | 0.31 | 0.31 | 0.31 | 0.00 | 0.00 | 0.31 | 0.00 | 0.00 | 3.18 | 0.00 | 0.31 | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0 | 0 | 0.00 | 0.137 | 0 | 0.137 | 0.000 | 0.000 | 3.537 | 40 | 191 | 1 |
5 | 0 | 0.00 | 0.00 | 0.00 | 0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | 0.31 | 0.31 | 0.31 | 0.00 | 0.00 | 0.31 | 0.00 | 0.00 | 3.18 | 0.00 | 0.31 | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0.00 | 0.00 | 0 | 0 | 0.00 | 0.135 | 0 | 0.135 | 0.000 | 0.000 | 3.537 | 40 | 191 | 1 |
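
The labels in Figure B.10 show that Test marks the training/test partition and Class holds the response. As a minimal sketch (not part of the original steps), the following statements cross-tabulate the two indicators and then copy the training rows into a separate data set. The data set name work.JunkTrain and the title text are illustrative assumptions; only Sashelp.JunkMail and its variables come from the listing above.

```sas
/* Sketch: count observations in each partition, split by class. */
title 'Junk E-Mail Data: Partition by Class';
proc freq data=sashelp.JunkMail;
   /* Rows are Test (0 = training, 1 = test); columns are Class. */
   tables Test*Class / nocol nopercent;
run;

/* Keep only the training observations (Test = 0).            */
/* work.JunkTrain is a hypothetical name chosen for this demo. */
data work.JunkTrain;
   set sashelp.JunkMail;
   where Test = 0;
run;

/* Clear the title so it does not carry over to later output. */
title;
```

The NOCOL and NOPERCENT options only trim the crosstabulation display; dropping them gives the default table with all percentages.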