UNIQUEBY Function

UNIQUEBY (matrix <, by> <, index> ) ;

The UNIQUEBY function returns the locations of the unique BY-group combinations for a sorted or indexed matrix. The arguments to the UNIQUEBY function are as follows:

matrix

is the input matrix, which must be sorted or indexed according to the by columns.

by

is either a numeric matrix of column numbers, or a character matrix that contains the names of columns that correspond to column labels assigned to matrix by a MATTRIB statement or READ statement. If by is not specified, then the first column is used.

index

is a vector such that index[$i$] is the row index of the $i$th element of matrix when sorted according to by. Consequently, matrix[index, ] is the sorted matrix. You can compute index for a matrix and a given set of by columns by using the SORTNDX call. If the matrix is already sorted according to the by columns, then index is 1:nrow(matrix), and you can omit the index argument.

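The UNIQUEBY function returns a column vector of positions in index. If r is the returned vector, then index[r[$i$]] is the row of matrix that contains the first occurrence of the $i$th unique combination of values in the by columns. Equivalently, r[$i$] is the row of the sorted matrix, matrix[index, ], at which the $i$th BY group begins.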
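The different ways of specifying the by argument can be sketched as follows. This is a minimal illustration, not part of the original example: the matrix x, the column names A, B, and Value, and the result names are hypothetical, and x is assumed to be already sorted by its first two columns so that the index argument can be omitted.

```sas
proc iml;
/* hypothetical data, already sorted by columns 1 and 2 */
x = { 1 0 10,
      1 0 20,
      1 1 30,
      2 0 40,
      2 2 50 };
/* assign column labels so that by can refer to columns by name */
mattrib x colname={"A" "B" "Value"};

firstA    = uniqueby(x);              /* by omitted: the first column is used */
byNames   = uniqueby(x, {"A" "B"});   /* by given as column names             */
byNumbers = uniqueby(x, 1:2);         /* by given as column numbers           */
print firstA, byNames byNumbers;
quit;
```

Because the names A and B label columns 1 and 2, the last two calls are expected to return the same vector of starting rows.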

For example, the following statements use the SORTNDX subroutine to create a sort index for a matrix. The UNIQUEBY function is then used to locate the unique combinations of values in the columns of the matrix:

m = { 1 0,
      2 0,
      2 2,
      2 0,
      1 0,
      2 0,
      1 1 };
cols = 1:2;
call sortndx(ndx, m, cols);
  
sorted = m[ndx,];
unique_rows = uniqueby(m, cols, ndx);
unique_vals = m[ndx[unique_rows], cols ];
print sorted, unique_rows unique_vals;

Figure 23.361: Unique Values of the Sort Variables

sorted
1 0
1 0
1 1
2 0
2 0
2 0
2 2

unique_rows   unique_vals
          1   1 0
          3   1 1
          4   2 0
          7   2 2


In addition, the following statements compute the number of unique BY groups and the number of elements in each BY group:

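n = nrow(unique_rows);        /* number of BY groups */
size = j(n,1);                /* n x 1 vector to hold the group sizes */
do i = 1 to n-1;
   /* a group ends where the next group begins */
   size[i] = unique_rows[i+1] - unique_rows[i];
end;
/* the last group extends to the final row of the matrix */
size[n] = nrow(m) - unique_rows[n] + 1;
print n, size;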

Figure 23.362: Number of BY Groups and Number of Elements in Each Group

n
4

size
2
1
3
1
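The same group sizes can also be computed without a loop. The following sketch assumes that m and unique_rows are still defined from the preceding statements; the names bounds and size2 are introduced here only for illustration. It appends a sentinel position one past the last row and then takes differences between consecutive group starts:

```sas
n = nrow(unique_rows);
/* group starts, followed by a sentinel one past the last row */
bounds = unique_rows // (nrow(m) + 1);
/* size of group i = start of group i+1 minus start of group i */
size2 = bounds[2:(n+1)] - bounds[1:n];
print size2;
```

For the example matrix, bounds is {1, 3, 4, 7, 8}, so size2 contains 2, 1, 3, 1, which matches the size vector in the previous figure. The sentinel removes the need to treat the last group as a special case.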


If matrix is already sorted according to the by columns (see the SORT call), then UNIQUEBY can be called with 1:nrow(matrix) for the index argument, or the last argument can be omitted as shown in the following statement:

unique_loc = uniqueby(sorted, cols);
print unique_loc;

Figure 23.363: Position of Unique Rows for a Sorted Matrix

unique_loc
1
3
4
7
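A common use of these positions is to process a sorted matrix one BY group at a time. The following is a sketch under the assumption that sorted and cols are still defined from the preceding statements; the names b, nGroups, idx, group, count, and keys are hypothetical. Each iteration extracts the rows of one BY group, and the loop body is where a per-group computation would go:

```sas
/* group starts, followed by a sentinel one past the last row */
b = uniqueby(sorted, cols) // (nrow(sorted) + 1);
nGroups = nrow(b) - 1;
count = j(nGroups, 1, .);            /* one result per BY group */
do i = 1 to nGroups;
   idx = b[i]:(b[i+1] - 1);          /* rows of sorted in group i */
   group = sorted[idx, ];            /* submatrix for this BY group */
   count[i] = nrow(group);           /* placeholder per-group statistic */
end;
keys = sorted[b[1:nGroups], cols];   /* BY values that identify each group */
print keys count;
```

Here the per-group statistic is simply the number of rows, but any computation on group can be substituted.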