Most software bug tracking systems have some notion of duplicate bugs in which one bug is declared to be the same as another bug. If bug A is considered a duplicate (DUP) of bug B, then a fix for B would also fix A. You can represent the DUPs in a bug tracking system as a directed graph where you add a link if A is a DUP of B.
The bug tracking system needs to check for two situations when users declare a bug to be a DUP. The first situation is called a circular dependence. Consider bugs A, B, C, and D in the tracking system. The first user declares that A is a DUP of B and that C is a DUP of D. Then, a second user declares that B is a DUP of C, and a third user declares that D is a DUP of A. You now have a circular dependence, and no primary bug is defined on which the development team should focus. You can easily see this circular dependence in the graph representation, because . Finding such circular dependencies can be done using cycle detection, which is described in the section Cycle. However, the second situation that needs to be checked is more general. If a user declares that A is a DUP of B and another user declares that B is a DUP of C, this chain of duplicates is already an issue. The bug tracking system needs to provide one primary bug to which the rest of the bugs are duplicated. The existence of these chains can be identified by calculating the transitive closure of the directed graph that is defined by the DUP links.
Given the original directed graph G (defined by the DUP links) and its transitive closure , any link in that is not in G exists because of some chain that is present in G.
Consider the following data that define some duplicated bugs (called defects) in a small sample of the bug tracking system:
data DefectLinks; input defectId $ linkedDefect $ linkType $ when datetime16.; format when datetime16.; datalines; D0096978 S0711218 DUPTO 20OCT10:00:00:00 S0152674 S0153280 DUPTO 30MAY02:00:00:00 S0153280 S0153307 DUPTO 30MAY02:00:00:00 S0153307 S0152674 DUPTO 30MAY02:00:00:00 S0162973 S0162978 DUPTO 29NOV10:16:13:16 S0162978 S0165405 DUPTO 29NOV10:16:13:16 S0325026 S0575748 DUPTO 01JUN10:00:00:00 S0347945 S0346582 DUPTO 03MAR06:00:00:00 S0350596 S0346582 DUPTO 21MAR06:00:00:00 S0539744 S0643230 DUPTO 10MAY10:00:00:00 S0575748 S0643230 DUPTO 15JUN10:00:00:00 S0629984 S0643230 DUPTO 01JUN10:00:00:00 ;
The following statements calculate cycles in addition to the transitive closure of the graph G that is defined by the duplicated defects in DefectLinks
. The output data set Cycles
contains any circular dependencies, and the data set TransClosure
contains the transitive closure . To identify the chains, you can use PROC SQL to identify the links in that are not in G.
proc optmodel; set<str,str> LINKS; read data DefectLinks into LINKS=[defectId linkedDefect]; set<num,num,str> CYCLES; set<str,str> CLOSURE; solve with NETWORK / loglevel = moderate graph_direction = directed links = (include=LINKS) cycle = (mode=first_cycle) out = (cycles=CYCLES) ; put CYCLES; create data Cycles from [cycle order node]=CYCLES; solve with NETWORK / loglevel = moderate graph_direction = directed links = (include=LINKS) transitive_closure out = (closure=CLOSURE) ; put CLOSURE; create data TransClosure from [defectId linkedDefect]=CLOSURE; quit; proc sql; create table Chains as select defectId, linkedDefect from TransClosure except select defectId, linkedDefect from DefectLinks; quit;
The progress of the procedure is shown in Output 9.6.1.
Output 9.6.1: Network Solver Log: Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System
NOTE: There were 12 observations read from the data set WORK.DEFECTLINKS. |
NOTE: The number of nodes in the input graph is 16. |
NOTE: The number of links in the input graph is 12. |
NOTE: Processing cycle detection. |
NOTE: The graph does have a cycle. |
NOTE: Processing cycle detection used 0.00 (cpu: 0.00) seconds. |
{<1,1,'S0152674'>,<1,2,'S0153280'>,<1,3,'S0153307'>,<1,4,'S0152674'>} |
NOTE: The data set WORK.CYCLES has 4 observations and 3 variables. |
NOTE: The number of nodes in the input graph is 16. |
NOTE: The number of links in the input graph is 12. |
NOTE: Processing the transitive closure. |
NOTE: Processing the transitive closure used 0.00 (cpu: 0.00) seconds. |
{<'D0096978','S0711218'>,<'S0152674','S0153280'>,<'S0153280','S0153307'>,< |
'S0153307','S0152674'>,<'S0162973','S0162978'>,<'S0162978','S0165405'>,< |
'S0325026','S0575748'>,<'S0347945','S0346582'>,<'S0350596','S0346582'>,< |
'S0539744','S0643230'>,<'S0575748','S0643230'>,<'S0629984','S0643230'>,< |
'S0153307','S0153280'>,<'S0152674','S0153307'>,<'S0153307','S0153307'>,< |
'S0152674','S0152674'>,<'S0153280','S0152674'>,<'S0153280','S0153280'>,< |
'S0162973','S0165405'>,<'S0325026','S0643230'>} |
NOTE: The data set WORK.TRANSCLOSURE has 20 observations and 2 variables. |
NOTE: Table WORK.CHAINS created, with 8 rows and 2 columns. |
The data set Cycles
contains one case of a circular dependence in which the DUPs start and end at S0152674.
The data set Chains
contains the chains in the bug tracking system that come from the links in that are not in G.