This vignette describes the decision rules used in the original method of Song (2013) and the High CMC method of Tong et al. (2015). For illustrative purposes, we consider a comparison between a known match and a known non-match pair of cartridge cases from the study performed by Fadul et al. (2011). The raw cartridge case scans can be downloaded from the NIST Ballistics Toolmark Research Database. The scans were preprocessed using functions available in the cmcR package; the preprocessing is not discussed here. Refer to the fadul_examples.R script available on the cmcR GitHub page for how these scans were preprocessed. We also do not discuss how similarity features are extracted from two processed scans. Refer to the documentation of the comparison_allTogether function on the cmcR website for information regarding this procedure.
We will consider comparisons between three cartridge case scans. Fadul 1-1 and Fadul 1-2 are known matches (i.e., were fired from the same firearm) while Fadul 2-1 is a non-match. The comparisons considered are Fadul 1-1 vs. Fadul 1-2 and Fadul 1-1 vs. Fadul 2-1.
library(cmcR)
library(magrittr) # provides the %>% pipe

# Download a cartridge case that does not match Fadul 1-1 and Fadul 1-2
fadul2.1_raw <- x3ptools::read_x3p("https://tsapps.nist.gov/NRBTD/Studies/CartridgeMeasurement/DownloadMeasurement/8ae0b86d-210a-41fd-ad75-8212f9522f96")

# Crop the exterior and interior regions, remove the trend, filter, and downsample
fadul2.1_processed <- fadul2.1_raw %>%
  preProcess_crop(region = "exterior",
                  radiusOffset = -30) %>%
  preProcess_crop(region = "interior",
                  radiusOffset = 200) %>%
  preProcess_removeTrend(statistic = "quantile",
                         tau = .5,
                         method = "fn") %>%
  preProcess_gaussFilter() %>%
  x3ptools::sample_x3p()
The three processed cartridge cases are shown below.
The cell-based comparison procedure implemented in the comparison_allTogether function returns a data frame/tibble of similarity features between two cartridge case scans. For each cell in the “reference” scan (Fadul 1-1 in this example), the similarity features include the translation (x, y) and rotation theta required to align the reference cell in the target scan, and the CCF at those (x, y) values, giving an (x, y, CCF) feature set for each rotation.
The fundamental assumption underlying all CMC decision rules is that
truly matching cartridge case pairs should have similarity features that
are consistent across the cell/region pairs. In particular, a plurality
of cell/region pairs should “vote” for similar
(x, y, theta)
alignment values. In contrast, the
cell/region pairs of a truly non-matching cartridge case pair should have
seemingly random (x, y, theta)
votes. The two decision
rules implemented in the cmcR package can be understood
as two different systems by which cells vote for
(x, y, theta)
values that they “believe” to be the true
alignment values for the overall cartridge case scans.
An actual implementation of the original method of Song (2013) is described in Song et al. (2014). The decision rule that Song et al. (2014) describe is based on
a virtual reference with three reference registration parameters θref, xref and yref generated by the median values of the collective θ, and x-, y-translation values of all cell pairs.
That is, a consensus is determined by finding the median registration phase values across the cell/region pairs for a particular cartridge case pair comparison. Then, the distances between the consensus registration values and the cell comparison values are assessed to determine whether they are within a specified distance of the consensus. This consensus assessment introduces threshold parameters Tx, Ty, Tθ, TCCF.
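Let xi, yi, θi denote the translation and rotation parameters which produce the highest CCF for the alignment of cell/region pair i, and let CCFmax,i denote that CCF value. Also let xref, yref, θref be the medians of these alignment values over all cell/region pairs for a particular cartridge case comparison (these are the “virtual reference” values). A cell/region pair i is declared a match if all of the following conditions hold:
$$|x_i - x_{ref}| \leq T_x, \quad |y_i - y_{ref}| \leq T_y, \quad |\theta_i - \theta_{ref}| \leq T_\theta, \quad CCF_{\max,i} \geq T_{CCF}.$$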
With respect to the voting system analogy, we might interpret this decision rule as a single-choice voting system similar to the system used in U.S. presidential elections. That is, every cell is allowed to submit one vote corresponding to the registration phase with the highest CCFmax value. A vote is discarded if its associated CCFmax value is below the TCCF threshold. A consensus is determined by counting the number of votes that are close to the reference values xref, yref, θref, where closeness is defined by the Tx, Ty, Tθ thresholds.
The plot below shows the values of xi, yi, θi, and CCFmax,i for each cell/region pair between Fadul 1-1 and Fadul 1-2 as well as Fadul 1-1 and Fadul 2-1. These values are shown as blue/red bars. The purple bands indicate the ranges of values a cell/region pair must fall in to be declared “congruent”: within Tx = Ty = 20 and Tθ = 6 of xref, yref, θref, and above TCCF = .5. As we might expect, a larger proportion of xi, yi, θi, and CCFmax,i values are within these acceptable ranges for the comparison between Fadul 1-1 and Fadul 1-2 than for the comparison between Fadul 1-1 and Fadul 2-1. This indicates that there is a clearer “consensus” about the true alignment values for the matching cartridge case pair than for the non-matching pair.
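The decision rule of the original method can be sketched in a few lines of base R. This is a minimal illustration and not the cmcR implementation: the function name classify_original, the column names, and the feature values are all hypothetical, and the default thresholds simply mirror the ones used in this example.

```r
# Hypothetical per-cell features: one row per cell, holding the registration
# (x, y, theta) with the highest CCF. All values are made up for illustration.
cells <- data.frame(
  cellIndex = 1:8,
  x     = c(3, 5, 1, 4, -60, 2, 6, 45),
  y     = c(-2, 0, -4, -1, 38, -3, 1, -70),
  theta = c(-24, -24, -21, -27, 12, -24, -24, 30),
  ccf   = c(0.71, 0.64, 0.58, 0.45, 0.52, 0.80, 0.66, 0.31)
)

# Flag cells whose registration lies within the thresholds of the median
# ("virtual reference") values and whose CCF meets the CCF threshold.
classify_original <- function(cells, Tx = 20, Ty = Tx, Ttheta = 6, Tccf = 0.5) {
  xRef     <- median(cells$x)
  yRef     <- median(cells$y)
  thetaRef <- median(cells$theta)

  abs(cells$x - xRef) <= Tx &
    abs(cells$y - yRef) <= Ty &
    abs(cells$theta - thetaRef) <= Ttheta &
    cells$ccf >= Tccf
}

cells$congruent <- classify_original(cells)
cmcCount <- sum(cells$congruent)
cmcCount
```

With these made-up values, cell 4 is rejected on its CCF alone, while cells 5 and 8 are rejected because their translations are far from the medians.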
The first step in the High CMC method is to count the CMCs under the original method of Song (2013) in both comparison “directions,” meaning each scan in turn plays the role of the “reference” and of the “target” scan. After these CMCs are counted, Tong et al. (2015) propose using the minimum of the two CMC counts as an initial CMC count prior to applying the High CMC decision rule. The figure below shows the behavior of the xi, yi, θi, and CCFmax,i values in each direction via a parallel-coordinates plot, which is useful for visualizing multi-dimensional data sets. Each connected path represents a single cell/region pair. The purple regions again represent the acceptable regions that are sufficiently “close” to the reference values (or above .5 in the case of the CCF). Paths that traverse only purple regions are deemed congruent under the decision rule of the original method of Song (2013) and are colored blue. We can see that 19 cells are deemed congruent for the comparison in which Fadul 1-1 is treated as the reference while 18 are considered congruent in the other direction. As such, the initial CMC count used for the High CMC method would be 18.
By considering only the “top vote” of each cell as is done in the decision rule of the original method of Song (2013), information is lost regarding other registration phases for which a cell might also rank highly. As Tong et al. (2015) observe:
some of the valid cell pairs may be mistakenly excluded from the CMC count because by chance their correlation yields a higher CCF value at a rotation angle outside the threshold range Tθ.
The High CMC method lifts the single-choice restriction by allowing each cell to cast a vote for the translation phase at every θ value for which it has a sufficiently large associated CCFmax value. Under this system, each vote represents the translation phase that the cell considers to be the true translation phase of the overall scans conditional on a particular θ value. In this way, the High CMC method might be viewed as an approval voting system in which an individual may cast a vote for every candidate they approve of. For each θ value, the number of translation phase votes that are close to the θ-specific reference values xref,θ, yref,θ is counted (closeness is now defined based only on the Tx, Ty thresholds). This yields what Tong et al. (2015) refer to as a “CMC-θ” distribution representing, as they consider it, the number of “congruent cells” per θ value. Thus, there may be more than one θ value for which a single cell/region pair is considered congruent. While this is seemingly contradictory (as there should be only one “true” θ alignment value), Tong et al. (2015) justify their method by the empirical observation:
[i]f two images are truly matching, the CMC-θ distribution of matching image pairs should have a prominent peak located near the initial phase angle Θ0, while non-matching image pairs may have a relatively flat and random CMC-θ distribution pattern.
The assumption underlying the High CMC method is that, if the cartridge case pair is indeed a match, the number of cells classified as congruent should be larger near the true θ value (the “initial phase angle Θ0”, as they call it) than for other θ values. The figures of CMC counts per rotation value illustrate these phenomena. For the known match pair Fadul 1-1 and Fadul 1-2, we can clearly see a CMC mode around θ = −24 in one direction and 21 in the other, which is to be expected for a known match pair. For the known non-match pair Fadul 1-1 and Fadul 2-1, on the other hand, no such CMC count mode is achieved.
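A CMC-θ distribution for a single comparison direction can be sketched as follows. This is an illustration under stated assumptions rather than the cmcR implementation: the function name cmc_theta, the column names, and the feature values are hypothetical, and the θ-specific reference values are assumed to be medians over all cells at that rotation, by analogy with the original method.

```r
# Hypothetical long-format features: one row per cell per rotation, holding the
# translation (x, y) with the highest CCF at that rotation. Values are made up.
features <- data.frame(
  cellIndex = rep(1:4, times = 3),
  theta     = rep(c(-27, -24, 30), each = 4),
  x   = c(2, 4, 30, 3,      1, 2, 3, 2,       -40, 55, 10, -5),
  y   = c(-1, -3, 2, 25,    0, -1, -2, -1,    35, -60, 5, 80),
  ccf = c(.6, .7, .4, .55,  .8, .75, .6, .7,  .3, .55, .35, .6)
)

# Count, for each rotation, the cells that are close to that rotation's
# reference translation and whose CCF meets the CCF threshold.
cmc_theta <- function(features, Tx = 20, Ty = Tx, Tccf = 0.5) {
  byTheta <- split(features, features$theta)
  counts <- vapply(byTheta, function(d) {
    # Assumption: theta-specific reference values are medians over all cells
    xRef <- median(d$x)
    yRef <- median(d$y)
    sum(abs(d$x - xRef) <= Tx & abs(d$y - yRef) <= Ty & d$ccf >= Tccf)
  }, integer(1))
  data.frame(theta = as.numeric(names(counts)), cmcCount = unname(counts))
}

cmcThetaDist <- cmc_theta(features)
cmcThetaDist
```

In this toy example every cell agrees on the translation at θ = −24, only two agree at θ = −27, and none at θ = 30, which is the kind of peak the method looks for.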
An example of the CMC-θ
distribution for the comparison between Fadul 1-1 and Fadul 1-2 is shown
below. We can see that, conditional on θ = −24 degrees, more cells tend to
have similar x, y
values than conditional on θ = 30. The CCF values are also
larger. This indicates that θ = −24 is likely closer to the
“true” rotation than θ = 30 or
elsewhere. The two darker-shaded bars represent the θ values that have a “High CMC”
count as described above. Because these θ values are adjacent rather than
being far from each other, there is evidence that the “true” θ value is approximately θ = −24 or −27 degrees. We say that this comparison
direction would “pass” the High CMC criteria because the θ values with high CMC counts are
adjacent.
Based on this observation, Tong et al. (2015) outline the following procedure for the High CMC method:
1. Conduct both forward and backward correlations at each rotation and record the registration based on CCFmax, x, and y for each cell at each rotation. These data will be used in the next two steps separately.
2. At every rotation angle, each cell in the reference image finds a registration position in the compared image with a maximum CCF value. By selecting the registration with the maximum CCF value for each cell, the two CMC numbers determined by the four thresholds can be obtained based on the original algorithm []. The lower CMC number is used as the initial result.
3. Build CMC-θ distributions using the data generated in step 1, by counting the number of cells that have congruent positions at each individual rotation angle. Calculate the angular range of “high CMCs” using both the forward and backward CMC-θ distributions, as illustrated in Figs. 2 and 3.
4. If the angular range of the “high CMCs” is within the range Tθ, identify the CMCs for each rotation angle in this range and combine them to give the number of CMCs for this comparison in place of the original CMC number. In this step, if the range is narrower than Tθ, the nearby angles are included to make the range equal to Tθ; CMCs with same index in each rotation are only counted once.
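The final combination step can be sketched for a single comparison direction as below. This is one possible reading of the quoted procedure, not the cmcR implementation: the function name high_cmc_count, the inputs, and all values are hypothetical. In particular, the way the range is widened to Tθ (a symmetric window about the midpoint of the “high CMC” rotations) is an assumption, since the quoted text does not specify how the nearby angles are chosen.

```r
# Hypothetical inputs: indices of the congruent cells at each rotation (made up)
congruentByTheta <- list(
  "-30" = c(2L, 9L),
  "-27" = c(1L, 2L, 3L, 5L, 8L),
  "-24" = c(1L, 2L, 4L, 5L, 8L, 9L),
  "-21" = c(2L, 5L),
  "30"  = c(7L)
)

# Hypothetical original-method counts in the two directions; the lower one is
# the initial result that stands if the High CMC criteria are not met.
initialCount <- min(5, 4)

high_cmc_count <- function(congruentByTheta, thetaHigh, initialCount, Ttheta = 6) {
  # If the "high CMC" rotations span more than Ttheta, keep the initial count
  if (diff(range(thetaHigh)) > Ttheta) {
    return(initialCount)
  }
  thetas <- as.numeric(names(congruentByTheta))
  # Assumption: widen to a window of width Ttheta centred on the midpoint
  mid <- mean(range(thetaHigh))
  inWindow <- abs(thetas - mid) <= Ttheta / 2
  # Cells congruent at several rotations in the window are counted once
  cellIndices <- unlist(congruentByTheta[inWindow], use.names = FALSE)
  length(unique(cellIndices))
}

passCount <- high_cmc_count(congruentByTheta, thetaHigh = c(-27, -24), initialCount)
failCount <- high_cmc_count(congruentByTheta, thetaHigh = c(-27, 30), initialCount)
```

Here the adjacent rotations −27 and −24 contribute seven distinct cells between them, whereas widely separated “high CMC” rotations leave the initial count of 4 in place.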
Tong et al. (2015) introduce an additional criterion to identify a mode in the CMC-θ distribution. Let {CMCθ : θ ∈ Θ} denote the CMC-θ distribution, where Θ is the set of rotation values considered for the comparison. Define CMCmax ≡ max{CMCθ : θ ∈ Θ} and a “high” CMC threshold CMChigh ≡ CMCmax − τ for some constant τ (they choose τ = 1). Now let Θhigh ≡ {θ ∈ Θ : CMCθ ≥ CMChigh}. That is, Θhigh consists of the θ values with “high” CMC counts. Tong et al. (2015) propose calculating R = max Θhigh − min Θhigh. If R ≤ Tθ, then there is evidence that a single mode exists in the CMC-θ distribution (and thus that the cartridge case pair is a match). Otherwise, no such mode exists (by their definition) and the cartridge case pair is likely not a match. The horizontal dashed lines in the CMC-θ figures represent the CMChigh thresholds. The θ ∈ Θhigh are represented by blue bars. For the matching pair, the range of Θhigh is less than the threshold Tθ = 6 degrees, so this pair would “pass” the High CMC criteria. In contrast, the range of Θhigh is larger than Tθ = 6 degrees for the non-match pair. Thus, the non-match pair would “fail” the High CMC criteria.
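The criterion itself is short enough to write out directly. The sketch below assumes a CMC-θ distribution stored as a data frame with hypothetical columns theta and cmcCount; the function name passes_high_cmc, the rotation grid, and the counts are made up for illustration and are not taken from the Fadul comparisons.

```r
# Apply the "high CMC" range criterion to one CMC-theta distribution
passes_high_cmc <- function(cmcTheta, Ttheta = 6, tau = 1) {
  cmcHigh   <- max(cmcTheta$cmcCount) - tau
  thetaHigh <- cmcTheta$theta[cmcTheta$cmcCount >= cmcHigh]
  R <- max(thetaHigh) - min(thetaHigh)
  list(cmcHigh = cmcHigh, thetaHigh = thetaHigh, range = R, pass = R <= Ttheta)
}

# Hypothetical rotation grid and made-up CMC counts
thetaGrid <- seq(-30, 30, by = 3)

# A distribution with a prominent peak, as expected for a matching pair
matchDist <- data.frame(
  theta    = thetaGrid,
  cmcCount = c(5, 16, 17, 6, 2, 1, 0, 1, 0, 0, 1, 0, 2, 1, 0, 0, 1, 0, 1, 0, 1)
)

# A flat distribution, as expected for a non-matching pair
nonMatchDist <- data.frame(
  theta    = thetaGrid,
  cmcCount = c(1, 2, 0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 2, 0)
)

matchResult    <- passes_high_cmc(matchDist)
nonMatchResult <- passes_high_cmc(nonMatchDist)
```

For the peaked distribution, Θhigh contains only −27 and −24, so R = 3 and the criterion is met. For the flat distribution, almost every rotation counts as “high” and R far exceeds Tθ.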
The “prominent peak” empirical observation upon which the High CMC method is based does seem to hold for many known match (KM) and known non-match (KNM) pairs in our experience. However, we’ve observed that the behavior of the CMC-θ distributions depends heavily on the preprocessing procedures used and the thresholds set. In particular, the CMC-θ distributions for some KNM pairs exhibit the prominent peak behavior for a wide range of threshold values, making them difficult to distinguish from KM pairs.