Competitive exclusion and habitat filtering influence community assembly, but ecologists and evolutionary biologists have not reached consensus on how to quantify patterns that would reveal the action of these processes. Currently, at least 22 α-diversity and 10 β-diversity metrics of community phylogenetic structure can be combined with nine null models (eight for β-diversity metrics), providing 278 potentially distinct approaches to test for phylogenetic clustering and overdispersion. Selecting the appropriate approach for a study is daunting. First, we describe similarities among metrics and null models across variation in phylogeny size and shape, species abundance, and species richness. Second, we develop spatially explicit, individual-based simulations of community assembly under neutral dynamics, competitive exclusion, or habitat filtering, and quantify the performance (type I and II error rates) of all 278 metric and null model combinations against each assembly process. Many α-diversity metrics and null models are at least functionally equivalent, reducing the number of truly unique metrics to 12 and the number of unique metric + null model combinations to 72. An even smaller subset of metric and null model combinations showed robust statistical performance. Among α-diversity metrics, phylogenetic diversity and mean nearest taxon distance were best able to detect habitat filtering, while metrics based on mean pairwise phylogenetic distance were best able to detect competitive exclusion. Overall, β-diversity metrics tended to have greater power than α-diversity metrics to detect habitat filtering and competitive exclusion, but had higher type I error rates in some cases. Across both α- and β-diversity metrics, null model selection affected type I error rates more than metric selection did. A null model that maintained species richness, and approximately maintained species occurrence frequency and abundance across sites, exhibited low type I and II error rates. This regional null model simulates neutral dispersal of individuals into local communities by sampling from a regional species pool. We introduce a flexible new R package, metricTester, to facilitate robust analyses of method performance.
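The idea behind the regional null model can be illustrated with a minimal sketch. It is written in Python for brevity and is not the metricTester implementation. The function name `regional_null`, the dictionary layout of the community data, and the rule of drawing individuals until a site reaches its observed richness are all assumptions made for this illustration. The regional pool is taken to be the summed abundance of each species across all sites, so that common species are more likely to disperse into any given site.

```python
import random
from collections import Counter


def regional_null(cdm, rng=None):
    """Randomize a community matrix by sampling from a regional pool.

    cdm: dict mapping site -> dict mapping species -> abundance (int).
    Hypothetical layout, chosen only for this sketch.
    """
    rng = rng or random.Random()

    # Regional pool (assumption): total abundance of each species over all sites.
    pool = Counter()
    for counts in cdm.values():
        pool.update(counts)
    species = sorted(pool)
    weights = [pool[s] for s in species]

    null = {}
    for site, counts in cdm.items():
        # Observed species richness of the site is held fixed.
        target = sum(1 for n in counts.values() if n > 0)
        drawn = Counter()
        # Individuals "disperse" in one at a time, in proportion to
        # regional abundance, until the site holds `target` species.
        while len(drawn) < target:
            sp = rng.choices(species, weights=weights, k=1)[0]
            drawn[sp] += 1
        null[site] = dict(drawn)
    return null


# Usage: a toy matrix with three sites, one of them empty.
observed = {
    "site1": {"sp1": 5, "sp2": 1},
    "site2": {"sp2": 3, "sp3": 2, "sp4": 1},
    "site3": {},
}
randomized = regional_null(observed, random.Random(42))
```

In this sketch, richness is preserved exactly at every site, whereas occurrence frequency and abundance are preserved only in expectation, which mirrors the "approximately maintained" wording above. Comparing an observed metric against many such randomizations would give its null distribution.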