Natural Kinds Clustering | Predicting formational environments of mineral samples
Mineral modes of formation provide insights into Earth’s co-evolving geosphere and biosphere and can illuminate otherwise obscure aspects of planetary evolution. Because its classification criteria are largely based on unique combinations of idealized major element composition and crystal structure, the modern mineral classification system does not offer insights into the diverse modes of origin of each mineral. As a case study, we use natural clustering (unsupervised machine learning) to divide pyrite into clusters and link those clusters to pyrite formation environments. Pyrite from a variety of deposit types (e.g., iron oxide copper-gold, orogenic Au, porphyry Cu, sedimentary exhalative, and volcanic-hosted massive sulfide deposits, as well as barren sedimentary pyrite) is used to evaluate the clustering process. An array of trace elements determined by LA-ICP-MS serves as the predictor variables. Several clustering algorithms are employed, and the differences in their model outputs are highlighted. Using the natural clustering technique, we are able to divide the formation environments of pyrite into three categories: low temperature, medium temperature, and high temperature. This study has implications for the elucidation of mineral-forming environments and could be applied to clustering of a wide range of condensed planetary materials with different paragenetic origins.
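The clustering step described above can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so plain k-means is an assumption here, and the two-element inputs (hypothetical log10 concentrations of two trace elements) are invented toy values, not LA-ICP-MS data from the study.

```python
import math

def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy samples: (log10 element_1, log10 element_2) per pyrite analysis.
samples = [
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),   # assumed group 1
    (1.5, 1.4), (1.6, 1.6), (1.4, 1.5),   # assumed group 2
    (3.0, 2.9), (3.1, 3.0), (2.9, 3.1),   # assumed group 3
]

# Deterministic start: one sample from each assumed group as the initial centroid.
labels, centroids = kmeans(samples, [samples[0], samples[3], samples[6]])
print(labels)
```

Whether three clusters map onto low-, medium-, and high-temperature environments is a result of the study, not something this toy example demonstrates. In practice the number of clusters and the choice of algorithm would need to be evaluated against the known deposit types.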
Association Analysis | Predicting the location of previously unknown mineral deposits
The oldest minerals are surviving materials from the formation of our solar system, and they provide information about the evolution of Earth and other planets. Mindat (mindat.org), the Mineral Evolution Database (RRUFF.info/Evolution), and the Global Earth Mineral Inventory are some of the well-known data resources in the field of mineralogy, and they contain data about almost all known localities on Earth where minerals have been found. The increase in the amount and accuracy of mineral data and the improvements in technological resources make it possible to explore and answer large, outstanding scientific questions, such as understanding the mineral assemblages on Earth and how they compare with assemblages and localities on other planets. In this contribution, we present an affinity analysis method to (1) predict unreported minerals at an existing locality and (2) predict localities for a set of known minerals.
Association Analysis, or Market Basket Analysis, is a machine learning method that uses mined association rules to find interesting patterns in the data. The strength of each rule is assessed using measures of interestingness, such as ‘lift’. For example, when the occurrence of a mineral predicted with high confidence at a given locality is unexpected (low support), the rule used for such a prediction is considered ‘very interesting’. Successful implementation of this methodology will greatly aid the mineral discovery process.
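To make the terms support, confidence, and lift concrete, here is a minimal sketch using their standard definitions. The locality names and mineral lists are hypothetical toy data, not records from Mindat or the other databases, and the function names are illustrative only.

```python
# Hypothetical toy data: each locality is a "basket" of minerals reported there.
localities = {
    "loc_A": {"pyrite", "quartz", "galena"},
    "loc_B": {"pyrite", "quartz"},
    "loc_C": {"quartz", "calcite"},
    "loc_D": {"pyrite", "galena", "sphalerite"},
    "loc_E": {"quartz", "calcite", "pyrite"},
}

def support(itemset, transactions):
    """Fraction of localities that contain every mineral in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for minerals in transactions if itemset <= minerals)
    return hits / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = s_ac / s_a if s_a else 0.0   # P(consequent | antecedent)
    lift = confidence / s_c if s_c else 0.0   # > 1 means co-occurrence above chance
    return {"support": s_ac, "confidence": confidence, "lift": lift}

transactions = list(localities.values())
# Rule: "localities with galena also have pyrite".
metrics = rule_metrics({"galena"}, {"pyrite"}, transactions)
print(metrics)
```

A rule with high confidence and a lift well above 1 suggests that a locality reporting the antecedent minerals but not the consequent is a candidate for an unreported occurrence. Mining rules over a full database would normally use a dedicated frequent-itemset algorithm rather than this brute-force counting.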
Label Distribution Learning | Estimating the chemical composition of minerals on Mars
To better understand the formational conditions and geologic history of the minerals found by NASA's MSL rover Curiosity in Gale crater, Mars, the CheMin X-ray diffractometer team developed a crystal-chemical method to predict limited chemical compositions of the minerals observed in the CheMin samples [1,2]. In this study, we adapt a machine learning technique, Label Distribution Learning (LDL) [3], to predict multicomponent chemical compositions of Gale crater mineral phases, thereby allowing for more detailed petrologic interpretation of the geologic history of the martian surface.
LDL is a novel framework for classification problems with small datasets and has been widely applied to facial recognition problems such as age estimation. In this study, we adapt the LDL algorithm so that it can predict chemical elements (labels) and their abundances (degrees) for each martian mineral sample, based on crystallographic parameters. We evaluate performance using distance and similarity measures between label distributions, as well as mean squared error, and compare the results to traditional machine learning methods.
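The evaluation described above compares a predicted label distribution with a reference one. The sketch below shows one distance (Kullback-Leibler divergence), one similarity (cosine), and mean squared error. The abstract does not say which distance and similarity measures were used, so these choices are assumptions, and the three-element compositions are hypothetical values.

```python
import math

def normalize(values):
    """Scale non-negative abundances so they sum to 1 (a label distribution)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    return [v / total for v in values]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): a distance-like measure; 0 when the distributions match."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def cosine_similarity(p, q):
    """Cosine of the angle between the two distributions; 1 when identical."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def mean_squared_error(p, q):
    """Average squared difference across labels."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

# Hypothetical abundances of three elements (labels) in one mineral sample.
reference = normalize([5.0, 3.0, 2.0])   # assumed "true" description degrees
predicted = normalize([4.0, 4.0, 2.0])   # assumed model output

print(kl_divergence(reference, predicted))
print(cosine_similarity(reference, predicted))
print(mean_squared_error(reference, predicted))
```

Lower KL divergence and mean squared error, and higher cosine similarity, indicate a closer match between the predicted and reference compositions.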
[1] Morrison et al. (2017) Am Min, 103(6): 848-856
[2] Morrison et al. (2017) Am Min, 103(6): 857-871
[3] Geng (2016) IEEE Transactions on Knowledge and Data Engineering, 28(7): 1734-1748
Minerals vs Microbes | Exploring the complex relationships between microbial populations and their geochemical environments
The reciprocal feedbacks between microorganisms and their environment have governed much of the coevolution of the biosphere, geosphere, and atmosphere throughout geological time. Evidence from the rock record highlights massive shifts in redox chemistry, trace metal availability, and primitive respiration on the ancient Earth that may have been driven at least partially by changes in plate tectonics and volcanism. Our understanding of how deep subsurface processes in modern environments influence the trajectory of microbial evolution is limited. To better characterize the interactions between microorganisms and their environment, we sequenced 35 metagenomes from microbial communities along the Costa Rica volcanic arc, where sites varied significantly in terms of pH (0.85 to 9.75), temperature (26 to 88 °C), sulfate concentration (0.03 to 99.2 mM), and molecular hydrogen concentration (<0.001 to 11.7 mM). Diverse pathways of carbon fixation were observed across most samples, including the Calvin-Benson Cycle and the Wood-Ljungdahl pathway. Network analysis showed that sulfate and hydrogen were negatively correlated with genes involved in these pathways, including cbb3-type cytochrome c oxidase. Sulfate also had a negative relationship with glycolysis, indicating that nutrient release from the deep subsurface may play a role in shaping both chemolithotrophic and heterotrophic communities at the surface.
Network Analysis | Exploiting the multivariate, multidimensional nature of complex evolving systems
A fundamental goal of mineralogy and petrology is the deep understanding of mineral phase relationships and the consequent spatial and temporal patterns of mineral coexistence in rocks, ore bodies, sediments, meteorites, and other natural polycrystalline materials. The multi-dimensional chemical complexity of such mineral assemblages has traditionally led to experimental and theoretical consideration of 2-, 3-, or n-component systems that represent simplified approximations of natural systems. Network analysis provides a dynamic, quantitative, and predictive visualization framework for employing “big data” to explore complex and otherwise hidden higher-dimensional patterns of diversity and distribution in such mineral systems. We introduce and explore applications of mineral network analysis, in which mineral species are represented by nodes, while coexistence of minerals is indicated by lines between nodes. This approach provides a dynamic visualization platform for higher-dimensional analysis of phase relationships, because topologies of equilibrium phase assemblages and pathways of mineral reaction series are embedded within the networks. Mineral networks also facilitate quantitative comparison of lithologies from different planets and moons, the analysis of coexistence patterns simultaneously among hundreds of mineral species and their localities, the exploration of varied paragenetic modes of mineral groups, and investigation of changing patterns of mineral occurrence through deep time. Mineral network analysis, furthermore, represents an effective visual approach to teaching and learning in mineralogy and petrology.
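The node-and-line representation described above can be sketched as a weighted edge list built from locality records. The locality names and mineral assemblages below are hypothetical toy data, and weighting each edge by the number of shared localities is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: minerals reported as coexisting at each locality.
assemblages = {
    "loc_1": {"olivine", "pyroxene", "plagioclase"},
    "loc_2": {"quartz", "plagioclase", "biotite"},
    "loc_3": {"olivine", "pyroxene"},
}

def build_network(assemblages):
    """Return edge weights: number of localities where each mineral pair coexists."""
    edges = Counter()
    for minerals in assemblages.values():
        # Sorting makes each undirected pair appear in one canonical order.
        for a, b in combinations(sorted(minerals), 2):
            edges[(a, b)] += 1
    return edges

def node_degrees(edges):
    """Number of distinct minerals each mineral coexists with."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

edges = build_network(assemblages)
degrees = node_degrees(edges)
print(edges.most_common(3))
print(degrees.most_common(3))
```

In this toy network, plagioclase links the two otherwise separate assemblages, which is the kind of structural pattern a network layout makes visible. For real datasets a graph library would typically handle layout and metrics.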
Tectonic controls on mineralization | Integrating mineralogical data resources with the EarthByte GPlates plate tectonic reconstruction platform