
Exploring Geological Data: A Journey in R Markdown

In the realm of geological exploration, data is everything. Yet navigating multi-year datasets and deciphering the stories hidden within them requires more than an experienced, keen eye: it demands a systematic approach and a touch of coding magic. Join me on a journey through R Markdown (in RStudio) as the chemical secrets of the Proof Project are uncovered.

RStudio graphical interface showing the data compilation code, output, data, and packages

Proof Project Overview

The Proof Property covers 905.01 hectares within the traditional territories of the Skin Tyee Nation, the Nee-Tahi-Buhn Indian Band, the Nadleh Whut'en Band, and the Stellat'en First Nation. It comprises three mineral tenures in gentle terrain along the northern shore of Borel Lake, south of Francois Lake. The area lies within the Nechako Plateau of central BC, which is notoriously difficult to explore. Outcrop is limited and till blankets are thick. Large expanses of flat to gently rolling country are almost untouched by incised watercourses, which leaves few opportunities for the stream sediment samples prized in early-stage exploration.

Regional exploration within the Proof Project area includes airborne magnetic and electromagnetic geophysical surveys, geological mapping, and rock physical properties surveys. This work has identified intriguing anomalies, including a large late-time tau (conductive) anomaly with a corresponding early-time Z off-time (resistivity) anomaly. It has also mapped lithological units such as the Kasalka Group volcanic rocks and the Late Cretaceous Holy Cross Pluton, a feldspar porphyry. The Kasalka Group is the same host unit as the Blackwater Gold-Silver Deposit, 85 kilometres to the south. Together these results provide valuable insight into the geological setting of the area.

In 2018, a 30-kilometre-long caesium trend in lodgepole pine outer bark samples led to the identification of the Proof Project. Subsequent sampling revealed anomalous concentrations of gold, silver, antimony, arsenic, vanadium, and thallium in lodgepole pine twigs. The discovery of the Goldtree Zone in that first season, with gold concentrations ranging from 21 to 38.8 parts per billion (ppb), marked a significant milestone in the project's development. Building on these results, the 2022 exploration program revealed the Silvertree Zone south of the Goldtree Zone: a 4-kilometre silver anomaly ranging from 20 to 34 ppb silver.

The Task: R Markdown

The quest of this coding adventure is to compile biogeochemical data from lodgepole pine twigs collected in 2018 and 2022. These data hold the key to understanding the chemical signatures of the study area, and they offer valuable insight into its geological and economic potential. Before diving into the chemistry, however, the groundwork has to be laid: formatting the data, handling non-detects, investigating sampling biases, and applying compositional data analysis.

First Look

The journey kicks off with a bird's-eye view of the data: a landscape of numbers, columns, and headers sprawled across multiple rows. This is the typical assay results spreadsheet received from a lab. Wrangling this raw data into a coherent format is the first challenge. With R Markdown, code turns the chaos into order and lays the groundwork for statistical analysis. The alternative is copying and pasting assay results into a sample spreadsheet and hoping there are no transcription errors.

Re-Format Assay Data

The lab sends out Excel files laden with headers spanning multiple rows, which leaves the data fragmented and in need of careful re-formatting. Through coding, the assay data are standardized into a single coherent spreadsheet. The required column names and the detection limits are extracted, so the data are ready for further investigation.
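The project itself did this in R. Purely as an illustration, here is a minimal Python sketch (standard library only) of collapsing stacked header rows into one. It assumes a hypothetical CSV export in which row 1 holds element names, row 2 units and row 3 lower detection limits, followed by one row per sample. Real lab layouts vary, and the function name `reformat_assay` is made up for this example.

```python
import csv
import io

# Hypothetical lab export (layout assumed, not taken from any real lab):
# row 1 = element names, row 2 = units, row 3 = lower detection limits,
# then one row per sample with the sample ID in the first column.
RAW_EXPORT = """Sample,Au,Ag,As
,ppb,ppb,ppm
,0.2,2,0.1
PT-001,1.3,<2,0.45
PT-002,<0.2,5,0.30
"""

def reformat_assay(text):
    """Collapse the three stacked header rows into single column names.

    Returns (columns, detection_limits, records), where records keep the
    raw strings so that non-detects such as '<2' can be handled later.
    """
    rows = list(csv.reader(io.StringIO(text)))
    elements, units, limits = rows[0], rows[1], rows[2]
    # Merge element and unit into one header, e.g. 'Au' + 'ppb' -> 'Au_ppb'.
    columns = [elements[0]] + [f"{e}_{u}" for e, u in zip(elements[1:], units[1:])]
    detection_limits = {c: float(dl) for c, dl in zip(columns[1:], limits[1:])}
    # Skip any blank rows that spreadsheet exports sometimes leave behind.
    records = [dict(zip(columns, row)) for row in rows[3:] if any(row)]
    return columns, detection_limits, records

columns, detection_limits, records = reformat_assay(RAW_EXPORT)
print(columns)
print(detection_limits)
print(records[0])
```

The values stay as strings at this stage so that the `<` qualifiers survive. Converting to numbers too early would silently lose track of which results were non-detects.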

Summary Tables

With the data organized, summary tables of standard statistics can be calculated to begin revealing hidden patterns. At this stage, non-detects are given a cursory substitution of half the detection limit. Detection limits, medians, maximums, coefficients of variation (CV) and median absolute deviations (MAD) each tell a story. Together they paint a picture of the chemical landscape and show where to focus future exploration efforts.
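A minimal Python sketch of the summary statistics named above, standing in for the project's R code. It assumes non-detects arrive as strings such as `'<0.2'` and replaces them with half the detection limit, as in the cursory approach described. The gold values are invented for illustration. MAD here is the unscaled median of absolute deviations from the median. R's `mad()` multiplies by 1.4826 by default, so check which convention a given table uses.

```python
import statistics

def half_dl(value, detection_limit):
    """Return a number, replacing a non-detect like '<0.2' with DL / 2."""
    value = value.strip()
    if value.startswith("<"):
        return detection_limit / 2
    return float(value)

def summarize(values, detection_limit):
    """Summary statistics for one element after half-DL substitution."""
    x = [half_dl(v, detection_limit) for v in values]
    med = statistics.median(x)
    return {
        "detection_limit": detection_limit,
        "n_below_dl": sum(v.strip().startswith("<") for v in values),
        "median": med,
        "max": max(x),
        # Coefficient of variation in percent: sample standard deviation / mean.
        "cv_pct": 100 * statistics.stdev(x) / statistics.mean(x),
        # Unscaled median absolute deviation from the median.
        "mad": statistics.median([abs(v - med) for v in x]),
    }

# Invented gold results (ppb) with a 0.2 ppb detection limit.
au_ppb = ["1.3", "<0.2", "0.8", "2.4", "<0.2"]
print(summarize(au_ppb, 0.2))
```

Counting the non-detects alongside the statistics is a useful reminder of how heavily the half-detection-limit substitution influences each element's figures.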

Prepare the Data for Analysis

Before embarking on the analytical quest, the data must be prepared. All elements are converted to the same measurement units, missing data are imputed, and the sampling years are made uniform. It is a meticulous step that is rarely done, but it lays the foundation for meaningful insights.
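Here is a Python sketch of the unit-harmonization step. It assumes hypothetical column names of the form `Element_unit` (with `pct` for percent) and a single target unit of ppb. The post does not describe how missing values were imputed, so here missing results are simply kept as `None` for a later imputation step.

```python
# Multiplicative factors to ppb (1 ppm = 1,000 ppb; 1 % = 10,000 ppm = 10,000,000 ppb).
TO_PPB = {"ppb": 1.0, "ppm": 1_000.0, "pct": 10_000_000.0}

def to_ppb(record):
    """Convert {'Element_unit': value} entries to {'Element_ppb': value}.

    Missing results (None) are passed through untouched for later imputation.
    """
    converted = {}
    for column, value in record.items():
        element, unit = column.rsplit("_", 1)
        converted[f"{element}_ppb"] = None if value is None else value * TO_PPB[unit]
    return converted

# Hypothetical: the same elements reported in different units in two years.
sample_2018 = {"Ag_ppm": 0.025, "Ca_pct": 0.61}
sample_2022 = {"Ag_ppb": 25.0, "Ca_pct": None}  # Ca not reported for this sample
print(to_ppb(sample_2018))
print(to_ppb(sample_2022))
```

After conversion, both years share identical column names. That makes it safe to stack them into one table.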

Compositional Data Transformation

Deeper into the analysis, compositional data analysis comes into play. Assay results are closed data: each sample's element concentrations are parts of a whole. The centred log-ratio (clr) transformation opens these data up. The relationships it reveals reflect the chemical compositions themselves rather than spurious associations, such as those influenced by assay digestion methods.
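The clr transformation itself is short. Each part is divided by the geometric mean of all parts in the sample, and then the logarithm is taken: clr(x)_i = ln(x_i / g(x)). The Python sketch below uses invented concentrations and shows two useful properties. First, clr values sum to zero. Second, rescaling a whole sample (for example, from ppb to ppm) leaves them unchanged.

```python
import math

def clr(parts):
    """Centred log-ratio: ln(part / geometric mean of all parts).

    Parts must be strictly positive, so non-detects and zeros have to be
    replaced or imputed before this step.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Invented concentrations for one twig sample, all in ppb.
sample = [4.0, 25.0, 6.1e6, 1.2e6]
z = clr(sample)
print([round(v, 3) for v in z])
print(abs(sum(z)) < 1e-9)  # clr values sum to zero (up to rounding)

# Rescaling the whole sample (ppb -> ppm) does not change the clr values.
z_ppm = clr([p / 1000 for p in sample])
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z, z_ppm)))
```

Scale invariance is what frees the analysis from the constant-sum constraint: only the ratios between elements matter, not the absolute units they were reported in.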

Sampling Bias and Correlations

This data science journey is not without its challenges. Along the way, sampling bias is investigated: bias related to geology, alteration, mineralization and sampling year, as well as changes in soil composition caused by forest fires. Quantiles are used to level the data. Finally, the correlations among the clr-transformed elements are compared against critical values derived from Student's t-test to determine which are statistically significant.
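The significance test can be sketched as follows, in Python as an illustration rather than the project's R code. Consider a Pearson correlation r between two clr-transformed elements over n samples. The test statistic is t = r * sqrt((n - 2) / (1 - r^2)). Solving for r gives the critical correlation r_crit = t_crit / sqrt(n - 2 + t_crit^2). The critical t value is passed in as an argument because Python's standard library has no t distribution. In R, it would come from `qt(0.975, df = n - 2)` for a two-tailed test at alpha = 0.05. The sample size of 30 is hypothetical, and so are the function names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(n, t_crit):
    """Smallest |r| that is significant, from t = r * sqrt((n - 2) / (1 - r**2))."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)

def is_significant(r, n, t_crit):
    """Two-tailed check: compare |r| with the critical correlation."""
    return abs(r) >= critical_r(n, t_crit)

# Two-tailed alpha = 0.05 with n = 30 samples (df = 28): t_crit is about 2.048,
# taken from a Student's t table (in R: qt(0.975, df = 28)).
n, t_crit = 30, 2.048
print(round(critical_r(n, t_crit), 3))
print(is_significant(0.45, n, t_crit), is_significant(0.30, n, t_crit))
```

In practice, `pearson_r` would be applied to each pair of clr columns. Only pairs whose |r| reaches `critical_r` would then be kept as significant.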

Conclusions: Unveiling Geological Insights

The culmination of this exploration journey has unearthed a wealth of discoveries that illuminate the geological and biogeochemical landscape of the Proof Project area. Through meticulous data analysis and interpretation, valuable insights have been gained into the potential mineralization within this area.

Correlation Analysis:

The correlation analysis conducted on the biogeochemical data has revealed intriguing patterns suggestive of a complex mineralization system. The presence of a gold-platinum-bismuth core surrounded by a silver-calcium-potassium-sodium-vanadium halo points towards a multi-element mineralization regime. These correlations, coupled with the distribution of pathfinder elements and alkali metals, hint at the presence of hydrothermal systems, volcanic activity, or intrusive bodies that may host economically viable mineral deposits.

Geological Setting:

The examination of the geological setting has provided further validation of these findings. The presence of volcanic rocks, bladed quartz veins, and alteration zones such as hematite-sericite alteration suggests a geological environment conducive to mineralization, possibly of epithermal or mesothermal origin. These geological features, combined with the observed mineralization patterns, strengthen the case for further exploration efforts in the area.

Geophysical Signatures:

Geophysical signatures, including magnetic anomalies and conductive bodies, have provided additional evidence of mineralized zones and alteration halos. These signatures serve as valuable guides for exploration efforts, helping to prioritize target areas for further investigation.

Exploration Implications:

The implications of the findings are significant for future exploration. Gold and silver may not be present in surface rock samples, but their absence could be attributed to factors such as leaching, the depth of metal precipitation, or structural controls. Meanwhile, the rocks show bladed quartz veins and zones of chlorite alteration, hematite-sericite alteration and breccia. Together with the gold and silver concentrations in vegetation samples, these features suggest the need for deeper exploration. Future activities could include drilling, geophysical surveys, and expanded geochemical analyses to confirm the presence and economic potential of mineral deposits at depth.

In summary, the study area exhibits characteristics indicative of various types of mineral deposits, including epithermal gold-silver deposits similar to the Blackwater Gold-Silver Mine. These findings underscore the importance of continued exploration efforts to assess the economic viability of this project and to delineate targets for future mining activities.


As the R Markdown data science journey draws to a close, the insights gained and the paths explored have left a lasting impression. Exploring geological data can unearth a wealth of knowledge even from the smallest of datasets and beneath the deepest till and vegetation cover. That knowledge ranges from the subtle correlations between elements to the geological signatures etched into the earth itself.

In the end, my adventure in R Markdown (through RStudio) has not only equipped me with the tools to unravel the mysteries of the earth but also refuelled my drive for discovery. I can't help but wonder what new revelations await in the vast expanse of biogeochemical exploration as more data become available. So the next time you find yourself copying and pasting data in Excel, remember the power of data science. Contact us today to turn your chaotic data into the next groundbreaking discovery.

Title page produced in R Markdown for the report "Proof Project Exploratory Data Analysis and Compilation of Vegetation (Pine Tree Twig) Samples 2018 and 2022", featuring a red Jeep on a resource road

About the Author

Dr. Diana Benz has 28 years of experience in the mineral exploration industry searching for diamonds and metals in a range of roles: from heavy minerals lab technician to till sampler, rig geologist, project manager and business owner/lead consultant. She has a Bachelor of Science in General Biology, a Master of Science in Earth Sciences researching diamond indicator mineral geochemistry, and a PhD in Natural Resources and Environmental Studies using geochemical multivariate statistical analysis techniques to interpret biogeochemical data for mineral exploration. Diana has conducted fieldwork in Canada (BC, NWT, YT and ON) as well as in Greenland. She has also been involved, remotely through a BC-based office, in mineral exploration projects in South America, Africa, Eurasia, Australia and the Middle East. Diana owns Takom Exploration Ltd., a boutique geological and environmental firm focused on metal exploration in BC and the Yukon.

