Libraries

library(tidyverse)
library(lubridate)
library(plotly)
library(knitr)

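Import & Prepare Data

# Ingest the master database enriched with CEDA and HRV metrics.
# guess_max = 100000 lets readr inspect more rows before guessing column types.
master_df <- read_csv("fitbit_master_database.csv", guess_max = 100000)

# Keep rows from 1 January 2026 onward.
analysis_df <- master_df %>%
  filter(minute >= ymd_hms("2026-01-01 00:00:00"))

# Keep workout minutes only. filter() also drops rows where Workout_Type is NA.
biometric_df <- analysis_df %>%
  filter(Workout_Type != "None")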

1. Peak Heart Rate Per Session

Peak HR per session shows the maximum cardiovascular demand reached during each workout. In the chart below, each dot is sized by session duration, so the effort ceiling can be compared across modalities over time.

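peak_hr_df <- biometric_df %>%
  filter(!is.na(heart_rate)) %>%
  mutate(date = as_date(minute)) %>%
  # A "session" here is one workout type on one calendar date.
  group_by(date, Workout_Type) %>%
  summarise(
    peak_hr       = max(heart_rate, na.rm = TRUE),
    # Assuming one row per minute, the row count is the duration in minutes.
    duration_mins = n(),
    .groups       = "drop"
  )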
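
The grouping above treats each date and workout type pair as one session, so two separate workouts of the same type on one day would be merged. A minimal sketch of an alternative, assuming a new session starts whenever consecutive rows are more than a chosen gap apart. The timestamps and the 10-minute threshold (`max_gap_mins`) are invented for illustration and are not part of the original pipeline.

```r
# Minute timestamps for one day; invented for illustration only.
toy_times <- as.POSIXct(c("2026-01-05 07:00:00", "2026-01-05 07:01:00",
                          "2026-01-05 07:02:00", "2026-01-05 18:30:00",
                          "2026-01-05 18:31:00"), tz = "UTC")

# Assumed threshold: a gap longer than this starts a new session.
max_gap_mins <- 10

# Gap (in minutes) between each row and the previous one; first row has no gap.
gap_mins <- c(0, as.numeric(diff(toy_times), units = "mins"))

# Increment the session counter every time the gap exceeds the threshold.
session_id <- cumsum(gap_mins > max_gap_mins) + 1
print(session_id)
```

On the real data, a `session_id` built this way could replace `date` in the `group_by()` call, after arranging rows by `minute` within each workout type.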

p1 <- ggplot(peak_hr_df, aes(x = date, y = peak_hr,
                              color = Workout_Type,
                              size  = duration_mins)) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = c(
    "Warm up"       = "#f28e2b",
    "Cardio"        = "#e15759",
    "Weightlifting" = "#4e79a7"
  )) +
  scale_size_continuous(range = c(3, 9), guide = "none") +
  theme_minimal() +
  labs(
    title    = "Peak Heart Rate Per Session",
    subtitle = "Dot size = session duration  |  color = workout modality",
    x        = "Date",
    y        = "Peak HR (BPM)",
    color    = NULL
  ) +
  theme(
    legend.position  = "bottom",
    panel.grid.minor = element_blank()
  )

ggplotly(p1, tooltip = c("x", "y", "color", "size"))

Analysis: Each dot represents one session’s peak cardiovascular demand. Cardio sessions (red) consistently reach higher peak HR than Weightlifting (blue), which clusters in a mid-range band. The February 23rd Weightlifting session stands out immediately as a dot sitting well above the typical Weightlifting band; it is the same outlier investigated in the forensic audit below. Dot size shows that longer sessions don’t always produce higher peak HR, suggesting that duration is a poor predictor of peak intensity in this training block.
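
One way to put a number on the duration-versus-peak-HR observation is a rank correlation between the two columns. The sketch below uses a small made-up data frame standing in for `peak_hr_df` (the values are illustrative, not taken from the Fitbit export). A coefficient near zero would be consistent with the visual impression, although it would not by itself establish independence.

```r
# Toy stand-in for peak_hr_df; the numbers are invented for illustration only.
toy_sessions <- data.frame(
  duration_mins = c(20, 35, 45, 50, 60, 75),
  peak_hr       = c(150, 171, 148, 166, 152, 160)
)

# Spearman rank correlation: less sensitive to a single extreme session
# than the default Pearson coefficient.
rho <- cor(toy_sessions$duration_mins, toy_sessions$peak_hr,
           method = "spearman")
print(round(rho, 2))
```

The same call applied to `peak_hr_df$duration_mins` and `peak_hr_df$peak_hr` would give the value for the real training block.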

2. Biomechanical Efficiency

By plotting Steps against Heart Rate, we can visualize the “Cardiovascular Cost” of movement.

p2 <- biometric_df %>%
  filter(!is.na(steps) & !is.na(heart_rate)) %>%
  ggplot(aes(x = heart_rate, y = steps, color = Workout_Type)) +
  geom_point(alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linewidth = 1) +
  scale_color_manual(values = c("Warm up" = "#f28e2b", "Cardio" = "#e15759", "Weightlifting" = "#4e79a7")) +
  theme_minimal() +
  labs(title = "Steps vs. Heart Rate Correlation", 
       x = "Heart Rate (BPM)",
       y = "Steps per Minute")

ggplotly(p2)

Analysis: This scatterplot effectively “fingerprints” the workout styles. Cardio (red) shows a roughly linear relationship: as steps increase, heart rate rises with them. Weightlifting (blue) forms a band hugging the zero-steps baseline across a wide range of heart rates, showing substantial cardiovascular strain even when mechanical movement (steps) is near zero. This supports the “intensity” of lifting sessions that step counters alone would miss.
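
The per-modality trend lines drawn by `geom_smooth(method = "lm")` can also be summarized numerically by fitting the same `steps ~ heart_rate` model within each workout type. This is a minimal sketch on invented rows that mimic the two patterns described; the data frame `toy_minutes` is hypothetical.

```r
# Toy minute-level rows; values are invented to mimic the two patterns.
toy_minutes <- data.frame(
  Workout_Type = rep(c("Cardio", "Weightlifting"), each = 5),
  heart_rate   = c(110, 125, 140, 155, 170, 105, 120, 135, 150, 165),
  steps        = c(60, 85, 110, 130, 160, 0, 2, 0, 3, 1)
)

# Fit steps ~ heart_rate separately per workout type and keep the slope,
# i.e. the extra steps per minute associated with one extra BPM.
slopes <- sapply(split(toy_minutes, toy_minutes$Workout_Type), function(d) {
  unname(coef(lm(steps ~ heart_rate, data = d))["heart_rate"])
})
print(round(slopes, 3))
```

A clearly positive slope for one modality and a near-zero slope for another is the numeric counterpart of the visual "fingerprint".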

3. Recovery: Ambient RMSSD

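RMSSD (Root Mean Square of Successive Differences) is one of the most widely used time-domain HRV metrics for tracking parasympathetic recovery. It is computed from the differences between consecutive beat-to-beat intervals, and higher values generally indicate stronger parasympathetic activity.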
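
For reference, RMSSD is the square root of the mean of the squared differences between successive beat-to-beat (RR) intervals. The master database already supplies a precomputed `rmssd` column, so the sketch below only illustrates the definition; the function name `rmssd_from_rr` and the sample intervals are hypothetical.

```r
# RMSSD from a vector of successive RR (beat-to-beat) intervals in milliseconds.
rmssd_from_rr <- function(rr_ms) {
  successive_diffs <- diff(rr_ms)   # RR[i + 1] - RR[i]
  sqrt(mean(successive_diffs^2))    # root of the mean squared difference
}

# Illustrative RR intervals (ms); not taken from the Fitbit export.
rr_example <- c(800, 810, 790, 805, 795)
rmssd_example <- rmssd_from_rr(rr_example)
print(rmssd_example)
```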

daily_recovery <- analysis_df %>%
  filter(!is.na(rmssd) & rmssd > 0) %>%
  group_by(date = as_date(minute)) %>%
  summarize(avg_recovery = mean(rmssd, na.rm = TRUE))

p3 <- ggplot(daily_recovery, aes(x = date, y = avg_recovery)) +
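  # annotate() draws the band once. geom_rect() with constant aes() values
  # redraws it for every row of daily_recovery, stacking the alpha.
  annotate("rect",
           xmin = as_date("2026-01-23"), xmax = as_date("2026-03-23"),
           ymin = -Inf, ymax = Inf,
           fill = "grey80", alpha = 0.3) +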
  annotate("text",
           x     = as_date("2026-02-22"),
           y     = max(daily_recovery$avg_recovery) * 0.92,
           label = "No data\n(export gap)",
           size  = 3, color = "grey50") +
  geom_segment(aes(xend = date, yend = 0), color = "#9467bd", linewidth = 0.5) +
  geom_point(size = 3, color = "#9467bd") +
  theme_minimal() +
  labs(title = "Daily Recovery Performance (RMSSD)",
       x = "Date",
       y = "Avg RMSSD (ms)")

ggplotly(p3)

Analysis: The “Lollipop” chart shows the recovery trend across the analysis window, with the shaded band marking the export gap. Peaks in RMSSD (e.g., mid-January) are what one would expect after rest days or higher-quality sleep. A downward trend in these dots over several days can be an early warning sign of overtraining, signaling a need to reduce volume in the following week.
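
The "downward trend over several days" rule can be made explicit by flagging days that end a run of consecutive day-over-day drops. This is a sketch under stated assumptions: the helper `flag_declines`, the three-day threshold, and the sample values are all hypothetical, and the input is assumed to hold one value per consecutive calendar day (a gap such as the export gap would need to be handled first).

```r
# Flag days that sit at or beyond the `min_run`-th consecutive daily drop.
# Hypothetical helper; min_run = 3 is an assumed cutoff, not a validated one.
flag_declines <- function(values, min_run = 3) {
  dropped <- c(FALSE, diff(values) < 0)   # TRUE where today < yesterday
  runs    <- rle(dropped)
  run_pos <- sequence(runs$lengths)       # position of each day within its run
  dropped & run_pos >= min_run
}

# Invented daily average RMSSD values (ms), one per consecutive day.
toy_rmssd <- c(42, 45, 44, 41, 38, 40, 39)
decline_flags <- flag_declines(toy_rmssd)
print(which(decline_flags))
```

With `daily_recovery` sorted by `date`, the same helper could be applied to `avg_recovery` within each unbroken stretch of days.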

4. Forensic Audit: Metabolic Output vs Sweat Response

A deep dive into the Feb 23rd calorie spike, which belongs to the same Weightlifting session flagged as an outlier in the Peak HR chart above. Despite CEDA sensor limitations during heavy lifting, the 10:30 to 10:50 window on Feb 23rd produced enough signal to check the caloric anomaly.

audit_data <- analysis_df %>%
  # Audit window: 10:30 to 10:50 on Feb 23rd.
  filter(minute >= ymd_hms("2026-02-23 10:30:00"),
         minute <= ymd_hms("2026-02-23 10:50:00"))

p4 <- ggplot(audit_data, aes(x = minute)) +
  geom_line(aes(y = calories, color = "Calories"), linewidth = 1) +
  geom_line(aes(y = `ceda magnitude real micro siemens` * 10, color = "Sweat Response (CEDA)"), linewidth = 1) +
  scale_color_manual(values = c("Calories" = "#e15759", "Sweat Response (CEDA)" = "#76b7b2")) +
  theme_minimal() +
  labs(
    title = "Metabolic Output vs. Nervous System Response",
    subtitle = "Feb 23rd: Validating calorie spikes against CEDA (scaled x10)",
    x = "Time",
    y = "Relative Magnitude",
    color = "Metric"
  )

ggplotly(p4)

Analysis: Layering CEDA over the calorie spike provides an independent check on data integrity. A true physiological peak of 22.7 kcal/min would typically be accompanied by a strong sympathetic nervous system response (a sweat spike). If the CEDA line remains relatively stable while calories skyrocket, that points to a sensor artifact caused by mechanical wrist interference rather than true exertion.
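
The visual comparison can be backed by a simple rule: flag minutes where calories are unusually high relative to the window while CEDA is not. The sketch below standardizes both signals within the window; the helper `flag_artifacts`, the z-score cutoff of 2, and the sample values are assumptions for illustration rather than a validated detection method.

```r
# Hypothetical helper: flag minutes where calories spike but CEDA does not.
# The z-score cutoff of 2 is an assumption chosen for illustration.
flag_artifacts <- function(calories, ceda, z_cut = 2) {
  z <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  cal_spike  <- z(calories) > z_cut
  ceda_spike <- z(ceda) > z_cut
  # A missing value in either signal can yield NA: "cannot judge".
  cal_spike & !ceda_spike
}

# Invented window: one calorie spike, flat CEDA (micro siemens).
toy_cal  <- c(6, 7, 6, 8, 7, 22.7, 7, 6, 7, 6)
toy_ceda <- c(0.50, 0.52, 0.49, 0.51, 0.50, 0.51, 0.50, 0.49, 0.51, 0.50)
suspect_minutes <- which(flag_artifacts(toy_cal, toy_ceda))
print(suspect_minutes)
```

On `audit_data`, the inputs would be `calories` and the `ceda magnitude real micro siemens` column. With only about 20 minutes in the window, the z-scores are coarse and best treated as a screening aid.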



Final Technical Audit Note
The data reveals a consistent “Sensor Dropout” during heavy Weightlifting sessions for CEDA and HRV metrics. This is a known limitation of wrist-worn sensors: HRV is derived from the optical photoplethysmography (PPG) signal and CEDA from skin-contact electrodes, and both lose reliable skin contact during mechanical wrist flexion. By filtering to non-missing, positive readings and cross-referencing with heart rate, we have maintained a high degree of data integrity for this report.
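
The dropout pattern can be quantified as the share of workout minutes without a usable reading, broken down by workout type. This is a minimal sketch on invented rows; "usable" follows the filter used earlier in the report (non-missing and greater than zero), and the data frame `toy_readings` is hypothetical.

```r
# Toy minute-level rows; NA marks a dropped reading. Values are invented.
toy_readings <- data.frame(
  Workout_Type = rep(c("Cardio", "Weightlifting"), each = 4),
  rmssd        = c(31, 28, NA, 30, NA, NA, 25, NA)
)

# A reading is usable when it is present and positive.
usable <- !is.na(toy_readings$rmssd) & toy_readings$rmssd > 0

# Share of minutes with no usable RMSSD reading, per workout type.
dropout_rate <- tapply(!usable, toy_readings$Workout_Type, mean)
print(dropout_rate)
```

Running the same calculation on `biometric_df` for both `rmssd` and the CEDA column would show how much of each modality the filters discard.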