Libraries
master_df <- read_csv("fitbit_master_database.csv", guess_max = 100000)
analysis_df <- master_df %>%
filter(minute >= ymd_hms("2026-01-01 00:00:00"))Import & Prepare Data
This visualization highlights the “Step Gap”—the difference between the base daily movement and the movement generated during tracked gym sessions.
ambient_steps <- analysis_df %>%
filter(Workout_Type == "None") %>%
mutate(date = as_date(minute)) %>%
group_by(date) %>%
summarize(steps = sum(steps, na.rm = TRUE)) %>%
mutate(Type = "Ambient (Non-Gym)")
total_steps <- analysis_df %>%
mutate(date = as_date(minute)) %>%
group_by(date) %>%
summarize(steps = sum(steps, na.rm = TRUE)) %>%
mutate(Type = "Total Daily Steps")
step_comparison <- bind_rows(ambient_steps, total_steps)
p1 <- ggplot(step_comparison, aes(x = date, y = steps, color = Type, group = Type)) +
geom_line(linewidth = 0.5) +
geom_point(size = 1) +
geom_text(
data = filter(step_comparison, Type == "Total Daily Steps" & steps > 10000),
aes(label = paste0(format(date, "%b %d"), "\n", format(steps, big.mark = ","))),
nudge_x = 4.5,
nudge_y = 600.0,
fontface = "bold",
size = 3,
show.legend = FALSE
) +
scale_color_manual(values = c("Ambient (Non-Gym)" = "#9467bd", "Total Daily Steps" = "#2ca02c")) +
theme_minimal() +
labs(title = "Total vs. Ambient Daily Step Count",
subtitle = "Markers indicate days exceeding 10,000 total steps",
x = "Date", y = "Steps", color = "Measurement") +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.2)), labels = scales::comma) +
theme(legend.position = "right")
ggplotly(p1)Analysis: The green line represents your full daily achievement, while the purple line shows the baseline “lifestyle” movement. The gaps between these lines indicate days where gym sessions (Cardio or Warm-ups) contributed significantly to the overall. Notice how on high-step days, the ambient movement often stays consistent, suggesting the gym is an addition to activity rather than a replacement.
Understanding which zones spend the most time in helps categorize the “strain” of each workout modality.
workout_df <- analysis_df %>%
filter(Workout_Type != "None") %>%
filter(!is.na(heart_rate))
p2 <- ggplot(workout_df, aes(x = heart_rate, fill = Workout_Type)) +
geom_density(alpha = 0.6, color = "white", linewidth = 0.5) +
scale_fill_manual(values = c("Warm up" = "#f28e2b", "Cardio" = "#e15759", "Weightlifting" = "#4e79a7")) +
theme_minimal() +
labs(title = "Heart Rate Distribution by Modality",
x = "Heart Rate (BPM)", y = "Density")
ggplotly(p2)Analysis: Cardio (red) shows a distinct right-shift, spending the majority of time in high-aerobic zones. Weightlifting (blue) displays a broader, multi-modal distribution; this reflects the intermittent nature of lifting, where heart rate spikes during a set and recovers during rest periods.
We evaluate the relationship between cardiovascular strain and caloric expenditure to determine which sessions are most “efficient.”
p3 <- ggplot(workout_df, aes(x = heart_rate, y = calories, color = Workout_Type)) +
geom_point(alpha = 0.4, size = 1.5) +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linewidth = 1) +
scale_color_manual(values = c("Warm up" = "#f28e2b", "Cardio" = "#e15759", "Weightlifting" = "#4e79a7")) +
theme_minimal() +
labs(title = "Caloric Burn vs. Cardiovascular Exertion",
x = "Heart Rate (BPM)", y = "Calories/Min")
ggplotly(p3)Analysis: This scatterplot shows a strong linear correlation across all types. However, the slopes of the trend lines reveal that Weightlifting often generates a higher caloric “return” at lower heart rates compared to steady-state Cardio, likely due to the metabolic cost of moving heavy external loads.
Looking into the outlier:
## # A tibble: 1 × 4
## minute Workout_Type heart_rate calories
## <dttm> <chr> <dbl> <dbl>
## 1 2026-02-23 10:38:00 Weightlifting 119 22.7
# 1. Capture the exact timestamp of the highest calorie minute
outlier_row <- workout_df %>%
slice_max(calories, n = 1)
outlier_time <- outlier_row %>% pull(minute)
# 2. Find the start time of the specific WEIGHTLIFTING block on that day
lift_start <- workout_df %>%
filter(as_date(minute) == as_date(outlier_time),
Workout_Type == "Weightlifting") %>%
summarize(start_time = min(minute)) %>%
pull(start_time)
# 3. Calculate the difference (Time into the Weightlifting session)
time_offset <- difftime(outlier_time, lift_start, units = "mins")
# Display the results
paste("The outlier occurred at", outlier_time,
"which was", time_offset,
"minutes into the Weightlifting session.")## [1] "The outlier occurred at 2026-02-23 10:38:00 which was 16 minutes into the Weightlifting session."
# 1. Compare all Monday WEIGHTLIFTING workouts
monday_review <- workout_df %>%
# Isolation Step: Only look at the modality where the outlier occurred
filter(Workout_Type == "Weightlifting") %>%
# Filter for Mondays
filter(wday(minute, label = TRUE) == "Mon") %>%
# Group by the specific date
mutate(date = as_date(minute)) %>%
group_by(date) %>%
# Summarize the core stats for the entire session
summarize(
total_duration_mins = n(),
total_calories = sum(calories, na.rm = TRUE),
avg_heart_rate = round(mean(heart_rate, na.rm = TRUE), 1),
max_heart_rate = max(heart_rate, na.rm = TRUE)
)
# 2. Render the table in the portfolio
kable(monday_review, caption = "Monday Weightlifting Performance Comparison")| date | total_duration_mins | total_calories | avg_heart_rate | max_heart_rate |
|---|---|---|---|---|
| 2026-01-05 | 27 | 191.4079 | 107.6 | 122 |
| 2026-02-02 | 31 | 208.2124 | 107.8 | 124 |
| 2026-02-09 | 25 | 199.2496 | 107.4 | 115 |
| 2026-02-16 | 21 | 229.1774 | 116.1 | 130 |
| 2026-02-23 | 41 | 367.2921 | 108.0 | 127 |
| 2026-03-02 | 25 | 199.2497 | 103.2 | 120 |
| 2026-03-23 | 32 | 270.1477 | 109.7 | 127 |
| 2026-03-30 | 35 | 280.0702 | 105.4 | 126 |
A comparative review of performance across all Monday sessions, which historically serve as the “volume lead” for the training week.
monday_review <- workout_df %>%
filter(Workout_Type == "Weightlifting") %>%
filter(wday(minute, label = TRUE) == "Mon") %>%
mutate(date = as_date(minute)) %>%
group_by(date) %>%
summarize(
total_duration_mins = n(),
total_calories = sum(calories, na.rm = TRUE),
avg_heart_rate = round(mean(heart_rate, na.rm = TRUE), 1)
)
monday_long <- monday_review %>%
pivot_longer(cols = c(total_calories, avg_heart_rate),
names_to = "metric", values_to = "value") %>%
mutate(metric = case_when(
metric == "total_calories" ~ "Total Calories",
metric == "avg_heart_rate" ~ "Average Heart Rate (BPM)"
))
p4 <- ggplot(monday_long, aes(x = date, y = value, fill = metric)) +
geom_col(show.legend = FALSE) +
facet_wrap(~metric, scales = "free_y") +
theme_minimal() +
scale_fill_manual(values = c("Total Calories" = "#e15759", "Average Heart Rate (BPM)" = "#4e79a7")) +
labs(title = "Monday Weightlifting: Intensity vs. Volume", x = "Date", y = NULL)
ggplotly(p4)Analysis: By faceting Calories against Heart Rate, we can spot the Feb 23rd outlier immediately. While Heart Rate stayed within the typical Monday range, the Caloric expenditure spiked significantly, suggesting either a massive increase in lifting volume or a period of sensor instability discussed in the technical audit.
Final Analysis: Variance of Feb 23rd
Performance
The session on Feb 23rd exhibited a 60% increase in total caloric expenditure compared to the February Monday average. * Temporal Analysis: A significant outlier (22.7 kcal/min) was identified. By calculating the delta between the Weightlifting start time (10:22 AM) and the peak (10:38 AM), we confirmed the anomaly occurred exactly 16 minutes into the lifting block. * Root Cause: While the specific lift is unrecorded, the 16-minute mark correlates with a self-reported period of high physical exertion following a 6-day recovery window. * Conclusion: The increased session volume and intensity were genuine results of extended recovery, though the specific magnitude of the peak was likely amplified by sensor instability during heavy mechanical exertion.