Fitbit Activity Analysis

library(tidyverse)
library(lubridate)
library(plotly)
library(knitr)

Libraries

master_df <- read_csv("fitbit_master_database.csv", guess_max = 100000)

analysis_df <- master_df %>%
  filter(minute >= ymd_hms("2026-01-01 00:00:00"))

Import & Prepare Data

1. Step Count Comparison

This visualization highlights the “Step Gap”—the difference between the base daily movement and the movement generated during tracked gym sessions.

ambient_steps <- analysis_df %>%
  filter(Workout_Type == "None") %>%
  mutate(date = as_date(minute)) %>%
  group_by(date) %>%
  summarize(steps = sum(steps, na.rm = TRUE)) %>%
  mutate(Type = "Ambient (Non-Gym)")

total_steps <- analysis_df %>%
  mutate(date = as_date(minute)) %>%
  group_by(date) %>%
  summarize(steps = sum(steps, na.rm = TRUE)) %>%
  mutate(Type = "Total Daily Steps")

step_comparison <- bind_rows(ambient_steps, total_steps)

p1 <- ggplot(step_comparison, aes(x = date, y = steps, color = Type, group = Type)) +
  geom_line(linewidth = 0.5) +
  geom_point(size = 1) +
  
  geom_text(
    data = filter(step_comparison, Type == "Total Daily Steps" & steps > 10000),
    aes(label = paste0(format(date, "%b %d"), "\n", format(steps, big.mark = ","))),
    nudge_x = 4.5,
    nudge_y = 600.0,
    fontface = "bold",
    size = 3,
    show.legend = FALSE
  ) +
  
  scale_color_manual(values = c("Ambient (Non-Gym)" = "#9467bd", "Total Daily Steps" = "#2ca02c")) +
  theme_minimal() +
  labs(title = "Total vs. Ambient Daily Step Count",
       subtitle = "Markers indicate days exceeding 10,000 total steps",
       x = "Date", y = "Steps", color = "Measurement") +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.2)), labels = scales::comma) +
  theme(legend.position = "right")

ggplotly(p1)

Analysis: The green line represents your full daily achievement, while the purple line shows the baseline “lifestyle” movement. The gaps between these lines indicate days where gym sessions (Cardio or Warm-ups) contributed significantly to the overall. Notice how on high-step days, the ambient movement often stays consistent, suggesting the gym is an addition to activity rather than a replacement.

2. Heart Rate Density

Understanding which zones spend the most time in helps categorize the “strain” of each workout modality.

workout_df <- analysis_df %>%
  filter(Workout_Type != "None") %>%
  filter(!is.na(heart_rate))

p2 <- ggplot(workout_df, aes(x = heart_rate, fill = Workout_Type)) +
  geom_density(alpha = 0.6, color = "white", linewidth = 0.5) +
  scale_fill_manual(values = c("Warm up" = "#f28e2b", "Cardio" = "#e15759", "Weightlifting" = "#4e79a7")) +
  theme_minimal() +
  labs(title = "Heart Rate Distribution by Modality",
       x = "Heart Rate (BPM)", y = "Density")

ggplotly(p2)

Analysis: Cardio (red) shows a distinct right-shift, spending the majority of time in high-aerobic zones. Weightlifting (blue) displays a broader, multi-modal distribution; this reflects the intermittent nature of lifting, where heart rate spikes during a set and recovers during rest periods.

3. Burn Efficiency

We evaluate the relationship between cardiovascular strain and caloric expenditure to determine which sessions are most “efficient.”

p3 <- ggplot(workout_df, aes(x = heart_rate, y = calories, color = Workout_Type)) +
  geom_point(alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linewidth = 1) +
  scale_color_manual(values = c("Warm up" = "#f28e2b", "Cardio" = "#e15759", "Weightlifting" = "#4e79a7")) +
  theme_minimal() +
  labs(title = "Caloric Burn vs. Cardiovascular Exertion",
       x = "Heart Rate (BPM)", y = "Calories/Min")

ggplotly(p3)

Analysis: This scatterplot shows a strong linear correlation across all types. However, the slopes of the trend lines reveal that Weightlifting often generates a higher caloric “return” at lower heart rates compared to steady-state Cardio, likely due to the metabolic cost of moving heavy external loads.

Looking into the outlier:

workout_df %>%
  slice_max(calories, n = 1) %>%
  select(minute, Workout_Type, heart_rate, calories)

## # A tibble: 1 × 4
##   minute              Workout_Type  heart_rate calories
##   <dttm>              <chr>              <dbl>    <dbl>
## 1 2026-02-23 10:38:00 Weightlifting        119     22.7

# 1. Capture the exact timestamp of the highest calorie minute
outlier_row <- workout_df %>%
  slice_max(calories, n = 1)

outlier_time <- outlier_row %>% pull(minute)

# 2. Find the start time of the specific WEIGHTLIFTING block on that day
lift_start <- workout_df %>%
  filter(as_date(minute) == as_date(outlier_time), 
         Workout_Type == "Weightlifting") %>%
  summarize(start_time = min(minute)) %>%
  pull(start_time)

# 3. Calculate the difference (Time into the Weightlifting session)
time_offset <- difftime(outlier_time, lift_start, units = "mins")

# Display the results
paste("The outlier occurred at", outlier_time, 
      "which was", time_offset, 
      "minutes into the Weightlifting session.")

## [1] "The outlier occurred at 2026-02-23 10:38:00 which was 16 minutes into the Weightlifting session."

# 1. Compare all Monday WEIGHTLIFTING workouts
monday_review <- workout_df %>%
  # Isolation Step: Only look at the modality where the outlier occurred
  filter(Workout_Type == "Weightlifting") %>%
  # Filter for Mondays
  filter(wday(minute, label = TRUE) == "Mon") %>%
  # Group by the specific date
  mutate(date = as_date(minute)) %>%
  group_by(date) %>%
  # Summarize the core stats for the entire session
  summarize(
    total_duration_mins = n(),
    total_calories = sum(calories, na.rm = TRUE),
    avg_heart_rate = round(mean(heart_rate, na.rm = TRUE), 1),
    max_heart_rate = max(heart_rate, na.rm = TRUE)
  )

# 2. Render the table in the portfolio
kable(monday_review, caption = "Monday Weightlifting Performance Comparison")

Monday Weightlifting Performance Comparison
date	total_duration_mins	total_calories	avg_heart_rate	max_heart_rate
2026-01-05	27	191.4079	107.6	122
2026-02-02	31	208.2124	107.8	124
2026-02-09	25	199.2496	107.4	115
2026-02-16	21	229.1774	116.1	130
2026-02-23	41	367.2921	108.0	127
2026-03-02	25	199.2497	103.2	120
2026-03-23	32	270.1477	109.7	127
2026-03-30	35	280.0702	105.4	126

4. Monday Weightlifting Deep-Dive

A comparative review of performance across all Monday sessions, which historically serve as the “volume lead” for the training week.

monday_review <- workout_df %>%
  filter(Workout_Type == "Weightlifting") %>%
  filter(wday(minute, label = TRUE) == "Mon") %>%
  mutate(date = as_date(minute)) %>%
  group_by(date) %>%
  summarize(
    total_duration_mins = n(),
    total_calories = sum(calories, na.rm = TRUE),
    avg_heart_rate = round(mean(heart_rate, na.rm = TRUE), 1)
  )

monday_long <- monday_review %>%
  pivot_longer(cols = c(total_calories, avg_heart_rate), 
                names_to = "metric", values_to = "value") %>%
  mutate(metric = case_when(
    metric == "total_calories" ~ "Total Calories",
    metric == "avg_heart_rate" ~ "Average Heart Rate (BPM)"
  ))

p4 <- ggplot(monday_long, aes(x = date, y = value, fill = metric)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~metric, scales = "free_y") + 
  theme_minimal() +
  scale_fill_manual(values = c("Total Calories" = "#e15759", "Average Heart Rate (BPM)" = "#4e79a7")) +
  labs(title = "Monday Weightlifting: Intensity vs. Volume", x = "Date", y = NULL)

ggplotly(p4)

Analysis: By faceting Calories against Heart Rate, we can spot the Feb 23rd outlier immediately. While Heart Rate stayed within the typical Monday range, the Caloric expenditure spiked significantly, suggesting either a massive increase in lifting volume or a period of sensor instability discussed in the technical audit.

Final Analysis: Variance of Feb 23rd Performance

The session on Feb 23rd exhibited a 60% increase in total caloric expenditure compared to the February Monday average. * Temporal Analysis: A significant outlier (22.7 kcal/min) was identified. By calculating the delta between the Weightlifting start time (10:22 AM) and the peak (10:38 AM), we confirmed the anomaly occurred exactly 16 minutes into the lifting block. * Root Cause: While the specific lift is unrecorded, the 16-minute mark correlates with a self-reported period of high physical exertion following a 6-day recovery window. * Conclusion: The increased session volume and intensity were genuine results of extended recovery, though the specific magnitude of the peak was likely amplified by sensor instability during heavy mechanical exertion.

Fitbit Activity Analysis

BT

2026-03-05

1. Step Count Comparison

2. Heart Rate Density

3. Burn Efficiency

4. Monday Weightlifting Deep-Dive