
Data Analyst Jobs
Practical, battle‑tested advice to make your analyses faster, clearer, and more impactful—whether you're working in spreadsheets, SQL, Python/R, or a BI tool.
Keep it simple, test assumptions early, and show the decision, not just the data.
"Sign‑ups dropped 10% WoW; is it acquisition, conversion, or tracking?"
"Which channels drive highest LTV given last 6 months of cohorts?"
First pass: are the main metrics stable? Any tracking breaks? Which two segments differ most? What's the simplest story the data tells?
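A quick way to run the first two checks, sketched in pandas over a hypothetical events export with event_name and event_at columns (both names assumed, not tied to any specific tracking tool):

import pandas as pd

# Hypothetical export: one row per tracked event.
events = pd.read_csv('events.csv', parse_dates=['event_at'])

# Daily volume per event; a sudden cliff usually signals a tracking break
# rather than a real change in behaviour.
daily = (events.assign(day=events['event_at'].dt.date)
               .groupby(['event_name', 'day']).size()
               .rename('n').reset_index())
daily['pct_change'] = daily.groupby('event_name')['n'].pct_change()

# Flag days where volume fell by more than 50% versus the prior day.
print(daily[daily['pct_change'] < -0.5].sort_values(['event_name', 'day']))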
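For the LTV question above, a minimal pandas sketch, assuming a hypothetical orders extract where each row already carries user_id, channel, signup_at, and amount:

import pandas as pd

# Hypothetical export: one row per order, with the buyer's acquisition
# channel and signup date attached.
orders = pd.read_csv('orders_with_channel.csv', parse_dates=['signup_at'])

# Keep users who signed up within the last 6 months of cohorts.
cutoff = orders['signup_at'].max() - pd.DateOffset(months=6)
recent = orders[orders['signup_at'] >= cutoff]

# LTV per channel = total revenue / distinct users acquired.
ltv = recent.groupby('channel').agg(revenue=('amount', 'sum'), users=('user_id', 'nunique'))
ltv['ltv'] = ltv['revenue'] / ltv['users']
print(ltv.sort_values('ltv', ascending=False))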
Date spine: build a complete daily series, even for days with no events.
with dates as (
select date_trunc('day', d)::date as d
from generate_series('2025-01-01'::date, '2025-12-31'::date, interval '1 day') as gs(d)
)
select d.d, coalesce(sum(o.amount),0) as revenue
from dates d
left join orders o on o.order_date::date = d.d
group by 1
order by 1;

Top N per group, using window functions.
select category, product, revenue
from (
select category, product, revenue,
row_number() over (partition by category order by revenue desc) as rn
from sales
) s where rn <= 3;

Latest record per key (deduplication), again with row_number.

select * from (
select *, row_number() over (partition by user_id order by updated_at desc) as rn
from user_profile
) t where rn = 1;

Weekly cohort retention.

with signup as (
select user_id, date_trunc('week', signup_at)::date as cohort_week from users
),
activity as (
select user_id, date_trunc('week', activity_at)::date as act_week from events
),
cohort_size as (
select cohort_week, count(distinct user_id) as n_users from signup group by 1
)
select s.cohort_week,
(a.act_week - s.cohort_week) / 7 as weeks_since,
count(distinct a.user_id) * 1.0 / max(c.n_users) as retention
from signup s
join activity a using (user_id)
join cohort_size c using (cohort_week)
where a.act_week >= s.cohort_week
group by 1, 2
order by 1, 2;

Quick checks in Python (pandas).

import pandas as pd
import numpy as np
# Quick data health
summary = df.describe(include='all').T
summary['null_pct'] = df.isna().mean().round(3)
# Date spine join
date_index = pd.date_range(df.date.min(), df.date.max(), freq='D')
spine = pd.DataFrame({'date': date_index})
series = spine.merge(df.groupby('date', as_index=False)['revenue'].sum(), on='date', how='left').fillna(0)
# Cohort index
df['cohort'] = df['signup_at'].dt.to_period('W').dt.start_time
df['weeks_since'] = ((df['event_at'] - df['cohort']).dt.days // 7).clip(lower=0)

The same checks in R (dplyr).

library(dplyr)
library(lubridate)
summary <- df %>% summarise(across(everything(), list(nas = ~mean(is.na(.)))))
spine <- tibble(date = seq.Date(min(df$date), max(df$date), by = 'day'))
series <- spine %>% left_join(df %>% count(date, wt = revenue, name = 'revenue'), by = 'date') %>% mutate(revenue = coalesce(revenue, 0))

Project structure: /data, /notebooks, /src, /reports, /docs.
Dependencies: requirements.txt or environment.yml; pin versions.
Code review checklist: Readability (CTEs, naming) / Correctness (joins, filters) / Performance (scans, predicates).
Great analysis is a service business: make it easy for others to act.