Most PMs treat their first ML project like a regular feature. They write a PRD, set a launch date, and ask the engineering team for an estimate. Three months later, the model ships at 71% precision. The PM has no idea what to do with that number.
This post is for PMs who want to be useful partners to their ML team.
Start with the problem, not the model
Applied scientists love to talk about models. Your job is to drag the conversation back to the user. Before any model gets built, get clear on the user, the context, the workaround, and the cost.
A weak PM asks, "Can we use a transformer for this?" The strong PM asks, "Right now, our support agents spend 14 minutes per ticket reading old emails. What part of that work can a model do?"
Marty Cagan calls this the difference between feature teams and product teams (Cagan 87). Feature teams build to spec. Product teams own outcomes. ML work amplifies the gap. Misframed problems become models that work in the notebook and die in production.
When you scope an ML problem, write a one-page problem brief. It should answer four questions: what success looks like for the user, what data exists today, what you would do without ML, and what a wrong prediction costs. The last question carries the most weight. A false positive on a fraud model means a customer gets locked out of their account. A false negative lets a thief walk out with free shoes. These are different problems with different metrics.
Make success measurable before you write any code
Every ML project needs two layers of metrics. Online metrics measure business impact. Offline metrics measure model quality.
Online metrics are things like revenue per user, ticket resolution time, daily active users, or click-through rate. Offline metrics include precision, recall, AUC, and mean squared error. Your scientists will optimize the offline metric. The PM's job is to make sure the offline metric tracks the online one.
This sounds obvious. It is not. I once shipped a model that beat the previous model's offline AUC by four points. Click-through rate dropped 2% in the online test. The model had gotten better at predicting clicks on prior purchases, which was not the behavior we wanted more of. The offline metric was the wrong target.
Cassie Kozyrkov writes about this gap between model performance and business value (Kozyrkov). She argues that PMs should define the decision the model supports before anyone trains anything. What action does the prediction trigger? Who acts on it? If you cannot answer those questions, you are not ready for the work.
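For PMs who have not computed an offline metric by hand, here is a minimal Python sketch of precision and recall, two of the offline metrics named above. The function name and the toy labels are made up for illustration and do not come from any real project.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of everything the model flagged, how much was right?" Recall answers "of everything it should have flagged, how much did it catch?" Neither number says anything about revenue or resolution time, which is why the online layer still has to be checked separately.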
Get comfortable with probability
ML systems are probabilistic. They are wrong by design. Engineers who come from deterministic systems struggle with this fact. PMs who pretend they do not struggle with it are worse.
In interviews, you might get, "How would you launch a model that is 85% accurate?" The wrong answer is, "Wait until it hits 95%." A better answer covers confidence thresholds, fallback behavior, human review, and the cost of being wrong.
Here is a pattern that works. Route high-confidence predictions to automation. Send low-confidence ones to a human. Log everything for retraining. Audit the human's overrides monthly. For a loan model, that could mean auto-approving applications that score above 0.95 and sending those that score below 0.6 to an underwriter. Scores in between need an explicit rule too. The PM owns where to draw the lines, because those lines are business decisions.
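The routing pattern can be sketched in a few lines of Python. The 0.95 and 0.6 thresholds come from the loan example. The function name, the log format, and the choice to send scores between the two thresholds to an underwriter are assumptions made for this sketch, since the example leaves that middle band unspecified.

```python
def route_application(score, log, auto_above=0.95, review_below=0.60,
                      middle_route="underwriter"):
    """Route one scored loan application and record the decision.

    middle_route is an assumption: it covers scores between the two
    thresholds, which the policy must define explicitly.
    """
    if score > auto_above:
        decision = "auto_approve"
    elif score < review_below:
        decision = "underwriter"
    else:
        decision = middle_route
    # Log every decision so it can feed retraining and override audits.
    log.append({"score": score, "decision": decision})
    return decision


decision_log = []
for s in (0.97, 0.50, 0.80):
    route_application(s, decision_log)
```

Keeping the thresholds as parameters, not constants buried in the model code, makes it clear that they are policy choices the PM can change without retraining anything.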
Tradeoffs are your job
Every ML system has tradeoffs. Latency versus accuracy. Cost versus freshness. Recall versus precision. Cold-start coverage versus relevance. The science team can show you the curve. Picking the point on the curve is product work.
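One way to make "picking the point on the curve" concrete is to put a price on each kind of error and see which threshold is cheapest. The Python sketch below does that for a handful of made-up scores. The cost values, candidate thresholds, and function names are all illustrative assumptions.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at one threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 0:
            cost += cost_fp  # flagged something that was fine
        elif not predicted_positive and label == 1:
            cost += cost_fn  # missed something that mattered
    return cost


def cheapest_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Toy model scores and true labels, invented for this example.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]
candidates = [0.2, 0.5, 0.8]

# When misses are expensive, a low threshold wins; when false alarms
# are expensive, a high threshold wins.
when_misses_hurt = cheapest_threshold(scores, labels, candidates, cost_fp=1, cost_fn=10)
when_alarms_hurt = cheapest_threshold(scores, labels, candidates, cost_fp=10, cost_fn=1)
```

The model and the data are identical in both calls. Only the business costs change, and the best operating point moves with them.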
Andrew Ng has argued that data quality often beats model complexity (Ng). For most applied problems, a simple model on clean labels beats a complex one on messy data. When a scientist proposes a state-of-the-art architecture, ask about the baseline. Ask what a logistic regression would do. If nobody has tried it, that is your first ask.
Latency budgets matter too. A model that runs in 200ms feels fine in a chatbot. The same model is unusable in an ad-serving pipeline. Productionization questions belong in design review. Bring them up before the last sprint.
Plan for model decay
Every model decays over time. User behavior shifts, data distributions drift, and last quarter's training set stops matching this quarter's traffic. Launch is where the operations phase begins, and that phase runs for months.
Before you ship, agree on a few operational questions. What metric, when degraded, triggers a retrain? Who owns the retraining pipeline? How will you detect drift in production? If your data team cannot answer those, the model is not ready for users.
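As one possible answer to the drift question, the sketch below computes a population stability index (PSI), a common way to compare the distribution of a feature or score at training time against what production sees now. This is an illustration under assumptions, not a prescribed method: the bin edges, the sample values, and the 0.25 alert level (a widely quoted rule of thumb, not a standard) are all choices your team would need to make.

```python
import math


def population_stability_index(expected, actual, bin_edges, eps=1e-6):
    """Compare two samples of one numeric value using fixed bins.

    Values outside the bin edges are ignored. Empty bins are floored
    at eps so the logarithm stays defined.
    """
    def bin_fractions(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy samples, invented for this example.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
live_scores = [0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 0.95]

drift = population_stability_index(training_scores, live_scores, edges)
needs_review = drift > 0.25  # assumed alert level; tune for your product
```

Whatever statistic the team picks, the operational agreement matters more than the formula: a named owner, a threshold, and an action that follows when the threshold is crossed.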
Lenny Rachitsky has written that the best AI PMs treat evals as a first-class artifact (Rachitsky). Build the eval set on day one. Version the file. Run it on every model candidate. Compare results across runs. An eval set is your regression test suite. Without one, you are shipping in the dark.
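The regression-test idea can be sketched as follows. Everything here is hypothetical: the eval examples, the version label, and the two "models" (plain functions standing in for real candidates). The point is only the shape of the loop, which runs every candidate on the same versioned eval set and flags any that score below the baseline.

```python
EVAL_SET_VERSION = "v1"  # hypothetical label; bump it when the file changes

# A tiny, made-up eval set for a support-ticket classifier.
EVAL_SET = [
    {"input": "refund please", "expected": "billing"},
    {"input": "app crashes", "expected": "bug"},
    {"input": "charged twice", "expected": "billing"},
    {"input": "login error", "expected": "bug"},
]


def run_eval(model, eval_set):
    """Return the fraction of eval examples the model gets right."""
    correct = sum(1 for ex in eval_set if model(ex["input"]) == ex["expected"])
    return correct / len(eval_set)


def compare_candidates(candidates, eval_set, baseline_name):
    """Score every candidate and mark those that fall below the baseline."""
    scores = {name: run_eval(model, eval_set) for name, model in candidates.items()}
    baseline = scores[baseline_name]
    return {name: {"accuracy": s, "regressed": s < baseline}
            for name, s in scores.items()}


# Stand-in models: simple keyword rules, not real ML.
def baseline_model(text):
    return "billing" if ("refund" in text or "charged" in text) else "bug"


def candidate_model(text):
    return "billing" if "refund" in text else "bug"


report = compare_candidates(
    {"baseline": baseline_model, "candidate": candidate_model},
    EVAL_SET,
    baseline_name="baseline",
)
```

In practice the report would be stored alongside the eval set version, so that results from different runs are only compared when they used the same examples.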
Bias and fairness belong in the brief
Fairness review belongs in the design brief. By the time the model is built, you are too late. If your model touches hiring, lending, healthcare, or housing, you have legal exposure and moral weight in the room.
Ask your scientists how the training data was collected, who is in it, who is missing, and how labels were assigned. Find out the cost of a wrong prediction for each subgroup. A fraud model with 99% accuracy that flags one group at 5x the rate of another is a broken model.
You do not need a PhD in fairness research. Ask, in plain words: who could this hurt and how would you spot the damage?
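One simple way to spot that kind of damage is to compare how often the model flags each group. The Python sketch below uses invented records with two groups, "A" and "B". The field names and the helper functions are assumptions for illustration. A real review would also look at error rates per group, not just flag rates.

```python
from collections import defaultdict


def flag_rate_by_group(records):
    """Return the share of records flagged within each group."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        flagged[r["group"]] += int(r["flagged"])
    return {group: flagged[group] / total[group] for group in total}


def max_rate_ratio(rates):
    """Ratio of the highest group flag rate to the lowest."""
    lowest = min(rates.values())
    highest = max(rates.values())
    return float("inf") if lowest == 0 else highest / lowest


# Made-up data: 2 of 100 flagged in group A, 10 of 100 in group B.
records = (
    [{"group": "A", "flagged": i < 2} for i in range(100)]
    + [{"group": "B", "flagged": i < 10} for i in range(100)]
)

rates = flag_rate_by_group(records)
ratio = max_rate_ratio(rates)  # roughly 5x in this toy data
```

A check like this does not prove a model is fair. It gives you a number to put in the brief and a reason to ask harder questions when the ratio is large.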
What this looks like in an interview
When an interviewer asks how you work with ML teams, pick a real project and walk through it. Cover the problem you scoped, the metric you defined, the tradeoff you owned, and one failure along the way. Be specific with numbers. Stay honest about the failure.
The PMs who get hired for AI roles do not pretend to be scientists. They know enough to be a useful partner. They own what scientists should not solve alone: the user, the metric, the policy, the rollout.
That is the job.
Works Cited
Cagan, Marty. Inspired: How to Create Tech Products Customers Love. 2nd ed., Wiley, 2017.
Kozyrkov, Cassie. "Why Businesses Fail at Machine Learning." Hacker Noon, 2019.
Ng, Andrew. "MLOps: From Model-Centric to Data-Centric AI." DeepLearning.AI, 2021.
Rachitsky, Lenny. "How the Best AI PMs Work." Lenny's Newsletter, 2024.