AI in E-Commerce: What's Actually Working in 2026

The e-commerce AI story in 2025 and 2026 has two chapters running simultaneously. One chapter is marketing-driven: every platform now calls something AI, regardless of whether it involves a model or just a rule-based recommendation engine from 2018. The other chapter is quieter but more interesting — specific implementations at specific companies that are producing measurable conversion improvements, lower return rates, and reduced customer service volume.

This post is about the second chapter.

Recommendation Engines: Old Problem, New Models

Product recommendations are the oldest AI problem in e-commerce. Collaborative filtering has been in production use since at least the late 1990s, when Amazon deployed it at scale. What’s changed in 2026 isn’t the concept; it’s the model quality and the cold-start handling.

The traditional problem with collaborative filtering: it requires purchase history to work. A new user gets no recommendations, or generic ones. A new product is not recommended to anyone until it accumulates views and purchases of its own. Both hurt.

Current approaches use multi-modal embeddings to sidestep cold start. A new product’s images, title, description, and category get embedded into a vector space alongside products that already have purchase history. New users’ browsing behavior (even within a single session) generates an embedding that finds nearest neighbors in that same space. The result is that useful recommendations appear within the first few seconds of a session, not after weeks of data accumulation.
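A minimal sketch of that idea, assuming product embeddings already exist. The three-dimensional vectors, the product IDs, and the choice to average viewed items into a session embedding are all illustrative assumptions, not a description of any particular platform's pipeline.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def session_embedding(viewed_ids, catalog):
    """Average the vectors of the items viewed so far in this session."""
    vectors = [catalog[pid] for pid in viewed_ids]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def recommend(viewed_ids, catalog, k=2):
    """Return the k unseen products closest to the session embedding."""
    if not viewed_ids:
        return []  # nothing viewed yet: fall back to generic recommendations
    query = session_embedding(viewed_ids, catalog)
    scored = [
        (pid, cosine(query, vec))
        for pid, vec in catalog.items()
        if pid not in viewed_ids
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in scored[:k]]

# Toy vectors standing in for real multi-modal embeddings (hypothetical data).
CATALOG = {
    "running-shoe": [0.9, 0.1, 0.0],
    "trail-shoe": [0.8, 0.2, 0.1],
    "new-sneaker": [0.85, 0.15, 0.05],  # just listed, no purchase history
    "desk-lamp": [0.0, 0.1, 0.9],
}

print(recommend(["running-shoe"], CATALOG))  # ['new-sneaker', 'trail-shoe']
```

The newly listed sneaker ranks first after a single product view, even though it has no purchase history, because its position in the vector space comes from its content rather than from past interactions.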

Shopify’s built-in recommendations run on this architecture. Smaller merchants on custom platforms are building similar pipelines with pgvector (semantic similarity search in Postgres) and OpenAI’s embedding API for product text.

The measurable results come from careful testing. A recommendation widget that lifts click-through rate by 4% on the home page may have no impact on cart abandonment, because the users who click are still browsing and not yet close to a purchase decision. Recommendations embedded on the cart page, product page, and post-purchase confirmation page tend to outperform home page placements by 3-5x on actual revenue impact, because those contexts carry higher purchase intent.

Search: The Case for Meaning Over Keywords

Traditional e-commerce search is keyword matching with some fuzziness for typos. It fails visibly when shoppers use the words they think in, not the words in your product catalog.

A shopper searches “office shoes comfortable all day standing” on a site whose catalog uses “ergonomic footwear,” “workplace-appropriate,” and “arch support.” Keyword search returns zero results. Semantic search returns the right products.

The standard implementation now:

Embed every product description at index time into a vector database
On search, embed the query using the same model
Return nearest-neighbor products by cosine similarity
Optionally re-rank with a lightweight cross-encoder model that scores the actual relevance of each candidate
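The four steps above can be sketched in a few lines. Everything here is a stand-in: `embed` is a hypothetical toy that maps words onto hand-picked concept dimensions (a real system would call a trained embedding model), the index is a plain dictionary rather than a vector database, and `rerank` is whatever scoring function you pass in.

```python
import math

# Hypothetical stand-in for an embedding model: words that mean similar
# things share a dimension. This is an assumption for illustration only.
CONCEPTS = {
    "comfortable": 0, "ergonomic": 0, "arch": 0, "support": 0,
    "office": 1, "workplace": 1,
    "shoes": 2, "footwear": 2,
    "desk": 3, "lamp": 3,
}
DIMENSIONS = 4

def embed(text):
    vec = [0.0] * DIMENSIONS
    for word in text.lower().replace("-", " ").split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(products):
    """Step 1: embed every product description at index time."""
    return {pid: embed(desc) for pid, desc in products.items()}

def search(query, index, k=2, rerank=None):
    """Steps 2-4: embed the query, take nearest neighbors, optionally re-rank."""
    q = embed(query)
    hits = sorted(index, key=lambda pid: cosine(q, index[pid]), reverse=True)[:k]
    if rerank is not None:
        hits = sorted(hits, key=lambda pid: rerank(query, pid), reverse=True)
    return hits

def keyword_search(query, products):
    """Naive exact-word matching, shown only for contrast."""
    terms = set(query.lower().split())
    return [pid for pid, desc in products.items() if terms & set(desc.lower().split())]

PRODUCTS = {
    "p1": "ergonomic workplace-appropriate footwear with arch support",
    "p2": "desk lamp",
    "p3": "footwear for trail running",
}
INDEX = build_index(PRODUCTS)
QUERY = "office shoes comfortable all day standing"

print(keyword_search(QUERY, PRODUCTS))  # []
print(search(QUERY, INDEX))             # ['p1', 'p3']
```

The re-rank step only reorders the candidates that nearest-neighbor retrieval already returned, which is why a slower, more accurate scorer is affordable there.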

The tooling has matured significantly. Elasticsearch ships hybrid search (keyword + vector) as a first-class feature. Typesense supports vector search natively. For teams already on Postgres, pgvector handles the similarity search without a separate service.

The business case is clearer than it looks. Zero-result searches are trackable: if your search returns no results and the user bounces, that’s measurable lost revenue. Semantic search on catalogs with 5,000+ products typically reduces zero-result searches by 60-80% according to Shopify’s published data from merchants who switched.
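As a rough illustration of how that tracking might look, here is a sketch that summarizes a search log. The record fields (`result_count`, `bounced`) are hypothetical names; your analytics events will have their own schema.

```python
def zero_result_stats(search_log):
    """Summarize zero-result searches from a list of log records."""
    total = len(search_log)
    zero = [r for r in search_log if r["result_count"] == 0]
    bounced = [r for r in zero if r["bounced"]]
    return {
        "searches": total,
        "zero_result_rate": len(zero) / total if total else 0.0,
        "zero_result_bounce_rate": len(bounced) / len(zero) if zero else 0.0,
    }

# Hypothetical log records for illustration.
LOG = [
    {"query": "office shoes comfortable", "result_count": 0, "bounced": True},
    {"query": "arch support", "result_count": 12, "bounced": False},
    {"query": "standing all day shoes", "result_count": 0, "bounced": False},
    {"query": "ergonomic footwear", "result_count": 8, "bounced": False},
]

print(zero_result_stats(LOG))
```

Running the same summary before and after a search change gives a direct read on whether the zero-result rate actually moved.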

Returns Prediction: Preventive Rather Than Reactive

Clothing and footwear e-commerce have return rates of 20-40%. Each return costs $5-25 to process, plus the margin impact of items damaged in transit or going back into discounted inventory. AI returns prediction doesn’t eliminate returns, but it changes how they’re handled.

The model inputs that predict high return probability:

Size history: has this customer returned items in this size before?
Product reviews: items with reviews mentioning “runs small” or “material feels cheap” have higher return rates
Photo quality: products with fewer, lower-quality images have higher uncertainty and return rates
Customer segment: first-time buyers return at higher rates than repeat customers

A well-trained model can flag orders with >40% predicted return probability before they ship. What you do with that flag depends on your economics: for some merchants, it’s adding a “fit guide” email to that order’s post-purchase flow; for others, it’s proactively offering a free return label with the shipping confirmation to reduce the friction cost; for a few, it’s adding a human review step for unusually expensive items.
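One way to express that decision is a small routing function that runs after the model has scored an order. This is a sketch under stated assumptions: the 40% threshold comes from the text above, while the `Order` fields, the high-value cutoff, and the action names are hypothetical and would depend on each merchant's economics.

```python
from dataclasses import dataclass

FLAG_THRESHOLD = 0.40       # flag orders above 40% predicted return probability
HIGH_VALUE_CUTOFF = 500.00  # hypothetical cutoff for "unusually expensive"

@dataclass
class Order:
    order_id: str
    predicted_return_prob: float  # output of the returns model
    order_value: float

def choose_action(order, offer_prepaid_label=False):
    """Map a scored order to one post-purchase action."""
    if order.predicted_return_prob <= FLAG_THRESHOLD:
        return "ship_normally"
    if order.order_value >= HIGH_VALUE_CUTOFF:
        return "human_review"
    if offer_prepaid_label:
        return "send_return_label"
    return "send_fit_guide_email"

orders = [
    Order("A1", 0.12, 80.00),
    Order("A2", 0.55, 80.00),
    Order("A3", 0.70, 900.00),
]
for order in orders:
    print(order.order_id, choose_action(order))
```

Keeping the policy separate from the model means the threshold and actions can change without retraining anything.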

Smaller implementations have seen an 8-15% reduction in return processing volume from this approach. The model itself is relatively simple: gradient boosting over tabular features works well here, and interpretability matters (you want to understand why an order is flagged).

Customer Service Deflection: Where the ROI Is Clearest

AI in customer service has the most concrete ROI case in e-commerce, specifically for deflecting “where is my order” (WISMO) queries.

WISMO queries typically represent 40-60% of e-commerce support ticket volume. They’re repetitive, have a known answer (look up the order, return tracking info), and require no judgment. They’re ideal for a well-scoped AI agent.

A deployed WISMO agent does three things:

Authenticates the customer (email + order number, or session-based)
Queries the order management system and shipping provider APIs
Returns status with a tracking link and an estimated delivery window

The containment rate (queries resolved without escalating to a human) for WISMO-scoped agents runs 85-95% in production deployments. The remaining 5-15% are edge cases: orders stuck in customs, packages marked delivered but not received, address correction requests. Those get routed to humans with context pre-filled.
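The three steps and the escalation path can be sketched as a single handler. The in-memory `ORDERS` and `SHIPMENTS` dictionaries are hypothetical stand-ins for an order management system and a shipping provider API, and the status names are assumptions.

```python
ORDERS = {  # hypothetical order management system
    "A100": {"email": "pat@example.com", "tracking_id": "TRK1"},
    "A200": {"email": "sam@example.com", "tracking_id": "TRK2"},
}
SHIPMENTS = {  # hypothetical shipping provider data
    "TRK1": {"status": "in_transit", "eta": "2-3 days",
             "url": "https://example.com/track/TRK1"},
    "TRK2": {"status": "customs_hold", "eta": None,
             "url": "https://example.com/track/TRK2"},
}
ESCALATE_STATUSES = {"customs_hold", "delivered_not_received"}

def handle_wismo(email, order_id):
    # Step 1: authenticate with email + order number.
    order = ORDERS.get(order_id)
    if order is None or order["email"].lower() != email.lower():
        return {"resolved": False, "action": "auth_failed"}

    # Step 2: look up the shipment for this order.
    shipment = SHIPMENTS.get(order["tracking_id"])
    status = shipment["status"] if shipment else "unknown"
    if shipment is None or status in ESCALATE_STATUSES:
        # Edge case: hand off to a human with context pre-filled.
        return {"resolved": False, "action": "escalate",
                "context": {"order_id": order_id, "status": status}}

    # Step 3: reply with status, tracking link, and delivery window.
    message = (f"Order {order_id} is {status.replace('_', ' ')}. "
               f"Estimated delivery: {shipment['eta']}. Track it: {shipment['url']}")
    return {"resolved": True, "action": "reply", "message": message}

def containment_rate(results):
    """Share of queries resolved without a human."""
    return sum(r["resolved"] for r in results) / len(results) if results else 0.0

results = [
    handle_wismo("pat@example.com", "A100"),
    handle_wismo("sam@example.com", "A200"),
]
print(containment_rate(results))  # 0.5
```

Note that the handler never generates an answer on its own: it either returns data from the systems of record or escalates, which is what keeps the scope narrow.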

For non-WISMO queries (product questions, return policy disputes, partial refunds) the containment rate drops to 40-60% and the failure modes are more costly. Agents that confidently give wrong answers about return windows or pricing do more damage to customer trust than they save in support cost.

The practical implication: start with a narrow scope (WISMO only), measure containment and customer satisfaction separately, then expand scope only where the model performance is demonstrably good. Broadening scope too fast is how AI customer service gets a bad reputation.

Personalized Pricing and Promotions

This one requires careful framing. Personalized pricing (showing different prices to different customers based on willingness-to-pay signals), often loosely called dynamic pricing, is legal in most markets but creates trust problems when customers discover it, and they do discover it. This isn’t a recommended pattern for consumer-facing retail.

What is worth doing: personalized promotion targeting. Instead of sending a 20% discount to your entire list, a model predicts which customers are at risk of churning (haven’t purchased in X days, purchase velocity dropping) and sends the discount only to that segment. The same promotion budget goes further because it’s targeted at customers who need it rather than customers who would have purchased anyway at full price.

This requires cohort-level analysis and A/B testing to validate. A discount sent to customers who wouldn’t have churned is pure margin loss — called “discount waste” in the industry. Measuring discount waste is as important as measuring the discount’s effect on conversion.
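One rough way to estimate discount waste is with a holdout group that receives no discount. The sketch below assumes the holdout's purchase rate approximates what the treated group would have done without the promotion; that assumption, the function name, and the example figures are all illustrative.

```python
def discount_waste(treated_n, treated_buyers, control_n, control_buyers,
                   discount_per_order):
    """Estimate how much discount went to customers who would have bought anyway.

    Assumes treated_n and control_n are non-zero and that the control
    (holdout) group is a fair baseline for the treated group.
    """
    treated_rate = treated_buyers / treated_n
    control_rate = control_buyers / control_n
    lift = treated_rate - control_rate

    # Orders the discount plausibly caused, floored at zero.
    incremental_orders = max(lift, 0.0) * treated_n
    # Orders that would likely have happened at full price.
    baseline_orders = treated_buyers - incremental_orders

    return {
        "lift": lift,
        "incremental_orders": incremental_orders,
        "wasted_discount": baseline_orders * discount_per_order,
        "waste_share": baseline_orders / treated_buyers if treated_buyers else 0.0,
    }

# Hypothetical campaign: 1,000 customers per group, $10 discount per order.
report = discount_waste(1000, 120, 1000, 80, 10.0)
print(report)
```

In this made-up example roughly two thirds of the redeemed discounts went to orders the holdout suggests would have happened anyway, which is the number targeting is supposed to drive down.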

Email platforms like Klaviyo and Braze have built predictive churn models directly into their products. For teams building custom, scikit-learn with a gradient boosting classifier trained on purchase interval features is a reasonable starting point.
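For the custom route, most of the work is in the features rather than the model. Here is a sketch of purchase interval features computed from order dates; the feature names are hypothetical, and rows like these are the kind of tabular input a gradient boosting classifier (for example scikit-learn's `GradientBoostingClassifier`) could be trained on. Training itself is not shown.

```python
from datetime import date

def interval_features(purchase_dates, today):
    """Build purchase-interval features for one customer.

    Assumes at least one purchase date is provided.
    """
    dates = sorted(purchase_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    days_since_last = (today - dates[-1]).days
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return {
        "order_count": len(dates),
        "days_since_last": days_since_last,
        "mean_gap_days": mean_gap,
        "last_gap_days": gaps[-1] if gaps else None,
        # How late the customer is relative to their own usual rhythm.
        "overdue_ratio": days_since_last / mean_gap if mean_gap else None,
    }

history = [date(2026, 1, 1), date(2026, 1, 31), date(2026, 3, 2)]
print(interval_features(history, today=date(2026, 5, 1)))
```

A customer who usually buys every 30 days and has been quiet for 60 gets an `overdue_ratio` of 2.0, which captures "purchase velocity dropping" relative to that customer's own baseline rather than a global cutoff.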

What Still Mostly Disappoints

AI-generated product descriptions at scale: the quality isn’t bad, but it isn’t differentiated. Products on competing sites end up with similar AI-generated copy because the models produce similar prose for similar inputs. Category-specific copy still benefits from human writers who know the audience.

Conversational shopping assistants: the experience of “tell me what you’re looking for” via a chat interface converts poorly for most product categories. Users on e-commerce sites are browsing and filtering, not having a conversation. The chat paradigm adds friction compared to a well-implemented filter UI. There are narrow exceptions (custom jewelry, complex technical products) where conversation helps, but the general case hasn’t borne out.

AI-generated product photography: quality is improving fast. But e-commerce conversion depends heavily on trust, and trust depends on seeing the actual product, not a model-generated approximation. Returns from AI-photographed product listings run higher in most studies because the generated images don’t accurately represent texture, color in different lighting, or size proportions.

The implementations that work share a common trait: they’re solving a specific, well-defined problem (zero-result search, WISMO volume, cold-start recommendations) with a measurable outcome. The implementations that disappoint are usually trying to inject AI into an experience that already works, hoping the word “AI” in the press release does the work that the implementation doesn’t.
