March 17, 2026 · 10 min read

From Receipt Chaos to Inventory Accuracy

By Stockcount Team

Key takeaway

Stockcount matches messy receipt text to the right inventory items using a four-layer pipeline: exact alias lookup, fuzzy embedding match, ingredient semantic search, and word overlap validation. Each confirmed match creates an alias that makes future matching instant. The system gets smarter with every receipt you process, requiring less interaction over time.

Every time you shop at the same store, you make slightly different decisions. This week you grab a bag of five avocados. Next week you pick up three loose ones. Same ingredient, same store, but your receipt says two completely different things.

"HEB BAG AVOCADOS 5CT" one week. "SMALL HASS AVOCADOS" the next.

For inventory software that processes receipts, this is a deceptively hard problem. The strings look nothing alike. The units are different. The price-per-item math is different. But they're the same thing sitting on the same shelf, and your stock count shouldn't care how they arrived.

Here's how we solved it.

The Problem: One Ingredient, Many Faces

Receipt text is chaotic. Stores abbreviate however they want, mix packaging details into product names, and change formats without warning. A single ingredient in your kitchen might show up on receipts as any of these:

HEB BAG AVOCADOS 5CT
SMALL HASS AVOCADOS
AVOCADO HASS EA
ORG AVOCADOS 4CT BAG

A human glances at these and immediately thinks "avocados." Software has to work a lot harder.

The naive approach is to match on keywords: if it says "avocado" somewhere, it's probably avocados. But that falls apart fast. "HEB ST AVOCADO OIL F" also contains "avocado," and it's a completely different product. Keyword matching creates false positives that silently corrupt your inventory data.
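
To make the false positive concrete, here is a minimal sketch of naive keyword matching. The function name keyword_match is made up for illustration; the receipt strings are the ones quoted in this post.

```python
def keyword_match(receipt_text: str, keyword: str) -> bool:
    """Naive matcher: true if the keyword appears anywhere in the text."""
    return keyword.lower() in receipt_text.lower()


receipt_lines = [
    "HEB BAG AVOCADOS 5CT",
    "SMALL HASS AVOCADOS",
    "HEB ST AVOCADO OIL F",  # avocado oil, not avocados
]

# Every line contains "avocado", so the oil is swept in with the fruit.
matches = [line for line in receipt_lines if keyword_match(line, "avocado")]
print(matches)
```

All three lines match, including the avocado oil, which is exactly the kind of silent error described above.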

The enterprise approach is to maintain a massive product catalog and map every possible receipt string to it upfront. That's what companies like Instacart and Uber Eats do with their Universal Catalog, a central database of canonical products with store-specific variants mapped by a combination of software and human review teams. It works, but it requires enormous upfront investment that doesn't make sense for a tool built for individual kitchens and small operations.

We needed something that starts with zero knowledge and gets smarter with every receipt.

The Architecture: A Learning Pipeline

Stockcount uses a four-layer matching system that processes each receipt line item through progressively more expensive checks, stopping as soon as it finds a confident match.
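
The "stop at the first confident match" behavior can be sketched as a simple cascade. This is an illustration only, not Stockcount's code: run_pipeline and the stub layers are hypothetical, and each stub just returns a hard-coded answer or None.

```python
from typing import Callable, Optional

# A layer takes receipt text and returns an ingredient name, or None on a miss.
Layer = Callable[[str], Optional[str]]


def run_pipeline(receipt_text: str, layers: list[tuple[str, Layer]]):
    """Try layers in order (cheapest first) and stop at the first hit."""
    for layer_name, layer in layers:
        ingredient = layer(receipt_text)
        if ingredient is not None:
            return layer_name, ingredient
    return None, None


calls: list[str] = []  # records which layers actually ran


def exact(text: str) -> Optional[str]:
    calls.append("exact")
    return {"HEB BAG AVOCADOS 5CT": "Avocados"}.get(text)


def fuzzy(text: str) -> Optional[str]:
    calls.append("fuzzy")
    return None  # stub: always misses


def semantic(text: str) -> Optional[str]:
    calls.append("semantic")
    return {"SMALL HASS AVOCADOS": "Avocados"}.get(text)  # stub for embedding search


layers = [("exact", exact), ("fuzzy", fuzzy), ("semantic", semantic)]
known = run_pipeline("HEB BAG AVOCADOS 5CT", layers)
calls_after_known = list(calls)
unseen = run_pipeline("SMALL HASS AVOCADOS", layers)
```

For the known string only the exact layer runs; the unseen string falls through to the semantic stub.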

Layer 1: Exact Alias Match

The first check is the fastest. We maintain a table of "aliases": every receipt string we've ever successfully matched to an ingredient. When "HEB BAG AVOCADOS 5CT" appears on a receipt and we've seen that exact string before, we know instantly that it's avocados, it came in a 5-count bag, and each avocado costs roughly what it cost last time.

This is a simple database lookup. No AI involved. It resolves in milliseconds and handles the most common case: you buy the same things from the same store repeatedly.
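
A minimal in-memory sketch of that lookup, using a dictionary in place of the database table. The Alias fields, the (vendor, receipt text) key, and the whitespace stripping are assumptions for illustration; the $0.60 figure is the per-avocado bag cost quoted later in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alias:
    ingredient: str           # canonical ingredient, e.g. "Avocados"
    units_per_package: float  # base units per receipt unit
    last_unit_cost: float     # cost per base unit on the previous receipt


# Stand-in for the alias table, keyed by (vendor, exact receipt string).
alias_table: dict[tuple[str, str], Alias] = {
    ("HEB", "HEB BAG AVOCADOS 5CT"): Alias("Avocados", 5, 0.60),
}


def exact_alias_match(vendor: str, receipt_text: str) -> Optional[Alias]:
    """Return the stored alias for this exact string, or None on a miss."""
    return alias_table.get((vendor, receipt_text.strip()))


hit = exact_alias_match("HEB", "HEB BAG AVOCADOS 5CT")
miss = exact_alias_match("HEB", "HEB BAG AVOCADO 5CT")  # singular: no exact match
```

The singular variant misses here, which is the gap the next layer is meant to cover.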

Layer 1.5: Fuzzy Alias Match

Sometimes receipt strings change slightly between visits. A store might print "HEB BAG AVOCADOS 5CT" one week and "HEB BAG AVOCADO 5CT" the next, singular instead of plural. An exact match misses this.

We generate a vector embedding for every alias when it's created. When an exact match fails, we run a similarity search against all known aliases for that vendor. A high threshold (0.92, tighter than general semantic search) catches trivial variations like pluralization, spacing, and abbreviation differences without false-matching across genuinely different products.

This layer fires only when Layer 1 misses, and it's still fast: a single embedding generation plus a pgvector query scoped to one vendor's aliases.
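
As a rough sketch of the thresholded search, assuming the similarity measure is cosine similarity: the code below scans one vendor's aliases in memory instead of querying pgvector, and the three-dimensional vectors are hand-made stand-ins for real embeddings. Only the 0.92 threshold comes from the description above.

```python
import math
from typing import Optional

FUZZY_THRESHOLD = 0.92  # threshold quoted above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_alias_match(
    query_vec: list[float],
    vendor_aliases: dict[str, list[float]],
    threshold: float = FUZZY_THRESHOLD,
) -> Optional[str]:
    """Return the most similar alias at or above the threshold, else None."""
    best_text, best_score = None, threshold
    for alias_text, alias_vec in vendor_aliases.items():
        score = cosine_similarity(query_vec, alias_vec)
        if score >= best_score:
            best_text, best_score = alias_text, score
    return best_text


# Toy vectors, not real embeddings.
heb_aliases = {
    "HEB BAG AVOCADOS 5CT": [0.9, 0.1, 0.0],
    "HEB ST AVOCADO OIL F": [0.5, 0.8, 0.1],
}
near_duplicate = fuzzy_alias_match([0.88, 0.12, 0.01], heb_aliases)
unrelated = fuzzy_alias_match([0.1, 0.2, 0.97], heb_aliases)
```

The near-duplicate vector clears the threshold against the bagged-avocado alias only; the unrelated vector matches nothing and would fall through to the next layer.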

Layer 2: Ingredient Embedding Search

If no alias matches (exact or fuzzy), the item is genuinely new to this vendor. We fall back to semantic search across all ingredients in your database. The receipt text "SMALL HASS AVOCADOS" gets embedded and compared against your ingredient catalog. Your existing "Avocados" ingredient scores high on similarity, and the system proposes it as a match.

This is where the human enters the loop. The system presents the proposed match and asks you to confirm. If you confirm, a new alias is created, and "SMALL HASS AVOCADOS" is now permanently linked to your "Avocados" ingredient. Next time, Layer 1 handles it silently.
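
The propose-then-confirm loop might look roughly like this. It is a sketch under stated assumptions: propose_ingredient and confirm_match are hypothetical names, the vectors are toy stand-ins for embeddings, and cosine similarity is assumed as the measure.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose_ingredient(
    query_vec: list[float], ingredient_vecs: dict[str, list[float]]
) -> Optional[tuple[str, float]]:
    """Return the (ingredient, score) pair with the highest similarity."""
    if not ingredient_vecs:
        return None
    scored = ((name, cosine_similarity(query_vec, vec)) for name, vec in ingredient_vecs.items())
    return max(scored, key=lambda pair: pair[1])


def confirm_match(alias_table: dict, vendor: str, receipt_text: str, ingredient: str) -> None:
    """Called after the user confirms: the string becomes an exact alias."""
    alias_table[(vendor, receipt_text)] = ingredient


# Toy ingredient catalog with made-up vectors.
catalog = {"Avocados": [0.9, 0.1, 0.0], "Olive Oil": [0.1, 0.9, 0.2]}
alias_table: dict[tuple[str, str], str] = {}

proposal = propose_ingredient([0.8, 0.2, 0.1], catalog)  # "SMALL HASS AVOCADOS"
if proposal is not None:
    # In the real flow the user would be asked here; we assume they say yes.
    confirm_match(alias_table, "HEB", "SMALL HASS AVOCADOS", proposal[0])
```

After confirmation, the string is in the alias table, so the next receipt resolves by exact lookup with no prompt.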

Layer 3: Word Overlap Validation

A safety net for Layer 2. When semantic search proposes a match, we validate that the receipt text and ingredient name share meaningful words. This catches cases where embeddings are misleadingly similar. Two strings might be close in vector space but refer to different things.
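
One way such a check could work is sketched below. The noise-word list and the crude plural stripping are assumptions made for this example, and the limes-versus-lemons pair is a hypothetical case of two strings an embedding model might place close together.

```python
import re

# Hypothetical list of tokens that carry no product meaning on a receipt.
NOISE_WORDS = {"heb", "org", "bag", "ea", "ct", "small", "large"}


def normalize_word(word: str) -> str:
    """Crude singularization: drop a trailing 's' (but keep 'ss' endings)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def meaningful_words(text: str) -> set[str]:
    words = re.findall(r"[a-z]+", text.lower())
    return {normalize_word(w) for w in words if w not in NOISE_WORDS}


def shares_meaningful_word(receipt_text: str, ingredient_name: str) -> bool:
    """Accept a proposed match only if the two strings share a real word."""
    return bool(meaningful_words(receipt_text) & meaningful_words(ingredient_name))


accepted = shares_meaningful_word("SMALL HASS AVOCADOS", "Avocados")
rejected = shares_meaningful_word("PERSIAN LIMES", "Lemons")
```

A check this simple only rejects pairs with no words in common; strings that share a word but name different products still rely on the user's confirmation.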

The Key Insight: Aliases Carry Conversion Data

Matching the receipt string to the right ingredient is only half the problem. "HEB BAG AVOCADOS 5CT" means 1 bag = 5 avocados. A line for "SMALL HASS AVOCADOS" with a quantity of 6 at $0.55 each means 6 individual avocados. The conversion from receipt units to your inventory's base unit (each, grams, milliliters) is different for each packaging format.

In our system, each alias stores its own conversion data. When you first confirm "HEB BAG AVOCADOS 5CT" as avocados, the system asks: "What's inside the bag?" You say: 5 each. That conversion, 1 bag → 5 each, is stored on the alias, not on a global conversion table.

This matters because conversions are properties of specific packaging formats, not of ingredients in general. A bag of avocados contains 5. A bag of limes might contain 6. If you store "1 bag = 5" globally, it eventually applies to the wrong product. Scoping the conversion to the alias eliminates that class of bug entirely.

The second time "HEB BAG AVOCADOS 5CT" appears on a receipt, the system knows the ingredient (Layer 1 match), knows the conversion (5 each per bag), and knows the approximate cost (from the previous receipt). It processes the line item with zero interaction.
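
The arithmetic behind a per-alias conversion is small enough to sketch. The AliasConversion class and to_base function are hypothetical names, and the $3.00 bag price and $3.30 line total are assumed figures chosen to agree with the per-avocado prices quoted in this post.

```python
from dataclasses import dataclass


@dataclass
class AliasConversion:
    receipt_unit: str                   # unit printed on the receipt, e.g. "bag"
    base_unit: str                      # inventory base unit, e.g. "each"
    base_units_per_receipt_unit: float  # e.g. 1 bag -> 5 each


def to_base(conv: AliasConversion, receipt_qty: float, line_total: float) -> tuple[float, float]:
    """Convert a receipt line to (quantity in base units, cost per base unit)."""
    base_qty = receipt_qty * conv.base_units_per_receipt_unit
    if base_qty <= 0:
        raise ValueError("quantity must be positive")
    return base_qty, line_total / base_qty


bagged = AliasConversion("bag", "each", 5)  # stored on "HEB BAG AVOCADOS 5CT"
loose = AliasConversion("each", "each", 1)  # stored on "SMALL HASS AVOCADOS"

bag_qty, bag_unit_cost = to_base(bagged, receipt_qty=1, line_total=3.00)
loose_qty, loose_unit_cost = to_base(loose, receipt_qty=6, line_total=3.30)
```

One bag becomes 5 avocados at $0.60 each, and six loose avocados stay 6 at $0.55 each. Both lines add to the same "Avocados" stock count because the conversion lives on the alias rather than on the ingredient.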

The Lifecycle: Self-Managing Aliases

Aliases aren't static. They evolve based on how you interact with the system.

Confidence builds over time. Every successful auto-match increments a counter on the alias. An alias that has matched 15 times without being overridden is rock-solid. One that was just created is less certain. This implicit confidence tracking means the system can make smarter decisions as it accumulates history.

Mistakes are correctable in the normal flow. If the system auto-matches a receipt string to the wrong ingredient, you override it during receipt review, the same screen you're already looking at. The old alias is deactivated (never deleted; we keep the history), and a new alias for the correct ingredient is created. No admin panel, no settings page. The correction happens where the problem is visible.

Re-confirmation reactivates. If an alias was deactivated by mistake and the same receipt string appears again, the normal confirmation flow creates a new alias (or reactivates the old one if the mapping is the same). The system is self-healing.
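
These lifecycle rules can be sketched with a small in-memory store. The class and method names are hypothetical, and the example mistake (avocado oil matched to avocados) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alias:
    receipt_text: str
    ingredient: str
    match_count: int = 0
    active: bool = True


class AliasStore:
    def __init__(self) -> None:
        self.aliases: list[Alias] = []  # nothing is ever deleted

    def active_alias(self, receipt_text: str) -> Optional[Alias]:
        for alias in self.aliases:
            if alias.active and alias.receipt_text == receipt_text:
                return alias
        return None

    def record_auto_match(self, receipt_text: str) -> Optional[Alias]:
        """Bump the counter each time an alias silently matches."""
        alias = self.active_alias(receipt_text)
        if alias is not None:
            alias.match_count += 1
        return alias

    def confirm(self, receipt_text: str, ingredient: str) -> Alias:
        """User confirms or overrides a mapping during receipt review."""
        same_text = [a for a in self.aliases if a.receipt_text == receipt_text]
        for alias in same_text:
            if alias.ingredient != ingredient:
                alias.active = False  # deactivate, keep the history
        for alias in same_text:
            if alias.ingredient == ingredient:
                alias.active = True   # reactivate the same mapping
                return alias
        alias = Alias(receipt_text, ingredient)
        self.aliases.append(alias)
        return alias


store = AliasStore()
wrong = store.confirm("HEB ST AVOCADO OIL F", "Avocados")   # a mistaken mapping
store.record_auto_match("HEB ST AVOCADO OIL F")
fixed = store.confirm("HEB ST AVOCADO OIL F", "Avocado Oil")  # override in review
```

After the override, the mistaken alias is inactive but still stored with its match count, and the active alias points at "Avocado Oil". Confirming the original mapping again would reactivate the old record instead of creating a third one.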

Naming the Ingredient, Not the Package

One lesson we learned the hard way: if the canonical ingredient name includes packaging details, the matching system works against you.

Early on, processing "HEB BAG AVOCADOS 5CT" produced an ingredient called "Avocados, Bagged 5ct." That name made sense for one receipt. But when "SMALL HASS AVOCADOS" came in the next week, the system saw "Avocados, Bagged 5ct" in the database and reasonably concluded this was a different product. Small Hass avocados aren't "bagged 5ct" avocados. So the item got flagged as new, creating a duplicate.

The fix was a naming convention enforced in the AI's system prompt: name the product, not the packaging. "Avocados" is an ingredient. "Bagged 5ct" is a property of how HEB sells them, which is exactly what the alias captures. The ingredient name is what you see when counting stock or building a recipe. "Avocados" on a shelf tells you what to count. "Avocados, Bagged 5ct" tells you something about a receipt you processed last month, which isn't useful when you're standing in front of a shelf with a clipboard.

Qualifiers earn their place in the ingredient name only when they affect how you count or cook. Take "Black Beans, Dried" vs. "Black Beans, Canned": those are different items on different shelves with different uses in recipes. "Chicken Breast, Boneless Skinless" tells a cook something important. "Chicken Breast, 3-Pack" does not.

This convention maps to how the best restaurant inventory tools work. MarginEdge, the industry leader for restaurant back-of-house software, uses the same separation: "Vendor Items" are the raw invoice strings (unique per vendor), and "Products" are the generic, countable items you track in inventory and use in recipes. One Product can have many Vendor Items across different vendors and packaging formats. They explicitly recommend using the product's full, generic name, not a shorthand, and not packaging details.

Cost Intelligence Per Format

A side benefit of the alias architecture: cost tracking per packaging format comes almost for free. Since each alias records a cost entry every time it matches a receipt, we can show you that bagged avocados from HEB cost $0.60 per avocado while loose ones cost $0.80 each. Over time, this builds a rolling price history per format. Not just "what did avocados cost" but "what did avocados cost in each way I buy them."

On the ingredient detail page, this looks like:

Avocados (base unit: each)

HEB BAG AVOCADOS 5CT | 1 bag → 5 each | $0.60/each (last: Mar 10)

SMALL HASS AVOCADOS | individual | $0.80/each (last: Mar 17)

You can see at a glance whether the bag is actually a better deal, how prices trend over time per format, and whether switching vendors or packaging saves money. This is purchasing intelligence that most inventory tools, consumer or professional, don't offer at this level of granularity.
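
A per-format comparison like the one above can be derived from the cost history with a few lines. This is a sketch, not the product's code: CostEntry and both function names are hypothetical, the year 2026 is assumed from this post's date, and the March 3 entry is invented to show that only the latest entry per format is used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CostEntry:
    alias_text: str
    seen_on: date
    unit_cost: float  # cost per base unit (each)


def latest_cost_per_format(entries: list[CostEntry]) -> dict[str, CostEntry]:
    """Keep only the most recent cost entry for each alias (format)."""
    latest: dict[str, CostEntry] = {}
    for entry in entries:
        current = latest.get(entry.alias_text)
        if current is None or entry.seen_on > current.seen_on:
            latest[entry.alias_text] = entry
    return latest


def cheapest_format(entries: list[CostEntry]) -> str:
    """Name of the format with the lowest latest cost per base unit."""
    latest = latest_cost_per_format(entries)
    return min(latest.values(), key=lambda e: e.unit_cost).alias_text


history = [
    CostEntry("HEB BAG AVOCADOS 5CT", date(2026, 3, 3), 0.65),  # invented older entry
    CostEntry("HEB BAG AVOCADOS 5CT", date(2026, 3, 10), 0.60),
    CostEntry("SMALL HASS AVOCADOS", date(2026, 3, 17), 0.80),
]
latest = latest_cost_per_format(history)
best = cheapest_format(history)
```

With these figures the bag is the cheaper format per avocado, matching the comparison shown on the detail page.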

The Data Model

For those interested in the technical structure, here's how the entities relate:

Ingredient is the canonical item: "Avocados." It has a base unit (each, grams, or milliliters) and a category (Produce), and it is what appears in your stock list, recipes, and reports. One ingredient can be purchased from multiple vendors in multiple formats.

IngredientVendor represents the relationship between an ingredient and a vendor: "HEB sells Avocados." It stores the canonical cost and vendor-specific metadata like brand and preferred status.

VendorProductAlias is the receipt string: "HEB BAG AVOCADOS 5CT." Each alias belongs to one IngredientVendor and stores its own conversion data (1 bag → 5 each), a vector embedding for fuzzy matching, lifecycle data (match count, active status, last matched date), and the latest cost.

AliasCostEntry is the price history: one entry per receipt encounter, recording the unit cost, receipt quantity, converted quantity, and line total. This powers the per-format cost trends.

The key relationship: one Ingredient → many IngredientVendors → many VendorProductAliases → many AliasCostEntries. Each layer adds specificity without complicating the one above it.
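
For readers who think in code, the four entities could be sketched as nested dataclasses. The entity names come from the description above; the field names, types, and the sample quantities (for example 3 loose avocados for $2.40) are assumptions, and a real schema would use database tables and foreign keys rather than nested lists.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AliasCostEntry:
    seen_on: date
    unit_cost: float           # cost per base unit
    receipt_quantity: float    # quantity as printed on the receipt
    converted_quantity: float  # quantity in base units
    line_total: float


@dataclass
class VendorProductAlias:
    receipt_text: str
    base_units_per_receipt_unit: float  # conversion, e.g. 1 bag -> 5 each
    embedding: list[float] = field(default_factory=list)
    match_count: int = 0
    active: bool = True
    last_matched: Optional[date] = None
    latest_unit_cost: Optional[float] = None
    cost_entries: list[AliasCostEntry] = field(default_factory=list)


@dataclass
class IngredientVendor:
    vendor: str
    canonical_cost: Optional[float] = None
    brand: Optional[str] = None
    preferred: bool = False
    aliases: list[VendorProductAlias] = field(default_factory=list)


@dataclass
class Ingredient:
    name: str
    base_unit: str  # "each", "grams", or "milliliters"
    category: str
    vendors: list[IngredientVendor] = field(default_factory=list)


bag = VendorProductAlias("HEB BAG AVOCADOS 5CT", 5, latest_unit_cost=0.60)
bag.cost_entries.append(AliasCostEntry(date(2026, 3, 10), 0.60, 1, 5, 3.00))
loose = VendorProductAlias("SMALL HASS AVOCADOS", 1, latest_unit_cost=0.80)
loose.cost_entries.append(AliasCostEntry(date(2026, 3, 17), 0.80, 3, 3, 2.40))

heb = IngredientVendor("HEB", aliases=[bag, loose])
avocados = Ingredient("Avocados", "each", "Produce", vendors=[heb])

# Walk the one-to-many chain from ingredient down to cost entries.
all_entries = [e for v in avocados.vendors for a in v.aliases for e in a.cost_entries]
```

Walking from the ingredient down through vendors and aliases reaches every cost entry, which is the one-to-many chain described above.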

The Flywheel

The most important property of this system is that it gets better with use. Every receipt you process adds aliases. Every confirmation teaches the system a new mapping. Every override corrects a mistake.

The 5th receipt you process requires a lot of interaction: most items are new, and most aliases haven't been created yet. The 50th requires much less. The 200th is nearly silent.

That's the trajectory we designed for: an app that feels like work the first week and feels like magic by the third month.

Stockcount is an AI-powered inventory management tool that processes receipts through a conversational interface. It's designed for kitchens and small food operations that want professional-grade cost tracking without enterprise complexity.
