Introduction
As a database primer, the intention of this article is to
provide an introduction to Universal Product Codes, and their usage in a relational database.
Universal Product Codes are codes used to uniquely identify
retail products in many countries. The system originated in the United States
in the 1970s to make grocery checkouts faster. Universal codes are printed on
product packaging in the form of barcodes, for scanning at checkout. The code
is also generally printed in text just below the barcode. Being universal means
that the code on any given product will be the same throughout the country it
is sold in.
Universal Product Codes come in more than one format. The
most popular format is called the UPC-A, followed by the UPC-E and the EAN, among
others. The UPC-A barcode represents 12 numerical characters (no letters or
special characters). Each number is represented by a unique barcode. A typical
UPC-A looks like this (this is for illustration only and is not a real barcode):
In this illustration, the first digit (1) is called the
prefix. The next five digits (22222) represent the manufacturer. All products
made by that manufacturer will have the same manufacturer code. The three
digits after that (333) represent the product family code. For instance, all
cereals manufactured by Kellogg will have the same code here, assuming Kellogg
has coded them as one “family”. The next two digits (44) represent the individual
product. An obvious limitation here is that each family can only have a hundred
products within it, but manufacturers get around that by playing with family
codes. The last digit (5) is the check digit, used to verify that the scanner
has read the previous 11 characters correctly. A formula is used to compute the
check digit from the first 11 characters, and then verified against this 12th
character.
The next most popular format is the UPC-E. It was invented
to allow encoding of small items, such as bars of candy or chewing gum, that are
too small to print a regular UPC-A on. It consists of 8 numeric characters.
Unlike the UPC-A, the UPC-E does not have a straightforward representation for
manufacturer, product family and product. It is simply a compressed version of
the UPC-A. The EAN code (originally European Article Number, now International
Article Number but still called EAN, not IAN) is a variation of the UPC-A that
adds an extra character to the left of the barcode, representing the country where
the product is sold.
A real example of a UPC-A code is 043000014240, which
belongs to the cereal Honey Bunches of Oats, made by the company Post. Here,
the manufacturer code is 43000, the product family code is 014, and the product
code is 24. The barcode representations of the UPC-A and the EAN respectively
look like this:
Note that the EAN has 13 digits, with the one extra digit on the left representing the country of retail, in this case the United States.
The remaining 12 are the same as the UPC-A.
Product Codes in Databases
If you are creating a database to store retail products,
chances are you will want to store and look up UPC-A codes at the very least,
and probably also UPC-E codes. These are the two most frequently used formats
in the United States; product codes in many other countries are based on the US
code system, and are very similar. You might also have the need to translate UPC-A
codes to UPC-E and back. For more granular lookup and analysis, you may also
want to store code components (manufacturer code, product family code, product
code) separately, in addition to the entire code. Since the codes are always
numeric, the seemingly obvious choice is to store all these codes and
components as numeric types. But think again, because often the leading
characters in the entire UPC-A, and in individual components, are one or more
zeroes. A manufacturer code of, say ‘053’ becomes '53' when saved as an integer, making it incorrect and useless. I
prefer storing the codes and components as char (they are always fixed length). However, many code related operations such as converting between UPC-A and UPC-E, calculating check digit etc require
the numerical values of code characters, so you will need to convert to a numeric type
in SQL code.
Probably the three most frequently used product code related operations are:
1. Calculate UPC-A check digit, given the first 11
characters
2. Convert a UPC-A into a UPC-E (not all
UPC-A codes can be converted, they have to meet certain conditions)
3. Convert a UPC-E into a UPC-A (all valid UPC-E
codes have a counterpart UPC-A)
The code to perform these operations, in the form of T-SQL functions, is below. I am the original author of this code, and provide it here for anybody to use for free.
______________________________________________________________________________
1. Calculate UPC-A check digit:
if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_Calculate_UPCA_CheckDigit')
begin
drop function dbo.udf_Calculate_UPCA_CheckDigit
end
go
create function dbo.udf_Calculate_UPCA_CheckDigit
( @upca_11 varchar(20)
)
returns varchar(1)
as
begin
/**************************************************************************
2013/01/15 Hakim Ali
Function returns the Check Digit when passed in first 11 characters
of UPC-A.
**************************************************************************/
-- Local variables
declare @checkdigit varchar(1)
declare @sum_of_odds int
declare @sum_of_evens int
-- Initial settings
set @checkdigit = ''
set @upca_11 = ltrim(rtrim(isnull(@upca_11,'')))
set @sum_of_odds = 0
set @sum_of_evens = 0
-- Calculate CheckDigit
if (len(@upca_11) = 11 and isnumeric(@upca_11) = 1)
begin
set @sum_of_odds = convert(int,substring(@upca_11,1,1)) +
convert(int,substring(@upca_11,3,1)) +
convert(int,substring(@upca_11,5,1)) +
convert(int,substring(@upca_11,7,1)) +
convert(int,substring(@upca_11,9,1)) +
convert(int,substring(@upca_11,11,1))
set @sum_of_evens = convert(int,substring(@upca_11,2,1)) +
convert(int,substring(@upca_11,4,1)) +
convert(int,substring(@upca_11,6,1)) +
convert(int,substring(@upca_11,8,1)) +
convert(int,substring(@upca_11,10,1))
if ((((@sum_of_odds * 3) + (@sum_of_evens)))%10 = 0)
begin
set @checkdigit = '0'
end
else
begin
set @checkdigit = convert(varchar(1),(10 - (((@sum_of_odds * 3) + (@sum_of_evens)))%10))
end
end
return @checkdigit
end -- end create function
go
2. Convert UPC-A to UPC-E:
if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCA_to_UPCE')
begin
drop function dbo.udf_UPCA_to_UPCE
end
go
create function dbo.udf_UPCA_to_UPCE
( @upcA varchar(50) -- limiting to varchar(12) only reads first 12 characters of any longer strings passed in
)
returns varchar(8)
as
begin
/**************************************************************************
2013/01/17 Hakim Ali
Function to take in UPC-A, calculate and return its UPC-E.
**************************************************************************/
-- Local variables
declare @manuf_code varchar(5)
declare @product_code varchar(5)
declare @upcE varchar(8)
-- Initial settings
set @upcA = ltrim(rtrim(isnull(@upcA,'')))
set @upcE = ''
-- Calculate UPC-E
if (-- Required conditions for conversion: length must be 12, must start with 0 or 1, must have 0000 between positions 5 and 12
len(@upcA) = 12
and left(@upcA,1) in ('0','1')
and substring(@upcA,5,8) like '00%'
and isnumeric(@upcA) = 1
)
begin
set @manuf_code = substring(@upcA,2,5)
set @product_code = substring(@upcA,7,5)
-- ----------------------------------------------------------------------------
-- Note: iterations must be followed in order. If type 1 applies, use it over type 2 and so on.
-- ----------------------------------------------------------------------------
-- Type 1
if (right(@manuf_code,3) in ('000','100','200') and convert(int,@product_code) between 0 and 999)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,2)
+ right(@product_code,3)
+ substring(@manuf_code,3,1)
+ right(@upcA,1)
end
-- Type 2
else if (right(@manuf_code,2) = '00' and convert(int,@product_code) between 0 and 99)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,3)
+ right(@product_code,2)
+ '3'
+ right(@upcA,1)
end
-- Type 3
else if (right(@manuf_code,1) = '0' and convert(int,@product_code) between 0 and 9)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,4)
+ right(@product_code,1)
+ '4'
+ right(@upcA,1)
end
-- Type 4
else if (convert(int,@product_code) between 5 and 9)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,5)
+ right(@product_code,1)
+ right(@upcA,1)
end
end -- main if: Required conditions
return @upcE
end -- create function
3. Convert UPC-E to UPC-A:
if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCE_to_UPCA')
begin
drop function dbo.udf_UPCE_to_UPCA
end
go
create function dbo.udf_UPCE_to_UPCA
( @upcE varchar(50) -- limiting to varchar(8) only reads first 8 characters of any longer strings passed in
)
returns varchar(12)
as
begin
/**************************************************************************
2013/01/18 Hakim Ali
Function to take in UPC-E, calculate and return its UPC-A.
**************************************************************************/
-- local variables
declare @main_six varchar(6) -- The 6 important (middle) digits of UPCE
declare @upcA varchar(12) -- Calculated UPCA
-- Iniial settings
set @upcE = ltrim(rtrim(isnull(@upcE,'')))
set @upcA = ''
-- Calculate UPC-A
if (-- Required conditions for conversion: length must be 8, must start with 0 or 1
len(@upcE) = 8
and left(@upcE,1) in ('0','1')
and isnumeric(@upcE) = 1
)
begin
set @main_six = substring(@upcE,2,6)
if (right(@main_six,1) in ('0','1','2'))
begin
set @upcA = left(@upcE,1)
+ left(@main_six,2)
+ right(@main_six,1)
+ '0000'
+ substring(@main_six,3,3)
+ right(@upcE,1)
end
else if (right(@main_six,1) = '3')
begin
set @upcA = left(@upcE,1)
+ left(@main_six,3)
+ '00000'
+ substring(@main_six,4,2)
+ right(@upcE,1)
end
else if (right(@main_six,1) = '4')
begin
set @upcA = left(@upcE,1)
+ left(@main_six,4)
+ '00000'
+ substring(@main_six,5,1)
+ right(@upcE,1)
end
else
begin
set @upcA = left(@upcE,1)
+ left(@main_six,5)
+ '0000'
+ substring(@main_six,6,1)
+ right(@upcE,1)
end
end -- main if: Required conditions
return @upcA
end -- end create function