2013/11/15

T-SQL Throwdown at SQL Saturday Dallas (#255)

The first open-to-the-public running of my SQL game, T-SQL Throwdown, happened at SQL Saturday Dallas on 2013/11/02, at the University of Texas at Arlington. Thanks to the NTSSUG Board, and especially Ryan, Sri and Ganesh, for making it possible for me to run the event at SQL Saturday.

This being my first presentation at the community level, I was a little afraid that nobody would show up. I had run the game at my workplace a couple of times and it was generally liked, but on those occasions I could explain to the audience beforehand what it was about. Now I could not predict how people would take to the idea based on only a small paragraph in the schedule. I was hoping for the best.

Turns out people did show up. My team and I had to explain the game a few times, and a few people even left the room after hearing us because they weren't interested in playing a SQL game. But fortunately, most people stayed and participated, and I dare say even enjoyed the experience. I owe a huge debt of gratitude to my team of judges/experts/technical support who helped me run the event and improve it way beyond my original idea: Gabe F, Ed S, Patrick B, Adam P, John G, John S, Mark R, Kathy K.

The winning team (Brad, Ranjith, Marc) received a $100 gift card, and the runners-up (Gonzalo, Ryan, Sal) got $50, sponsored by Think Finance. I also received some very nice feedback via the session evals.

A couple of user groups even expressed interest in running the event at their gatherings. Looking forward to more throwdown fun.

2013/07/28

Find Stored Procedures and Functions That Reference Certain Tables

During the data discovery process on a database I have inherited from another designer, I often need to determine all the stored procedures that interact with a certain table. For instance, I may need to look up all the stored procedures that insert records into a certain table, or all the stored procedures that read a certain view. Here is a code snippet that lets me plug in the name of a table or view and returns all the code objects (stored procedures, functions, triggers etc) that interact with it. It returns the object type, the name of the object, and the code within that object, broken up into more than one Code Sequence if the code is long enough (over 4000 characters). You could, of course, also plug in the name of a code object to return the other code objects that reference it, for instance to find all the stored procedures that use a certain function.

Just replace the "search_term" in the code snippet with the name of the table/view/stored procedure/function etc that you want to search on.

select distinct
ObjectType = o.type_desc
, ObjectName = o.name
, CodeSequence = c_display.colid
, Code = c_display.[text]
from sys.objects o
inner join sys.syscomments c_search
on c_search.id = o.[object_id]
inner join sys.syscomments c_display
on c_display.id = o.[object_id]
where o.type_desc not in ('INTERNAL_TABLE','SERVICE_QUEUE','SYSTEM_TABLE')
and c_search.[text] like '%search_term%'
order by 1, 2, 3
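On SQL Server 2005 and higher, a similar search can also be done against sys.sql_modules, which stores each object's full definition in a single nvarchar(max) column, so the code comes back in one piece instead of 4000-character chunks. A sketch (replace "search_term" the same way as above):

```sql
-- Alternative using sys.sql_modules: the full definition is returned whole.
select
    ObjectType = o.type_desc
,   ObjectName = o.name
,   Code       = m.[definition]
from sys.sql_modules m
inner join sys.objects o
    on o.[object_id] = m.[object_id]
where m.[definition] like '%search_term%'
order by 1, 2
```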

2013/07/17

Delete All Database User Accounts for a Given Server Login

Ever had the need to remove all database-level user accounts from SQL Server for a given server level login? Say an employee leaves your organization, and you need to remove her account from all databases. Do you do this manually? Do you use the GUI in SSMS? Do you script this out? What if the database level accounts do not have the same name as the login? What if the account name is different in every database? What if there are hundreds of databases on your server, making this a tedious exercise?

Fear not, here is a script that you can run on SQL Server that will drop all database-level user accounts for a specified login. Thanks to Jason Strate for the article that inspired me to create this script. Just set the variable @LoginName to the login for which you want accounts deleted, and execute.


/*===================================================================================================
2013/07/15 hakim.ali@SQLzen.com

        SQL Server 2005 and higher.

        Script to delete (drop) all database level user accounts for a given server login from all
        databases on a database server. This will drop users even if they have a different name from
        the server login, so long as the two are associated. This will not drop users from a database
        where there are schemas/roles owned by the user.
       
        ** USE ONLY IF YOU ARE AN EXPERIENCED DBA FAMILIAR WITH SERVER LOGINS, DATABASE USERS, ROLES,
        SCHEMAS ETC. USE AT YOUR OWN RISK. BACK UP ALL DATABASES BEFORE RUNNING. **
=====================================================================================================*/
use [master]
go

--------------------------------------------------------------------------------------
-- Set the login name here for which you want to delete all database user accounts.
declare @LoginName nvarchar(200); set @LoginName = 'LOGIN_NAME_HERE'
--------------------------------------------------------------------------------------

declare @counter int
declare @sql nvarchar(1000)
declare @dbname nvarchar(200)

-- To allow for repeated running of this script in one session (for separate logins).
begin try drop table #DBUsers end try begin catch end catch

----------------------------------------------------------
-- Temp table to hold database user names for the login.
----------------------------------------------------------
create table #DBUsers
(                       ID          int identity(1,1)
                ,       LoginName   varchar(200)
                ,       DB          varchar(200)
                ,       UserName    varchar(200)
                ,       Deleted     bit
)

-- Add all user databases.
insert into #DBUsers
(                       LoginName
                ,       DB
                ,       Deleted
)
select                  @LoginName
                ,       name
                ,       1
from            sys.databases
where           name not in ('master','tempdb','model','msdb')
and             is_read_only = 0
and             [state] = 0 -- online
order by        name

----------------------------------------------------------
-- Add database level users (if they exist) for the login.
----------------------------------------------------------
set @counter = (select min(ID) from #DBUsers)

while exists (select 1 from #DBUsers where ID >= @counter)
begin
        set @dbname = (select db from #DBUsers where ID = @counter)
        set @sql = '
        update          temp
        set             temp.UserName = users.name
        from            sys.server_principals                      as logins
        inner join      [' + @dbname + '].sys.database_principals  as users
                        on users.sid = logins.sid
                        and logins.name = ''' + @LoginName + '''
        inner join      #DBUsers                                   as temp
                                on temp.DB = ''' + @dbname + ''''

        exec sp_executesql @sql
       
        set @counter = @counter + 1
end

-- Don't need databases where a login-corresponding user was not found.
delete          #DBUsers
where           UserName is null

----------------------------------------------------------
-- Now drop the users.
----------------------------------------------------------
set @counter = (select min(ID) from #DBUsers)

while exists (select 1 from #DBUsers where ID >= @counter)
begin
        -- Clear @sql first: the delete above can leave gaps in the IDs, and a
        -- missing row must not re-execute the previous iteration's drop statement.
        set @sql = null

        select  @sql = 'use [' + DB + ']; drop user [' + UserName + ']'
        from    #DBUsers
        where   ID = @counter

        --select @sql
        if @sql is not null
                begin try exec sp_executesql @sql end try begin catch end catch
        set @counter = @counter + 1
end

----------------------------------------------------------
-- Report on which users were/were not dropped.
----------------------------------------------------------
set @counter = (select min(ID) from #DBUsers)

while exists (select 1 from #DBUsers where ID >= @counter)
begin
        set @dbname = (select db from #DBUsers where ID = @counter)

        -- Skip IDs removed by the delete above; sp_executesql errors on a null statement.
        if @dbname is not null
        begin
                set @sql = '
                update          temp
                set             temp.Deleted = 0
                from            sys.server_principals                       as logins
                inner join      [' + @dbname + '].sys.database_principals   as users
                                on users.sid = logins.sid
                                and logins.name = ''' + @LoginName + '''
                inner join      #DBUsers                                    as temp
                                        on temp.DB = ''' + @dbname + ''''

                exec sp_executesql @sql
        end

        set @counter = @counter + 1
end

-- This shows the users that were/were not dropped, and the database they belong to.
if exists (select 1 from #DBUsers)
begin
        select                  LoginName
                        ,       [Database]     = DB
                        ,       UserName       = UserName
                        ,       Deleted        = case Deleted when 1 then 'Yes' else 'No !!!!!!' end
        from            #DBUsers
        order by        DB
end
else
begin
        select [No Users Found] = 'No database-level users found on any database for the login "' + @LoginName + '".'
end

/*===================================================================================================
Not automatically dropping the login. If there are database level users that were not dropped,
dropping the login will create orphaned users. Enable at your discretion.
=====================================================================================================*/
/*
set @sql = 'drop login [' + @LoginName + ']'
exec sp_executesql @sql
*/
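As a follow-up check, here is a small sketch (my addition, not part of the script above) that lists users in the current database whose SID no longer matches any server login, i.e. orphaned users. Run it in each database you want to check:

```sql
-- Find database users with no matching server login (orphaned users).
select  users.name, users.type_desc
from    sys.database_principals as users
left join sys.server_principals as logins
        on logins.sid = users.sid
where   users.type in ('S','U')      -- SQL and Windows users
and     users.principal_id > 4       -- skip dbo, guest, INFORMATION_SCHEMA, sys
and     logins.sid is null
```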



2013/07/03

Tables and Clustered Indexes Part 2

A little while ago I wrote a piece on why a table does not so much have a clustered index as a table is a clustered index*. Here's a follow up with some SQL code that further strengthens that assertion, at least the way it is implemented in SQL Server.

Let's start by creating a very simple table and inserting one record into it:

create table dbo.test (id int)

insert into dbo.test (id) select 1

Note that this table is currently a heap; it does not have an index of any type. Verify that the table has one record:

select * from dbo.test


Here is the execution plan for this select:


As expected, we have a table scan on the heap. Now let's add a non-clustered index to this table on the id column: 

create nonclustered index NCIndex_test on dbo.test (id)

Run the select statement again, and you will see the same result again, as expected. The execution plan, of course, is different. Now the query uses the non-clustered index instead of the table scan (I won't get into seeks vs scans here, different conversation).


Now let's try disabling (not dropping) the non-clustered index:

alter index NCIndex_test on dbo.test disable

Once again, run the select statement. We now see the same result and execution plan as in our first select statement above. The index is ignored, and the table is treated like a heap, resulting in a table scan. Again, so far all of this is as expected.

Now, let's repeat the above exercise with a clustered index. To keep everything else the same between the two exercises, I will drop and recreate the table.

drop table dbo.test

create table dbo.test (id int)

insert into dbo.test (id) select 1

Now, running our select will again show us the table scan:

select * from dbo.test



Let us now create the clustered index:

create clustered index CIndex_test on dbo.test (id)

Running our select statement again, here's what our execution plan looks like with the clustered index:

select * from dbo.test


Now let's disable (not drop) the clustered index:

alter index CIndex_test on dbo.test disable

Now if you run the select statement, what do you expect to see? I would have expected to see the table treated as a heap, and a table scan in the execution plan. It seems reasonable to assume that clustered table minus clustered index equals heap, right?

select * from dbo.test


Apparently, not so. Disabling the non-clustered index causes the query to treat the table like a heap, but disabling the clustered index does not have the same effect. Disabling the clustered index disables the table itself.
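You can confirm this in the catalog views: sys.indexes exposes an is_disabled flag, so a quick check against the table from this example shows the state of the index after the alter:

```sql
-- is_disabled = 1 for CIndex_test after the alter index ... disable above.
select  name, type_desc, is_disabled
from    sys.indexes
where   [object_id] = object_id('dbo.test')
```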

Thus, when you add a clustered index to a table, the table becomes the clustered index*. A table does not so much have a clustered index as a table is a clustered index.


* There is much more to a clustered index, including nodes, leaves, a root, a b-tree structure etc. This is a simplification to help illustrate the point. As an aside, if you drop the clustered index (drop index CIndex_test on dbo.test), the table goes back to being a heap and returns results from the select statement. Here is what Books Online/MSDN has to say on the subject: "Disabling an index prevents user access to the index, and for clustered indexes to the underlying table data... Disabling a clustered index on a table prevents access to the data; the data still remains in the table, but is unavailable for data manipulation language (DML) operations until the index is dropped or rebuilt."

2013/05/07

Thoughts on Database Design and Development

I recently worked on a database of somebody else's design that had me wondering why the designer did certain things. I generally give the designer the benefit of the doubt when I don't know their reasons, but there are certain mistakes that are plainly the result of a lack of experience or expertise. So I decided to write up a few guidelines on database design and development. This is a set of general recommendations for putting together a relational database for transactional operations. Not every pointer in here will be the right thing for every situation, but in general, if you understand the reasons behind these guidelines, you should be able to put together a decent database.


Database Design

Understand the Business: Before you even begin, understand the business problem you are trying to solve. This will help you determine the requirements yourself instead of relying on somebody else to give them to you. You will also be able to come up with a more optimal design by cutting out the translator (requirements gatherer) and doing it yourself. You will know "why" you need to do certain things, as opposed to being told "what" to do.

Primary Keys: Generally, all tables should have a primary key, which defaults to a clustered index. Avoid heaps. As for the primary key itself, call it "ID", apply data type integer, and give it the identity/auto-increment property. While this works for most tables, a fairly common exception is a child table that will almost always be referenced by its parent ID foreign key. Think Sales_Order and Sales_Order_Detail. You will probably never look up detail records directly by detail ID; it is much more likely that you will first find the Sales_Order ID and then look for its matching details (children). In such tables, you're better off creating a composite PK/clustered index on two fields in this order: Parent.ID, Child.ID. (Yes this violates 3NF, but that's another discussion.) Do not use GUIDs as your primary key / clustered index. There is almost never a good reason to do so. An integer (or bigint if necessary) works much better.
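A sketch of the parent/child pattern described above (table and column names are just for illustration); note the composite clustered primary key on the detail table, and the formal foreign key back to the parent:

```sql
create table dbo.Sales_Order
(
    ID          int identity(1,1) not null
,   OrderDate   datetime          not null
,   constraint PK_Sales_Order primary key clustered (ID)
)

create table dbo.Sales_Order_Detail
(
    Sales_Order_ID  int               not null    -- parent ID leads the clustered key
,   ID              int identity(1,1) not null
,   Quantity        int               not null
,   constraint PK_Sales_Order_Detail primary key clustered (Sales_Order_ID, ID)
,   constraint FK_Sales_Order_Detail_Sales_Order
        foreign key (Sales_Order_ID) references dbo.Sales_Order (ID)
)
```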

Foreign Keys: All table relationships must be enforced via a formal foreign key relationship. Don't use undocumented foreign keys, i.e. using the primary key values of table A as column values in table B without formally establishing the foreign key relationship. I have seen many databases that ignore this, and among other things it causes the problem of orphaned records and makes data discovery very difficult.

Indexes: Indexes are great for read performance, but they come at the expense of making writes slower, and databases larger on disk. So do index, but only as much as necessary. Try to anticipate the queries that will be reading your data most frequently, and the ones on which performance is crucial. Then add indexes to "cover" those queries. Look up what a covering index is, and then build as necessary. Also, have a plan in place to maintain your indexes, to periodically check on and fix fragmentation, and also to remove indexes that are no longer necessary.
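For instance, a covering index for a frequent query might look like this (names are hypothetical); the INCLUDE columns let the query be answered entirely from the index, with no lookup back to the table:

```sql
-- Covers: select Customer_ID, Order_Total from dbo.Sales_Order
--         where Order_Date >= @some_date
create nonclustered index IX_Sales_Order_OrderDate
on dbo.Sales_Order (Order_Date)
include (Customer_ID, Order_Total)
```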

Documentation: Take the time to create and maintain a schema description document. Add all the entities your database uses, especially tables, views, stored procedures and functions. Describe what the purpose of the table is, its dependencies and relationships, its columns, foreign keys. Maintain this document, update as necessary. There are tools to automate some of this. If you can also add business process descriptions, this document will be golden.

Triggers: Avoid them. They might seem like a great idea to automatically perform an action when something is done to a set of data, but they cause their own set of headaches. They are difficult to track and troubleshoot, they often get forgotten, and potentially lead to circular operations.

Naming Convention: Settle on one before you start creating database objects, and stick to it. Use this for all your objects: tables, views, stored procedures, functions, indexes, triggers (if you must use them), and the database itself.

Normalization: Normalize normalize normalize. If your app must have a de-normalized view, please put it in a separate view. But as far as your tables go, normalize with the intention of not storing any piece of data more than once. That is how the relational database was meant to be used. Don't get too hung up on which normal form to use and the differences between them. Just stick to the intention of storing any piece of data only once, and you'll be fine.

Data Types: Yes, memory is cheap, but performance is still king. Don't use bigint if int will do the job. Don't use varchar(200) if varchar(20) will suffice. When your data volume grows, these little things add up and make a huge difference. Do think of future growth and scaling out/up, but don't overengineer, and don't try to solve problems that have not come up yet. For instance, if you are developing a new app with a target audience of 100,000 users, don't try to scale it up to 50 million users at this point. If that problem ever arrives, you will be better equipped (in terms of resources, money, personnel) to handle it, and it will be a good problem to have. Right now, trying to build and host a database much bigger than you will need in the foreseeable future could scuttle your efforts.

Big Data (I am currently also inclined to include NoSQL with this): This is a fad, a term that is currently "hot". Vendors that have jumped onto the bandwagon will market it to you. Stay away from it. For most business solutions, the relational database still works beautifully.

XML: Storing xml inside a relational database is a questionable idea. It is a compromise. XML and relational data are two very different paradigms designed to address two very different problems, and they don't marry well. I am not against generating xml from relational data for consumption by web services etc, or parsing xml streams out into atomic elements to store in relational format. Just not a big fan of storing xml as is in a table. If you must do this, question why.


Database Development

Row-by-Row Operations: Avoid loops, cursors, and "for" type constructs. If at all possible, use set-based operations. Jeff Moden has written some very good material on the how/why of this, look it up.
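A trivial illustration of the difference (hypothetical table names): instead of cursoring through rows one at a time, express the change as a single statement and let the engine handle the whole set:

```sql
-- Row-by-row (avoid): a cursor that fetches and updates one record per iteration.
-- declare price_cursor cursor for select ID from dbo.Product ...

-- Set-based (prefer): one statement, one pass over the data.
update      p
set         p.Price = p.Price * 1.10
from        dbo.Product        as p
inner join  dbo.Price_Increase as pi
            on pi.Product_ID = p.ID
```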

Documentation: Repeat after me: code is not its own documentation. Commit this to the foundation of your very being. Your future self will thank you a few months from now, when you have to come back and rework some complex code and you don't have the time to figure out the what/how/why of code. Others who may have to work on your code a few months from now will thank you too. Write comments, introductions, and explanations in your code objects. Over-communicate.

Naming Conventions: Name entities so that the name tells you what they are. If a variable contains an account number, don't call it @x or @an or @acc. Call it @Account_Number. This makes code much easier to read and maintain.

Readability: Code must be neatly organized and easy to read. Prefer readability over "cuteness" (using very smart techniques that are impossible for anybody to read/understand). Be consistent with your code layout. If you write inner joins a certain way, write them that way all the time. Be consistent with upper or lower case, and ask everybody in your organization to follow established standards. This may be painful in the short term, but in the long term it will make code easier for DBAs and developers to read and will help speed development time. Promise.

Source Code Control: Script out your schema. As far as possible, don't actually create any objects through a GUI. Play with the GUI, sure, but when it comes time to execute your changes, script them out and save them. Check the scripts into source control. This way, you will not have to repeat or remember each detail of each object, and deployment on multiple machines will be easier, in addition to giving you a backup. Also script out your configuration table data as csv files and check those into source control too. This will help auto-populate your config tables once you've created them.

Backup and Restore: Definitely have a restore plan, not just a backup plan. If you do not know that you can restore your backups successfully (not guess, but know for sure), you don't have a backup plan at all. The only way to know is to actually restore your backups. Do it. Test it. Plan it. Formally implement it.
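A minimal sketch of such a restore test (database names and paths are placeholders; adjust to your environment): restore last night's backup to a scratch database, then run a sanity query against it.

```sql
-- Restore a backup to a throwaway database to prove the backup is actually usable.
restore database MyDB_RestoreTest
from disk = N'X:\Backups\MyDB_Full.bak'
with
    move N'MyDB'     to N'X:\RestoreTest\MyDB_RestoreTest.mdf'
,   move N'MyDB_log' to N'X:\RestoreTest\MyDB_RestoreTest.ldf'
,   replace

-- Sanity check: can we actually read the data?
-- select count(*) from MyDB_RestoreTest.dbo.Some_Important_Table
```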

Database as a Service: This is something I'd like to put into practice, but for some reason or another have not been able to achieve yet. This is my ideal database consumption method: don't allow SQL code in your application. Put every database operation inside a stored procedure, and give your application permissions to only execute stored procedures and nothing more: no DML (select / insert / update / delete) and definitely no DDL (create / alter / drop). Only give applications execute permission on stored procedures. Why? Among the benefits are:
- Caching the execution plan for performance.
- Using the database server (as opposed to the web server or application server) for data manipulation which is what it was designed to do.
- Not sending massive volumes of data over the network between the database server and other servers.
- Making your application more secure by reducing vulnerability to a SQL injection attack.
- Preventing non-database programmers from writing potentially inefficient and possibly buggy database code.
- Being able to tweak the stored procedure for performance tuning or simple functionality without having to re-release/update the application or its code.
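In practice the setup is just this (a sketch, with hypothetical object and account names): wrap the operation in a stored procedure, and grant the application's account execute permission on it and nothing else.

```sql
create procedure dbo.Get_Customer_Orders
    @Customer_ID int
as
begin
    set nocount on;
    select  ID, Order_Date, Order_Total
    from    dbo.Sales_Order
    where   Customer_ID = @Customer_ID;
end
go

-- The app login gets execute on the procedure only; no direct table access, no DDL.
grant execute on dbo.Get_Customer_Orders to AppUser;
```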


In conclusion, you don't necessarily have to do all of the above, but at least take the time to understand why these are recommended and then make your own informed decision. Your application will be better off for it.

2013/05/06

Could You Live Like a DBA?


Ever wondered how everyday life would pan out if you took the DBA approach to life's events?


Backups: You have a nightly backup routine. This means a differential backup every night, when you tell your spouse everything that has happened during the day in minute detail, and a full backup on weekends when you tell them everything that has happened during the week in your entire life. Don't think either of you would sleep much.

Replication: You are on the phone continuously, telling your twin sibling at a remote location everything that you are experiencing. Everything. What you see, hear, smell, eat, work on etc. Your twin, of course, has no life of her own, and is living vicariously through you, ever prepared for you to drop dead so she can take over. Talk about sibling rivalry.

Restores: Remember that nightly backup you transferred to your spouse earlier? You periodically and randomly ask your spouse to repeat some of those conversations back to you, ensuring no detail is skipped, so you can confirm that the backup is working. Marriage counselor's number handy?

Scans: When you go looking for the soy sauce in your pantry, you read the label on every single item in there: all the spices, all the jars and bottles, every bag of chips, every bottle of juice and soda, everything, because the items are arranged in no particular order. It takes you a few hours to put together one meal, and your kids have gone to bed hungry.

Seeks: After scanning through your pantry multiple times and getting tired of how long it takes each time, you decide to rearrange all the items in order by alphabet. Now you can retrieve the soy sauce in no time. Kids are happy again.

Fragmentation and Re-indexing: Over time, as your spouse adds items to the pantry at random locations (because he is not a DBA) and removes items leaving gaps on the shelves, the ordering gets so out of whack that you revert to going through all items again to find the soy sauce. Eventually, you decide to create an automated re-indexing plan by having your spouse rearrange everything by alphabet once a week. Divorce lawyer's number handy?

More Backups (because, you know, a DBA can never have enough backups): When you prepare a sandwich for your kid's school lunch, you also make a second sandwich. You pack the first one in your child's lunch box, and you courier the second one to arrive at school just in time for lunch. You don't tell anybody that you have a third one in your car, you know, just in case...

Adding Records: When you discuss baby names with your spouse for the upcoming addition to your family, you very briefly consider an integer for the name, and then even more briefly think that a tinyint would be more appropriate, before you think of the divorce lawyer and decide to shut up. You do, however, bring up the Bobby Tables joke because you think it's funny. Your spouse does not.

Searching: You wonder on a daily basis why you can't just enter this into Google's search box: select * from closeby_restaurants where cuisine = 'chinese' and distance_from_home <= 2 miles and deliciousness = 'awesome' and service_time = 'quick'...

Normalization: When your spouse asks you for a copy of your shopping list before he goes shopping, you point him to the location of the original list instead because you do not want to create duplicates. He throws his hands up and leaves without your list. Your items never get bought.

Normalization 2: You do not have a driver's license, or a bank account, or a passport, or a mortgage in your name, because you are incapable of filling out any forms in duplicate, let alone triplicate.

Set-based Operations: After the kids have gone to bed and you sit down with your spouse to watch some TV, you roll out all the TVs, desktops, laptops, tablets and phones in your house into the living room and fire up a different episode of your favorite show on each one at the same time, because you know that set-based operations are better than episode-by-agonizing-episode. The kids were very entertained when you did this with their cartoons. But the spouse - not so much.


So if your domestic life isn't exactly a picture of happiness and you're getting grief from your spouse, kids, and siblings, now you know why. You're a DBA!

2013/04/24

Surviving Midnight

This post is about an interview experience I had recently. I have interviewed many people for SQL jobs, but this time I was the interviewee. That’s a nerve-wracking experience at any time, but was especially memorable this last time because it involved one of the Midnight DBAs. In case you haven’t heard of the Midnight DBAs, I’ll give you some background.

There are many well-known authors, presenters and bloggers within the Microsoft SQL Server community, including various user groups, conferences, forums and online communities. Two of the stars of this SQL universe are the husband/wife duo known as the Midnight DBAs: Sean and Jen McCown. I first heard about Sean one day at work when my coworkers were discussing how Sean liked to interview people for database jobs and then blog about the (sometimes disastrous) results. If the interview didn't go well, he would rip apart the candidate for his or her lack of basic SQL knowledge (he has posted several examples on his blog). You have to bear in mind that Sean holds a Microsoft Certified Master (MCM) certification in SQL Server – among the highest certifications available. It's not a certification you can acquire by just reading books. You really have to know your subject very well. There are only about a hundred SQL MCMs in the world, a very exclusive group. In summary, this person knows a lot about SQL Server and a lot about interviews. And he's not lenient when it comes to the amount of knowledge he believes a candidate should possess.

Coming back to my story, I had been approached by a company looking for people with SQL skills. After an initial phone screen their HR person sent me an invitation for an in-person technical interview with their database director and senior DBA. So far, so good. As I read the names of the people who would be there though, my heart stopped. The senior DBA was Sean McCown. There was no way I would get past this guy. I half expected to fail the interview but decided to do it anyway, reasoning that one of two things would happen. Possibility one was that I would pass the interview, which would be an accomplishment knowing who it was with. Possibility two was that I would fail the interview, which would be painful but I would have had a chance to interact with a mind like that and evaluate where I stood. I figured I’d learn so much from the experience, it would still be a win for me.

When I arrived for the interview, the first relevant thing Sean said was, "I have 30 minutes to find out if you are worth a damn." To say my heart was racing wouldn't be much of an exaggeration. Despite the veins in my head pulsing (I don't think I'd ever felt them before), I maintained a cool facade and said, "Fair enough. Let's go." I was able to correctly answer most of the technical questions he fired at me. When he asked something I wasn't familiar with, I simply admitted that I didn't know. And guess what? He was actually very friendly and nice! He took the time to explain the answer to me on these questions. I was pleasantly surprised. I actually enjoyed the conversation, and got offered the job!

My takeaway, and my advice to you if you’re going to be in a technically grueling session: be yourself. Know your stuff. Don’t bluff no matter what. Saying, “I don't know” is okay. Be calm and confident in what you do know. There's almost no point in trying to cram for an interview the day before – if you don't know your subject, you won’t learn it in one day.

As a very young child, I remember reading books about witches and ghosts that scared me and led me to believe that midnight was the witching hour when all kinds of scary things came out of the woodwork. Turns out midnight is not so bad after all.

Good luck.

2013/01/20

Universal Product Codes: a Database Primer


Introduction

This article is intended as a database primer: an introduction to Universal Product Codes and their use in a relational database.

Universal Product Codes are codes used to uniquely identify retail products in many countries. The system originated in the United States in the 1970s to make grocery checkouts faster. Universal codes are printed on product packaging in the form of barcodes, for scanning at checkout. The code is also generally printed in text just below the barcode. Being universal means that the code on any given product will be the same throughout the country it is sold in.

Universal Product Codes come in more than one format. The most popular format is called the UPC-A, followed by the UPC-E and the EAN, among others. The UPC-A barcode represents 12 numerical characters (no letters or special characters). Each number is represented by a unique barcode. A typical UPC-A looks like this (this is for illustration only and is not a real barcode):


In this illustration, the first digit (1) is called the prefix. The next five digits (22222) represent the manufacturer. All products made by that manufacturer will have the same manufacturer code. The three digits after that (333) represent the product family code. For instance, all cereals manufactured by Kellogg will have the same code here, assuming Kellogg has coded them as one “family”. The next two digits (44) represent the individual product. An obvious limitation here is that each family can only have a hundred products within it, but manufacturers get around that by playing with family codes. The last digit (5) is the check digit, used to verify that the scanner has read the previous 11 characters correctly. A formula is used to compute the check digit from the first 11 characters, and then verified against this 12th character.

The next most popular format is the UPC-E. It was invented to allow encoding of small items, such as bars of candy or chewing gum, that are too small to print a regular UPC-A on. It consists of 8 numeric characters. Unlike the UPC-A, the UPC-E does not have a straightforward representation for manufacturer, product family and product. It is simply a compressed version of the UPC-A. The EAN code (originally European Article Number, now International Article Number but still called EAN, not IAN) is a variation of the UPC-A that adds an extra character to the left of the barcode, representing the country where the product is sold.

A real example of a UPC-A code is 043000014240, which belongs to the cereal Honey Bunches of Oats, made by the company Post. Here, the manufacturer code is 43000, the product family code is 014, and the product code is 24. The barcode representations of the UPC-A and the EAN respectively look like this:


Note that the EAN has 13 digits, with the one extra digit on the left representing the country of retail, in this case the United States. The remaining 12 are the same as the UPC-A.
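The check digit computation can be verified against the Honey Bunches of Oats code above. Sum the digits in odd positions (1st, 3rd, ... 11th), multiply by 3, add the digits in even positions (2nd, 4th, ... 10th); the check digit is whatever brings the total up to the next multiple of 10 (or 0 if the total is already a multiple of 10). For 043000014240:

Odd positions:  0 + 3 + 0 + 0 + 4 + 4 = 11
Even positions: 4 + 0 + 0 + 1 + 2 = 7
(11 x 3) + 7 = 40

Since 40 is already a multiple of 10, the check digit is 0, which matches the last digit of the code.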


Product Codes in Databases

If you are creating a database to store retail products, chances are you will want to store and look up UPC-A codes at the very least, and probably also UPC-E codes. These are the two most frequently used formats in the United States; product codes in many other countries are based on the US code system, and are very similar. You might also have the need to translate UPC-A codes to UPC-E and back. For more granular lookup and analysis, you may also want to store the code components (manufacturer code, product family code, product code) separately, in addition to the entire code.

Since the codes are always numeric, the seemingly obvious choice is to store all these codes and components as numeric types. But think again, because the leading characters in the entire UPC-A, and in individual components, are often one or more zeroes. A manufacturer code of, say, '053' becomes '53' when saved as an integer, making it incorrect and useless. I prefer storing the codes and components as char (they are always fixed length). However, many code-related operations, such as converting between UPC-A and UPC-E or calculating the check digit, require the numerical values of the code characters, so you will need to convert to a numeric type in SQL code.
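To make the storage approach concrete, here is one way such a table might look. This is only a minimal sketch; the table and column names are illustrative, not from any real schema:

create table dbo.RetailProduct
( RetailProductID int identity(1,1) not null primary key
, UPCA char(12) not null -- full 12-digit code; char preserves leading zeroes
, UPCE char(8) null -- null because not every UPC-A has a UPC-E counterpart
, ManufacturerCode char(5) not null -- positions 2-6 of the UPC-A
, ProductFamilyCode char(3) not null -- positions 7-9
, ProductCode char(2) not null -- positions 10-11
)

A check constraint such as check (UPCA not like '%[^0-9]%') on the UPCA column can additionally guard against non-digit characters sneaking in.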

Probably the three most frequently used product code related operations are:

1. Calculate UPC-A check digit, given the first 11 characters
2. Convert a UPC-A into a UPC-E (not all UPC-A codes can be converted, they have to meet certain conditions)
3. Convert a UPC-E into a UPC-A (all valid UPC-E codes have a counterpart UPC-A)

The code to perform these operations, in the form of T-SQL functions, is below. I am the original author of this code, and provide it here for anybody to use for free.

______________________________________________________________________________

1. Calculate UPC-A check digit:

if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_Calculate_UPCA_CheckDigit')
begin
drop function dbo.udf_Calculate_UPCA_CheckDigit
end
go

create function dbo.udf_Calculate_UPCA_CheckDigit
( @upca_11 varchar(20)
)
returns varchar(1)
as
begin
/**************************************************************************
2013/01/15 Hakim Ali
Function returns the Check Digit when passed in first 11 characters
of UPC-A.
**************************************************************************/

-- Local variables
declare @checkdigit varchar(1)
declare @sum_of_odds int
declare @sum_of_evens int

-- Initial settings
set @checkdigit = ''
set @upca_11 = ltrim(rtrim(isnull(@upca_11,'')))
set @sum_of_odds = 0
set @sum_of_evens = 0

-- Calculate CheckDigit
if (len(@upca_11) = 11 and @upca_11 not like '%[^0-9]%') -- digits only; isnumeric would also accept signs, decimals and currency symbols, which break the convert calls below
begin
set @sum_of_odds = convert(int,substring(@upca_11,1,1)) +
convert(int,substring(@upca_11,3,1)) +
convert(int,substring(@upca_11,5,1)) +
convert(int,substring(@upca_11,7,1)) +
convert(int,substring(@upca_11,9,1)) +
convert(int,substring(@upca_11,11,1))

set @sum_of_evens = convert(int,substring(@upca_11,2,1)) +
convert(int,substring(@upca_11,4,1)) +
convert(int,substring(@upca_11,6,1)) +
convert(int,substring(@upca_11,8,1)) +
convert(int,substring(@upca_11,10,1))

if ((((@sum_of_odds * 3) + (@sum_of_evens)))%10 = 0)
begin
set @checkdigit = '0'
end
else
begin
set @checkdigit = convert(varchar(1),(10 - (((@sum_of_odds * 3) + (@sum_of_evens)))%10))
end

end

return @checkdigit
end -- end create function
go
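Assuming the function above has been created, here is a quick test using the Honey Bunches of Oats code from earlier (its first 11 characters, without the check digit):

select dbo.udf_Calculate_UPCA_CheckDigit('04300001424')
-- returns '0', matching the 12th digit of 043000014240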

______________________________________________________________________________

2. Convert UPC-A to UPC-E:

if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCA_to_UPCE')
begin
drop function dbo.udf_UPCA_to_UPCE
end
go

create function dbo.udf_UPCA_to_UPCE
( @upcA varchar(50) -- deliberately wider than 12: a varchar(12) parameter would silently truncate longer input instead of failing the length check below
)
returns varchar(8)
as
begin
/**************************************************************************
2013/01/17 Hakim Ali
Function to take in UPC-A, calculate and return its UPC-E.
**************************************************************************/

-- Local variables
declare @manuf_code varchar(5)
declare @product_code varchar(5)
declare @upcE varchar(8)

-- Initial settings
set @upcA = ltrim(rtrim(isnull(@upcA,'')))
set @upcE = ''

-- Calculate UPC-E
if (-- Required conditions for conversion: length must be 12, must start with 0 or 1, all characters numeric
len(@upcA) = 12
and left(@upcA,1) in ('0','1')
and @upcA not like '%[^0-9]%' -- digits only; each type check below enforces its own required pattern of zeroes
)
begin

set @manuf_code = substring(@upcA,2,5)
set @product_code = substring(@upcA,7,5)

-- ----------------------------------------------------------------------------
-- Note: the type checks must be applied in order. If Type 1 applies, use it in preference to Type 2, and so on.
-- ----------------------------------------------------------------------------

-- Type 1
if (right(@manuf_code,3) in ('000','100','200') and convert(int,@product_code) between 0 and 999)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,2)
+ right(@product_code,3)
+ substring(@manuf_code,3,1)
+ right(@upcA,1)
end

-- Type 2
else if (right(@manuf_code,2) = '00' and convert(int,@product_code) between 0 and 99)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,3)
+ right(@product_code,2)
+ '3'
+ right(@upcA,1)
end

-- Type 3
else if (right(@manuf_code,1) = '0' and convert(int,@product_code) between 0 and 9)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,4)
+ right(@product_code,1)
+ '4'
+ right(@upcA,1)
end

-- Type 4
else if (convert(int,@product_code) between 5 and 9)
begin
set @upcE = left(@upcA,1)
+ left(@manuf_code,5)
+ right(@product_code,1)
+ right(@upcA,1)
end

end -- main if: Required conditions

return @upcE

end -- end create function
go
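Assuming the function above has been created, here is a quick test using a valid UPC-A (manufacturer code 42100, product code 00526) that meets the Type 1 conditions:

select dbo.udf_UPCA_to_UPCE('042100005264')
-- returns '04252614': prefix 0, then 42, then 526, then the 1 from the manufacturer code, then the check digit 4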

______________________________________________________________________________

3. Convert UPC-E to UPC-A:

if exists (select 1 from INFORMATION_SCHEMA.ROUTINES where ROUTINE_NAME = 'udf_UPCE_to_UPCA')
begin
drop function dbo.udf_UPCE_to_UPCA
end
go

create function dbo.udf_UPCE_to_UPCA
( @upcE varchar(50) -- deliberately wider than 8: a varchar(8) parameter would silently truncate longer input instead of failing the length check below
)
returns varchar(12)
as
begin
/**************************************************************************
2013/01/18 Hakim Ali
Function to take in UPC-E, calculate and return its UPC-A.
**************************************************************************/

-- local variables
declare @main_six varchar(6) -- The 6 important (middle) digits of UPCE
declare @upcA varchar(12) -- Calculated UPCA

-- Initial settings
set @upcE = ltrim(rtrim(isnull(@upcE,'')))
set @upcA = ''

-- Calculate UPC-A
if (-- Required conditions for conversion: length must be 8, must start with 0 or 1, all characters numeric
len(@upcE) = 8
and left(@upcE,1) in ('0','1')
and @upcE not like '%[^0-9]%' -- digits only; isnumeric would also accept signs, decimals and currency symbols
)
begin
set @main_six = substring(@upcE,2,6)

if (right(@main_six,1) in ('0','1','2'))
begin
set @upcA = left(@upcE,1)
+ left(@main_six,2)
+ right(@main_six,1)
+ '0000'
+ substring(@main_six,3,3)
+ right(@upcE,1)
end

else if (right(@main_six,1) = '3')
begin
set @upcA = left(@upcE,1)
+ left(@main_six,3)
+ '00000'
+ substring(@main_six,4,2)
+ right(@upcE,1)
end

else if (right(@main_six,1) = '4')
begin
set @upcA = left(@upcE,1)
+ left(@main_six,4)
+ '00000'
+ substring(@main_six,5,1)
+ right(@upcE,1)
end

else
begin
set @upcA = left(@upcE,1)
+ left(@main_six,5)
+ '0000'
+ substring(@main_six,6,1)
+ right(@upcE,1)
end

end -- main if: Required conditions

return @upcA

end -- end create function
go
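Assuming the function above has been created, here is a quick test with an 8-character UPC-E:

select dbo.udf_UPCE_to_UPCA('04252614')
-- returns '042100005264': the last digit of the middle six ('1') determines where the zeroes are re-inserted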

______________________________________________________________________________