Printable HTML Pages

The secret number is: 660px

And you thought I was going to say it was 42.

660 pixels is the page width for printing html pages without clipping from most browsers (actually, I got 672px before I saw clipping). Of course, your millage may vary. It appears that IE defaults to 0.75 inch borders all the way around. Firefox has 0.5 inch borders, and Opera says it has 0.5 inch borders, but looks like it has 0.25 inch borders. Anyway, when your browser magically converts from the screen context to the printer context, it seems to map that 660 to fit inside of the 0.75 inch borders that IE defaults to. Which works nicely for the other browsers as well, so that is the number I am going to start using.

Now, if you provide one css style sheet for printing and one for screen, you can eliminate the menus, nav bars, ads, and other non-critical elements of the page so that only the main content div is printed. Then, you can make your content div be 660 pixels fixed width.

That means you can still use the artistically pleasing “rule of thirds” for your theme’s layout and still have 1000px pages with one third devoted to a nav bar that doesn’t show up in the printed page. in order to better display on a monitor.

I’m not sure what the effect looks like on a mobile phone. You could use CSS to re-arrange the page for mobile devices so that the 660px was the entire width, and the nav bar / side section was available somewhere else.

No Comments Posted in Uncategorized
Sample User Authentication Database Schema

My User/Password Database Schema for Business Applications

When I first bumped into the built in ASP.Net User Authentication Provider, I lived for a couple of hours in bliss thinking to myself “that’s convenient, I’ll never have to worry about the user authentication portion of my websites and business applications again” — until I tried to use it.

It really is pretty decent, but it is geared toward large “web 2.0″ applications where users sign up and start contributing. Not business applications where I.T. or H.R. adds users and controls groups very closely. Where there are more than just “admin” and “user” account types.

In my experience, businesses have many roles within their organization, and they want to be able to drill into the application and have a single button or text field to be hidden or disabled based on whether or not a user is in a particular group.

Some want them to be tied in to Active Directory, but some don’t even have computer logins for all their employees, and some don’t want applications (especially 3rd party applications) to be tied in to Active Directory (dispute the marketing that claims otherwise).

authActivity

authActivity

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

Detail : String

An Activity is a single thing a user might want to do in the system. For instance “Create users,” “Delete users,” “Re-activate users,” “Assign Employee ID Numbers,” or “View Social Security Numbers.”

As developers build features into the software, they create new records in this table. There is no reason to be skimpy and try to lump a bunch of activities into a single umbrella activity. You can create activities for every feature of the software if you want.

Activities are only created (or maintained) by development. The people who administrate the system do not have any power to create new activities (other than to bug development to create an activity) because it is the responsibility of the development team to make sure the code is aware of what activities could be performed in certain areas of the software and to enable/disable things based on the current user’s permission set. It wouldn’t do any good for a non-developer to create an activity (like say: “Reset Other People’s Passwords”) if the software has no concept of what that means.

The Detail field on this (and the other tables) is an XML field not just a description field. The actual data stored in that field is really up to you, and your application could support any number of configuration fields inside of that detail field. This flexible design is really convenient, but comes at the cost of not having straightforward SQL access (sorting, and filtering) to the contents of the field. You are almost required to pull it into your application and use LINQ to XML to get to the stuff you store in the details field.

authRole

authRole

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

Detail : String

A role is a set of permissions. A company might have an “HR” and an “HR Admin” role. As well as a “BookKeeper” role and a “Salesman” role. Perhaps they will also have a “Tech Support’ role and a “Systems Admin” role. Each role has any number of activities associated with it, overlap is OK (so the Systems Admin role and the Tech Support role can have a fair amount of overlap, that is OK). It is also OK for a single user to be assigned multiple roles. For instance a “Salesman” could also be a “Payroll Clerk.” This design is based entirely on what experience has dictated as I have built business software over the years.

authRoleActivity

authRoleActivity

Id : Integer <<PK>> <<auto incr>>

RoleId : Integer <<FK-authRole>>

ActivityId : Integer <<FK-authActivity>>

Detail : String

This table is simply the many-to-many join table that connects authRoles and authActivities.

authUser

authUser

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

UserName : String(32) <<not null>>

Password : String(32)

Detail : String

This should be self explanatory, except for the fact that there is not a field for keeping track of who is active. (See the next table for why.)

authUserActive

authUserActive

Id : Integer <<PK>> <<auto incr>>

UserId : Integer <<FK-authUser>>

AsOf : DateTime <<not null>>

Until : DateTime

Detail : String

As businesses get to be a little more mature, they are no longer worried about simply whether or not a user is active and therefore allowed to log in right now, but they are interested in WHEN a user was allowed to log in. They want to be able to create a user who will expire in a month. Or they want to be able to have their summer interns only be active during the summer months. And, when they are audited, they need to be able to show that a particular user had access to a particular system at a particular time.

In order to accommodate this, I created this table, which will have entries for each time period where a user is active. So, the system will look to see if there is a record in this table where the login day is between the AsOf date and the Until date. If there is one, then the user is active (or was active on the date you tested for).

You should use a constraint to make sure AsOf < Until. Also, you can either choose to allow Until to be null which means "Until we assign an end date" or you can set it to the max date for your DBMS as a default field value. Using a null is technically more "correct" but I think the logic for querying looks nicer if you can assume that Until contains a constant value guaranteed to be in the future.

authUserRole

authUserRole

Id : Integer <<PK>> <<auto incr>>

UserId : Integer <<FK-authUser>>

RoleId : Integer <<FK-authRole>>

AsOf : DateTime <<not null>>

Until : DateTime

Detail : String

This is a simple many-to-many join table assigning users to groups with one little twist: It also has the AsOf / Until logic. When you are audited, it isn’t good enough to know whether or not a user had access to the system. You need to be able to show that they had access to a particular feature or not.

And, here is a SQL script for generating the Schema in SQL Server 2008 (don’t say I never gave you anything).

CREATE FUNCTION 
dbo.MaxDateTime()
returns datetime AS
begin
    RETURN convert(datetime, '99991231 23:59:59:997')
end
go
 
CREATE FUNCTION 
dbo.MinDateTime()
returns datetime AS 
begin
    RETURN convert(datetime, '17530101')
end
go
 
CREATE TABLE 
authActivity
( Id int PRIMARY KEY identity
, DisplayName varchar(64) UNIQUE NOT NULL
, Detail text
);
 
CREATE TABLE 
authRole
( Id int PRIMARY KEY identity
, DisplayName varchar(64) UNIQUE NOT NULL
, Detail text
);
 
CREATE TABLE 
authRoleActivity
( Id int PRIMARY KEY identity
, RoleId int NOT NULL FOREIGN KEY REFERENCES authRole(Id)
, ActivityId int NOT NULL FOREIGN KEY REFERENCES authActivity(Id)
, Detail text
, constraint nk_authRoleActivity UNIQUE (RoleId, ActivityId)
);
 
CREATE TABLE 
authUser
( Id int PRIMARY KEY identity
, DisplayName varchar(64) UNIQUE NOT NULL
, UserName varchar(32) UNIQUE NOT NULL
, Password char(32)
, Detail text
);
 
CREATE TABLE 
authUserActive
( Id int PRIMARY KEY identity
, UserId int NOT NULL FOREIGN KEY REFERENCES authUser(Id)
, AsOf datetime NOT NULL
, Until datetime NOT NULL DEFAULT dbo.MaxDateTime()
, Detail text
, constraint nk_authUserActive UNIQUE (UserId, AsOf)
, constraint ch_authUserActive_AsOfUntil CHECK (AsOf < Until) 
);
 
CREATE TABLE 
authUserRole
( Id int PRIMARY KEY identity
, UserId int NOT NULL FOREIGN KEY REFERENCES authUser(Id)
, RoleId int NOT NULL FOREIGN KEY REFERENCES authRole(Id)
, AsOf datetime NOT NULL
, Until datetime NOT NULL DEFAULT dbo.MaxDateTime()
, Detail text
, constraint nk_authUserRole UNIQUE (UserId, RoleId, AsOf)
, constraint ch_authUserRole_AsOfUntil CHECK (AsOf < Until) 
);
No Comments Posted in Microsoft SQL Server, My Sample DB
C# With Block

So, one of my big “beefs” with C# is the lack of support for “with” blocks. If you just want the solution to my complaints, scroll down most of the way through the article.

I come from a C/C++ background, but spent several years in the VB6 world, and came to the conclusion that with blocks are pretty handy. They do an EXCELLENT job of self-documenting the fact that you intend to do several things to a single object. They are used most frequently (for me at least) as a way of moving values from one object to another… like this:

With cmdOK
  .Top = cmdCancel.Top
  .Left = cmdCancel.Left + cmdCancel.Width + 90
End With

I wish it had support for aliasing, and using more than one with block variable at a time because I tend to use it to work with 2 objects at once… I envision something like this…

With cmdOK as btn1
  With cmdCancel as btn2
    btn1.Top = btn2.Top
    btn1.Left = btn2.Left + btn2.Width + 90
  End With
End With

You might be asking: what do you gain other than a handful of keystokes? I mean, technically, the With blocks require more keystrokes than the plain old code required in the first place.

Good point. Except the alias thing does something important. It makes it easy to copy and paste the code to do something similar someplace else, all you change is the aliases at the top, and you don’t have to worry about whether or not you found all the places where you used that variable.

It also makes it so that you can de-reference a property at the end of a long dot-chain, and give it a name that goes out of scope when you are done. For instance…

With MyApp.frmMain.cmdOK as btn1
  With MyApp.frmMain.cmdCancel as btn2
    btn1.Top = btn2.Top
    btn1.Left = btn2.Left + btn2.Width + 90
  End With
End With

That saves a temporary reference to the buttons rather than calling down the object tree with each iteration. I have often thougth that it would be convenient if you could have one with block with two aliases, like this…

With MyApp.frmMain.cmdOK as btn1, MyApp.frmMain.cmdCancel as btn2
  btn1.Top = btn2.Top
  btn1.Left = btn2.Left + btn2.Width + 90
End With

But that is (was) VB and now life is about C#. Except C# doesn’t support With Blocks - because I guess With Blocks are for wimpy programming languages.

So, I decided, wouldn’t it be interesting to create a with block construct?
Here is the best I can do:

first, you need an extension class

public static class Extender
{
    public delegate void Proc<T>(T arg);
 
    public static void With<T>(this T blockVariable, Proc<T> block)
    { block.Invoke(blockVariable); }
}

Then, you can use it like this:

class Program
{
    static void Main(string[] args)
    {
          //using System.Xml.Linq
          XElement element1 = new XElement("hello");
          XElement element2 = new XElement("world");
 
          element1.With(e1 => 
          {
              e1.Add(element2); 
          });
     }
}

You could even nest two or more of these with blocks…

        element2.With(e2 =>
        {
            element1.With(e1 =>
            {
                  e1.Add(e2);
            });       
        });

It ain’t great, but it isn’t half bad. And, it does fulfill all of the needs I have for a with bock. It is self documenting in the sense that is says: “I intend to do several things with this object.” It also lets you assign a temporary name to a variable without creating an orphan variable that isn’t needed elsewhere in your function. And, it could be copied and pasted to do something very similar elsewhere with minimal modification (only modify the top, and the parts that need to be different in the new case).

The truth of the matter is, with extension methods and Lambda, C# really is starting to mature and get to a point where it has much of what makes Ruby and Python compelling, and, at the same time, it enjoys compile time type safety, and a powerful development environment. I dig it.

Or, you could do this (not quiet as self documenting, and a little odd, but it would work)

(new XElement("hello")).With(e1 => 
{
    e1.Add("more stuf");
    doSomethingWith(e1); 
});

An object that falls out of scope immediately after the little block that uses it. This sort of move makes more sense back in vb6 when the framework actually deleted objects when they were no longer referenced, rather than let garbage collection do it later on. It would also be a bit more readable in a VB6 style With block. But the fact that it would work exactly as expected means that my C# with block is a genuine with block structure.

What do you think it will take to get the guys down at Microsoft to add With Block support to the core language? - Yeah, me neither.

No Comments Posted in LINQ
Why Doesn’t LINQ Perform?

I’m not talking about speed / performance, LINQ is plenty fast for my needs. I’m wondering why linq can’t perform any ACTIONs, why it can’t DO anything.

Yes, I understand, its called “Language Integrated Query” because it is meant to be used as a way to ASK questions, not PERFORM operations.

That is all fine and dandy, but, despite its name, LINQ is actually a set manipulation language more than anything else. It is a language for working with entire sets of things in a uniform way.

Let’s get serious here, in 90% of cases, immediately following a LINQ query, we see a foreach statement that iterates over the items. Often times, this loop has only a single action inside of it. For instance:

var itemsThatQualify = 
	from i in itemList 
	where i.hasSomeQuality()
	select i;
 
foreach(var item in itemsThatQualify)
{
	Item.doSomething();
}

Its getting to the point where I am getting tempted to perform the Query inside of the foreach statement instead of before it. But I think the result is less readable, what do you think? …

foreach(var item in (
	from i in itemList 
	where i.hasSomeQuality()
	select i))
{
	item.doSomething();
}

The best thing I can say about that is that it DOESN’T look like english at all. In fact, the lambda version of this statement is actually more readable because it is less wordy…

foreach(var item in itemList.Where(i => i.hasSomeQuality()))
{
	item.doSomething();
}

It ain’t great, but its a bit better than the last thing. But the first one is still more self-documenting.

I’m getting to the point where I wish I could do this…

from i in itemList 
where i.hasSomeQuality()
perform i.doSomething();

I actually typed that out at work the other day and said “why did they NOT make that work?” That is elegant, and would save streamline tons of repetitive coding.

Then, the wheels got turning. LINQ is written using Templates and Extension methods. hmm. Both of those are available to me for building my own stuff. Why not write my own LINQ Extension called “Perform”… Here it is, complete with a Main function that shows it off.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace LinqPerform
{
    public static class LinqExtension
    {
        public delegate void Proc<T>(T arg);
 
        public static void Perform<TSource>(
            this IEnumerable<TSource> source,
            Proc<TSource> performer)
        { foreach (TSource item in source) performer.Invoke(item); }
    }
 
    class Program
    {
        static void Main(string[] args)
        {
            // something enumarable to work with
            var intList = new List<int> { 0, 1, 2, 3, 4, 5 };
 
            // DoSomething() with each element
            intList.Perform(i => Console.Out.WriteLine(i));
 
            // get a little tricker (evens only)
            Console.Out.WriteLine(); // a visual break
            intList.Where(i => i % 2 == 0)
                .Perform(i => Console.Out.WriteLine(i));
 
            // get a little wordier (with odds only)
            Console.Out.WriteLine(); // a visual break
            (from i in intList
             where i % 2 == 1 //odds only
             select i
            ).Perform(i => Console.Out.WriteLine(i));
 
            // do a little more, VERY little more
            Console.Out.WriteLine(); // a visual break
            intList.Where(i => i % 2 == 0)
                .Perform(i =>
                {
                    Console.Out.Write(i.ToString() + " ");
                    Console.Out.Write((i+1).ToString() + " ");
                    Console.Out.WriteLine();
                });
 
            // wait for a return key
            Console.In.ReadLine();
        }
    }
}

It Worked!!! I cant believe it.
Now if I could only get native queryable support…

from i in intList
where i % 2 == 1
perform Console.Out.WriteLine(i);

would look so much better than…

(from i in intList
 where i % 2 == 1 //odds only
 select i
).Perform(i => Console.Out.WriteLine(i));

But I guess I can make due.

No Comments Posted in LINQ
HOW TO: Print-off a SQL Server 2008 Database Schema

Overview

This little utility can read either a Microsoft SQL Server 2008 database schema, or a Microsoft Access database schema, and generate an HTML file documenting the structure of the tables, along with some statistical information about some of the columns. Here is a sample of what the output looks like. The entire C# project is available as a zip file containing both source code and executable binaries. I have made it available under a BSD style license, please respect my copyright.

General Use

To use the utility, simply open an *.mdf or *.mdb file with the program. You can do that by specifying a path to the database from the command line, or by just running the program and using the open file dialog it presents to select a database file. Then, you will be prompted about where to save the resulting HTML output. Make sure you save it in a folder with the provided dbSchema.css file or it will be pretty bland. The CSS file provided will italicize the primary key(s), bold the required fields, hide the min / max columns, and put page breaks between database tables. I have made an effort to provide plenty of css classes to the HTML so the look and feel of the resulting document should be entirely customizable. Have fun!

Disclaimers

Of course, you use this utility at your own risk, yadda yadda yadda. But let me make one actual warning about this program: it calculates min/max/count statistics on every column of every table, so don’t run it against a LARGE, active database. If you want to crunch on a big production database, extract a backup copy to a development server and run the numbers against that — don’t say I didn’t warn you. It took a couple of minutes of number crunching to spit out info for the database I updated the program for, and it has a hundred or so tables of various sizes a handful of which have a million or so records in them. That should give you a ballpark figure about how much load is put on the server when it milks your data into a nice little report.

something about how to connect it to SQL Server 2008.

Discussion

I recently pulled out an old program that I wrote in VB6 for printing off an access database table structure just to see how much my coding style has changed over the last 6 or 7 years. I decided to rewrite the whole thing using C# on the .Net 3.5 framework using the OleDB data provider, and then abstracted it a bit and added a second class that can pull from a SQL 2008 express database. It uses SQL Server EXPRESS but with a little bit of tweaking of the code (especially the connection string) it could easily pull from a true Microsoft SQL 2008 Server, and probably a 2005 server.

The first thing I noticed while rewriting the program was the shift in focus. In the old version, the real “work” was spent interacting with the printer – getting the fonts right, the bolding to make it easier to read, the page breaks, columns, avoiding table / field orphaning. The newer version completely avoided that whole ball of wax. Instead, I opted to output a simple html file and use CSS to get the page breaks and fonts looking exactly the way I like them. Focusing instead on just tackling the problem using elegant code, it was actually a fun project rather than just tedious work.

I found that I made extensive use of LINQ, which seemed odd when I thought about it because I couldn’t use use LINQ to SQL (what most developers usually use LINQ for) because that would require that I knew the DB schema before I wanted to print it off, and that sort of destroys the purpose of writing a general purpose database schema printing program.

There is no “LINQ to OleDB” or “LINQ to ODBC” and there certainly isn’t a “LINQ to MySQL” or “LINQ to Oracle.” And, even though there is a LINQ to SQL, and I am technically trying to print off the schema of a SQL Server database, LINQ to SQL still isn’t an option because LINQ to SQL is really more of a “LINK” to a specific database schema, NOT a general LINK to any unknown SQL database, and certainly not as a way to discover the database layout at runtime.

At its very core, LINQ isn’t about database queries, it is really a lot more extensive than that. LINQ is a language for interacting with SETS of OBJECTS. I made use of Linq to DataSets, Linq to XML, and plain old LINQ to Objects. This particular project was a lot of fun, because it reminded me just how much I love LINQ, and not just as a substitute for SQL. I especially like the new XElement XML functions that came out with to Linq to XML they are quite a treat to work with (especially if you remember the old school XML document API).

The entire XML document that loops through all the tables, and all the fields, along with columns about each field is generated from a single statement using no loops whatsoever. Its really cool. Instead of using loops, it uses the XElement constructor that can handle a set of child elements as one of its parameters, combined with a LINQ statement to generate that set of elements in a single statement. Here is the statement that spits out the html file:

output.WriteLine(
  new XElement("html",
    new XElement("head",
      new XElement("title", "Database Schema"),
      new XElement("link",
        new XAttribute("rel", "stylesheet"),
        new XAttribute("type", "text/css"),
        new XAttribute("href", "dbSchema.css"),
        new XAttribute("media", "all"))),
    new XElement("body",
      new XElement("div", new XAttribute("id", "container"),
        from TableInfo table in reader.getSchema() 
        select 
        new XElement("div", new XAttribute("class", "dbtable"),
          new XElement("h1", table.TableName, new XAttribute("class", "tablename")),
          String.IsNullOrEmpty(table.Description) ? null :
          new XElement("h2", table.Description, new XAttribute("class", "tabledesc")),
          new XElement("table", new XAttribute("class", "tablecolumns"),
            new XElement("tr",
              new XElement("th", "Column Name", new XAttribute("class", "columnname")),
              new XElement("th", "Type", new XAttribute("class", "datatype")),
              new XElement("th", "Reference", new XAttribute("class", "reference")),
              new XElement("th", "Min", new XAttribute("class", "range")),
              new XElement("th", "Max", new XAttribute("class", "range")),
              new XElement("th", "Count", new XAttribute("class", "count")),
              new XElement("th", "Nulls", new XAttribute("class", "nulls")),
              new XElement("th", "Description", new XAttribute("class", "description"))),
            from col in table.Columns
            let isPk = table.PrimaryKeyFields.Contains(col.ColumnName)
            select
            new XElement("tr", new XAttribute("class", "column" 
                  + (col.AllowNull ? " allownull" : " notnull") 
                  + (isPk ? " primarykey" : "")
                  + (string.IsNullOrEmpty(col.ReferenceTo) ? "" : " foreignkey")),
              new XElement("td", col.ColumnName,
                new XAttribute("class", "columnname" )),
              new XElement("td", new XAttribute("class", "datatype"),
                col.DataType,
                col.MaxLength <= 0 || col.MaxLength > 16*1024 ? "" : "(" + col.MaxLength.ToString() + ")"),
              new XElement("td", nbsp(col.ReferenceTo), new XAttribute("class", "reference")),
              new XElement("td", nbsp(col.Stats.Min), new XAttribute("class", "range")),
              new XElement("td", nbsp(col.Stats.Max), new XAttribute("class", "range")),
              new XElement("td", nbsp(col.Stats.Count), new XAttribute("class", "count")),
              new XElement("td", nbsp(col.Stats.NullCount), new XAttribute("class", "nulls")),
              new XElement("td", nbsp(col.Description), new XAttribute("class", "description"))
            )
          )
        )
      )
    )
  ).ToString()
);

Yes, it does a lot in a single statement, but it isn’t hard to read. It “flows” in the same basic layout as an html document, and it reads more or less like a bunch of single lines – one for each element. But in truth, it is a single statement that creates an entire, complex html file in a single call to the constructor of the root element. The entire XML tree is created in its entirety and written out to the file without ever being assigned into a variable — without any element of the HTML assigned to a variable before being added to the tree.

Summary

Sometimes I need a utility that lets me just print off a database schema in a readable format to set it on the desk next to me while I develop against the database and use it as the authoritative reference. This is the utility I developed for that purpose. Let me know if you’d like to see it updated to allow printing the table structures from another database engine.

No Comments Posted in LINQ, Microsoft SQL Server, XML
My Email Schema in SQL Server 2008

Here is the SQL Server script for creating my simple Email Database for the samples I will be blogging about on this site. The schema is documented here. And, sooner or later, I’ll post SQL scripts for other languages as well.

Dont forget to check out the Pro’s and Con’s for using SQL Server to host my sample email database down at the bottom of this blog entry

-- This script requires SQL Server 2008
CREATE DATABASE Email;
USE Email;
 
-- We store messages in folders
CREATE TABLE Folder
  ( Id        INT PRIMARY KEY IDENTITY
  , ParentId      INT FOREIGN KEY REFERENCES Folder(Id)
  , DisplayName    VARCHAR(64) NOT NULL
  , Description    TEXT
  , CONSTRAINT nk_Folder UNIQUE (ParentId, DisplayName)
  );
 
-- an address is a unique emaill address
CREATE TABLE Address
  ( Id        INT PRIMARY KEY IDENTITY
  , FriendlyName    VARCHAR(128)
  , UserName      VARCHAR(64) NOT NULL  -- always store in lower case
  , DomainName    VARCHAR(128) NOT NULL -- always store in lower case
  , DisplayName    AS
        CASE WHEN FriendlyName IS NULL THEN 
              UserName + '@' + DomainName
        ELSE 
              FriendlyName + ' <' + UserName + '@' + DomainName + '>'
        END
  , CONSTRAINT nk_Address UNIQUE (UserName, DomainName)
  );
 
-- Each message represents a single email
CREATE TABLE Message  
  ( Id        INT PRIMARY KEY IDENTITY
  , FolderId      INT FOREIGN KEY REFERENCES Folder(Id)  NOT NULL
  , SenderId       INT FOREIGN KEY REFERENCES Address(Id) NOT NULL
  , SentOn      DATETIME  NOT NULL
  , Subject      VARCHAR(1024)  NOT NULL
  , EmailHeader    TEXT
  , EmailBody      TEXT
  , IsUnread      BIT    NOT NULL  DEFAULT 1
  , IsJunk      BIT    NOT NULL  DEFAULT 0
  , IsStar      BIT    NOT NULL  DEFAULT 0
  , Priority      TINYINT NOT NULL DEFAULT 3 
        CHECK (Priority IN (1,2,3,4,5))
  );
 
-- One entry for each header (to/cc/bcc) on a message
CREATE TABLE MessageAddress
  ( MessageId      INT FOREIGN KEY REFERENCES Message(Id)  NOT NULL
  , HeaderType    CHAR(4)  NOT NULL
        CHECK (HeaderType IN ('to', 'from', 'cc', 'bcc', 'r-to'))
  , AddressId      INT FOREIGN KEY REFERENCES Address(Id)  NOT NULL
  , CONSTRAINT pk_MessageAddress 
        PRIMARY KEY (MessageId, HeaderType, AddressId)
  );
 
-- An attachment is a single binary file    
CREATE TABLE Attachment
  ( Id        INT PRIMARY KEY IDENTITY
  , FileName      VARCHAR(512)  NOT NULL
  , Extension      VARCHAR(64)  NOT NULL
  , Content      VARBINARY(MAX) NOT NULL
  , FileSize      BIGINT  NOT NULL
  , Hash        CHAR(32)  NOT NULL
  , CONSTRAINT nk_Attachment UNIQUE (Hash, FileSize)
  );
 
-- Join Table
CREATE TABLE MessageAttachment
  ( MessageId      INT  NOT NULL
        FOREIGN KEY REFERENCES Message(Id)
  , AttachmentId    INT  NOT NULL
        FOREIGN KEY REFERENCES Attachment(Id)
  , CONSTRAINT pk_MessageAttachment 
        PRIMARY KEY (MessageId, AttachmentId)
  );
 
 
 
DECLARE @RootId INT;
SET @RootId = 1;
SET IDENTITY_INSERT Folder ON;
INSERT INTO Folder (Id, DisplayName)
VALUES (@RootId, '');
SET IDENTITY_INSERT Folder OFF;
 
-- some root level folders  
 
INSERT INTO 
Folder (ParentId, DisplayName)
VALUES (@RootId, 'Inbox')
  , (@RootId, 'Outbox')
  , (@RootId, 'Sent')
  , (@RootId, 'Trash')
  , (@RootId, 'Drafts')
  ;
 
DECLARE @InboxId INT;
SET @InboxId = (
  SELECT Id 
  FROM Folder 
  WHERE ParentId = @RootId 
  AND DisplayName = 'Inbox'
  );
 
-- a few sub folders
INSERT INTO 
Folder (DisplayName, ParentId)
VALUES ('Home', @InboxId)
  , ('School', @InboxId)
  , ('Work', @InboxId)
  ;
 
-- I guess we should a couple of addresses
INSERT INTO 
Address (FriendlyName, UserName, DomainName)
VALUES ('Brent Larsen', 'brentoboy', 'gmail.com')
  , (NULL, 'jimbob', 'someplace.com')
  ;


Pro’s and Con’s

Of using SQL Server for this database

The only reason I can think of to use SQL Server to host an email database is if you wanted to have your entire office share a single database for their email. If you have a Windows Server running a decent copy of SQL Server, chances are you already have Exchange installed. So, it would be pointless to do such a thing.

The only reason I can think of to write an email application is if you don’t like the available options (Outlook, Thunderbird) and want your own. I cant think of a compelling reason to do that, but if I were going to build my own email application I would probably be building it for a single person. So a smaller database engine might be better suited, such as SQLite or Access.

But, as this theoretical database is merely a sample created for database discussion, lets assume that there is a need for an email application built on top of such a database. What reasons might we have to build a SQL Server backend?

Firstly, if we are building the application that will use the database in C#, SQL Server would be an excellent choice as it enjoys its current place as the only database supported by LINQ (which is more than just hype, its good stuff). In most cases, from what I can see, SQL Server is the database of choice simply because .Net is the framework of choice, and it is an integrated part of the whole Microsoft Development experience. And, I’m not saying that maliciously either. Microsoft Visual Studio has been the best programming IDE since Visual Basic 5. Compare it to Eclipse or any other IDE and you will see the competition lacking in bringing you a unified, well designed productive development environment. So there you have it. The most compelling reason I can think of to select SQL Server for our email program is because Visual Studio is so dang productive.

No Comments Posted in Microsoft SQL Server, My Sample DB
Example Email Database Schema

My Simple Example Database Schema

I decided to draw up a simple database schema to start using for examples on the site. In today’s blog entry I will describe the different tables and their relationships, and, in the future, I will post up sql scripts for creating the tables in different sql dialects.

I selected an email database as my example schema because I wanted a schema that most people are familiar with so I can explain database concepts instead of spending too much time explaining the schema itself.

My table names are singular; I like it that way, if you want to comment on that, do it someplace where others can read about the root issue (this blog is about this schema, not about the singular plural debate) like this post.

I’ll post Sql Scripts for generating this database in various DMBS’s here:

The Schema

Folder

Folder


Id : Integer <<PK>> <<auto incr>>

ParentId : Integer <<FK-Folder>>

DisplayName : String(64) <<not null>>

Description : String

Folders are organized into a hierarchical tree based on the ParentId as the self referencing field. I wish I could make the ParentId field not null constrained, but the root folder has no parent. I might someday blog about ways to approach this problem.

Each email message will be stored in exactly one Folder. That relationship is defined in the Message table.

Address

Address


Id : Integer <<PK>> <<auto incr>>

FriendlyName : String(128)

UserName : String(64) <<not null>>

DomainName : String(128) <<not null>>

DisplayName : String <<read only>>

A row in the Address table represents a unique email address.

The FriendlyName is the actual person’s name or nickname usually found before the email address.

The UserName is the portion of the email address before the @ symbol, should always be stored as lower case.

The DomainName is the portion of the email address found after the @ symbol, should always be stored as lower case.

The DisplayName is a read only field calculated (or concatenated) from the other fields. I figure it will be so common to concatenate the above fields together into a readable email address, I created this field to do it for you.

The table should be constrained to not allow multiple records with exactly the same lowercase(username@domainname).

Message

Message


Id : Integer <<PK>> <<auto incr>>

FolderId : Integer <<FK-Folder>> <<not null>>

SenderId : Integer <<FK-Address>> <<not null>>

SentOn : DateTime <<read only>> <<not null>>

Subject : String(1024) <<not null>>

EmailBody : String <<not null>>

IsUnread : Boolean <<not null>> <<Default: True>>

IsJunk : Boolean <<not null>> <<Default: False>>

IsStar : Boolean <<not null>> <<Default: False>>

Priority : Integer <<Between 1 and 5>> <<Default: 3>>

A row in the Message table represents a single email message.

I think the model here is fairly self explanatory, I will however point out that by defining the FolderId and SenderId here, we limit ourselves so that a Message can only be stored in a single folder at a time, and can only be sent from a single email address. I don’t see either of these as a problem.

If, in the future, we decide we want a message to be stored in multiple folders, we could consider making virtual folders that have links to messages rather than attempt to keep track of a single message actually being in two folders.

Of course, an email header contains the sender along with the time it was sent, as well as other data that could be significant. So, why do we have fields for SenderId, SentOn and Subject? Mostly so we can sort and index based on those fields.

MessageAddress

MessageAddress


MessageId : Integer <<FK-Message>> <<not null>>

HeaderType : char(4) <<not null>> <<{to, from, cc, bcc, r-to}>>

AddressId : Integer <<FK-Address>> <<not null>>

This table is used to define a many to many relationship between Messages and Address. This sort of table is necessary because a single message could be sent to multiple “to” addresses as well as multiple “cc” addresses. So, each link has a header type so we can know the nature of each relationship between an email and its related addresses.

Even though an email header can contain tons of somewhat useless information, there are a few pieces of the header that are of interest from a data seaching standpoint. I had considered making an associative array style table that could store all the header fields from every message, but decided against it. But, the From, To, CC, BCC fields are of interest and might want to be queried against some day, so I made this table to allow many email addresses to be linked with a single email based on their relationship as defined in the header.

Attachment

Attachment


Id : Integer <<PK>> <<auto incr>>

FileName : String(512) <<not null>>

Extension : String(64) <<not null>>

Content : LargeBinary <<not null>>

FileSize : LongInteger <<not null>>

Hash : String(32) <<not null>>

A row in the Attachment table represents a single file attached to an email.

I would have included a foreign key field to link an attachment to the email to which it is attached, but attachments can be rather large, and it is somewhat common for the same attachment to be attached to multiple emails. So I decided to create the Hash and Size fields as a way of detecting whether or not a file is an exact duplicate of an existing file.

The take home message here is that the relationship between Messages and Attachments is many-to-many meaning one message can have multiple attachments, and one attachment can belong to multiple message. In order to keep track of a many to many relationship, we need a separate table where one record represents a single link between the two tables. (See the MessageAttachment table defined below).

The Hash field will be an MD5 checksum generated from the Content of the file.

MessageAttachment

MessageAttachment


MessageId : Integer <<FK-Message>> <<not null>>

AttachmentId : Integer <<FK-Attachment>> <<not null>>

This table is a simple many to many relationship table.

No Comments Posted in My Sample DB
Initial thoughts on ASP.NET MVC

This blog entry is entirely abstract, no code whatsoever.  And it contains opinion.  If opinion offends you, or if you only want functional code to play with, this entry will probably just frustrate you. That said, my topic of discussion is the new MVC option for ASP.NET.

The Model View Controller Framework: Brought to You by Microsoft

Well, Microsoft has an official Model / View / Controller add-on available for ASP.NET. This might not be news to you. And, I venture to say that most people who use it will be coming from a previous iteration of ASP.NET and not really know much of Ruby on Rails, or Django (the open source MVC architectures). I went ahead and took the plunge and did the test drive of ASP.NET MVC just to see what I thought, and here are my initial feelings.

My first thought is that the Microsoft MVC is much more like Rails than it is like Django. The URL is parsed for you into /Controller/Action/id, of course, you could override it (in true rails fashion) the default is built for you (unlike Django). It also has controller classes (like Rails) which have functions for each Action.

What ASP.NET MVC lacks is the Rails “Magic”. Honestly I am not a fan of Rails Magic, I prefer Django’s philosophy that magic should not be what makes everything work. The magic I am talking about actually runs deeper than you might think. Rails and Django are written in so called “scripting languages”. In scripting languages, objects are very fluid, objects can define themselves at run time rather than having strict classes built at compile time. The effect that this has is two-fold. (1) a lot of rails magic isn’t even possible, such as defining database relationships in the model using odd functions in the class definition such as HasAndBelongsToMany, or creating functions that are called before the actions in a class (actions seem to be called directly by the framework). Also, you must explicitly return a View object in an action, even if you just want the view by the same name as the action to be returned. (2) You get early binding instead of late binding. This means you get compile time warnings about calling functions or accessing data members that don’t exist. It also means you get compile time decisions about which overloaded functions to call as opposed to run time object evaluation. This means it is capable of being much faster.

In Rails and Django, it’s the base classes provided by the model that make application development so powerful. The ActiveRecord and Django.Model classes are the beginning of development for an MVC app in either of those two languages. In Django, I usually start building an app by pulling up models.py and defining my data. It gives me a single file that lets me define all the fields and relationships in my data model. The framework uses that to produce the actual tables and SQL necessary to finish the task. When you are done, the “model” layer includes all the data related stuff, and will also contain the validation information that you want controlled at the database level. In Django, the model is the starting point. When you want to add features, you start with the model and work out from there.

Rails is another animal altogether. You start with an update file, you create or modify each table by creating an update function (and corresponding downgrade function). These functions use ActiveRecord to keep things database neutral, but the end result is that the database is defined by the result of the update functions, and not really defined by classes with properties representing the database fields. The actual classes that represent records in the database are generated magically at runtime (hurray for scripting languages) and the files that define the classes only really contain information about relationships between different tables and other code that cant be generated on the fly by accepting the default implementation for everything. To be totally honest, I hate this method of doing things. It makes it terribly difficult to just print off something that lets you see your database structure and pick it all apart. The things that define your model are divided up into hundreds of little files, none of which tell enough of the story to be truly helpful, but all of which contain enough that they can’t be ignored. The only thing easy about the way Rails defines its model is the upgrade (and downgrade) path for your schema is significantly easier to walk than the upgrade path in Django (or ASP.NET MVC).

Microsoft’s way is a bit more like Django, with a Microsoft centered twist. Instead of defining the data classes in code and generating the DB creation scripts, you define the database in whatever database engine you intend to use, and then generate the classes to match using LINQ. This approach is the way .NET has always treated databases, and has the effect of making it hard(er) to switch to a new DB engine. You can see the repercussions of this: SQL Server Express ships free with Visual Studio Express (also free), and by default, people will use SQL Server. Once you generate your model classes and start building on top of them … your hooked. To be totally honest though, who cares.

Everyone seems to agree that model/view/controller is a decent design pattern and makes for easier web application development. But, no one seems to agree on what belongs in each section of the app. For instance, in Rails, the “Model” is largely generated by the database, and has nothing but database interaction code in it. The View files are all used to generate a page viewable in the browser. All the extra fluff (like business logic) is in the Controller layer. In Django, the Model defines the data, and contains very little more than table and field names. It could be used to enforce business logic, or not – you decide, its your app. The controller is really the Django framework itself, along with a few things you define, which are usually settings, not code. So, the View contains all the page handling logic, as well as forms validation etc.

So how does Microsoft split things up? The controller layer is for web page routing and forms validation (similar to Rails). The View layer contains aspx pages (again similar to rails), and the Model contains EVERYTHING else (business logic, LINQ classes, or some other database interaction code of your choice). The tutorial I went through even said that if the controller classes have action functions that start getting to be more than a few lines long you should consider moving that logic into a function that is stored in the model layer somewhere.

The model is the fundamental difference between ASP.NET MVC and previously existing MVC web frameworks. For Rails and Django, the model is what makes each one distinct and powerful. The model gets to focus of the development attention. The elegant design of the model is what is so darned attractive. In ASP.NET the “model” is left for you to define. Now, just because it is left very open ended, that doesn’t mean it is really a cop out. In fact, it seems that the addition of LINQ to the .NET framework is intended to be the model language for ASP.NET MVC. So, it isn’t like they gave you nothing to work with, it is just a much fuzzier line than what I expected coming from a Rails/Django background.

The other fundamental difference I see is the complete lack of scaffolding. There are some scaffold generators out there, and there seems to be a plug in or something in the works, but honestly, day one in Rails you learn to use scaffolding – it’s the first big “wow, that rocks” experience you get when developing in Rails. With Django, the built in admin pages have the same effect (if not significantly more so). Lack of scaffolding left my tour of the Microsoft MVC feeling a bit under-whelmed.

The other thing worth noting is that you loose a lot of what makes ASP.NET so productive.  In ASP.NET, the page is central.  All the logic for a page is contained on that page.  You have instant Ajax without even knowing it by using an <asp:textbox> tag with scripts that run on the server.  All the magic is generated for you to pass things back and forth between the logic defined in the page and actions on the client end.   This can’t be magically done for you when the processing center of the page is now a controller class that uses pages like functions instead of fully fledged independant applications.  You have to handle your post-backs manually.  Not that difficult, but it is a step backward (at least in that particular direction) from doing things the normal ASP.NET way.  My hope is that ASP.NET MVC will get to be so popular, and enough people will harp on this shortcomming that it will be fixed — or a suitable alternative for forms processing will be provided.  Currently I dont see a conveneint “form” base class that lets me generate and validate forms the way Django does, and I dont see the built in Ajax functions that makes Rails easy to use dispite the fact that you have to handle it all in the controller.   Forms processing is sort of underbaked.

This is a preliminary judgment of the new framework, and overall, I am excited to start using it. It seems good. As a semi-compiled language with early-binding, it has potential to be much faster than rails (time will tell). It does have the effect of organizing the code so that designers can work on the view, application programmers can work in the controller and systems architects can focus on the model without getting in each other’s way. I’m reasonably impressed. I think it’s a step forward. I think LINQ is better than ActiveRecord, but I doubt it will be as nice as Django Models. I guess only time will tell.

The other thing I want to see is whether or not the fanboys who author the books published by Microsoft Press will completely ignore the fact that MVC was first popularized by Ruby on Rails, and the ASP.NET MVC framework is obviously designed using Rails as the primary reference for how to do it right. Will Microsoft’s propaganda act as though they invented the idea, like everything else that comes out of Microsoft?

More importantly, I wonder if, like other things, Microsoft will learn from the other guys, and then go on to make something significantly more usable. Maybe in a couple of more major releases of the .NET framework, will ASP.NET MVC be truly superior? I bet it will.

No Comments Posted in ADO.NET, Microsoft SQL Server, Python Programming Language, Sqlite
Tagged , , , , ,
Using XSLT to transform XML into HTML

It is getting to be more and more common for web services to product XML data as opposed to serving up fully fledged HTML. Client side applications are then responsible to take the XML data they requested and rewrap it into whatever formatting they want. XSLT is a sort of style sheet for taking a segment of XML data and converting it to another format. It is similar to CSS in several ways, but much more complex (and therefore much more powerful – and sometimes more annoying).

When I first bumped into XSLT I was immediately interested in its capacity to allow someone to enhance and simplify HTML. In much the same way that HTML got more readable when web developers started using CSS, it can be simplified even further by using XSLT. For instance, many CSS based design patterns require the creation of multiple unnecessary DIVs in order to generate a flexible button background. It is not uncommon to see the following instead of the old school way of flexible buttons via tables and corner images.

<div class=”button”><div><div><div><div><div><div>
	Click me! 
</div></div></div></div></div></div></div>

Then, the CSS for the button class has background images assigned based on descendant DIVs and you create a button without including a bunch of images in the HTML. This is an improvement over traditional HTML/table buttons, but it has its drawbacks. Consider the fact that, in order for this button to be rendered correctly, you would need to remember to put exactly 6 child DIVs in your HTML.

If your designer wanted to spruce things up a bit, and add an extra image for a cool corner enhancer thing, you’d have to go find all the DIVs where class=”button” and then wrap the inner text in a SPAN or some equally mundane chore. Wouldn’t it be easier to define a new element altogether that just works (let’s call it COOL-BUTTON), and any changes to the way the cool button needs to be rendered in “real” html would just take care of themselves? What you need is a button “template” that gets applied to all your web pages when you use a COOL-BUTTON tag.

The COOL-BUTTON could be used like this:

<cool-button>Click me!</cool-button>

And somewhere you would have a template file somewhere that says:

<cool-button> means
<div class=”button”><div><div><div><div><div><div>
	(contents here)
 </div></div></div></div></div></div></div>

That is more or less what XSLT was made to do. Now, that I have talked it up, let me say that XML Style Sheets are not an oasis for web developers. They are actually kind of ugly, but powerful enough that it is worth knowing what they are and what they do so you can choose to use them when the time is right.

I decided to invent my own outline XML language that allows me to put together a simple outline, and then write an XSL file to convert the outline XML into viewable HTML using ordered lists. The outline I built looks like this.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="outline.xsl"?>
<outline>
  <li>First Heading</li>
  <sub>
      <li>sub heading</li>
	  <sub>
	    <li>point 1</li>
	    <li>point 2</li>
	  </sub>
 
  	  <li>another sub heading</li>
	  <sub>
	    <li>point 1</li>
	    <li>point 2</li>		
	  </sub>
 
	  <li>sub heading</li>
	  <sub>
	    <li>First Example</li>
	    <li>Second Example</li>
	    <li>Third Example</li>
	  </sub>
  </sub>
 
  <li>Second Heading</li>
  <sub>
    <li>more</li>
	<li>even more</li>
  </sub>
 
  <li>Third Heading</li>
</outline>
</xml>

Remember, this is XML not XHTML, so things might look a bit different. The first line is a DTD (document type declaration). Just saying “this is XML.” The second line says “format this XML using the outline.xsl style sheet” (outline.xsl is listed below).

The outline xml I have defined here uses tags to wrap the whole page similar to the way tags wrap an entire html document. Inside the outline, you either have items (<li> elements) or sub sections (<sub> elements). Sub sections contain more items and subsections. You (the xml author) don’t have to worry about what kind of number system to use at each level of the outline, that is provided for you by the eXtensible style sheet. Nor do you have to worry about whether or not your designer uses tables or lists to accomplish displaying the elements (or something different altogether).

In order for a browser to display this outline, it has to be converted to valid html. This is done by the browser – assuming it can find the style sheet. Here is the stylesheet I wrote to handle this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="1.0" />
<xsl:output method="html" media-type="text/html" 
    indent="yes" />
 
<xsl:template match="/">
<html>
<head>
	<title>outline</title>
	<style type="text/css">
		ol { list-style-type: upper-roman; }
		ol ol { list-style-type: upper-alpha; }
		ol ol ol { list-style-type: decimal; }
		ol ol ol ol { list-style-type: lower-alpha; }
		ol ol ol ol ol { list-style-type: lower-roman; }
	</style>
</head>
<body>
<ol>
	<xsl:apply-templates />
</ol>
</body>
</html>
</xsl:template>
 
<xsl:template match="/outline/li">
	<li><h1><xsl:apply-templates /></h1></li>
</xsl:template>
 
<xsl:template match="sub">
	<ol>
		<xsl:apply-templates />
	</ol>
</xsl:template>
 
<xsl:template match="/outline/sub/li">
	<li><h2><xsl:apply-templates /></h2></li>
</xsl:template>
 
<xsl:template match="/outline/sub/sub/li">
	<li><h3><xsl:apply-templates /></h3></li>
</xsl:template>
 
<xsl:template match="/outline/sub/sub/sub/li">
	<li><h4><xsl:apply-templates /></h4></li>
</xsl:template>
 
</xsl:stylesheet>

The first thing you will probably notice is that this is not at all like CSS. This stylesheet language is valid XML. Personally, I think this is really gross. XML is a wonderful document markup language, and a beautiful language for defining hierarchical data. It is downright ugly for program control and looping, but someone thought XML would be a good language to write style sheets in, and here we are.

The first two lines define the document type (basic DTD type stuff). The third line declares what sort of format our output is intended to be interpreted as (You have to say “this is HTML” or the browser can assume it isn’t). The “indent = yes” is a convenient way to tell the output stream to enforce proper indentation of sub elements — pretty cool.

Lines 5 to 23 define one xsl:template element. The “match=’/’ “attribute says that the root node of the document (in this case ) should be reformatted to look like an html page with a few styles for our ordered lists. Line 19 tells it to take the stuff inside the root node (the body of the xml document) and apply templates to it, and insert it here.

Lines 25 to 27 define another xsl:template element saying all first level li elements should be wrapped in H1 tags.

Lines 29 to 33 say that any sub (no matter what level it is at) should just wrap things in an ordered list element.

Lines 35 to 46 say that li elements that go deeper into the tree should be wrapped in smaller and smaller headers.

You can view the results here
Now, view the source for the resulting page. Notice it isn’t the HTML, it is the outline XML. Which means that the way I made it work isn’t even visible to the end user (unless they can read XSL files).

So, what is this doing in a blog about databases? I’m glad you asked. Some people believe that XML is the database language of the future. For instance, the QuickBooks SDK passes data back and forth via XML. You could take a customer element directly from a QB request and use XSLT to make it viewable in a browser in whatever HTML format you want it to have. If nothing else, XSLT is an interesting language.

XSLT could ultimately be used by someone to make a unified user interface markup language. You could define a markup language very similar to HTML that is more geared toward user interface programming rather that document formatting. Then, user interface designers would use the new language to design user interfaces, which could then be made viewable in a browser via one XSL style sheet, and viewable in an application framework designed to build forms based on the XML layout. It really is an interesting language, it opens the doors to all kinds of possibilities.

No Comments Posted in XML
Tagged , , , , , , ,
Oracle PL/SQL Associative Arrays Example

I used this example in a presentation the other day, so I figured I’d post it up for anyone who is interested. This uses features that are new to Oracle 11g (such as Indexing by a VARCHAR2 key).

Sample Code showing use of Associative Arrays

DECLARE
-- create an associative array data type
    TYPE account_map 
      IS TABLE OF VARCHAR2(40 CHAR)
      INDEX BY VARCHAR2(5 CHAR);
 
-- the associative array we will be working with
    asset_accounts account_map; --no need to initialize
 
-- create a basic VARRAY with the keys we will be using
    TYPE account_keys IS varray(4) OF VARCHAR2(5 CHAR);
    asset_account_keys account_keys := account_keys('CASH','AR','INV','PPE');
 
    current_key VARCHAR2(5 CHAR);
BEGIN
--  add some values, notice we don't need to extend the associative array
    asset_accounts('CASH') := 'Cash or cash equivalents';
    asset_accounts('AR') := 'Accounts receivable';
    asset_accounts('INV') := 'Inventory';
    asset_accounts('PPE') := 'Property, plant and equipment';
 
-- If we know the keys, we can display the values:
    DBMS_OUTPUT.put_line('--- by direct value ---');
    DBMS_OUTPUT.put_line('CASH: ' || asset_accounts('CASH'));
    DBMS_OUTPUT.put_line('AR: '  || asset_accounts('AR'));
    DBMS_OUTPUT.put_line('INV: '  || asset_accounts('INV'));
    DBMS_OUTPUT.put_line('PPE: '  || asset_accounts('PPE'));
 
-- if we have a nice varray or nested table that knows the keys, we can use it
    DBMS_OUTPUT.put_line('--- by table of lookup keys ---');
    FOR i IN 1..asset_account_keys.COUNT LOOP
        DBMS_OUTPUT.put(asset_account_keys(i) || ': ');
        DBMS_OUTPUT.put_line(asset_accounts(asset_account_keys(i)));
    END LOOP;
 
-- if we don't know what is in there, we can discover the keys.
    DBMS_OUTPUT.put_line('--- by first / next discovery ---');
    current_key := asset_accounts.FIRST;  -- 'AR' is alphabetically first 
 
    -- we loop until we run out of keys
    WHILE NOT current_key IS NULL LOOP
        -- display the element for the current key
        DBMS_OUTPUT.put(current_key || ': ');
        DBMS_OUTPUT.put_line(asset_accounts(current_key));
 
        -- get the next key
        current_key := asset_accounts.next(current_key);
    END LOOP;
 
-- we can read them backwards too    
    DBMS_OUTPUT.put_line('--- by last / prior discovery ---');
    current_key := asset_accounts.LAST;  -- 'PPE' is alphabetically last
 
    -- we loop until we run out of keys
    WHILE NOT current_key IS NULL LOOP
        -- display the element for the current key
        DBMS_OUTPUT.put(current_key || ': ');
        DBMS_OUTPUT.put_line(asset_accounts(current_key));
 
        -- get the previous key
        current_key := asset_accounts.PRIOR(current_key);
    END LOOP;
 
END;
/
No Comments Posted in Oracle Database
Tagged , , , , , , , , , ,
How to select something “from dual” in a non-Oracle database

What is the dual equivelant in Postgres, MySQL, Sqlite, or Ms SQL Server?

I had someone ask me yesterday what Sqlite calls the dual pseudotable (for those who don’t know, dual is a heavily used, Oracle specific concept).

I started at him blankly for a minute and thought how best to answer without laughing. But honestly its a good question.

The answer is simple enough too:

You drop the FROM clause.

For Example

If you want to

SELECT 'hello world'
FROM dual;

in any SQL dialect other than Oracle, you simply say this:

SELECT 'hello world';

So, why does oracle need a from dual in the first place?

I don’t know, but I think it has something to do with a couple of water coolers (this is not a true story as far as I know).

You see, at oracle, when they were putting together early drafts of the SQL standard, they had their little select clauses, and in the original specification, some things were optional (like where clauses and order by clauses) but the from clause was not optional. I mean, why would you select something from nowhere.

That is where the water cooler comes in. A couple of developer guys down at the water cooler are taking a break because they are each struggling with something or other in the code, and the first guy vents his frustration I have to select something but it isn’t really coming from a specific table, what do I do for a from clause? — the other guy has his own equally perplexing problem, and he vents it out.

Then, they come up with a crazy idea to kill both birds with one stone. A dual solution. They will create a pseudo table named dual and it will fix both of their problems. And off they went.

The other half of the story happens at another water cooler at the development offices of another SQL database engine (not sure who). Two very similar programmers are at the water cooler, and the same question comes up. The other guy just says “Duh, let’s just make the from clause optional!” which they do.

So, the real question is not why don’t other databases have a dual pseudo table, the real question is this: why does Oracle still require the use of from dual when there isn’t a true target table? My guess is that, whoever came up with dual was so impressed with themselves, that they just can’t let it go – but that’s just a hunch.

No Comments Posted in Microsoft SQL Server, MySQL, PostgreSQL, Sqlite, Vanilla SQL
Tagged , , , , ,
Oracle: Insert All / Sequence Headaches (Error: ORA-02287)

I am really quite fond of the Oracle’s INSERT ALL statement, despite the occasionally critical blog entry.

The flaw I wish to harp on today is the error that pops up when you try to use a sequence in the INSERT ALL statement. I, along with countless other DB developers, use sequences just about every time I add a record to a database.  It seems very short sighted to restrict the INSERT ALL statement so that they can’t handle sequences.

This blog entry will show how I was able to bypass this limitation and make productive use of INSERT ALL.

I intend to show how to overcome this particular error:
ORA-02287: sequence number not allowed here

First, an example of a few statements that will fail, and some comments on why.

Suppose you have one table that has a one to many relationship with another table. Oracle allows you to use a single statement to add a row to the parent and also add one or more rows to the child table (or a whole bunch of child tables really) all in one convenient INSERT ALL statement.

Unsuccessful Attempts

Here is what I tried that I thought should have worked.

-- this doesn't work, but should
-- ORA-02287: sequence number not allowed here
INSERT ALL
  INTO ordered_list VALUES (list_id, list_title)
  INTO ordered_list_node VALUES (list_id, 1, red_id)
  INTO ordered_list_node VALUES (list_id, 2, yellow_id)
  INTO ordered_list_node VALUES (list_id, 3, blue_id)
SELECT master_sq.NEXTVAL AS list_id
  ,    'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

The part that doesn’t work in on line 8, where I pull a value from the SEQUENCE named master_sq. It turns out that, even though this statement selects only a single row, it still could get confused and pull multiple values from the sequence. I know that it can get confused because of an attempt I made (described below) to cheat and bypass the restriction on using a sequence. Rather than fix that problem, they just make it raise an error.

I feel that raising an error is a real cop out, because it makes sense that an obvious use of the INSERT INTO statement is to insert a row into a parent table along with its child elements. Why else would you do an insert into multiple tables at once, except to reuse the primary key of the main table? Why they don’t allow reading from sequences is beyond me, because sequences are where we get our primary key values from in the first place. But, I digress, lets solve the problem and not rant about the need for a solution.

So, the next thing I tried was to pop the master_sq.NextVal BEFORE the INSERT INTO statement, and then just use CurrVal instead. I figured this would work because it doesn’t run the chance of being called multiple times and doing something unexpected.

Here’s the next piece of SQL I tried that failed, prompting this whole blog entry:

-- this doesn't work, but could if they wanted to allow it
-- ORA-02287: sequence number not allowed here
SELECT master_sq.NEXTVAL FROM dual;
 
INSERT ALL
  INTO ordered_list VALUES (list_id, list_title)
  INTO ordered_list_node VALUES (list_id, 1, red_id)
  INTO ordered_list_node VALUES (list_id, 2, yellow_id)
  INTO ordered_list_node VALUES (list_id, 3, blue_id)
SELECT master_sq.CURRVAL AS list_id
  ,    'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

I know this COULD work because one of the solutions I came up with uses this method. But, someone at Oracle decided it would be easier to just block all uses of sequences in the select clause of an insert all statement.

This Works, But I Don’t Like It

The next thing I tried was this, it does work, but defeats the purpose, and isn’t as elegant as what I ended up with in the end:

INSERT INTO ordered_list (title) VALUES ('Primary colors');
 
INSERT ALL
  INTO ordered_list_node VALUES (list_id, 1, red_id)
  INTO ordered_list_node VALUES (list_id, 2, yellow_id)
  INTO ordered_list_node VALUES (list_id, 3, blue_id)
SELECT (SELECT id FROM ordered_list WHERE title = 'Primary colors') AS list_id
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

In this chunk of code, I insert the parent row first, and then I find its id using a select query when I create the child statements. I think this is a bad solution for two reasons. (1) you have to fetch the record you just inserted using something other than the primary key, which sometimes isn’t going to be unique, and creates overhead of searching a table for no good reason. And (2) it isn’t one statement, so there is overhead of sending and compiling 2 statements to the server, and receiving two responses from the server.

The Right Way

Here’s the most elegant thing I could come up with. Instead of using the Sequence down in the select clause, you put it up in the values clauses. The parent record grabs the nextval and the child records grab the currval. Like so:

INSERT ALL
  INTO ordered_list VALUES (master_sq.NEXTVAL, list_title)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 1, red_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 2, yellow_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 3, blue_id)
SELECT 'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

Now, even though this solution uses one statement, and doesn’t need pl/sql, it still has its drawbacks. For instance, if a trigger on my child table pulled from the sequence in the background, that could be bad.

Also, Oracle just happens to process these inserts sequentially, but SQL is not a sequential language. What if down the road they did the inserts in reverse order, or each on its own thread? Who knows how portable this answer is? On the other hand, it seems to work well, and these questions are the reasons we write unit tests and test things before we move to a new generation of Oracle’s server — right?

I really wish that we could have “selected” the sequence.nextval down below into a single row result set that didn’t keep hitting the sequence for a new value each time it was used, becuase then we wouldn’t have to worry about any of this. But, I guess you cant have your cake and eat it too.

The 1st Runner Up (Previously the Right Way)

This solution is not as good as the “Right way” because it requires a little PL/SQL magic, and it uses two statements, so it has a little more overhead.

-- You'll need to do some things (described below) in order for this to work
SELECT master_sq.NEXTVAL FROM dual;
 
INSERT ALL
  INTO ordered_list VALUES (list_id, list_title)
  INTO ordered_list_node VALUES (list_id, 1, red_id)
  INTO ordered_list_node VALUES (list_id, 2, yellow_id)
  INTO ordered_list_node VALUES (list_id, 3, blue_id)
SELECT getCurrVal('master_sq') AS list_id
  ,    'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

As you can see, I wasn’t able to overcome the need to execute a separate SQL statement for updating the sequence so it contains a unique value for us to use. But, I did eliminate the need to search the table in order to find the recently inserted id.

I use the first select statement to “claim” a new ID from the sequence that we will use when we insert the record with its children.

To get it to work, all you need to do is create a PL/SQL function to read the current value of the sequence rather than read it right there in the INSERT ALL statement. Here is the function I used…

CREATE OR REPLACE FUNCTION getCurrVal
( seq_name VARCHAR2 )
RETURN NUMBER IS
  return_value NUMBER;
BEGIN
  EXECUTE IMMEDIATE 'select ' || seq_name || '.currval from dual'
    INTO return_value;
  RETURN return_value;
END;
/

The Wrong Way

Before I came to the above solution, I tried this out, which works in the sense that it compiles and Oracle will run it without complaining, and records will be inserted. But, it fails in the sense that the parent table’s id that was pulled from the sequence gets pulled several times, and all the child records don’t actually get a reference to the correct parent.

--DON'T do this, it does bad things.
INSERT ALL
  INTO ordered_list VALUES (list_id, list_title)
  INTO ordered_list_node VALUES (list_id, 1, red_id)
  INTO ordered_list_node VALUES (list_id, 2, yellow_id)
  INTO ordered_list_node VALUES (list_id, 3, blue_id)
SELECT getNextVal('master_sq') AS list_id
  ,    'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;

The fact that this happens is really quite disturbing. Not just because it is inconvenient.

Think about this with me. Oracle doesn’t just run the getNextVal PL/SQL function once when it selects all the values that will be spread out across the tables we are inserting into, it actually runs the PL/SQL function once for every row that reads the list_id field as part of its insert operation.

That could have unforeseen consequences if you don’t know that, and, it seems awfully performance heavy to be switching context between SQL and PL/SQL all the time like that. But, in the end, that is up to them to fix, this solution works, and seems to perform better than the other alternatives.

Example

Here is a full blown example from start to finish that illustrates using this method.

-- this sequence keeps a unique id across all tables that
-- are (or may some day be) listable so that the records they
-- reference are unique, even across multple tables.
CREATE SEQUENCE master_sq;
 
-- create tables for organizing listable items into ordered lists
CREATE TABLE ordered_list
  ( id     NUMBER
  , title  VARCHAR2(120)
  );
 
CREATE TABLE ordered_list_node
  ( list_id        NUMBER
  , list_index     NUMBER
  , item_id        NUMBER
  );
 
-- create an auto-sequencing id for the ordered_list table
CREATE OR REPLACE TRIGGER ordered_list_auto_sequence
  BEFORE INSERT ON ordered_list
  FOR EACH ROW WHEN (NEW.id IS NULL)
BEGIN
  SELECT master_sq.NEXTVAL INTO :NEW.id FROM dual;
END;
/   
 
--  some fun test data
CREATE TABLE color
  ( id NUMBER
  , title VARCHAR2(20)
  ); 
 
CREATE OR REPLACE TRIGGER color_auto_sequence
  BEFORE INSERT ON color
  FOR EACH ROW WHEN (NEW.id IS NULL)
BEGIN
  SELECT master_sq.NEXTVAL INTO :NEW.id FROM dual;
END;
/
 
-- add colors to our color table, in no particular order
INSERT ALL
  INTO color (title) VALUES ('Red')
  INTO color (title) VALUES ('Violet')
  INTO color (title) VALUES ('Green')
  INTO color (title) VALUES ('Blue')
  INTO color (title) VALUES ('Indigo')
  INTO color (title) VALUES ('Orange')
  INTO color (title) VALUES ('Yellow')
SELECT * FROM dual;  
 
-- make a ordered_list of the primary colors,
-- in the order people usually talk about them
SELECT master_sq.NEXTVAL FROM dual;
 
INSERT ALL
  INTO ordered_list VALUES (master_sq.NEXTVAL, list_title)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 1, red_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 2, yellow_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 3, blue_id)
SELECT 'Primary colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
FROM dual;
 
-- make a ordered_list of colors,  in rainbow order
SELECT master_sq.NEXTVAL FROM dual;
 
INSERT ALL
  INTO ordered_list VALUES (master_sq.NEXTVAL, list_title)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 1, red_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 2, orange_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 3, yellow_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 4, green_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 5, blue_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 6, indigo_id)
  INTO ordered_list_node VALUES (master_sq.CURRVAL, 7, violet_id)
SELECT 'Rainbow colors' AS list_title
  ,    (SELECT id FROM color WHERE title = 'Red') AS red_id
  ,    (SELECT id FROM color WHERE title = 'Orange') AS orange_id
  ,    (SELECT id FROM color WHERE title = 'Yellow') AS yellow_id
  ,    (SELECT id FROM color WHERE title = 'Green') AS green_id
  ,    (SELECT id FROM color WHERE title = 'Blue') AS blue_id
  ,    (SELECT id FROM color WHERE title = 'Indigo') AS indigo_id
  ,    (SELECT id FROM color WHERE title = 'Violet') AS violet_id
FROM dual;
 
-- Now, lets see the results...
-- pull some data from our test lists
SELECT       clr.title AS color_title
FROM         ordered_list list
  JOIN       ordered_list_node node
    ON       node.list_id = list.id
  JOIN       color clr
    ON       node.item_id = clr.id
WHERE        list.title = 'Primary colors'
ORDER BY     node.list_index;
 
SELECT       clr.title AS color_title
FROM         ordered_list list
  JOIN       ordered_list_node node
    ON       node.list_id = list.id
  JOIN       color clr
    ON       node.item_id = clr.id
WHERE        list.title = 'Rainbow colors'
ORDER BY     node.list_index;

The output from those last two select statements look like this…

COLOR_TITLE
--------------------
Red
Yellow
Blue

3 rows selected.

COLOR_TITLE
--------------------
Red
Orange
Yellow
Green
Blue
Indigo
Violet

7 rows selected.
No Comments Posted in Oracle Database
Tagged , , , , , , , , ,
Inserting multiple rows in one SQL statement

Background

So, the other day, I opened up a MySQL dump file just to graze over it and see how they did things (you can call be weird if you want to, I prefer to think of it as “curious”). Anyway, I noticed something very cool that I thought was absolutely ingenious. They were inserting multiple rows in a single SQL statement, sort of like this…

INSERT INTO item (id, title)
    VALUES (1,'item1')
    , (2, 'item2')
    , (3, 'item3')
    ;

I instantly had tons of places (and reasons) I wanted to do this. First my reasons:

  1. Most of the overhead of a SQL
    statement (especially an insert) is time spent sending the statement
    to the server. Each statement has overhead, so doing several things in
    a single statement can significantly improve performance.
    This is generally true even if the statement is significantly more complex
    (so long as it isn’t sort intensive).
  2. Updating indexes is often the slowest part of inserting rows into a table,
    and inserting several at one time only rebuilds the index once (after all the rows are
    added) rather than once per row. This can be a significant performance boost.
  3. If you are making a create-database SQL script, or generating
    scripts to import data from an external source (like an excel file) you can
    decrease the total size of the script if you eliminate hundreds (or thousands) of
    INSERT INTO item (id, title) … and can tie them altogether.

So, I started using the multi row insert statements, and have really loved it. I was using them in MySQL and PostgreSQL, and they worked perfectly just as described above, so I sort of assumed it was a SQL standard. Then I tried using it on an Oracle database (11g) and it failed, which is odd because Oracle prides itself in being the first database to implement all the new SQL standards as soon as possible. It turns out this is an extension not a standard.

But, I decided it was worth knowing how to insert multiple rows in a single statement using each of the SQL dialects that I usually work with.

Of course, my ultimate goal isn’t just to see if I could do it with each engine, but to come up with a database neutral way to do it. I wanted to find the holy grail of SQL developers: the plain vanilla way that works everywhere.

This builds on a previous post where I show how to build a simple table with an auto incrementing primary key in each of several SQL dialects.

The Vanilla Way

This works in all the engines I evaluated (MySQL, Sqlite, PostgreSQL, Microsoft SQL Server, and, with minor tweaking it works in Oracle)

INSERT INTO item (title)
    SELECT 'title1'
    UNION ALL SELECT 'title2'
    UNION ALL SELECT 'title3'
    ;

The Better way (that isn’t universally supported)

This works in MySQL 5.1 and PostgreSQL 8.3 and Microsoft SQL Server 2008, but DOES NOT WORK in Oracle 11g, or Sqlite 3

INSERT INTO item (title)
    VALUES ('title1')
    , ('title2')
    , ('title3')
    ;

The Oracle Way

Of course, Oracle has its own way of doing things, so here is the preferred way to insert multiple records in oracle.

INSERT ALL
    INTO item (title) VALUES ('title5')
    INTO item (title) VALUES ('title6')
    INTO item (title) VALUES ('title7')
SELECT * FROM dual;

Comments on the Plain Vanilla Way

Using UNION ALL to build one big table with all the rows you want to INSERT is not elegant,
you have to look at it for a while and do a little bit of head scratching before you
say oh, and then you will probably say why in the heck did they do that?

But it works great, and has two of the three benefits from inserting multiple rows at once.

5 points for all database engines that can do it.

Oracle gets only 3 points because they still require that stupid from dual effectively ruining the possibility of having a truly vanilla way of doing things. That means, if your server is running Oracle, you have to insert it like this:

INSERT INTO item (title)
    SELECT 'title1' FROM dual
    UNION ALL SELECT 'title2' FROM dual
    UNION ALL SELECT 'title3' FROM dual
    ;

Comments on the Better Way

The better way is easy to read, easy to write, easy to maintain, easy to add more rows to the existing query with minial typing. It also has the added benefit of being faster.

In short, it is absolutely elegant. 5 more points for engines that support it.

Comments on the Oracle Way

The INSERT ALL method probably enjoys some speed optimization above using the UNION ALL method, which is probably worth a point (that is why Oracle gets a 4 instead of just 3). I just can’t bring myself to offer more points than that in this case.

INSERT ALL still has the flexibility of allowing (our should I say requiring) a field signature to be provided for each row, this added flexibility probably equates to extra processing time before the insertions take place, but I can’t look under the hood to see for myself so don’t quote me.

This is an example of Oracle taking a cool statement for inserting into multiple tables at the same time, and using it to satisfy the multiple rows in the same table too. I agree it should work (it makes sense that you might want to insert several lines in a child table as you insert one line in a parent table), but I don’t think that that is a good excuse for not supporting the Better way. In fact, I think it is a real cop out.

Why I used UNION ALL instead of UNION

UNION ALL will offer significantly better performance than UNION, because the UNION command assumes that if there are duplicates they should be removed (mathematically speaking, the union of two sets includes only one copy of the overlapped region.) If you use UNION ALL, it keeps both copies of the overlapped region.

Of course, you shouldn’t have any overlapped region, and you certainly don’t want the db engine to sort and then compare all the records to make sure you don’t have any duplicates before it moves forward. UNION ALL doesn’t do any of that sorting and filtering so it will have significant performance improvements over just using UNION.

Points are awarded as follows:

MySQL (v 5.1)
10 out of 10
PostgreSQL (8.3)
10 out of 10
Microsoft SQL Server (2008)
10 out of 10
Sqlite (3)
5 out of 10
Oracle (11g)
4 out of 10

Oracle, catch up already.

Does anyone want to place bets about whether Sqlite will support the better insert statement before Oracle does?

For that matter, does anyone want to place bets about whether Oracle will ever stop requiring the FROM DUAL statement that no one else seems to need?

Maybe I should blog about one or more of the features that oracle really shines in just so people don’t think I have an anti-oracle bias.

No Comments Posted in Microsoft SQL Server, MySQL, Oracle Database, PostgreSQL, Sqlite, Vanilla SQL
Tagged , , , , , , , , , ,
Comparison of Auto Incrementing Primary Key feature in several SQL Dialects


The Problem


The whole point in using SQL to interact with a database is for developers to be able to use the same basic language to interact with any database. The intentions were good but, as it turns out, even something as universally desired as having a primary key that automatically increments itself is not standardized, and, in some cases it isn’t even directly supported.

So, here is how you create an auto generated primary key in the current version of each of several popular SQL dialects. I will demonstrate creating a table with an auto indexed primary key, and how to add rows to it in a plain vanilla way (that works from any client). Unfortunately, there is no plain vanilla way to create the auto incremented primary key, so table creation must be done on a per database basis.

Jump to Solution

MySQL solution sing AUTO_INCREMENT
Sqlite solution using INTEGER PRIMARY KEY
Oracle solution using a PL/SQL Trigger
Microsoft SQL Sever solution using IDENTITY
Postgres solution using SERIALIZED
INSERT INTO statement that works for all these SQL dialects

The CREATE TABLE Statments that work

Presented in order by how many points I award the solution

MySQL 5.1

CREATE TABLE item
    ( id INTEGER PRIMARY KEY AUTO_INCREMENT
    , title VARCHAR(50)
    );

Points: 10 out of 10

Comments on the MySQL Way

It seems to me that adding an AUTO_INCREMENT attribute to a column just like a NOT NULL constraint or a DEFAULT property is a very straightforward way of handling the problem. It is extremely obvious what the AUTO_INCREMENT field should do, even without consulting the documentation.

I give them full points because the MySQL way is easy to read, easy to write, easy to remember, and it handles both kinds of the generic insert statements below.



Sqlite3

CREATE TABLE item
    ( id INTEGER PRIMARY KEY
    , title
    );

Points: 9.5 out of 10

Comments on the Sqlite Way

Sqlite generally ignores any data types assigned to a column, and assigns variable length string types to almost everything. The INTEGER PRIMARY KEY option is one of the few exceptions.

Its elegant, clean, not quite as obvious, but very well documented (its question #1 in their FAQ page).

I ding them half a point because there is no obvious way to have another column auto increment. I’ve never had a reason to need that, but it isn’t really an option, and the MySql solution could handle that if I ended up needing it some day. Furthermore, someone coming from another SQL dialect might not consider it intuitive that ALL primary keys are naturally auto generating.



Oracle 11g

CREATE SEQUENCE item_seq;
 
CREATE TABLE item
    ( id NUMBER
    , title VARCHAR2(50)
    );
 
CREATE TRIGGER item_auto_increment
BEFORE INSERT ON item
     gap_between NUMBER;
FOR EACH ROW
BEGIN
    IF NVL(:NEW.id, 0) = 0 THEN
        :NEW.id := item_seq.NEXTVAL;
    ELSE
        -- make sure the sequence isn't going to overlap the key value just provided
        IF :NEW.id > item_seq.CURRVAL THEN
            gap_between := :NEW.id - item_seq.CURRVAL;
            ALTER SEQUENCE item_seq
              INCREMENT BY gap_between;
 
            -- read the sequence so it jumps the gap
            gap_between := item_seq.NEXTVAL; -- should now be :new.id
 
            -- now set it back to increment 1 at a time
            ALTER SEQUENCE item_seq
              INCREMENT BY 1;
        END IF;
    END IF;
END;
/

note: The above trigger is not the standard trigger seen in most oracle books, and online tutorials, it is my own derived version. This is the standard auto-incrementing trigger you might find elsewhere.

--not my style, but works great
CREATE TRIGGER item_auto_increment
BEFORE INSERT ON item
     gap_between NUMBER;
FOR EACH ROW WHEN (NEW.id IS NULL)
BEGIN
    SELECT item_seq.NEXTVAL INTO :NEW.id FROM dual;
END;

Points: 8 out of 10

Comments on the Oracle Way

First off, let me say gross. In fact, let me say it again gross. You have to create your own trigger to pull from a sequence. Furthermore, you have to explicitly create the sequence (PostgreSQL was nice enough to do that for you and tell you it did it, I’m not sure how mysql does it, but I doubt it uses a trigger because it supported AUTO_INCREMENT before it supported triggers).

They SHOULD have let us do something nice like this…

 -- THIS DOESN'T WORK!!
CREATE TABLE item
    ( id NUMBER DEFAULT item_seq.NEXTVAL NOT NULL
    , title VARCHAR2(50)
    );

But no, it can’t be that easy. (you’ve got to love oracle)

I ding 2 points for sheer uglyness because having to do the whole thing yourself is tedious and error prone. On the other hand, once it is there, it works, and you can do all of what is needed as part of a table creation script, and once it is done, developers can use the table just as easily as the other SQL dialects discussed, so I won’t doc them too much for it.

The fact that you have to build the trigger yourself has one advantage: You can make it behave exactly the same way MySQL behaves, and so it contributes to the “vanilla” factor. Specifically, you can make it handle both NULL and 0 values the same way, and you can make sure that when a value is specified, our sequence can be updated to make sure we don’t try to reuse that value. Ugly, YES, but it works.

Why I like my trigger more than the industry standard

Upon careful inspection of my trigger, there are a lot of people who would argue that it is long and kind of pointless. The normal way uses the WHEN clause to only execute when a null is provided, mine doesn’t, so it is less efficient when an ID is provided. And, it has a lot more code, which means the database has to do more thinking each time I insert a record. Seems inefficient.

To them I say Yes, but… I pretty much always use the trigger to add the key, so it isn’t a big deal that it executes my trigger when I don’t. Furthermore, my trigger protects from the possibility that someone will supply an ID on a new record that is not in sequence, and that, months down the road will create a conflict with the key values generated by the sequence. That will cause a series of strange errors that no one can account for, and that I don’t want to be called about. Its an ounce of prevention in a critical location, and its worth the overhead it might cause on the few cases where people aren’t dong things the normal way.



Microsoft SQL Server 2008

CREATE TABLE item
    ( id INT PRIMARY KEY IDENTITY
    , title VARCHAR(50)
    );

Points:7 out of 10

Comments on the Microsoft Way

My only real comment here is that IDENTITY is not as obvious a name as AUTO_INCREMENT, but the use is simple enough, looks nice, works as expected, … I like it.

I ding a point for using a word that sounds more synonymous with PRIMARY KEY then it does with AUTO_INCREMENT the word IDENTITY has little or nothing to do with picking sequentially growing key values for your primary key.

I ding 2 points for not being able to use the second form of INSERT command (the one that uses the default signature for the table). This happens because they protect you from inserting anything into this column. I have nothing against protection, and in this case I would even agree that it is important protection. But, the end result of this protection is that you can’t use the second form of insert statement because you are specifying a column value (NULL) on a read only field.

Furthermore, often times, when you initially populate tables, you create a bunch of records with specific primary key values, during that phase of database creation, you don’t want the protection so you could create the column as IDENTITY_MODIFIABLE, but that eliminates the protection when you are done with the initial records and you do want some protection.



PostgreSQL 8.3

CREATE TABLE item
    ( id SERIAL PRIMARY KEY
    , title VARCHAR(50)
    );

Points: 6.5 out of 10

Comments on the Postgres Way

Its clean, straightforward enough, and makes sense. I like it but I don’t really love it, here’s why:

I dinged half a point for picking the keyword SERIAL. It is obviously not as good as AUTO_INCREMENT because in their own documentation they explain it by saying it works like AUTO_INCREMENT in “other” databases. I decided not to ding them too bad for this because SERIAL is a much better choice of words than IDENTITY. At least serial implies that it will be assigned sequential values. The first time I saw it, it made sense to me.

I ding one point for replacing the data type “INTEGER” with the pseudo data type “SERIAL” rather than using SERIAL as a modifier. I do this because it is good to know what kind of field it is so you can make your foreign keys match. The current implementation requires you to know what is going on behind the scenes in order to use it anyway. Furthermore, instead of adding one modifier, they created two pseudo types (the other is BIGSERIAL).

To be honest, I really like the idea of having the data type be SERIAL instead of INTEGER, but they way they implemented it ruins the reason I like the serial type. If SERIAL were its own data type, then Postgres could move it to a larger int size down the road and handle that for you when you upgrade. They could even substitute in a GUID or some other magic ID. BUT, instead, it makes it an integer, and you have to know what the data type is in order to reference it. There is no SERIAL_REFERENCE data type. So you hard code it to be an integer in other tables… so there is no benefit from a somewhat mysterious SERIAL datatype.

I ding two more points because the insert statement using the default table signature fails complaining that you have provided a value for the column, which is not allowed (even though NULL isn’t much of a provided value).



Plain Vanilla Insert Statement (works for all of the above)

INSERT INTO item (title)
    VALUES ('some item title');

Insert Statement that works for MOST of the above

INSERT INTO item
    VALUES (NULL, 'some item title');

Inserting based on the default signature for the table is a common (albeit a little bit quick and dirty) way to add the initial records to a table in the schema creation SQL script. This sort of insert will work with an auto generated key for some of the databases, but not all of them. I dinged the Microsoft implementation and the PostgreSQL implementation of SQL each two points for not handling it gracefully.

I think that supplying a NULL value for a column is exactly the same thing as not supplying a value for the column, and 3 of the 5 database engines seem to agree with me on that, Microsoft and Postgres need to catch up with the game here.

back to the top of this post

No Comments Posted in Microsoft SQL Server, MySQL, Oracle Database, PostgreSQL, Sqlite, Vanilla SQL
Tagged , , , , , , , , ,
QuickBooks SDK from Python 2.5

So, for a particular project, I found myself needing to get into a client’s QuickBooks file from a python script. I had previously done this sort of thing using a COM from another language using the QuickBooks SDK.

Being somewhat new to python, I had never even tried to use it to interface with Windows COM Objects before, but, I decided that I knew enough about COM that I could fill in the gaps on whatever documentation I found and so I decided to give it a shot. I downloaded and installed the QBSDK, from the Intuit site, and got started putting together a script that “dumps” some of the data into some export files.

Of course you are going to have to download and install the Python Win32 Extensions. (Yes, that means you can’t do this on Linux, but honestly, you can’t really run QB on Linux either, so I don’t feel too bad.)

Here is what I came up with:

import win32com.client
 
query_rqs = (
  'Host',
  'Company',
  'Account',
  'Entity',
  'Terms',
  'Class',
  'CustomerType',
  'VendorType',
  'JobType',
  'PaymentMethod',
  'ShipMethod',
  'SalesTaxCode',
  'Item',
  )
 
qbxml_header = """<?xml version="1.0" ?>
<!DOCTYPE QBXML PUBLIC '-//INTUIT//DTD QBXML QBD 1.0//EN'>
<QBXML>
<QBXMLMsgsRq onError="continueOnError">
"""
 
qbxml_footer = """
</QBXMLMsgsRq>
</QBXML>;
"""
 
qb = win32com.client.Dispatch("QBXMLRP.RequestProcessor")
qb.OpenConnection("Data Sucker", "Data Sucker")
ticket = qb.BeginSession("",0) 
 
for rq in query_rqs:
    print "Exporting %s(s)" % rq
    output_file = open("c:\\qb-export-data\\%s.xml" % rq, 'w')
    output_file.write(qb.ProcessRequest(ticket, "%s<%sQueryRq></%sQueryRq>%s" % (qbxml_header, rq, rq, qbxml_footer)))
    output_file.close()
print "Data Export Completed"
 
qb.EndSession(ticket)
qb.CloseConnection()
qb = None #drop our QB object

The First part where we define ‘query_rqs’ just makes a convenient list of the object types we are going to query from the QB company file. Requesting data is a uniform enough process that I decided to just loop through these names below.

To understand the part about the ‘qbxml_header’ and ‘qbxml_footer’ you simply have to understand that every exchange with the QB SDK is done in XML. this header/footer combination will be used to wrap every request we make, so I put it in a variable for easy use a little lower.

Then, I actually call use the COM API to create a COM object of type “QBXMLRP.RequestProcessor” the request processor is the primary object in the QBSDK. You use it to connect to QuickBooks, and then to send XML requests, and receive XML responses. The QBSDK does not have COM Objects for every QB object, instead it uses XML to pass properties back and forth about the internal QB objects. In this way, they have an API that doesn’t need to change much in order to support different versions and/or editions of their product.

Anyway, the next line opens up a connection and tells QuickBooks that the application called “Data Sucker” wants permission to connect.

If this is the first time that “Data Sucker” has requested access to the company file, QB needs to be open. It will prompt the user for permission.. .. .. and eventually you will connect (or be denied, depending on your user).

Then, we begin a session. the empty string parameter is the name of the company file we want access to. Empty string implies “whatever company is opened by the user right now.” Which was good enough for my purposes. I suppose I could have supplied a path to a QB file, but it wasn’t necessary in this case. I also elected to use Single User Mode (that is what the 0 in the second parameter means). If I am sucking data out of the company file, I don’t want it to be changing between requests — I want a snapshot that agrees with itself.

Then, I loop though all the requests (rq) that I listed up at the beginning of the file. I create an output file with the XML returned by each request that I made (one file per request).

After the loop, I tidy up a bit and look at the generated export files, and can see that life is good.

I have no idea whether or not this would work on a QB install without the QB SDK on the same PC. The Request Processor object might be installed as part of QuickBooks, and the QB SDK might only contain extra resources. If anyone tries this at home, it would be interesting to know if you can run the script without installing the QBSDK.

Also, I know that this only works on the Pro and Premier Editions of QuickBooks. The Basic Edition is intentionally not capable of interfacing with QBSDK — its an upgrade incentive.

As this was also an exercise in COM, I want to point out the last line of the script where I set qb = None. This is the only way I could think of to be sure I had decremented the object reference count and cause the COM engine to destroy the object. I don’t know if Python has issues with COM object destruction, but I do know that early releases of VB6 had memory leaks if you didnt explicitly destroy objects (in some scenarios) so I figured, better safe than sorry.

-Brent

3 Comments Posted in Python Programming Language, Quickbooks SDK
Tagged , , , , ,