Logical vs Physical ER Diagrams

Posted by Dan | Posted in Databases, MySQL, SQL Server, SQLite | Posted on 01-24-2010

0

Logical diagrams are to convey requirements only. Physical diagrams represent the actual data structure to support the requirements and take into account technical scalability and speed.

Edit: I hate the way I had to format this document for this blog post. If you want this tutorial better formatted, check out the Word document.

One-to-Many Relationship


Logical

On ER/Studio, two tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID and a foreign key, StoredID (which is mapped to StoreID from the Store table).

Physical

On SQL Server, two tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID and a foreign key, StoredID (which is mapped to StoreID from the Store table).

Manager Table:

If you allow NULLs for StoreID in the Manager table, then you’ll be able to have a Manager without a store. If you don’t allow NULLs (leave it unchecked), then you’ll have to have at least one Store assigned to a Manager.

Querying

The above states that one store can have many managers. Here’s some sample data what’s in the tables:

SELECT * FROM Manager

SELECT * FROM Store

Get all manager information with for all managers that belong to a store:

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,  Manager.StoreID,
        Store.[Name],  Store.Address,  Store.STATE,  Store.City,  Store.Zip
FROM    Manager
        INNER JOIN Store ON Manager.StoreID = Store.StoreID


Get all manager information with for all managers (even if they don’t have a store):

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,  Manager.StoreID,
        Store.[Name],  Store.Address,  Store.STATE,  Store.City,  Store.Zip
FROM    Manager
        LEFT OUTER JOIN Store ON Manager.StoreID = Store.StoreID


Notice the NULL for Steamboat Willie. He doesn’t have a store, so all Store related fields show as NULL.


Many-to-Many Relationship


In order to implement this physically, you need a join table. In this case, we use StoreManager. Logically, you only need only two entities (Store and Manager).

Logical

Physical


On SQL Server, three tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID. The table StoreManager has two foreign keys:  StoredID (which is mapped to StoreID from the Store table) and ManagerID (which is mapped to the ManagerID from the Manager table).

StoreManager Table:

If you allow NULLs for StoreID and ManagerID in the StoreManager table, then you’ll be able to have a Manager without a store. If you don’t allow NULLs (leave it unchecked for both), then you’ll have to have at least one Store assigned to a Manager.

Here’s some sample data what’s in the tables:

SELECT * FROM Manager


SELECT * FROM Store


SELECT * FROM StoreManager


Get all manager information associated with his store:

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,
        StoreManager.StoreID, StoreManager.ManagerID, Store.StoreID,
        Store.[Name],Store.Address, Store.STATE, Store.City, Store.Zip
FROM    Store
        INNER JOIN StoreManager ON Store.StoreID = StoreManager.StoreID
        INNER JOIN Manager ON StoreManager.ManagerID = Manager.ManagerID

Generate DateTime based on Time Offset

Posted by Dan | Posted in SQL Server | Posted on 01-17-2010

0

So let’s say you have a table with a column of type DateTime. Now you have to support timezones. The first thing you create is a table of time offsets (Google Time Offsets). The table would look like this:

Here’s schema for it. You can download the full script with the data as well.

CREATE TABLE [dbo].[TimeZones](
	[TimeZoneID] [INT] IDENTITY(1,1) NOT NULL,
	[Offset] [VARCHAR](15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[TimeLabel] [VARCHAR](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_TimeZones] PRIMARY KEY CLUSTERED 
(
	[TimeZoneID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

Now for the purpose of this tutorial, let’s create an event table. This will have a set of dates/times which we’ll use to add the timeoffsets to. The table looks like this:

and here’s the script:

CREATE TABLE [dbo].[TimeEvent](
	[TimeEventID] [INT] IDENTITY(1,1) NOT NULL,
	[EventName] [VARCHAR](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[EventDateTime] [DATETIME] NULL,
	[TimeZoneID] [INT] NULL,
 CONSTRAINT [PK_TimeEvent] PRIMARY KEY CLUSTERED 
(
	[TimeEventID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Now let’s INNER JOIN the tables properly adding the timeoffset to the event datetime:

SELECT  tz.TimeZoneID,
        tz.Offset,        
        te.TimeEventID,
        te.EventName,
        te.EventDateTime AS 'OriginalEventDateTime',
 
        -- Hour - Extract Hour from Offset
        SUBSTRING(Offset, 0, CHARINDEX(':', Offset, 0)) AS 'Hour',
 
        -- Minute - Extract Minute from Offset        
        SUBSTRING(Offset, CHARINDEX(':', Offset, 0) + 1, LEN(Offset)) AS 'Minute',
 
        -- New Date with UTC Addition
        DATEADD(HOUR,
                CAST(SUBSTRING(Offset, 0, CHARINDEX(':', Offset, 0)) AS INT),
                DATEADD(MINUTE,
                        CAST(SUBSTRING(Offset, CHARINDEX(':', Offset, 0) + 1,
                                       LEN(Offset)) AS INT), EventDateTime -- DateTime change via Offset
                         )) AS 'EventDateTimeWithTimeZone'
FROM    TimeZones tz
        INNER JOIN TimeEvent te ON tz.TimeZoneID = te.TimeZoneID

Here’s the result:

Script Out Table Data in SQL Server 2008

Posted by Dan | Posted in Databases, SQL Server | Posted on 01-15-2010

0

SQL Server 2008 has a new feature where you can script out data in tables via INSERT statements. Just do the following:


1. Right click on the database and select "Generate Scripts..."


2. Select the database.


3. This is the hidden part. Set "Script Data" to true.


4. Select Tables.


5. Select Tables.


6. You can script out a table script per file or as a single file. I prefer ANSI text.


7. Hit Finish.



Add a Covered Index to tables to get rid of Bookmark Lookups

Posted by Dan | Posted in SQL Server | Posted on 01-11-2010

0

Add a Covered Index to tables to get rid of “bookmark lookups.” The idea of bookmarks: SQL Server uses Bookmarks when you use columns in your query that are not non-clustered/clustered indexes. So if we have something like this:

SELECT  AreaID,
        ContentID,
        ApprovalCodeID,
        isSubmissionPoint
 
FROM    ContentAreas
 
WHERE   AreaID = 31
        AND ContentID > 30000
        AND ModifierID > 65000

And let’s say that AreaID is a clustered index and ContentID is a non-clustered index. If ModifierID is not an index of any kind, it would function as a “bookmark lookup” in this query. To not use a “bookmark lookup,” use a Covered Index, which clumping the Modifier column with ContentID and turning both into a Covered Index (a kind of composite index).

Order of Indexing Matters

Posted by Dan | Posted in SQL Server | Posted on 01-11-2010

0

In SQL Server, the order that you declare Indexes matters! So take the following example:

And we run this query:

SELECT  CAST(ca.AreaID AS VARCHAR(4)),
        RemovalDate
FROM    ContentAreas ca WITH ( NOLOCK )
WHERE   ContentID = 232232
        AND removerID = 34343

If you are to run the following query, since you are using ContentID as part of the WHERE clause, you want to put the ContentID’s index at the top of the indexes/key in SSMS. It speeds up retrieval.  Also put RemoverID below that – test with viewing the execution plan first to see if it makes a difference. Remember, the more indexes you add, the more it takes to update the index.

Asynchronous Upload to Amazon S3 (S3)

Posted by Dan | Posted in ColdFusion, Databases, JavaScript, SQL Server, Technology, Virtualization, Web Browsers, Web Design | Posted on 01-01-2010

0

Requirements

Make sure you have an Amazon S3 Account (get it from tech). JavaScript is mandatory for this to work (to be able to POST to two different domains) upon user submission.

Summary of Problem

This is a summary outlining the solution used in a video uploader. It entailed a form that would leverage the user of our Amazon S3 (S3) account. In addition, because the video files could be large (and to avoid CF limitations), a critical requirement was to upload to that page S3 directly. At the same time, the form field information had to be saved to our database. This meant doing a double post – one to the XYZ domain, and the other to s3.amazon.com. This cross-domain POST could only be done via AJAX.

Here’s visualization:

As you can see, once the user clicks “Submit”, there’s an AJAX HTTP POST to XZY server to save the fields to the database, and then the JavaScript runs a form.submit() on the current form to submit the file via POST to S3.

Introducing Amazon S3

Amazon S3 (Simple Storage Service) is a cloud service whose sole purpose is to store files. To store files into it, one can use its REST API or Admin Web Console (AWS).

Screenshot of the AWS console, with Amazon S3 tab selected.

One gets to the AWS console via an account (can get it from tech) and going to http://aws.amazon.com/s3/ and clicking “Sign in.”

While S3 has many advantages, there are a set of drawbacks as well. To summarize, here’s a list:

Benefits:

  • The max upload size for a file is 5GB. Plenty.
  • For all the times I’ve tested, uploads of different file sizes, everything has gone super smoothly – like butter – so definitely reliable.
  • Amazon is super scalable (as you may already know), so parallel uploading from one or many users is really no problem.
  • Would not affect performance of our servers – there could be many uploads, and they would go fast, without slowing down any other web sites on servers.
  • The speed of the upload is limited to the user’s computer’s specs and internet provider – much faster than our servers.
  • Files can be made secure and unreadable, not just through obscurity – this is sometimes tricky to implement in ColdFusion.

Drawbacks:

To summarize, the reason for some of the drawbacks, is because it’s doing a POST request directly from one domain (us) to another (s3.amazonaws.com). It’s not being channeled through our CF servers.

There are two ways to interact with S3: the REST API, and doing a direct POST. With the REST API, the upload data has to be channeled through a server first before sending to Amazon – this was not what we were looking for, since our servers have issues with large files. So we looked into removing ourselves as the middleman and sending the data directly to S3 – via POST.

Here are the drawbacks, mainly three:

  • If S3 detected an error in the upload, e.g. if the file is too large, there’s no default error page, just a redirect to an XML document hosted on s3.amazonaws.com. There’s no way to set an error page – it’s on Amazon’s to-do list for future release. One can’t even customize the look and feel of the XML document you’re redirected to. Side note: if the upload was successful, it gets redirected to a page you specify (at least there’s some control here).
  • Progress bar reusable code is scare. There’s tons of code out there to do this, however, I could not find one that could cross-domain post. With traditional AJAX, you’re only allowed to do a POST/GET if the URL you’re using is the same domain as the caller page. One could get the code for a progress bar plugin (as there are tons out there) and rewrite it to do a POST and work with Amazon S3 – but that would take a considerate amount of work.
  • Lack of documentation. There’s not enough documentation for handling POST requests in the official Amazon Developer docs, which makes troubleshooting difficult. Doing POST submits is a relatively new feature of Amazon S3, compared to the rest of their APIs.

So the largest hurdle is to code functionality to get around the error page, since some JavaScript magic has to be put in place. That would be another day or so of work just for that, I believe. I already have some code in place that I put together while testing. If we left it as-is, when the user uploads, and if there was an error, the user would see something like this:

Which would, of course, be nonsensical. If the file was too large, they would see a message that the file was too large within the message tags. The user would then have to hit back to return to the form.

We can try another way, probably the easiest. When the user hits submits, it starts showing the animated spinner as it’s uploading. Also, we can tell the user that if he encounters an error page, just hit the back button. Also, keep in mind that there’ll be validation in place before the upload to check for file extension, at the very least. The only edge case to seeing that XML error message is if the file they submitted is over the limit *AND*  they have JavaScript turned off (that overrides the JavaScript file extension validation).

Creating a Basic HTML that POSTs a File to Amazon

Step 1 – Create a Bucket / Folder / Object:
The first thing we need to do a is create a Bucket on S3. To do this the easy way, go to the AWS: https://console.aws.amazon.com/s3/home and create a bucket:

Buckets are where you store your objects (i.e. your files of any format). You can create a folder for further organization:

As you can see here, there are 4 folders here. We can double-click on step1_new_submissions folder and see the objects that are contained within this:

You can right-click on one of those objects (files) and click “Properties”:

Notice that a Properties panel will expand below. To the right, you have three tabs: Details, Permissions, Metadata.

If you click on the Permissions Tab you’ll notice that by default the file that was selected has following permissions set by user USERNAME:

Go back to the details tab and click on the Link:

You’ll notice that you’ll be taken to an XML document in your browser that has the following:

It’s because you have not let it public access. To give it public access, you click on the Permissions tab again, and click “Add more permissions” , set the Grantee to “Everyone” and choose Open/Download and Save.

You can also set the make multiple objects public. Select an object, hold the SHIFT key, then select the other object to select the objects in between. Select “Make Public”:

You can also upload an object manually via the “Upload” button:

Then click “Add more files”

As the file starts uploading, you’ll see the bottom panel show details about the file transfer:

Step 2: Setting up the HTML Form

The S3 REST API is very flexible, as long as you execute the proper method from your application, while at the same time sending the file over from your server to S3 (via a REST method with the correct URI). Traditionally, it would look like this:

Notice how there’s a middle-man that serves as the captor of the data submitted by the form and the file. Then, it sends it long to S3. The middle-man here is crucial. Double the total bandwidth is spent here – the bandwidth to go from the user’s machine to the web server (in this case CF), and then the bandwidth spent transferring the file to S3.

The advantage to this layout is that because the web server acts as a middle-man server, it can modify the data, change its filename, and slice-and-dice anything within the file because the file submitted has to go through it first. Once the middle-man is done, then it sends it to the S3. Drawback is that there’s wasted resources from the middle-man, not to mention there may be limitations on the middle-man to handle large files > 1GB .

As a solution, S3 has a POST method solution where you can POST the file directly to S3:

Setting up the Form tag

Let’s see how we can cross-domain (a domain other than ours) to S3. Rather than doing the following (to post to the same domain):

<form action=”#CGI.SCRIPT_NAME#” method=”post” enctype=”multipart/form-data”>

We do the following:

<form action=”http://s3.amazonaws.com/PastaVideos” method=”post” enctype=”multipart/form-data”>

Where “PastaVideos” is the name of the bucket.

The format of the object URI is as follows:

Step 3: Setting up the Other Form Fields

This is where things get interesting. In order to set up an HTML form that can upload straight to S3, there’s a set of required input fields in the form. They are as follows:


Optional Form Fields

IMPORTANT: If you add any other additional form fields, it will throw an error. If there is in fact a need to add extra form fields, which will be pasted to another server, then you must append the prefix “x-ignore-“. Let’s say for example I have three input fields I want S3 to ignore, then do as follows:

<input type="text" name="x-ignore-lastname" tabindex="2" class="textfield">
<input type="text" name="x-ignore-address1" tabindex="3" class="textfield">
<input type="text" name="x-ignore-address2" tabindex="4" class="textfield">
<input type="text" name="x-ignore-city" tabindex="5" class="textfield">

This is completely legal and will not throw errors.

Grabbing x-ignore- fields in ColdFusion

If you want to grab these form variables via ColdFusion, do something like

Form.x-ignore-lastname

Will not suffice because of the dashes. You’ll have use the bracket/quotes format:

Form[“x-ignore-lastname”]

to grab them.

Also to check for existence or set a default value,

<cfparam name=”form.x-ignore-lastname” default=”parker” />

Or

<cfparam name=”form[“x-ignore-lastname” default=”parker” />

will not work.

You’ll have to use StructKeyexists( Form, “x-ignore-termsagree” ) to check for existence.

HTML Form Example

Putting all variables together from the previous table, we get something something like as follows:

<input type="hidden" name="key" value="step1_new_submissions/9AAAAAAA-D633-0944-9FBCCCCC6CFB161B_${filename}" />  
<input type="hidden" name="acl" value="private" />
<input type="hidden" name="AWSAccessKeyId" value="0N16468ABC47JDAQ2902" />
<input type="hidden" name="policy" value="eyJleHBpcmF0aW9uIjogIjIwMTgtMTAtMjFUMDA6MDA6MDBaIiwKICAiY29uZGl0aW9ucyI6IFsgCiAgICB7ImJ1Y2tldCI6lyZWN0IjogImh0dHA6Ly90ZXN0LXBhc3RhdmlkZW9zLm1pbGxlbm5pdW13ZWIuY29tL3RoYW5rcy5jZm0ifQogIF0KfQ==" />
<input type="hidden" name="signature" value="2AAAA/BhWMg4CCCCC32fzQ=" />
<input type="hidden" name="content-type" value="video/mov" />
<input type="hidden" name="success_action_redirect" value="http://XYZ.com/thanks.cfm" />      
<!--- Ignore All This Stuff... --->
<input type="text" id="x-ignore-firstname" name="x-ignore-firstname" value="peter" />
<input type="text" name="x-ignore-lastname" value="parker" />

Using Amazon’s HTML POST Form Generator

Because setting up the above HTML for the form could be tricky, Amazon has a tool that easily generates the HTML for the above code.

The following is a screenshot of the tool. The form can be found at: http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html


So the first thing you do is:

1. Fill out the IDs:

Where AWS ID = Access Key ID   and AWS Key = Secret Access Key

2. The next thing we’ll do is fill in the POST URL:

3. The third step is the trickiest step. This is a JSON document that must adhere to the JSON spec. By default, there’s already a default boilerplate JSON document there. Let’s analyze what it means:

Let’s use one a real one from the test-pastavideos.XYZweb.com page:

You’ll notice that it has content-length-range, which checks the max size, in this case being 1 GB, and it also redirects to the index.cfm page when successful.

After you copy and paste that JSON policy, press “Add Policy”. Notice how the fields in the section “Sample Form Based on the Above Fields” has been populated.

4. The fields may look something like this:

5. Now click on Generate HTML:

Note that with the HTML code above, whatever the user uploads will be renamed to “testfile.txt” on S3. To retain the user’s filename, you have to switch it to value=”${filename}”

6. You then copy that generated HTML and paste it into your page. Add any necessary optional fields with the x-ignore-prefixes.

That should give you a basic template for uploading to S3.

NOTE: You cannot change the values of any <input> fields except except the key. More about this in the next section.

Assigning the a unique ID to the object

Keep in mind of these items when assigning a unique filename:

  • You cannot change the user’s filename in JavaScript – once the user selects a file from his computer, you cannot append a GUID because JavaScript will not let you set the value of a file textbox, you can only read.
  • You cannot change append the GUID prefix after the form is submitted, because the filename will go to S3’s server, and there’s no way to run conditional logic once it’s on S3.

To get around these limitations, you generate the GUID or rather a ColdFusion UUID, then append it to your filename. Let’s take a look at an example of the pasta video form:

First let’s show the JSON policy document:

{"expiration": "2018-10-21T00:00:00Z",
  "conditions": 
  [ 
    {"bucket": "PastaVideos"}, 
    ["starts-with", "$key", "step1_new_submissions/"],
    {"acl": "private"},
    ["starts-with", "$Content-Type", "video/mov"],
    ["content-length-range", 0, 10737774],
    {"success_action_redirect": "http://pastavideos.XYZ.com/index.cfm"}
  ]
}

And now the HTML / ColdFusion code:

<cfset Data.videoUUID  = CreateUUID() />        
<form id="videoform" action="http://www.XYZ.com/index.cfm" method="get" enctype="multipart/form-data"> 
 
<!--- Start of Amazon S3 specific variables. --->
<input type="hidden" name="key" value="#Data.key##Data.videoUUIDXX#_${filename}" />   
<input type="hidden" name="acl" value="#Data.acl#" />

In the code above, when the HTML is rendered, it will already have filename S3 will use when it’s finished uploading. Remember, this is not the value of the <input type=”file” /> box.  So the user uploads, it will look like this on the AWS Console:

Now why is the action set to http://www.XYZ.com/index.cfm and method set to get? The HTML has this set, but when the page is loaded, JavaScript immediately runs and changes action to http://s3.amazonaws.com/PastaVideos and method to post:

<cfoutput>
  <!--- Only change the form variables if JavaScript is turned on.  --->  
  $( "##videoform" ).attr( "action", "#Variables.postURL#" ); 
  $( "##videoform" ).attr( "method", "post" );     
</cfoutput>

This is so that if JavaScript is turned off, it doesn’t POST to S3.

Other Sources of Information

Amazon S3 POST Example in HTML

AWS

Helpful Resources

Test Form

Documentation Home Page

Workflow using WS3 policies

Helpful Posts

Asynchronous Upload to Amazon S3 (S3)

Last Updated: 11/29/2010

Author: Dan Romero

Table of Contents

Summary of Problem.. 2

Introducing Amazon S3. 2

Benefits: 3

Drawbacks: 3

Creating a Basic HTML that POSTs a File to Amazon. 5

Step 1 – Create a Bucket / Folder / Object: 5

Step 2: Setting up the HTML Form.. 9

Setting up the Form tag. 10

Step 3: Setting up the Other Form Fields. 11

Optional Form Fields. 13

Grabbing x-ignore- fields in ColdFusion. 13

HTML Form Example. 14

Using Amazon’s HTML POST Form Generator. 14

Assigning the a unique ID to the object. 18

Other Sources of Information. 19

Requirements

Make sure you have an Amazon S3 Account (get it from tech).

JavaScript is mandatory for this to work (to be able to POST to two different domains) upon user
submission.

Summary of Problem

This is a summary outlining the solution used in a video uploader. It entailed a form that would leverage the user of our Amazon S3 (S3) account. In addition, because the video files could be large (and to avoid CF limitations), a critical requirement was to upload to that page S3 directly. At the same time, the form field information had to be saved to our database. This meant doing a double post – one to the XYZ domain, and the other to s3.amazon.com. This cross-domain POST could only be done via AJAX.

Here’s visualization:

As you can see, once the user clicks “Submit”, there’s an AJAX HTTP POST to
XZY server to save the fields to the database, and then the JavaScript runs a
form.submit() on the current form to submit the file via POST to S3.

Introducing Amazon S3

Amazon S3 (Simple Storage Service) is a cloud service whose sole purpose is to store files. To store files into it, one can use its REST API or Admin Web Console (AWS).

Screenshot of the AWS console, with Amazon S3 tab selected.

One gets to the AWS console via an account (can get it from tech) and going to http://aws.amazon.com/s3/ and clicking “Sign in.”

While S3 has many advantages, there are a set of drawbacks as well. To summarize, here’s a list:

Benefits:

The max upload size for a file is 5GB. Plenty.

For all the times I’ve tested, uploads of different file sizes, everything has gone super smoothly – like butter – so definitely reliable.

Amazon is super scalable (as you may already know), so parallel uploading from one or many users is really no problem.

Would not affect performance of our servers – there could be many uploads, and they would go fast, without slowing down any other web sites on servers.

The speed of the upload is limited to the user’s computer’s specs and internet provider – much faster than our servers.

Files can be made secure and unreadable, not just through obscurity – this is sometimes tricky to implement in ColdFusion.

Drawbacks:

To summarize, the reason for some of the drawbacks, is because it’s doing a POST request directly from one domain (us) to another (s3.amazonaws.com). It’s not being channeled through our CF servers.

There are two ways to interact with S3: the REST API, and doing a direct POST. With the REST API, the upload data has to be channeled through a server first before sending to Amazon – this was not what we were looking for, since our servers have issues with large files. So we looked into removing ourselves as the middleman and sending the data directly to S3 – via POST.

Here are the drawbacks, mainly three:

If S3 detected an error in the upload, e.g. if the file is too large, there’s no default error page, just a redirect to an XML document hosted on s3.amazonaws.com. There’s no way to set an error page – it’s on Amazon’s to-do list for future release. One can’t even customize the look and feel of the XML document you’re redirected to. Side note: if the upload was successful, it gets redirected to a page you specify (at least there’s some control here).

Progress bar reusable code is scare. There’s tons of code out there to do this, however, I could not find one that could cross-domain post. With traditional AJAX, you’re only allowed to do a POST/GET if the URL you’re using is the same domain as the caller page. One could get the code for a progress bar plugin (as there are tons out there) and rewrite it to do a POST and work with Amazon S3 – but that would take a considerate amount of work.

Lack of documentation. There’s not enough documentation for handling POST requests in the official Amazon Developer docs, which makes troubleshooting difficult. Doing POST submits is a relatively new feature of Amazon S3, compared to the rest of their APIs.

So the largest hurdle is to code functionality to get around the error page, since some JavaScript magic has to be put in place. That would be another day or so of work just for that, I believe. I already have some code in place that I put together while testing. If we left it as-is, when the user uploads, and if there was an error, the user would see something like this:

Which would, of course, be nonsensical. If the file was too large, they would see a message that the file was too large within the message tags. The user would then have to hit back to return to the form.

We can try another way, probably the easiest. When the user hits submits, it starts showing the animated spinner as it’s uploading. Also, we can tell the user that if he encounters an error page, just hit the back button. Also, keep in mind that there’ll be validation in place before the upload to check for file extension, at the very least. The only edge case to seeing that XML error message is if the file they submitted is over the limit *AND*  they have JavaScript turned off (that overrides the JavaScript file extension validation).

Creating a Basic HTML that POSTs a File to Amazon

Step 1 – Create a Bucket / Folder / Object:
The first thing we need to do a is create a Bucket on S3. To do this the easy way, go to the AWS: https://console.aws.amazon.com/s3/home  and create a bucket:

Buckets are where you store your objects (i.e. your files of any format). You can create a folder for further organization:

As you can see here, there are 4 folders here. We can double-click on step1_new_submissions folder and see the objects that are contained within this:

You can right-click on one of those objects (files) and click “Properties”:

Notice that a Properties panel will expand below. To the right, you have three tabs: Details, Permissions, Metadata.

If you click on the Permissions Tab you’ll notice that by default the file that was selected has following permissions set by user USERNAME:

Go back to the details tab and click on the Link:

You’ll notice that you’ll be taken to an XML document in your browser that has the following:

It’s because you have not let it public access. To give it public access, you click on the Permissions tab again, and click “Add more permissions” , set the Grantee to “Everyone” and choose Open/Download and Save.

You can also set the make multiple objects public. Select an object, hold the SHIFT key, then select the other object to select the objects in between. Select “Make Public”:

You can also upload an object manually via the “Upload” button:

Then click “Add more files”

As the file starts uploading, you’ll see the bottom panel show details about the file transfer:

Step 2: Setting up the HTML Form

The S3 REST API is very flexible, as long as you execute the proper method from your application, while at the same time sending the file over from your server to S3 (via a REST method with the correct URI). Traditionally, it would look like this:

Notice how there’s a middle-man that serves as the captor of the data submitted by the form and the file. Then, it sends it long to S3. The middle-man here is crucial. Double the total bandwidth is spent here – the bandwidth to go from the user’s machine to the web server (in this case CF), and then the bandwidth spent transferring the file to S3.

The advantage to this layout is that because the web server acts as a middle-man server, it can modify the data, change its filename, and slice-and-dice anything within the file because the file submitted has to go through it first. Once the middle-man is done, then it sends it to the S3. Drawback is that there’s wasted resources from the middle-man, not to mention there may be limitations on the middle-man to handle large files > 1GB .

As a solution, S3 has a POST method solution where you can POST the file directly to S3:

Setting up the Form tag

Let’s see how we can cross-domain (a domain other than ours) to S3. Rather than doing the following (to post to the same domain):

<form action=”#CGI.SCRIPT_NAME#” method=”post” enctype=”multipart/form-data”>

We do the following:

<form action=”http://s3.amazonaws.com/PastaVideos” method=”post” enctype=”multipart/form-data”>

Where “PastaVideos” is the name of the bucket.

The format of the object URI is as follows:

Step 3: Setting up the Other Form Fields

This is where things get interesting. In order to set up an HTML form that can upload straight to S3, there’s a set of required input fields in the form. They are as follows:

Tag
Type
Name
Value

input
hidden
key
The location of the object you’ll be uploading. You can consider it as the concatenation of the folder(s) and the filename (don’t use the bucket name):

step1_new_submissions/4059C3_${filename}

The ${filename} is the original filename of the file the user is uploading. The “4059C3_” is a made up ID that is concatenated to the ${filename} that will add uniqueness to the objects in the bucket, if multiple people are uploading to it.

input
hidden
acl
The access control list. Can be set to:

private – Lets the public user upload a file, but not be able to access it once he uploads. To make it accessible, one has to go into the AWS console and change the rights.

public-read – Lets the public user see it after he has uploaded it or anyone else see it.

input
hidden
AWSAccessKeyId
To get this key id, you need to access it by going to:

Then security credentials:

Get the Access Key ID. This id is also called the AWSAccessKeyId.

You should also grab the “Secret Access Key” by clicking on the “Show” on the adjacent column:

Keep the Secret Access Key private! Only the Access Key ID can be made public.

input
hidden
policy
Policy is a Base64 encoded JSON document that outlines the privileges and details of the files being uploaded. More details about this in the next section.

input
hidden
signature
The signature is the policy, HMAC-encrypted using the Secret Access Key.

input
hidden
content-type
The content type is the what kind of mime content the file that will pass through the form will be.

input
hidden
success_action_redirect
This is the URL of where to go when the upload succeeds. It could be any URL.  Also, when redirected, it will add the three additional URL variables:

bucket=PastaVideos

key=step1_new_submissions%2A192DCAE1-D625-0944-9FBCCD5C6CCB161B_ajax5-loader.mov

etag=%22356060aa56ce8955d38ed8c58661497a%22

Optional Form Fields

IMPORTANT: If you add any other additional form fields, it will throw an error. If there is in fact a need to add extra form fields, which will be pasted to another server, then you must append the prefix “x-ignore-“. Let’s say for example I have three input fields I want S3 to ignore, then do as follows:

<input type=”text” name=”x-ignore-lastname” tabindex=”2″ class=”textfield”>

<input type=”text” name=”x-ignore-address1″ tabindex=”3″ class=”textfield”>

<input type=”text” name=”x-ignore-address2″ tabindex=”4″ class=”textfield”>

<input type=”text” name=”x-ignore-city” tabindex=”5″ class=”textfield”>

This is completely legal and will not throw errors.

Grabbing x-ignore- fields in ColdFusion

If you want to grab these form variables via ColdFusion, do something like

Form.x-ignore-lastname

Will not suffice because of the dashes. You’ll have use the bracket/quotes format:

Form[“x-ignore-lastname”]

to grab them.

Also to check for existence or set a default value,

<cfparam name=”form.x-ignore-lastname” default=”parker” />

Or

<cfparam name=”form[“x-ignore-lastname” default=”parker” />

will not work.

You’ll have to use StructKeyexists( Form, "x-ignore-termsagree" ) to check for existence.

HTML Form Example

Putting all variables together from the previous table, we get something something like as follows:

<input type="hidden" name="key" value="step1_new_submissions/9AAAAAAA-D633-0944-9FBCCCCC6CFB161B_${filename}" />

<input type="hidden" name="acl" value="private" />

<input type="hidden" name="AWSAccessKeyId" value="0N16468ABC47JDAQ2902" />

<input type="hidden" name="policy" value="eyJleHBpcmF0aW9uIjogIjIwMTgtMTAtMjFUMDA6MDA6MDBaIiwKICAiY29uZGl0aW9ucyI6IFsgCiAgICB7ImJ1Y2tldCI6lyZWN0IjogImh0dHA6Ly90ZXN0LXBhc3RhdmlkZW9zLm1pbGxlbm5pdW13ZWIuY29tL3RoYW5rcy5jZm0ifQogIF0KfQ==" />

<input type="hidden" name="signature" value="2AAAA/BhWMg4CCCCC32fzQ=" />

<input type="hidden" name="content-type" value="video/mov" />

<input type="hidden" name="success_action_redirect" value="http://XYZ.com/thanks.cfm" />

<!--- Ignore All This Stuff... --->
<input type="text" id="x-ignore-firstname" name="x-ignore-firstname" value="peter" />

<input type="text" name="x-ignore-lastname" value="parker" />

Using Amazon’s HTML POST Form Generator

Because setting up the above HTML for the form could be tricky, Amazon has a tool that easily generates the HTML for the above code.

The following is a screenshot of the tool. The form can be found at: http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

So the first thing you do is:

1.       Fill out the IDs:

Where AWS ID = Access Key ID   and AWS Key = Secret Access Key

2.       The next thing we’ll do is fill in the POST URL:

3.       The third step is the trickiest step. This is a JSON document that must adhere to the JSON spec. By default, there’s already a default boilerplate JSON document there. Let’s analyze what it means:

Let’s use one a real one from the test-pastavideos.XYZweb.com page:

You’ll notice that it has content-length-range, which checks the max size, in this case being 1 GB, and it also redirects to the index.cfm page when successful.

After you copy and paste that JSON policy, press “Add Policy”. Notice how the fields in the section “Sample Form Based on the Above Fields” has been populated.

4.       The fields may look something like this:

5.       Now click on Generate HTML:

Note that with the HTML code above, whatever the user uploads will be renamed to “testfile.txt” on S3. To retain the user’s filename, you have to switch it to value=”${filename}”

6.       You then copy that generated HTML and paste it into your page. Add any necessary optional fields with the x-ignore-prefixes.

That should give you a basic template for uploading to S3.

NOTE: You cannot change the values of any <input> fields except except the key. More about this in the next section.

Assigning the a unique ID to the object

Keep in mind of these items when assigning a unique filename:

You cannot change the user’s filename in JavaScript – once the user selects a file from his computer, you cannot append a GUID because JavaScript will not let you set the value of a file textbox, you can only read.

You cannot change append the GUID prefix after the form is submitted, because the filename will go to S3’s server, and there’s no way to run conditional logic once it’s on S3.

To get around these limitations, you generate the GUID or rather a ColdFusion UUID, then append it to your filename. Let’s take a look at an example of the pasta video form:

First let’s show the JSON policy document:

{"expiration": "2018-10-21T00:00:00Z",

"conditions": [

{"bucket": "PastaVideos"},

["starts-with", "$key", "step1_new_submissions/"],

{“acl”: “private”},

["starts-with", "$Content-Type", "video/mov"],

["content-length-range", 0, 10737774],

{“success_action_redirect”: “http://pastavideos.XYZ.com/index.cfm”}

]

}

And now the HTML / ColdFusion code:

<cfset Data.videoUUID  = CreateUUID() />

<form id=”videoform” action=”http://www.XYZ.com/index.cfm” method=”get” enctype=”multipart/form-data”>

<!— Start of Amazon S3 specific variables. —>

<input type=”hidden” name=”key”

value=”#Data.key## Data.videoUUIDXX#_${filename}” />

<input type=”hidden” name=”acl” value=”#Data.acl#” />

In the code above, when the HTML is rendered, it will already have filename S3 will use when it’s finished uploading. Remember, this is not the value of the <input type=”file” /> box.  So the user uploads, it will look like this on the AWS Console:

Now why is the action set to http://www.XYZ.com/index.cfm and method set to get? The HTML has this set, but when the page is loaded, JavaScript immediately runs and changes action to http://s3.amazonaws.com/PastaVideos and method to post:

<cfoutput>

<!— Only change the form variables if JavaScript is turned on.  —>

$( “##videoform” ).attr( “action”, “#Variables.postURL#” );

$( “##videoform” ).attr( “method”, “post” );

</cfoutput>

This is so that if JavaScript is turned off, it doesn’t POST to S3.

Other Sources of Information

Amazon S3 POST Example in HTML

http://aws.amazon.com/code/Amazon%20S3/1093?_encoding=UTF8&jiveRedirect=1

AWS

http://aws.amazon.com/developertools/

Helpful Resources

http://wiki.smartfrog.org/wiki/display/sf/Amazon+S3

http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?AccessPolicyLanguage_UseCases_s3_a.html

http://docs.amazonwebservices.com/AmazonS3/latest/gsg/

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/dev/index.html?UsingHTTPPOST.html

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/dev/index.html?HTTPPOSTExamples.html

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/dev/index.html?Introduction.html#S3_ACLs

Test Form

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

Documentation Home Page

http://aws.amazon.com/documentation/s3/

Workflow using WS3 policies

http://docs.amazonwebservices.com/AmazonS3/latest/dev/

Helpful Posts

http://aws.amazon.com/articles/1434?_encoding=UTF8&jiveRedirect=1

https://forums.aws.amazon.com/message.jspa?messageID=89017

https://forums.aws.amazon.com/message.jspa?messageID=88188

Python and SQL Server

Posted by Dan | Posted in Python, SQL Server | Posted on 12-27-2009

0

Setting up Python to connect to SQL Server was relatively easy. First, you select a DB API driver. I chose pyodbc because I saw a Python article on Simple-Talk. There are two simple steps:

  1. Install Pywin32. Get the latest. It’s a dependency for pyodbc.
  2. Install pyodbc. Get it for the version of Python you’re using.

Once you’ve done this, you can query your SQL Server db as so:

import pyodbc
 
connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MyAwesomeDB;UID=sa;PWD=password')
cursor = connection.cursor()
 
cursor.execute("select * from states")
 
for row in cursor:
  print row.StateID, row.Abbreviation, row.Name

For more snippets and a tutorial, check out the documentation.

Now let’s try something more interesting. Let’s try doing some inserts and see how long it takes.

import win32api
import uuid
import pyodbc 
 
connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MrSkittles;UID=sa;PWD=password')
cursor = connection.cursor()
 
_start = win32api.GetTickCount()
 
for i in range( 0, 10000 ):  
  # Let's insert two pieces of data, both random UUIDs. 
  sql = "INSERT INTO Manager VALUES( '" + str( uuid.uuid4() ) + "', '" + str( uuid.uuid4() ) + "' )"  
  cursor.execute( sql )
  connection.commit()
 
_end = win32api.GetTickCount()
_total = _end - _start
 
print "\n\nProcess took", _total * .001, "seconds"

After some tests, 10,000 records took roughly 20-30 seconds. 1,000,000 records took 30 to 40 minutes. A bit slow, but it’s not a server machine. My machine is a Core Duo, 1.8Ghz x 2, at ~4GB with PAE on WindowsXP, but I ran this on a VMware VM with 1GB and SQL Server 2005 w/Windows Server 2003. The table was a two column table both varchar(50). On a server machine, it should be a helluva lot faster.

IIS Logs Scripts

Posted by Dan | Posted in IIS, Python, SQL Server | Posted on 12-12-2009

0

While working with some IIS logs, I decided to start practicing my Python. I put together some handy Python functions to work with IIS Log files. These will come in handy. On a 3GB, 2.5GHz, running WinXP machine, these functions take about 3 seconds to process a 180MB Text file. Python code could be optimized to be faster if you’re dealing with larger sized files.

#!/usr/bin/env python
 
# An IIS log file can have various log properties. Everytime you add new columns to log for
# in IIS, it creates a new row full of columns.
import re
import os
 
MainLogDelimiter = "#Software: Microsoft Internet Information Services 6.0"
TestFile         = "C:\\Dan\\IIS-Log-Import\\Logs\\not-the-same.txt"
BigTestFile      = "C:\\Dan\\IIS-Log-Import\\Logs\\ex090914\\ex090914.log"
LogsDir          = "C:\\Dan\\IIS-Log-Import\\Logs"
 
def SearchForFile( rootpath, searchfor, includepath = 0 ):
 
  # Search for a file recursively from a root directory.
  #  rootpath  = root directory to start searching from.
  #  searchfor = regexp to search for, e.g.:
  #                 search for *.jpg : \.exe$                     
  #  includepath = appends the full path to the file
  #                this attribute is optional
  # Returns a list of filenames that can be used to loop
  # through.
  #
  # TODO: Use the glob module instead. Could be faster.  
  names = []
  append = ""
  for root, dirs, files in os.walk( rootpath ): 
    for name in files:
      if re.search( searchfor, name ):
        if includepath == 0:
          root = ""          
        else:          
          append = "\\"
        names.append( root + append + name )        
  return names  
 
 
def isSameLogProperties( FILE ):
  # Tests to see if a log file has the same number of columns throughout
  # This is in case new column properties were added/subtracted in the course
  # of the log file.
  FILE.seek( 0, 0 )
  SubLogs = FILE.read().split( MainLogDelimiter )
 
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1    
 
  # Grab the column names from the log file, separated by space
  columns = re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[1], re.IGNORECASE | re.MULTILINE ).group(1)   
  LogSameProperties = True
 
  for i in range( 2, SubLogs[0] + 1 ):
    # If there are columns
    if ( len( columns ) > 0 ):    
      if ( columns != re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1) ):        
        LogSameProperties = False
        break  
 
  return LogSameProperties
 
 
def getFirstColumn( FILE ):
  # This gets the columns from a log file. It returns only the first columns, and ignores another column
  # row that may exist in case new columns were added/subtracted in IIS. 
  # input: FILE
  # output: 1 single element List
  FILE.seek( 0, 0 )
  names = []
  # Grab the column names from the log file, separated by space
  names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", FILE.read().split( MainLogDelimiter )[1], re.IGNORECASE | re.MULTILINE ).group(1).strip() )
  return names
 
 
def getAllColumns( FILE ):
  # This gets all the columns from a log file. 
  # input: FILE
  # output: List
  FILE.seek( 0, 0 )  
  names = []
  SubLogs = FILE.read().split( MainLogDelimiter )    
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1        
  for i in range( 1, SubLogs[0] + 1 ):        
    names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1).strip() )  
  return names  
 
 
# EXAMPLE:
# Loop through all the IIS log files in the directory
# for file in SearchForFile( LogsDir, "\.txt$", 1 ):  
LogFile = open( file, "r" )
if ( isSameLogProperties( LogFile ) ):
  print file, "the same"
else:
  print file, "not the same"
LogFile.close()

Bulk Import Ignoring Identity Column

Posted by Dan | Posted in Databases, SQL Server | Posted on 11-25-2009

0

Ever have to bulk import a text file (e.g. from Excel, or tabular delimited rows) into a table that had the first column be an identity auto-incrementing primary key? Yes, you could create a format file that skips the identity column, so that the first column of your text file doesn’t go into the identity column of your table. This MSDN page shows more about it.

The quick way though, is to create a view of that table and omit the identity column when you create the view. In this manner, your first column in the text file won’t map to the identity column and throw one of those delicious BCP errors.

Import MySQL Data into SQL Server

Posted by Dan | Posted in Databases, MySQL, SQL Server | Posted on 10-05-2009

0

Today I needed to analyze some forum data from vBulletin running MySQL. The table on MySQL had 60,000 records. Because my playing field is SQL Server and not MySQL, and I needed to slice and dice the data, I needed a way to get the data onto SQL Server. Because of some security restrictions, I could not set up a linked server on SQL Server. I don’t have remote access to the Linux box either. I tried exporting from SQLYog, but CSV data could not be properly delimited and failed when I did a database import via the SSIS import wizard (the table has a lot of flexability to use any character and is often abused by spammers). What did I do?

I only had 4 columns to import for the table. So I ran a select statement returning one column ordered by the id. Then I copied and pasted into an Excel spreadsheet. I did this for all four rows. Because Excel doesn’t use delimiters, but rather cells to separate, I didn’t have to worry about data breaking. Then after that, I did an import via the SSIS import wizard. Ta-da, I can now slice and dice my data. There are probably more efficient ways to do this, but I needed a quick solution and this did it.