SQLCompare via Command Console

Posted by Dan | Posted in Databases, SQL Server | Posted on 01-24-2010

0

If you haven’t heard of SQLCompare by RedGate, you’re missing out. It’s an amazing product. In summary, you can do the following (which I use it for 99% of the time):

  • Compare Schema / TSQL code from Different Databases
  • Syc Schema / TSQL code across two databases
  • Generate TSQL code from your comparison so you can use it for deployments
  • Generate reports on SQL Changes

Most of those things can be done via the GUI version of the tool. The product also comes with a command line version, which you can use it in your build script.

To script out your entire database via the command console, you can do the following:

SQLCompare /force /database1:YOURDATABASENAME /username1:sa /password1:password /server1:YOURDATABASESERVERNAME /makescripts:c:\x

To compare two databases and generate a report via the command console, you can do the following. The reason there’s so many switches is because you need to enter the database name and credentials for the two databases.

SQLCompare /force /database1:DB1NAME /username1:sa /password1:password /server1:SERVER1NAME /database2:DB2NAME /username2:sa /password2:password /server2:SERVER2NAME /report:c:\report.html /reporttype:Interactive

Logical vs Physical ER Diagrams

Posted by Dan | Posted in Databases, MySQL, SQL Server, SQLite | Posted on 01-24-2010

0

Logical diagrams are to convey requirements only. Physical diagrams represent the actual data structure to support the requirements and take into account technical scalability and speed.

Edit: I hate the way I had to format this document for this blog post. If you want this tutorial better formatted, check out the Word document.

One-to-Many Relationship


Logical

On ER/Studio, two tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID and a foreign key, StoredID (which is mapped to StoreID from the Store table).

Physical

On SQL Server, two tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID and a foreign key, StoredID (which is mapped to StoreID from the Store table).

Manager Table:

If you allow NULLs for StoreID in the Manager table, then you’ll be able to have a Manager without a store. If you don’t allow NULLs (leave it unchecked), then you’ll have to have at least one Store assigned to a Manager.

Querying

The above states that one store can have many managers. Here’s some sample data what’s in the tables:

SELECT * FROM Manager

SELECT * FROM Store

Get all manager information with for all managers that belong to a store:

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,  Manager.StoreID,
        Store.[Name],  Store.Address,  Store.STATE,  Store.City,  Store.Zip
FROM    Manager
        INNER JOIN Store ON Manager.StoreID = Store.StoreID


Get all manager information with for all managers (even if they don’t have a store):

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,  Manager.StoreID,
        Store.[Name],  Store.Address,  Store.STATE,  Store.City,  Store.Zip
FROM    Manager
        LEFT OUTER JOIN Store ON Manager.StoreID = Store.StoreID


Notice the NULL for Steamboat Willie. He doesn’t have a store, so all Store related fields show as NULL.


Many-to-Many Relationship


In order to implement this physically, you need a join table. In this case, we use StoreManager. Logically, you only need only two entities (Store and Manager).

Logical

Physical


On SQL Server, three tables are created. The Store table has a primary key StoreID. The Manager table has a primary key, ManagerID. The table StoreManager has two foreign keys:  StoredID (which is mapped to StoreID from the Store table) and ManagerID (which is mapped to the ManagerID from the Manager table).

StoreManager Table:

If you allow NULLs for StoreID and ManagerID in the StoreManager table, then you’ll be able to have a Manager without a store. If you don’t allow NULLs (leave it unchecked for both), then you’ll have to have at least one Store assigned to a Manager.

Here’s some sample data what’s in the tables:

SELECT * FROM Manager


SELECT * FROM Store


SELECT * FROM StoreManager


Get all manager information associated with his store:

SELECT  Manager.ManagerID,  Manager.FirstName,  Manager.LastName,
        StoreManager.StoreID, StoreManager.ManagerID, Store.StoreID,
        Store.[Name],Store.Address, Store.STATE, Store.City, Store.Zip
FROM    Store
        INNER JOIN StoreManager ON Store.StoreID = StoreManager.StoreID
        INNER JOIN Manager ON StoreManager.ManagerID = Manager.ManagerID

Generate DateTime based on Time Offset

Posted by Dan | Posted in SQL Server | Posted on 01-17-2010

0

So let’s say you have a table with a column of type DateTime. Now you have to support timezones. The first thing you create is a table of time offsets (Google Time Offsets). The table would look like this:

Here’s schema for it. You can download the full script with the data as well.

CREATE TABLE [dbo].[TimeZones](
	[TimeZoneID] [INT] IDENTITY(1,1) NOT NULL,
	[Offset] [VARCHAR](15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[TimeLabel] [VARCHAR](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_TimeZones] PRIMARY KEY CLUSTERED 
(
	[TimeZoneID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

Now for the purpose of this tutorial, let’s create an event table. This will have a set of dates/times which we’ll use to add the timeoffsets to. The table looks like this:

and here’s the script:

CREATE TABLE [dbo].[TimeEvent](
	[TimeEventID] [INT] IDENTITY(1,1) NOT NULL,
	[EventName] [VARCHAR](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[EventDateTime] [DATETIME] NULL,
	[TimeZoneID] [INT] NULL,
 CONSTRAINT [PK_TimeEvent] PRIMARY KEY CLUSTERED 
(
	[TimeEventID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Now let’s INNER JOIN the tables properly adding the timeoffset to the event datetime:

SELECT  tz.TimeZoneID,
        tz.Offset,        
        te.TimeEventID,
        te.EventName,
        te.EventDateTime AS 'OriginalEventDateTime',
 
        -- Hour - Extract Hour from Offset
        SUBSTRING(Offset, 0, CHARINDEX(':', Offset, 0)) AS 'Hour',
 
        -- Minute - Extract Minute from Offset        
        SUBSTRING(Offset, CHARINDEX(':', Offset, 0) + 1, LEN(Offset)) AS 'Minute',
 
        -- New Date with UTC Addition
        DATEADD(HOUR,
                CAST(SUBSTRING(Offset, 0, CHARINDEX(':', Offset, 0)) AS INT),
                DATEADD(MINUTE,
                        CAST(SUBSTRING(Offset, CHARINDEX(':', Offset, 0) + 1,
                                       LEN(Offset)) AS INT), EventDateTime -- DateTime change via Offset
                         )) AS 'EventDateTimeWithTimeZone'
FROM    TimeZones tz
        INNER JOIN TimeEvent te ON tz.TimeZoneID = te.TimeZoneID

Here’s the result:

Script Out Table Data in SQL Server 2008

Posted by Dan | Posted in Databases, SQL Server | Posted on 01-15-2010

0

SQL Server 2008 has a new feature where you can script out data in tables via INSERT statements. Just do the following:


1. Right click on the database and select "Generate Scripts..."


2. Select the database.


3. This is the hidden part. Set "Script Data" to true.


4. Select Tables.


5. Select Tables.


6. You can script out a table script per file or as a single file. I prefer ANSI text.


7. Hit Finish.



Add a Covered Index to tables to get rid of Bookmark Lookups

Posted by Dan | Posted in SQL Server | Posted on 01-11-2010

0

Add a Covered Index to tables to get rid of “bookmark lookups.” The idea of bookmarks: SQL Server uses Bookmarks when you use columns in your query that are not non-clustered/clustered indexes. So if we have something like this:

SELECT  AreaID,
        ContentID,
        ApprovalCodeID,
        isSubmissionPoint
 
FROM    ContentAreas
 
WHERE   AreaID = 31
        AND ContentID > 30000
        AND ModifierID > 65000

And let’s say that AreaID is a clustered index and ContentID is a non-clustered index. If ModifierID is not an index of any kind, it would function as a “bookmark lookup” in this query. To not use a “bookmark lookup,” use a Covered Index, which clumping the Modifier column with ContentID and turning both into a Covered Index (a kind of composite index).

Order of Indexing Matters

Posted by Dan | Posted in SQL Server | Posted on 01-11-2010

0

In SQL Server, the order that you declare Indexes matters! So take the following example:

And we run this query:

SELECT  CAST(ca.AreaID AS VARCHAR(4)),
        RemovalDate
FROM    ContentAreas ca WITH ( NOLOCK )
WHERE   ContentID = 232232
        AND removerID = 34343

If you are to run the following query, since you are using ContentID as part of the WHERE clause, you want to put the ContentID’s index at the top of the indexes/key in SSMS. It speeds up retrieval.  Also put RemoverID below that – test with viewing the execution plan first to see if it makes a difference. Remember, the more indexes you add, the more it takes to update the index.

Python and SQL Server

Posted by Dan | Posted in Python, SQL Server | Posted on 12-27-2009

0

Setting up Python to connect to SQL Server was relatively easy. First, you select a DB API driver. I chose pyodbc because I saw a Python article on Simple-Talk. There are two simple steps:

  1. Install Pywin32. Get the latest. It’s a dependency for pyodbc.
  2. Install pyodbc. Get it for the version of Python you’re using.

Once you’ve done this, you can query your SQL Server db as so:

import pyodbc
 
connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MyAwesomeDB;UID=sa;PWD=password')
cursor = connection.cursor()
 
cursor.execute("select * from states")
 
for row in cursor:
  print row.StateID, row.Abbreviation, row.Name

For more snippets and a tutorial, check out the documentation.

Now let’s try something more interesting. Let’s try doing some inserts and see how long it takes.

import win32api
import uuid
import pyodbc 
 
connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MrSkittles;UID=sa;PWD=password')
cursor = connection.cursor()
 
_start = win32api.GetTickCount()
 
for i in range( 0, 10000 ):  
  # Let's insert two pieces of data, both random UUIDs. 
  sql = "INSERT INTO Manager VALUES( '" + str( uuid.uuid4() ) + "', '" + str( uuid.uuid4() ) + "' )"  
  cursor.execute( sql )
  connection.commit()
 
_end = win32api.GetTickCount()
_total = _end - _start
 
print "\n\nProcess took", _total * .001, "seconds"

After some tests, 10,000 records took roughly 20-30 seconds. 1,000,000 records took 30 to 40 minutes. A bit slow, but it’s not a server machine. My machine is a Core Duo, 1.8Ghz x 2, at ~4GB with PAE on WindowsXP, but I ran this on a VMware VM with 1GB and SQL Server 2005 w/Windows Server 2003. The table was a two column table both varchar(50). On a server machine, it should be a helluva lot faster.

IIS Logs Scripts

Posted by Dan | Posted in IIS, Python, SQL Server | Posted on 12-12-2009

0

While working with some IIS logs, I decided to start practicing my Python. I put together some handy Python functions to work with IIS Log files. These will come in handy. On a 3GB, 2.5GHz, running WinXP machine, these functions take about 3 seconds to process a 180MB Text file. Python code could be optimized to be faster if you’re dealing with larger sized files.

#!/usr/bin/env python
 
# An IIS log file can have various log properties. Everytime you add new columns to log for
# in IIS, it creates a new row full of columns.
import re
import os
 
MainLogDelimiter = "#Software: Microsoft Internet Information Services 6.0"
TestFile         = "C:\\Dan\\IIS-Log-Import\\Logs\\not-the-same.txt"
BigTestFile      = "C:\\Dan\\IIS-Log-Import\\Logs\\ex090914\\ex090914.log"
LogsDir          = "C:\\Dan\\IIS-Log-Import\\Logs"
 
def SearchForFile( rootpath, searchfor, includepath = 0 ):
 
  # Search for a file recursively from a root directory.
  #  rootpath  = root directory to start searching from.
  #  searchfor = regexp to search for, e.g.:
  #                 search for *.jpg : \.exe$                     
  #  includepath = appends the full path to the file
  #                this attribute is optional
  # Returns a list of filenames that can be used to loop
  # through.
  #
  # TODO: Use the glob module instead. Could be faster.  
  names = []
  append = ""
  for root, dirs, files in os.walk( rootpath ): 
    for name in files:
      if re.search( searchfor, name ):
        if includepath == 0:
          root = ""          
        else:          
          append = "\\"
        names.append( root + append + name )        
  return names  
 
 
def isSameLogProperties( FILE ):
  # Tests to see if a log file has the same number of columns throughout
  # This is in case new column properties were added/subtracted in the course
  # of the log file.
  FILE.seek( 0, 0 )
  SubLogs = FILE.read().split( MainLogDelimiter )
 
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1    
 
  # Grab the column names from the log file, separated by space
  columns = re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[1], re.IGNORECASE | re.MULTILINE ).group(1)   
  LogSameProperties = True
 
  for i in range( 2, SubLogs[0] + 1 ):
    # If there are columns
    if ( len( columns ) > 0 ):    
      if ( columns != re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1) ):        
        LogSameProperties = False
        break  
 
  return LogSameProperties
 
 
def getFirstColumn( FILE ):
  # This gets the columns from a log file. It returns only the first columns, and ignores another column
  # row that may exist in case new columns were added/subtracted in IIS. 
  # input: FILE
  # output: 1 single element List
  FILE.seek( 0, 0 )
  names = []
  # Grab the column names from the log file, separated by space
  names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", FILE.read().split( MainLogDelimiter )[1], re.IGNORECASE | re.MULTILINE ).group(1).strip() )
  return names
 
 
def getAllColumns( FILE ):
  # This gets all the columns from a log file. 
  # input: FILE
  # output: List
  FILE.seek( 0, 0 )  
  names = []
  SubLogs = FILE.read().split( MainLogDelimiter )    
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1        
  for i in range( 1, SubLogs[0] + 1 ):        
    names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1).strip() )  
  return names  
 
 
# EXAMPLE:
# Loop through all the IIS log files in the directory
# for file in SearchForFile( LogsDir, "\.txt$", 1 ):  
LogFile = open( file, "r" )
if ( isSameLogProperties( LogFile ) ):
  print file, "the same"
else:
  print file, "not the same"
LogFile.close()

Bulk Import Ignoring Identity Column

Posted by Dan | Posted in Databases, SQL Server | Posted on 11-25-2009

0

Ever have to bulk import a text file (e.g. from Excel, or tabular delimited rows) into a table that had the first column be an identity auto-incrementing primary key? Yes, you could create a format file that skips the identity column, so that the first column of your text file doesn’t go into the identity column of your table. This MSDN page shows more about it.

The quick way though, is to create a view of that table and omit the identity column when you create the view. In this manner, your first column in the text file won’t map to the identity column and throw one of those delicious BCP errors.

Import MySQL Data into SQL Server

Posted by Dan | Posted in Databases, MySQL, SQL Server | Posted on 10-05-2009

0

Today I needed to analyze some forum data from vBulletin running MySQL. The table on MySQL had 60,000 records. Because my playing field is SQL Server and not MySQL, and I needed to slice and dice the data, I needed a way to get the data onto SQL Server. Because of some security restrictions, I could not set up a linked server on SQL Server. I don’t have remote access to the Linux box either. I tried exporting from SQLYog, but CSV data could not be properly delimited and failed when I did a database import via the SSIS import wizard (the table has a lot of flexability to use any character and is often abused by spammers). What did I do?

I only had 4 columns to import for the table. So I ran a select statement returning one column ordered by the id. Then I copied and pasted into an Excel spreadsheet. I did this for all four rows. Because Excel doesn’t use delimiters, but rather cells to separate, I didn’t have to worry about data breaking. Then after that, I did an import via the SSIS import wizard. Ta-da, I can now slice and dice my data. There are probably more efficient ways to do this, but I needed a quick solution and this did it.