Welcome to The Next Byte.
The Gentle Care and Feeding of Log files:
- What are log files and why do we care?
- How are log files configured?
- What else do we need to know about log files?
In the beginning, SQL existed over all the realms of information. SQL saw the need to store and retrieve that information.
So SQL created a file and called it Data.
But SQL noticed how, after days and giga-CPU cycles, Data was empty and unfilled.
Data was lonely. All information was still unstored.
SQL said “I will cure this emptiness.” So SQL created a second file and called it Log. SQL joined Log to Data.
SQL blessed this union and named it Database. SQL then told Data to be fruitful and multiply. SQL enabled Log to fill Data with all manners of content; with char and varchar, with text, decimal, numeric, with int and bigint. Content multiplied in both type and amount even unto Booleans and BLOBs. Data was happy, and SQL was pleased.
Purpose of log files
All SQL Server databases have a minimum of two files:
- A data file, which is the ultimate repository of all data posted into the database. This is also generally the source of retrieved data.
- A log file. This is the initial repository of all data contained in the database. That is, when data is first passed from the user to the database, it is written to the log file. All data is written as part of either an explicit or an implicit transaction, and most transactions have more than one element. When the first element is inserted into the log file, it remains there, unposted to the data file, while SQL Server monitors the transaction until it completes. Completion can occur in one of several ways, e.g. when a Commit instruction is executed. When the transaction is complete, SQL Server marks it as such. Only after the transaction is marked complete does its data become eligible to be written to the data file.
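In T-SQL, that flow can be sketched with a hypothetical explicit transaction; the table, column, and account values below are invented for illustration:

```sql
-- A hypothetical two-element transfer. Both statements live in the log file
-- until COMMIT marks the transaction complete; only then is the data
-- eligible to be written to the data file.
BEGIN TRANSACTION;

UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;  -- the debit
UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;  -- the credit

COMMIT TRANSACTION;  -- the transaction is now marked complete in the log
```

Had the server failed before the COMMIT, SQL Server would roll back the debit rather than post half a transfer.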
The reason for temporarily storing data in a log file is two-fold:
- To ensure that data which is part of a multi-element transaction does not get posted to the data file until the entire transaction is complete. Data would lose integrity if part of a transaction were posted and part were lost. This loss could occur if the transaction was cancelled or rolled back by the user, or interrupted by an equipment failure, before it completed. Had the first data element been immediately posted, say a debit amount, then the corresponding element, the credit, would never be posted, and the database would contain inaccurate data.
- To enable data to be restored to a specific point in time, as opposed to restoring either all or none of a backup file. Data stored in log files can be backed up to a restorable backup file. This log backup can, in turn, be partially restored on top of a full backup restoration if desired. You might want to do this if a full backup was made at noon, and at 12:30 the main inventory table was inadvertently deleted. The backup of the transaction log could be restored to 12:29, thereby leaving the inventory table intact.
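A sketch of that noon-backup scenario in T-SQL; the database name, backup file paths, and timestamp are all invented for illustration:

```sql
-- Restore the noon full backup, but leave the database able to accept
-- further restores (NORECOVERY).
RESTORE DATABASE Inventory
    FROM DISK = N'D:\Backups\Inventory_Full.bak'
    WITH NORECOVERY;

-- Replay the transaction log backup, but stop at 12:29, one minute
-- before the inventory table was deleted.
RESTORE LOG Inventory
    FROM DISK = N'D:\Backups\Inventory_Log.trn'
    WITH STOPAT = N'2012-06-01 12:29:00', RECOVERY;
```

After the STOPAT restore, the database is online as of 12:29, with the inventory table intact.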
Characteristics of Log files:
Data written to log files is written in a serial manner. That is, the drive head writes the data out in a single line, moving to the next concentric track as each lap of the platter completes. This shouldn’t be thought of as stringing you along, as this is the fastest way to write data to a disk. Writes to a data file are usually much slower, as the drive head has to seek the location within each logical table where the data belongs. Consequently, log files are usually not part of a RAID 5 structure, as that would significantly slow down processing. If RAID is used, the log partition is usually RAID 1.
For performance reasons, we typically place logs on their own partition. If RAID is used, we have one partition as RAID 1, and the others as RAID 5. Another reason to place logs on their own partition is that, as explained above, data must be written first to the log, and then to the data file. If logs are placed on the same partition as the data files, the same write-head which writes the log must also write to the data partition. If logs are placed on their own partition, logs and data can be written in parallel. This doesn’t mean the same data being written to the log can be written to, in parallel, the data file. Rather, completed transactions already existing in the log file can be written to the data file simultaneously with newer, uncompleted transactions being written to the log file.
Kinda two for the price of one.
Configuring log files:
Setting the database backup mode (formally, the recovery model):
As described above, data is constantly written to the log file. This causes the log file to grow until the data is deleted.
Data can be deleted one of two ways:
- By backing up the log file to a backup file. As soon as this completes, all completed transactions in the log are purged. This occurs when the database is configured for the “FULL” backup mode.
- By periodic, automatic purging of completed transactions via a process called “checkpointing”. This occurs when the database is configured for the “SIMPLE” backup mode. These purges occur every few seconds in somewhat of a random fashion. The purge isn’t random; its timing is. SQL Server checkpoints when it has nothing else to do.
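Both modes can be inspected and set in T-SQL; the database name here is hypothetical:

```sql
-- Check the current recovery model of a hypothetical database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'Inventory';

-- FULL: completed transactions stay in the log until a log backup purges them.
ALTER DATABASE Inventory SET RECOVERY FULL;
BACKUP LOG Inventory TO DISK = N'D:\Backups\Inventory_Log.trn';

-- SIMPLE: checkpoints purge completed transactions automatically.
ALTER DATABASE Inventory SET RECOVERY SIMPLE;
```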
Problems can arise if the database is configured for the “FULL” mode but the log is never backed up: it just gets bigger and bigger. If the transaction log fills all the space it’s authorized, or all the available space on the drive, the database will lock up. No data will be accepted until the log can expand again. If the log fills because it has reached its maximum specified size, then increase the maximum size to allow the log to grow. It is more difficult if the drive has run out of space and cannot be expanded.
When this occurs, the solution is simple, really. Just change the backup mode to “SIMPLE” in SQL Server Management Studio. Next, right-click the database, select “Tasks”, select “Shrink”, select “Files”, and shrink the log file. I’ve never had problems doing this.
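The same fix can be done in T-SQL, assuming a hypothetical database and logical log file name:

```sql
-- Switch to SIMPLE so completed transactions are purged at checkpoints.
ALTER DATABASE Inventory SET RECOVERY SIMPLE;

USE Inventory;
-- The logical file name here is invented; find yours with:
--   SELECT name, type_desc FROM sys.database_files;
DBCC SHRINKFILE (N'Inventory_log', 100);  -- target size in megabytes
```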
(This is true of SQL Server 2008 R2. Older versions get more difficult, but as I know all of you have either upgraded to 2008 R2 or 2012, let’s not bore you with those details. I’ll describe how to handle the older versions if you email me. Otherwise just upgrade, wink wink)
There’s one occasion where this won’t work: when the database is processing a large open transaction. SQL Server cannot delete an open transaction. If the log fills during a long transaction, then the process must be killed. If the transaction cannot be identified or killed, then the database server should be restarted. This will throw the transaction into rollback. Rollback usually takes as long as the transaction took to grow in the first place. But to date, there’s no other choice I’m aware of.
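Identifying the offending transaction can be sketched with DBCC OPENTRAN; the database name and session ID below are invented:

```sql
-- Report the oldest active transaction in a hypothetical database,
-- including the session (SPID) that owns it.
DBCC OPENTRAN (N'Inventory');

-- If that session must go, kill it by the SPID reported above.
-- 53 is an invented example; killing throws the transaction into rollback.
KILL 53;
```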
When a database server is installed, several initial configurations should be done to ensure the best performance from log files:
- Initial creation size
- Auto growth type
- Auto growth amount
- Maximum size
When SQL Server creates a log file, it needs to know the initial size. The default is 1 megabyte, but it can be set to whatever value is desired. The impact of this size is the number of times the log file has to be grown to reach the needed size. Each time a log grows, the database is impacted for several reasons:
Log files cannot use instant file initialization, so new log space must be zeroed out. Each time the log grows:
- The database is locked until the new space is zeroed. The more times a log grows, the more times the database is locked.
- SQL Server generates between 4 and 16 virtual log files (VLFs). These VLFs are individually inspected whenever the database server is rebooted. This inspection takes time, as much as 45 minutes if a server has, say, 25,000 VLFs. That’s time the user spends waiting for recovery to complete, and no one can access a database while it’s in recovery.
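You can count the VLFs in a database yourself; each row returned is one VLF (the database name is assumed):

```sql
USE Inventory;  -- hypothetical database name
-- Returns one row per virtual log file in the current database's log;
-- the row count is the VLF count.
DBCC LOGINFO;
```

This is the SQL Server 2008 R2 era tool; the row count is what matters, so a log with tens of thousands of rows here is a log that will slow down recovery.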
I usually set the initial size to 100 megabytes. I also set the auto growth amount to 100 megabytes. This means the log will be grown only one-hundredth as many times as it would have been with the defaults. On potentially very large databases, say 300 gigs or more, I set both initial size and growth to 1 gigabyte. Each time an increment is added, at most 16 VLFs are added. If a log grows to 1 terabyte via the defaults, the database was locked a million times, and hundreds of thousands of VLFs were created. If it grows at 1 gigabyte, the database was locked only 1,000 times and gained only thousands of VLFs.
While the locked time for a single 1 megabyte growth is much less than for a 1 gigabyte growth, one million lock-and-zero pauses are added over the life of the file, versus only 1,000 at 1 gigabyte. I suspect the customers experiencing the million expansions thought they had a very slow database when they probably didn’t. They just had a database using the out-of-package defaults.
SQL Server enables auto growth to be computed either as a set percentage of the existing size or as a specific size, such as 100 megabytes. Most DBAs use the specific size to better control the amount of expansion. 10% of a 100 megabyte file is 10 megabytes, but 10% of a 10 gigabyte log is 1 gig, and that much might not be available.
SQL Server also enables the DBA to configure a maximum possible size for the log file. The desirable size depends on how big the data file is expected to be, what recovery mode the database is in, and what type of processing is expected to occur. At Fpweb.net, I usually set the max size at 3 gigs. This is normally a very large amount of space, considering we set our databases in SIMPLE mode. That mode means the log is cleared every few seconds, so it shouldn’t grow very large.
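All four settings can be sketched in one statement, assuming hypothetical database and logical file names:

```sql
-- The logical log file name is invented; check sys.database_files for yours.
ALTER DATABASE Inventory
MODIFY FILE (
    NAME = N'Inventory_log',
    SIZE = 100MB,        -- grow the file to this size now, in one step
    FILEGROWTH = 100MB,  -- fixed-size auto growth rather than a percentage
    MAXSIZE = 3GB        -- cap so a runaway log cannot fill the drive
);
```

Note that MODIFY FILE can only grow SIZE past the file's current size; shrinking is a separate operation (DBCC SHRINKFILE).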
Many DBAs suggest setting the initial size of logs (and data files as well) to the maximum probable size for the database. In the SharePoint world, I have met few people who can accurately do that. They either wildly overestimate the size, paying for much wasted space, or have nowhere near enough, resulting in constant additions and expansions.
Life is so unpredictable.
Other things we should be aware of:
A given database can have as many as 16 log files. Adding log files usually doesn’t help performance. However, if the database needs to be in “FULL” backup mode, and more space is needed than the drive can provide, then opening an additional log file on another partition can alleviate the problem.
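Opening a second log file on another partition might look like this; all names and paths are invented:

```sql
-- Add a second log file on a different partition to relieve a full drive.
ALTER DATABASE Inventory
ADD LOG FILE (
    NAME = N'Inventory_log2',
    FILENAME = N'E:\Logs\Inventory_log2.ldf',
    SIZE = 100MB,
    FILEGROWTH = 100MB
);
```

As noted above, files added after the first can later be emptied and removed if they are no longer needed.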
Logs cannot be shrunk smaller than the size at which they were originally created. Those are the rules.
If a data base has multiple log files, then any of those created after the first file can be emptied and deleted if no longer needed.
When a user requests data which hasn’t been posted to the data file, then the data might be retrieved from the log file. This depends on the transaction isolation level. This is a subject for at least one Next Byte. So I won’t go into it now—how’s that for a tease?
When a database is backed up, the transaction log becomes part of the backup file; it will not be a separate backup file. However, new data added during the backup isn’t included in the backup file being generated. The new data is stored in a different part of the transaction log, stays there while the old completed transactions are purged, and will be included in the next log backup.
It is possible to restart a database whose log file has been corrupted, but this is time-consuming and problematic.
It is possible, but very tedious to read the data in a log file.
A log backup can only be restored to the database of which it is a member, as part of the logical backup set. A logical backup set comprises all the transaction log backups and differential backups made in chronological sequence since a given full backup.
Here is a schema showing the flow of data noting the entry into the log and from the log to the data file.
In this Next Byte, we:
- Explained what transaction logs are, and why we should care about them.
- Discussed how log files are configured, and some of the configuration considerations.
- Described other interesting tidbits about log files.
…Until the Next Byte.
Please review my other posts on Microsoft SharePoint and SQL Server for more useful information on the subject.