Discussing SQL Server Files and Filegroups
- What are they?
- Why are they important?
- What do I need to know about them?
All will be revealed as we begin…
I was working industrially last week when in walks Chris, the company President. After the niceties, he asked:
“Don, I was reading an article this morning about SQL Server where it described the various types of files within SQL Server, data files, log files, more log files, blobs, binaries, and text files, among others. I started thinking (dangerous!). We often talk about files. Just what is a file? What files does SQL Server have? How do I treat one file differently than another? And what is a filegroup—several files hooked together?”
I’m paraphrasing of course because he talks for much longer… but always eager to impart knowledge, I replied:
“No, Chris. A filegroup is a virtual entity used by DBAs to handle several files as if they were a single file. Chris, a file is merely a location on a storage device. It has a name and is locatable. It contains either text that can be read, or machine language that can be executed. For the most part, SQL has Data files which contain data that can be read, executables that run the program, log files that contain data until it is posted into the data files, and log text files which can be read to understand what processing is going on.
Filegroups on the other hand, are not physical objects. Filegroups enable moving or backing up or restoring some data files as a group, as opposed to being processed individually. You cannot see a filegroup. You can only tell which filegroup a file belongs to. Say you have a database which contains 12 data files grouped into three filegroups of four each. You could back up or restore just four files, if that would fill your need, instead of all 12 files. Hopefully this would save you lots of time.”
“Don, filegroups won’t save me any time; I’m not a DBA. So I can get a single file per database, or can you share files between databases? I doubt I’d ever need more than one file per database.”
“No, Chris. Each database gets a minimum of two files, one data and one log. Files cannot be share between databases. Every data file belongs to a filegroup. A SQL Server database can have as many as 32 thousand files, but few ever have more that six or eight. A database can have more than one filegroup. Only data files are in filegroups. Log files have no filegroups.”
“Why Don? Sounds wasteful to me”.
We pause to take sips from our coffees. All this talking can make you quite parched.
“Don, just why would you ever want more than one data file per database?”
“Performance, Chris, performance. SQL opens threads for each partition or hard drive in the database. The more threads used by the server, the better the performance. Files cannot straddle partitions or drives. So each subsequent partition available to a database requires a different data file be created on it to benefit from the additional thread.”
“Don, so many rules. Are all data files alike?”
“No, Chris. That would be too simple. Files can have different initial sizes, growth rates and maximum sizes. Typically, here at Fpweb.net, we open files at 100 megs unless we know they’re going to grow into the 10’s of gigabytes. Then we create them at one gigabyte. We grow them at 100 megabytes per step, unless we believe they’ll grow, again, into the 10’s of gigabytes. Then we grow them at one gigabyte per instance. A company just has to try various amounts and see what works best for them.”
“Why grow the files, Don? Why don’t you just make them the size they should be when you first create them?”
“Chris, I’m just not smart enough to guess what size a given database in SharePoint will grow to. We have 15 site databases in our farm. One has 100 megabytes of data. Three are over 50 gigabytes, each. When we put the farm in, no one ever thought we’d have more than 20 gigabytes of content for all the sites.”
“Obviously you’re not executive material, Don. You should learn to dream ahead and think big. Well, does it take long to grow the files?”
I raise an eyebrow at his sarcasm.
“Chris, that depends. When either data or log files are expanded, the new disk area has to be cleaned up by writing zeros to all the bit locations before any data is written to the bit. This is called ‘zeroing out’. When log files are expanded that new area must be zeroed out before any of the area is used. If the new area is large, say a gigabyte, zeroing out can take a minute or more. During this time the database is locked. Users often think they are experiencing poor performance when the problem is really the log is expanding.
Data files also expand, but they don’t have to take as much time. There is a permission granted by the operating system which enables data files to do the zero out before each bit is first written to. This is called, uniquely enough, ‘zero initialization’.
However, this permission is not granted by the SQL Server. It is granted to the SQL Server service log in account. One cannot even directly determine within SQL Server whether the log in account has the permission.”
“ OK, I’ll bite. Don, how can you determine if you have the permission?”
“Chris, we cheat, and simply create a large test database. Say one of 10 gigs, and watch how long it takes to create. If it takes more than three minutes, we probably don’t have the permission. We then ask the system administrator to grant the ‘Perform Volume Maintenance Tasks’ permission to the log in.”
“Sounds simple, Don. Everyone should have that permission. Just make it so. Anything else I should know about files?”
“Well, there’s the question of whether to enable auto-shrinking. The answer is no. The database file tries to shrink virtually every time it deletes data, and then needs to expand as soon as more data is added. It locks the database each time this happens so performance goes downhill immediately. We could also talk about tempdb, and table-partitioning but these will take some time.”
“Don, it’s late, I’m tired, and maybe a little bored… Maybe some other time.”
Chris leaves and I return to my work, harboring more knowledge to share… on the next Byte.