Paul Nicholls Stuff

21 May 2008

File Management

File management is a key function of the Operating System. The OS must effectively manage reading data from and writing data to different storage media, using different file systems and different disk access methods. The OS may also choose to cache data to improve disk performance.

The file management system controls access to the storage devices. Some typical examples are NTFS (Windows NT File System), FAT (File Allocation Table) and ext (Extended File System). The file management system has four main functions:

  • To allocate space on storage devices to data. Space is usually divided into slots or blocks called allocation units. The system must also deallocate space when a file is deleted.
  • To track which allocation units store which files, since a file may be spread across units.
  • To control access rights and permissions.
  • To map logical addresses to physical addresses.
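The first two functions can be pictured with a toy allocation table. This is only a sketch under stated assumptions: the class name `AllocationTable`, its methods and the lowest-free-unit policy are hypothetical and do not describe how NTFS, FAT or ext actually work.

```python
class AllocationTable:
    """Toy tracker of which allocation units belong to which file (hypothetical)."""

    def __init__(self, total_units: int):
        self.free = set(range(total_units))  # units not currently in use
        self.files = {}                      # file name -> list of unit numbers

    def allocate(self, name: str, units_needed: int) -> list:
        if units_needed > len(self.free):
            raise OSError("not enough free allocation units")
        # Take the lowest-numbered free units; they need not be adjacent.
        chosen = sorted(self.free)[:units_needed]
        self.free.difference_update(chosen)
        self.files.setdefault(name, []).extend(chosen)
        return chosen

    def delete(self, name: str) -> None:
        # Deallocation: the file's units go back into the free pool.
        self.free.update(self.files.pop(name))


table = AllocationTable(8)
table.allocate("a.txt", 3)
table.allocate("b.txt", 2)
table.delete("a.txt")
print(table.allocate("c.txt", 4))  # c.txt ends up spread across non-adjacent units
```

After the delete, the new file reuses the freed units and then skips over the units still held by b.txt, which is why the system has to track which units store which file.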

The file management system must allow all kinds of attributes of files to be stored, including the file's name, location in a directory structure, size, permissions, unique identifier, and information such as the time it was last accessed and modified.

A 'file' may be structured in a number of ways, and ultimately this is determined by a combination of the Operating System and the user application. The file could have no structure and just be a sequence of binary data. Alternatively, it could have a series of fixed or variable length fields with defined field and record separators. It could also have a very complex structure, such as a formatted document, which needs the correct application to interpret it.

When stored on a disk, however, a file is generally stored in fixed-size blocks. A block is the smallest unit of storage which can be addressed in a file I/O operation. This can lead to internal fragmentation, where space is wasted at the end of the last block of a file.
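As a rough worked example of internal fragmentation, the wasted space is the gap between the file size and the whole number of blocks it occupies. The function names and the 4,096-byte block size below are illustrative assumptions, not values from any particular file system.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Whole blocks required to hold file_size bytes (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


def internal_fragmentation(file_size: int, block_size: int) -> int:
    """Bytes left unused at the end of the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# A 10,000-byte file stored in 4,096-byte blocks (illustrative sizes).
print(blocks_needed(10_000, 4096))           # 3
print(internal_fragmentation(10_000, 4096))  # 2288
```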

The operating system may additionally use disk space as an extension of main memory, known as swap-space. This can be a "file" on the filesystem, or a separate disk partition. The kernel uses swap maps to track use of the swap-space and will allocate data to swap-space if it is forced out of physical memory.

File Access

Files can be accessed from a storage medium in two key ways:

  • Sequential. This method should be used in high hit-rate applications such as payroll, where a high proportion of the records are needed at one time, so the tape or disk can run all the way through, getting all the data in one go. It is not efficient if only a few records need to be accessed.
  • Direct. Direct (random-access) files are used when fast access to individual records is needed. For example, on a network the table of user names and passwords might be a direct file. It may also be used in a booking system where fast access to individual records is important.
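The difference between the two access methods can be sketched with fixed-length records. The record layout (a 4-byte id followed by a 16-byte name) and the helper names are assumptions made for this illustration, and an in-memory `io.BytesIO` stream stands in for a real file.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte unsigned id + 16-byte name.
RECORD = struct.Struct("<I16s")


def write_records(stream, records):
    for rec_id, name in records:
        stream.write(RECORD.pack(rec_id, name.encode("ascii")))


def _decode(chunk):
    rec_id, raw = RECORD.unpack(chunk)
    return rec_id, raw.rstrip(b"\0").decode("ascii")


def read_sequential(stream):
    """Read every record in order, starting from the beginning."""
    stream.seek(0)
    out = []
    while chunk := stream.read(RECORD.size):
        out.append(_decode(chunk))
    return out


def read_direct(stream, index):
    """Jump straight to record number `index` (0-based) without reading the rest."""
    stream.seek(index * RECORD.size)
    return _decode(stream.read(RECORD.size))


stream = io.BytesIO()
write_records(stream, [(1, "alice"), (2, "bob"), (3, "carol")])
print(read_direct(stream, 1))   # (2, 'bob')
print(read_sequential(stream))  # all three records, in order
```

Direct access works here because every record has the same size, so the byte offset of record `index` can be computed rather than searched for.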

File Allocation

File allocation is the process of allocating a file to blocks on the storage media.

Contiguous Allocation

Under this allocation scheme each file occupies a set of contiguous blocks on the disk. It is a very simple method to apply, since the file allocation scheme only needs to remember the starting block and the number of blocks that the file occupies. It is, however, very wasteful of space (free space becomes broken into gaps that are too small to reuse, known as external fragmentation) and does not easily allow files to grow.
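A minimal sketch of contiguous allocation, assuming a hypothetical `Extent` record that holds just the two values mentioned above (starting block and length):

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int   # first physical block of the file
    length: int  # number of blocks the file occupies


def physical_block(extent: Extent, logical: int) -> int:
    """Map a block number within the file to a block number on the disk."""
    if not 0 <= logical < extent.length:
        raise IndexError("logical block outside the file")
    return extent.start + logical


def can_grow_in_place(extent: Extent, free_blocks: set) -> bool:
    """The file can only grow in place if the block right after it is free."""
    return extent.start + extent.length in free_blocks


report = Extent(start=14, length=3)              # occupies blocks 14, 15, 16
print(physical_block(report, 2))                 # 16
print(can_grow_in_place(report, {17, 20}))       # True
print(can_grow_in_place(report, {18, 19, 20}))   # False: block 17 is taken
```

The address calculation is a single addition, which is what makes the scheme simple. The last call shows the growth problem: free blocks exist, but not directly after the file.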

Linked Allocation

In this system, a file is a linked list of disk blocks. Blocks may be scattered anywhere on the disk (depending on available space), and each block points to the next one in the linked list. This is very simple to store (only the start address needs to be remembered) but does make random access to a file difficult (access where you only want to consider part of an individual file), because reaching a given block means following the chain from the start.
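The cost of random access under linked allocation can be sketched as follows. The dictionary `next_block` is a stand-in for the pointer stored inside each disk block, and the block numbers are made up for the example.

```python
# Hypothetical disk: each block number maps to the next block of the same file.
# None marks the last block of the file.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}


def nth_block(next_block: dict, start: int, n: int) -> int:
    """Return the disk block holding logical block n of the file.

    Each step of the loop stands for one disk read, because the pointer to the
    following block is stored in the current block.
    """
    block = start
    for _ in range(n):
        block = next_block[block]
        if block is None:
            raise IndexError("logical block beyond end of file")
    return block


print(nth_block(next_block, start=9, n=0))  # 9: only the start address is needed
print(nth_block(next_block, start=9, n=3))  # 10: reached after following 3 pointers
```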

Indexed Allocation

With this system, each file has an index block that contains an array of its disk block addresses. This is quite a complex system to set up, as each file needs to have its own index block, and an index table may index the information from all of these blocks. The method does, however, allow random access and avoids the external fragmentation of contiguous allocation.
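For comparison, a sketch of the lookup under indexed allocation. The index block is modelled as a plain Python list and the block numbers are illustrative.

```python
# Hypothetical index block for one file: position i holds the disk address
# of the file's logical block i.
index_block = [9, 16, 1, 10, 25]


def lookup(index_block: list, logical: int) -> int:
    """Return the disk block for a logical block with a single array lookup."""
    if not 0 <= logical < len(index_block):
        raise IndexError("logical block beyond end of file")
    return index_block[logical]


print(lookup(index_block, 3))  # 10, found without visiting blocks 9, 16 or 1
```

Any block of the file can be located directly from the index, and the data blocks themselves can sit anywhere on the disk. The price is the extra index block per file.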

Disk Scheduling

The Operating System must use the hardware available to it efficiently. For disk drives this means having a fast access time and high disk bandwidth. The time it takes to access a file has two major components: the seek time (the time for the disk arm to move the heads to the correct cylinder) and the rotational latency (the time spent waiting for the disk to rotate to the desired sector). The disk bandwidth is the total number of bytes transferred divided by the total time between the first request and the completion of the last byte of transfer.
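These quantities can be put into a small calculation. The sketch assumes that, on average, the disk has to turn half a revolution before the desired sector arrives; the function names and the example figures (a 9 ms seek, 7,200 rpm) are illustrative.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the sector, assuming half a revolution on average."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Seek time plus average rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


def bandwidth_bytes_per_s(total_bytes: int, first_request_s: float,
                          last_completion_s: float) -> float:
    """Bytes transferred divided by the time from first request to last byte."""
    return total_bytes / (last_completion_s - first_request_s)


print(round(avg_rotational_latency_ms(7200), 2))      # 4.17
print(round(access_time_ms(9.0, 7200), 2))            # 13.17
print(bandwidth_bytes_per_s(1_000_000, 0.0, 0.5))     # 2000000.0
```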

Disk scheduling is controlled by an algorithm, much like processor scheduling:

  • First Come First Served (FCFS) follows the order of the queue and is slow but fair.
  • Shortest Seek Time First (SSTF) finds the request with the lowest seek time (given the current head location) and serves it first. It is similar to shortest-job-first and suffers the same pitfall: it can cause starvation of requests.
  • SCAN moves the disk arm from one end of the disk to the other, servicing requests as it goes, until it reaches the other side, where it reverses and the process continues.
  • C-SCAN (circular SCAN) is like SCAN, but the disk arm does not service requests on the return trip; instead it jumps back to the start of the disk.
  • LOOK is like SCAN, but rather than moving right to the end of the disk the arm only goes as far as the last request and then reverses direction.
  • C-LOOK is like C-SCAN, but rather than moving to the end of the disk the arm only goes as far as the last request, then jumps to the request nearest the start of the disk.
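The algorithms above can be compared by the total distance the head travels. The following is a simplified sketch, not a model of a real disk controller: it assumes all requests are queued up front, that the head starts out moving towards higher cylinder numbers, and that the return jump in C-SCAN and C-LOOK counts as head movement. The request queue, starting cylinder and function names are illustrative.

```python
def fcfs(requests, head):
    return list(requests)


def sstf(requests, head):
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def _split(requests, head):
    up = sorted(c for c in requests if c >= head)   # ahead of the head
    below = sorted(c for c in requests if c < head)  # behind the head
    return up, below


def scan(requests, head, max_cylinder):
    """Sweep up to the last cylinder, then reverse. The edge appears in the path."""
    up, below = _split(requests, head)
    return up + ([max_cylinder] + below[::-1] if below else [])


def c_scan(requests, head, max_cylinder):
    """Sweep up to the last cylinder, jump to cylinder 0, sweep up again."""
    up, below = _split(requests, head)
    return up + ([max_cylinder, 0] + below if below else [])


def look(requests, head):
    """Like scan, but reverse at the last request instead of the disk edge."""
    up, below = _split(requests, head)
    return up + below[::-1]


def c_look(requests, head):
    """Like c_scan, but jump from the last request to the lowest request."""
    up, below = _split(requests, head)
    return up + below


def total_movement(path, head):
    total = 0
    for cylinder in path:
        total += abs(cylinder - head)
        head = cylinder
    return total


queue, start, last = [98, 183, 37, 122, 14, 124, 65, 67], 53, 199
for name, path in [("FCFS", fcfs(queue, start)), ("SSTF", sstf(queue, start)),
                   ("SCAN", scan(queue, start, last)),
                   ("C-SCAN", c_scan(queue, start, last)),
                   ("LOOK", look(queue, start)), ("C-LOOK", c_look(queue, start))]:
    print(name, total_movement(path, start))
```

For this particular queue FCFS travels 640 cylinders and SSTF 236. The other results depend on the assumptions listed above, in particular the starting direction.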

Disk Management

Before a disk can be used, low-level formatting has to take place. This is the process of dividing the disk into sectors that the disk controller can read and write. The operating system also needs to hold its own records of the disk's data structures on the disk itself, including details of the logical formatting (such as "Drives") of the disk.

Bad blocks are the result of damage to the disk from its moving parts, and are inevitable. The disk controller maintains a list of bad blocks to ensure they are not used.

Disks can be used cooperatively; this system is known as RAID [See Using RAID for more].

File & Object Protection

Computer systems consist of a collection of objects: hardware and software. Each object has a unique name and can be accessed through a well-defined set of operations. The goal of protection is to ensure that each object is accessed correctly and only by those processes that are permitted to do so. This will ensure that there are no violations of access restrictions, and ensure resource usage is consistent with stated policies.
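One common way to picture this is a table recording which operations each process may perform on each object. The sketch below is only an illustration of the idea; the process names, object names and the `is_permitted` helper are all hypothetical.

```python
# Hypothetical access table: (process, object) -> operations it may perform.
ACCESS = {
    ("payroll_app", "salaries.dat"): {"read", "write"},
    ("report_app", "salaries.dat"): {"read"},
    ("report_app", "printer"): {"print"},
}


def is_permitted(process: str, obj: str, operation: str) -> bool:
    """Allow an operation only if it is explicitly listed for this pair."""
    return operation in ACCESS.get((process, obj), set())


print(is_permitted("report_app", "salaries.dat", "read"))   # True
print(is_permitted("report_app", "salaries.dat", "write"))  # False
print(is_permitted("guest_app", "printer", "print"))        # False: no entry at all
```

Anything not listed is refused, so a process with no entry for an object cannot access it.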

References:
  • Janet Lavery - Durham University Computer Systems, Operating Systems Lecture 8, 2007
  • Janet Lavery - Durham University Computer Systems, Operating Systems Lecture 9, 2007
  • Janet Lavery - Durham University Computer Systems, Operating Systems Lecture 10, 2007