Saturday, January 7, 2012

Command 'fsck ' in Solaris.

Checking and Repairing Unix File system with fsck

fsck is a Unix utility for checking and repairing file system inconsistencies . File system can become inconsistent due to several reasons and the most common is abnormal shutdown due to hardware failure , power failure or switching off the system without proper shutdown. Due to these reasons the superblock in a file system is not updated and has mismatched information relating to system data blocks, free blocks and inodes .

fsck – Modes of operation :

fsck operates in two modes interactive and non interactive :
Interactive – fsck examines the file system and stops at each error it finds in the file system and gives the problem description and ask for user response whether to correct the problem or continue without making any change to the file system.

Non interactive :fsck tries to repair all the problems it finds in a file system without stopping for user response useful in case of a large number of inconsistencies in a file system but has the disadvantage of removing some useful files which are detected to be corrupt .

If file system is found to have problem at the booting time non interactive fsck is run and all errors which are considered safe to correct are corrected. But if still file system has problems the system boots in single user mode asking for user to manually run the fsck to correct the problems in file system.

Running fsck :
fsck should always be run in a single user mode which ensures proper repair of file system . If it is run in a busy system where the file system is changing constantly fsck may see the changes as inconsistencies and may corrupt the file system .

If the system can not be brought in a single user mode fsck should be run on the partitions ,other than root & usr , after unmounting them . Root & usr partitions can not be unmounted . If the system fails to come up due to root/usr files system corruption the system can be booted with CD and root/usr partitions can be repaired using fsck.

command syntax
fsck [ -F fstype] [-V] [-yY] [-o options] special
-F fstype type of file system to be repaired ( ufs , vxfs etc)
-V verify the command line syntax but do not run the command
-y or -Y Run the command in non interactive mode – repair all errors encountered without waiting for user response.
-o options Three options can be specified with -o flag
b=n where n is the number of next super block if primary super block is corrupted in a file system .
p option used to make safe repair options during the booting process.
f force the file system check regardless of its clean flag.
special – Block or character device name of the file system to be checked/repaired – for example /dev/rdsk/c0t3d0s4 .Character device should be used for consistencies check & repair
fsck phases
fsck checks the file system in a series of 5 pages and checks a specific functionality of file system in each phase.
** phase 1 – Check Blocks and Sizes
** phase 2 – Check Pathnames
** phase 3 – Check Connectivity
** phase 4 – Check Reference Counts
** phase 5 – Check Cylinder Groups
fsck error messages & Corrective action :

1. Corrupted superblock – fsck fails to run
If the superblock is corrupted the file system still can be repaired using alternate superblock which are formed while making new file system .
the first alternate superblock number is 32 and others superblock numbers can be found using the following command :
newfs -N /dev/rdsk/c0t0d0s6
for example to run fsck using first alternate superblock following command is used
fsck -F ufs -o b=32 /dev/rdsk/c0t0d0s6

2. Link counter adjustment
fsck finds mismatch between directory inode link counts and actual directory links and prompts for adjustment in case of interactive operation. Link count adjustments are considered to be a safe operation in a file system and should be repaired by giving ‘y’ response to the adjust ? prompt during fsck.

3. Free Block count salvage
During fsck the number of free blocks listed in a superblock and actual unallocated free blocks count does not match. fsck inform this mismatch and asks to salvage free block count to synchronize the superblock count. This error can be corrected without any potential problem to the file system or files.

4. Unreferenced file reconnection
While checking connectivity fsck finds some inodes which are allocated but not referenced – not attached to any directory . Answering y to reconnect message by fsck links these files to the lost+found directory with their inode number as their name .
To get more info about the files in lost+found ‘file’ command can be used to see the type of files and subsequently they can be opened in their applications or text editors to find out about their contents. If the file is found to be correct it can be used after copying to some other directory and renaming it.

Booting Process Explained

Booting Process in Solaris

Understanding the booting process is important in the sense that you can get a clear idea when a system faces a booting problem if you are familiar with the booting sequence and steps involved. You can thereby isolate a booting phase and quickly resolve the issues.
Booting process in Solaris can be divided in to different phases for ease of study . First phase starts at the time of switching on the machine and is boot prom level , it displays a identification banner mentioning machine host id serial no , architecture type memory and Ethernet address This is followed by the self test of various systems in the machine.
This process ultimately looks for the default boot device and reads the boot program from the boot block which is located on the 1-15 blocks of boot device. The boot block contains the ufs file system reader which is required by the next boot processes.

The ufs file system reader opens the boot device and loads the secondary boot program from /usr/platform/`uname –i`/ufsboot ( uname –i expands to system architecture type)
The boot program above loads a platform specific kernel along with a generic solaris kernel
The kernel initialize itself and load modules which are required to mount the root partition for continuing the booting process.

The booting process undergoes the following phases afterwards :
1) init phase
2) inittab file
3) rc scripts & Run Level

1. INIT phase
Init phase is started by the execution of /sbin/init program and starts other processes after reading the /etc/inittab file as per the directives in the /etc/inittab file .
Two most important functions of init are
a) It runs the processes to bring the system to the default run level state ( Run level 3 in Solaris , defined by initdefault parameter in /etc/inittab )
b) It controls the transition between different run levels by executing appropriate rc scripts to start and the stop the processes for that run level.

2. /etc/inittab file
This file states the default run level and some actions to be performed while the system reaches up to that level. The fields and their explanation are as follows :
S3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console
S3 denotes a identification if the line
3 is run level
wait is action to be performed
/sbin/rc3 is the command to be run.
So the fields in the inittab are
Identification : run level : action : process
The complete line thus means run the command /sbin/rc3 at run level 3 and wait until the rc3 process is complete.
The action field can have any of the following keywords :
Initdefault : default run level of the system
Respawn : start and restart the process if it stops.
Powerfail : stop on powerfail
Sysinit : start and wait till console in accessible .
Wait : wait till the process ends before going on to the next line.

3. RC scripts & Run Levels
Rc scripts performs the following functions :
a) They check and mount the file systems
b) Start and stop the various processes like network , nfs etc.
c) Perform some of the house keeping jobs.
System goes in to one of the following run level after booting depending on default run level and the commands issued for changing the run level to some other one.
0 Boot prom level ok> or > prompt in Sun.
1 Administrative run level . Single user mode
2 Multiuser mode with no resource sharing .
3 Multiuser level with nfs resource sharing
4 Not used
5 Shutdown & power off (Sun 4m and 4u architecture )
6 Reboot to default run level
S s Single user mode user logins are disabled.
Broadly speaking the running system can be in any of the folloing state
Single user – Minimum processes running , user logins disabled and root password is required to gain access to the shell .
Multiuser - All system processes are running and user logins are permitted
Run level of a desired state is achieved by a number of scripts executed by the rc program the rc scripts are located in /etc/rc0.d , /etc/rc1.d , /etc/rc2.d , /etc/rc3.d & /etc/rcS.d directories . All the files of a particular run level are executed in the alphanumeric order .Those files beginning with letter S starts the processes and those beginning with K stops the processes.
These files are hard linked to the files in /etc/init.d in order to provide a central location for all these files and eliminating the need to change the run level in case these scripts needs to be run separately . The files in /etc/init.d directory are without any S , K and numeric prefix instead a stop / start argument has to be supplied whenever these scripts are to be executed .
By default system has a number of rc scripts needed for run level transition but sometimes it becomes necessary to start some custom scripts at the booting time and turn them off at the shutdown . Custom scripts can be put in any of the required rc directory but following major considerations has to be kept in mind :
* The sequence number of the file should not conflict with other files.
* The sevices needed should be available by previously executed scripts.
* File should be hard linked to the /etc/init.d directory .
* The system looks for only those files beginning with letter K & S , any thing else is ignored , therefore, to make a file inactive simply changing uppercase K or S to lower case will cause system to ignore it .