Saturday, November 30, 2013

Linux Security script to determine DDOS origin location



In computing, a denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a machine or network resource unavailable to its intended users. Although the means to carry out, motives for, and targets of a DoS attack may vary, it generally consists of efforts to temporarily or indefinitely interrupt or suspend services of a host connected to the Internet.

On various *nix server setups we are often exposed to DDoS attacks launched from other, similarly compromised systems. In some cases our own server is the one being used as a botnet machine to exploit resources on other systems.


These attacks can be spotted from the shell as a large number of open sockets from one or more IP addresses. When a single address holds more than about 150 open sockets, something is usually wrong, and many administrators use DDoS prevention scripts to ban those IP addresses. I stumbled upon a request from a friend to write a script that would also tell us the country the attack originates from, something that was always missing in our troubleshooting.
So I have written a small and useful script that combines the commonly used netstat and whois commands. Similar script code can be found on the internet, and people can adjust it to their needs; I needed a script that would associate and display the country of origin together with the IP address of each socket.
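Before running the full script, a quick one-liner in the same spirit (my own sketch, not part of any existing tooling) counts the open sockets per remote address, so anything above the ~150 threshold mentioned above stands out immediately:

# count open TCP/UDP sockets per remote IP, busiest first
netstat -ntu | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head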


Code


#!/bin/bash

# gather the remote IP addresses of all current TCP/UDP connections
ips=$( netstat -ntu | grep ':' | awk '{print $5}' | sed 's/::ffff://' | cut -f1 -d ':' | sort -nr );

for i in $ips; do

# whois can return more than one country line, keep only the first
Country=$( whois "$i" | grep -i Country | awk '{print $2}' | head -n 1 );

echo "Land+IP=  $Country $i ";

done;


end of code.


To elaborate on the code, I will explain the details. I am using Bash shell scripting, which is very common. The netstat pipeline comes from the classic DDoS Deflate script that is commonly used for fighting DDoS attacks.

netstat -ntu | grep ':' | awk '{print $5}' | sed 's/::ffff://' | cut -f1 -d ':' | sort -nr

This command gives us the remote addresses of the open sockets: awk prints the fifth column (the foreign address), and sed strips the ::ffff: prefix that appears when IPv4 addresses are reported as IPv4-mapped IPv6 addresses. cut drops the port number and sort groups the addresses so repeat offenders sit together. The whole pipeline is captured with command substitution ($( ... )) and stored in a variable.

The variable then feeds a for loop that runs the whois command, which tells us the country of origin. The classic for loop uses i as the loop variable.

Country=$( whois "$i" | grep -i Country | awk '{print $2}' | head -n 1 );

Using whois together with grep for the country field displays only the country code; head -n 1 keeps just the first match, because some whois records contain more than one country line. The command runs once for every address collected by the loop.

Simple enough we get a display of current IP sockets with Country of origin:

Land+IP=  BA 71.222.xxx.xxx
Land+IP=  BA 71.222.xxx.xxx
Land+IP=  BA 71.222.xxx.xxx
Land+IP=  BA 71.222.xxx.xxx
Land+IP=  IT 88.138.xxx.xxx
Land+IP=  IT 88.138.xxx.xxx
Land+IP=  IT 88.138.xxx.xxx
Land+IP=  IT 88.138.xxx.xxx
Land+IP=  BA 61.38.xxx.xxx
Land+IP=  BA 61.38.xxx.xxx
Land+IP=  BA 61.38.xxx.xxx
Land+IP=  BA 61.38.xxx.xxx
Land+IP=  BA 61.38.xxx.xxx

This output lists all the sockets, and if we see many identical addresses from one country we can pinpoint the location and origin of the attack. The script can be fine-tuned further, so everyone is welcome to improve it.

Feel free to code.


Monday, November 18, 2013

Configure Corefig as a free management tool for Hyper-V 2012



Reading a lot of articles online, I have found that Hyper-V 2012 has some cool new features that are free to use. Some of them compete directly with VMware, like high availability (HA) and the new SMB 3.0 protocol. I have installed a nested Hyper-V 2012 under a VMware setup to test the management tools.
One tool I have found useful is Corefig, a collection of PowerShell scripts. Here are the steps to configure and use this management tool.

On a fresh install of the Hyper-V hypervisor one should enable Remote Management in the initial configuration options. To copy the files from the downloaded Corefig package, the firewall should be disabled on the Hyper-V 2012 host. This can be done via a simple command using netsh.
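For a lab setup, the firewall can be switched off for all profiles with the usual netsh one-liner (turn it back on with "state on" once proper rules are in place):

netsh advfirewall set allprofiles state off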


The next step is to download the Corefig.zip file and copy it to the Hyper-V hypervisor. One can copy the files over SMB by simply typing the \\hyper-vsrv UNC path and creating a folder called Corefig under the root of the hypervisor.

The link for the Corefig installation can be found here.

After extracting the files, we should run the PowerShell script to launch the Corefig installation process. This can be done with two simple commands:

CD C:\COREFIG
POWERSHELL .\COREFIG.PS1

Soon after that we have an instance of Corefig started and can use all of the management functions it offers. A simple screenshot shows the GUI of the tool.


We can easily change the network settings, use a small set of Control Panel utilities, and adjust general Hyper-V settings. Managing the firewall via the GUI is also handy for creating the first initial rules. I have disabled the firewall for testing purposes.



According to Microsoft this tool is verified to work with these setups:
  • Verified: Microsoft Windows Server 2012 (Core Installation)
  • Verified: Microsoft Windows Server 2012 (Complete GUI Installation)
  • Verified: Microsoft Hyper-V Server 2012
Feel free to use the tool and comment on it.

Windows Server 2008 PKI Single Tier CDP



In cryptography, a PKI is an arrangement that binds public keys with respective user identities by means of a certificate authority (CA). The user identity must be unique within each CA domain. The third-party validation authority (VA) can provide this information on behalf of CA. The binding is established through the registration and issuance process, which, depending on the assurance level of the binding, may be carried out by software at a CA or under human supervision.
Active Directory Certificate Services (AD CS) is an Identity and Access Control security technology that provides customizable services for creating and managing public key certificates used in software security systems that employ public key technologies.

A CDP (CRL Distribution Point) is a system or systems where the CRL (Certificate Revocation List) is placed for retrieval by relying parties or others throughout the PKI environment. A CDP should be referenced in each certificate so that relying parties can readily check the CRL before relying on the certificate. Most CDPs are accessible via HTTP or LDAP.

In this small setup we have a Windows Server 2008 R2 with the following roles installed:
  • Active Directory Certificate Services
  • Active Directory Domain Services
  • DNS Server
  • Web Server (IIS)
We have a Windows 7 desktop client machine that is joined to the domain. We want to test whether the machine has received a certificate for negotiating authentication and other domain procedures, and also to ensure that autoenrollment is turned on so that every other machine in the domain will do this automatically.
After installing the roles we should create a CA policy file, which defines the settings applied when the CA is installed or renewed, and save it under the C:\Windows folder as CAPolicy.inf.

[Version]
Signature="$Windows NT$"
[PolicyStatementExtension]
Policies=InternalPolicy
[InternalPolicy]
OID= 1.2.3.4.1455.67.89.5
Notice="Legal Policy Statement"
URL=http://pki.corp.local/cps.txt 
[Certsrv_Server]
RenewalKeyLength=2048
RenewalValidityPeriod=Years
RenewalValidityPeriodUnits=10
LoadDefaultTemplates=0
AlternateSignatureAlgorithm=1 

To see the location of the CDP we go to Start > Administrative Tools > Certification Authority.



Now, to ensure that all the PCs in the Active Directory domain called corp.local enroll these certificates, we should modify the default domain Group Policy. This can be done via the gpmc.msc console.
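To make the client pick up the changed policy right away instead of waiting for the next refresh, the usual commands can be run on the Windows 7 machine (certutil -pulse simply triggers the autoenrollment task):

gpupdate /force
certutil -pulse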


To review the certificate enrollment we should check the certificate store on the client machine. This can be done using the MMC console on the Windows 7 client; the snap-in to add is Certificates, which manages all the local certificates.


We can see that we have enrolled the certificate from DC1, which is our domain controller. The last thing to check is the purpose of the certificate.


We can see that we have got the All Issuance Policies certificate installed. This also means that Windows 7 recognized the OID numbers from the CAPolicy.inf file. To research other CA purposes further, one can consult Microsoft TechNet.

Feel free to comment.

Wednesday, November 13, 2013

Compile Source of Apache/MySQL/PHP on a Linux VPS



Linux engineers often use the Debian APT or Red Hat YUM repositories for a quick and easy install of the services on their servers. But in some cases we need to test the latest packages: for example, if we want a newer MySQL release than the one packaged in the repositories, we have to download and install it manually on our Virtual Private Server. Then we can configure it for our production environment.
I have configured the VPS with Ubuntu 12.04 LTS. I prefer the LTS version because of the long-term security update support.

First we start with creating the sources folder and downloading the Apache packages that are needed for our web server:

sudo mkdir /usr/src/sources
cd /usr/src/sources
wget http://httpd.apache.org/dev/dist/httpd-2.4.2.tar.gz
tar xvfz httpd-2.4.2.tar.gz

After downloading and extracting the httpd package we can move further along in the process. We now need the APR library and the APR utilities; APR stands for Apache Portable Runtime.

wget http://apache.spinellicreations.com//apr/apr-1.4.8.tar.gz
tar -xzf apr-1.4.8.tar.gz
rm apr-1.4.8.tar.gz
cd apr-1.4.8/
# build-essential provides gcc and make; libpcre3-dev is needed later to configure httpd 2.4
sudo apt-get install build-essential libpcre3-dev
sudo ./configure
sudo make
sudo make install

Then we need APR-util to be configured and built.

wget http://mirrors.axint.net/apache//apr/apr-util-1.4.1.tar.gz
tar -xvzf apr-util-1.4.1.tar.gz
cd apr-util-1.4.1
./configure --with-apr=/usr/local/apr
make
make install
cd ..

Now we can return to the HTTPD folder to compile and install the Apache:

cd /usr/src/sources/httpd-2.4.2
./configure --enable-file-cache --enable-cache --enable-disk-cache --enable-mem-cache --enable-deflate --enable-expires --enable-headers --enable-usertrack --enable-ssl --enable-cgi --enable-vhost-alias --enable-rewrite --enable-so --with-apr=/usr/local/apr/
make
make install
cd ..

To have the web server start at boot we will create a soft link to the apachectl control script and copy it to the init.d folder for the startup options.

ln -s /usr/local/apache2/bin/apachectl /usr/bin/apachectl
cp /usr/local/apache2/bin/apachectl /etc/init.d
update-rc.d apachectl defaults

Now we can reboot the server and check whether it is running. We can see that the daemon is running:

root@ubsrv1:~# ps aux | grep httpd
root      1063  0.0  0.2     0:00 /usr/local/apache2/bin/httpd -k start
daemon    1065  0.0  0.2     0:00 /usr/local/apache2/bin/httpd -k start
daemon    1066  0.0  0.2     0:00 /usr/local/apache2/bin/httpd -k start


Next we continue with PHP support and installation.

cd /usr/src/sources
wget -O php-5.5.5.tar.gz http://us2.php.net/get/php-5.5.5.tar.gz/from/ar2.php.net/mirror
tar xfvz php-5.5.5.tar.gz
cd php-5.5.5
./configure --prefix=/var/www/ --with-apxs2=/usr/local/apache2/bin/apxs --with-config-file-path=/var/www/php --with-mysql
make
make install

The --with-apxs2 path depends on the folder where you installed Apache, and the --prefix can be changed to your needs. The final step is to install the MySQL server.

groupadd mysql
useradd -r -g mysql mysql
cd /usr/src/sources
wget -O mysql-5.6.14.tar.gz http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.14.tar.gz/from/http://cdn.mysql.com/
tar zxvf mysql-5.6.14.tar.gz
cd mysql-5.6.14

# the source tarball has to be compiled before it can be initialized
# (requires the cmake, bison and libncurses5-dev packages)
cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql
make
make install
cd /usr/local/mysql

chown -R mysql .
chgrp -R mysql .
scripts/mysql_install_db --user=mysql
chown -R root .
chown -R mysql data


The procedure here is a bit longer. First we create a user and group, then download and extract the source, build it with CMake, and run the mysql_install_db script to initialize the data directory. Finally we set the permissions on the folders, and that is all.

Now we can restart the server and everything should work fine. If not, check out some of the tutorials in the Ubuntu community documentation.
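A quick sanity check is to ask each component for its version; the paths below simply follow the prefixes used in this walkthrough, so adjust them if you chose different ones:

/usr/local/apache2/bin/httpd -v
/var/www/bin/php -v
/usr/local/mysql/bin/mysql --version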


Monday, November 11, 2013

Linux SWAP Partition as twice the RAM size - why ?



Linux divides its physical RAM (random access memory) into chunks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space is the amount of virtual memory available.

Swapping is necessary for two important reasons. First, when the system requires more memory than is physically available, the kernel swaps out less used pages and gives memory to the current application (process) that needs the memory immediately. Second, a significant number of the pages used by an application during its startup phase may only be used for initialization and then never used again. The system can swap out those pages and free the memory for other applications or even for the disk cache.

To see the ratio of physical RAM to swap space on a VPS machine with 512 MB of RAM, we can use free -m.


The picture shows that we have a total of 490 MB of physical RAM and roughly twice that amount of swap space on the disk (991 MB), which is not yet used. As this server has a small amount of free memory (only 76 MB) I had to investigate further. I used the htop utility to see the real memory consumer.


The process with PID 1310 is using a noticeable amount of memory. We cannot see the real process name, because it is one of multiple instances of the same application. To investigate further we should use the PID to see which service is weighing down the VPS machine.

This can be done with the pmap command:  pmap -x 1310


The output shows that the Samba libraries are mapped into this PID. Simply stopping the Samba service will free up some memory.


To get back to the initial question, here is a short explanation of the swap size. The memory hierarchy presented to applications by a Linux system is arranged in a few levels:

  • Processor/CPU registers - bytes in size
  • L1 cache - tens of KBs in size
  • L2 cache - hundreds of KBs to MBs in size
  • L3 cache - MBs to tens of MBs in size
  • RAM - GBs in size
Data needed by an application is loaded from disk into RAM, and the hottest parts of it migrate up through the L3, L2 and L1 caches into the CPU registers. Swap sits one level below RAM at the bottom of this hierarchy, and the traditional rule of thumb is to size it at 1.5 to 2 times the actual RAM.
That is the main reason why we usually create the swap partition at twice the RAM size. Swap should also not be placed on other, especially large, data disks, because of their slow I/O operations. And of course, if your RAM is never exhausted, the swap space simply stays unused.
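As an illustration of the sizing rule, a swap area of roughly twice the RAM can also be added as a swap file; a minimal sketch for this 512 MB VPS (the file name and size are just examples) looks like this:

# create and activate a 1 GB swap file, then make it permanent
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab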

Feel free to comment.

Sunday, November 10, 2013

Linux service security - Deny Hosts



DenyHosts is a log-based intrusion prevention security tool for SSH servers written in Python. It is intended to prevent brute-force attacks on SSH servers by monitoring invalid login attempts in the authentication log and blocking the originating IP addresses.
DenyHosts checks the end of the authentication log for recent failed login attempts. It records information about their originating IP addresses and compares the number of invalid attempts to a user-specified threshold. If there have been too many invalid attempts it assumes a dictionary attack is occurring and prevents the IP address from making any further attempts by adding it to /etc/hosts.deny on the server.

To install and configure DenyHosts we should use the EPEL repository, with a simple command:

yum --enablerepo=epel install denyhosts


After a successful installation we should take a first look at the configuration file to allow certain trusted IP addresses to log into the server console.

nano /etc/hosts.allow


I have added a local area network IP address so that it always has access to the SSH service; addresses that DenyHosts writes to /etc/hosts.deny are blocked from logging into the SSH server.
Optionally, an IT admin can use the /etc/denyhosts.conf file to set up email alerts when a host is blocked after repeated failed login attempts.
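A sketch of what the whitelist entry in /etc/hosts.allow can look like, assuming a 192.168.1.x LAN (adjust the prefix to your own network):

# always allow SSH from the local network
sshd: 192.168.1.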


To apply the settings we should now enable the denyhosts service at boot and start it.

chkconfig denyhosts on
service denyhosts start

To see the log of attempted and failed logins, or of a simulated attack, we can tail the log file:

tail -f /var/log/secure


We can see an "Accepted password" entry from the IP address that we allowed.
If you have a list of static IP addresses that you want to whitelist permanently, open the file /var/lib/denyhosts/allowed-hosts; any IP address included in this file will not be banned.

Feel free to comment.

Thursday, November 7, 2013

Soft File links in Linux



A symbolic or “soft” link points to a file by name. When the kernel comes upon a symbolic link in the course of looking up a pathname, it redirects its attention to the pathname stored as the contents of the link. The difference between hard links and symbolic links is that a hard link is a direct reference, whereas a symbolic link is a reference by name. Symbolic links are distinct from the files they point to.
Symbolic links operate transparently for most operations: programs that read or write to files named by a symbolic link will behave as if operating directly on the target file. However, programs that need to handle symbolic links specially (e.g., backup utilities) may identify and manipulate them directly.

To create a symbolic link in Unix or Linux, at the shell prompt, enter the following command:
ln -s {target-filename} {symbolic-filename}

So let us make a simple example with an index.php file in the web server directory.
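The screenshot shows the idea; roughly, the commands look like this (the /var/www web root is just an assumption, the target path matches the example below):

# link the web root's index.php to the real file in the user's home directory
ln -s /home/guru/index.php /var/www/index.php
ls -l /var/www/index.php   # shows the link with an arrow pointing at its target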


Because the link stores the absolute path /home/guru/index.php, the link itself could be moved elsewhere without ceasing to work; moving the target file, on the other hand, would leave the link dangling (not that moving it is advisable).

It is a common mistake to think that the first argument to ln -s is interpreted relative to your current working directory. However, it is not resolved as a filename by ln; it’s simply a literal string that becomes the target of the symbolic link.

After creating the symbolic link, it may generally be treated as an alias for the target. Any file system management commands (e.g., cp, rm) may be used on the symbolic link. Commands which read or write file contents will access the contents of the target file. The rm (delete file) command, however, removes the link itself, not the target file.

HTOP Interactive process viewer on Linux



Htop is an interactive system-monitor process-viewer written for Linux. It is designed to replace the Unix program top. It shows a frequently updated list of the processes running on a computer, normally ordered by the amount of CPU usage. Unlike top, htop provides a full list of processes running, instead of the top resource-consuming processes. Htop uses color and gives visual information about processor, swap and memory status.
Users often deploy htop in cases where Unix top does not provide enough information about the system's processes, for example when trying to find minor memory leaks in applications. Compared to top, it provides a more convenient, cursor-controlled interface for killing processes.

It is a very straightforward package installation on a CentOS server (the htop package comes from the EPEL repository):

yum install htop

And that is all there is to it. You start the application just by typing the htop command in the shell. Let us take a look at the interface.


You can use its features for filtering and killing the processes that you think are using up the resources on the server. I often use the SortBy function (F6) to sort the processes by either CPU or memory usage.


Have fun!!!

Source HTOP

Squid & SquidGuard proxy on Centos

Squid & SquidGuard proxy on Centos Server


Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a great server accelerator. It runs on most available operating systems, including Windows and is licensed under the GNU GPL.
Squid is used by hundreds of Internet Providers world-wide to provide their users with the best possible web access. Squid optimises the data flow between client and server to improve performance and caches frequently-used content to save bandwidth. Squid can also route content requests to servers in a wide variety of ways to build cache server hierarchies which optimise network throughput.
In this short blog I will configure the proxy server with the guard functions on a Centos 6.4 Server machine. There are three proxy scenario setups:
  • Proxy server - The web browser on the client is configured to point to the proxy server's IP address. 
  • Transparent Proxy Server - The router sends all traffic on defined ports, to the transparent proxy server, this way clients cannot bypass the proxy server
  • Reverse Proxy Server (Cache) - The reverse proxy server or cache server is placed in-front-of or prior-to the web server in order to speed up delivery of frequently requested pages and to protect the web server by creating a layer of separation and redundancy
I will set up the first, simple scenario, where I point my Firefox at the IP address and port on which the proxy server is listening.

To start, there are three simple commands to install Squid, start the service, and enable it at boot:

yum install squid
service squid start
chkconfig squid on

Let us test whether Squid is listening on the default port, which is 3128.

[root@centos-dc squid]# netstat -antp | grep squid
tcp  0  0 :::3128   :::*  LISTEN      7312/(squid)

All the configuration of the Squid server is kept in the file /etc/squid/squid.conf. You can for instance change the port on which the Squid is listening by modifying the lines:

# Squid normally listens to port 3128
http_port 3128
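Beyond the port, the same file controls which clients may use the proxy; a minimal sketch for letting a LAN through (the 192.168.1.0/24 subnet is just an example) would be:

# allow the local network to use the proxy
acl localnet src 192.168.1.0/24
http_access allow localnet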

For the initial test I have configured my client browser to point to this port and the IP address of the CentOS server. I am using Firefox for the test and the settings can be found at: Edit > Preferences > Advanced > Network tab > Connection Settings > Manual proxy configuration.


After some random web page browsing I would like to see what the Squid proxy has parsed. This can be checked by looking inside the Squid log file, located at /var/log/squid/access.log.


The log file records the POST and GET requests from the browser, so we can see that the proxy is working fine. We can now move on to installing the SquidGuard add-on for the proxy server.
SquidGuard is a URL redirector used to apply blacklists with the proxy software Squid. There are two big advantages to SquidGuard: it is fast and it is free. SquidGuard is published under the GNU General Public License.

If the EPEL repository is installed, the installation is straightforward. For more details on EPEL follow this link: EPEL

yum install squidGuard

After a successful installation we can download the latest blacklist archive and copy it into the folder /var/squidGuard/blacklists so it can be used for URL filtering.

wget http://squidguard.mesd.k12.or.us/blacklists.tgz

After we download the blacklist archive we must unpack it; it extracts into a folder called blacklists.

tar -zxvf blacklists.tgz

You will see a lot of ad, spyware and other malicious domains and URLs that will be blocked from the user's perspective. After adding these files we should compile the blacklists into SquidGuard's database format.

squidGuard -b -d -C all

We should also add the permissions for the squid account to the folder blacklists.

chown -R squid /var/squidGuard/blacklists 
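SquidGuard also reads its own configuration file, typically /etc/squid/squidGuard.conf, which tells it where the compiled blacklists live and what to block. A minimal sketch (the category name and redirect URL are only examples; use the categories present in the downloaded archive):

dbhome /var/squidGuard/blacklists
logdir /var/log/squidGuard

dest porn {
    domainlist porn/domains
    urllist    porn/urls
}

acl {
    default {
        pass !porn all
        redirect http://www.example.com/blocked.html
    }
}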

To use SquidGuard, a config line must be added to the squid.conf file:

url_rewrite_program /usr/bin/squidGuard

Now only what is left is to restart the squid service.

service squid reload

And that is all there is to it. Afterwards you can test the settings by trying to browse blacklisted content and checking that it is blocked.

Feel free to comment.

Wednesday, November 6, 2013

QOS Traffic Marking and Directing



Packet classification is pivotal to policy techniques that select packets traversing a network element or a particular interface for different types of QoS service. For example, you can use classification to mark certain packets for IP Precedence and you can identify others as belonging to a Resource Reservation Protocol (RSVP) flow.
Access-lists can be used to identify traffic for classification, based on address or port. However, a more robust solution is Cisco’s Network-Based Application Recognition (NBAR), which will dynamically recognize standard or custom applications, and can classify based on payload. 


In this scenario I have three routers. Router CE1, as the client router, marks traffic from different subnets by setting IP Precedence, the 3-bit field in the IP ToS byte. (The analogous Layer 2 marking is the 3-bit 802.1p Class of Service (CoS) field, which is part of the 4-byte 802.1Q tag and is therefore only available when 802.1Q VLAN frame tagging is employed.)
The second router, ISP, is configured with a service policy that matches the precedence markings and converts them to DSCP markings. The Differentiated Services Code Point (DSCP) uses the first six bits of the ToS field; when using DSCP, the ToS field is often referred to as the Differentiated Services (DS) field.
The third router, DC1, has an inbound policy configured to match the DSCP-classified traffic and account for it, with a simple rule to drop the packets from one particular subnet.

Let us start with the basic configuration of the CE1 router. On the other two routers we set up only basic routing and interface addressing.

hostname CE1
!
interface Loopback0
 ip address 5.5.5.5 255.255.255.0
!
interface Loopback1
 ip address 6.6.6.6 255.255.255.0
!
interface Loopback2
 ip address 7.7.7.7 255.255.255.0
!
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.252
 service-policy output TRAFFIC
!
ip route 0.0.0.0 0.0.0.0 192.168.1.2
!
access-list 5 permit 5.5.5.5
access-list 6 permit 6.6.6.6
access-list 7 permit 7.7.7.7

The ISP router command scripts:

hostname ISP
!
interface FastEthernet0/0
 ip address 192.168.1.2 255.255.255.252
!
interface FastEthernet1/0
 ip address 172.16.1.1 255.255.255.252
 service-policy output TRANSLATE
!
ip route 5.5.5.0 255.255.255.0 192.168.1.1
ip route 6.6.6.0 255.255.255.0 192.168.1.1
ip route 7.7.7.0 255.255.255.0 192.168.1.1
ip route 99.99.99.0 255.255.255.0 172.16.1.2

And the DC1 router , the initial connectivity command scripts:

hostname DC1
!
interface Loopback0
 ip address 99.99.99.99 255.255.255.0
!
interface FastEthernet0/0
 ip address 172.16.1.2 255.255.255.252
 service-policy input DIRECT
!
ip route 0.0.0.0 0.0.0.0 172.16.1.1

As we can see in the scripts above, we have configured three access lists that capture the loopback traffic. Now we should configure class maps from those access groups, and then a policy map that sets the correct precedence on each class of traffic.

class-map match-all Loop0
 match access-group 5
class-map match-all Loop1
 match access-group 6
class-map match-all Loop2
 match access-group 7
!
policy-map TRAFFIC
 class Loop0
  set precedence 0
 class Loop1
  set precedence 1
 class Loop2
  set precedence 2

This policy is applied to the output interface of the CE1 router. The next policy, on the ISP router, remaps the precedence values to DSCP; I apply it outbound on the interface connected to the DC1 router. These policies can also be applied to SVI interfaces. The policy is called TRANSLATE.

class-map match-all PR2
 match  precedence 2
class-map match-all PR0
 match  precedence 0
class-map match-all PR1
 match  precedence 1
!
policy-map TRANSLATE
 class PR0
  set dscp af11
 class PR1
  set dscp af12
 class PR2
  set dscp af13

And finally the policy on the DC1 router matches the AFxx DSCP markings, drops the AF11 packets, and simply counts the others so the markings can be verified.

class-map match-all AF12
 match  dscp af12
class-map match-all AF13
 match  dscp af13
class-map match-all AF11
 match  dscp af11
!
policy-map DIRECT
 class AF11
   drop
 class AF12
 class AF13

To test the settings I send a few pings from the CE1 router to the DC1 router, sourced from the different loopbacks. We should notice that the traffic classified as AF11 is dropped. Let us see the results.
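A sketch of the kind of pings used, sourcing each loopback in turn (the repeat counts are arbitrary) so the packets hit the different access lists:

CE1# ping 99.99.99.99 source Loopback0 repeat 50
CE1# ping 99.99.99.99 source Loopback1 repeat 100
CE1# ping 99.99.99.99 source Loopback2 repeat 150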


To see the QoS statistics for each classified subnet on the DC1 router, we can use the show policy-map interface command to check that the whole marking chain works.


And we have the final results. Every class map shows a different packet count, as I was expecting, because I sent a different number of ping requests from each loopback to confirm that the packets are classified separately, and they were. So the precedence-to-DSCP chain is working great.

Feel free to comment.

Tuesday, November 5, 2013

How to replace a failed drive on NetApp Filer



Disk failures are very common in a storage environment, and as storage administrators we come across this situation often; how often depends on how many disks your storage systems have. The more disks you manage, the more often you come across it.
Replacing a disk relies on the handy NetApp AutoSupport feature: when a drive fails, an IT engineer can use AutoSupport to claim a new one. NetApp opens a ticket and the local representative ships a replacement drive, usually within business hours.

First, let us see how to find out whether a drive has failed. For this blog post I have used a test simulator and simulated a failed drive, using a spare drive rather than one holding data or parity. Physically, the first sign is the amber LED lit on the failed drive in the filer.



Let us see the output of the disk show command:


We can also look at the volume status using the vol status -f command:


The status of the drive is "admin removed" for one simple reason: I removed it myself to simulate a failure.
The drive we are interested in is named v4.16. If the LED on the failed drive is not lit, we can issue a few commands in advanced mode to light it up:

priv set advanced
led_on <disk id identified above> 
led_off <disk id identified above> 
priv set

Now we can pull the old drive, wait about two minutes, and then insert the new one. When the new drive is in place, run the following command to check whether the disk you have just fitted is owned or not:

disk show -n

If disk auto-assign is enabled, it will be assigned to the filer head which had the failed disk; if not, you will have to do it manually:

disk assign <disk id>    (in our case the disk id is v4.16)

If it won't accept the command, the disk might have been auto-assigned to the wrong controller/system. You can clear the assignment from the disk using the following command and then try again:

disk assign <disk id> -s unowned -f

The replaced disk will now become a spare, taking the place of the spare that was consumed when the original disk failed. You can check the status of this using the following command:

aggr status -s


We should then check whether the disk auto-assign feature is on. In this test scenario it is on by default.
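On a 7-Mode filer the setting can be checked (and changed) from the console; a quick sketch, assuming the standard option name:

options disk.auto_assign
options disk.auto_assign on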


This completes this little tutorial on how to change a failed drive in a NetApp filer.

Feel free to comment.

Saturday, November 2, 2013

Linux Kernel messages logging



Debugging the kernel is not necessarily rocket science; in fact it can be achieved using very simple and straightforward techniques plus some time, patience and perseverance. This post describes tricks and techniques to help debug the kernel. One important step is to analyze the kernel messages. They can be seen during startup, but it is better to record them in a file.
Once the kernel starts, there isn't much to do except watch for potential problems. For RHEL, you will see a Red Hat Enterprise Linux screen with a slow-spinning icon. If you want to watch messages detailing the boot process scroll by, press the Esc key.
At this point, the kernel tries to load the drivers and modules needed to use the hardware on the computer. The main things to look for (although they may scroll by quickly) are hardware failures that may prevent some feature from working properly. Although much rarer than it used to be, there may be no driver available for a piece of hardware, or the wrong driver may get loaded and cause errors.

We will use a couple of commands to capture the kernel messages in a file and then view them with the less command.
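The screenshots show the exact session; in essence it boils down to something like this (the output file name is just an example):

# dump the kernel ring buffer to a file and page through it
dmesg > /tmp/kernel-messages.txt
less /tmp/kernel-messages.txt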


The complete file is too large for this blog post, so I use the less and tail commands to show the last lines the kernel wrote to the file.


We can see that this particular VM has no IPv6 routers present; this is detected by the built-in IPv6 Neighbor Discovery (ND) protocol. Many other important messages, such as supported CPUs, the BIOS version, APIC setup, and other hardware detection issues, can be seen in this file.

What you want to look for are drivers that fail to load or messages that show certain features of the hardware failed to be enabled. As soon as the kernel is done initially detecting hardware and loading drivers, it passes off control of everything else that needs to be done to boot the system to the init process.


Friday, November 1, 2013

Data Storage High Availability

Storage High Availability with NetApp


As customers and service providers consolidate more and more applications and workloads onto shared storage infrastructures, it is challenging to maintain an infrastructure that is “always available.” Increased workloads and utilization drive a higher duty cycle, putting pressure on the storage architecture. A broader group of users and applications requires increased coordination of downtime for storage management and hardware upgrades/refreshes to prevent unintended outages. And because many different users, groups, or customers with different needs may be using the shared storage infrastructure at the same time, the impact of a failure proportionally increases.


NetApp reduces the cost and complexity of protecting your IT environment from downtime and data loss. NetApp storage is designed with high availability, flexibility, and efficiency in mind. A suite of capabilities within the NetApp FAS platform protects against component failures and even entire system/data center failures to keep your critical business operations running. These functions work in tandem with NetApp storage efficiency technologies to reduce capacity and operational costs so that you can provide high availability (HA) for more of your environment.
HA pair controller configuration provides data availability by transferring data service of an unavailable controller to the surviving partner. Transfer of data service is often transparent to end users and applications, and the data service is quickly resumed with no detectable interruption to business operation.
Alternate Control Path (ACP) provides out-of-band management of disk shelves that use serial-attached SCSI (SAS) technology. ACP is completely separate from the SAS data path and enhances data availability by enabling the storage controller to nondisruptively and automatically reset a misbehaving component.

ACP is a piece of clever and logical craftsmanship in the NetApp filers: the data path and the management/reset path travel over separate lanes, so shelf management never competes with data traffic. We can take a look at the logical diagram:


Controller-to-stack connections: Each storage system controller is connected to each stack of disk shelves through a dedicated Ethernet port:

Controller 1/A always connects to the top shelf IOM A square port in a stack.
Controller 2/B always connects to the bottom shelf IOM B circle port in a stack.
 
In essence, you daisy-chain all the IOMs of all shelves and then connect the two remaining ports to both controllers. If you have a single controller, you connect just one port. Exactly how you daisy-chain does not really matter, but keeping the suggested order makes it easier to support.



The essence is in the redundancy. So now let us try to explain HA pairs and the benefits they provide.

What is an HA Pair
An HA pair is two storage systems (nodes) whose controllers are connected to each other either directly or, in the case of a fabric-attached MetroCluster, through switches and FC-VI interconnect adapters.
You can configure the HA pair so that each node in the pair shares access to a common set of storage, subnets, and tape drives, or each node can own its own distinct set of storage. The nodes are connected to each other through a NVRAM adapter, or, in the case of systems with two controllers in a single chassis, through an internal interconnect. This allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner, mirroring the data for each other’s nonvolatile memory (NVRAM or NVMEM).

Benefits of HA Pair
HA pairs provide fault tolerance and the ability to perform nondisruptive upgrades and maintenance.
Configuring storage systems in an HA pair provides the following benefits:
  • Fault tolerance - When one node fails or becomes impaired, a takeover occurs and the partner node continues to serve the failed node's data.
  • Nondisruptive software upgrades - When you halt one node and allow takeover, the partner node continues to serve data for the halted node while you upgrade the node you halted.
  • Nondisruptive hardware maintenance - When you halt one node and allow takeover, the partner node continues to serve data for the halted node while you replace or repair hardware in the node you halted.
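On a 7-Mode HA pair these behaviours map to a handful of well-known console commands; a rough sketch of the flow during planned maintenance:

cf status      (verify controller failover is enabled and the partner is healthy)
cf takeover    (the partner takes over the node you want to halt or upgrade)
cf giveback    (after the maintenance, return data service to the repaired node)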


NetApp aggregate creation

NetApp aggregate creation


With disk drives continuing to increase in size, providing the resiliency to protect critical data becomes more challenging. While disks have gotten larger, their overall reliability has remained about the same. Larger disk size means that the time needed to reconstruct a failed disk using Redundant Array of Independent Disks (RAID) parity information has become significantly longer, raising the possibility of a second disk failure or other error occurring before reconstruction can complete. The likelihood of bit and block errors also increases proportionally with the increased media size, making the chances of this type of event during reconstruction a distinct possibility and increasing the chances of a double failure that could disrupt business and cause data loss in single parity RAID implementations.
NetApp pioneered the development of its unique dual-parity RAID implementation, RAID-DP®, to address this resiliency problem. While other dual-parity RAID 6 implementations exist, RAID-DP is the only one that provides protection against double disk failures in the same RAID group with no significant decreases in performance.

An aggregate is made up of RAID Groups.  You cannot split a RAID Group between aggregates, but you can have multiple RAID Groups that make up a single aggregate. Always use RAID-DP, which is an implementation of RAID-6 that uses striped data drives and 2 disks reserved solely for parity and diagonal parity.  This allows you to lose two disks per RAID Group without losing data.  As the ratio of data disks to parity disks goes up, your space efficiency goes up, but also the risk of losing 3 disks in a RG increases. There are also performance implications for a high data disk to parity disk ratio. 

The first step in creating an aggregate is to add it from the web GUI in the FilerView section:


So then we start the wizard:


The next step is to give the aggregate a name and always check double parity. This is a great redundancy feature from NetApp that gives you extra protection within a RAID group:


Raid Group sizes should not exceed 16 disks. An aggregate can contain multiple RAID groups. If I had created an aggregate with 24 disks, then Data ONTAP would have created two RAID groups. The first RAID group would be fully populated with 16 disks (14 data disks and two parity disks) and the second RAID group would have contained 8 disks (6 data disks and two parity disks). This is a perfectly normal situation.

I have skipped two steps, disk selection and disk type, which should be left at their defaults. The next thing to choose is the size of the disks; I have chosen 1020 MB, as the filer recommended.


This aggregate is for testing purposes; in a production environment you should create it as big as your projects need. For starters I have added only 3 drives, and more can be added dynamically later.


The final step is to commit the creation of the aggregate. If you do not get any errors here, everything is fine.


And we have a second aggregate, aggr1, created and ready for use:
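For reference, the same aggregate can also be created from the 7-Mode command line; a sketch matching the wizard choices above (RAID-DP, a 16-disk RAID group limit, three disks to start):

aggr create aggr1 -t raid_dp -r 16 3
aggr status aggr1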


To be continued. I hope you got a glimpse of the NetApp architecture. Future posts will cover the logical organization of NetApp: things like FlexVol volumes, NFS and CIFS shares, and qtrees.

Feel free to comment.