Showing posts with label sysadmin. Show all posts

Thursday, September 29, 2016

Be more intelligent with your `sleep`--or-- how to overengineer your scripts

The universal solution to waiting for something to be ready in shell scripting is the `sleep` command.

Here, we're waiting for a file to be created (say, by a `yum install httpd` running in another terminal), so we can `ls` it, perhaps as part of a script that configures httpd.

sleep 6
ls /etc/init.d/httpd

But is there a better way? What if the file appears almost immediately? You've wasted nearly 6 seconds unnecessarily, and if your script does this a lot, that adds up.

We need a way to continuously query if the resource exists. And it needs to be with a command that sets an error exit code if the resource is not found. Let's use `stat` to do this.

Let's leverage the fact that `stat` sets a non-zero exit code on failure:

while true; do
  stat /etc/init.d/httpd
  if [ $? -eq 0 ]; then # check if the stat was successful
    break
  fi
done
ls /etc/init.d/httpd

`true` is a command that produces no output and always exits with status 0. For performance we'd rather not fork a binary, but in Bash `true` is a shell built-in anyway. The no-op command `:` accomplishes the same thing, e.g. `while :; do`.

We can simplify the if statement with the `&&` operator, which executes the following command (`break`) only if `stat` exits without error (status 0):

while true; do
  stat /etc/init.d/httpd && break
done
ls /etc/init.d/httpd
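One wrinkle: `stat` prints its full output on every hit and an error message on every miss. A quieter variant, as a sketch, discards both streams and lets the exit status alone drive the loop (`/etc/hosts` stands in here for the path you are waiting on, so the snippet runs anywhere):

```shell
# path we are waiting on; /etc/init.d/httpd in the running example
path=/etc/hosts

while true; do
  # discard stat's stdout and stderr; only the exit status matters
  stat "$path" &>/dev/null && break
done
ls "$path"
```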

Instead of stat, we can use the `test` command (also available under the name `[`). Here we check for a regular file using -f.

while true; do
  if [ -f /etc/init.d/httpd ]; then break; fi
done
ls /etc/init.d/httpd

However, in these examples, if the file never exists, the loop will never exit.

So, instead we can define a timeout and a shorter `sleep` interval, and a counter (i) to track the iterations:

i=0
while [ "$i" -lt 6 ]; do
  if [ -f /etc/init.d/httpd ]; then break; fi
  sleep 1
  (( i++ )) # built-in arithmetic
done
ls /etc/init.d/httpd

Quote `$i` to protect it from word splitting. The `break` keyword escapes the while loop.

Alternatively, use Bash's built-in arithmetic:
i=0
while (( i < 6 )); do
  if [ -f /etc/init.d/httpd ]; then break; fi
  sleep 1
  (( i++ ))
done
ls /etc/init.d/httpd

This is a good start, but calling `test` (even as a builtin) in a tight loop is still wasteful. If you are in Bash, you can use the `[[` keyword, which avoids some of that overhead and has the added benefit of protecting you against unquoted variables in a comparison.

i=0
while (( i < 6 )); do
  if [[ -f /etc/init.d/httpd ]]; then break; fi

  sleep 1
  (( i++ ))
done
ls /etc/init.d/httpd

There is a bug here: `ls` will run whether or not the file was ever found.

So now we exit the loop, but how do we notify the caller that it failed? The `break` statement does not set a non-zero exit status; as far as the shell is concerned, the loop completed. We can use a RETVAL variable that we set explicitly to 0 on success, and to 143 to mean "file not found" (an arbitrary pick; note that 143 conventionally means "killed by SIGTERM", so any unused non-zero value works).

i=0
while (( i < 6 )); do
  if [[ -f /etc/init.d/httpd ]]; then
      RETVAL=0
  else
      RETVAL=143
  fi
    
  [[ "$RETVAL" -eq 0 ]] && break

  sleep 1
  (( i++ ))
done
[[ "$RETVAL" -eq 0 ]] && ls /etc/init.d/httpd || echo "ERR: file never created"

I quote variables in the [[ ]] here even though it's not actually required.
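One caveat worth knowing about the `A && B || C` idiom: `C` runs when either `A` or `B` fails, so if `ls` itself errored, the "file never created" message would print even though the file was found. An explicit if/else sidesteps this. A sketch, using `/etc/hosts` so it can be run anywhere:

```shell
if [[ -f /etc/hosts ]]; then
  ls /etc/hosts                        # success branch only
else
  echo "ERR: file never created" >&2   # failure branch only
fi
```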

This works fine, but we can simplify by unsetting RETVAL on success; the test then becomes [[ -z "$RETVAL" ]]. It is also best practice to send error messages to STDERR (file descriptor, or fd, 2) using >&2.

i=0
while (( i < 6 )); do
  if [[ -f /etc/init.d/httpd ]]; then
    unset RETVAL
  else
    RETVAL=143
  fi
    
  [[ -z "$RETVAL" ]] && break

  sleep 1
  (( i++ ))
done
[[ -z "$RETVAL" ]] && ls /etc/init.d/httpd || echo "ERR: file never created" >&2

Now, let's abstract the variables, and set an exit status.

i=0; path="/etc/init.d/httpd"; timeout=6
while (( i < $timeout )); do
  if [[ -f "$path" ]]; then
    unset RETVAL
  else
    RETVAL=143
  fi
    
  [[ -z "$RETVAL" ]] && break

  sleep 1
  (( i++ ))
done
[[ -z "$RETVAL" ]] && ls "$path" || (echo "ERR: file never created" >&2; exit $RETVAL )

Notice that `exit` is called from a subshell: if it were called in the current shell, it would exit your interactive session, which is annoying and undesirable. `return` is not an option here either, since we are not yet inside a function.
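The difference is easy to demonstrate: in a `( ... )` subshell, `exit` only terminates the subshell, and its status is visible in `$?` afterwards; a `{ ...; }` group would run `exit` in the current shell and kill your session.

```shell
( echo "inside subshell"; exit 143 )   # exit ends only the subshell
echo "still here; subshell exit status was $?"   # prints 143
```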

Testing with a bogus file.

$ path=/etc/bogusfile
$ while (( i < $timeout )); do
if [[ -f "$path" ]]; then
>   unset RETVAL
> else
>   RETVAL=143
> fi
> [[ -z "$RETVAL" ]] && break
> sleep 1
> (( i++ ))
> done

$ [[ -z "$RETVAL" ]] && ls "$path" || (echo "ERR: file never created"; exit $RETVAL)
ERR: file never created
$ echo $?
143

Now put it in a reusable function!

_testforfile () {
  local i=0
  local timeout="$1"
  local path="$2"

  while (( i < $timeout )); do
    if [[ -f "$path" ]];then
      unset RETVAL
    else
      local RETVAL=143
    fi

    [[ -z "$RETVAL" ]] && break

    sleep 1
    (( i++ ))
  done
  [[ -z "$RETVAL" ]] && ls "$path" || (echo "ERR: file never created" >&2; return "$RETVAL" )
}

Note the use of the `local` keyword so our custom variables don't pollute the invoking environment, and the change from `exit` to `return`. This function accepts two parameters as input. Call it like so:

_testforfile 6 /etc/init.d/httpd

Now, we can enhance this by validating that the timeout is numeric, and defaulting it to 5 if it was not provided. Note that making the timeout optional means it moves to the second position: the path is now the first argument.

_testforfile () {
  local i=0
  local timeout="${2:-5}"
  local path="$1"

  local re='^[0-9]+$'
  if ! [[ $timeout =~ $re ]]; then
    echo "ERR: Timeout was not a number" >&2
    return 1
  fi

  while (( i < $timeout )); do
    if [[ -f "$path" ]];then
      unset RETVAL
    else
      local RETVAL=143
    fi
    
    [[ -z "$RETVAL" ]] && break
    
    sleep 1
    (( i++ ))
  done
  [[ -z "$RETVAL" ]] && ls "$path" || (echo "ERR: file never created" >&2; return "$RETVAL" )
}

re is a regular expression used in conjunction with the =~ operator.

In action:
$ _testforfile /etc/hosts 47d
ERR: Timeout was not a number

And maybe some tests in a future post.
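Until then, here is a quick smoke test that can be run against the function (repeated verbatim so the snippet is self-contained; the temp-file paths come from `mktemp` and are purely illustrative):

```shell
_testforfile () {
  local i=0
  local timeout="${2:-5}"
  local path="$1"

  local re='^[0-9]+$'
  if ! [[ $timeout =~ $re ]]; then
    echo "ERR: Timeout was not a number" >&2
    return 1
  fi

  while (( i < $timeout )); do
    if [[ -f "$path" ]]; then
      unset RETVAL
    else
      local RETVAL=143
    fi
    [[ -z "$RETVAL" ]] && break
    sleep 1
    (( i++ ))
  done
  [[ -z "$RETVAL" ]] && ls "$path" || (echo "ERR: file never created" >&2; return "$RETVAL")
}

# file that exists: should list it and return 0
tmp=$(mktemp)
_testforfile "$tmp" 2 >/dev/null && echo "PASS: existing file detected"

# file that does not exist: should give up after ~2s and return 143
rm -f "$tmp"
_testforfile "$tmp" 2 2>/dev/null
echo "missing-file return status: $?"   # 143
```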

Thursday, September 22, 2016

Kill an X application when you have multiple X servers running

Mistakenly started a screensaver on my Chromoting session. Needless to say, a real pain when you have to unlock your local workstation AND the remote end after a minute of inactivity.

Of course, now you have duplicate screensavers running, and it's not apparent in the process listing which one belongs to which X server. I certainly don't want to kill the real screensaver on :0, leaving my local console unlocked!

$ ps ax|grep screensave
  6166 ?        S     16:30 xautolock -time 1 -locker xscreensaver-command -lock -detectsleep -corners -+00 -cornerdelay 1
 17200 ?        S      9:25 xscreensaver -nosplash
 25747 ?        S      7:13 xautolock -time 1 -locker xscreensaver-command -lock -detectsleep -corners -+00 -cornerdelay 1
 56143 pts/17   SN+    0:00 grep --color=auto screensave
140723 ?        S      0:07 xscreensaver -nosplash

Seeing as the DISPLAY var is a part of the environment where the X application was invoked, it should be in:

/proc/<PID>/environ

cat /proc/<PID>/environ gives a whole block of unbroken text, because the entries are separated by NUL bytes (\0) rather than newlines, but if you look closely you'll find the DISPLAY variable in there.

The awk command was borrowed from elsewhere on the Internet. It basically says: split fields on '=' (the field separator, FS) and split records on '\0' (the record separator, RS). I tried investigating this further, since '\0' is the NUL delimiter used by some GNU utils, including xargs, and went down the following path:

This seemed like the most straightforward way to break it out:
/bin/echo -e $(cat /proc/137572/environ)

However the output was still not broken into lines: command substitution strips the NUL bytes outright, so there is nothing left for echo to translate. Oh well.

Note the use of /bin/echo as echo is usually a shell built-in.
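As an aside, since the entries are NUL-separated, `tr` can do the line-splitting directly by reading the file, avoiding command substitution entirely (here `$$`, the current shell's PID, stands in for the PID of interest):

```shell
# translate each NUL separator into a newline
tr '\0' '\n' < /proc/$$/environ
```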

Anyway, let's use awk like I mentioned before, by substituting the PIDs I found above.


PID=17200
$ awk 'BEGIN{FS="="; RS="\0"}  $1=="DISPLAY" {print $2; exit}' /proc/$PID/environ
:0

Well that's not it.

PID=140723
$ awk 'BEGIN{FS="="; RS="\0"}  $1=="DISPLAY" {print $2; exit}' /proc/$PID/environ
:20

Bingo.

Tie it together loosely:
for PID in $(pgrep xscreensa); do echo -n $PID; awk 'BEGIN{FS="="; RS="\0"}  $1=="DISPLAY" {print $2; exit}' /proc/$PID/environ;done
6166:0
25747:20

Now I can quickly tell which process belongs to which DISPLAY.


Friday, January 16, 2015

VPN from a misconfigured cafe using NAT and Linux network namespaces (netns)

Recently I found myself at a cafe that had a wifi connection that was using the whole 10.0.0.0/8 subnet, meaning all addresses from 10.0.0.1-10.255.255.254. This was set up by a professional networking company. In my opinion, someone needs to re-do their CCNA.

So what this means is that if your corporate network is, say, on 10.34.0.0, you will be unable to route traffic easily over the VPN.

I am told there are a few ways of getting around this:

  1. Use network namespaces and NAT'ing to run your chosen applications in their own namespace that is NAT'ed through your real connection
  2. Use iptables prerouting if you know which subnets you are trying to get to on the other side of the VPN.
  3. Convince your coffee shop to use a sane network architecture
I chose #1 for now, and this guide goes over that.

Let's Get Started

Add the network namespace and confirm that it was created:

# ip netns add vpn_nat
# ip netns list

Add virtual ethernet interfaces (peers)
# ip link add name veth0 type veth peer name veth1

Move one of those peers into the vpn_nat namespace
# ip link set veth1 netns vpn_nat

In the namespace context, set up the network
# ip netns exec vpn_nat ifconfig lo up
# ip netns exec vpn_nat ifconfig veth1 192.168.148.2/24 up
# ip netns exec vpn_nat route add default gw 192.168.148.1
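As an aside, `ifconfig` and `route` are deprecated on modern distributions; the iproute2 equivalents of the three commands above (assuming the same `ip` tooling used throughout) would be:

# ip netns exec vpn_nat ip link set lo up
# ip netns exec vpn_nat ip addr add 192.168.148.2/24 dev veth1
# ip netns exec vpn_nat ip link set veth1 up
# ip netns exec vpn_nat ip route add default via 192.168.148.1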

The eagle-eyed reader will notice that I am pointing to a gateway that doesn't exist! We fix that like so:
# ifconfig veth0 192.168.148.1/24 up

Test that the vpn_nat namespace can reach veth0

Execute ping in the namespace context vpn_nat:
# ip netns exec vpn_nat ping 192.168.148.1
PING 192.168.148.1 (192.168.148.1) 56(84) bytes of data.
64 bytes from 192.168.148.1: icmp_seq=1 ttl=64 time=0.088 ms
64 bytes from 192.168.148.1: icmp_seq=2 ttl=64 time=0.041 ms

The next step is to connect the veth0 to your physical network either using NAT or bridging. This requires the masquerading kernel module, but I believe it gets loaded automatically.
# sysctl net.ipv4.ip_forward=1
# iptables -t nat -A POSTROUTING -s 192.168.148.0/24 -d 0.0.0.0/0 -j MASQUERADE

Verify the NAT rules

# iptables -t nat -L -n

Ping a google address in the namespace context

#  ip netns exec vpn_nat ping www.google.com

Verify the routing table in the netns

# ip netns exec vpn_nat route

Run your application in the namespace

I am running as an unprivileged user
$  ip netns exec vpn_nat firefox

Undoing

# iptables -t nat -D POSTROUTING 1
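To finish cleaning up, delete the namespace as well; destroying it tears down veth1, and removing either end of a veth pair destroys its peer, so veth0 goes with it:

# ip netns delete vpn_nat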

References

http://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
http://how-to.wikia.com/wiki/How_to_set_up_a_NAT_router_on_a_Linux-based_computer
http://www.opencloudblog.com/?p=66

Saturday, April 5, 2014

Storage Debate: NAS build or buy?

You're a storage guru. You eat storage for breakfast, lunch, and dinner. Storage fears you. I/O buses cower in your presence.

Okay, perhaps you aren't this intense. You don't have a VNX storage array in your bedroom. You aren't any lesser of a man or woman because of that.

But you still need to store and backup your stuff! So you've decided that you want to do it in-house. What are some considerations? Read on to find out.

Well, what about the cloud?

The cloud is in the full throes of popularity, but with the recent NSA revelations and privacy violations by a multitude of companies, it's no wonder that many people, like myself, prefer to keep their data in their own hands.

In the arena of self-hosted data, it's a balance between cost, functionality, and complexity.

Here, I'll look at the difference between Network Attached Storage devices--which are network attached appliances that simply let you store files--and a home-built PC file server.

Cutting Costs

The geeks among you may be saying, "why not just buy a computer with lots of storage bays and create an array and share it? I mean, a NAS is a computer!"

Of course, this is all true. It's also true that a PC is cheaper to build, will be faster than the mostly ARM-based NASes on the market today, and can also do double duty with other roles and services. A PC contains commodity hardware that is both affordable, and readily available.

When it comes to cost, a PC is almost always cheaper.

However, this does not factor in how valuable your time is or whether your intent is to learn the ins-and-outs of storage configuration, which can be a tiresome road to travel.

In my case, I do not have the luxury of spending days tweaking a storage server, or logging in every week to kill runaway processes. I do not have the interest currently in researching all of the best-practice configuration file flags for critical file sharing services.

Future-proof & easy maintenance

NAS RAIDs are standard software RAIDs running on Linux's MD infrastructure. The MD subsystem is tried and true, but lacks some neat features that sit a layer higher, features a NAS will not support. One such feature is Linux's Logical Volume Manager (LVM), which allows very flexible allocation and de-allocation of storage into logical (virtual) volumes. I use LVM quite frequently to grow shared storage by replacing drives, with almost zero downtime.

Do you really want your NAS to be a full-blown computer, subject to routine upgrades, maintenance, and viruses? Do you want to manage a server? Are you a sysadmin?

I've compared Synology, QNAP, and Thecus, the 3 underdogs of storage (and NETGEAR, but I would relegate their NAS offerings to duty as very well-built doorstops). My current favorite is QNAP.

The boxes are well-built and performant, and QNAP participates in the community. The hardware has gotten so slick, with features like HDMI output, that they've actually pre-installed the XBMC Media Center on it. I applaud QNAP for going with a best-of-breed existing solution rather than home-baking their own multimedia manager, which would have ended up terrible and earned them plenty of negative press. XBMC is almost universally recognized as one of the best unified entertainment centers among both commercial and open-source offerings. It really is that good, and the documentation is even better; the XBMC wiki is second to none.

As I hear, Synology is offering similar features, but if you are one to reward whoever got there first (I am), then you choose that vendor to encourage them to stay ahead of the curve.

Joining a domain

If you are building your own storage server and want to integrate it into a domain, you are on your own in getting Kerberos and Samba to talk Active Directory to your Domain Controllers. It's a secret sauce whose recipe Microsoft kept under wraps for quite a while.

Don't get me wrong, it's certainly possible, and many smart people can get this running very quickly on a Linux box, but I am not one of those people.

Finesse and Purposeful Duty

Any PC can be cheap, but a power-miserly, living-room-quiet, compact mini PC starts to get into NAS cost territory anyway. A small ITX motherboard, a compact case, and the other components are expensive.

In that respect, a NAS is really great value!

But, even if you thought the PC was the cheaper solution, you've still gotta configure the RAID, keep track of which drives in which bays are on which SATA ports, and hope you don't destroy your array by pulling the wrong drive when you go to do maintenance.

Add to this that most NASes are hot-plug ready, meaning zero downtime if you are just swapping out a drive on a RAID5 volume for example.

Plug and Play Apps

Most NAS devices these days come with cool addons, one of which is XBMC as mentioned earlier. I use my QNAP to stream webcams and create time lapses. I use it as a remote music player, since it has a nice streaming interface. I rsync my backups to it, and I use it to tunnel SSH ports into my home network as a pseudo-VPN. I use it for VPN (OpenVPN). I use it to torrent things, scheduled so I am not torrenting during peak evening hours when the Internet is slow.

You can plug in a USB audio stick and stream tunes straight from the NAS into your stereo system. It talks to your USB UPS and will shut down gracefully.

There are so many neat things you can use a low power, embedded Linux system for, it's quite amazing.

What to do with your hard-earned money?

Give it to QNAP or Synology, browse their respective forums, and enjoy easier and safer access to your files, along with a slew of great add-on apps.

Monday, September 30, 2013

Xerox and Konica-Minolta administrator web interface lockout when control panel in use

I've worked with a fair number of all-in-one professional copiers for business, such as Xerox Workcentre, Konica Minolta BizHub, Canon imageRunner, and Ricoh Aficio.

In this new era of network-everything, a strange gap emerges where traditional office-appliance service companies offer devices that are heavily IT-integrated, yet they do not know the intricacies of managing devices in this way. Therefore I find myself taking administrative control of these appliances from the IT standpoint and letting them manage the physical hardware.

It is super convenient to be able to login to a web UI and manage a diverse set of options on these devices. However, during my troubleshooting, I've run into an issue with more than one manufacturer where you cannot change settings in the Web UI when the printer is processing a job or a user is at the front panel.

The Konica-Minolta front panel is exceptionally bad: it will prevent a user from making copies if it detects ANY activity in the web UI, and the timeout is quite long.

The Xerox machines, while quite fast--and with by far the most configurable options--are plagued by the opposite effect: the front panel overrides the web UI administrator. You cannot apply any settings while the printer thinks it has a user at the front panel. In my experience this detection mechanism is very poorly written; the user will insist that they are not at the control panel even as the web UI reports otherwise. Add to this that the timeout before the printer decides no one is working is set very long and cannot be changed from the web UI itself, and you get an unproductive 15 minutes for a task that should have taken 2.

This must be frustrating for administrators working in universities, where a library printer may never be without a user at the keypad. I also work with medical clinics where the machine is in use 98% of the time during working hours.

When I get a service call for one of these machines, like when the scan-to-email function is not working, my first goal is to use VPN or SSH tunneling to get at the device's web UI to check the settings. And having your hands tied like this is immensely frustrating, especially when it interferes with even benign operations like adding an address book contact. The device manufacturers should know better!

Considering cheap Brother devices do not have such interlocks on web UI administration, I'm hard pressed to reward Xerox and Konica-Minolta in particular with kudos, despite how nice the devices themselves are.

I expect the device to trust me as a sysadmin to change settings while the device is in use. It's just poor or lazy design.

Perhaps some of these settings can be changed over SNMP which gets around this limitation?

As a side note, my personal picks are Ricoh Aficios and Xerox Workcentres. However, I do not know if the Aficio line suffers from this interlock problem.

Good luck to all you sysadmins out there managing office printers!

References

http://forum.support.xerox.com/t5/Hardware/Unable-to-enter-Administrator-Mode/td-p/740