
Thursday, December 24, 2015

Getting started with data logging on the Raspberry Pi

I've got a friend who just got a Raspberry Pi and wants to try doing some projects. One of the first things that he wants to do is track temperature and humidity in his house, which is a really good place to start because it's not TOO difficult. Great place to get your feet wet playing with "physical computing".

So, he got the Pi, got an OS on it, got it booted up, and then said "I have no idea what to do with it." So I thought a little, and realized, if the goal is temp logging, there's a really, really easy place to start.

I sent him this:

Open a terminal and copy this into a file called "log-cpu-temp": (do you know vi? if not, use nano)

#!/bin/bash

# Grab the epoch time and a YYMMDD datestamp in a single date call
eval $(date +'now=%s date=%y%m%d')

# Append the CPU temperature (millidegrees C) to today's CSV file
echo "cpu.temp,$(< /sys/class/thermal/thermal_zone0/temp),$now" >> \
    $HOME/cputemp-$date.csv


Then, make it executable:

chmod +x log-cpu-temp

Test it by running it:

./log-cpu-temp

It should create a file in your home directory, cputemp-151223.csv, containing a line with the current CPU temperature in millidegrees C - something along the lines of "35780", which means 35.780 deg C.

Once it's running, then add it to your crontab:

crontab -e

and add to the bottom:

* * * * * /home/pi/log-cpu-temp

Once you save and exit, it'll log the CPU temperature to the CSV file every minute.
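
If you want to eyeball the log in plain degrees C, a quick awk one-liner converts the millidegrees column on the fly (a sketch against the CSV format above - it prints the timestamp, then degrees):

awk -F, '{ printf "%s %.3f\n", $3, $2/1000 }' cputemp-151223.csv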

Welcome to data logging!

Wednesday, December 23, 2015

Update on using Graphite with FreeNAS

A while back, I posted on using Graphite with FreeNAS. Well, there have been some changes in recent versions that make integrating Graphite with FreeNAS even easier, so it's time for an update. This applies to FreeNAS-9.3-STABLE.

FreeNAS collects metrics on itself using collectd, a nice program which does nothing but gather metrics, and gather them well. Out of the box it covers the basics - cpu, disk performance, disk space, network interfaces, memory, processes, swap, uptime, and ZFS stats - and logs them to RRD databases which can be accessed via the Reporting tab. However, as nice as that is, I much prefer the Graphite TSDB (time-series database) for storing and displaying metrics.

Previously, I was editing the collectd.conf directly, but since collectd.conf is dynamically generated and I'd have to re-add the same block of code each time that happened, I decided to move my additions into files stored on my zpool. I then use an Include directive added to the end of the native collectd.conf to pull in those files. So, at this point, all I add to the native collectd.conf is this one line:

Include "/mnt/sto/config/collectd/*.conf"

This makes my edits really easy, and allows me to create a script to check for it and fix it if necessary - more on that later.

In the /mnt/sto/config/collectd/ directory, I have several files - graphite.conf, hostname.conf, ntpd.conf, and ping.conf.

The graphite.conf loads and defines the write_graphite plugin:

LoadPlugin write_graphite
<Plugin "write_graphite">
  <Node "graphite">
    Host "graphite.example.net"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "servers."
    Postfix ""
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
  </Node>
</Plugin>

It's worth mentioning that some of the other TSDBs out there accept Graphite's native plaintext format, so this same config could be used with them just as well. Or, if you have another collectd host, you could use collectd's "network" plugin to send the metrics there instead.
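
For reference, each message in that plaintext format is just a single line - a dotted metric path, a value, and a Unix timestamp. With the config above, a datapoint arriving at the server looks something like this (the exact path varies by plugin):

servers.nas.load.load.shortterm 0.15 1450900000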

The hostname.conf redefines the hostname. The native collectd.conf uses "localhost", and that does no good when logging to a graphite server which is receiving metrics from many hosts, so I force it to the hostname of my FreeNAS system:

Hostname "nas"

In order for this not to break the Reporting tab in FreeNAS (not that I use that anymore with the metrics in Graphite), I first need to move the local RRD databases to my zpool by checking "Reporting Database" under "System Dataset" in the "System" tab:



I then go to the RRD directory, move "localhost" to "nas", and then symlink nas to localhost:

lrwxr-xr-x   1 root  wheel       3 May 19  2015 localhost -> nas
drwxr-xr-x  83 root  wheel      83 Dec 20 10:23 nas

This way, redefining the hostname in collectd causes the RRD data to be written to the "nas" directory, but when the GUI looks for the "localhost" directory, it still finds what it's looking for and displays the metrics properly.
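
For the record, the move itself is just a few commands (a sketch - the RRD path can differ depending on where the system dataset lives; it's /var/db/collectd/rrd on older versions, and collectd should be stopped while you shuffle directories):

service collectd stop
cd /var/db/collectd/rrd
mv localhost nas
ln -s nas localhost
service collectd start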

The ntpd.conf enables ntpd logging, which I use to monitor the time offsets on my FreeNAS box on my Icinga2 monitoring host:

LoadPlugin ntpd
<Plugin "ntpd">
        Host "localhost"
        Port 123
        ReverseLookups false
</Plugin>


Finally, ping.conf calls the Exec plugin to echo a value of "1" all the time:

LoadPlugin "exec"
<Plugin "ntpd">
  Exec "nobody:nobody" "/bin/echo" "PUTVAL nas/collectd/ping N:1"
</Plugin "ntpd">


I use this on my Icinga2 server to check the health of the collectd data, and have a dependency on this check for all the other Graphite-based checks. This way, if collectd breaks, I get alerted on collectd being broken - the actual problem. This prevents a flurry of alerts on all the things I'm checking from Graphite, which makes deciphering the actual problem more difficult.
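
The check itself can be a tiny script. Here's a hypothetical sketch - not my actual Icinga2 check; the Graphite hostname, the metric path, and the raw-format parsing are all assumptions based on the config above:

#!/bin/bash

# Hypothetical sketch: ask Graphite's render API for the last five
# minutes of the exec "ping" metric and alert if nothing has arrived.
url='http://graphite.example.net/render?target=servers.nas.collectd.ping&from=-5min&format=raw'

# The raw format looks like "path,start,end,step|v1,v2,...,vN"
values=$(curl -sf "$url" | cut -d'|' -f2)

if [[ $values == *1* ]]
then
    echo "OK: collectd is reporting"
    exit 0
else
    echo "CRITICAL: no recent collectd data for nas"
    exit 2
fi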

So, I define the Graphite writer, I change the hostname so the metrics show up on the Graphite host with the proper servers.nas.* path, and I add two more groups of metrics to the default configuration. These configuration files are stored on my zpool, so even if my FreeNAS boot drive craps out (which actually happened last week) and I have to reload the OS from scratch, I don't lose these files.

Since I'm only adding one line to the bottom of the collectd.conf file, it becomes very easy to check for my additions, and if necessary, add them. I have a short script which I run via cron: (the "Tasks" tab in the FreeNAS GUI)

#!/bin/bash

# Set the file path and the line I want to add
conf=/etc/local/collectd.conf
inc='Include "/mnt/sto/config/collectd/*.conf"'

# Fail if I'm not running as root
if (( EUID ))
then
  echo "ERROR: Must be run as root. Exiting." >&2
  exit 1
fi

# Check to see if the line is in the config file
if grep -qF "$inc" $conf
then
    : All good, exit quietly.
else
    : Missing the include line! Add it!
    echo "$inc" >> $conf
    service collectd restart
    logger -p user.warn -t "collectd" \
         "Added Include line to collectd.conf and restarted."

    echo "Added include to collectd.conf" | \
         mail -s "Collectd fixed on NAS" mymyselfandi@example.com
fi
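
I have the Tasks entry run it every 30 minutes; the equivalent crontab line would be something like this (the script path here is just a stand-in):

*/30 * * * * /mnt/sto/config/scripts/check-collectd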


If I reboot my FreeNAS system, the collectd.conf gets reverted. That's not a huge problem, since I never have to wait more than 30 minutes for the cron job to run, but in 9.3 I can do even better: I can call the script at boot time as a postinit script from the Init/Shutdown Scripts section of "Tasks":

 

This way, when I boot the system, it runs the check script, which sees the missing Include line, adds it automatically, and restarts collectd so it resumes logging to my Graphite server.

This setup has proven to be wonderfully reliable, and unless/until native Graphite support is added to FreeNAS, should keep on working.

Tuesday, October 20, 2015

Logging output from a DHT22 temp/humidity sensor to Graphite via collectd on the Raspberry Pi

Update: 11/29/2015 - Since I initially wrote this, I've decided this belongs in the "just because you can, doesn't mean you should" category. I find it much easier to query the sensor and then submit the metrics to Graphite directly using the plaintext protocol. You can do it with collectd, but it introduces far too many complications to really be worth it.
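
For example, the direct approach can be as small as this sketch (dht.py is assumed to print a single temperature value like "21.4", and the Graphite hostname is a stand-in):

#!/bin/bash

# Hypothetical sketch: read the DHT22 and push one datapoint
# straight to Graphite's plaintext port (2003).
temp=$(sudo ./dht.py)
echo "sensors.lakehouse.dht22.temperature $temp $(date +%s)" | \
    nc -q1 graphite.example.net 2003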

This is a bit of a fringe case, but if I don't write down what I just learned, I'll totally forget it. I got a DHT22 temperature and humidity sensor, and unlike the 1-wire DS18B20 temperature sensor, there isn't a convenient kernel module so I can't just read from the /sys filesystem. Thankfully, Adafruit has a nice guide to using the DHT22 with the Raspberry Pi, and they've got a GitHub repository with the code needed to query the sensor.

Of course, due to my love of Graphite, I need to immediately get my DHT22 not only working, but logging to Graphite, because METRICS. (funny how that word used to annoy me) I could simply modify the Adafruit code to output in Graphite plaintext format, but since I use collectd for gathering my host-based metrics anyway, let's have it do the work and submit everything to graphite together.

I could modify the python script to output in the collectd format, and call that with the Exec plugin, but since the python code needs to be run as root, I decided to keep it pretty minimal, and write a shell wrapper around it, because I know shell. ^_^

The biggest problem I ran into was getting my data to log even though the script was working, and I discovered this limitation: with the exec data format, the identifier - the path of the metric being collected - has to have a very specific format: hostname/exec-instance/type-instance. "hostname" is pretty obvious, and is defined by COLLECTD_HOSTNAME (as documented in the Exec plugin docs). The exec-instance just has to be a unique instance name, and with this being the only exec plugin I'm running, uniqueness is easy. In the last entry, type-instance, the "type" has to be a valid type as defined in /usr/share/collectd/types.db, and the "instance" is again any unique name. Once I changed my metric path identifier to match this standard, my stuff started logging.
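
In other words, each line the wrapper hands to collectd looks something like this (the hostname and value here are made up; "temperature" is a valid type from types.db):

PUTVAL "lakehouse/exec-dht22/temperature-room" interval=60 N:21.4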

Here's my modified Adafruit python script to gather the data from the DHT22:
https://github.com/ChrisHeerschap/lakehouse/blob/master/dht.py

Here's the shell script wrapper called by collectd:
https://github.com/ChrisHeerschap/lakehouse/blob/master/dht-collectd

And, here's what it looks like when the data gets into Graphite, referencing the DHT22 against a DS18B20 1-Wire sensor:


Saturday, August 22, 2015

Logging weather data with Raspberry Pi and collectd

I have a Raspberry Pi that I'm using to track temperature inside a house that isn't occupied year-round. I have a DS18B20 1-wire temperature sensor sticking out of the case of the Pi, but I also have it connecting to an Ambient Weather WS-1400-IP weather station, tracking the inside and outside temperatures. Initially, I was using curl with some bash tomfoolery to get the values, and alert if necessary. (inside temperature too cold, risk of pipes freezing, etc) This works, and works well, but recently I've been doing a bunch of work with Graphite and Collectd as part of a monitoring system at work. While Graphite is really, really awesome (I have several blog posts I want to do on it) I've been more and more impressed with the flexibility of Collectd. I've even got a Graphite server running at home with Collectd running on all my systems (except laptops, at the moment) just because it's cool and I'm a metrics nerd.

Well, I took another look at the Raspberry Pi that I have in this remote location, and decided that I wanted to set up a simple collectd instance to start logging some of this system's metrics. Since I don't have a Graphite server there, I decided I'd just write the metrics to CSV files so I'd have them, and could import them into my home Graphite server if I ever needed to.

I started out with just some basic metrics: cpu, df, disk, load, memory, ntpd, ping, and swap. Got it up and running, looked good, started logging to the CSV files in my home directory. Cool. I then used the collectd APCUPSD plugin to grab metrics from the UPS that's connected to the system. Easy peasy.

Then I remembered coming across this awesome post a while back where the writer uses the collectd cURL plugin to read both CPU temps and connected 1-Wire sensors. I thought that was just cooler than the Poconos in the middle of February, so I had to go ahead and add the following config:

<Plugin curl>
  <Page "CpuTemp">
    URL "file:///sys/class/thermal/thermal_zone0/temp"
      <Match>
        Regex "([0-9]*)"
        DSType "GaugeLast"
        Type "temperature"
        Instance "CPUTemp"
      </Match>
  </Page>

  <Page "1WireTemp">
    URL "file:///sys/bus/w1/devices/28-0004330af2ff/w1_slave"
      <Match>
        Regex "(-+[0-9][0-9][0-9]+)"
        DSType "GaugeLast"
        Type "temperature"
        Instance "Room"
      </Match>
  </Page>
</Plugin>

I adjusted the file path for the 1-Wire sensor - the writer uses the "1wire-filesystem", which I don't, so I just used the normal sysfs path to the sensor. A couple of small adjustments, and blammo, I'm logging CPU temperature and room temperature from the 1-wire sensor. Okay, granted - the 1-wire temperature is written in millidegrees C, so 23 degrees C logs as "23000". At some point I'll figure out how to scale the data before it gets logged, but as long as I'm accessing this data in Graphite, I can use the wonderful Graphite functions (specifically, offset and scale) to convert it into whatever I want. It doesn't matter how I log it, just that I do.

If you're curious, the Graphite target for this conversion from millidegrees C to degrees F would be:

target=offset(scale(path.to.1wire.metric,0.0018),32)

That's 9/5, divided by 1000, and offset by 32 degrees. All calculated live on the Graphite server. Slick.

Then I realize -- this is the cURL plugin. The WS-1400-IP weather station has a base unit which makes the live weather data available on a web page, http://{weather station IP}/livedata.htm - why couldn't I just use the cURL plugin and hit that page?

Well, no reason aside from not being familiar with the cURL plugin - yet. Using the example above as a template, I started off with a simple config (added to the block above) to grab the indoor and outdoor temperatures:

  <Page "WeatherStation">
    URL "http://{weather station IP}/livedata.htm"
      <Match>
        Regex "inTemp.*value=\"(-+[0-9]+\.*[0-9]*)"
        DSType "GaugeLast"
        Type "temperature"
        Instance "insideTemp"
      </Match>
      <Match>
        Regex "outTemp.*value=\"(-+[0-9]+\.*[0-9]*)"
        DSType "GaugeLast"
        Type "temperature"
        Instance "outsideTemp"
      </Match>
  </Page>

Restarted Collectd, and what do you know - it's logging those temperatures! Well, that was easy!

Let me just explain how this works. I define a page "WeatherStation" which has the URL. Then, I define any number of matches against that one page (so I only have to load it once for all the metrics I'm collecting) to grab metrics. In the case of the inside temp, the output line in the HTML looks like this:

 <td bgcolor="#EDEFEF"><input name="inTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="72.3" maxlength="5" /></td>

So, I write the regex to match against that line, starting with inTemp, then matching anything up to value=". Because the regex is surrounded by double quotes, I had to escape the double quote in the regex with a backslash. I then put parentheses around the actual metric that I want to extract. One thing to note is that I'm looking for zero or one "-" signs (it gets cold up there), followed by one or more digits, optionally followed by a literal "." (the decimal point) and more digits. This allows the regex to match positive or negative, integer or decimal values, which is important because the weather station reports temps to 0.1 degree, while humidity is logged as an integer percentage.
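
If you want to sanity-check a regex like this outside of collectd first, curl and grep will do the job (same placeholder for the station's IP as above):

curl -s "http://{weather station IP}/livedata.htm" | \
    grep -Eo 'inTemp[^>]*value="-?[0-9]+\.?[0-9]*'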

Starting with this template, it's only a moment of work to add the remaining metrics - inside and outside humidity, absolute and relative pressure, wind direction, speed and gust speed, solar radiation, UV value and UV index, as well as rainfall accumulations.

If you read the description of the weather station, you'll see that it can report all of these metrics to Weather Underground, and I do have it configured to do just that. That's all well and good, but when I only send the data to them, I don't have direct access to it and can't graph it exactly the way I want. Now that I'm logging this data and can get it into Graphite, I can.

Thursday, April 23, 2015

Logging FreeNAS performance data into Graphite

Update 12/23/2015 - I now have an updated post which supersedes this post.

Update 12/2/2015 - This information is dated, and there's a really good way to handle FreeNAS logging to Graphite with FreeNAS 9.3 that I need to document. I'll update this post with a link once I get that post done. In the meantime, this is just a placeholder.

FreeNAS is a great NAS appliance that I have been known to install just about anywhere I can. One of the things that makes it so cool is the native support for RRD graphs which you can view in the "Reporting" tab. What would be cooler, though, is if it could log its data to the seriously awesome metrics collection software Graphite. I've been working on getting Graphite installed at work, and have always wanted metrics collection at my house (because: nerd) and so once I got a Graphite server up and running, one of the first things I did was modify my FreeNAS system to point to the Graphite server.

Here are the steps I followed on FreeNAS 9.2.1.8.  I'm overdue for FreeNAS 9.3 and once I've done the upgrade, I'll update these instructions as necessary.


  1. Install a Graphite server.  Four little words which sound so easy, but mask the thrill and heartbreak that can come with trying to accomplish this task. I tried several guides and was a little daunted when I saw most of them mentioning how much of a pain in the ass installing Graphite can be, but then I managed to find this nice, simple guide that used the EPEL packages on a CentOS box. Following those instructions, I managed to get two CentOS 6 boxes up and running in pretty short order, and then with some slight modifications, I set up a CentOS 7 graphite server at home.
  2. Make sure port 2003 on your Graphite box is visible to your FreeNAS box. This usually involves opening some firewall rules.
  3. SSH into your FreeNAS box. (If you don't know what this means, you probably never got to this step, as "Install a Graphite server" would have entirely broken your will to live.) You will also need to log into your FreeNAS box as either root, or a user that has sudo permission.
  4. Edit the collectd config file:
    1. sudo vi /etc/local/collectd.conf
    2. At the top of the file, change "Hostname" from "localhost" to your hostname for the NAS. Otherwise, your NAS will report to the Graphite host as "localhost", and that's less than useful.

      Hostname "nastard"
      ...
    3. There is a block of lines all starting with "LoadPlugin". At the bottom of this block, add "LoadPlugin write_graphite":

      ...
      LoadPlugin processes
      LoadPlugin rrdtool
      LoadPlugin swap
      LoadPlugin uptime
      LoadPlugin syslog
      LoadPlugin write_graphite

      ...
    4. At the bottom of the file, add the following block, substituting the hostname for your graphite hostname:

      ...

      <Plugin "write_graphite">
        <Node "graphite">
          Host "graphite.example.net"
          Port "2003"
          Protocol "tcp"
          LogSendErrors true
          Prefix "servers."
          Postfix ""
          StoreRates true
          AlwaysAppendDS false
          EscapeCharacter "_"
        </Node>
      </Plugin>
  5. Change to the directory "/var/db/collectd/rrd" - this is where FreeNAS logs the RRD metrics that are visible in the GUI. If we just restart collectd, it's going to start logging under the new hostname (since we changed that above), and that'll break the UI's RRD graphs. While we'd still have the data in Graphite, we can have our Graphite cake and RRD eat it too by doing the following steps.
  6. Shut down collectd:

    sudo service collectd stop
  7. Move the "localhost" directory (under /var/db/collectd/rrd) to whatever you set the hostname to in the collectd.conf above:

    sudo mv localhost nastard
  8. Symlink the directory back to "localhost":

    sudo ln -s nastard localhost
  9. Restart collectd:

    sudo service collectd start
  10. That's it! At this point, you can reload the FreeNAS GUI and see that you still have your RRD data - but more importantly, if you go to your Graphite GUI, you should see that you're now getting metrics.

Protips:

  • Collectd writes data every 10 seconds by default. If you write all your collectd data with the "servers." prefix as I've shown above, you can make sure your whisper files are configured for this interval with the following block in your /etc/carbon/storage-schemas.conf:

    [collectd]
    pattern = ^servers\.
    retentions = 10s:90d,1m:1y

    This will retain your full 10s metrics for 90 days, and 1-minute interval metrics for a year. That config results in each whisper file being 15MB, and with my NAS config (with 6 running jails) I have 220 whisper files for a total disk space of 1.2G. Considering disk space is pretty cheap, you could easily adjust these numbers up to retain more data for longer. You should also read up on Graphite's aggregation rules, which control how the data is rolled up when it's saved at the coarser intervals - there's a minimal example after this list.

    Thanks to Ben K for pointing out that more than one or two aggregation stages will greatly increase the amount of disk access. Initially I had a four-stage aggregation, but that would require a crapload of disk access with each write. Since Graphite is very IO-intensive to begin with, that's not a good idea.
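
    For completeness: the roll-up behavior itself is set in storage-aggregation.conf. A minimal default that averages data down into the coarser retention buckets would look like this:

    [default_average]
    pattern = .*
    xFilesFactor = 0.5
    aggregationMethod = average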