Displaying articles with tag ruby

Bandwidth Tracking

Posted by wesgibbs, Wed Apr 16 22:04:00 UTC 2008

Like a lot of web applications, MadPolitic has different levels of user accounts. One of the account differentiators is bandwidth allowance; higher-level accounts have an increased bandwidth limit. To implement this, I needed a way to track bandwidth usage per account. Here’s how I did it using Apache.

Use CustomLog and cronolog to create and rotate a custom bandwidth log

The standard Apache access.log was missing several things I needed in order to track bandwidth.
  • the format did not include the host from the request header
  • it also did not include the number of bytes Apache sent back across the network to the client in response to each request
  • the log rotation didn’t give me a clean history of log files by day

To remedy this, I created a CustomLog and used cronolog to get some more advanced log rotation. Apache provides log rotation capability out of the box, but the Apache docs themselves recommend cronolog for more flexible logging.

Here’s the CustomLog I created, from my httpd.conf file

CustomLog "|/usr/local/sbin/cronolog -l /etc/httpd/logs/bandwidth_today -P /etc/httpd/logs/bandwidth_yesterday /etc/httpd/logs/%Y/%m/%d/bandwidth.log" bandwidth env=!dontlog
Apache passes most of the work off to cronolog, which does a few things
  • creates a symbolic link called bandwidth_today to the current bandwidth log file
  • creates a symbolic link called bandwidth_yesterday to the previous day’s bandwidth log file
  • defines a spec that indicates how I want the logfiles rotated and stored, in this case, in directories based on the month, day and year

After the piped invocation of cronolog, the CustomLog directive has two more parameters. The first, bandwidth, references a LogFormat I created. More on that in a moment. The last parameter enables the ability to not log certain requests. For example, if you have Apache serving up other sites or content that you do not want logged, you can add lines like this to your httpd.conf file

SetEnvIf Host "myblog\.mydomain\.com" dontlog

and that final parameter to CustomLog makes sure requests to myblog.mydomain.com don’t wind up in the log.

Create a custom LogFormat

Recall that bandwidth parameter I passed to the CustomLog directive. It should reference a LogFormat directive that tells Apache how I want each line of the log file to look. By default, the Apache access log is missing the two key pieces of information I needed: the host from the request header, and the number of bytes that Apache sent back across the network to the client.

Logging the host was one of the keys to making all this work for my application. That’s because I use a subdomain-per-account approach, so each account gets a unique subdomain like bob.madpolitic.com. Capturing that host information in the log files along with the bytes transferred would give me all the raw data I needed to map bandwidth usage to accounts.

I started by adding those bits into the format and came up with the following

LogFormat "%t %{Host}i %O %h \"%r\" %>s" bandwidth

You can reference the Apache docs for the LogFormat directive for more detail, but in brief, this line tells Apache to log the time (%t), the contents of the Host: header in the request (%{Host}i), the bytes sent over the network to the client (%O), the remote host (%h), the first line of the HTTP request, quoted (\"%r\"), and the HTTP status of the response (%>s). Finally, it gives this format a name, bandwidth.

I now had a log file being written out in a format that I specified that included, most importantly, the host and the number of bytes sent to the client for each request. That log file was being rotated daily and previous logs were stored in a directory structure based on the date. I also had some handy symbolic links to today’s and yesterday’s bandwidth log files.

Make Apache write your Ruby code

This worked great. The plan was to fill a log with this information, and have one log file per day. Then I could use a log analyzer and some Ruby to parse through all those log files, compile the bandwidth statistics, and write it all into my application’s database so that my Rails app could get to it to do things like show the user how much bandwidth they’ve used. After trying it out for a little while, however, I came up with another idea.

Why not have Apache just write the Ruby code for me? I could drop the log analyzer and basically just execute the log file like a Ruby script and have it give me the data in the structure I wanted: a Ruby hash mapping hosts to bytes.

I went back to work on my log format and came up with this

LogFormat "bw[\"%{Host}i\"] = (bw[\"%{Host}i\"]) ? bw[\"%{Host}i\"] = bw[\"%{Host}i\"] + %O : bw[\"%{Host}i\"] = %O" bandwidth

This gave me logs with lines like this

bw["bob.madpolitic.com"] = (bw["bob.madpolitic.com"]) ? bw["bob.madpolitic.com"] = bw["bob.madpolitic.com"] + 123 : bw["bob.madpolitic.com"] = 123
bw["frank.madpolitic.com"] = (bw["frank.madpolitic.com"]) ? bw["frank.madpolitic.com"] = bw["frank.madpolitic.com"] + 3442 : bw["frank.madpolitic.com"] = 3442
bw["bob.madpolitic.com"] = (bw["bob.madpolitic.com"]) ? bw["bob.madpolitic.com"] = bw["bob.madpolitic.com"] + 33 : bw["bob.madpolitic.com"] = 33

Now I had a file full of Ruby code that kept a hash called bw which mapped the host to the number of bytes transferred. Each line would check to see if that host already existed as a key in the hash, and if it did, add the bytes to the total already in the hash. Otherwise it would create a new entry in the hash, host => bytes.

Get the data from the hash into the application database

Next I wrote a Ruby script that gets invoked by cron once a day just after midnight. The script does the following
  1. creates a new empty hash called bw
  2. uses the bandwidth_yesterday symbolic link to read in yesterday’s log file all at once to a string. I process bandwidth totals once a day, and remember that this script is being run just after midnight, so I want to process the previous day’s log
  3. sends the eval message to Kernel and passes the string that contains the log file along
  4. when this is finished, my bw hash is now populated with host => bytes entries
  5. the script iterates through each entry in the hash, and for each
  6. uses the key (the host) to find an account in my application. Remember that each account gets a unique subdomain, so the host bob.madpolitic.com would map to bob’s account
  7. looks to see if there is already a row for that account, for the current month, with a bandwidth total and creates one if there is not. This would happen the first time that account is ever used, or at the start of each month
  8. adds the value from the current hash entry, which is the bytes transferred for that host for the previous day, to the existing value

When it’s all finished I have a table that is populated with the number of bytes transferred for each account, for each month. The table looks like this

create_table "web_stats", :force => true do |t|
  t.column "account_id",   :integer
  t.column "month",         :string # stored in a %m%Y format, like 042008
  t.column "bytes",           :bigint
end

I’ve also got all of the historical log files still sitting on my file system (getting backed up once a day) in the event I need to recreate the data or mash it up some other way.

0 comments | Filed Under: | Tags: ruby

A Little irb Context

Posted by wesgibbs, Fri Apr 11 08:00:00 UTC 2008

When you’re sitting at your irb prompt, sometimes it can get a little confusing trying to remember the context in which all those commands you’re entering are executed. We know that anything entered at the prompt in irb is executed in some context, but it’s easy to forget what it is since it’s loaded for you and is not explicit.

We know that if we ask irb about itself, it tells us its class is Object


>> self.class
=> Object

So if we take that together with two facts from the pickaxe book:

"class and module definitions are executable code", and 
"a class definition is executed with that class as the current object" 

we can begin to cobble together an answer. Great! What was the question again?

The question: what is the context of that irb prompt?
The answer: you are entering code into the class definition of the Object class, and it is being executed in the context of an instance of the Class for Object named ‘main’. Clear? I thought so.

Sometimes I find it useful to visualize the context of irb using this example:
class Object

    # all of Object's methods defined here

    # Now imagine that your irb prompt is sitting right here, 
    # in this gap in Object's class, right before the final end, 
    # flashing at you, waiting for you to do great things.

    # >>

    # And every time you hit the Return key, the entire class 
    # definition for Object is executed again, with whatever 
    # command(s) you entered at the prompt appended to the 
    # end of the class definition.
  end
This means you can do things like
class Object

    # all of Object's methods defined here

    # >>
    private 
    def foo(arg)
      puts "#{arg}"
    end

    foo("bar")
   # And you'll see "bar" output.

    # Now define a class to play around with
    class Fool
      def touch
        "slap!"
      end
    end

    fool = Fool.new
    puts fool.touch

    # And you'll see "slap!"
  end

Obviously this is a gross over-simplification of what irb is. But maybe this little Gedankenexperiment will help contextualize your irb-ing.

0 comments | Filed Under: | Tags: ruby

That First Line of Ruby

Posted by wesgibbs, Wed Apr 09 15:37:00 UTC 2008

Probably the very first piece of ruby code most of us wrote was some invocation of the puts method. In those early days, I remember grokking enough to know that puts added a newline and print did not and then I moved on.

Recently, I was watching a coworker give a demonstration to our team of the facade pattern using ruby. His example worked great, except that the output wasn’t formatted the way he expected. He had overridden the to_s method in one of his classes and was using Kernel#p to output the object.

class Grouper
attr_accessor :taste, :color, :price
  def initialize(taste, color, price)
    @taste, @color, @price = taste, color, price
  end
  def to_s
    "Taste: #{taste}, Color: #{color}, Price: #{price}"
  end
end

p Grouper.new("good", "white", "cheap")
which yielded
#<Grouper:0x314494 @taste="good", @price="cheap", @color="white">

He was expecting his overridden to_s method to have been invoked, and it wasn’t.

I decided it might be a good idea to go back and dig a little deeper into some of the ways ruby provides for getting output to $stdout.

To start, let’s take every newbie’s favorite method, puts. So hearkening back to those heady early moments with Ruby, I fired up irb and invoked puts:


>> puts "Oi" 
Oi
=> nil

What just happened? Well, we know that the puts message must have a receiver, even though it’s not apparent from this invocation through irb. To track down what receiver I was sending the puts message to, I started with ruby info:

ri puts

which didn’t get me very far:

IO#puts, IRB::Locale#puts, IRB::Notifier::AbstructNotifier#puts,
IRB::OutputMethod#puts, Kernel#puts, Net::Telnet#puts,
Net::WriteAdapter#puts,
Net::SSH::Service::Process::OpenManager#puts,
Net::SSH::Service::Process::POpen3Manager::SSHStdinPipe#puts,
StringIO#puts, XMP#puts, XMP::StringInputMethod#puts,
Zlib::GzipWriter#puts, TMail::Decoder#puts, TMail::Decoder#puts,
TMail::Encoder#puts, TMail::Encoder#puts

So which one of these puts was being invoked? If you took the time to read the list, there’s really only one likely candidate, but still, let’s try a more direct approach. Back in irb


>> self.class
=> Object

The “hidden” receiver of the puts message is the Object class. If we check Object’s instance methods


>> Object.instance_methods.include?("puts")
=> false

we see that puts isn’t one of them. Let’s check its privates


>> Object.private_methods.include?("puts")
=> true

and there it is. But Object#puts wasn’t one of the results of our ruby info request. To be precise, puts is actually a private instance method declared by the Kernel module. Remember that the Object class doesn’t define any instance methods; it gets them all from mixing in the Kernel module. This misled me for some time because the docs show lots of methods for Object. Then I bothered to read the class documentation for Object, which points out

    
"Although the instance methods of +Object+ are defined by the +Kernel+ module, we have chosen to document them here for clarity." 

Had the opposite effect on me, but oh well. Nonetheless, now we know that the method we’ve been calling from irb is Kernel#puts. Back to ri

------------------------------------------------------------ Kernel#puts
     puts(obj, ...)    => nil
------------------------------------------------------------------------
     Equivalent to

         $stdout.puts(obj, ...)
$stdout is a pre-defined variable referencing an instance of the class IO. So at long last, with one final stroke of ri, we find the method we’ve been invoking all this time
---------------------------------------------------------------- IO#puts
     ios.puts(obj, ...)    => nil
------------------------------------------------------------------------
     Writes the given objects to _ios_ as with +IO#print+. Writes a
     record separator (typically a newline) after any that do not
     already end with a newline sequence. If called with an array
     argument, writes each element on a new line. If called without
     arguments, outputs a single record separator.

And all it does is write out what you pass it, adding the slightest bit of formatting in the form of new lines after each object. (The record separator used is a bit of a mystery.) What if you pass it something other than a string? Then that object’s to_s method is used to get the string representation of the object.

Let’s move on to another popular method, print. As I said at the beginning, I used to just think of print as being like puts but without the new line. That is correct, but is not the only difference. When print is invoked without an explicit receiver in irb, it’s the IO class that receives the print message, just like puts. In the case of print, however, there is some extra logic that gets applied to the formatting. print will add the value of the output field separator variable ($,) after each field, and append the value of the output record separator variable ($\) to the end of the output.

So typically those variables are nil

>> print "one", "two", "three" 
onetwothree=> nil

Nothing between the fields, and nothing at the end of the output. If you give those variables values


>> $\ = " [louise] " 
>> $, = " [mabel] " 
>> print "one", "two", "three" 
one [mabel] two [mabel] three [louise] => nil

To finish, we’ll look at Object#inspect and Kernel#p. The key difference between Object#inspect and the other methods we’re looking at is that Object#inspect does not write anything to $stdout and returns a String. The other methods all write to $stdout and return nil.

The common use case for Object#inspect is seeing what the current state of the object is, i.e. what all its instance variables are set to. By default, inspect returns a string comprising the object’s class and id, and the state of its variables.


>> g = Grouper.new("awful", "white", "cheap")
>> g.inspect
=> "#<Grouper:0x3503f4 @color=\"white\", @taste=\"good\", @price=\"cheap\">" 

Object#inspect does some simple formatting, like surrounding strings with quotes. One thing to note is that Object#inspect does not bother itself with Grouper’s to_s method.

Kernel#p takes one or more objects as arguments and just iterates through them, calling Object#inspect for each, and sending the output to $stdout.

>> p({"one" => 1, "two" => 2, "three" => 3}, "Frank", ["green", "blue"], 5, Object.new)
{"three"=>3, "two"=>2, "one"=>1}
"Frank" 
["green", "blue"]
5
#<Object:0x81790>
=> nil
So back to my coworkers’ example with the Grouper class. He had used the Kernel#p method to print out a string representation of his object. Kernel#p just sends the inspect message to Grouper and writes the returned string out to $stdout. At no point does the to_s message get sent to Grouper. This, however, would have given him what he wanted

>> puts Grouper.new("good", "white", "cheap")
Taste: good, Color: white, Price: cheap
=> nil
Here’s a summary of the four methods we looked at
                       writes to $stdout | invokes obj.to_s | returns
IO#puts(obj, ...)             YES        |        YES       |   nil
IO#print(obj, ...)            YES        |        YES       |   nil
Object#inspect                NO         |        NO        |   String
Kernel#p(obj, ...)            YES        |        NO        |   nil

If these don’t give you the control or formatting you need, then take a look at the PrettyPrint class and its Modules.

0 comments | Filed Under: | Tags: ruby