Open source log and event management

Jordan Sissel

Thu, 8th November 2012


We are delighted to have Jordan with us today talking about Logstash. Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for example, for searching). Speaking of searching, Logstash comes with a web interface for searching and drilling into all of your logs.

It is fully free and fully open source. The license is Apache 2.0.

Max, Concise Courses:
It is 12 p.m. Eastern Standard Time, and we’ve got Jordan Sissel. Did I say that right, Jordan? I’ve got an equally hard name as well. So Jordan is live and direct from San Jose, California, and the presentation today is Logstash, open source log and event management. Usual ground rules: we’ll do the presentation, then take questions at the end via the chat. So, Jordan, thank you so much for joining us. Please give us a little bit of background on what you’re up to, and when you’re ready, just get straight into the presentation.

Jordan:
Right on. Thanks for having me. I’m a sysadmin by trade. I’ve been in the industry for about eight or ten years now, and I’ve always had logging-ish problems that I needed solved, among others. Sysadmins do a lot of things, but most of the time it’s just gluing systems together.

I went to school for programming, and I’ve always hacked on things in my spare time. Some of those things are useful; some of them are not. Logstash happens to be one of those things, which I started with some folks, and it’s grown into a pretty good community. I think it’s at about 3,000 users now, with probably about 100 active members in the community, so it’s going quite awesomely.

So by way of introduction, I’ll try to explain the problem space being solved. This is an Apache log. You’re not supposed to be able to read it; if you can’t read it, that’s actually the point. There are lots of lines here. That selection is just one line, and there are about 40 of them.

The point is that there are two problems here. One, there’s too much data. You as a human would have a great deal of frustration trying to figure out what is meant by all of this madness. Is there a problem? Is there something we can glean from it? What is all this stuff?

And the second problem is that you need to be an expert to understand exactly what some of these things mean. If you don’t know web serving, you don’t know that this is what a request looks like. You might not know that this 200 is a response code, and who knows what the number next to it is. Because the fields aren’t labeled, it’s not a very structured log format, and it’s not very friendly. So what ends up happening is you think you need this data, so you keep it around.

And then you’re on call. Your monitoring system pages you at 4:00 in the morning, and you’re angry because you’re awake at 4:00 in the morning. It tells you that disk space is critical, and that sucks. This is the sort of thing that could be self-healing, but you haven’t implemented that yet because you’re not sure you want to go around deleting random files.

So you log into the server, you find the log files, you blow away the logs, then you go back to bed quite angrily. So that sucks, right?

There’s another problem where you get error messages that are just totally absurd. They may appear to be English, but they’re not, and they tend to make you want to smash things because you can’t understand them.

So you’ve got this backstory now, that you got woken up at 4:00 in the morning because logs are causing you problems, so you blew them away. And then later you tried to read them, but you couldn’t make sense of them. So you’re not in a very positive state of mind right now.

And the next day a business developer, or a customer support guy or gal, comes to you. They’re not necessarily the most technical, but they know there’s an angry customer.

The promise here is that there’s some amount of data that seems impossible to sift through, and you, as a sysadmin with maybe some small amount of coding skills but certainly some Unix skills, swing into the room, type some stuff on the keyboard, and get a graph out that looks like this.

The interesting part for me is that with the visual here, the graph, you don’t need to be an expert to identify anomalies. I don’t know if you can see my mouse, but there’s a red spike three quarters of the way down the graph. You don’t have to be any kind of technology expert, or any kind of systems expert, to understand that the red spike is something anomalous, something out of the norm. And I think that becomes really powerful. You’ve gone from something that’s impractical, an Apache log, which has a very bad format and requires expert knowledge to understand, to a visual that pretty much anybody in the business can understand. That’s really powerful. It also means you can teach computers how to identify those signals.

This happens to anyone who has nontechnical family. If you’re the only IT guy or nerd in the family, you get phone calls about how to set up the wireless or how to set up some program. The same thing happens in business, where nontechnical folks start to rely too heavily on the technical people just to do day-to-day activities. So you end up becoming people’s human keyboards. So I say, try not to do that. You don’t want to be someone’s interface.

But before we talk about maybe solving some of these things, it’s worth discussing what a log is. And if you ask me, a log is a time stamp and some data. If you look at any log format, any metric data format, you’re going to see that the main commonality is there’s always a time stamp and there’s always some kind of data. But what data is there will vary by the application.
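To make the “time stamp plus some data” definition concrete, here is a minimal Python sketch of that shape. The `Event` class and its `data` field are illustrative assumptions, not Logstash’s internal event format, and the sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class Event:
    """A log event reduced to what every log has in common (hypothetical class)."""
    timestamp: datetime                                 # when it happened
    data: Dict[str, Any] = field(default_factory=dict)  # whatever the application chose to record


# An Apache access line and a metric sample share the same shape;
# only the contents of `data` differ by application.
access = Event(
    datetime(2012, 11, 8, 17, 0, 0, tzinfo=timezone.utc),
    {"message": '"GET /index.html HTTP/1.1" 200 1043'},
)
metric = Event(
    datetime(2012, 11, 8, 17, 0, 1, tzinfo=timezone.utc),
    {"metric": "disk.used_percent", "value": 97.5},
)

# Because every event has a time stamp, events from different sources can be ordered together.
timeline = sorted([metric, access], key=lambda e: e.timestamp)
```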

So how can Logstash help with that? First and foremost, it’s free and open source. The community is growing quite rapidly these days, so there’s a good amount of community support for it. If you pop into the IRC channel or the mailing list, you’re pretty much guaranteed to find someone who can help you. So let’s take that example and throw it into Logstash. Logstash is primarily a pipeline model, similar to the Unix model, where one program outputs text and you pipe that output into the input of another.

The same idea applies here: you have inputs, which could be files or anything else that generates events; filters, which do some kind of processing; and finally outputs, which store the events somewhere else or ship them off the machine.

In this example we’re taking Apache logs and processing them through a filter called Grok (we’ll talk about that shortly), then through a filter called Date, which unifies the time format across all of your logs so you can search on them, and finally shipping them to a full-text search engine called Elasticsearch, which gives you really quick search over your logs.
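The input, filter, and output stages can be sketched as a chain of Python functions. This is a minimal sketch under stated assumptions, not how Logstash is implemented: `line_input`, `apache_filter`, and `run_pipeline` are hypothetical names, the regular expression is a crude stand-in for Grok, and a Python list stands in for an Elasticsearch output.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, str]
Filter = Callable[[Event], Optional[Event]]  # returning None drops the event
Output = Callable[[Event], None]


def line_input(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: wrap each raw line (from a file, a socket, and so on) in an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


# Hypothetical, heavily simplified stand-in for a Grok filter on Apache logs.
REQUEST = re.compile(r'"(?P<verb>[A-Z]+) (?P<request>\S+) [^"]*" (?P<response>\d{3})')


def apache_filter(event: Event) -> Optional[Event]:
    """Filter stage: add fields parsed out of the message, if it matches."""
    match = REQUEST.search(event["message"])
    if match:
        event.update(match.groupdict())
    return event


def run_pipeline(events: Iterable[Event], filters: List[Filter], outputs: List[Output]) -> None:
    """Push every event through each filter in order, then hand it to every output."""
    for event in events:
        current: Optional[Event] = event
        for f in filters:
            current = f(current)
            if current is None:
                break
        if current is not None:
            for out in outputs:
                out(current)


stored: List[Event] = []  # stands in for an output such as Elasticsearch
run_pipeline(
    line_input(['198.51.100.7 - - [08/Nov/2012:12:00:00 -0500] "GET /index.html HTTP/1.1" 200 1043\n']),
    filters=[apache_filter],
    outputs=[stored.append],
)
```

Because the stages only agree on the event shape (a dictionary of fields), any stage can be swapped out independently, which is the property the Unix-pipe comparison is getting at.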

So once you do this, you get some fairly powerful features. The screenshot you’re looking at right now is the recommended Logstash web interface, and I’m actually going to switch over to that now. This is a live demo that I try to keep running pretty much all the time. It’s at demo.logstash.net, and it’s a way for people to play with the main web interface.

These are just Apache logs going in. If you drill into any of the logs, you see a field breakdown, and these are fields that Logstash picked out of the logs. Some things in the field breakdown are not in the log itself; for example, the location and all of this GeoIP information are not in the Apache log. They come from another filter, called GeoIP: once Logstash has pulled out the IP address, GeoIP does a geographic lookup to find the location of that address.
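The GeoIP step can be pictured as one more filter that reads a field produced earlier in the pipeline and adds new fields. In this Python sketch the lookup table is hypothetical (it uses the RFC 5737 documentation ranges in place of a real GeoIP database), and the `clientip` and `geoip` field names are assumptions for illustration.

```python
import ipaddress
from typing import Any, Dict

# Hypothetical lookup table standing in for a real GeoIP database.
# The networks are documentation-only ranges from RFC 5737.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): {"country_name": "Poland", "country_code": "PL"},
    ipaddress.ip_network("198.51.100.0/24"): {"country_name": "United States", "country_code": "US"},
}


def geoip_filter(event: Dict[str, Any], source: str = "clientip") -> Dict[str, Any]:
    """Add location fields based on an IP address that an earlier filter extracted."""
    raw = event.get(source)
    if not isinstance(raw, str):
        return event  # no IP field, e.g. an event that is not an Apache log
    try:
        address = ipaddress.ip_address(raw)
    except ValueError:
        return event  # the field is not a valid IP address
    for network, location in GEO_TABLE.items():
        if address in network:
            event["geoip"] = dict(location)
            break
    return event


tagged = geoip_filter({"clientip": "192.0.2.44", "response": "200"})
```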

So let’s say I want to search for everything that’s from Poland. There are only about 10 hits. But the idea here is that you’re drilling through your logs without necessarily having to learn a specific query language. For example, let me go back to the beginning and look at a score, which gives me the most popular values. There happen to be a lot of events with no country name; those are just not Apache logs. But now I can drill in and say, “Show me everything from the United States.”
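The click-through workflow boils down to two operations: count the most common values of a field, then keep only the events with one chosen value. This Python sketch is a hypothetical in-memory stand-in for what the search engine computes; it is not how the web interface is implemented, and the sample events are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Dict[str, str]


def top_values(events: Iterable[Event], field: str, n: int = 5) -> List[Tuple[str, int]]:
    """Most common values of a field; events missing the field count as '(none)'."""
    counts = Counter(e.get(field, "(none)") for e in events)
    return counts.most_common(n)


def drill_down(events: Iterable[Event], field: str, value: str) -> List[Event]:
    """'Click' on a value: keep only the events whose field equals it."""
    return [e for e in events if e.get(field) == value]


events = [
    {"country_name": "United States", "request": "/"},
    {"country_name": "United States", "request": "/about"},
    {"country_name": "Poland", "request": "/"},
    {"message": "not an Apache log"},  # no country name at all
]
popular = top_values(events, "country_name")
poland = drill_down(events, "country_name", "Poland")
```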

And, again, the power here is that you don’t necessarily have to be an IT expert to navigate this stuff. You can just click around and look for anomalies yourself, which I think is very powerful.

What else can I talk about? Let’s talk about Grok for a little bit. Regular expressions are a very powerful tool: they’re a very concise way of expressing how to match a piece of text. But as you can see from this slide, which is all one regular expression, they can get quite big if you’re not careful. This one happens to be for HAProxy HTTP logs.

And the point here is that what’s written in the background is not something a human wrote; a piece of code generated it. That piece of code is called Grok, and it ships with the ability to name your patterns. The Grok syntax is that you say, “I want to match a specific syntax, and I want to call it something semantic.” For example, “I want to match a number and call it bytes. I want to match a quoted string and call it referrer.” These are all things coming from the Apache log format.

But not everything is a string, right? Whenever you’re using a regular expression system, its input is text and its output is usually text. Grok gives you a way to cast values to integers or floats, so that your final storage system can interpret them as actual numbers. That’s useful for doing range queries, for example.
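To make the syntax-to-semantic naming and the type casting concrete, here is a minimal Python sketch that expands references of the form `%{SYNTAX:SEMANTIC}` (optionally with an `:int` or `:float` suffix) into named regular-expression groups and casts the captured values. It is an illustration under simplifying assumptions, not Grok’s actual implementation; the three-entry pattern table and `compile_grok` are hypothetical.

```python
import re
from typing import Callable, Dict, Union

Value = Union[str, int, float]

# Hypothetical, tiny pattern table; real Grok ships a much larger library.
PATTERNS: Dict[str, str] = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "QS": r'"(?:[^"\\]|\\.)*"',  # a double-quoted string
    "WORD": r"\w+",
}

# Matches %{SYNTAX:SEMANTIC} with an optional :int or :float suffix.
GROK_REF = re.compile(r"%\{(\w+):(\w+)(?::(int|float))?\}")


def compile_grok(expr: str) -> Callable[[str], Dict[str, Value]]:
    """Turn a Grok-style expression into a function returning named, typed fields."""
    casts: Dict[str, Callable[[str], Value]] = {}

    def expand(m: re.Match) -> str:
        syntax, semantic, kind = m.groups()
        if kind is not None:
            casts[semantic] = int if kind == "int" else float
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    regex = re.compile(GROK_REF.sub(expand, expr))

    def parse(line: str) -> Dict[str, Value]:
        found = regex.search(line)
        if found is None:
            return {}
        return {name: casts.get(name, str)(text) for name, text in found.groupdict().items()}

    return parse


parse = compile_grok(r"%{NUMBER:bytes:int} %{QS:referrer}")
fields = parse('200 1043 "http://example.com/"')
large = fields["bytes"] > 1000  # a numeric comparison, possible only because of the cast
```

Without the `:int` suffix, `bytes` would stay the string `"1043"`, and a range query would end up comparing text.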

In general, the pattern here is that I have a sample log and then the Grok pattern that will match it. I’m saying there’s a syslog time stamp; that’s the syntax. Then I have a host name pattern, and I want to call it host. The host name pattern will match this part, and the syslog program pattern will match this one. Everything else in the message I’m calling message. This is a fairly powerful way to pick apart a roughly unstructured, poorly formed log so that later you can query it with fewer skills required.

And these things can be recursive, if you’re interested in that kind of thing: the syslog time stamp is actually composed of other patterns internally, a month, a day, and a time. You can see that up here. The host name pattern is this long thing that I wrote once, and I have some tests proving that it works, so I never have to look at it again, and ditto for the syslog program.
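The recursive expansion can be sketched in a few lines of Python and applied to a syslog line. The pattern names echo the ones mentioned (syslog time stamp, host, program, message), but the regular expressions are simplified, hypothetical stand-ins rather than the definitions that ship with Grok, and the sample syslog line is made up.

```python
import re
from typing import Dict

# Hypothetical, simplified pattern library. Entries may reference each other,
# the way a syslog time stamp is built from a month, a day, and a time.
PATTERNS: Dict[str, str] = {
    "MONTH": r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b",
    "MONTHDAY": r"(?:3[01]|[12]\d|0?[1-9])",
    "TIME": r"(?:[01]?\d|2[0-3]):[0-5]\d:[0-5]\d",
    "SYSLOGTIMESTAMP": r"%{MONTH} +%{MONTHDAY} %{TIME}",
    "SYSLOGHOST": r"[A-Za-z0-9][A-Za-z0-9.-]*",
    "PROG": r"[\w./-]+",
    "GREEDYDATA": r".*",
}

# %{NAME} or %{NAME:field}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expr: str, depth: int = 0) -> str:
    """Recursively replace pattern references with plain regex text."""
    if depth > 10:
        raise ValueError("patterns nested too deeply (is there a cycle?)")

    def sub(m: re.Match) -> str:
        name, field = m.groups()
        body = expand(PATTERNS[name], depth + 1)  # a pattern may itself contain references
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return REF.sub(sub, expr)


SYSLOG_LINE = re.compile(
    expand(r"^%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{PROG:program}: %{GREEDYDATA:message}$")
)

fields = SYSLOG_LINE.match("Nov  8 12:00:01 web01 sshd: Accepted publickey for admin").groupdict()
```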

And I mentioned tests. Frequently, you’ll see people write complex regular expressions and provide unit tests for them. Grok comes with a way to do that fairly easily; there are 72,000 assertions here, and almost all of those test that the patterns actually match correctly. So you can be fairly confident that if you say, “Match a number,” it will actually only match a number.
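Testing a pattern this way can be as simple as asserting on strings that must and must not match. This Python `unittest` sketch uses a simplified, hypothetical NUMBER pattern, not Grok’s own definition. Save it as, say, `test_number.py` (a hypothetical file name) and run `python -m unittest test_number`.

```python
import re
import unittest

# Hypothetical NUMBER pattern under test (simplified, not Grok's definition).
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


class NumberPatternTest(unittest.TestCase):
    def test_matches_numbers(self):
        for text in ["0", "42", "-7", "+3.14", "1043"]:
            self.assertIsNotNone(NUMBER.fullmatch(text), text)

    def test_rejects_non_numbers(self):
        # "Match a number" must not accept anything that merely contains digits.
        for text in ["", "abc", "4 2", "3.", ".5", "12a"]:
            self.assertIsNone(NUMBER.fullmatch(text), text)
```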

So another slight rant is that there are lots of different date formats; I’m showing seven here. These are all different time stamp formats. The main point is that if you did not have something like Logstash and you were trying to correlate two different logs, let’s say two log formats that use these two time stamp formats, you’re not going to be able to correlate across time, because your Unix tool for that is going to be “sort.”

And “sort” usually only handles text, so it has no way of comparing these top two values. You need something that processes them first. Well, there’s a date filter that comes with Logstash to fix exactly that kind of thing, and this is just an example of how to use it.
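Here is a minimal Python sketch of what a date filter has to do: try a list of known formats, then emit every time stamp in one normalized form (ISO8601 in UTC), after which plain string sorting matches time order. The format list and the choice to treat zone-less stamps as UTC are assumptions for illustration, not the Logstash date filter’s actual configuration or behavior.

```python
from datetime import datetime, timezone
from typing import Iterable, List

# Hypothetical list of formats to try, in order (strptime directives).
# %a and %b assume English day and month names (the default C locale).
FORMATS: List[str] = [
    "%d/%b/%Y:%H:%M:%S %z",  # Apache:     08/Nov/2012:12:00:00 -0500
    "%Y-%m-%dT%H:%M:%S%z",   # ISO8601:    2012-11-08T17:00:00+0000
    "%a %b %d %H:%M:%S %Y",  # ctime-like: Thu Nov 08 17:00:00 2012 (no zone)
]


def normalize(raw: str, formats: Iterable[str] = FORMATS) -> str:
    """Parse a time stamp in any known format and return ISO8601 in UTC."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: stamps without a zone were written in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    raise ValueError(f"unrecognized time stamp: {raw!r}")


stamps = ["Thu Nov 08 17:00:01 2012", "08/Nov/2012:12:00:05 -0500"]
text_order = sorted(stamps)                        # compares characters: the later Apache stamp comes first
time_order = sorted(normalize(s) for s in stamps)  # ISO8601 in one zone sorts chronologically
```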

The date format that Logstash tends to prefer, and what the date filter turns everything into, is ISO8601. It’s quite a large standard; if you want to look at the most specific part of it, search for xs:dateTime. That’s about the smallest subset you can get. But this format here is fairly easy for most people to read, and it sorts pretty well, as long as your time stamps have the same precision.

Precision is its own problem with logs like Apache’s, whose default log format only records whole seconds. If you have ten requests in one second, you have no idea later how far apart those things were. Another common problem is that humans would look at this little block here and say, “Yeah, that’s one stack trace.” If you’ve ever seen an exception stack trace from Ruby, Python, or Java (this one happens to be Java), you would agree that this is one event.

Most log processing systems assume a one-to-one mapping between a line in a file, like this one (indicating), and an event. So most log systems would represent this thing as four events. And if you’re looking for instances where FancyPantsException happens, you’re going to miss the fact that it happened in this particular file, because that line isn’t part of the same event. So Logstash comes with a filter to help you merge all of those lines into one event.
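Merging a stack trace back into one event can be sketched as a small state machine: buffer lines until a line that starts a new event arrives. Logstash’s multiline filter is configured with a regular expression for this; here the rule (“a new event starts with a date”) is a hypothetical assumption, and the log lines, including the exception name, are made up.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: every new event starts with a date such as "2012-11-08 ".
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def merge_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Group lines into events: a line that does not start a new event
    is appended to the event before it."""
    buffer: List[str] = []
    for line in lines:
        if buffer and EVENT_START.match(line):
            yield "\n".join(buffer)  # the previous event is complete
            buffer = []
        buffer.append(line)
    if buffer:
        yield "\n".join(buffer)  # flush the last event


log = [
    "2012-11-08 12:00:00 ERROR request failed",
    "com.example.FancyPantsException: something broke",
    "    at com.example.Handler.run(Handler.java:42)",
    "    at java.lang.Thread.run(Thread.java:722)",
    "2012-11-08 12:00:01 INFO next request",
]
events = list(merge_multiline(log))  # two events instead of five separate lines
```

A search for the exception name now returns an event that also contains the `Handler.java:42` frame, which is exactly the context a line-per-event system would split off.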

This is a metrics example. In all of the previous examples we’ve been using an Elasticsearch output; this one is “statsd,” a pretty cool program for doing metrics analytics. You end up getting the same results as the graph I showed earlier. There are some interesting things that happen once you start throwing structured data into a search engine like Elasticsearch: you can discover problems you didn’t even know existed.

For example, this screenshot shows the duration value of each HTTP request. The minimum value of that field is negative, which is kind of weird, because it means the request took less than zero seconds, which seems impossible to me. So I dug into it (this is another screenshot of me drilling into that problem) by searching for anything where the duration is less than zero. I found a couple of hosts, and it turns out they have bad hardware clocks. NTP is running on these servers to try to keep the clocks in sync, but it can’t do that reliably, so it has to step the clock, resetting it in a way that causes time to jump. And when time jumps backwards, you get a negative request time.
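The investigation boils down to two questions you can ask of structured events: what are the minimum and maximum of a numeric field, and which hosts produce the impossible values? This is an in-memory Python sketch; the `duration` and `host` field names and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def duration_stats(events: Iterable[Event]) -> Dict[str, float]:
    """Minimum and maximum of the `duration` field; raises ValueError if no event has one."""
    durations = [float(e["duration"]) for e in events if "duration" in e]
    return {"min": min(durations), "max": max(durations)}


def hosts_with_negative_durations(events: Iterable[Event]) -> Dict[str, int]:
    """Count, per host, the requests that supposedly finished before they started."""
    counts: Dict[str, int] = defaultdict(int)
    for e in events:
        if "duration" in e and float(e["duration"]) < 0:
            counts[str(e["host"])] += 1
    return dict(counts)


events: List[Event] = [
    {"host": "web01", "duration": 0.120},
    {"host": "web07", "duration": -0.480},  # hypothetical host with a jumping clock
    {"host": "web07", "duration": -1.050},
    {"host": "web02", "duration": 0.030},
]
stats = duration_stats(events)  # a negative "min" is the first red flag
suspects = hosts_with_negative_durations(events)
```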

So that helped me find a hardware failure just by looking at the minimum and maximum of the duration field in Apache, which I think is kind of cool. John Vincent, one of the Logstash contributors, describes Logstash as a Unix pipe on steroids, and I think that’s true. To that end, there are 23 inputs, about 18 or 20 filters, and I think 43 different outputs now, all of which you can plug together independently. You can have logs coming from files as an input and going to Elasticsearch. You can send them to Nagios if you want it to alert you.

You can send them to PagerDuty, or a metrics system, all kinds of things. I think that’s about it for the technology. I want to talk briefly about some of the project’s focuses, because I think these are important. These are the decisions, and I guess philosophies, that drive the direction of Logstash.

There are two feature focuses. One is being able to transport and process logs to and from anywhere. That’s exactly why there’s that pipeline system, and that’s exactly why there are almost 80 plugins now: to be able to get logs from any system to anywhere else, wherever they’re most useful to you. I mentioned earlier that sysadmins do a lot of glue work, and this is fundamentally glue software.

The other is search: once you have that processing engine, you might want to search, and that’s exactly what this web interface is built to do. It was built by a member of the community as an improved web interface over what Logstash already provides.

Some design features. My background as a sysadmin means I don’t want software to be difficult to operate, run, or deploy; to that end, I want Logstash to fit your infrastructure. I don’t want you to look at Logstash as a project and conclude, “Well, to deploy it, I have to rearchitect the way I do things, my systems, and my infrastructure.”

If you’re having to make that decision about a piece of software, the easiest answer is, “No. I’m not going to use that piece of software.” Logstash is also very extensible; the 80 plugins I mentioned are one of the ways to extend it. And I think this is the most important point I try to make when I talk about community, especially with open source: if a new user has a bad time, it’s a bug. It might be a bug in Logstash, it might be a bug in the documentation, but it’s a bug.

There’s a very high expectation that if someone is having trouble, it’s probably because the software or the documentation is leading them astray. And if it’s neither of those things, then there’s some other way we can help. Maybe we can improve the documentation, make a video presentation, or do things like this to get people a little more comfortable with the software. Since a newbie having a bad time is a bug, documentation is extremely important. And to that end, contributions mean a lot more to me than just code.

I’m happy accepting code. I’m also happy accepting people doing support on the mailing list, filing bugs, and things like that.


Questions and answers

Max, Concise Courses:
Jordan, that was an amazing review of a lot of information in a tight schedule. Really, really fantastic. Thank you very much. I’ve got a couple of questions here, and I’m sure there are going to be a lot of people who want to contact you, so we’ll have your contact details up on the page. As most people know, this video goes live on the same page where you registered. But a couple of questions here. And guys, ladies and gentlemen, if you have questions, send them in.

Is Logstash available in any other languages? Are the records ever written in, let’s say, Japanese?

Jordan:
Presently, there are some problems in Logstash with respect to Unicode, and I would expect there may be other issues as well. I’m working on those, and I’m hoping to have them solved in the next week or two. But if your end-to-end pipeline is Unicode, or UTF-8, Logstash will happily accept those logs. Logstash turns plain text logs, and pretty much any input, into JSON objects, so anything that’s representable as JSON, including any Unicode or UTF-8 text, should be accepted by Logstash. And if it isn’t, that’s a bug. If you’re having problems with that kind of thing, just let me know and give me an example of something that failed; it’s usually pretty quick to fix.
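Since Logstash represents events as JSON, the point about Unicode can be checked in miniature: UTF-8 text, Japanese included, round-trips through a JSON encoding unchanged. This Python sketch uses only the standard `json` module; the event, its field names, and the message text are hypothetical.

```python
import json

# Hypothetical event whose message is Japanese ("disk space is running low").
event = {
    "@timestamp": "2012-11-08T17:00:00.000Z",
    "host": "web01",
    "message": "ディスク容量が不足しています",
}

# Serialize to UTF-8 bytes, as the event might travel between pipeline stages.
encoded = json.dumps(event, ensure_ascii=False).encode("utf-8")
decoded = json.loads(encoded.decode("utf-8"))
```

With the default `ensure_ascii=True`, `json.dumps` would instead escape the Japanese characters as `\uXXXX` sequences; both forms decode back to the same text.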


Max, Concise Courses:
Is it possible for a hacker to remove logs or redact logs?

Jordan:
Let me try to step back a little bit to answer that question. Logstash itself is primarily a processing pipeline, so from that point of view there’s no storage in Logstash itself. What you do is use an output, and the main output I recommend everyone use is Elasticsearch.

You can set up Elasticsearch so that it rejects delete requests, so you could make the system append-only if you needed it to be. That’s certainly one of the appeals of typical syslog servers, where you’re shipping stuff over the network: if the machine gets compromised, you still have that extra copy of the data, which the attacker can’t tamper with to cover their tracks.


Max, Concise Courses:
I’ve got one last question here. Do you offer any training or courses to understand logs better?

Jordan:
Not at the moment. I think maybe that’s on the road map. Right now, the project itself is open source and community driven. There’s no company strictly behind it, although I am now working on it full time at DreamHost, so the quality of the documentation material I’m able to write should be going up quite soon.


Max, Concise Courses:
Thank you so much. Very expertly explained and reviewed. And we really, really appreciate your time. And hopefully you’ll be up for coming back on and giving us a progress report in the forthcoming months.

Jordan:
Absolutely. I’m happy to receive any questions you guys have. Whether they’re about logging in general or Logstash, I’m always happy to answer them, mainly because it helps me gauge what everyone’s trying to do with respect to logs. So I’m happy to do some knowledge sharing over email if people are interested. Thanks for having me on.

Max, Concise Courses:
Thank you, my friend. Bye.