According to the official osquery documentation, osquery (short for "operating system query") is an operating system instrumentation framework that exposes an operating system as a high-performance relational database. Using SQL, you can write a single query to explore any given data, regardless of operating system (more on osquery basics here).
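For example, this classic query (runnable in the interactive osqueryi shell on macOS, Linux, or Windows) finds processes whose executable has been deleted from disk, a common malware trait:

```sql
-- Processes still running after their binary was removed from disk
SELECT name, path, pid
FROM processes
WHERE on_disk = 0;
```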
That alone makes osquery a handy utility right out of the box, but the real value of the instrumentation agent emerges when the data it can access is gathered and analyzed at scale, across an entire enterprise.
Osquery is a key part of any such solution, but the complete system is not possible without additional components handling the transport, aggregation, storage, and presentation of all the rich data osquery can provide.
In some cases this could mean introducing a commercial offering, but in this post we’re going to outline how to make osquery work using supplementary open-source tools.
This list is by no means exhaustive, but we’ve distilled it down to some of the most commonly used tools for building an osquery ecosystem, split across six functions: deployment, endpoint tools, endpoint management, data transport, data storage, and data visualization.
Combining one tool from each of these functional areas gives you a do-it-yourself starting point for deploying osquery at scale.
Chef, Ansible, and Puppet: While each of these tools has its own strengths, they all let you automate the provisioning and configuration of endpoints across a variety of operating systems, and all of them can push osquery packages and configurations out to endpoints at scale.
Munki: The outlier of the group, Munki is specific to macOS fleets. It manages the configuration and updating of software packages on macOS and can be used to deploy osquery.
We are building this potential solution set around osquery, so we will assume that osquery is the primary tool used to inspect the endpoint. However, several open-source tools that osquery integrates with or ships alongside bear mentioning, and many more can be integrated or used in complement with osquery via extensions.
Augeas: A configuration-parsing library, commonly used to read configuration files into key-value pairs; it is built into osquery to make *nix config files queryable. Osquery wasn't originally designed to read arbitrary files, but many Linux configuration settings are stored in text files, and Augeas lets every file format it supports be addressed as key-value pairs in a single table, without writing code for an additional osquery table for each one (a query sketch follows this list).
Prometheus: An open-source metrics collection and publishing project originally developed at SoundCloud. Combining osquery and Prometheus lets you query a Prometheus API and get high-fidelity performance-counter results inside osquery, an area where osquery on its own is somewhat lacking (see the sketch after this list). Although some find there is a bit of a learning curve, its flexibility and efficiency make it worthwhile.
Google Santa: Unrelated to the other Google Santa project that tracks Kris Kringle’s journey. As he typically does, Santa determines whether things are naughty or nice -- in this case, binaries on macOS. The tool can determine what is running on your machine and help prevent the spread of damaging binaries across your fleet by whitelisting or blacklisting the files allowed to execute on your Macs. With a recent extension for osquery created by Trail of Bits, Google Santa can now be managed from osquery (see the sketch after this list).
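To illustrate the Augeas integration, here is a minimal sketch against osquery’s built-in augeas table; the file path is just an example, and the results depend on which Augeas lenses your system has installed:

```sql
-- Read sshd settings as key-value rows via the built-in augeas table
SELECT label, value
FROM augeas
WHERE path = '/etc/ssh/sshd_config';
```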
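The Prometheus integration surfaces as a prometheus_metrics table. A sketch, assuming you have defined scrape targets under the prometheus_targets section of your osquery configuration (the metric name is an illustrative node-exporter counter):

```sql
-- Pull counters from the Prometheus targets defined in the osquery config
SELECT target_name, metric_name, metric_value
FROM prometheus_metrics
WHERE metric_name LIKE 'node_cpu%';
```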
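And with the Trail of Bits Santa extension loaded, Santa’s decisions and rules become queryable. The table names below are those exposed by the extension at the time of writing; check the trailofbits/osquery-extensions repository for the current schema:

```sql
-- Executions Santa has blocked, via the Trail of Bits extension
SELECT * FROM santa_denied;

-- The whitelist/blacklist rules Santa is currently enforcing
SELECT * FROM santa_rules;
```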
For our use case, endpoint management for osquery ideally means being able to issue commands across all of your machines rather than one at a time. This is usually done by changing scheduled query packs (a minimal pack is sketched below), and many of these tools also let you run queries on an ad-hoc basis. All of the tools listed for this step include a TLS server, though you could accomplish the same thing through configuration changes and log shipping.
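A scheduled query pack is just a JSON file of named queries with run intervals; here is a minimal sketch (the pack entry name and query are invented for the example):

```json
{
  "queries": {
    "deleted_binaries": {
      "query": "SELECT name, path, pid FROM processes WHERE on_disk = 0;",
      "interval": 3600,
      "description": "Processes still running with a deleted executable"
    }
  }
}
```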
Doorman: One of the first osquery fleet managers. It is still appreciated among users, but discussed less frequently now that newer endpoint-management tools offer more features.
Fleet: A Golang-powered fleet manager that began as a commercial product from Kolide and was later released as open source. Fleet remains a core component of Kolide’s commercial offering, which has expanded beyond what is in the Fleet component.
Zentral: A broader framework for capturing events from a variety of sources and linking them to an inventory. It uses osquery among other technologies and also works with Google Santa (independently of osquery). Built in Python, Zentral positions itself as a best fit for emerging to medium-sized IT teams, and it has a slightly different structure in how it treats osquery endpoints compared to TLS servers purpose-built just for managing osquery.
SGT: The most recent addition to this list, released in early 2018, this is an osquery server built by the team at Okta using Golang (hence the name: Simple Golang TLS).
The whole point of an osquery management system is to gather data and put that data somewhere you can use it. There are two main points where data is transferred -- from the endpoint to the management solution, and then from the management solution to the data store. There are many, many ways to do this, but several pop up more regularly than others in osquery circles:
TLS Server: The most common way to get data off an osquery endpoint and into a central management system is a TLS server -- the same technology that lets your web browser talk to a website securely is built into osquery, and osquery endpoints will happily talk to a TLS application server if both sides are configured correctly. Most osquery-at-scale solutions are built around this for endpoint-to-management data transfer (a sample flag configuration follows this list).
Beats & Logstash: Created for use with Elasticsearch, Beats agents and Logstash servers ship logs off endpoints and then process and write the data into storage, usually as part of the popular ELK-stack method of ingesting, processing, and parsing data. Some of the first osquery implementations used ELK. This covers transporting data from the endpoint to the management cluster, and then on to the back-end data store.
Kafka: A data-bus tool that moves data from the management server to a variety of back-end data stores and processing systems. The bus metaphor applies because the data travels along a route that may include several “stops” and/or processing steps along the way, rather than simply moving from one point to another. Implemented properly, Kafka allows much more efficient use and analysis of data, and lets data be written to and stored in different systems that run in parallel and provide different types of functionality (osquery can even log directly to Kafka; see the sketch after this list).
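To make the TLS transport concrete, here is a sketch of the core osqueryd flags for TLS-based configuration, logging, and ad-hoc (distributed) queries. The hostname and endpoint paths are placeholders shown in Fleet’s style; each TLS server documents its own endpoints:

```
# osquery.flags -- TLS transport sketch; hostname and endpoints are placeholders
--tls_hostname=osquery.example.com
--tls_server_certs=/etc/osquery/ca.pem
--enroll_secret_path=/etc/osquery/enroll_secret
--enroll_tls_endpoint=/api/v1/osquery/enroll
--config_plugin=tls
--config_tls_endpoint=/api/v1/osquery/config
--logger_plugin=tls
--logger_tls_endpoint=/api/v1/osquery/log
--disable_distributed=false
--distributed_plugin=tls
--distributed_tls_read_endpoint=/api/v1/osquery/distributed/read
--distributed_tls_write_endpoint=/api/v1/osquery/distributed/write
```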
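Osquery can also put results directly onto the bus via its built-in kafka_producer logger plugin. A minimal flag sketch, assuming a reachable broker and an existing topic (both names are examples):

```
# Send osquery results straight to Kafka; broker and topic are examples
--logger_plugin=kafka_producer
--logger_kafka_brokers=kafka1.example.com:9092
--logger_kafka_topic=osquery_results
--logger_kafka_acks=1
```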
Data storage can be any sort of relational SQL or unstructured database that can handle osquery data. There really is no limit to what you can use here; however, it should be a technology you are familiar with, and one you can scale if you plan to keep osquery data long term.
One of the core benefits of thorough data storage is the ability to conduct historical incident investigations. The most popular solutions include:
Elasticsearch: An unstructured, key-value/document-style database. This was the initial means of storage used by Facebook, and in some ways it is more robust for handling large amounts of data than a normal relational database. ES scales horizontally and is designed to let users get more out of unstructured data. Part of the ELK stack (a sample search follows this list).
Postgres: A very common relational database that has become the de facto standard for open-source SQL in many organizations.
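As a small illustration of the Elasticsearch option, here is a search that pulls the last day of results for one scheduled query, assuming logs were ingested into a hypothetical osquery-* index with Logstash adding its usual @timestamp field (the query name matches the pack sketch earlier):

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "deleted_binaries" } },
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ]
    }
  }
}
```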
Once you have all this rich endpoint data from osquery, what do you do with it? Data visualization matters for understanding the scope of what osquery can provide, for spotting trends in osquery data, and for combing that data for anomalies and information to support incident investigations or hunting. Without a way to make sense of the data, even very rich data is useless. The solutions here are some of the open-source options available, but many organizations eventually build their own customizations on top of these, and/or combine them with their existing visualization solutions.
Kibana: The most commonly used open-source data-visualization tool in osquery builds. As part of the ELK stack, it is a natural fit for teams already using Logstash and Elasticsearch. Kibana has a domain-specific query language for cutting down the data displayed, letting you separate “what matters” for a given use case from the mountain of data you have, and it allows rough pivots on some types of data (see the example query after this list).
Grafana: Better suited to trending data over time, Grafana is another package that lets you build visualizations from queries against a data store.
Apache Zeppelin & Spark: Analysis “notebooks” that let you create queries in a graphical notebook format (realistically, a web page with a variety of frames). This allows metadata and formatting to be placed around the data output, and in many cases the queries can be exported in a portable format, with or without the data, to share with others. Time-based queries are not as easy as in Kibana, but these tools offer more ways to approach the data for analysis (they were originally developed for the scientific-data community but have transitioned easily to event analysis; see the sketch after this list).
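For a taste of Kibana’s query language, a Lucene-style search-bar query might carve osquery results down like this; the field names (name, hostIdentifier, action) come from osquery’s result-log format, while the pack and host values are invented:

```
name:pack_incident-response_* AND hostIdentifier:"web-server-01" AND NOT action:removed
```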
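And in Zeppelin, once osquery results have been loaded into a Spark table (the table name here is hypothetical), an analysis paragraph can be plain SQL:

```
%sql
SELECT hostIdentifier, COUNT(*) AS events
FROM osquery_results
WHERE action = 'added'
GROUP BY hostIdentifier
ORDER BY events DESC
```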
Don’t forget: successful deployment requires people + process + technology. While we’ve provided suggestions about the open-source technology you can use to deploy osquery at scale, you will still have to determine which tools align best with your organization's inclinations, existing processes, and resources. In many cases, the path of least resistance is to use whatever your admins already have in place and build on top of it, or to adopt tools that fit your technical team's talents and skill sets (e.g., don’t have Python programmers try to customize a tool written in Java).