As attackers continually evolve their tactics, defenders' tools must keep pace with that growing complexity while still letting day-to-day business run.
When it comes to detecting malware, the arms race between attackers and defenders is nothing new. The once seemingly simple battle between nuisance worms written by script kiddies and basic anti-virus software has evolved into a far more complex, layered contest against powerful tools used to extort organizations, inflict damage, and steal intellectual property.
Malware-detection technologies have grown steadily more sophisticated, because malware now works harder than ever to gain access to a target machine and then conceal its presence while it runs.
What Does YARA Stand For?
YARA is an abbreviation of either “YARA: Another Recursive Acronym” or “Yet Another Ridiculous Acronym.” It was originally developed by Victor Alvarez of VirusTotal and was released on GitHub in 2013.
What Is A YARA Tool?
Primarily used in malware research and detection, YARA is a tool that takes a rule-based approach to creating descriptions of malware families based on textual or binary patterns.
YARA Rules
Much like a program in a programming language, a YARA rule defines a number of variables (strings) that hold patterns found in a malware sample, along with a condition that combines them. Depending on the rule, a file matches when some or all of those patterns are present, and that match identifies it as a likely instance of the malware family the rule describes.
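To make the strings-plus-condition idea concrete, here is a toy model in Python. It is not YARA and does not parse YARA syntax; the `ToyRule` class and its `min_matches` field are hypothetical stand-ins for a rule's strings section and a condition such as `all of them`, `any of them`, or `2 of them`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyRule:
    """Hypothetical stand-in for a YARA rule: named byte patterns plus a threshold."""
    name: str
    strings: dict[str, bytes]  # like a rule's $a, $b, ... definitions
    min_matches: int           # len(strings) ~ "all of them", 1 ~ "any of them"

    def matches(self, data: bytes) -> bool:
        # Count how many of the rule's patterns occur anywhere in the data.
        hits = sum(1 for pattern in self.strings.values() if pattern in data)
        return hits >= self.min_matches


rule = ToyRule(
    name="toy_dropper",
    strings={"$a": b"cmd.exe /c", "$b": b"http://", "$c": b"\x4d\x5a\x90\x00"},
    min_matches=2,  # roughly "2 of them"
)

sample = b"\x4d\x5a\x90\x00...payload...cmd.exe /c whoami"
print(rule.matches(sample))                 # True: $a and $c are present
print(rule.matches(b"harmless text file"))  # False: no patterns present
```

Lowering `min_matches` makes a rule broader (more detections, more false positives); raising it makes the rule stricter.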
To detect malware, defenders need a strategy and tools that can recognize it even when it has been disguised. In the past, defenders identified malware by its unique filehash signature (typically an MD5, SHA1, or SHA256 checksum). You can think of a filehash like a fingerprint: it is a unique identifier derived from the entire contents of a file, so a match against a list of known-bad hashes identifies a known malicious file. The downside of filehash-based detection is that attackers can easily disguise their malware by adding blank lines or comments to their code. Changing even a single byte gives a new variant a completely different filehash, rendering detection based on the old filehash useless.
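The brittleness of hash-based detection is easy to demonstrate with Python's standard `hashlib`. The "payload" below is a harmless placeholder string, and `is_known_bad` is a hypothetical blocklist check, not any particular product's logic.

```python
import hashlib

original = b"print('malicious payload')\n"
variant = original + b"\n# harmless-looking comment\n"  # same behavior, different bytes

# Blocklist containing the SHA256 of the original sample only.
known_bad_hashes = {hashlib.sha256(original).hexdigest()}


def is_known_bad(data: bytes) -> bool:
    # Hash-based detection: flag a file only if its exact digest is on the blocklist.
    return hashlib.sha256(data).hexdigest() in known_bad_hashes


print(is_known_bad(original))  # True
print(is_known_bad(variant))   # False: one appended comment defeats the blocklist
```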
More advanced detection methods do not calculate a single signature from the entire file (something that is too easily changed). Instead, they use multiple signatures, each a string (hex or ASCII) or a regular expression, that identify important functional sections within the malware. One open-source tool for this kind of signature-based malware detection is YARA. YARA runs on Windows, Linux, and macOS, and is easy to install. The screenshot below shows installation on Ubuntu.
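The sketch below shows why multiple content signatures survive the padding trick that defeats a filehash. The three signatures (a hex byte sequence, an ASCII string, and a regular expression) are hypothetical examples, and plain Python matching stands in for YARA's own engine.

```python
from __future__ import annotations

import re

# Hypothetical signatures in the three forms described above.
HEX_SIG = bytes.fromhex("4D5A9000")                          # hex byte sequence
TEXT_SIG = b"evil-beacon"                                    # ASCII string
REGEX_SIG = re.compile(rb"https?://[a-z0-9.-]+/gate\.php")   # regular expression


def signature_hits(data: bytes) -> list[str]:
    """Return the names of the signatures found anywhere in the data."""
    hits = []
    if HEX_SIG in data:
        hits.append("hex")
    if TEXT_SIG in data:
        hits.append("text")
    if REGEX_SIG.search(data):
        hits.append("regex")
    return hits


sample = bytes.fromhex("4D5A9000") + b" ... evil-beacon ... http://c2.example/gate.php"
padded = sample + b"\n\n# padding added to change the file hash\n"

print(signature_hits(sample))  # ['hex', 'text', 'regex']
print(signature_hits(padded))  # ['hex', 'text', 'regex']: padding does not hide them
```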
Defenders use YARA by installing it, downloading a set of YARA rules, and scanning their systems with those rules. The screenshot below shows how to run a recursive YARA scan on all files in the /lib directory using the rules in the file rules1.yar.
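Conceptually, a recursive scan walks the directory tree and tests every file's contents against the rules. The toy Python version below makes that loop explicit with a single byte pattern. `scan_tree` is a hypothetical helper, and it ignores everything that makes YARA fast, such as compiled rules and efficient multi-pattern matching.

```python
from __future__ import annotations

import os


def scan_tree(root: str, pattern: bytes) -> tuple[int, list[str]]:
    """Walk `root` recursively; return (files scanned, paths containing `pattern`)."""
    scanned, matches = 0, []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets, broken symlinks, etc.
            try:
                with open(path, "rb") as fh:
                    data = fh.read()  # whole file in memory: fine for a sketch only
            except OSError:
                continue  # permission denied, file vanished, ...
            scanned += 1
            if pattern in data:
                matches.append(path)
    return scanned, matches
```

Because every file is read end to end, the cost of a loop like this grows with the number and size of files under the root.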
Full Filesystem YARA Scan vs Targeted Osquery YARA Scan
To measure how long a YARA scan takes on the various directories of our Linux server, we created a script that records the start and end times along with the number of files processed. The script (run_yara.sh) is shown below:
Running the script against the /lib directory, we see that it takes 11 seconds to scan 23,887 files.
Running similar scans on the other major system directories (/home, /etc, /usr, /var), we can calculate the total number of files scanned and the total time taken.
We also measured CPU usage during the full YARA scan and found that it used 48.2% of one CPU core.
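run_yara.sh is a shell script; as an illustration, a hypothetical Python equivalent that records the same information (start time, end time, and number of files) around a YARA CLI call might look like this. It assumes the `yara` binary is on the PATH and that `-r` requests a recursive scan; `timed_scan` and `count_files` are illustrative names, not part of YARA.

```python
from __future__ import annotations

import os
import subprocess
import time
from datetime import datetime


def count_files(directory: str) -> int:
    """Count the file entries os.walk reports under `directory`."""
    return sum(len(files) for _root, _dirs, files in os.walk(directory))


def timed_scan(directory: str, rules: str = "rules1.yar",
               command: list[str] | None = None) -> dict:
    # Default: the YARA CLI scanning `directory` recursively with `rules`.
    # `command` lets you substitute another program to try the timing logic.
    cmd = command or ["yara", "-r", rules, directory]
    start = datetime.now()
    t0 = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    elapsed = time.monotonic() - t0
    end = datetime.now()
    return {
        "directory": directory,
        "start": start.isoformat(timespec="seconds"),
        "end": end.isoformat(timespec="seconds"),
        "seconds": round(elapsed, 1),
        "files": count_files(directory),
    }
```

For example, `timed_scan("/lib")` returns a dictionary with the directory, start and end timestamps, elapsed seconds, and file count; calling it once per directory gives the per-directory figures used for the totals.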
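One way to take a similar measurement from a script is to compare the CPU time a child process consumed with the wall-clock time it ran. The sketch below uses Python's Unix-only `resource` module; it is an illustration, not necessarily how we measured, and `cpu_percent_of_child` is a hypothetical helper.

```python
from __future__ import annotations

import resource
import subprocess
import time


def cpu_percent_of_child(cmd: list[str]) -> float:
    """Run `cmd`; return its CPU time as a percentage of one core over its wall time."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    t0 = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    wall = time.monotonic() - t0
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN covers finished, waited-for children: user + system CPU seconds.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return 100.0 * cpu / wall if wall > 0 else 0.0
```

For example, `cpu_percent_of_child(["yara", "-r", "rules1.yar", "/lib"])`. If the scanner uses several threads, a result above 100% simply means more than one core was busy.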
As you might imagine, running a full 167-second scan multiple times per hour across thousands of machines could get expensive quickly, especially since most servers have many more files than our small test machine. If only there were a way to run YARA more efficiently.
Enter osquery. With osquery, you already know which processes are running, so you can run a targeted YARA scan against only those processes' files. You don't need the full force of a YARA scan on files that aren't doing anything; save that kind of firepower for where it's actually needed, especially in environments where compute resources are at a premium.
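To put rough numbers on that, here is the arithmetic as a small Python function. The 167-second scan time and the 48.2% CPU figure come from our test machine; the scan frequency (4 per hour) and fleet size (1,000 machines) are hypothetical inputs chosen only for illustration.

```python
def fleet_cpu_cost(scan_seconds: float, cpu_fraction: float,
                   scans_per_hour: int, machines: int) -> float:
    """CPU core-seconds consumed per hour by repeated full scans across a fleet."""
    return scan_seconds * cpu_fraction * scans_per_hour * machines


# 167 s and 48.2% of a core are measured values; 4 scans/hour and
# 1,000 machines are hypothetical.
core_seconds = fleet_cpu_cost(167, 0.482, scans_per_hour=4, machines=1000)
print(round(core_seconds))            # 321976 core-seconds every hour
print(round(core_seconds / 3600, 1))  # 89.4 cores kept busy around the clock
```

Under those assumptions, full scans alone would keep roughly 89 CPU cores busy continuously, before accounting for servers with far more files than our test machine.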
The screenshot below shows the SQL for running a YARA scan of all running processes inside osquery; the start and end times are also captured.
We also measured osquery's CPU usage during the targeted scan. The scan was run from osqueryi, osquery's interactive shell.
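A query of this kind can be sketched as follows, driven from Python through `osqueryi --json`. It assumes osquery's `yara` table (which needs a `path` constraint plus a rule source such as `sigfile`) and the `path` column of the `processes` table; the rules path and function names are hypothetical, and this is not necessarily the exact SQL shown in the screenshot.

```python
from __future__ import annotations

import json
import subprocess


def targeted_yara_query(sigfile: str) -> str:
    """Build osquery SQL that YARA-scans only the executables of running processes."""
    escaped = sigfile.replace("'", "''")  # SQL string-literal escaping
    return (
        "SELECT path, matches, count FROM yara "
        "WHERE path IN (SELECT DISTINCT path FROM processes WHERE path != '') "
        f"AND sigfile = '{escaped}';"
    )


def matching_rows(osquery_json: str) -> list[dict]:
    """Keep only rows where at least one YARA rule matched (osquery emits strings)."""
    return [row for row in json.loads(osquery_json) if int(row.get("count", 0)) > 0]


def run_targeted_scan(sigfile: str) -> list[dict]:
    # Requires osquery installed; --json makes osqueryi print rows as a JSON array.
    out = subprocess.run(
        ["osqueryi", "--json", targeted_yara_query(sigfile)],
        capture_output=True, text=True, check=True,
    ).stdout
    return matching_rows(out)
```

For example, `run_targeted_scan("/etc/osquery/rules1.yar")` would return only the running executables that matched a rule.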
A summary of all the results from our full scan vs targeted YARA scan with osquery is shown below.
The takeaway: the full filesystem scan used 48.6% of a CPU core, while our targeted scan with osquery used only 4.0% of one core and took a fraction of the time.
At this point some readers may object that this is not a fair comparison: we could limit the full YARA scan by scripting a list of running processes (using “ps -ef”, for example) and then scanning only those files, no osquery required. The limitation of that approach is that, however frequently you run it, a process that starts and exits in the interval between two scans will be missed. Osquery solves this problem with its eventing framework, where no process, network socket, or user login is ever missed (read more here).
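For comparison, the `ps -ef`-style snapshot described in that objection can be sketched in Python by reading /proc. This is a Linux-only sketch with a hypothetical function name, not part of our test setup; each call sees only the processes alive at that instant, which is exactly the gap described above.

```python
from __future__ import annotations

import os


def running_executables() -> set[str]:
    """Snapshot the executable paths of currently running processes (Linux /proc)."""
    paths = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric entries are process IDs
        try:
            paths.add(os.readlink(f"/proc/{entry}/exe"))
        except OSError:
            # Kernel threads have no exe link, other users' processes may be
            # unreadable without root, and short-lived processes may already be gone.
            continue
    return paths
```

Scanning the returned paths covers only what was running at the moment of the call; a process that starts and exits between two calls never appears in either snapshot.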
We go deeper into this analysis, and help you get started with YARA and osquery, in our on-demand webinar: “Cross-Platform Malware Detection with YARA and osquery”.