Monday, August 20, 2012

An adventure in Windows 8 and OVAL (with a layover in PowerShell)

By:  S.S., Senior Security Engineer @ G2


Back when the first preview versions of Windows 8 came out the first thing I did once I stood up a VM was to try to run OVAL content against it. Using OVALDI it worked fine, and I was able to write inventory definitions for each of the preview releases as they came out.
With Windows 8 being released Wednesday, at least if you have a Technet or MSDN subscription, it was time to take the next step and explore the new world of applications.
If you haven't heard, Microsoft is following the money and taking Apple's App Store model. Can't blame them for that, easy money skimmed off the top for digital distribution (this is why Gabe Newell of Valve hates Windows 8 so much - if there's a digital distribution mechanism built into Windows, who is going to use Steam? There goes their profits...). More importantly, this means a new way of packaging and delivering applications. I had not read a lot about it, just knew it was different, and thought it might impact the way we do things in OVAL.
More specifically, the issue is how OVAL inventory definitions work. If you're not familiar with inventory definitions, they're checks to see if an application is installed. The problem is that determining if an application is installed isn't the easiest thing to do. How do you determine if an application is installed? It depends on the application. Sometimes the installer will create some obvious registry keys, so you know if they're present the application is too (OK, they could be entered manually, but we'll trust it). Maybe the only registry keys will be in the Uninstall information, which is difficult to search. Or maybe it doesn't create any registry keys; what do you do then? You can always search the system for the executable, but that's not feasible in bulk, as it just takes too long to scan the entire file system that many times. It gets even worse when the application doesn't have an installer and is just dropped wherever the user decides to put it. And then there's always the variable of vendors changing installer packages, deciding to change the keys that are written, etc. Very often the hardest part of writing OVAL content is actually finding the thing you're supposed to test.
So does Windows 8 change any of that? Only one way to find out...
I started out following my usual process for trying to determine the footprint of a piece of software, beginning with doing a file system and registry snapshot with System Explorer. Then I went to the Windows Store and installed an app. I went with FlightAware, a flight tracking app, since Steph is flying to Vegas this afternoon/evening/night (two 3-hour flights with a 3-hour layover...). Then I went back to System Explorer, did another snapshot, and then did a diff. This is where I hope that it points out an obvious registry key. No such luck this time. Of course there were some keys here and there, and some files, but nothing that looked like it would be a reliable indicator. Worst of all, searching the diff for the word "flight" came up empty. So how am I going to write OVAL content to find this application if I can't even find its name in any of the registry entries or files that it changed?
I did some Googling on how to list installed programs on Windows 8, hoping I'd get lucky. I did and I didn't - I got the answer I needed, but I didn't like the answer. To list your installed programs you use PowerShell.
PowerShell, my old nemesis... You may remember that a few years ago I wrote that it may be the death of OVAL on Windows, as Microsoft intends to use PowerShell as an abstraction layer as the only way to access the data we need (and a basic tenet of OVAL is that you go as low level as possible - don't trust layers you don't need). It looked like PowerShell was going to be the only way I could write reliable inventory definitions on Windows 8, so I had to get over it.
PowerShell support in OVAL, implemented via the cmdlet_* structures, is pretty new. I looked at the MITRE OVAL Repository to see if any content there used cmdlet, but nothing did. Then I looked at MITRE's sample content, no luck there either. I knew that Microsoft used PowerShell in their OVAL for Exchange benchmarks they publish with SCM, so I downloaded and installed the latest version. That went nowhere, as it refused to let me export SCAP for the Exchange benchmarks (all the others were fine, though that did me no good). Finally I remembered that Matt Kerr had written some cmdlet content as part of creating the content for the SCAP Validation program at NIST earlier this year, so maybe there I could find a model. After Greg Witte helped me find what I needed I at least had a bit of a model to follow.
So I quickly wrote a test definition that should have brought back a list of all the installed applications. It didn't work. Tried a few tweaks with no luck. Then I wondered if there were issues with PowerShell, OVAL, and OVALDI. To test this I extracted all of the OVAL content from the validation program's cmdlet section and tried it on Windows 8. It failed miserably, so much so that I thought OVALDI was to blame.
I downloaded the source code to OVALDI, found the cmdlet code, and quickly got lost. While getting lost I noted that there didn't appear to be anything that would have restricted what I was trying to do. So OVALDI was off the hook.
Since the validation content worked on previous versions of Windows I wondered if something changed with PowerShell in Windows 8. A little more searching showed me that Windows 8 and Windows Server 2012 use a new version of PowerShell, version 3.0. Could this be the problem? Does OVAL/OVALDI just not handle version 3.0 since it is new?
By this point I was working with a pretty simple test definition that would bring back a list of services. This looked a lot like something from the validation content, so I was pretty confident that I had it right, yet it wouldn't work. Then I noticed that there is a module_version field in the cmdlet_object... is it not working because Windows 8 uses PowerShell 3.0? I wasn't too confident, as the validation content had 1 as the version, but I thought PowerShell 2 was in use. Taking a guess, I updated the version from 1 to 3.0. Ran the content again and wouldn't you know it, I had a list of services. So OVAL and PowerShell can work on Windows 8, it was just my new stuff that didn't work. I guess that's progress...
All I needed to do was make my new Get-AppxPackage call use version 3.0 and I'd be good, right? Nope. Umpteen revisions to my content all have the same results, no data coming back. Then I start thinking about the module referenced in the test content... Microsoft.PowerShell.Management... maybe Get-AppxPackage isn't part of it. Since there is virtually no documentation of PowerShell 3.0 I can't find anything that says where Get-AppxPackage lives. I did some searching for commands I could run in PowerShell that might tell me more about how everything was organized and I stumble onto a nice PowerShell 3.0 addition - a command to launch a GUI editor. I launch it, find the command I've been trying to run, and there's nothing useful there, no drilldown for more info, nothing. After a few seconds a tooltip pops up and it lists a module name that doesn't look anything like what I've seen used before. I find a few more commands, get a bit more info, and I'm convinced that I had the module wrong. So I update the module name in my content and it worked.
Just kidding, this isn't close to long enough yet. After numerous trial and error attempts I think about the version number again. Surely since the Appx module that I've finally identified is new in PowerShell 3.0, and pre-existing modules didn't work unless I changed the version from 1 to 3.0, the version needs to be 3.0. Just for fun though, let's try 1. Would you believe it worked? Yes, something that used to work with version = 1 only worked with version = 3.0, and something new in 3.0 would only work with version = 1.
Finally I had the list of installed applications. All I needed to do now was filter the data and I'd be done. But I noticed how slow the PowerShell stuff was to run, and if I continued down my path we'd be taking that performance hit in every Windows 8 application inventory definition. If instead of limiting the results to the app I wanted I just brought back the full list, then looked to see if the app I wanted was in the list, then all inventory definitions could be based on a single pull of the data. So I threw in a state that I wanted to match, containing the name of the app, and gave it a shot.
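To make the "single pull" idea concrete, here is a rough sketch in Python rather than OVAL. Everything in it is a stand-in: installed_packages() fakes the slow PowerShell call, the package names other than FlightAware are made up, and the counter only exists to show that the expensive step runs once no matter how many inventory checks ask for it.

```python
from functools import lru_cache

# Counts how often the (pretend) slow pull actually runs.
PULLS = {"count": 0}

@lru_cache(maxsize=None)
def installed_packages():
    """Stand-in for the slow PowerShell pull; the data is made up."""
    PULLS["count"] += 1
    return frozenset({"FlightAware.FlightAware", "Example.OtherApp"})

def is_installed(package_name):
    """One 'inventory definition': look the name up in the shared list."""
    return package_name in installed_packages()

# Three separate checks, one pull of the data.
print(is_installed("FlightAware.FlightAware"))  # True
print(is_installed("Example.OtherApp"))         # True
print(is_installed("Missing.App"))              # False
print(PULLS["count"])                           # 1
```

The trade-off is the one described above: each individual check does a little more filtering work, but the slow part is paid for only once.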
Big shocker, that didn't work. After some more trial and error I figured out what was happening, but not a solution. A cmdlet_object is fairly unusual in OVAL in that it returns what amounts to a nested array of values. You get back one object that is a set of sets of name/value pairs. I wanted to return true if any of the inner sets contained the name "name" and the value "FlightAware.FlightAware", but by default OVAL says that all of the inner sets have to contain a match in order to return true. Fortunately there's entity_check in the state, which controls how many matches there needs to be. Adding that attribute with the right value finally gave me the result I wanted.
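The difference between the default behaviour and what I needed can be modelled in a few lines of Python. This is only a toy model of the matching logic, not OVAL or OVALDI code: the records are made-up sample data, and "at least one" is assumed here to be the entity_check value that gives the "any inner set matches" behaviour.

```python
def matches(record, name, value):
    """True if this inner set holds the given name/value pair."""
    return record.get(name) == value

def evaluate(records, name, value, entity_check="all"):
    """Combine the per-record results the way an entity_check value would."""
    results = [matches(record, name, value) for record in records]
    if entity_check == "all":
        return all(results)
    if entity_check == "at least one":
        return any(results)
    raise ValueError("unsupported entity_check: " + entity_check)

# One object holding a set of sets of name/value pairs (sample data).
packages = [
    {"name": "FlightAware.FlightAware"},
    {"name": "Example.OtherApp"},  # hypothetical second package
]

# Default: every inner set must match, so the check fails.
print(evaluate(packages, "name", "FlightAware.FlightAware"))                  # False
# Relaxed: one matching inner set is enough.
print(evaluate(packages, "name", "FlightAware.FlightAware", "at least one"))  # True
```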
So after probably 4 hours of work, research, trial and error, and a couple of good hunches, I can now write inventory definitions for apps bought through the Windows Store. If I can keep my eyelids open as I finish them off I'll be sending to the OVAL Repository the Windows 8 OS definitions and a selection of the most important apps (FlightAware, Cut the Rope, and Pinball FX 2). I was pretty excited about figuring this out. How many applications to go?
I should probably address my experience using Windows 8 a little bit. While there are some things that are a little weird, and running in a VM presented some extra challenges, I found it far easier to get used to Windows 8 than I expected it to be. As a productivity tablet OS on the Surface (vs. an entertainment tablet OS on the iPad) I can see it being great. As a desktop OS it is certainly a big change, but there's a lot of good in it. If my G2 machine met the system requirements I don't think I'd mind switching to it as my primary OS.

Thursday, August 2, 2012

Distributed Databases - Round 2 - MongoDB

By:  J.B., Software Engineer @ G2

MongoDB

Overview

MongoDB is a schema-less, document oriented, distributed database. Documents are built upon key-value pairs and incorporate a variety of data types, primarily dictionaries, arrays, lists, strings, and documents. Embedding data structures inside such documents alleviates the need for SQL joins, increasing performance. MongoDB’s document structure is based upon JSON and binary JSON (BSON) to allow for quick and easy use. It was designed with high performance, availability, and horizontal scaling in mind. MongoDB was developed by 10gen in 2007 and is being used by MTV Networks, Craigslist, and Foursquare.

Features

Schema-less
A major selling point for MongoDB is its schema-less, document-based design. Documents are built from key-value pairs and look like JSON. Users can include specific fields in the documents that need them and omit those fields from documents that don't. For example: one could create a “blog” document that contains the properties title, author, date, and blog post. In another “blog” document one could add a “comments” field, a tag field, or both.
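As a small illustration of the blog example, the sketch below models two documents from the same collection as plain Python dictionaries. It does not use a MongoDB driver; the field values are invented, and the list named collection is only a stand-in for a real collection.

```python
# Two hypothetical "blog" documents stored side by side.
basic_post = {
    "title": "First post",
    "author": "J.B.",
    "date": "2012-08-02",
    "post": "Hello, world.",
}
rich_post = {
    "title": "Second post",
    "author": "J.B.",
    "date": "2012-08-03",
    "post": "Now with extras.",
    "comments": [{"author": "reader", "text": "Nice."}],  # embedded documents
    "tags": ["mongodb", "nosql"],
}

collection = [basic_post, rich_post]  # stand-in for a real collection

# Fields that only the second document carries.
extra_fields = sorted(set(rich_post) - set(basic_post))
print(extra_fields)  # ['comments', 'tags']

# A lookup on an optional field simply skips documents that lack it.
tagged = [doc["title"] for doc in collection if "mongodb" in doc.get("tags", [])]
print(tagged)  # ['Second post']
```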
Join-less
Since this is a NoSQL database, there are no joins, which increases performance. Mongo is able to index on keys from embedded documents or data structures. Mongo is built with fault tolerance in mind by having highly replicated servers with automatic failover occurring when a node goes down.
Master-Slave Relationship
Mongo has a master-slave relationship with the distributed servers. The master is able to perform writes/reads. All the slaves across the shards will read from the master and will be used for reads by the clients. There is a feature called auto-sharding that allows for data partitioning across the data servers. There are backups of the shards preserving the data. Using the auto-sharding method MongoDB is able to scale horizontally by adding new servers.
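A minimal sketch of how partitioning by key might route documents to servers is shown below. It assumes range-based partitioning on a single key, and the split points and shard names are invented for illustration; it is not MongoDB's actual balancer logic.

```python
import bisect

# Hypothetical split points on the partitioning key, in sorted order.
split_points = ["h", "p"]
# One more shard than split points: keys < "h", "h" <= keys < "p", keys >= "p".
shards = ["shard0", "shard1", "shard2"]

def shard_for(key):
    """Pick the shard whose key range contains the given key."""
    return shards[bisect.bisect_right(split_points, key)]

print(shard_for("alice"))    # shard0
print(shard_for("mallory"))  # shard1
print(shard_for("zed"))      # shard2
```

Scaling horizontally then amounts to adding a server and moving some key ranges over to it.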
MapReduce
Another major feature is the ability to perform batch processing such as MapReduce. It is a built-in feature in which all the user has to do is create a mapper and a reducer. For datasets that are continually growing there is also an incremental MapReduce feature, which avoids aggregating over the entire dataset every time.
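The mapper/reducer split, and the incremental variant, can be sketched in plain Python. This is a toy model under stated assumptions, not MongoDB's API: the documents and the map_reduce helper are hypothetical, and the incremental step works here only because the reducer (a sum) can safely be re-applied to its own earlier output.

```python
from collections import defaultdict

def mapper(doc):
    """Emit (key, value) pairs; here, one count per tag."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reducer(key, values):
    """Collapse all values emitted for one key."""
    return sum(values)

def map_reduce(docs, previous=None):
    """Run the job over docs, folding in an earlier result if one is given."""
    grouped = defaultdict(list)
    # Earlier output is fed back in so old documents need not be re-read.
    for key, value in (previous or {}).items():
        grouped[key].append(value)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

first_batch = [{"tags": ["mongodb", "nosql"]}, {"tags": ["nosql"]}]
totals = map_reduce(first_batch)
print(totals)  # {'mongodb': 1, 'nosql': 2}

# Later, only the new documents are processed.
new_batch = [{"tags": ["mongodb"]}, {"tags": ["redis"]}]
totals = map_reduce(new_batch, previous=totals)
print(totals)  # {'mongodb': 2, 'nosql': 2, 'redis': 1}
```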

Limitations

Access Control
A major limitation in MongoDB is that it has no security built into it. There is no notion of permission or role directly inherent in the database. Developers intentionally left security to the application level.
Auto-sharding
When new servers are added, the auto-sharding feature automatically kicks in and can slow down processing.
Document Size
Depending on what is being saved, the per-document size limit could be a limiting factor: 4 MB in older releases and 16 MB in more recent ones.

Use Cases

Rapid Development
MongoDB would be excellent for rapid agile development. It would be beneficial for modifying a dynamic type of input where specific fields would be needed in specific cases. MongoDB was built to allow users to focus more on the application itself and not the database.

Wednesday, August 1, 2012

Databases, DHT's, and Data (Oh My!)

By:  B.Y., Software Engineer @ G2

Recently, or not so recently, a co-worker and I worked on a grant to explore the different distributed databases that were available. We ended up choosing the four that seemed to be most prominent. Now that our research has ended, it came to our attention that others might benefit from this knowledge as well. With that in mind we thought it best to release it as a series of blog posts, one a week, each featuring an overview of one database. We're starting off with Redis...

Redis

Overview

Redis is an in-memory key-value data store that supports a wide variety of atomic instructions. It operates on a single CPU and allows for scalability increases through multiple instances. Each instance will spin up on a separate CPU, thus allowing scalability through sharding across multiple cores and physical machines. The creator of Redis is a man of Italian descent named Salvatore Sanfilippo with all support funding coming from VMware. Redis is currently being used by StackOverflow, Github, Blizzard, and more.

Features

EXPIRE and TTL

The EXPIRE command sets the time to live (TTL) for a given key. The TTL command allows for an immediate check of the time left on a given key. This ‘age-off’ capability adds immediate value when discussing short term, or short lived, values that are queried often.
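The age-off behaviour can be illustrated with a toy in-memory store in Python. This is a sketch of the idea only, not how Redis implements it and not a Redis client: the ExpiringStore class, its method names, and its return values are all hypothetical, and a fake clock is injected so the example is deterministic.

```python
import time

class ExpiringStore:
    """Toy key-value store with a per-key time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}
        self._deadlines = {}

    def _purge(self, key):
        """Drop the key if its deadline has passed."""
        deadline = self._deadlines.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._deadlines.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._deadlines.pop(key, None)  # a fresh value has no expiry yet

    def get(self, key):
        self._purge(key)
        return self._data.get(key)

    def expire(self, key, seconds):
        """Set the time to live; returns False if the key does not exist."""
        self._purge(key)
        if key not in self._data:
            return False
        self._deadlines[key] = self._clock() + seconds
        return True

    def ttl(self, key):
        """Seconds left on the key, or None if it has no expiry or is gone."""
        self._purge(key)
        deadline = self._deadlines.get(key)
        return None if deadline is None else deadline - self._clock()

now = [0.0]                                   # fake clock, in seconds
store = ExpiringStore(clock=lambda: now[0])
store.set("session", "abc")
store.expire("session", 10)
now[0] = 4.0
print(store.ttl("session"))   # 6.0
now[0] = 10.0
print(store.get("session"))   # None
```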

Data Structures as Values

Redis allows for abstract data structures to be put into keys, namely lists (L*), sets (S*), and sorted sets (Z*). These values can be operated on (addition, union, etc.) incrementally within the data store.
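For readers who have not met these structures, the sketch below mimics them with Python built-ins. It is an analogy only, not a Redis client: the data is invented, and the Redis command named in each comment (LPUSH, SUNION, ZADD, ZINCRBY, ZRANGE) is just the nearest counterpart to the Python operation beside it.

```python
from collections import deque

# List: push onto the head, roughly what LPUSH does.
recent = deque()
recent.appendleft("page-a")
recent.appendleft("page-b")
print(list(recent))  # ['page-b', 'page-a']

# Sets: union of two sets, roughly what SUNION does.
online = {"ann", "bob"}
admins = {"bob", "cy"}
print(sorted(online | admins))  # ['ann', 'bob', 'cy']

# Sorted set: members ordered by score, modelled as member -> score.
scores = {"ann": 30, "bob": 10, "cy": 20}   # roughly ZADD

def ranked(members):
    """Members by ascending score, ties broken by name (roughly ZRANGE 0 -1)."""
    return sorted(members, key=lambda member: (members[member], member))

print(ranked(scores))  # ['bob', 'cy', 'ann']

scores["bob"] += 25                         # roughly ZINCRBY
print(ranked(scores))  # ['cy', 'ann', 'bob']
```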

Persistence

Redis can have its data serialized to disk periodically to allow for backups and fault tolerance. This feature forks a separate child process to do the writing, which keeps the performance hit to the actual data store minimal and ensures that after a crash or power outage a Redis store can be restored into memory.

Limitations

Memory

Redis currently operates within the bounds of physical memory on a given system.  If more is needed it falls down to malloc() calls which, with no available memory left, will just return NULL and fail to insert. Further, Redis adds a set of metadata into each key-value pair which can add up to become a nontrivial number when dealing with small keys.

Use Cases

Cycling Data

Redis is very good at aging data. As seen with the EXPIRE and TTL commands, it is ideal for any area where one needs a higher-level ‘cache’ region for rolling or cycling data that is frequently queried.

Small / Frequent Data

Given that it runs entirely in memory, with the limitations that implies, Redis is a prime candidate for small, frequently updated data, such as statistical primitives.