Searching for a way to parse binary files

statsman

I have a series of binary files which have a fixed number of bytes per entry. Contained within each entry are five fields that are actually in ASCII. Each of these five fields all start at the same byte offset within each entry and have a fixed number of bytes. I would like to process each binary file to extract those five fields and output them to a text file.

I suspect a C compiler would be appropriate here, but I haven't had one since Turbo C/C++ many years ago. I'm looking for any suggestions that would work on a Windows 10 computer. There is the potential to eventually process over 150 files, and each file will have anywhere from a 200 to well over 1,500 entries each.
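A minimal Python sketch of the task as described. The 128-byte record size and the five (offset, length) pairs are hypothetical placeholders and need to be replaced with the real layout. The script reads one fixed-size entry at a time, slices out each field, decodes it as ASCII, and writes one tab-separated line per entry. Trimming of space/NUL padding is also an assumption about how the fields are padded.

```python
import sys

# Hypothetical layout: replace with the real entry size and field positions.
RECORD_SIZE = 128
FIELDS = [(0, 8), (8, 12), (20, 4), (40, 16), (100, 10)]  # (offset, length)


def extract_fields(record: bytes) -> list[str]:
    """Slice each fixed-position field out of one entry and decode it as ASCII."""
    return [
        # Trim space/NUL padding (assumed); undecodable bytes become U+FFFD.
        record[off:off + length].decode("ascii", errors="replace").strip(" \x00")
        for off, length in FIELDS
    ]


def convert(bin_path: str, txt_path: str) -> int:
    """Write one tab-separated line per entry; return the number of entries."""
    count = 0
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        while True:
            record = src.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                break  # end of file; a short trailing chunk is ignored
            dst.write("\t".join(extract_fields(record)) + "\n")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract.py input.bin output.txt
    print(f"{convert(sys.argv[1], sys.argv[2])} entries written")
```

Reading in fixed-size chunks keeps memory use flat, although files of a few thousand entries would also fit in memory comfortably.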
 
Elixir (and Erlang) have an elegant method for binary parsing via pattern matching. Once you get a handle on the syntax, it's a pleasure to use compared with implementing the same parsing in other languages.
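Python has no binary pattern matching of the Elixir kind, but the standard-library `struct` module offers a comparable declarative approach: one format string describes the whole entry, and each entry is split in a single call. The layout below is hypothetical (`8s` is an 8-byte field, `16x` skips 16 bytes, `<` turns off alignment padding) and would need to match the real file.

```python
import struct

# Hypothetical 128-byte entry: five ASCII fields with skipped bytes between them.
RECORD = struct.Struct("<8s12s4s16x16s44x10s18x")


def parse_records(data: bytes):
    """Yield a tuple of five decoded strings for every whole entry in data."""
    usable = len(data) - len(data) % RECORD.size  # drop a trailing partial entry
    for fields in RECORD.iter_unpack(data[:usable]):
        # Trim space/NUL padding (an assumption about how fields are padded).
        yield tuple(f.decode("ascii", errors="replace").strip(" \x00") for f in fields)
```

Usage might look like `for row in parse_records(open("file.bin", "rb").read()): print("\t".join(row))`. Changing the layout then means editing one format string rather than a list of offsets.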
 
It is more a matter of whether you want to use a programming language you already know or use this as a chance to learn a new one. If you still want to use C/C++, there are free IDEs and editors for Windows, such as Visual Studio Code, Code::Blocks, CodeLite, etc., which can be paired with a free compiler such as GCC (MinGW-w64). Otherwise, as stated above, there are many programming language options. I am doing most of my current development in Python, so that is my current bias.
 
It’s been a while, but I think awk and similar Unix commands might do it for you.
 
Yes, I agree with the previous posters. Unix commands (like grep or awk) can surely do it. As you are looking to run this on Windows, you would need something like a Unix console/emulator (e.g. Cygwin). If you post your requirements and a sample file on the unix.com forum, you might even get some folks to write the command(s) for you.
 
I had a similar task that I wanted to accomplish a few months ago on a MacOS desktop: parsing a bunch of old Nokia “.vmg” messages that had been downloaded from a flip phone to a Windows machine. The easiest way to do it was using a scripting language (like Python or Perl as mentioned). I chose Tcl (tool command language) mostly because of familiarity but it’s not in fashion.

I’d installed a pre-built package for Mac from ActiveState, and that made it easy.

http://activestate.com
 
I really should have taken the time in the past to completely learn Python. This and a few other tasks I've stumbled upon over the years might have been a lot easier.

That said, I came up with a somewhat quick and dirty solution that I could run from a Windows batch file. The first step was to run each binary through a command line program called Certutil.

The specific command is "certutil -encodehex {binary file} {text file}". The text file it produces looks a lot like what you would see in a hex editor such as HxD: each output line has an address offset, followed by a series of hexadecimal byte values, followed by the ASCII text equivalent where the bytes are printable.
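For comparison, a dump of the same general shape (offset, hex bytes, ASCII column) can be produced in a few lines of Python, which may help when working out which offsets the fields sit at. This is only an approximation: it uses 16 bytes per line and its own spacing, and does not reproduce certutil's exact column format.

```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Return hex-editor-style lines: offset, hex bytes, printable ASCII ('.' otherwise)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short final line.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

For example, `print("\n".join(hex_dump(open("file.bin", "rb").read()[:256])))` shows the first 256 bytes.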

Once this output text file was created for a binary file, I processed the text with mawk, which I have installed on my PC and have used quite a lot. I had mawk extract the text I wanted, which in some cases spanned more than one line of certutil output, and print each item I was looking for as a single line of text.
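Why a field can land on two lines of the dump: if the dump shows 16 bytes per line (an assumption; check the actual certutil output), a field's absolute position in the file is record index * record size + field offset, and the field straddles a line break whenever it crosses a multiple of 16. A small Python helper makes that arithmetic explicit; the 128-byte record size and field positions in the example calls are hypothetical.

```python
def dump_lines_for_field(record_index: int, record_size: int,
                         field_offset: int, field_length: int,
                         bytes_per_line: int = 16) -> list[int]:
    """Return the zero-based dump line numbers that a field's bytes fall on."""
    first_byte = record_index * record_size + field_offset  # absolute position in the file
    last_byte = first_byte + field_length - 1
    return list(range(first_byte // bytes_per_line, last_byte // bytes_per_line + 1))


# Hypothetical 128-byte entries: a 12-byte field at offset 8 starts on one
# dump line and ends on the next, so it has to be stitched back together.
print(dump_lines_for_field(0, 128, 8, 12))    # [0, 1]
print(dump_lines_for_field(1, 128, 100, 10))  # [14]
```

Reading the raw bytes directly avoids this stitching step entirely, since a field is then just one contiguous slice.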

A somewhat kludgy process, but once created, I was able to process the 150+ sizable binary files rather quickly. I still should learn Python. It would have been easier.
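For the batch side (150+ files), a Python loop over a folder can replace the batch file. This sketch assumes the binaries share a hypothetical `*.bin` extension and the same hypothetical 128-byte layout, and writes a `.txt` file next to each input; both choices are placeholders.

```python
from pathlib import Path

# Hypothetical layout and file pattern; adjust to the real format.
RECORD_SIZE = 128
FIELDS = [(0, 8), (8, 12), (20, 4), (40, 16), (100, 10)]  # (offset, length)


def process_folder(folder: str, pattern: str = "*.bin") -> dict[str, int]:
    """Convert every matching binary file to a .txt file beside it; return entry counts."""
    counts = {}
    for bin_path in sorted(Path(folder).glob(pattern)):
        data = bin_path.read_bytes()
        lines = []
        # Step through whole entries only; a trailing partial entry is skipped.
        for start in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
            record = data[start:start + RECORD_SIZE]
            fields = [record[o:o + n].decode("ascii", errors="replace").strip(" \x00")
                      for o, n in FIELDS]
            lines.append("\t".join(fields))
        bin_path.with_suffix(".txt").write_text(
            "".join(line + "\n" for line in lines), encoding="utf-8")
        counts[bin_path.name] = len(lines)
    return counts
```

Calling `process_folder(r"C:\data\binaries")` would then report how many entries came out of each file, which is a quick sanity check across a large batch.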
 