Using BagIt in 2018

One of my more popular posts to this blog has been my 2016 round-up of BagIt, the Library of Congress’ seminal file packaging specification/software library. My overall explanation for what BagIt is, why it’s so important, the still-scattered state of documentation, and the need for a roundup of implementations for practical use all still stand… but I’ve realized lately that this post/topic could use a revisit, for a couple of reasons:

1) A year on, I’ve done a lot more interaction on GitHub and with open source software, and I regret my general tone when discussing the need for better BagIt documentation. One of the beautiful things about open source projects (and BagIt particularly, since the LoC hosts all the code for the BagIt libraries and several of its implementations on GitHub, which is *made* for collaboration) is the opportunity for direct, constructive feedback. I should have raised my problems with unclear documentation as an issue on GitHub (looky here, just as I did while preparing this post), or at least posed my confusion as a question/concern to be improved, rather than as a complaint “behind the backs” of the developers! Etiquette is important, and I will do better at remembering that digital preservation is not an unfeeling collection of tools and tech – there are people behind every line of code and every social media post (OK, besides the Twitter bots, but you know what I mean).

2) Software changes! It updates! That’s the whole point! And instructions that worked even a year or two ago may no longer work in the most contemporary environments. To that end, there have been some changes in macOS systems in particular that make me want to create new installation instructions (particularly for bagit-python) to help people avoid headaches.

So check out my previous post for why BagIt’s great – and then look below for a new roundup of how and why to use its various interfaces and implementations in 2018! (yeah I know it’s still 2017, but as much as digital preservation is about constant updating I’d like to future-proof this thing by *at least* two weeks, ya know?)

1. Bagger (GUI)

It’s Bagger! Still a nice intuitive GUI with big honking buttons for the basic tasks of bagging (creation from multiple files or bagging a directory in place, adding metadata, verification/validation). Still probably the best/most intuitive implementation for novice users. And the LoC GitHub repo for Bagger now has specific first-time installation/run instructions for both Windows and Mac. Beautiful!
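For a rough sense of what “bagging a directory in place” and “verification” actually do under the hood, here’s a minimal sketch in Python 3 using only the standard library. To be clear, this is not Bagger’s code or any LoC library: the function names “bag_in_place” and “verify_bag” are made up for illustration, it writes only a SHA-256 manifest, and it skips plenty that the real tools handle (tag manifests, fetch files, multiple checksum algorithms).

```python
import hashlib
import shutil
from pathlib import Path


def bag_in_place(directory, metadata=None):
    """Turn `directory` into a minimal BagIt-style bag (illustrative sketch only)."""
    root = Path(directory)
    items = list(root.iterdir())      # snapshot the contents before adding anything
    payload = root / "data"
    payload.mkdir()                   # sketch: raises if an item named "data" already exists
    for item in items:
        shutil.move(str(item), str(payload / item.name))

    # Bag declaration file (0.97 was the spec version in use when this post was written).
    (root / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}\n")
    (root / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    # Optional descriptive metadata, written as "Label: value" lines.
    if metadata:
        info = "".join(f"{label}: {value}\n" for label, value in metadata.items())
        (root / "bag-info.txt").write_text(info, encoding="utf-8")
    return root


def verify_bag(directory):
    """Return the payload paths that are missing or no longer match the manifest."""
    root = Path(directory)
    failures = []
    manifest = (root / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            failures.append(rel_path)
    return failures
```

Calling bag_in_place("some/folder", {"Contact-Name": "Your Name"}) on a scratch copy of a folder and then poking at the result is a decent way to demystify what a bag is: a “data” folder plus a few small text files. For real work, stick with the actual tools in this list.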

2. bagit-java (library + CLI)

The LoC’s bagit-java 5.x library can be incorporated into any scripts or applications written in Java (such as the two GUI implementations elsewhere on this list). It cannot, however, be used as a stand-alone command line utility. For that, you can still install and use bagit-java version 4.x, even though that version has been superseded and is not being actively developed. For installing and using bagit-java via CLI, you can use Homebrew (note that you will need Java installed as well):

$ brew install bagit

which installs bagit-java v4.12.3. Documentation for using bagit-java can be found in the utility’s help page, invoked with:

$ bagit --help

Just a quick note: the --help page gives the wrong command name for invoking bagit-java. The help page usage example says to use “$ bag <operation> [operation arguments]”, but the correct syntax is in fact “$ bagit <operation> [operation arguments]”! (Per my question on GitHub about this, apparently this problem is hard-coded and would require recompiling the Java source rather than just tweaking a doc, so since the bagit-java CLI isn’t actively maintained, no fix is forthcoming.)


3. bagit-python (library + CLI)

So, this section is really less an update on bagit-python and more an update on Python itself. Bagit-python can still be used either as a library to integrate into scripts and applications written in Python, or as a stand-alone command line utility. Your preference for using bagit-java or bagit-python in the CLI could be decided by looking at both utilities’ help pages – but I would actually reverse my previous recommendation and go back to generally recommending/using bagit-java, as the bagit-python CLI appears pretty exclusively aimed at creating bags-in-place and has fewer commands for verification, splitting, flexible bag creation and updating, etc. In either case, if you are interested in using/installing bagit-python, changes in recent macOS versions have meant that my previous instructions created more headaches than intended.

(Thanks to the brave MIAP students in Video Preservation who discovered and tried to deal with these inconsistencies!!)

So, for explanation: starting with OS X 10.11 (El Capitan), Apple introduced a feature called System Integrity Protection, nominally to keep unverified or malevolent applications downloaded from the internet from messing with critical OS-installed system software. What this means is that, without futzing around a lot with permissions (which is not a great idea for a novice user), using a package manager like Homebrew leaves you with some software in the OS-controlled “/usr” directory and its subfolders, and some software in the user- or package-manager-controlled “/usr/local” directory and its subfolders.

My previous instructions, which directed people to mix the default macOS-installed version of Python with the user-installed versions of pip (Python’s package manager) and bagit-python, generated a whole bunch of permissions issues.

The solution? Stay away from the macOS python altogether and install all components with a package manager to keep the installation contained within “/usr/local”.
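If you’re not sure which side of that “/usr” vs. “/usr/local” split your tools currently live on, a quick check like the following can help. It’s just a sketch: it assumes a Python 3 interpreter is available to run it (“shutil.which” doesn’t exist in Python 2), the helper name “is_user_managed” is my own invention, and the list of command names is simply the ones mentioned in this post.

```python
import shutil
from pathlib import PurePosixPath


def is_user_managed(path, prefix="/usr/local"):
    """True if `path` sits somewhere under the package-manager-controlled prefix."""
    return PurePosixPath(prefix) in PurePosixPath(path).parents


# Report where each command resolves on the current PATH, if it resolves at all.
for name in ("python", "python2", "pip", "pip2", "bagit.py"):
    location = shutil.which(name)
    if location is None:
        print(f"{name}: not found on PATH")
    elif is_user_managed(location):
        print(f"{name}: {location} (under /usr/local)")
    else:
        print(f"{name}: {location} (outside /usr/local)")
```

Anything Python-related that reports a location outside /usr/local is a candidate for the kind of permissions trouble described above.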

So, assuming you have Homebrew installed:

  1. $ brew install python

    This will install Homebrew’s Python 2.x package (currently 2.7.14), which includes the pip package manager by default (macOS’ Python package does *not* include pip by default). Note however! Since your Mac already came with a Python installation (at /usr/bin/python), Homebrew renames its versions “python2” and “pip2” to avoid confusion/overwriting. (So its commands/binaries live at /usr/local/bin/python2 and /usr/local/bin/pip2.)

  2. $ pip2 install bagit

    THAT’S IT. “sudo” shouldn’t be necessary. You can invoke bagit-python commands with

$ bagit.py [path/to/directory]

just as before. Check out the help page with the “--help” flag for more info.

(If you already tried to install bagit-python with the previous instructions, you will likely need to do some cleanup in the /usr folder to clear everything out and stop throwing errors. If you need help or advice doing this, feel free to get in touch!!)
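And since bagit-python is a library as well as a CLI, here’s a small sketch of calling it from a script once the install above has worked. It uses “bagit.make_bag”, “bagit.Bag” and “is_valid” from bagit-python; the wrapper function “make_and_check”, the scratch file and the “Contact-Name” value are placeholders of mine. The try/except around the import is only there so the file still loads on a machine without bagit-python installed.

```python
import os
import tempfile

try:
    import bagit  # third-party: the package installed by "pip2 install bagit"
except ImportError:
    bagit = None  # bagit-python is not installed in this environment


def make_and_check(directory, info):
    """Bag `directory` in place, then re-open the bag from disk and validate it."""
    bagit.make_bag(directory, info)  # moves contents into data/ and writes manifests
    bag = bagit.Bag(directory)       # load the bag we just created
    return bag.is_valid()            # True when the bag passes validation


if bagit is not None:
    # Try it on a throwaway folder rather than on real files.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "notes.txt"), "w") as handle:
        handle.write("hello bag\n")
    print(make_and_check(workdir, {"Contact-Name": "Example Person"}))
```

Remember that make_bag rearranges the folder you point it at, so experiment on copies first.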

4. BagIt for Ruby (library + CLI)

The “bagit-ruby” implementation has been expanded and documented since last year! If you are interested in including a BagIt module in a Ruby application/script, or using this version via the command line, you’ll first need to install Ruby:

$ brew install ruby

Which will include Ruby’s built-in package manager, gem.

$ gem install bagit validatable

Note that you can’t install this Ruby package and the Homebrew package of bagit-java at the same time, since both are named the same thing in /usr/local/bin and will collide. Once downloaded/installed with gem, the BagIt for Ruby CLI is documented at:

$ bagit --help

...but it’s real basic, even compared to the CLI for bagit-python. This particular implementation is probably best suited to use as a library incorporated in Ruby scripts/apps, not necessarily to direct command line interfacing.

5. Exactly (GUI)

Not much more to say about AVPreserve’s packaging/transfer application since last year – but the combined ability to not just bag, but deliver or receive directories over standard network protocols still makes it a great option for those on Mac or Windows and in need of a simple workflow that combines two major ingest steps (bagging and delivery) into one quick and easy tool.

6. bagger-js (experimental library + web app)

Likewise, the LoC’s BaggerJS library/app could serve as both bagging and delivery system, via a web browser interface instead of a stand-alone, downloaded app. It’s basically “bagit-javascript” – that is, the BagIt library written in JavaScript (which is a web programming language entirely separate from Java). I assume it’s referred to as “bagger-js” because in the LoC’s naming system, “bagger” implies a GUI, whereas “bagit” is just the underlying library or CLI.

Bagger-js is still referred to in the LoC GitHub repo as “experimental”, so the library and accompanying demo web interface (which can bag a local directory and send it to a remote server compliant with Amazon’s S3 protocol) are not production-ready like Bagger or the other BagIt libraries/interfaces. But, again, all the work that they’ve done so far is right there and available to adapt/incorporate into your own JavaScript/web app projects!

7. Other apps

Of course, there are likely a number of applications or other pieces of software that incorporate BagIt as one piece or microservice of a larger workflow/system. Archivematica’s a major one that I’m aware of. Maybe you have another! Feel free to let me know what I’ve missed.

 

Author: Ethan

Ethan currently lives in Connecticut and works in digital and audiovisual preservation. Detailed thoughts on 1930s Soviet cinema available upon request.
