Over the past couple of months I've been writing a lot about the progress of Lightroom over the 8 years it's been around, and over that time I've been getting into increasingly technical details. This time I want to talk, and to be completely honest rant a little, about one very technical point of the Lightroom catalog and how Adobe stores data.
Let me also be up front about this: I don't work for Adobe, nor have I ever. I don't know precisely why they made the decisions they made, and I'd like to believe they had good performance-driven reasons to do what they did. At the same time, it's entirely possible they did it because it was faster to write or saved them a couple of lines of code, and they didn't anticipate that people would end up with heavily fragmented multi-GB catalogs, or assumed that any problems would be masked by better hardware down the road.
In a previous post I lamented how the unbounded growth of the preview cache folder can be problematic when working on systems with relatively small storage limits, or where you intend to push the storage to its limits. Inefficiency and unbounded growth seem to be the rule rather than the exception when it comes to the inner workings of Lightroom. This time I'm going to talk a little about the develop history mechanism: specifically, how Adobe stores the data, and how that data grows and expands the catalog size tremendously.
The Premise of Lightroom’s Develop History
In my last post I touched on how Lightroom stores the develop settings for an image as structured data in the catalog, as opposed to altering the pixels in the source image. Space-efficient virtual copies are one extension of this mechanism. Another is the ability to persistently save an undo history of all develop steps, to an extent and with a persistence that's not really possible in a bitmap editor.
Where a bitmap editor, like Photoshop, has to store every pixel value that's changed to be able to undo those changes, the vast majority of Lightroom's adjustments can be stored in a handful of bytes. This enables Lightroom to be relatively aggressive in creating history states and persisting them for the life of the image in the catalog.
All told, between editing an image and exporting or publishing it, it's not uncommon to end up with many saved history states for any given image. In my library I average around 15 states per image; however, because I don't process every one of my images, that actually means I have a lot of history states spread over comparatively few images.
Investigating History States
Unlike the rest of this Lightroom series, this post actually started as a specific investigation into a problem.
One of my friends has been using Lightroom almost as long as I have. Unlike me, though, he shoots a lot more than I do and his catalog is much larger than mine. One day he called me up and asked me about a strange error he was getting when he started Lightroom; something about not being able to connect to something. If he let Lightroom load for a while, the error would go away and Lightroom would work normally. Moreover, the error wouldn't always appear on subsequent restarts.
Not having seen the actual error, my first guess was that it was maybe something to do with Lightroom trying to connect to Adobe's servers. The next time I was at his place, I took a look at it and started digging into what could be causing it. I quickly determined that the error had nothing to do with internet connectivity; it was something internal to Lightroom. For some reason my mind immediately jumped to the SQLite subsystem that is the core of the Lightroom catalog file.
The first place I looked was his catalog file; it was approaching 4 GB, and a quick look with a defragmenting tool showed that it was heavily fragmented.
While fragmentation has become much less of a concern with modern filesystems, like NTFS, it can still be a problem if a file is extremely fragmented. In this case, there was a 4 GB file that didn't have a single fragment bigger than a few tens of MB. That level of fragmentation, especially paired with spinning disks, created a significant decrease in disk performance and therefore increased loading time, which was what was ultimately causing the error that the server wasn't responding.
I did a poor man’s defragmentation on the file by moving it to another drive and back. As an aside, it’s necessary to move the file off the file system (or copy it and remove the original) in order to get the OS to write the data out in as few fragments as possible — though be aware this method will not always work optimally.
That seemed to fix his problem, at least temporarily. But it also got me looking at the Lightroom catalog file.
He has quite a few more images than I do, but at the time he didn't have 4 times as many, yet his catalog was 4 times larger than mine. At the same time, I have titles and captions on a lot of my images, and an extensive keyword library. While I wouldn't expect the keywords to take up too much space, the titles and captions are all unique and each several hundred bytes long. None of that explains why his catalog was disproportionately bigger than mine with less metadata involved. This suggested to me that there might be something up with what Adobe is storing in the catalog.
This is where that core open source technology SQLite comes in. Since the catalog isn’t a proprietary format, it’s possible to examine it (or potentially repair it) with readily available tools.
Step one: dump the table sizes
My first plan of attack was to look at how big the tables were to see if there were any obvious problem spots.
SQLite provides a utility, sqlite3_analyzer, that will generate a whole slew of statistics for an SQLite file, including table sizes, utilization, storage efficiency, and so forth.
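Running it is a single command pointed at the catalog file; the filename here is just an example, and it's worth running it against a copy rather than the live catalog:

sqlite3_analyzer "Lightroom Catalog.lrcat" > catalog-analysis.txt

The relevant chunk of the report for my catalog looked like this.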
*** Page counts for all tables with their indices *****************************
ADOBE_LIBRARYIMAGEDEVELOPHISTORYSTEP.............. 126636 50.5%
ADOBE_ADDITIONALMETADATA.......................... 67200 26.8%
ADOBE_IMAGEDEVELOPSETTINGS........................ 25920 10.3%
ADOBE_IMAGEDEVELOPBEFORESETTINGS.................. 7420 3.0%
AGMETADATASEARCHINDEX............................. 4136 1.6%
ADOBE_IMAGES...................................... 3767 1.5%
AGLIBRARYFILE..................................... 3356 1.3%
If the table containing the develop settings was consuming 10.3%, why would the develop history table be 50.5% of my catalog file?
Sure, there should be more history states than current develop settings, but 5 times as much data stored? In absolute terms, that's more than 400 MB of history data.
In any event, shouldn’t Adobe be storing develop settings as efficiently as possible?
So what does the history state table look like?
Running .schema Adobe_libraryImageDevelopHistoryStep in the sqlite3 client returns the following.
CREATE TABLE Adobe_libraryImageDevelopHistoryStep (
id_local INTEGER PRIMARY KEY,
id_global UNIQUE NOT NULL,
dateCreated,
digest,
hasDevelopAdjustments,
image INTEGER,
name,
relValueString,
text,
valueString
);
CREATE INDEX index_Adobe_libraryImageDevelopHistoryStep_imageDateCreated ON
Adobe_libraryImageDevelopHistoryStep( image, dateCreated );
That's certainly not what I expected. I expected to see a whole slew of columns, one for each develop setting that needs to be stored. Maybe that was a naive view on my part.
Okay, let's pull a row from the table and see what's actually being stored.
select * from Adobe_libraryImageDevelopHistoryStep limit 1;
id_local|id_global|dateCreated|digest|hasDevelopAdjustments|image|name|
relValueString|text|valueString
928|21A0EDF0-3FF9-4503-B1BB-986330914768|465813406.266058|
b05afdbad359c8337b9bb6e663ca8aec|-1.0|916|
Import (10/6/15 04:36:46)||s = { AutoGrayscaleMix = true,
AutoLateralCA = 0,
AutoWhiteVersion = 134348800,
Blacks2012 = 0,
Brightness = 50,
CameraProfile = "Adobe Standard",
CameraProfileDigest = "BA45C872F6A5D11497D00CBA08D5783F",
Clarity2012 = 0,
ColorNoiseReduction = 25,
Contrast = 25,
Contrast2012 = 0,
ConvertToGrayscale = false,
DefringeGreenAmount = 0,
DefringeGreenHueHi = 60,
DefringeGreenHueLo = 40,
DefringePurpleAmount = 0,
DefringePurpleHueHi = 70,
DefringePurpleHueLo = 30,
Exposure = 0,
Exposure2012 = 0,
GrainSize = 25,
Highlights2012 = 0,
LensManualDistortionAmount = 0,
LensProfileEnable = 1,
LensProfileSetup = "LensDefaults",
LuminanceSmoothing = 10,
PerspectiveHorizontal = 0,
PerspectiveRotate = 0,
PerspectiveScale = 100,
PerspectiveVertical = 0,
ProcessVersion = "6.7",
RedEyeInfo = { },
RetouchInfo = { },
Shadows = 5,
Shadows2012 = 0,
SharpenDetail = 30,
SharpenEdgeMasking = 0,
SharpenRadius = 1,
Sharpness = 50,
ToneCurve = { 0,
0,
32,
22,
64,
56,
128,
128,
192,
196,
255,
255 },
ToneCurveBlue = { 0,
0,
255,
255 },
ToneCurveGreen = { 0,
0,
255,
255 },
ToneCurveName = "Medium Contrast",
ToneCurveName2012 = "Linear",
ToneCurvePV2012 = { 0,
0,
255,
255 },
ToneCurvePV2012Blue = { 0,
0,
255,
255 },
ToneCurvePV2012Green = { 0,
0,
255,
255 },
ToneCurvePV2012Red = { 0,
0,
255,
255 },
ToneCurveRed = { 0,
0,
255,
255 },
Version = "9.2",
WhiteBalance = "As Shot",
Whites2012 = 0 }
First reaction: WTF?
A Quick Primer on Datatypes and Storage
In a computer, all data is stored in a binary format; this is the whole ones and zeros thing. However, the meaning and arrangement of those ones and zeros, and therefore what they ultimately represent, varies with the data's type.
Broadly speaking, there are 3 types of data that computers deal with. Integer types store whole numbers (e.g. -5, 0, 1, 1000), and do so efficiently and in a format that virtually all CPUs can process natively. Floating point numbers store a representation of decimal or fractional data (e.g. 35.4 or -10,005.35); like integers, most floating point numbers are stored in a standard format that can also be processed natively by CPUs. Finally, strings store the representation of text as a sequence of integers that correspond to characters in a table.
Each of those types offers various pros and cons.
For example, an 8-bit/1-byte integer can store 2^8, or 256, values. If the type is signed, meaning it can represent negative and positive numbers, those values range from -128 to 127. If the type is unsigned, it can store values from 0 to 255. Integers are always exactly what they represent, and the math on them is done in an exactly predictable way.
Floating point numbers are much more complex, and I’m not going to get into the details of them, but if you’re interested Wikipedia has a reasonable article about floating point numbers. Floats can represent huge ranges of numbers, but they are approximations due to the binary conversion, and these errors lead to some small levels of imprecision in calculations.
The final type I called text. Text is stored as a sequence of integers, but unlike integer values, the integers storing text represent characters rather than numbers directly. Unlike integers or floats, a number stored as text is not something the computer can directly access and manipulate in hardware; it has to have software tell it how to translate the human-assigned meanings into something it can process. Moreover, the amount of storage required to store a number as text depends on the number of digits in the number. And this is the critical point with respect to what's going on here in Lightroom.
For example, consider how one might store a value that ranges from 0 to 100 (like many of Lightroom's sliders cover). To store this as an integer, only 101 values are needed, which is easily covered by the 256 possible values available in a single-byte integer (1 byte = 8 bits = 2^8 = 256 options). On the other hand, if this is stored as text, it could use 1, 2, or 3 characters, and since each character requires at least 1 byte of storage, it could take as much as 3 bytes of memory to store the value.
In binary, an unsigned integer representing a value of 50 would read as 00110010. However, the textual representation ‘50’ would be two characters 5 and 0, which translate to ASCII values of 53 and 48, which in binary would be 00110101 00110000.
Now consider adding 5 to that 50. If the 50 is in binary, the computer can just add 5 (00000101) to the existing 00110010 directly; the addition works by basically the same rules you learned in elementary school, only you carry when you sum up to 2 instead of 10. The processor gets 00110111, which is 55 in binary.
On the other hand if you were trying to do this with the string representation, first some code would have to be called that understood how to convert the two characters into a computer usable number (and the 5 if that’s a string too). Then it would have to do the same math as done for the native computer usable type. Then if you wanted the data back as a string, convert the 55 back to two characters.
Which brings up a second aspect. Going from text to an integer — which is what the computer fundamentally requires to process data — requires more processing than if the computer already has the numbers in a format it can deal with.
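To put some rough numbers on that, here's a quick Python sketch comparing how many bytes the two representations need; the exact encoding varies by system and database, so treat these as illustrative rather than definitive:

import struct

value = 50

# As a 1-byte unsigned integer: always exactly one byte.
print(len(struct.pack("B", value)))       # 1

# As ASCII text: one byte per digit.
print(len(str(value).encode("ascii")))    # 2, for '50'
print(len(str(100).encode("ascii")))      # 3, for '100'

# Adding 5 to the integer is a single native operation...
print(value + 5)                          # 55
# ...while the text form has to be parsed back into a number first.
print(int("50") + 5)                      # 55, after a conversion step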
Serialized Data in a Database
Before I started looking at the library in SQLite, I had a mental model that each of the develop settings would be stored in its optimal form as an integer, float, or string, leveraging the database's ability to relate values to fields to ascribe their meaning.
SQLite, like most SQL databases, has no significant problem doing this. Each column is assigned a logical meaning via a name, and the data stored in that column is implicitly understood to be that information by the programmers using it. Keep in mind, all of this is logical and for the benefit of the programmer; the name need not be meaningful, but it sure makes life a lot easier if it is.
Admittedly there are limits, but by default SQLite supports tables with up to 2000 columns, and can be configured to support up to 32,767 columns though it’s not recommended. Lightroom currently has about 60–65 develop settings that need to be kept track of, which is well below the 2000 column limit let alone the maximum limits.
Instead, Adobe is doing something different.
What Adobe is doing is serializing a Lua object and storing the resulting string in the database. Serialization does have some advantages; the primary one is that it allows for simpler interoperability.
Using serialization isn't entirely unreasonable. One of the primary functions of serialization is to convert an object in the computer's memory into a format that can be stored on disk without having to worry about specific formatting and implementation details. This is admittedly what Adobe is doing: saving the state of some object in Lightroom's memory into the database.
However, serialization has limitations. One of those is that the serialized string form of an object generally will take up more space than the object did in memory. Some of that is the inherent inefficiency of converting native binary types into string representations of their value.
A second source of inefficiency is the addition of the characters that are needed to delineate and organize the representation of data. These are the curly braces, equals signs, and commas.
The Lua serializer that Adobe is using goes a step further and adds human-readable flourishes to the data in the form of spaces and new lines. Fields are already separated by commas, and the field name and value are delineated by equals signs. There's precious little reason to further pretty up the formatting for human readability. In fact, removing the spaces and new line characters reduces the size of the string by about 20%.
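That 20% figure is easy enough to sanity check. A rough Python sketch along these lines, run against a copy of the catalog, does the trick; the catalog filename is an example, and the text column name comes from the schema shown earlier. It strips only the newlines, the indentation that follows them, and the spaces around the equals signs, so quoted values like "Adobe Standard" are left alone:

import re
import sqlite3

# Work on a copy of the catalog, not the live file.
conn = sqlite3.connect("Lightroom Catalog copy.lrcat")
row = conn.execute(
    "SELECT text FROM Adobe_libraryImageDevelopHistoryStep "
    "WHERE text IS NOT NULL LIMIT 1").fetchone()
serialized = row[0]

# Strip the cosmetic formatting: newlines, trailing indentation,
# and the spaces around the equals signs.
minified = re.sub(r"\n\s*", "", serialized).replace(" = ", "=")

print(len(serialized), len(minified),
      round(100 * (1 - len(minified) / len(serialized))), "% smaller")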
Probably the biggest source of inefficiency though comes from having to define labels so the software reading the serialized data knows what it’s reading. Those are all the human readable labels in the earlier sample.
In a database, those labels are defined essentially once, and every time you store another row of data, the storage efficiency of those labels improves. With serialized data, on the other hand, they have to be repeated every time you generate the serialized representation. Instead of a situation where the more you store, the less relative waste you have, the waste increases at the same rate as the storage does.
Database Efficiency
I wanted to test the theory that the database would be considerably more efficient if the individual settings were broken out into their own columns.
SQLite is a little different from most SQL database engines in how it stores data. Each column doesn’t need to have a size and type parameter specified for it — though you can as a hint. SQLite uses dynamic typing to store data in what it determines is the most efficient manner.
For example, integer types are stored using 1, 2, 3, 4, 6, or 8 bytes, depending on the value that’s sent to the SQLite engine. Store 50 and SQLite will use 1 byte. Store 200,000 and it’ll use 3 bytes.
I threw together a test program in Python to create two databases. One was loosely modeled after the table in Lightroom, though simplified to only have an index and the string column or the serialized text. The second database utilized separate columns for each of the 61 aspects that were stored in the test develop settings. I then inserted the data 10,000 times in each of the two tables.
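A stripped-down sketch of that test looks something like the following. The real script used all 61 settings and the actual serialized blob pulled from my catalog; here the settings list is trimmed and the table and column names are only illustrative, but the shape of the comparison is the same:

import os
import sqlite3

# A trimmed-down stand-in for a develop settings snapshot.
settings = {
    "Blacks2012": 0, "Contrast2012": 0, "Exposure2012": 0.0,
    "Highlights2012": 0, "Shadows2012": 0, "Sharpness": 50,
    "WhiteBalance": "As Shot", "CameraProfile": "Adobe Standard",
}

# The serialized form, mimicking the Lua-style "s = { ... }" string.
serialized = "s = { " + ",\n".join(
    f"{name} = {value!r}" for name, value in settings.items()) + " }"

# Database 1: a single TEXT column holding the serialized string.
db1 = sqlite3.connect("serialized_test.db")
db1.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, text TEXT)")
db1.executemany("INSERT INTO history (text) VALUES (?)",
                [(serialized,)] * 10000)
db1.commit()

# Database 2: one column per setting.
columns = ", ".join(settings)
placeholders = ", ".join("?" * len(settings))
db2 = sqlite3.connect("columnar_test.db")
db2.execute(f"CREATE TABLE history (id INTEGER PRIMARY KEY, {columns})")
db2.executemany(f"INSERT INTO history ({columns}) VALUES ({placeholders})",
                [tuple(settings.values())] * 10000)
db2.commit()

print(os.path.getsize("serialized_test.db"),
      os.path.getsize("columnar_test.db"))

With only a handful of settings the absolute numbers will differ from my full test, but the gap between the two files is obvious either way.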
Not surprisingly, the serialized string test database was much bigger than the columnar database; 20 MB versus 5 MB.
In my opinion a 75% smaller catalog is not something to sneeze at. For my more-than-40,000-image catalog, the reduction in the history steps table alone would shave 365 MB off my catalog file while retaining all the history. For my friend's 4 GB catalog, of which 60% is history steps, reducing the storage requirements by 75% would shave 1.8 GB of disk space from his catalog.
Alternatively, I could delete all the image history in my catalog and free up about 450 MB. But in doing so I lose all that history information.
And keep in mind, the problem here isn't that the catalog is big and disk space is in demand; it's that the catalog has to be loaded into memory to be processed, and the larger the catalog file is, and especially the more fragmented it is, the longer that takes. That's when you potentially get the problem I talked about at the start of this post.
As a secondary note, I was also curious about the performance of the serialized table versus the columnar one. As such, I timed 10,000 insertions into each to see what kind of performance I was getting.
At least with Python’s implementation for SQLite, there was functionally no difference between a single big insert as a serialized string, and an insert with 61 separate parameters. The minor variations I did see, on the order of 1 microsecond, are within the realm of experimental error.
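For reference, the timing itself was nothing more sophisticated than wrapping the inserts in a timer, roughly like this (an in-memory database and made-up columns, just to show the method):

import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c)")
rows = [(1, 2.5, "text")] * 10000

start = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
conn.commit()
elapsed = time.perf_counter() - start

print(elapsed / len(rows), "seconds per insert on average")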
I should also point out that the history step table isn't the only place where Adobe serializes develop setting data into the catalog. The Adobe_ImageDevelopSettings table uses the same structure, and that accounts for a further 10% of my catalog size.
Storing Positional Data
There’s a second aspect to the history and develop data that I haven’t touched on but it’s also a major contributing factor to storage consumption by history steps and overall database efficiency. There are a number of places in Lightroom where the engine has to store x,y type positional information.
One example, and one that can be seen in the sample data earlier, is the tone curve. X and Y values are stored as sequential integers from 0 to 255. A straight linear mapping is stored as {0, 0, 255, 255}, and a curved mapping would be stored with more points, such as {0, 0, 64, 60, 128, 140, 255, 255}.
Beyond the curve, there are several tools that store image-positional information: the spot healing tool, the gradient tools, and the local adjustment brush. These all store sets of x,y coordinates along with brush size and other parameters defining the affected areas.
PaintBasedCorrections = { { CorrectionActive = true,
CorrectionAmount = 1,
CorrectionID = "CFD53BB4-F91E-4616-BFDE-ECE323554311",
CorrectionMasks = {
{ CenterWeight = 0,
Dabs = { "d 0.846878 0.382692",
"d 0.821662 0.399986",
"d 0.795618 0.408876",
"d 0.769187 0.411538",
"d 0.743287 0.422603",
"d 0.717651 0.436220",
"d 0.692825 0.456801",
"d 0.666778 0.461538",
"d 0.640369 0.465035",
"d 0.614742 0.477943",
"d 0.593083 0.511154",
"d 0.568074 0.528500",
"d 0.548332 0.559867" },
Flow = 1,
MaskID = "39B9AC8F-BB01-4241-B67B-26AB767B356B",
MaskValue = 1,
Radius = 0.088226,
What = "Mask/Paint"
},
{ CenterWeight = 0,
Dabs = { "d 0.765612 0.659615",
"d 0.791546 0.647717",
"d 0.817480 0.635819",
"d 0.843414 0.623920" },
Flow = 1,
…
Note: I’ve indented the above excerpt to better show the structure, the spaces/tabs are not stored in the catalog.
As an aside, Lightroom's painting is image-size agnostic: the coordinates are saved as decimal fractions of the image dimensions (between 0 and 1) with 6 decimal places of precision. This is especially handy as it enables Lightroom to use the same position data at any export resolution without multiple transformations.
Using 6 places of precision also shouldn't be a problem any time soon, as it allows 1-pixel accuracy in images where the long edge is up to 100,000 pixels. For a 3:2 aspect ratio image, that would be a 6.66 gigapixel image. For panos with high aspect ratios, a 100,000-pixel width corresponds to substantially lower overall resolution, so the limit might be approachable now or in the near future. However, 1-pixel accuracy is probably not necessary for most masks anyway, given the coarseness of the brushes and the sizes of the images where it starts breaking down.
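To make the size-agnostic point concrete, mapping a stored dab back to pixels is just a multiplication. Here's a quick sketch using the first dab from the excerpt above, assuming the x value is relative to the image width and the y value to the height; the image dimensions are hypothetical:

# First dab from the sample above: "d 0.846878 0.382692"
x_frac, y_frac = 0.846878, 0.382692

# The same stored values work at any output size.
for width, height in [(6000, 4000), (1500, 1000)]:
    print(round(x_frac * width), round(y_frac * height))
# 5081 1531 at the full size, 1270 383 at the smaller export size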
But back to the point about storage. These coordinate driven tools create significantly more data than the regular Lightroom adjustments do. Moreover, because Lightroom’s history steps are a complete snapshot of all settings at the time the history step was created, these large coordinate sets will propagate forward with subsequent history steps.
In other words, if you do any extensive work with the adjustment brush, spot healing tool, or the gradient tools (especially with the new masking capabilities), you can rapidly generate large history states that get stored in the database. In my database, I have a number of images with more than 10 MB of history data as a result of this behavior.
My recommendation is that, whenever possible, spot healing, local adjustments, and any gradient masking be saved for the end of the develop process to minimize the impact they have on ballooning catalog sizes as they propagate through the history table.
Conclusions
I'm hesitant to blast Adobe too hard for doing what they did. I'm not a Lua expert, and I don't have access to the Lightroom source code (I'm even speculating that it's mostly written in Lua). There may be very good reasons why Adobe has elected to store develop settings as serialized Lua objects.
Thinking on that, about the only viable angle that I see is that by storing big text strings Adobe can add develop capabilities without having to update the catalog’s structure with a new column to represent the setting. When Adobe added dehaze capabilities, they only had to store that value in the string, and didn’t have to update the catalog’s tables to support the new feature.
At the same time, major Lightroom revisions have seen changes to some structure in the catalog that required the catalog file to be updated. Given the low computational bar for adding an additional column to a table in SQLite (it wouldn't even require doing something expensive like copying the table data), I'm not sure that should be a significant consideration.
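For comparison, adding a column in SQLite is a single, cheap statement. A hypothetical dehaze column on the develop settings table would look something like this (Dehaze is an invented column name for illustration, not Adobe's actual field):

ALTER TABLE Adobe_imageDevelopSettings ADD COLUMN Dehaze;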
Certainly one could argue that with big modern disks, and certainly modern SSDs, the size of the Lightroom catalog isn't that big of a deal. And that's probably true; saving a couple hundred MB, or 1.8 GB in my friend's case, isn't that big of a deal when you have a 4 TB disk to store everything on.
On the other hand, as a programmer, the whole situation just looks messy to me on a number of levels. There's a lot of extra data added to the serialized output that shouldn't really be needed for it to be deserialized. Just removing the spaces and new line characters reduced the size of the serialized test block by around 20%.
Further storage savings could be had by normalizing the coordinate-based tools' settings, even if they remained stored as serialized objects. This would allow much smaller index-based references to be propagated from state to state instead of potentially multiple KB of repeated text entries.
Furthermore, at least based on my cursory observations of Lightroom's catalog design, there are a number of places where develop settings are stored redundantly. The Adobe_imageDevelopSettings table stores the current develop settings (in serialized form) and a reference to an entry in the Adobe_libraryImageDevelopHistoryStep table, which also stores the same serialized data.
There is also an Adobe_imageDevelopBeforeSettings table that appears to store yet more serialized develop settings, presumably the import defaults that get applied to the image. It, too, is filled with lots of large serialized objects stored as strings. And honestly, I'm not even sure why this table exists.
Given that all three of these tables reference the same structure of data, and in the case of the develop settings and history tables mirror identical datasets, there could be a whole lot of space savings in properly normalizing develop settings into their own table for all states, history and otherwise.
Bear in mind, again, that between them the history table, the develop settings table, and the before settings table account for nearly 64% of the data in my Lightroom catalog. Even just normalizing the develop settings and history steps should free up about 10% of the space my catalog spends storing redundant data.
Admittedly, the catalog is nowhere near the limits imposed by being an SQLite database. And there should be no added risk of data loss or corruption from having a large catalog file. The only negatives that should occur stem from slower random disk access and higher I/O loads, especially on spinning disks, and even more so when the catalog is stored at the "ends" of a spinning disk. Keep your catalog file on a fast SSD, and the size and fragmentation of the catalog shouldn't become an issue until your catalog file is absolutely huge.