The lowdown on Office Web Components
I recently mentioned the possibility of exploring MS Office Web Components as a replacement for server-side Excel automation. It turns out that for my particular problem, it didn't present a solution.The problem workflow:
Client sends an Excel file as an email attachment. The spreadsheet contains tabular data. Depending on the particular data domain, it could be a phone directory, a report describing the status of their various projects, or whatever. The important thing to note is that they store this stuff in Excel, and that's what they give us as source material. Our Perl scripts then need to create HTML/PHP based on the input data, or possibly build SQL update commands to populate their web database. Web pages and SQL command files are then copied onto our local server for testing, and once everything is proofed, it gets sent by FTP to their live server.
The key problem:
Extracting data from Excel on a web server is "difficult."
Specifically, the programmer who first wrote this suite of automated scripts used Excel's Automation COM objects to do the data extraction. These COM objects actually control Excel, open/run it on the server, hide the application's GUI, open workbooks, create data ranges (eg. A1:E256) and then get that range as an array of values. Once it's done, it closes Excel. Or does it?
When running automation through a web server, IIS manages its own garbage collection. So when you close Excel and delete the reference to the application object, that should be the end of it - the process should terminate. But IIS doesn't actually delete the object until it collects garbage, so you end up with a zombie Excel process which must either be killed by fancy sysinternals programming, or just reboot the server at that point. If I'm around, I kill all the Excel zombies; if I'm not around, and nobody else knows how to kill, then someone has to reboot the server.
So how would you do Excel-like things on a server? MS took care of half of the problem with Web Components, which give a lot of the same functionality as the office programs. So for instance, you can web script the Spreadsheet component and use it to populate a sheet with data and dynamic formulae, and then export it to an Excel file or XML spreadsheet. But you can't do the opposite, ie. open an existing sheet to get the data from it, which is what I need.
So I moved on and discovered a new third option, but it required making a small change to the workflow. Rather than directly uploading an Excel file to the server for processing, the account executive who receives the Excel file from the client would first have to save as a delimited text file. This way, Perl on the server could just open it up and read it line-by-line. It would cut out a lot of the problems we've had in the past with having to wrap everything in UsedRange sub calls, which prevented the script from looping beyond the part of the sheet that had been filled in and off into infinity whenever the client forgot to place their END token at the end of the data. But it introduced more tedious work and could easily be scripted.
Introducing: Windows Script Host. You may have heard of it and wondered what it was. It's basically MS's answer to AppleScript. It allows you to use VBscript or JScript, along with whatever ActiveX/COM objects that you have access to locally, to build quick and easy scripts. If you need an interactive GUI, you can build the scripts into an HTML Application, which gets a .hta extension, is double-click runnable from Explorer, and runs inside whatever Internet Explorer monstrosity you happen to have. Then you can do the usual HTML form scripting operations, button onClick methods, etc. The script itself is pretty easy: create an Excel.Application instance, use its Workbooks.Open(filepath) method to return a handle to the Excel .xls file, then call its SaveAs(destination, filetype) method and pass it the proper type number for tabbed text. Now the AE can upload the text file rather than the Excel file.
Possible improvements: If processes that could be done just as easily in VBScript or JScript are currently done in Perl on the server, then the whole process could be done on local hosts. But there might be regular expression parsing or things like that, that remain better done in Perl, in which case it might be better to move the local script into a button onClick event in the web form page, with special security permissions on the local hosts allowing the script to start up Excel locally, do the conversion to text, save it with the original Excel file, and switch the form's file upload input to reflect the change so that the text file gets uploaded. That'd be pretty sweet.
Either way, my team has plenty of refactoring to do in the new year. If you find yourself in a similar situation, where you have a system that "works" but could certainly be done better, and you can't risk breaking the whole thing and starting from scratch, then check out Martin Fowler's excellent book Refactoring: Improving the Design of Existing Code
UPDATE: I forgot to mention that the Office Web Components are going by the wayside, which was another reason to not get into using them, even if they had been able to do the job.
Labels: excel, refactoring, script