I occasionally need to build small projects (web and command line) to do various things. A lot of times, these projects could be done in something like Rails or another framework, but honestly, I don’t need that much overhead, so I fall back to PHP or Perl. Between the two, I prefer working with PHP and setting it up to run as a traditional shell script. It’s fairly easy, and here’s how.
First, you need to make sure that the PHP CLI is installed. You can do this by checking the version:

[reeset@enyo home]$ php -v
PHP 5.3.3 (cli) (built: Jul 3 2012 16:53:21)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
Essentially, you are looking for the text “cli” in the version string. If you have it, you are good to go. If not, you’ll have to hunt down and install the CLI package (using apt-get, for example).
Once you are set, you just need the full path to the php binary – running which php will tell you; it’s often /usr/bin/php. Make a note of that path, because you’ll need it later.
Because I want to turn this into more of an executable script, I don’t want to have to run it by invoking php on the command line. So, I’m going to add a shebang to the first line of the script file. This was new for me – I hadn’t realized this was possible in PHP until I was looking through the manual, and for CLI scripts, I definitely prefer this method. Example script:

#!/usr/bin/php
<?php
echo "PHP CLI script\n";
?>

(If you’d rather not hard-code the path, #!/usr/bin/env php works as well.)
The shebang allows the script to be run without referencing the PHP executable on the command line. Now, all you need to do is make the file executable:

chmod +x yourphpfile.php
With the file set as executable, you can now run the script like any other shell script, i.e.:

./yourphpfile.php
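Putting the steps above together, a full session looks something like this (the filename is just an example):

```shell
# Create a small PHP script whose first line is the shebang.
cat > hello.php <<'EOF'
#!/usr/bin/php
<?php
echo "PHP CLI script\n";
EOF

# Make it executable.
chmod +x hello.php

# Run it directly -- no need to type "php hello.php".
# (Guarded here in case the PHP CLI isn't installed.)
if command -v php >/dev/null 2>&1; then
  ./hello.php
fi
```

The PHP CLI knows to skip the shebang line when the script is run this way, so it won’t be echoed as output.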
This afternoon, I put the finishing touches on the update/creation components of the ILS direct integration framework in MarcEdit. As a proof of concept, I’ve continued to demonstrate the processing using Koha’s API. The Koha API code has been abstracted into a separate class so that I can push it up to GitHub for those interested in a C# implementation.
I’ve posted the following video to demonstrate the update/creation process utilizing MarcEdit configured for Koha integration.
I’ve been spending the last week working with the Koha API, using it as an example for MarcEdit’s direct ILS integration platform. After spending some time working with it and pushing some data through it, I have a couple of brief thoughts.
I was pleasantly surprised at how easy the API was to work with. Generally, the need for good authentication stymies many a good API design because the process of performing and maintaining authentication becomes so painful. I found the cookiejar approach that Koha implemented to be a very simple one to support and work with. What’s more, error responses from the API tended to show up as HTTP status codes, so it was easy to handle them using existing HTTP tools.
While the API is easy to use, it’s also really, really sparse. There isn’t a facility for deleting records, and I’m not sure if there is an easy way with the API to affect holdings for a set of records. I do know you can create items, but I’m not sure if that is a one-off that occurs when you pass an entire bib record for update, or if there is a separate API just for item data. Search is also disappointing. There is a specific API for retrieving an individual record’s data – but the search API is essentially Z39.50 (or SRU). I’m not particularly enamored with either, though Z39.50 works (and I’m told it’s fairly universal in terms of implementation). I’ve never really liked SRU, so it didn’t hurt my feelings too much not to work with it. However, after spending time with the Summon search API for other projects here at Oregon State, I was disappointed that search wasn’t something the API specifically addressed.
The API documentation leaves much to be desired. I was primarily using the wiki (http://wiki.koha-community.org/wiki/Koha_/svc/_HTTP_API), which includes a single page on the API. The page provides some simple demonstrations to show usage, which are really helpful. What is less helpful is the lack of information about what happens when an error occurs. The authorization API returns an XML file with a status message – however, all the other APIs return HTTP status codes. This caught me a little by surprise, given the authorization response – it would be nice if that behavior were documented somewhere.
One thing I couldn’t find in the documentation – so I really can’t answer this question – is the impact of the API on system resources. The API seems to be geared towards working with individual records, but MarcEdit is a batch records tool. So, in my testing, I tried to see what would happen if I uploaded 1010 records through the API. The process finished, sluggishly, and it appeared that pushing records through the API at high rates had an impact on system performance: the upload itself slowed considerably as the records were fed through.

More curious – after the process finished, I had to wait about 15 minutes for all the records to make it through the workflow. I’m assuming the API must queue items coming into the system, but this made it very difficult to verify a successful upload, because the API was reporting success while the data changes were not visible for a considerable amount of time. Since I’ve never worked in a library that ran Koha in a production environment, I’m not sure if this type of record queuing is normal, but a better description of what is happening would have been nice in the documentation. When I first started working with the API, I actually thought the data updates were failing, because I expected the changes to show up in the system in real time; my experience seemed to indicate that they do not.
Anyway – those are my quick thoughts. I need to caveat these notes by saying I have never worked at a library where Koha has been used in production, so maybe some of these behaviors are common knowledge.
One of the new features coming to MarcEdit is direct ILS integration. Working with Koha as a proof of concept, the attached video demonstrates how users can utilize the MarcEditor to retrieve data directly from within their ILS system (using both single and batch searches). This feature will be available to users around Dec. 1st-ish, 2012.
One of the things I have been working on is a framework for configuring MarcEdit to integrate directly with one’s ILS. Many ILS systems like Sierra, Alma, Koha, and others support (or will support) APIs that allow for the modification and creation of bibliographic records. I’ve developed a framework in MarcEdit that should allow me to provide integration options for most systems (so long as documentation is available), and as a proof of concept, I’ve provided integration with Koha. Koha has an easy-to-use set of APIs that makes updating and creating bibliographic records on the system fairly easy. The attached video demonstrates how one will be able to configure MarcEdit to use this new functionality once it becomes available (around Dec. 1st-ish, 2012).
I’ve been tinkering with the Koha API to allow MarcEdit to do some direct ILS integration with Koha-based systems. When I first agreed to look at doing this work, I wasn’t sure what the API looked like. Fortunately for me, the folks that put it together made it simple and easy to work with. There are a few gotchas when working with C#, and while I’ll be posting the source code for the Koha library I’ve developed in C# on my GitHub account, I thought I’d post some of my initial notes for those who are interested.
Essentially, to work with the Koha API, there are two things you need to do. First, you need to authenticate yourself. Upon authentication, Koha provides session data that is maintained as a cookie, which must be passed as part of all future requests to the API. Generally, this process is straightforward, in that you create a cookiejar. In C#, this looks like the following:
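The snippet below is a minimal sketch of that pattern rather than the exact code from my library – the hostname, userid, and password are placeholders you’d replace with your own:

```csharp
// Sketch: authenticate against Koha's /svc/authentication endpoint and
// capture the session cookie in a CookieContainer (the "cookiejar").
// Hostname and credentials below are placeholders.
using System;
using System.IO;
using System.Net;
using System.Text;

class KohaAuthExample
{
    static void Main()
    {
        var cookieJar = new CookieContainer();

        var request = (HttpWebRequest)WebRequest.Create(
            "http://your-koha-server/cgi-bin/koha/svc/authentication");
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookieJar;   // session cookie lands here

        byte[] payload = Encoding.UTF8.GetBytes("userid=myuser&password=mypass");
        request.ContentLength = payload.Length;
        using (Stream s = request.GetRequestStream())
            s.Write(payload, 0, payload.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd()); // XML status message

        // Reuse the same CookieContainer on every subsequent request:
        //   nextRequest.CookieContainer = cookieJar;
    }
}
```

The key point is that the same CookieContainer instance gets attached to every later request, which is what keeps the session alive.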
In the case of Koha, the GetRecord function returns data in MARCXML format. This is my preferred method for retrieving data, but for general search and retrieval, while Koha supports SRU, the most reliable search API is still Z39.50 (sadly).
Finally, when you want to update or create a record, you need to push the data up to the server in MARCXML format. If the data is an update, you pass a record id; if it’s a new record, you don’t.
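In code, that difference is just a matter of which URL you post to. A sketch, with endpoint paths as described on the Koha /svc wiki page (hostname, biblionumber, and payload are placeholders):

```csharp
// Sketch: POST MARCXML to Koha. Per the /svc API wiki, posting to
// /svc/bib/{biblionumber} updates an existing record, while posting to
// /svc/new_bib (no id) creates a new one. URLs here are illustrative.
using System;
using System.IO;
using System.Net;
using System.Text;

class KohaSaveRecordExample
{
    static string PostMarcXml(CookieContainer session, string url, string marcxml)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "text/xml";
        request.CookieContainer = session; // session cookie from authentication
        byte[] body = Encoding.UTF8.GetBytes(marcxml);
        request.ContentLength = body.Length;
        using (Stream s = request.GetRequestStream())
            s.Write(body, 0, body.Length);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        var session = new CookieContainer();       // assume already authenticated
        string marcxml = "<record>...</record>";   // MARCXML payload (placeholder)

        // Update the record with biblionumber 1:
        PostMarcXml(session, "http://your-koha-server/cgi-bin/koha/svc/bib/1", marcxml);

        // Create a brand-new record (no id in the URL):
        PostMarcXml(session, "http://your-koha-server/cgi-bin/koha/svc/new_bib", marcxml);
    }
}
```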
And that’s pretty much it. Simple and straightforward. Koha supports a few more APIs (http://wiki.koha-community.org/wiki/Koha_/svc/_HTTP_API), but for my immediate purposes, these are the couple that I need to support some very simple integration with the ILS. Ideally, at some point, it would be nice to see these APIs also support automated deletion of records, as well as maybe an ability to set holdings/items – but for now, this is good enough. I’m sure that if these other functions are needed, the user communities will push for them, and when they show up, MarcEdit will indirectly benefit.
With all the new ILS systems being produced and the emphasis being placed on APIs, one of the things I’d like to eventually see is the ability to integrate MarcEdit directly with specific ILS systems, allowing users to pull content directly from their systems (in batch or individually), edit the records in MarcEdit, and then upload the data back to their ILS. So, when I was asked to consider some direct integration work to support Koha, I was certainly game.
At this point, the program primarily supports search and update/creation of records. Essentially, users select their ILS from the list of supported systems (at this point, just Koha), and MarcEdit adds a new option to the MarcEditor window. I’ve been working hard over Thanksgiving so that a first version of this function can be made available in the next update.
I’ll post a YouTube video to demonstrate the functionality later this weekend, but in a nutshell, here’s how it works.
Users will enable the functionality through the preferences. A new Preference Tab has been added to the MarcEdit options.
In the preference list, I’ve added a list of supported systems. At this point, the only option is Koha, but I hope to eventually add support for other ILS systems as vendors provide documentation or individual users request/sponsor development. The options needed to support a specific system are primarily a host name (the base URL used to interact with the API), a username, and a password. Some systems may need additional data – Koha, for example, uses Z39.50 for searching – and when that’s the case, MarcEdit will ask for it specifically and save the data upon entry.
When the ILS option is selected, a new menu item appears in the MarcEditor. This is the menu item that needs to be used if the user wishes to push data directly back to their ILS system.
While Koha will be the only option available up front, as I say, I hope that I’ll eventually be able to provide additional support for other ILS systems.
One of the often requested enhancements to the MarcEdit Delimited Text Translator is the ability to auto generate the arguments list. For many users, their spreadsheets or delimited text documents include a line at the beginning of the document defining the data found in the file. I’ve often had folks wonder if I could do anything with that data to help auto generate the arguments list used by MarcEdit to translate the data.
Well, in anticipation of Thanksgiving, I finished working on what will be the next MarcEdit update. I won’t post it till the weekend, but this new version will include an Arguments Auto Generation button that allows MarcEdit to capture the first line of a data file and, if it’s properly formatted, auto-configure the arguments list.
The format supported by the Auto Generation feature is pretty straightforward. It essentially is the following: Field$Subfield[ind1ind2punct].
Let me break down the format definition:
Field – represents the field to be mapped to, i.e.: 245. This is a required value.
$Subfield – represents the subfield to be mapped, i.e.: $a. This is a required value.
ind1 – represents the first indicator. This is an optional value, but if defined, indicator 2 must be defined.
ind2 – represents the second indicator. This is an optional value, but if indicator 1 is defined, indicator 2 must also be defined.
punct – represents the trailing punctuation of the field. This is an optional value. However, if you wish to define the punctuation, you must define the indicator 1 and indicator 2 values as well.
Some examples of the syntax:
245$a — no indicators are defined, the default indicators, 2 blanks, will be used.
245$a10 – defines the field, subfield and indicators 1 and 2.
245$a10. – defines the field, subfield, indicators 1 and 2, and defines a period as the trailing punctuation.
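To make the syntax concrete, here’s a quick illustration of how a header token breaks down, sketched in Python. This is just a demonstration of the format described above – it is not MarcEdit’s actual parsing code:

```python
import re

# Parse a header token of the form Field$Subfield[ind1 ind2 punct],
# e.g. "245$a10." -> field 245, subfield a, indicators 1 and 0, period
# as trailing punctuation. Indicators default to blanks when omitted,
# and an indicator 1 without an indicator 2 is rejected, per the rules.
TOKEN = re.compile(
    r'^(?P<field>\d{3})\$(?P<sub>[a-z0-9])'
    r'(?:(?P<ind1>[\d ])(?P<ind2>[\d ])(?P<punct>.*)?)?$'
)

def parse_header(token):
    m = TOKEN.match(token)
    if m is None:
        raise ValueError("bad header token: %r" % token)
    parts = m.groupdict()
    parts['ind1'] = parts['ind1'] or ' '   # default: blank indicator
    parts['ind2'] = parts['ind2'] or ' '
    parts['punct'] = parts['punct'] or ''  # default: no trailing punctuation
    return parts

print(parse_header("245$a"))
print(parse_header("245$a10."))
```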
In MarcEdit, you can also join fields together, which allows users to combine data from multiple columns into a single subfield. Joined fields are represented by an asterisk (“*”) group added to the field heading. If, for example, the headings for column 0 and column 1 carry the same asterisk group, MarcEdit will interpret those two columns as joined fields.
I’ve placed a video on YouTube to demonstrate the upcoming functionality. You can find out more about it here:
If you have questions about this new function or suggestions, let me know.
One of the hats that I wear at home is IT professional for my wife, specifically when it comes to her blog. To keep things running well, I periodically monitor bandwidth and space usage to make sure we keep our hosts happy. Well, over the past two months, I’d noticed a really, really large spike in bandwidth traffic. In a typical month, the blog handles approximately 60 GB of HTTP traffic. A generous portion of that comes from robots that I allow to harvest the site (~12 GB); the remainder comes from visitors. That changed last month, however, when bandwidth usage jumped from ~60 GB to a little over 120 GB. Now, our hosts are great. We pay for 80 GB of bandwidth a month, but this is a soft cap, so they didn’t complain when we went way over our allotment. At the same time, I like to be a good neighbor on our shared host – so I wanted to figure out what was causing the spike in traffic.
Looking through the log files and chatting with the hosts (who have better log files), we were able to determine that the jump in traffic was due to one image. This one (example of linking to the file — should be broken unless you are reading it through the google reader, which I allow as an exception):
(this one has been downloaded and placed on my blog)
It’s an image from the Thomas Jefferson Memorial. My wife had taken the picture the last time we were in DC and had posted it here: http://athomewithbooks.net/2012/10/saturday-snapshot-october-27/. In October and the first half of November, this single image was responsible for close to 100 GB of bandwidth. What I couldn’t figure out was where it was all coming from…but looking at the logs, we were able to determine that it was being linked from StumbleUpon. While the linking itself wasn’t a problem, the bandwidth usage was. So, I started to look at options, and there actually is a quite elegant one if you find yourself needing to limit bandwidth.
The simple solution is to disallow linking to images (or specific file types) from outside the blog’s domain. This is actually pretty easy to accomplish using mod_rewrite and an .htaccess file, with the following sort of snippet (replace my domain – athomewithbooks.net – with your own):
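A typical hotlink-protection ruleset looks like this – treat it as a sketch of the approach rather than my exact rules:

```apache
RewriteEngine On
# First condition: requests with no referrer at all are allowed through.
RewriteCond %{HTTP_REFERER} !^$
# Block anything whose referrer is not my own domain.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?athomewithbooks\.net/ [NC]
# For matching file types, return 403 Forbidden instead of the file.
RewriteRule \.(gif|jpg|jpeg|js|css|png)$ - [F,NC]
```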
I’ve directed the webserver to only serve these file types (gif|jpg|jpeg|js|css|png) when they’re linked from my own domain. So, if I were to link to an image from a different domain, the image would come back broken (though you could also serve a “you can’t link to this image” placeholder if you wanted). The way this works is that the server reads the headers passed by the browser, specifically the HTTP_REFERER value, to determine whether the request originates from an allowed domain. If it doesn’t, the server doesn’t serve the image.
Now, this method isn’t perfect. Some browsers don’t pass this value, and some pass it poorly – so it’s likely that some people who shouldn’t see the image will see it (due to the first line, which serves content when no referrer is defined) – but in general, it provides a workable method for managing your bandwidth usage.