Apr 16, 2013
 

Just a couple of notes.  I’ve been spending quite a bit of time lately testing the current Mono builds against the current MarcEdit codebase.  It looks like the latest stable version works just fine with the program, while the 3.0.x branch has a few rendering issues (I’m not sure yet whether they are mine; I’m looking into it).  What this means is that I’m working on simplifying the installation process for folks using the software on Linux/Mac systems.  I’m starting with Linux (since I run Linux).  Right now, I’m using a tool called makeself (http://megastep.org/makeself/), which creates a self-extracting archive and then lets me run a cleanup script that takes care of a number of the manual steps in the install process.  Ideally, the steps would be:

  1. Run the install script
  2. Either run MarcEdit or get a report of missing dependencies

This would remove the need to remember to run the bootloader to clean up and set install paths, as well as to get the configuration right for Z39.50 use if yaz is installed on the system.  It should also give me a way to check dependencies and provide a list of packages that need to be installed, to make installation as simple as possible (though at this point, the only dependencies are mono-core and the System.Windows.Forms libraries, which are sometimes, but not always, included in a given distribution’s definition of “core” files).
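Here’s a minimal sketch of the “report missing dependencies” step, written in Python purely for illustration; it is not the makeself cleanup script itself. It assumes only the two required pieces named above (mono-core and System.Windows.Forms), and the WinForms search path is a hypothetical location that differs between distributions.

```python
import glob
import shutil
from typing import Callable, List, Optional

# Assumption: a typical location for Mono's WinForms assembly; the real path
# differs between distributions and Mono versions.
WINFORMS_PATTERN = "/usr/lib/mono/**/System.Windows.Forms.dll"


def find_files(pattern: str) -> List[str]:
    """Recursive glob lookup, kept separate so it can be swapped out in tests."""
    return glob.glob(pattern, recursive=True)


def missing_dependencies(
    which: Callable[[str], Optional[str]] = shutil.which,
    lookup: Callable[[str], List[str]] = find_files,
) -> List[str]:
    """Return a human-readable list of required packages that were not found."""
    missing = []
    # The Mono runtime itself (packaged as mono-core on some distributions).
    if which("mono") is None:
        missing.append("mono-core (Mono runtime)")
    # WinForms is not always part of a distribution's "core" Mono packages.
    if not lookup(WINFORMS_PATTERN):
        missing.append("System.Windows.Forms (Mono WinForms library)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Install these packages before running MarcEdit:")
        for name in problems:
            print("  -", name)
    else:
        print("All dependencies found; MarcEdit is ready to run.")
```

Passing `which` and `lookup` in as parameters keeps the check testable without a real Mono install.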

Once I have this build process defined, I’ll find myself a Mac and give Mono’s MacPackager a whirl.  It seems like that might be a simple solution for packaging the program, and at this point, all the Z39.50 components can be disabled to greatly simplify the process.

Ideally, once I have an installer in place, I should be able to tweak MarcEdit’s automated updater so that Mac/Linux users receive notifications and downloads of the current installers.  From there, they would just need to run them within their own environments.  Easier?  I hope so.  And as always, I’ll continue to provide a plain zip file for download for folks in environments that don’t allow the installation of software.

Questions, comments — let me know.

–tr

 Posted by at 10:12 am
Apr 13, 2013
 

Sorry folks, apparently a hotfix Microsoft pushed out caused the setup tool I use to produce a corrupted installer for 64-bit systems.  I’ve removed the hotfix and rebuilt both versions (32- and 64-bit) of the program, and it looks like things are in good shape now.

If you tried updating today and had trouble, give it another go.  If you updated and didn’t have trouble, you should be fine (though you’ll be prompted to update again).

Sorry about that folks.

–tr

 Posted by at 3:25 pm
Apr 13, 2013
 

So, I’ve been getting more questions about how to set up MarcEdit on Linux and a Mac.  In general, the program works very well on both platforms, but it takes a bit of know-how to get it running.  This is primarily due to the Z39.50 components that I link to.  On Linux, installation is very straightforward; on a Mac, the easiest process I’ve found is using MacPorts, but it can take as long as an hour to install all the dependencies.  Given that Z39.50 is becoming less and less important as a protocol, I’ve worked out a process so that the program can essentially turn off these components by default when installed on non-Windows systems.  To that end, I’m reworking the zip download, and then I’ll look at how I can make an installer for non-Windows platforms to make installation a little bit easier.  On Linux, I don’t think this will be all that difficult (I work primarily in a Windows/Linux environment); on a Mac, it might take me a little bit longer because I simply don’t use the platform often.
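The “off by default on non-Windows” rule can be expressed as a small settings check. The Python sketch below shows only the decision logic; the `z3950_enabled` setting name is hypothetical and is not how MarcEdit actually stores its configuration.

```python
import platform
from typing import Mapping, Optional


def z3950_enabled(system: Optional[str] = None,
                  user_settings: Optional[Mapping[str, bool]] = None) -> bool:
    """Decide whether the Z39.50 components should be switched on.

    An explicit user setting always wins; otherwise Z39.50 is on for
    Windows and off for every other platform.
    """
    settings = user_settings or {}
    if "z3950_enabled" in settings:  # hypothetical config key
        return bool(settings["z3950_enabled"])
    # platform.system() returns e.g. "Windows", "Linux", or "Darwin" (macOS).
    system = system if system is not None else platform.system()
    return system == "Windows"
```

Letting a saved user setting override the platform default means Linux/Mac users who do install yaz can still opt back in.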

If you have questions or some interest in this work, let me know.

–tr

 Posted by at 10:01 am
Apr 13, 2013
 

Here’s the final list of updates that I’ve been working on for this weekend:

  1. Enhancement: The MARCEngine will now allow users to embed mnemonics into a UTF8 encoded file and those mnemonics will be expanded into proper UTF8. This means that if you have a UTF8 file, and are working in the MarcEditor, you could enter {aacute} in the record and MarcEdit will expand it when compiling the records into the proper character: á.
  2. Enhancement: Reset the MarcEdit Config Data. Sometimes, folks will find that MarcEdit is having trouble saving configuration data. Generally, this is due to file corruption. While it doesn’t happen often, when it does, it can be a pain. This option allows you to quickly reset your configuration data.
  3. Enhancement: Delimited Text Translator – when using the Autogenerate option, the program will now automatically ignore the first line of the file it’s processing (the rules line) when converting data to MARC.
  4. Bug Fix: RDA Helper: Updated the 345 field to capture motion picture information related to screen-size and 3D.
  5. Bug Fix: RDA Helper: The GMD autogenerator was creating duplicate entries when processing data that could be both an electronic resource and something else. When this happens, the program defaults to electronic resource.
  6. Bug Fix: Processing Greek UTF8 to Greek 1253 – the stream wasn’t properly outputting the data. This has been corrected. Moving from Greek 1253 to Greek UTF8 wasn’t affected by this issue.
  7. Enhancement: Export Tab Delimited Records – the tool now allows for the export of fixed field data by position. I.e., when you add a field lower than 010, the subfield text box changes to position. The expected format is position:length (e.g., 5:1).
  8. Bug Fix: Fixed the typo in the OAIDCtoMARCXML.xsl (which is used in the default oai-dc translation by the OAI harvester).
  9. Bug Fix: Script wizard – fixed a typo in the outputted script.
  10. Enhancement: MARCValidator – added the ability to find incorrect field lengths. Updated some error descriptions to make them clearer.
  11. Bug Fix: RDA Helper – in some cases the first indicator in the 260 wasn’t being retained when the data was moved into a 264.
  12. UI Tweaks.
  13. Bug Fix: Corrected Indicator replacement options
  14. Linux/Mac Installs – I’ve made some changes so you can disable the Z39.50 part of the program and simplify installation. I’ll be updating the process/video to document the new method.
  15. Automatic Updater – I’ve made the changes necessary to support redirection to the new server where MarcEdit is being hosted.
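The position:length format from item 7 can be illustrated with a short Python sketch. It assumes zero-based positions, as in the MARC 21 documentation (so 5:1 in the Leader is the record status byte and 35:3 in the 008 is the language code), and treats tags below 010 as control fields; the helper names and sample values are hypothetical.

```python
from typing import Tuple


def uses_position(tag: str) -> bool:
    """Control fields (tags below 010) are exported by position, not subfield."""
    return tag.isdigit() and int(tag) < 10


def parse_position(spec: str) -> Tuple[int, int]:
    """Split a 'position:length' spec such as '5:1' into two integers."""
    position, length = spec.split(":")
    return int(position), int(length)


def extract_fixed(data: str, spec: str) -> str:
    """Return `length` characters starting at the zero-based `position`."""
    position, length = parse_position(spec)
    return data[position:position + length]


# Usage with a sample Leader and 008 (values are made up for illustration).
leader = "00000nam a2200000 a 4500"
f008 = "130113s2012    xxu" + " " * 17 + "eng d"
print(uses_position("008"), uses_position("245"))  # True False
print(extract_fixed(leader, "5:1"))                # n
print(extract_fixed(f008, "35:3"))                 # eng
```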

For a while, I’ll be hosting the MarcEdit files on both my Oregon State web space and the new domain, marcedit.reeset.net. However, sometime around June, I believe the Oregon State web space will go away.

Direct downloads:

If you run into trouble, let me know.

 Posted by at 12:21 am
Feb 12, 2013
 

One of the hard and fast rules that MarcEdit has consistently enforced is that you are not allowed to mix character set streams.  This means that if your data is in UTF8, MarcEdit will not process mnemonic data.  There are some good reasons for this, the best being that mnemonics in MarcEdit map back to MARC8 representations of a character, which are completely incompatible with the UTF8 character set.

I’ve tried a few times to look at different ways to deal with this, but in most cases, I’ve been thwarted by the way C# handles streams.  In C#, text streams are read as UTF8 unless a different encoding is specified.  In order to support MARC8 formatted data, MarcEdit reads all data as either UTF8 or as binary.  This allows MarcEdit to move easily between MARC8 and UTF8.  The problem occurs when someone wants to use mnemonics in a string that is already UTF8 encoded.  For example:
=246  13$aal-Mujāhid $bRees{aacute}, T{eacute}rry

The above is problematic to process.  Currently, MarcEdit ignores the mnemonics and treats them simply as strings.  Because these mnemonics convert directly to MARC8 bytes, one of the two sets of diacritics (the UTF8 ā, or the mnemonic á and é) would be flattened when processed against the stream.  If the stream defaults to UTF8, the {aacute} and {eacute} encoded data will be flattened, and the record generated by MarcEdit will have incorrect lengths.  If the stream is converted to binary and then reconstituted as a UTF8 stream, any UTF8 data present in the stream is flattened, but the mnemonic data is processed correctly.  A bit of a pickle.

To make this work, I ended up having to atomize the data that is to be processed, meaning that only the data in the mnemonic is processed and then inserted back into a UTF8 data stream.  So, it would look something like this:

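// Only take the atomized path when the source string already contains UTF8 data.
if (RecognizeUTF8(System.Text.Encoding.GetEncoding(1252).GetBytes(str_Source)) == RET_VAL_UTF_8)
{
    // Lazily create the MARC8-to-UTF8 character dictionary.
    if (objChar == null)
    {
        objChar = new marc82utf8.MARCDictionary();
        objChar.UTFNormalize = UTFNormalization;
    }

    // Look up the MARC8 value for this single mnemonic (e.g., {aacute}).
    string tmp_diacritic = (string)lc_mnemonics_patch[tmp_string];
    // Convert just that MARC8 fragment to UTF8.
    tmp_diacritic = objChar.MARC8UTF8(tmp_diacritic);
    // Re-stage the UTF8 bytes in the internal 1252 base encoding.
    byte[] bytes = System.Text.Encoding.UTF8.GetBytes(tmp_diacritic);
    tmp_diacritic = System.Text.Encoding.GetEncoding(1252).GetString(bytes);
    // Splice the converted character back into the source string.
    str_Source = str_Source.Replace(tmp_string, tmp_diacritic);
}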

In this case, the atomized data is in tmp_diacritic and must be converted from MARC8 to UTF8 using the MarcEdit UTF8 Normalization library.  At this point, the fragment is UTF8.  It must then be converted to bytes and transcoded into the internal base encoding used for staging all character data (code page 1252), so that it can be passed back into the library for proper character handling.
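The atomizing idea can be sketched outside of C#. The Python below is an illustration under stated assumptions: the mnemonic and MARC8 tables hold only a hypothetical three-entry subset (0xE2 is the MARC8 combining acute, 0xC3 the copyright sign), and because Python strings are already Unicode it skips the code page 1252 staging step the C# code needs. What it does show is that only the `{...}` token is converted, so UTF8 text already in the string, like the ā in al-Mujāhid, is left alone.

```python
import re
import unicodedata

# Hypothetical three-entry mnemonic table. In MARC8 the combining diacritic
# byte comes *before* the base letter it modifies.
MNEMONIC_TO_MARC8 = {
    "aacute": b"\xe2a",  # 0xE2 = MARC8 combining acute, then 'a'
    "eacute": b"\xe2e",
    "copy": b"\xc3",     # 0xC3 = MARC8 copyright sign
}

# Hypothetical MARC8 -> Unicode mappings covering only the bytes above.
MARC8_COMBINING = {0xE2: "\u0301"}  # combining acute accent
MARC8_SPACING = {0xC3: "\u00a9"}    # copyright sign

MNEMONIC = re.compile(r"\{([a-z]+)\}")


def marc8_to_unicode(data: bytes) -> str:
    """Convert a small MARC8 fragment to NFC-normalized Unicode."""
    out, pending = [], []
    for byte in data:
        if byte in MARC8_COMBINING:
            # Hold the diacritic until its base character arrives.
            pending.append(MARC8_COMBINING[byte])
            continue
        # ASCII (and, in this sketch, any unmapped byte) passes straight through.
        out.append(MARC8_SPACING.get(byte, chr(byte)))
        # Unicode combining marks follow their base character.
        out.extend(pending)
        pending.clear()
    out.extend(pending)
    return unicodedata.normalize("NFC", "".join(out))


def expand_mnemonics(text: str) -> str:
    """Convert each mnemonic token on its own, leaving other characters untouched."""
    def convert(match):
        marc8 = MNEMONIC_TO_MARC8.get(match.group(1))
        if marc8 is None:
            return match.group(0)  # unknown mnemonic: keep it as typed
        return marc8_to_unicode(marc8)

    return MNEMONIC.sub(convert, text)


print(expand_mnemonics("=246  13$aal-Muj\u0101hid $bRees{aacute}, T{eacute}rry"))
# =246  13$aal-Mujāhid $bReesá, Térry
```

NFC normalization folds the base letter plus combining acute into the precomposed á, which keeps the field lengths predictable.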

The upshot of this – MarcEdit will soon allow this type of mixed character editing.  The downside is that we still can’t get away from this type of MARC8 legacy crap.

–tr

 Posted by at 2:11 pm
Jan 28, 2013
 

I want to thank everyone who took the time to attend my Mid-Winter presentation.  I really appreciated the feedback and the ideas.  It’s days like today, when I get to share some ideas and talk to some interesting people, that make me glad I work in higher education.

Anyway, my presentation, entitled “Dragging old data forward: finding yourself an RDA Helper,” can be found here:

 

–TR

 Posted by at 1:04 am

MarcEdit 5.9 Update

Jan 26, 2013
 

I have been working hard over the past week to close a couple of outstanding issues with the application.  The biggest of those issues is related to the Find/Find All function.  While making some changes a few weeks ago, it appears that I introduced a bug.  I’ve been working offline with a few folks who have been helping me debug the issue, and it appears that I’ve been able to isolate and correct it.  The following changes have been made:

  1. Bug Fix:  Find/Find All – Regular Expressions were resulting in “Text Not Found” and Boundary errors.  This has been corrected.
  2. Bug Fix:  Find/Find All – The Find process had become incredibly slow due to some of the enhancements made to help the program jump directly to the found text when using Find All.  The culprit was an inefficient loop, which has been corrected.
  3. Bug Fix:  Find/Find All – When you searched, moved the cursor, and then searched again, the program wouldn’t reset where the search would begin.  This has been corrected.
  4. Enhancement:  Find/Find All – I’ve added a directional component.  You can now search up and down the record while using Find.
  5. Enhancement:  RDA Helper – I’ve added support for the automatic generation of multiple 380 fields when the data to create those elements is available.
  6. Enhancement:  Console Program – I’ve added an –xml switch to allow for the processing of data from MARCXML to other XML schemas.
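Items 3 and 4 boil down to always starting a search from the current cursor position and scanning in a chosen direction. Here’s a minimal Python sketch of that behavior using the standard `re` module; the function name and the “up”/“down” flag are hypothetical, and it doesn’t try to reproduce the Find All jump list.

```python
import re
from typing import Optional, Tuple


def find_from_cursor(text: str, pattern: str, cursor: int,
                     direction: str = "down") -> Optional[Tuple[int, int]]:
    """Return the (start, end) of the next match from the cursor, or None.

    "down" finds the first match starting at or after the cursor;
    "up" finds the last match that ends at or before the cursor.
    """
    regex = re.compile(pattern)
    if direction == "down":
        match = regex.search(text, cursor)
        return match.span() if match else None
    if direction != "up":
        raise ValueError("direction must be 'up' or 'down'")
    last = None
    # endpos=cursor confines the scan to the text before the cursor.
    for match in regex.finditer(text, 0, cursor):
        last = match
    return last.span() if last else None


record = "=245  10$aFoo.\n=500  $aFoo bar.\n"
first = find_from_cursor(record, r"Foo", 0)                 # (10, 13)
second = find_from_cursor(record, r"Foo", first[1])          # (23, 26)
back_up = find_from_cursor(record, r"Foo", second[0], "up")  # (10, 13)
```

Because the cursor is passed in on every call, moving it between searches automatically changes where the next search begins. Zero-width patterns would need extra handling so a repeated “down” search doesn’t return the same empty match.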

You can pick up the new build from:

–tr

 Posted by at 1:16 am
Jan 12, 2013
 

I know that there have been a lot of updates lately.  Hopefully, folks haven’t minded.  These updates have largely been due to folks really working out the RDA Helper, which has been nice.  This update is specific to the RDA Helper.

  • Bug Fix: RDA Helper – When attempting to create the 380/1, the program looks for the presence of a 130.  Under rare conditions, the 130 format can cause parsing problems.  This fixes that issue.
  • Bug Fix:  RDA Helper – Data was processed incorrectly when the copyright mark was set as {copy}.  This has been fixed.
  • Bug Fix:  RDA Helper – after the last fix, which corrected an error message generated when processing records where the highest field is no higher than 300, the line separating records could be dropped.  This has been corrected.

You can pick up the download from:

–tr

 Posted by at 1:06 am

MarcEdit 5.9 Update

Jan 11, 2013
 

I’ve posted a new update that includes the following changes:

1) Bug Fix:  RDA Helper – Corrects the exception that occurs when trying to insert 33x, 38x fields when the highest field in the record is lower than 329.

2) Enhancement:  RDA Helper – I added the ability to embed regular expressions into the Abbreviation expansion, and will include two as examples with the next update.  This is necessary to deal with items like “v.”, which could expand to volume or volumes depending on the surrounding data.  So, I’ve included a regular expression that evaluates the data prior to “v.” to try to determine what the expansion should look like.  This should give some added flexibility for anyone wanting to augment the substitution list.

3) Behavior Change: RDA Helper – because I’m modifying the substitutions list due to the item above, I need to provide a mechanism to update the codes.  Essentially, I’ve added a routine that will automatically keep the substitution list synchronized with the master list.  However, if you add your own items to the list, they will get overwritten, so I’ve added a notification, and the program saves the previous list as a backup so you can move any custom substitutions to the new list.  I don’t anticipate changing the master list often, so this might be good enough.  But I’ll look at other ways of making this process less intrusive in the future.

4) Enhancement: RDA Helper – when generating the 336, I’ve expanded the data elements consulted to help improve the selection of more granular values in the 336$a – specifically for text items.

5) Enhancement:  RDA Helper – I spent a good deal of time last night optimizing how regular expressions are run in the RDA Helper and have seen a significant performance gain.

6) Bug Fix: Delimited Text Translator – When using autogenerate, indicators would occasionally be dropped.  This has been corrected.

7) Enhancement:  RDA Helper – MarcEdit will protect data within quotes from substitution.  This assumes that quoted data has been transcribed from the object.

8) Bug Fix:  RDA Helper – When generating 264s from 260s, the first indicator wasn’t being retained correctly.  This has been corrected.

9) Bug Fix: RDA Helper – When generating a 264, fields that ended with a hyphen were occasionally having periods added to the end.  This has been corrected.

10) Bug Fix: Find/Find All – I made a change to the jump list that allowed the program to select the text searched for.  Unfortunately, if you are searching with a regular expression, the match can extend beyond the selection.  This has been corrected.
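Items 2 and 7 fit together naturally: regex-based substitution rules that look at the data before “v.”, applied only outside quoted text. The Python sketch below assumes two hypothetical rules and a plain double-quote convention (curly quotes would need their own pattern); it isn’t the RDA Helper’s actual rule set or file format.

```python
import re

# Hypothetical rules: each pattern looks at the number *before* "v.".
# The singular rule runs first so "1 v." isn't caught by the plural rule.
RULES = [
    (re.compile(r"\b1 v\."), "1 volume"),
    (re.compile(r"\b(\d+) v\."), r"\1 volumes"),
]

# A capturing group makes re.split keep the quoted segments in the result.
QUOTED = re.compile(r'("[^"]*")')


def expand_abbreviations(text: str) -> str:
    """Apply RULES everywhere except inside double-quoted (transcribed) text."""
    parts = QUOTED.split(text)
    # re.split alternates: even indices are unquoted, odd indices are quoted.
    for i in range(0, len(parts), 2):
        for pattern, replacement in RULES:
            parts[i] = pattern.sub(replacement, parts[i])
    return "".join(parts)


print(expand_abbreviations('1 v. ; 3 v. "2 v."'))  # 1 volume ; 3 volumes "2 v."
```

The `\b` anchor keeps the singular rule from firing on “11 v.”, which falls through to the plural rule instead.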

You can download the program from:

–TR

 Posted by at 4:06 am

MarcEdit 5.9 Update

Jan 09, 2013
 

A light update, really just to correct a couple of data problems.  One note: to enhance the abbreviation substitution, I had to make the regular expressions a bit more complex.  I’ll be keeping an eye on this part of the function to ensure that this doesn’t cause a performance bottleneck.

  • Bug Fix:  RDA Helper – When doing substitution, sometimes it would match greedily.  This has been corrected.
  • Bug Fix:  RDA Helper – 338 – under some circumstances, the $a was generated as a $u.  This has been corrected.
  • Bug Fix: Delimited Text Translator – When autogenerating, defined control data (like the 006 and 007) caused a validation issue.  This has been corrected.
  • Enhancement:  Merge Records – Enhanced the function so that it can now handle records with multiple control numbers and will match them correctly.
  • Enhancement:  MARCEngine COM – new function – MarcEngine_Version to return the assembly build number.
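Matching on multiple control numbers can be thought of as an index lookup where any shared number counts as a hit. The Python sketch below assumes records have already been parsed into an id plus a list of control numbers (for example, from repeated 035 fields); the names and the first-match-wins policy are assumptions, not a description of how Merge Records works internally.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_index(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map every control number to the id of the first record that carries it."""
    index: Dict[str, str] = {}
    for record_id, control_numbers in records:
        for number in control_numbers:
            # setdefault keeps the first record if two records share a number.
            index.setdefault(number.strip(), record_id)
    return index


def find_match(index: Dict[str, str], control_numbers: List[str]) -> Optional[str]:
    """Return the id of the first indexed record sharing any control number."""
    for number in control_numbers:
        record_id = index.get(number.strip())
        if record_id is not None:
            return record_id
    return None


targets = [("rec1", ["(OCoLC)123", "(OCoLC)456"]), ("rec2", ["(OCoLC)789"])]
index = build_index(targets)
print(find_match(index, ["(OCoLC)999", "(OCoLC)456"]))  # rec1
```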

Here’s an example of the assembly number function:

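' Create the MARCEngine COM object and display its assembly build number
Dim obj_MB

Set obj_MB = CreateObject("MARCEngine5.MARC21")
MsgBox obj_MB.MarcEngine_Version

' Release the COM object when finished
Set obj_MB = Nothing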

Download from:

–tr

 Posted by at 2:15 am