Dspace Community-list; adding most popular collections entries hack

As I’ve started to get more involved in the Dspace work here at OSU, one thing that has struck me is that the interface really could use a once-over. Nowhere is this more obvious than on the community-list page. While I doubt that this page was ever meant to be the default access mechanism into an institution’s Dspace collection, I’d bet that most often it is (apart from accesses that come from outside Dspace, like Google). Like most institutions, Oregon State University is adding lots of new communities and collections to Dspace, and as the number of collections grows, so does the list. At some point, this list simply becomes too unwieldy for folks to actually work with, so our IR group has started looking at ways to make it easier for users.

So our IR group has taken a first crack at this.  Basically, the idea is to simply collapse the list using a little dhtml and then allow users to toggle the collections that they want to see. 

So with this interface, we’ve obviously shortened the list, but we’ve also introduced a whole new problem, and I’m not sure whether it is a step forward or back: you can no longer see the collections. Now a user has to know which community a collection is part of in order to find it. To make this a little easier, I added a toggle-all button, but that just gets you back to the big list interface.

I’ve toyed around with another option.  One thing that I thought might be interesting is adding links to the collections that are used the most.  At first, I thought that this was the type of information that dspace probably would be logging in the database — but after looking over the tables, I couldn’t find it.  So I started looking at the source and found a set of statistic classes that generate the dspace-general log files.  These log files are what dspace uses to generate the statistics screen in the administrative interface.  So with that, we are in business. 

If you look at the dspace-log-general log files, you see a couple of things. 

  1. A new one exists for each day (so long as you have logging enabled)
  2. They are very parseable. 

Using properties set in the config file, dspace analyzes log entries using a threshold to determine if the item should be placed into the list of most viewed items. At OSU, we continue to use the default floor of 20 views to make the list, but you can set it to just about anything. Once an item reaches that floor, the following entry will be placed into the dspace-log-general log file (from our dev server):

  • item.123456789/28=31

Here, you can see that you are getting a stem that can be matched on: item., the handle for the item being accessed: 123456789/28 and the number of times that the item has been accessed: 31.  So one of my thoughts for exposing often used collections would be to take this list and determine how often collections are being utilized.
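
To make the parsing concrete, here is a minimal sketch of pulling the handle and the count out of one of these lines. The class and method names (ItemViewParser, parseInto) are made up for illustration and aren’t part of Dspace, the second sample line is invented, and the sketch assumes Java 7 or later and that every line of interest has exactly the shape shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Dspace: collects handle -> view count
// from lines shaped like "item.123456789/28=31".
class ItemViewParser {
    static final String STEM = "item.";

    // Stores the entry and returns true if the line is an item view record.
    static boolean parseInto(String line, Map<String, Integer> views) {
        if (line == null || !line.startsWith(STEM)) {
            return false;
        }
        int eq = line.indexOf('=', STEM.length());
        if (eq <= STEM.length()) {
            return false; // empty handle or no "=" separator
        }
        try {
            int count = Integer.parseInt(line.substring(eq + 1).trim());
            views.put(line.substring(STEM.length(), eq), count);
            return true;
        } catch (NumberFormatException e) {
            return false; // the count was not a number
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> views = new LinkedHashMap<>();
        parseInto("item.123456789/28=31", views);
        parseInto("some.other.line=5", views); // invented non-item line, ignored
        System.out.println(views); // prints {123456789/28=31}
    }
}
```

Checking the stem with startsWith keeps short or unrelated lines from throwing, which matters because the log file holds plenty of entries that aren’t item counts.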

Technically, the process isn’t really that difficult, though I’ll admit that I stumbled a little bit when working with the HashMaps since I’ve gotten lazy working with scripting languages. Ideally, what you’d want to do is resolve each item’s collection and keep track of the number of times items have been accessed from that collection. This way, you can do a descending sort on the count to get the collections with the most accesses, i.e., probably your most used collections. The initial problem that I ran into was that I wanted to use Java’s HashMap like a PHP associative array, which allows you to sort on either key or value while maintaining keys. There are a number of ways in Java to accomplish the same thing, but most seemed to require the use of Lists or LinkedHashMaps, and I didn’t want to bother with either. So instead, I just created my own storage class and implemented the Comparable interface. Much easier and cleaner, I think.
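
For comparison, here is roughly what the List-based route looks like on a newer Java (8 or later, which postdates the code in this post): copy the map entries into a list and sort that list by value. This is only a sketch of the alternative, not what the code here does; SortByValue and the handles in main are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing the "sort a HashMap by value" alternative.
class SortByValue {
    // Copies the entries into a list and sorts them by value, largest first.
    // The keys stay attached to their values, much like a PHP arsort().
    static List<Map.Entry<String, Integer>> descending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("123456789/10", 42);  // made-up collection handles and totals
        counts.put("123456789/11", 7);
        counts.put("123456789/12", 120);
        for (Map.Entry<String, Integer> e : descending(counts)) {
            System.out.println(e.getKey() + " [" + e.getValue() + "]");
        }
    }
}
```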

So, the code. I’ll have to note that I haven’t run this in our production environment yet, and because of the limited logging that we do on our dspace dev instance, the code hasn’t yet been run against a large set of items to be resolved. There may need to be revisions, but for anyone wanting to look at the code, here you go.

The changes made are basically broken into three parts:

  1. New class files (there are 2)
  2. Changes to the JSPs (one change to the community-list.jsp file)
  3. Additions to the dspace.cfg (basically, I didn’t want to hard code the log matching elements)

New Classes:

OSUCollection — This is the storage container that returns the collection name, the collection handle and the number of times the items logged in the collection had been accessed.  Items are returned in descending order — with most accessed items returned first.

 


package edu.oregonstate.library.util.objects;

import java.lang.*;
import java.util.*;

public class OSUCollection implements Comparable {
    private String name;
    private String handle;
    private int count = 0;


    public void setName(String s) {
        name = s;
    }

    public String getName() {
        return name;
    }

    public void setHandle(String s) {
        handle = s;
    }

    public String getHandle() {
        return handle;
    }

    public void setCount(int i) {
        count = i;
    }

    public int getCount() {
        return count;
    }

    public int compareTo(Object o) {
        return count - ((OSUCollection)o).count;
    }
}
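
On Java 5 or later the same idea can be written with generics, which removes the cast in compareTo. The sketch below is a hypothetical stand-in (CollectionViews is not the class above, and the handles are made up); it uses Integer.compare, available since Java 7, instead of subtraction, and shows how reverseOrder() turns the ascending natural order into a most-viewed-first list.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical, generics-based stand-in for a storage class like OSUCollection.
class CollectionViews implements Comparable<CollectionViews> {
    final String handle;
    final int count;

    CollectionViews(String handle, int count) {
        this.handle = handle;
        this.count = count;
    }

    // Natural order is ascending by count; Integer.compare cannot overflow.
    public int compareTo(CollectionViews other) {
        return Integer.compare(count, other.count);
    }

    public static void main(String[] args) {
        CollectionViews[] views = {
            new CollectionViews("123456789/10", 42),   // made-up handles and counts
            new CollectionViews("123456789/12", 120),
            new CollectionViews("123456789/11", 7)
        };
        // reverseOrder() flips the natural order, so the most viewed comes first
        Arrays.sort(views, Collections.reverseOrder());
        for (CollectionViews v : views) {
            System.out.println(v.handle + " [" + v.count + "]");
        }
    }
}
```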


GetPopularCollections — This is the actual logic part of the class that takes the stats log file, parses it, retrieves the collection specific metadata and returns an array of OSUCollection objects.

 


package edu.oregonstate.library.util;

import java.util.Arrays;
import java.lang.Integer;
import java.sql.SQLException;
import java.io.*;
import java.net.*;
import java.util.Set;
import java.util.Iterator;
import java.util.HashMap;
import java.util.Collections;
import java.util.Date;
import java.text.SimpleDateFormat;
import java.text.Format;


import org.dspace.content.Collection;
import org.dspace.content.DCValue;
import org.dspace.content.Item;
import org.dspace.core.ConfigurationManager;
import org.dspace.core.Context;
import org.dspace.handle.HandleManager;
import edu.oregonstate.library.util.objects.OSUCollection;

public class GetPopularCollections {
   public OSUCollection[] GetCollections()  throws Exception, SQLException {
    Context context = new Context();
    context.setIgnoreAuthorization(true);
    String record = null;
    HashMap tmpMap = new HashMap();
    HashMap itemMap = new HashMap();

    try {
       FileReader fr = null;
       BufferedReader br = null;

       try {
           String file = getLogFile(ConfigurationManager.getProperty("log.dir"),
                                    ConfigurationManager.getProperty("log.stem"),
                                    ConfigurationManager.getProperty("log.extension"));

           //String file = ConfigurationManager.getProperty("log.dir") + "/dspace-log-general-2005-9-20.dat";
           fr = new FileReader(file);
           br = new BufferedReader(fr);
       }
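       catch (IOException e) {
          // Don't call System.exit() inside a web application: it would shut
          // down the whole servlet container. Report the problem and return
          // an empty list instead.
          e.printStackTrace();
          System.out.println("Failed to read input file");
          return new OSUCollection[0];
       }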
       while ((record = br.readLine()) != null) {
         // startsWith() is safe on lines shorter than the "item." stem,
         // where substring(0,5) would throw.
         int eq = record.indexOf("=");
         if (record.startsWith("item.") && eq > 5) {
            // key: item handle, value: view count (kept as a String)
            tmpMap.put(record.substring(5, eq), record.substring(eq + 1).trim());
         }
       }

       //Setup the OSUCollections Object
       OSUCollection[] c = new OSUCollection[tmpMap.size()];
       Set set = tmpMap.keySet();
       Iterator it = set.iterator();
       int index = 0;
       while (it.hasNext()) {
           /*getCollectionInfo returns:
            * element[0]: collection name
            * element[1]: collection handle
            */
           String element = (String)it.next();
           String[] tmp = getCollectionInfo(context, element);
           if (tmp == null) {
              // the handle no longer resolves to an item; skip it
              continue;
           }
           int views = Integer.parseInt((String)tmpMap.get(element));
           if (!itemMap.containsKey(tmp[1])) {
              // first item seen for this collection: remember its array slot
              itemMap.put(tmp[1], new Integer(index));
              c[index] = new OSUCollection();
              c[index].setName(tmp[0]);
              c[index].setHandle(tmp[1]);
              c[index].setCount(views);
              index++;
           } else {
              // itemMap is keyed by collection handle and holds an Integer slot
              int tint = ((Integer)itemMap.get(tmp[1])).intValue();
              c[tint].setCount(c[tint].getCount() + views);
           }
       }

       // Several items can share a collection, so only the first `index`
       // slots are filled. Trim the array so the sort never sees a null.
       OSUCollection[] result = new OSUCollection[index];
       System.arraycopy(c, 0, result, 0, index);
       Arrays.sort(result, Collections.reverseOrder());

       br.close();
       fr.close();
       return result;
    }catch(Exception e) {
        System.out.println(e.toString());
        return null;
    } finally {
        // release the database connection held by the Context
        context.abort();
    }
  }


  private String[] getCollectionInfo(Context context, String handle) throws Exception, SQLException {
     Item item = null;
     String[] vals = new String[2];

     // ensure that the handle exists
     try
     {
       item = (Item) HandleManager.resolveToObject(context, handle);
     }
     catch (Exception e)
     {
        return null;
     }

     // if no handle that matches is found then also return null
     if (item == null)
     {
        return null;
     }

     // build the reference
     // FIXME: here we have blurred the line between content and presentation
     // and it should probably be un-blurred
     Collection myCollection = item.getOwningCollection();
     if (myCollection == null)
     {
        // an item without an owning collection cannot be counted
        return null;
     }
     vals[0] = myCollection.getMetadata("name");
     vals[1] = myCollection.getHandle();
     return vals;
  }

  private static String getLogFile(String lfile, String lstem, String extension) {
      //We use the simpledateformat to check for the presence of the default
      //file, which dspace creates in the stats-general file.
      //If this file isn't present, then we use the stem and extension to return our file
      Format sdf = new SimpleDateFormat("yyyy-MM-dd");
      Date myDate = new Date();
      String default_file = lfile + File.separator + lstem + "-" + sdf.format(myDate) + "." + extension;
      File tmp = new File(default_file);
      if (tmp.exists()) {
         return default_file;
      }

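      // Otherwise fall back to the most recently modified file in the log
      // directory whose name matches the configured stem and extension.
      // (The loop body is an assumption based on the comment above and the
      // surviving lines: the original listing was truncated at this point.)
      File dir = new File(lfile);
      String[] children = dir.list();
      String logfile = "";
      long lastmod = 0;
      if (children == null) {
          return null;
      } else {
          for (int i=0; i<children.length; i++) {
            if (children[i].startsWith(lstem)) {
              if (children[i].endsWith(extension)) {
                tmp = new File(lfile + File.separator + children[i]);
                if (tmp.lastModified() > lastmod) {
                  lastmod = tmp.lastModified();
                  logfile = lfile + File.separator + children[i];
                }
              }
            }
          }
          return logfile;
      }
  }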

}
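
The core of GetCollections is rolling item view counts up to their owning collections. Stripped of the Dspace calls, that step might look like the sketch below. The ownerOf function is a hypothetical stand-in for the HandleManager lookup, CollectionTotals and the handles other than 123456789/28 are made up, and it assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: sums item view counts per owning collection.
class CollectionTotals {
    // itemViews: item handle -> view count (as parsed from the log)
    // ownerOf:   stand-in lookup from item handle to collection handle,
    //            returning null when the item cannot be resolved
    static Map<String, Integer> total(Map<String, Integer> itemViews,
                                      Function<String, String> ownerOf) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : itemViews.entrySet()) {
            String collection = ownerOf.apply(e.getKey());
            if (collection == null) {
                continue; // skip items that no longer resolve
            }
            // add this item's views to whatever the collection already has
            totals.merge(collection, e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> itemViews = new HashMap<>();
        itemViews.put("123456789/28", 31);
        itemViews.put("123456789/29", 22);
        itemViews.put("123456789/40", 25);

        // Made-up ownership table standing in for the Dspace lookup
        Map<String, String> owners = new HashMap<>();
        owners.put("123456789/28", "123456789/5");
        owners.put("123456789/29", "123456789/5");
        owners.put("123456789/40", "123456789/6");

        System.out.println(total(itemViews, owners::get));
    }
}
```

Keying the totals by collection handle means two items from the same collection land in the same bucket, which is the bookkeeping itemMap does in the class above.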


JSP Changes:

Community-list.jsp

Around line 189 you should add the following:

 



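<%-- These two imports belong with the other page imports at the top of the file --%>
<%@ page import="edu.oregonstate.library.util.GetPopularCollections" %>
<%@ page import="edu.oregonstate.library.util.objects.OSUCollection" %>

<%
   GetPopularCollections objCol = new GetPopularCollections();
   OSUCollection[] col  = objCol.GetCollections();
   int x = 0;
   // GetCollections() returns null if the log could not be processed
   if (col != null && col.length > 0) {
%>
<%-- The HTML tags in this listing were lost; h3/ul/li and the context-path
     prefix are assumptions, so adjust them to match your layout --%>
<h3>Most Viewed Collections</h3>
<ul>
<% for (int i = 0; i < col.length; i++) { %>
  <li><a href="<%= request.getContextPath() %>/handle/<%=col[i].getHandle()%>"><%=col[i].getName()%></a> [<%=col[i].getCount()%>]</li>
<% x++; if (x > 5) { break; } } %>
</ul>
<% } %>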

Config Additions:

dspace.cfg: as noted, I basically add these entries so I don’t have to hardcode the values into the GetPopularCollections class. These are added around line 166, near log.dir.

     

    
    
    log.stem = dspace-log-general
    log.extension  = dat
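
As a quick illustration of how these entries combine into a file name, here is a small sketch using plain java.util.Properties rather than Dspace’s ConfigurationManager. LogFileName and the /dspace/log directory are placeholders. The zero-padded yyyy-MM-dd date follows getLogFile above; the commented-out file name in GetCollections uses an unpadded date (2005-9-20), so check which form your install actually writes.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

// Hypothetical helper: builds the expected stats file name from the config.
class LogFileName {
    // Returns <log.dir>/<log.stem>-<yyyy-MM-dd>.<log.extension> for the given day.
    static String forDay(Properties cfg, Date day) {
        String stamp = new SimpleDateFormat("yyyy-MM-dd").format(day);
        return cfg.getProperty("log.dir") + File.separator
             + cfg.getProperty("log.stem") + "-" + stamp
             + "." + cfg.getProperty("log.extension");
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = new Properties();
        // "/dspace/log" is a placeholder; the other two values are from the post
        cfg.load(new StringReader(
            "log.dir = /dspace/log\n"
          + "log.stem = dspace-log-general\n"
          + "log.extension  = dat\n"));
        System.out.println(forDay(cfg, new Date()));
    }
}
```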
    
    
    

And that’s it. If you want to access the two class files directly, they can be found at:

The end result of all this is a display that looks like the following:

Again, I have no idea if this will be useful for our users in general, but I can see some future applicability in terms of evaluating collection usage to help determine what materials to collect and archive. But I guess we’ll see.

–TR


Comments

2 responses to “Dspace Community-list; adding most popular collections entries hack”

1. Dorothea

   Interesting. This would be a great sidebar item for DSpace installs with three columns (mine, like yours, is 2-col).

2. Administrator

   I’d kind of thought that as well, but I don’t see us moving to a three-column structure soon. Honestly, while I’m hoping that users find this interesting, I’m hoping to get some mileage out of it as a collection development tool as well, since it will give me a better way to assess collection usage.

   Something I’m toying with right now is using our ldap server to help me identify who our users are when they log in, so I can offer them direct links to collections with which they are associated or their department is associated.

   –TR