URLs: Smart resource identifiers

from http://www.javaworld.com/article/2077397/core-java/urls-smart-resource-identifiers.html

Q: What convenient pluggability patterns exist for loading resources via custom URLs?

A: In the previous Java Q&A article, "Smartly Load Your Properties," you observed how loading resources via classloaders can help decouple application logic from the details of the application's disk location. Although that article put some emphasis on loading .properties definitions, the real message was how it was done: by loading them through the classpath. As long as you can load a piece of data via an InputStream, you can also do it via ClassLoader.getResourceAsStream(), which hides the disk location details behind the classloader façade.

In this follow-up article, I take the next step towards disk location-independent code nirvana. Just like in the previous article, everything will be possible with a very simple augmentation of the existing Java APIs: sometimes the real Java skill is not in writing yet another library but rather in knowing how to use the existing core Java functionality to its full potential.

Think of URLs as smart resource names

In "Smartly Load Your Properties," I showed a simple method for loading resources via classloaders. The end result was that you could use a simple Java string as a resource identifier. For modest goals, that was good enough.

However, resource loading through classloaders can have disadvantages too. If the underlying storage changes (e.g., you add a new file to a directory in the classpath), is a classloader supposed to notice that, or must it cache its view of the classpath at the time the classloader is instantiated? Some classloaders notice such changes, but most don't. This behavior is left ambiguous in the Java specifications. And even looking back at PropertyLoader.loadProperties() in my previous Java Q&A post, the caching behavior will depend on the LOAD_AS_RESOURCE_BUNDLE boolean flag.

Loading resources via files, resource bundles, or classloader resource streams are all different resource loading strategies with varying behavioral aspects. Using just a string for a resource name isn't really enough because it says little about the loading strategy—that knowledge must be maintained implicitly in your code.

It makes sense to augment a plain String resource identifier with some self-descriptive metadata that details how the resource is handled. Furthermore, it's even better to convert such an identifier to a Java object that knows how to retrieve its own content.

Java already has something that fits the bill quite well: java.net.URLs. Forgetting about HTTP and the Web for a moment, you can think of a URL as a string in a simple format:

    <protocol> ":" <protocol-specific/opaque-part>

The protocol prefix is the metadata that explains how to process the rest of the URL, which could be almost an arbitrary string (e.g., a resource identifier and other collateral information). And methods like URL.openStream() and URL.getContent() retrieve the content behind the name.
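To make the "name that knows how to fetch its own content" idea concrete, here is a minimal sketch using only a stock file: URL. The class name UrlAsResourceName and the helper readText() are illustrative assumptions, not part of the article's code; the point is only that the caller never needs to know which scheme sits behind the URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlAsResourceName
{
    /** Reads everything behind 'url' as UTF-8 text, whatever its scheme is. */
    public static String readText (final URL url) throws IOException
    {
        try (InputStream in = url.openStream ())
        {
            final ByteArrayOutputStream out = new ByteArrayOutputStream ();
            final byte [] buffer = new byte [4096];
            for (int n; (n = in.read (buffer)) != -1; )
                out.write (buffer, 0, n);

            return new String (out.toByteArray (), StandardCharsets.UTF_8);
        }
    }

    public static void main (final String [] args) throws IOException
    {
        // A throwaway file stands in for "some resource":
        final Path file = Files.createTempFile ("greeting", ".txt");
        Files.write (file, "hello".getBytes (StandardCharsets.UTF_8));

        final URL url = file.toUri ().toURL ();

        System.out.println (url.getProtocol ()); // the protocol prefix
        System.out.println (url.getPath ());     // the protocol-specific rest
        System.out.println (readText (url));     // the content behind the name

        Files.delete (file);
    }
}
```

The same readText() call would work unchanged for an http: or jar: URL, which is the property the rest of the article builds on.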

A filename will always be just a filename, and a classloader resource name will always be just that. But a URL can be both of those and much more. Let's see how to take advantage of it.

URLFactory: an easy way to add custom URL schemes

Java has had java.net.URLs since version 1.0 and has supported several useful schemes (file:, http:, etc). It's not hard to implement a scheme of your own design: you need a custom subclass of java.net.URLStreamHandler and a matching subclass of java.net.URLConnection (see white paper and bug report in Resources for full details). However, installing the scheme in a JVM is annoyingly difficult even to this date: you either need to use URL.setURLStreamHandlerFactory() or append a custom package prefix to the global list in the java.protocol.handler.pkgs system property. The first method can only be called once in the JVM's lifetime, and the second option requires enough security privileges and cooperation from all components in the JVM. Neither one has much of a chance of working for modular code (e.g., Enterprise JavaBeans (EJB) components). To make matters worse, custom URL handlers are not looked up via thread context classloaders by analogy with Java API for XML Parsing (JAXP) or Java Naming and Directory Interface (JNDI) object factories (see "Find a Way Out of the ClassLoader Maze" for more on this). Coupled with the fact that java.net.URL is a final class, this all but kills any aspirations of a budding custom URL Java programmer.

Working around these problems is possible, but requires a little foresight. Specifically, you can instantiate all URLs in your code using the following factory API instead of calling URL constructors directly (once the URL is constructed, however, it can be passed freely into anything, including core Java APIs):

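import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLStreamHandler;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public abstract class URLFactory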
{
    public static final String HANDLER_MAPPING_RESOURCE_NAME = "url.properties";
    
    /**
     * A factory method for constructing a java.net.URL out of a plain string.
     * Stock URL schemes will be tried first, followed by custom schemes with
     * a mapping in the union of all {@link #HANDLER_MAPPING_RESOURCE_NAME}
     * resources found at the time this class is initialized.
     *  
     * @param url external URL form [may not be null].
     * @return java.net.URL corresponding to the scheme in 'url'.
     * 
     * @throws MalformedURLException if 'url' does not correspond to a known
     * stock or custom URL scheme.
     */
    public static URL newURL (final String url)
        throws MalformedURLException
    {
        // Try already installed protocols first:
        try
        {
            return new URL (url);
        }
        catch (MalformedURLException ignore)
        {
            // Ignore: try our handler list next.
        }
        
        final int firstColon = url.indexOf (':');
        if (firstColon <= 0)
        {
            throw new MalformedURLException ("no protocol specified: " + url);
        }
        else
        {
            final Map handlerMap = getHandlerMap ();
            
            final String protocol = url.substring (0, firstColon);
            final URLStreamHandler handler;
            
            if ((handlerMap == null) ||
                (handler = (URLStreamHandler) handlerMap.get (protocol)) == null)
                throw new MalformedURLException ("unknown protocol: " + protocol);
                
                
            return new URL (null, url, handler);
        }
    }
    
    
    /**
     * Not synchronized by design. You will need to change this if you make
     * HANDLERS initialize lazily (post static initialization time).
     * 
     * @return scheme/stream handler map [can be null if static init failed].
     */
    private static Map /* String->URLStreamHandler */ getHandlerMap ()
    {
        return HANDLERS;
    }
    
    /*
     * Loads a scheme/handler map that is a union of *all* resources named
     * 'resourceName' as seen by 'loader'. Null 'loader' is equivalent to the
     * application loader.
     */
    private static Map loadHandlerList (final String resourceName,
                                        ClassLoader loader)
    {
        if (loader == null) loader = ClassLoader.getSystemClassLoader ();
        
        final Map result = new HashMap ();
        
        try
        {
            // NOTE: using getResources() here.
            final Enumeration resources = loader.getResources (resourceName);
            
            if (resources != null)
            {
                // Merge all mappings in 'resources':
                
                while (resources.hasMoreElements ())
                {
                    final URL url = (URL) resources.nextElement ();
                    final Properties mapping;
                    
                    InputStream urlIn = null;
                    try
                    {
                        urlIn = url.openStream ();
                        
                        mapping = new Properties ();
                        mapping.load (urlIn); // Load in .properties format.
                    }
                    catch (IOException ioe)
                    {
                        // Ignore this resource and go to the next one.
                        continue;
                    }
                    finally
                    {
                        if (urlIn != null) try { urlIn.close (); }
                                           catch (Exception ignore) {} 
                    }
                    
                    // Load all handlers specified in 'mapping':
                     
                    for (Enumeration keys = mapping.propertyNames ();
                         keys.hasMoreElements (); )
                    {
                        final String protocol = (String) keys.nextElement ();
                        final String implClassName = mapping.getProperty (protocol);
                        
                        final Object currentImpl = result.get (protocol);
                        if (currentImpl != null)
                        {
                            if (implClassName.equals (currentImpl.getClass ().getName ()))
                                continue; // Skip duplicate mapping.
                            else
                                throw new IllegalStateException ("duplicate "  +
                                    "protocol handler class [" + implClassName +
                                    "] for protocol " + protocol);
                        }
                        
                        result.put (protocol,
                                    loadURLStreamHandler (implClassName, loader));
                    }
                }
            }
        }
        catch (IOException ignore)
        {
            // Ignore: an empty result will be returned.
        }
        
        return result;  
    }
    /*
     * Loads and initializes a single URL stream handler for a given name via a
     * given classloader. For simplicity, all errors are converted to
     * RuntimeExceptions.
     */    
    private static URLStreamHandler loadURLStreamHandler (final String className,
                                                          final ClassLoader loader)
    {
        final Class cls;
        final Object handler;
        try
        {
            cls = Class.forName (className, true, loader);
            handler = cls.newInstance ();
        }
        catch (Exception e)
        {
            throw new RuntimeException ("could not load and instantiate" +
                " [" + className + "]: " + e.getMessage ());
        }
        
        if (! (handler instanceof URLStreamHandler))
            throw new RuntimeException ("not a java.net.URLStreamHandler" +
                " implementation: " + cls.getName ());
        
        return (URLStreamHandler) handler;
    }
    
    /**
     * This method decides which classloader will be used by all
     * resource/classloading in this class. At the very least, you should use the current thread's
     * context loader. A better strategy is to use techniques shown in
     * http://www.javaworld.com/javaworld/javaqa/2003-06/01-qa-0606-load.html.
     */
    private static ClassLoader getClassLoader ()
    {
        return Thread.currentThread ().getContextClassLoader ();
    }
    
    
    private static final Map /* String->URLStreamHandler */ HANDLERS;
    private static final boolean DEBUG = true;
    
    static
    {
        Map temp = null;
        try
        {
            temp = loadHandlerList (HANDLER_MAPPING_RESOURCE_NAME,
                                    getClassLoader ());
        }
        catch (Exception e)
        {
            if (DEBUG)
            {
                System.out.println ("could not load all" +
                    " [" + HANDLER_MAPPING_RESOURCE_NAME + "] mappings:");
                e.printStackTrace (System.out);
            }
        }
        
        HANDLERS = temp;
    }
} // End of class.

Look at the newURL() method. It starts by attempting a plain URL(String) constructor. If that works, the URL string belongs to one of the core protocols, so the existing schemes just pass through this method. Otherwise, the URL protocol prefix is determined, and the URL is constructed by coupling its source string with a custom URLStreamHandler implementation. The custom URL result is constructed with a special three-parameter URL constructor, which is the ultimate reason for using this factory. All such handlers are kept in the HANDLERS map, which is populated at static initialization time. The protocol prefix-to-handler class mapping comes from a classloader resource named "url.properties". It is rather significant that this resource is retrieved using ClassLoader.getResources() and not ClassLoader.getResource(), something I will further comment on later.
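The three-parameter constructor is easiest to appreciate in isolation. The following is a minimal sketch, assuming a made-up echo: scheme whose handler (EchoHandler, a hypothetical name) simply returns the text after the colon as the URL's content. It relies on the default parseURL() behavior, so it is only meant for simple specs without "?" or "#" characters. Recent JDKs mark the URL constructors as deprecated, hence the annotation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

@SuppressWarnings ("deprecation") // URL constructors are deprecated in recent JDKs
public class ExplicitHandlerDemo
{
    /** Hypothetical handler: the text after "echo:" is the URL's content. */
    static final class EchoHandler extends URLStreamHandler
    {
        protected URLConnection openConnection (final URL u)
        {
            return new URLConnection (u)
            {
                public void connect ()
                {
                    // Nothing to connect to: content comes from the URL itself.
                }

                public InputStream getInputStream ()
                {
                    final String text = getURL ().getPath ();
                    return new ByteArrayInputStream (
                        text.getBytes (StandardCharsets.UTF_8));
                }
            };
        }
    }

    /** Binds the handler to this one URL without any JVM-wide registration. */
    public static URL newEchoURL (final String spec) throws MalformedURLException
    {
        return new URL (null, spec, new EchoHandler ());
    }

    public static String readText (final URL url) throws IOException
    {
        try (InputStream in = url.openStream ())
        {
            final ByteArrayOutputStream out = new ByteArrayOutputStream ();
            final byte [] buffer = new byte [4096];
            for (int n; (n = in.read (buffer)) != -1; )
                out.write (buffer, 0, n);

            return new String (out.toByteArray (), StandardCharsets.UTF_8);
        }
    }

    public static void main (final String [] args) throws IOException
    {
        System.out.println (readText (newEchoURL ("echo:hello")));
    }
}
```

By contrast, a plain new URL("echo:hello") throws MalformedURLException because no handler for that scheme is installed in the JVM. Once constructed with an explicit handler, the URL object carries that handler with it and can be passed to any API that accepts a java.net.URL.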

Example: a custom URL scheme for classloader lookup

As an example of what URLFactory can do, let's add a protocol that maps resource names to classloader resources:

    "clsloader:" <resource name for ClassLoader.getResourceAsStream()>

This is actually quite easy. The complete implementation is shown here:

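import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class ClassLoaderResourceHandler extends URLStreamHandler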
{
    public static final String PROTOCOL = "clsloader";
    
    
    protected URLConnection openConnection (final URL url) throws IOException
    {
        return new ClassLoaderResourceURLConnection (url);
    }
    
    /**
     * This method should return a parseable string form of this URL.
     */
    protected String toExternalForm (final URL url)
    {
        return PROTOCOL.concat (":").concat (url.getFile ());
    }
    
    /**
     * Must override to prevent default parsing of our URLs as HTTP-like URLs
     * (the base class implementation eventually calls setURL(), which is tied
     * to HTTP URL syntax too much).
     */
    protected void parseURL (final URL context, final String spec,
                             final int start, final int limit)
    {
        final String resourceName =
            combineResourceNames (context.getFile (), spec.substring (start));
        
        setURL (context, context.getProtocol (), "", -1, resourceName, "");
    }
    
    /*
     * The URLConnection implementation used by this scheme. 
     */
    private static final class ClassLoaderResourceURLConnection
        extends URLConnection
    {
        public void connect ()
        {
            // Do nothing, as we will look for the resource in getInputStream().
        }
        
        public InputStream getInputStream () throws IOException
        {
            // This always uses the current thread's context loader. A better
            // strategy is to use techniques shown in
            // http://www.javaworld.com/javaworld/javaqa/2003-06/01-qa-0606-load.html.
            final ClassLoader loader = Thread.currentThread ().getContextClassLoader ();
            
            // Don't be fooled by our calling url.getFile(): it is just a string,
            // not necessarily a real filename.
            String resourceName = url.getFile ();
            if (resourceName.startsWith ("/"))
                resourceName = resourceName.substring (1);
            
            final InputStream result = loader.getResourceAsStream (resourceName);
            
            if (result == null)
                throw new IOException ("resource [" + resourceName + "] could "
                + "not be found by classloader [" + loader.getClass ().getName ()
                + "]");
            
            return result; 
        }
        protected ClassLoaderResourceURLConnection (final URL url)
        {
            super (url);
        }
    } // End of nested class.
    
    
    private static String combineResourceNames (String base, String relative)
    {
        if ((base == null) || (base.length () == 0)) return relative;
        if ((relative == null) || (relative.length () == 0)) return base;
        
        if (relative.startsWith ("/"))
            // 'relative' is actually absolute in this case.
            return relative.substring (1);
        
        if (base.endsWith ("/"))
            return base.concat (relative);
        else
        {
            // Replace the name segment after the last separator:
            final int lastBaseSlash = base.lastIndexOf ('/');
            
            if (lastBaseSlash < 0)
                return relative;
            else
                return base.substring (0, lastBaseSlash).concat ("/")
                    .concat (relative);
        }
    }
} // End of class.

There are a few subtle points in the above code. For historical reasons, java.net.URL design is very biased towards HTTP-like URLs. Although openConnection() is the only method that is a mandatory override from java.net.URLStreamHandler, I also override toExternalForm() and parseURL(). This is necessary to prevent the input string from being parsed as an HTTP-like URL. I need to keep intact the part of the input string after the colon (this is the "opaque" part of my URL scheme). I accomplish that by masquerading all of it as the "file" part in the constructed URL. This is done by calling the protected setURL() handler method, which in turn dispatches to the protected and otherwise inaccessible URL.set() method. (Why is URL.set() a protected method in a final class? Your guess is as good as mine.)

The combination of methods overridden in ClassLoaderResourceHandler and its nested connection class is enough to make URL.openStream() and related methods work by connecting the URL input stream to the classloader resource input stream. (It is still not enough to make URL.getContent() work, but I leave that as an exercise for you.) For simplicity, I use the current thread's context classloader for all resource lookup (see comments in code for other strategies).

As a demo of how this works, let's add a single line:

    clsloader = ClassLoaderResourceHandler

to a file "url.properties" that will be packaged along with the URLFactory classes and try the new URL syntax with this simple Swing application that displays a javax.swing.ImageIcon:

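import java.awt.GridLayout;
import java.io.IOException;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;

public class URLDemo extends JPanel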
{
    public static void main (final String[] args)
    {
        if (args.length == 0)
        {
            System.out.println ("usage: URLDemo url");
            System.exit (1);
        }
        final String iconURL = args [0];
        
        JFrame frame = new JFrame ("URL demo");
        
        frame.setDefaultCloseOperation (JFrame.EXIT_ON_CLOSE);
        URLDemo newContentPane = new URLDemo (iconURL);
        newContentPane.setOpaque (true);
        
        frame.setContentPane (newContentPane);
        frame.pack ();
        frame.setVisible (true);
    }
    public URLDemo (final String iconURL)
    {
        super (new GridLayout(1, 1));
        ImageIcon icon = createImageIcon (iconURL, "JavaWorld logo");
        JLabel label = new JLabel ("image loaded from " + iconURL,
                                   icon, JLabel.CENTER);
        add (label);
    }
    private static ImageIcon createImageIcon (final String url,
                                              final String description)
    {
        try
        {
            return new ImageIcon (URLFactory.newURL (url), description);
        }
        catch (IOException ioe)
        {
            System.err.println ("couldn't load icon URL: " + url);
            ioe.printStackTrace ();
            
            return null;
        }
    }
    
} // End of class

This demo is a reimplemented LabelDemo from Sun Microsystems' Swing tutorial. It has been modified to take the icon image URL as a command-line parameter. It then passes the URL string into a modified version of createImageIcon() that uses URLFactory. Assuming all classes and resources have been packaged in urldemo.jar, the following all work equally well:

    java -cp urldemo.jar URLDemo clsloader:images/jwlogo.gif
    java -cp urldemo.jar URLDemo jar:file:urldemo.jar!/images/jwlogo.gif
    java -cp urldemo.jar URLDemo http://www.javaworld.com/images/top_jwlogo.gif

(However, only the first option is truly disk position-independent and requires no connection to an HTTP server.)

If for some reason you need to go back to using files, it can be done without any code modifications. Assuming you extract all images from urldemo.jar into a local directory files, this will work too:

    java -cp urldemo.jar URLDemo file:files/jwlogo.gif

(You can experiment with these options using this article's download.)

This kind of resource strategy flexibility is impossible in the original LabelDemo, which is caught in a dilemma: the javax.swing.ImageIcon constructor takes either a filename string or a URL. The first option is disk position-dependent. The second is not, but such a URL cannot be constructed by feeding a single String to a URL constructor to make that URL reference a classloader resource. This forces the original LabelDemo to hardcode just one strategy in its version of createImageIcon():

    protected static ImageIcon createImageIcon(String path,
                                               String description) {
        java.net.URL imgURL = LabelDemo.class.getResource(path);
        if (imgURL != null) {
            return new ImageIcon(imgURL, description);
        } else {
            System.err.println("Couldn't find file: " + path);
            return null;
        }
    }

Another powerful option: base and relative URLs

Before moving on, I'll mention another good design strategy. It is somewhat underappreciated that the java.net.URL constructors that take a URL context parameter essentially allow you to build URLs by combining two parts, a base URL and a relative URL. The following:

   URL base = URLFactory.newURL ("clsloader:images/");
   URL url = URLFactory.newURL (base, "jwlogo.gif");

uses a two-step process to build a URL equivalent to URLFactory.newURL("clsloader:images/jwlogo.gif"). However, imagine that base is constructed in a special bootstrap part of the application and can be set to URLFactory.newURL("clsloader:images_en_US/") or URLFactory.newURL("clsloader:images_fr_CH/") based on external requirements. You have just gained the ability to localize all image resource URLs in the application with very little work while keeping the locale selection decision to a single point in the Java code. You should get a lot of other ideas at this point.

The opportunities here are truly limitless. Because file: URLs always accept forward slashes as platform-independent separators (and that's what I also happen to use in my clsloader: custom scheme), I can even transparently alternate the base URLs between those two schemes! In one fell swoop I can change the resource loading strategy from disk files to classpath resources with almost no code changes.

To support URL recombination in your own schemes, you need to be prepared to handle the non-null context parameter passed into the parseURL() method and have a way of combining it with the spec string. This is what I do in the ClassLoaderResourceHandler.combineResourceNames() private method. The rules are similar to those for combining HTTP and file: URLs (names ending in "/" are treated as directories). You can even add processing for "." and ".." in resource names: the result feels just like filesystem browsing of the classpath without ever directly touching the disk!
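The base-plus-relative resolution described in this section can be tried out with stock file: URLs alone. This is a minimal sketch that calls the java.net.URL context constructor directly rather than a two-argument URLFactory.newURL() overload (which the URLFactory listing above does not show); the class name BaseUrlDemo and the /opt/app paths are illustrative assumptions.

```java
import java.net.MalformedURLException;
import java.net.URL;

@SuppressWarnings ("deprecation") // URL constructors are deprecated in recent JDKs
public class BaseUrlDemo
{
    /** Resolves 'relative' against 'base' using the stock java.net.URL rules. */
    public static URL resolve (final URL base, final String relative)
        throws MalformedURLException
    {
        return new URL (base, relative);
    }

    public static void main (final String [] args) throws MalformedURLException
    {
        // Hypothetical locale-specific base, chosen once at bootstrap time:
        final URL base = new URL ("file:/opt/app/images_en_US/");

        // A base ending in "/" is treated as a directory:
        System.out.println (resolve (base, "jwlogo.gif").getPath ());

        // Otherwise the last name segment of the base is replaced:
        final URL sibling = new URL ("file:/opt/app/images_en_US/old.gif");
        System.out.println (resolve (sibling, "jwlogo.gif").getPath ());
    }
}
```

Both calls resolve to the path /opt/app/images_en_US/jwlogo.gif. Switching the whole application to another image set then means changing only the one line that builds the base URL.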

getResources(): the smart way to load plug-ins

I now come to the second idea promised at the end of "Smartly Load Your Properties." I think of it as a zero code change pluggability pattern.

Hidden in URLFactory.loadHandlerList()'s implementation is a small Java design gem: retrieving a list of all identically named ("url.properties" in this case) classloader resources via ClassLoader.getResources(). This allows me to merge handler mapping data from as many url.properties resources as are deployed in the classpath (without knowing where they are exactly). This idea is borrowed from JNDI internals and is a rare Java situation where duplicate classpath entries would be there by design.
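Stripped of the handler instantiation, the merging idea fits in a few lines. This is a minimal sketch, assuming a hypothetical helper class MergedMappings; unlike URLFactory.loadHandlerList(), it lets a later mapping silently overwrite an earlier one with the same key instead of rejecting the conflict.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Properties;

public class MergedMappings
{
    /**
     * Merges every resource called 'name' that 'loader' can see into a
     * single Properties object. Later resources win on duplicate keys.
     */
    public static Properties loadAll (final ClassLoader loader, final String name)
        throws IOException
    {
        final Properties merged = new Properties ();

        // getResources() returns *all* matches, not just the first one:
        final Enumeration<URL> resources = loader.getResources (name);
        while (resources.hasMoreElements ())
        {
            try (InputStream in = resources.nextElement ().openStream ())
            {
                merged.load (in);
            }
        }

        return merged;
    }

    public static void main (final String [] args) throws IOException
    {
        final ClassLoader loader = Thread.currentThread ().getContextClassLoader ();
        System.out.println (loadAll (loader, "url.properties"));
    }
}
```

With one url.properties in each of two jars on the classpath, the result contains the mappings from both, and neither jar needs to know about the other.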

To show this off, let me pretend that long after deploying my URLDemo application I decide to extend it by adding a plug-in that implements a new custom URL scheme. This will feel quite artificial, but just as an illustration I create a urljoin: protocol that joins data streams from several nested URLs. That is, a URL

    "urljoin:"<url_1>","<url_2>","...

will first read data from url_1, then url_2, and so on. I demo it by reading jwlogo.gif in two chunks. A less crazy example of using urljoin: is to merge application-global and module-specific .properties configurations into one while keeping the underlying files separate.
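The stream-joining half of such a scheme can be sketched with java.io.SequenceInputStream. This is not the URLJoinHandler from the article's download, only one plausible way to get the "read url_1, then url_2" behavior; the class name JoinedStreams is an assumption, and splitting the urljoin: string into its nested URLs is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JoinedStreams
{
    /** Opens each URL in order and returns one stream that reads them back to back. */
    public static InputStream openJoined (final List<URL> parts) throws IOException
    {
        final List<InputStream> streams = new ArrayList<InputStream> ();
        try
        {
            for (URL part : parts)
                streams.add (part.openStream ());
        }
        catch (IOException ioe)
        {
            // Do not leak the streams that were already opened:
            for (InputStream opened : streams)
            {
                try { opened.close (); } catch (IOException ignore) {}
            }
            throw ioe;
        }

        // SequenceInputStream moves on to the next stream at each end-of-stream:
        return new SequenceInputStream (Collections.enumeration (streams));
    }
}
```

A custom URLConnection for the urljoin: scheme could return such a stream from its getInputStream() method after constructing the nested URLs.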

The new scheme's implementation is quite straightforward (it is in the class URLJoinHandler in this article's download). Next, I create a new url.properties resource with a mapping

    urljoin = URLJoinHandler,

split jwlogo.gif into two pieces, and package everything together with the new classes into plugin.jar. The new URL scheme is now picked up automatically:

    java -cp urldemo.jar;plugin.jar URLDemo urljoin:clsloader:images/jwlogo_1.gif,clsloader:images/jwlogo_2.gif

I want to emphasize what just happened: I extended the original application by merely adding a new jar to the classpath without changing a single line of code or even editing an existing property file. This is simply the ultimate in pluggability. I used this approach to extend my original URLFactory's capabilities, but the same idea obviously works in any plug-in scenario. And it does not require scanning anything on disk.

A sure sign of a novice Java programmer is to instead use code that figures out a classpath string of some kind and actually proceeds to scan all classpath directories and archives looking for "all classes in a given package" or other signs of plug-ins. That is really a hack that adds a lot of disk position-dependent code in your application. Java may one day learn to load classes from other types of archives besides .zip files (in fact, some JVMs already do that), so do not make your application dependent on a particular disk archive format. Let getResources() and a pattern of identically named classloader resources guide you towards disk position-independent nirvana.

So, what about JAR URLs or java.net.URIs?

Suffice it to say that JAR URLs are useful, but not for abstracting away the resource location details. Java 2 Platform, Standard Edition (J2SE) 1.4 also adds java.net.URI. This new class possibly fits the need for a generic resource identifier better than java.net.URL, but it is too new to be of considerable value at the time of writing. A URI needs to be converted to a URL via toURL(), and the tricks used for coercing URI data to java.net.URL format under the hood are similar to what I used in my URLStreamHandlers above.
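A short sketch shows both sides of that trade-off: java.net.URI parses a custom scheme with an opaque part without any handler being installed, but converting it to a java.net.URL still requires one. The class name OpaqueUriDemo and its helper methods are illustrative assumptions.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class OpaqueUriDemo
{
    /** Summarizes how java.net.URI splits 'spec' into scheme and the rest. */
    public static String describe (final String spec)
    {
        final URI uri = URI.create (spec);

        return uri.getScheme () + " | " + uri.getSchemeSpecificPart ()
            + " | opaque=" + uri.isOpaque ();
    }

    /** Returns true if the JVM has a stream handler for the scheme in 'spec'. */
    public static boolean convertsToURL (final String spec)
    {
        try
        {
            URI.create (spec).toURL ();
            return true;
        }
        catch (MalformedURLException e)
        {
            return false; // no handler installed for this scheme
        }
    }

    public static void main (final String [] args)
    {
        System.out.println (describe ("clsloader:images/jwlogo.gif"));
        System.out.println (convertsToURL ("clsloader:images/jwlogo.gif"));
    }
}
```

For "clsloader:images/jwlogo.gif" the URI reports the scheme clsloader and the opaque part images/jwlogo.gif, while the toURL() conversion fails unless a clsloader handler has been installed JVM-wide.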

Concluding remarks

Finally, I would like for you to take away at least the following points:

Smart resource descriptors like java.net.URLs help abstract away not only the location of resources but also many aspects of the resource loading strategy. If the requirements force you to revert to using files, using URLs can help insulate this fact from the rest of your code. It is not hard to reinvent custom descriptor classes, but java.net.URLs have the advantage because they are already integrated into core Java.
With a little foresight, you can make your applications extendible and pluggable without code or configuration changes by using ClassLoader.getResources() and a pattern of identically named resources.

posted @ 2014-07-25 01:10 princessd8251