Why consecutive slashes in a resource path fail in Jar files

Published by Bilal Kaun on

Resource paths with consecutive slashes seem to work in the IDE (STS and IntelliJ) but fail to load when the application is deployed as a Jar file. Let’s see why.

TL;DR: To speed up the lookup of resources and classes, Java’s Zip implementation first matches filenames by hash rather than by a full string comparison. A path with a double slash hashes to an entirely different value, one that matches no entry in any jar file on the classpath. On a filesystem, the OS treats a slash as a path separator [5] and not as part of the name itself, so a path with double (or more) slashes still resolves to the actual filesystem object.


The problem

This scenario is all too common, especially when dealing with Thymeleaf templates [1, 2]. Thymeleaf is configured with a path prefix and suffix, which it prepends and appends to the template name provided by the developer (in an effort to increase readability and reduce boilerplate code). The prefix is usually templates/ and the suffix is usually .html. Should the developer start the template name with a slash and return /index instead of index, the final template name after the prefix and suffix are applied would be templates//index.html. Notice the double slash in the path.
This path requests the index.html file under the templates folder, which itself is under the resources folder in a typical Spring MVC application. When the code is executed from an IDE, the IDE normally does not package the project as a Jar file before running it, so the error goes unnoticed because the results come back as expected. Meanwhile, the jar-packaged application running elsewhere fails on the same piece of code.
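To make the concatenation concrete, here is a minimal sketch. It does not use Thymeleaf itself; the class name TemplatePathDemo and the resolve method are hypothetical, and the prefix and suffix values are simply the typical ones mentioned above.

```java
public class TemplatePathDemo {

    // Assumed values: the typical prefix and suffix mentioned in the text.
    static final String PREFIX = "templates/";
    static final String SUFFIX = ".html";

    /** Mimics a plain concatenation of prefix + template name + suffix. */
    static String resolve(String templateName) {
        return PREFIX + templateName + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(resolve("index"));   // templates/index.html
        System.out.println(resolve("/index"));  // templates//index.html
    }
}
```

The leading slash in the template name is all it takes to produce the double slash in the final path.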

Analysis

In order to reproduce this problem, observe the following two lines of code:

(1) MainClass.class.getResourceAsStream("/demo/sample.txt");

(2) MainClass.class.getResourceAsStream("/demo//sample.txt");

The first line (1) above locates the resource under the demo folder and returns an InputStream for it (or null if the resource is not found). Whether executed inside or outside a jar, the result is the same: the resource is found and an InputStream reference is returned.

The second line (2), which is identical to the first save for the double slash, behaves inconsistently inside and outside a jar environment. When run from an IDE or the command line, this line succeeds in returning an InputStream. Yet when run from a Jar file, the call returns null.
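The difference can be reproduced without an IDE. The sketch below is an illustration under a few assumptions: it uses a URLClassLoader with no parent instead of MainClass, builds a throwaway directory and a throwaway jar that both contain demo/sample.txt, and asks each for the resource with one and with two slashes. The class and method names are hypothetical. Note that ClassLoader.getResource takes names without the leading slash that Class.getResourceAsStream accepts.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DoubleSlashRepro {

    /** True if a class loader backed only by the given location finds the resource. */
    static boolean canFind(Path location, String resourceName) throws IOException {
        URL[] urls = { location.toUri().toURL() };
        // Parent is null so that only our directory or jar is searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(resourceName) != null;
        }
    }

    /** Creates a temp directory containing demo/sample.txt. */
    static Path createDirectoryLayout() throws IOException {
        Path root = Files.createTempDirectory("repro-dir");
        Path demo = Files.createDirectories(root.resolve("demo"));
        Files.write(demo.resolve("sample.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        return root;
    }

    /** Creates a temp jar containing a single entry named demo/sample.txt. */
    static Path createJar() throws IOException {
        Path jar = Files.createTempFile("repro", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("demo/sample.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path dir = createDirectoryLayout();
        Path jar = createJar();
        for (String name : new String[] {"demo/sample.txt", "demo//sample.txt"}) {
            System.out.println(name + " -> directory: " + canFind(dir, name)
                    + ", jar: " + canFind(jar, name));
        }
    }
}
```

If the behaviour described above holds on your JDK, only the combination of the jar and the double-slash name should report false.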

Searching for Resource

It all starts with the Classloader (please see these two fantastic articles on Classloaders: an intro [3] and a deep dive [4]). Briefly, and in loosely technical terms, Java provides a mechanism to locate classes and resources and load them into the JVM. This is how the libraries and dependencies of the program being executed get loaded. Classloaders use a delegation model to establish a search order: discounting caches, the search starts at the topmost parent (the bootstrap classloader), then moves to its child (the platform classloader), then to the application classloader, and so on. An exhaustive discussion of Classloaders is out of scope for this post; for our purposes we will focus on classloaders searching the classpath.
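The delegation chain can be inspected at runtime. This is a small sketch, assuming Java 9 or later (ClassLoader.getName() does not exist before that); the class name LoaderChainDemo is hypothetical. It walks from the system (application) classloader up through its parents. The bootstrap classloader is represented by null in the API, so the sketch adds a label for it by hand.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainDemo {

    /** Collects loader names from the given loader up to the bootstrap loader. */
    static List<String> chainFrom(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            String name = cl.getName(); // may be null for unnamed loaders
            names.add(name != null ? name : cl.getClass().getName());
        }
        // getParent() returns null once the bootstrap loader is reached.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        // Printed child-first; lookups are delegated in the reverse order.
        System.out.println(chainFrom(ClassLoader.getSystemClassLoader()));
    }
}
```

On a standard JDK the list typically reads from the application loader up to the bootstrap loader, which is the reverse of the order in which a lookup is attempted.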

Omitting many of the finer nuances and exact details, the overall flow is as follows: the Classloader calls on the URLClassPath class to find and create a URL instance for the resource "/demo/sample.txt". This class works with the URL class path loader(s) initialized for that particular classloader. A common example would be a URLClassPath$JarLoader being the loader for the PlatformClassloader.

The flow we’re interested in for the purpose of our discussion is the AppClassloader calling URLClassPath::findResource. When the program is running as a Jar app, the loader available inside URLClassPath is the URLClassPath$JarLoader. However, when running standalone from compiled class directories (IDE or via CLI), the loader available to URLClassPath is one that reads resources directly from the filesystem.

Zip files [6] (and therefore Jar files) store files and directories in a randomly accessible structure, with an index at the end of the file (the Central Directory). This index contains references to all the files and directories in the archive. For each record, the filename is the entire relative path (i.e. any directories plus the file name).
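To see what those stored names look like, the sketch below writes a tiny archive and lists its entries through java.util.zip.ZipFile. The class name, method names and the archive contents are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class CentralDirectoryListing {

    /** Writes an archive with a directory entry and one file entry. */
    static Path createArchive() throws IOException {
        Path archive = Files.createTempFile("listing", ".jar");
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("demo/"));           // directory record
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("demo/sample.txt")); // file record
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return archive;
    }

    /** Returns the entry names exactly as the archive stores them. */
    static List<String> entryNames(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : entryNames(createArchive())) {
            System.out.println(name);
        }
    }
}
```

Each name is a single flat string that happens to contain slashes; the archive has no notion of a folder being traversed one level at a time.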

In the JVM, the JarFile class (which extends ZipFile) processes the jar archive upon first initialization and stores the central directory offset of every entry against an integer hash of its file name in an in-memory integer array (inside the static nested class ZipFile.Source). This makes lookups of resources fast. However, the hash is computed over the file name exactly as provided, both for names read from the central directory and for the resource name queried by the class loader. In our example (2) above, the entry in the central directory does not have consecutive slashes and is hashed as such, whereas the string queried in (2) contains consecutive slashes and so produces a different hash. The lookup inside ZipFile.Source::getEntryPos(String, boolean) [7] fails as a result. Both the ZipFile and the classloader then exhaust their search, and the application sees a resource-not-found result that is unique to jar execution and does not occur on the OS filesystem.
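The idea can be modelled in a few lines. The sketch below is a deliberately simplified stand-in, not the JDK's code: it uses a HashMap keyed by String.hashCode() where ZipFile.Source uses its own hash and integer arrays, and the class name, method names and the offset value are all made up. What it shares with the real thing is the principle: hash the name as given, then confirm the candidate with an exact comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashedIndexDemo {

    /** One index record: the stored name plus a (made-up) offset. */
    static final class Record {
        final String name;
        final int offset;

        Record(String name, int offset) {
            this.name = name;
            this.offset = offset;
        }
    }

    // Hash of the entry name -> records whose names share that hash.
    private final Map<Integer, List<Record>> buckets = new HashMap<>();

    /** Registers a name exactly as it appears in the archive index. */
    void add(String name, int offset) {
        buckets.computeIfAbsent(name.hashCode(), h -> new ArrayList<>())
               .add(new Record(name, offset));
    }

    /** Returns the offset for the name, or -1 if no record matches exactly. */
    int find(String name) {
        List<Record> candidates = buckets.get(name.hashCode());
        if (candidates == null) {
            return -1; // nothing was ever hashed to this value
        }
        for (Record r : candidates) {
            if (r.name.equals(name)) {
                return r.offset; // hash matched and the name is identical
            }
        }
        return -1; // hash collision only; the names differ
    }

    public static void main(String[] args) {
        HashedIndexDemo index = new HashedIndexDemo();
        index.add("demo/sample.txt", 42);
        System.out.println(index.find("demo/sample.txt"));  // 42
        System.out.println(index.find("demo//sample.txt")); // -1
    }
}
```

Nothing in this lookup normalizes the queried name, so the extra slash is enough to miss the entry.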

Summary

This article explains an error that occurs only when an application runs from a jar. It stems from a difference between how classloaders look up resources on the filesystem and how they look them up inside Jar files when the path contains consecutive path separators (slashes). Since Jar files are structurally Zip files, the file index of the archive stores the full relative path as each file’s name. This name is then compared to the path queried by the user’s code, which may contain consecutive slashes. The comparison is strict, so an archive entry named demo/abc.txt does not match a queried path of demo//abc.txt, and the lookup fails.

References

01 – Stackoverflow – https://stackoverflow.com/a/67796868/3084706
02 – Git Issue – https://github.com/spring-projects/spring-boot/issues/1744
03 – Baeldung : Classloaders – https://www.baeldung.com/java-classloaders
04 – VividBreeze : Class Loading – https://dev.vividbreeze.com/jvm-classloading/
05 – Superuser – https://superuser.com/a/1412261
06 – Zip Specification – https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
07 – getEntry(…) – https://github.com/openjdk/jdk/…/java/util/zip/ZipFile.java#L1623
