Be Careful with file URLs
Different Ways to Name Files
There are (at least) five ways to name files:
-
The platform-specific notation, called pathnames here (e.g., /abc/def/ghi.txt on Unix, a:\bcd\efg\hij.txt on DOS and Windows, and abc:def:ghi.txt on Macintosh).
-
A UNC-like notation, called UNC names here (e.g., //./abc/def/ghi.txt or //./a:/bcd/efg/hij.txt). The osl layer used to make heavy use of these as a platform-independent notation, but since osl has shifted to file URLs as its platform-independent notation (see below), UNC names have been deprecated and have become pretty much useless; they are mentioned here only for completeness.
-
The file URLs used by the osl layer as a platform-independent notation, called osl URLs here (e.g., file:///abc/def/ghi.txt or file:///a:/bcd/efg/hij.txt). Read on to learn why it is important to explicitly label these file URLs as osl URLs.
-
The file URLs used by the File Content Provider (FCP) within the Universal Content Broker (UCB), called FCP URLs (e.g., file:///home/usr123/work/abc.txt or file:///user/work/abc.txt). Normally, osl URLs and FCP URLs are the same (after all, the FCP uses osl to access the files). But the FCP has a feature called mount points that allows it to restrict access to only certain files (those that lie below a given set of mount points in the file system hierarchy), and to give names to these files that hide their real locations.
For example, if you have a mount point named user at the osl URL file:///home/usr123, the osl URL file:///home/usr123/work/abc.txt corresponds to the FCP URL file:///user/work/abc.txt. If you only have that single mount point, the osl URL file:///home/usr567/work/def.txt has no corresponding FCP URL (and cannot be accessed via the FCP).
-
The URLs used by the UCB, called UCB URLs (e.g., file:///a:/bcd/efg/hij.txt or vnd.sun.star.wfs:///user/work/abc.txt). Normally, FCP URLs and UCB URLs are the same, because the UCB hands file URLs directly to the FCP. But there is a special content provider, the Remote Access Content Provider (RAP), that can rewrite URLs before passing them on to other content providers. This is used, for example, in the Sun ONE Webtop (S1W), where there are typically two file systems: a client file system accessed via normal (FCP) file URLs (i.e., there is no rewriting RAP between the UCB and the client FCP), and a server file system accessed via (FCP) URLs where the file scheme has been replaced with vnd.sun.star.wfs (i.e., there is a rewriting RAP between the UCB and the server FCP).
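The mount point mapping described for FCP URLs can be illustrated with a small, self-contained sketch. It is not the FCP's implementation: the MountPoint struct and the oslToFcpUrl() function are hypothetical names introduced only for this illustration, and the sketch assumes that a mount point simply replaces a leading directory of the osl URL with the mount point's name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one FCP mount point: a public name plus the osl URL
// of the directory it exposes (written without a trailing slash).
struct MountPoint {
    std::string name;     // e.g. "user"
    std::string oslBase;  // e.g. "file:///home/usr123"
};

// Maps an osl URL to the FCP URL under the first matching mount point.
// Returns false (leaving fcpUrl untouched) if the file lies below no
// mount point, i.e. it has no FCP URL at all.
bool oslToFcpUrl(const std::vector<MountPoint>& mounts,
                 const std::string& oslUrl, std::string& fcpUrl) {
    for (const MountPoint& m : mounts) {
        const std::size_t n = m.oslBase.size();
        // Match only on a path-segment boundary, so that
        // file:///home/usr1234 is not taken to lie below file:///home/usr123.
        if (oslUrl.compare(0, n, m.oslBase) == 0 &&
            (oslUrl.size() == n || oslUrl[n] == '/')) {
            fcpUrl = "file:///" + m.name + oslUrl.substr(n);
            return true;
        }
    }
    return false;
}

// Usage, mirroring the example from the text.
void mountPointExample() {
    const std::vector<MountPoint> mounts = {{"user", "file:///home/usr123"}};
    std::string fcpUrl;
    bool mapped = oslToFcpUrl(mounts, "file:///home/usr123/work/abc.txt", fcpUrl);
    assert(mapped && fcpUrl == "file:///user/work/abc.txt");
    mapped = oslToFcpUrl(mounts, "file:///home/usr567/work/def.txt", fcpUrl);
    assert(!mapped);  // not below any mount point: no FCP URL exists
}
```

With an empty list of mount points this model rejects every URL; a plain installation, where osl URLs and FCP URLs coincide, corresponds to not applying any mapping at all.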
The last two notations (FCP URLs and UCB URLs) are relatively unknown, because in a plain OpenOffice installation neither mount points nor the RAP are used, so that osl URLs, FCP URLs and UCB URLs are all identical. But when you want to write correct code that also works in unusual deployments (or in the S1W, which should not be regarded as too unusual), you have to be well aware of these different notations all labeled as "URLs."
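The scheme replacement between UCB URLs and FCP URLs in the S1W case can be sketched in a few lines. This is only an illustration of the idea, not the RAP's implementation: rewriteScheme() is a hypothetical helper, and the sketch assumes that rewriting touches nothing but the scheme.

```cpp
#include <cassert>
#include <string>

// Replaces the scheme of url with toScheme if the URL currently uses
// fromScheme; any other URL is returned unchanged. The comparison is
// case-sensitive, which is a simplification.
std::string rewriteScheme(const std::string& url,
                          const std::string& fromScheme,
                          const std::string& toScheme) {
    const std::string prefix = fromScheme + ":";
    if (url.compare(0, prefix.size(), prefix) == 0) {
        return toScheme + ":" + url.substr(prefix.size());
    }
    return url;
}

void schemeRewriteExample() {
    // Server file system: the UCB URL differs from the FCP URL in its scheme.
    assert(rewriteScheme("vnd.sun.star.wfs:///user/work/abc.txt",
                         "vnd.sun.star.wfs", "file")
           == "file:///user/work/abc.txt");
    // Client file system: no rewriting, UCB URL and FCP URL are identical.
    assert(rewriteScheme("file:///a:/bcd/efg/hij.txt",
                         "vnd.sun.star.wfs", "file")
           == "file:///a:/bcd/efg/hij.txt");
}
```

Code that treats every URL starting with file: as something it may hand to osl directly would silently miss the first case, which is the kind of mistake this document warns about.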
Where Different Notations are Used
As mentioned before, use of UNC names is deprecated. Also, since most code accesses the FCP not directly, but via the UCB, FCP URLs are only of interest to hard core UCB users (who should know what they are doing, anyway). So, in the following we can concentrate on three different notations: pathnames, osl URLs, and UCB URLs.
Where Pathnames are Used
Pathnames are used in only a few places, because the default notation used by osl (the lowest level of concern to us) is already osl URLs (which are a level above pathnames). It can be argued that interfaces that use pathnames should use osl URLs instead, and that pathnames are only of interest when communicating with the external world (other processes, or the human user).
One place where pathnames are used is class utl::TempFile.
Where osl URLs are Used
The osl file system functions (in osl/file.h and osl/file.hxx) now generally use osl URLs in their interfaces.
There should be few places above osl where osl URLs instead of UCB URLs are used (because generally all file access should be done through the UCB, and not directly via osl). One notable exception is the handling of temporary files (see above).
Where UCB URLs are Used
Generally, all interfaces that are designed to communicate resource names within the OpenOffice framework should use UCB URLs, and all implementations that access resources by these names should do so via the UCB. An advantage of this is that, without any extra effort, not only file resources can be accessed, but also other resources like HTTP and FTP (by using appropriate URLs, which can stay opaque to the code and are interpreted only by the UCB).
Converting between Different Notations
Sometimes it may be necessary to convert between different notations, and routines to do so are readily available:
-
The methods osl::FileBase::getFileURLFromSystemPath() and osl::FileBase::getSystemPathFromFileURL() (and their plain C counterparts in osl/file.h) convert between pathnames (called "system paths" here) and osl URLs.
-
The methods utl::LocalFileHelper::ConvertSystemPathToURL() and utl::LocalFileHelper::ConvertURLToSystemPath() convert between pathnames (again called "system paths" here) and UCB URLs.
Because there can be scenarios where you have multiple FCPs on different file systems, it can be ambiguous how to convert from a pathname (that does not contain any information identifying a specific file system) to a UCB URL. Therefore, ConvertSystemPathToURL() requires an additional parameter BaseURL that identifies the FCP to be used.
-
There are convenience methods utl::LocalFileHelper::ConvertPhysicalNameToURL() and utl::LocalFileHelper::ConvertURLToPhysicalName() that choose the local FCP as BaseURL and then forward to the above LocalFileHelper methods.
For this to work, the UCB maintains a notion of locality of content providers. This is a heuristic algorithm based on how the UCB accesses individual content providers (within the same process, via a pipe on the same machine, via a socket over a network). The net effect is that the UCB should always choose as most local the FCP running on the same machine as the UCB, and using these LocalFileHelper methods will then always convert between UCB URLs and pathnames that are valid on this machine. ConvertURLToPhysicalName() also makes sure to do the conversion only if the given UCB URL corresponds to a local pathname (and not to a pathname on a non-local file system).
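To make the relationship between pathnames and file URLs concrete, here is a simplified stand-in that reproduces only the Unix and DOS/Windows example pairs used in this document. It is not how osl or LocalFileHelper implement the conversion: pathnameToFileUrl() is a hypothetical function, it performs no escaping of special characters, and it knows nothing about Macintosh pathnames, mount points, or the BaseURL parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in: turns an absolute Unix pathname (/abc/def) or a
// DOS/Windows pathname with a drive letter (a:\bcd) into a file URL.
// Returns false for anything else. No percent-encoding is performed.
bool pathnameToFileUrl(const std::string& path, std::string& url) {
    if (!path.empty() && path[0] == '/') {
        url = "file://" + path;  // /abc/def -> file:///abc/def
        return true;
    }
    if (path.size() >= 3 &&
        std::isalpha(static_cast<unsigned char>(path[0])) &&
        path[1] == ':' && path[2] == '\\') {
        std::string rest = path;
        std::replace(rest.begin(), rest.end(), '\\', '/');
        url = "file:///" + rest;  // a:\bcd -> file:///a:/bcd
        return true;
    }
    return false;
}

void pathnameExample() {
    std::string url;
    bool ok = pathnameToFileUrl("/abc/def/ghi.txt", url);
    assert(ok && url == "file:///abc/def/ghi.txt");
    ok = pathnameToFileUrl("a:\\bcd\\efg\\hij.txt", url);
    assert(ok && url == "file:///a:/bcd/efg/hij.txt");
    ok = pathnameToFileUrl("ghi.txt", url);  // relative pathname: rejected
    assert(!ok);
}
```

The sketch returns a status and writes its result through an out parameter so that a failed conversion is visible to the caller; real code should call the conversion routines listed above rather than assemble URLs by hand.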
There is no direct way to convert between osl URLs and UCB URLs. To convert from an osl URL to a UCB URL, use osl::FileBase::getSystemPathFromFileURL() followed by utl::LocalFileHelper::ConvertPhysicalNameToURL(). To convert from a UCB URL to an osl URL, use utl::LocalFileHelper::ConvertURLToPhysicalName() followed by osl::FileBase::getFileURLFromSystemPath(). But be aware that this only works if the osl URL and the UCB URL denote files within the same file system.
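The two-step conversion through an intermediate pathname can be sketched as a small composition in which a failure of either step aborts the whole conversion. The sketch does not call the real osl or utl routines: Step, convertViaPathname() and the two lambdas are hypothetical stand-ins, the lambdas model only a plain installation on Unix where both URL notations coincide, and the HTTP URL is a made-up example.

```cpp
#include <cassert>
#include <functional>
#include <string>

// One conversion step: returns false on failure and then leaves out untouched.
using Step = std::function<bool(const std::string& in, std::string& out)>;

// Chains two conversion steps through an intermediate pathname.
bool convertViaPathname(const Step& toPathname, const Step& fromPathname,
                        const std::string& in, std::string& out) {
    std::string pathname;
    if (!toPathname(in, pathname)) {
        return false;  // e.g. the URL does not denote a local file
    }
    return fromPathname(pathname, out);
}

void twoStepExample() {
    // Toy stand-ins for the two real conversion routines (assumption:
    // plain installation on Unix, no escaping, no mount points, no RAP).
    const std::string prefix = "file://";
    Step urlToPath = [prefix](const std::string& in, std::string& out) {
        if (in.compare(0, prefix.size(), prefix) != 0) return false;
        out = in.substr(prefix.size());
        return true;
    };
    Step pathToUrl = [prefix](const std::string& in, std::string& out) {
        if (in.empty() || in[0] != '/') return false;
        out = prefix + in;
        return true;
    };

    std::string ucbUrl;
    bool ok = convertViaPathname(urlToPath, pathToUrl,
                                 "file:///abc/def/ghi.txt", ucbUrl);
    assert(ok && ucbUrl == "file:///abc/def/ghi.txt");
    ok = convertViaPathname(urlToPath, pathToUrl,
                            "http://example.org/x", ucbUrl);
    assert(!ok);  // no pathname exists, so the conversion fails as a whole
}
```

In real code the two steps would be the osl and LocalFileHelper routines named above, and the caller still has to ensure that both URLs refer to the same file system.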
Author: Stephan Bergmann (Last modification $Date: 2003/12/06 22:37:31 $). Copyright 2001 OpenOffice.org Foundation. All Rights Reserved.