BlockFile
BlockFile
is an interface for accessing files full of identically sized "blocks" of data. This interface is like
IntFile
, in that they both describe the abstract methods for accessing their respective files, and both have two concrete implementations: one for mapped file access, and the other for normal I/O access. In the case of
BlockFile
these are
MappedBlockFile
and
IOBlockFile
.
Unlike
IntFile
, the concrete implementations for
BlockFile
share a significant portion of their code. This is shared between the two classes by having them both extend an abstract class called
AbstractBlockFile
, which in turn implements
BlockFile
. So
MappedBlockFile
and
IOBlockFile
do not implement
BlockFile
directly.
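The arrangement can be pictured with a minimal sketch. The types below are simplified stand-ins that reuse the names from the text; the trimmed method set and the accessMode() helper are assumptions made for illustration and are not taken from the Kowari source.

```java
// Simplified stand-ins for the types named above. The method set is trimmed
// and accessMode() is a hypothetical helper added only for this illustration.
interface BlockFile {
    long getNrBlocks();
    void setNrBlocks(long nrBlocks);
}

// Code common to both implementations lives here, once.
abstract class AbstractBlockFile implements BlockFile {
    private long nrBlocks;

    public long getNrBlocks() {
        return nrBlocks;
    }

    public void setNrBlocks(long nrBlocks) {
        if (nrBlocks < 0) {
            throw new IllegalArgumentException("negative length");
        }
        this.nrBlocks = nrBlocks;
    }

    // Each concrete class only supplies what differs.
    abstract String accessMode();
}

class MappedBlockFile extends AbstractBlockFile {
    String accessMode() {
        return "mapped";
    }
}

class IOBlockFile extends AbstractBlockFile {
    String accessMode() {
        return "explicit";
    }
}
```

Calling code can hold either implementation through a BlockFile reference, while the shared length bookkeeping is written only in the abstract class.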
Redundant Interfaces
This arrangement makes
BlockFile
superfluous, since
AbstractBlockFile
can be used in every context where
BlockFile
might have been. However, using an interface in this way is a well-established technique in Kowari. It allows alternative implementations to be developed rapidly as the need arises. It also permits an object that fulfils a number of roles to implement more than one interface. Both advantages help the codebase stay flexible for ongoing development.
Finally, these interfaces, redundant or not, provide a consistent way of documenting the method of using a class (its platonic "interface", as opposed to the grammatical construct in the Java language).
Records
The point of a
BlockFile
is to store a series of records of identical structure in a file. Each record will be exactly the same size. The data structure within that record is left up to the caller. Kowari uses
BlockFile
s to store numerous data types, including strings (of defined lengths), numbers, tree nodes, and linked lists.
Files which hold records like this are essential to scalable systems, as data from anywhere in the system can be found and modified rapidly. Variable length structures can be more space efficient, but unless they are indexed the file must be traversed before data can be found. Indexing makes retrieval rapid, but modification is still often impractical, since a record that changes size no longer fits in its original position.
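The reason fixed-size records can be found and modified quickly is that the position of record n is simply n multiplied by the record size. The following is a minimal sketch of that idea using java.io.RandomAccessFile. The class name, the 16 byte record layout and the helper methods are assumptions made for illustration and are not Kowari code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class FixedRecordDemo {
    // Hypothetical record layout: two longs, so 16 bytes per record.
    static final int RECORD_SIZE = 16;

    static void writeRecord(RandomAccessFile f, long id, long a, long b) throws IOException {
        f.seek(id * RECORD_SIZE); // direct jump, no traversal needed
        f.writeLong(a);
        f.writeLong(b);
    }

    static long[] readRecord(RandomAccessFile f, long id) throws IOException {
        f.seek(id * RECORD_SIZE);
        long a = f.readLong();
        long b = f.readLong();
        return new long[] { a, b };
    }

    // Writes two records, overwrites the first in place, and reads both back.
    static long[] demo() {
        try {
            File tmp = File.createTempFile("records", ".bin");
            tmp.deleteOnExit();
            try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
                writeRecord(f, 0, 1, 2);
                writeRecord(f, 5, 50, 51);
                writeRecord(f, 0, 7, 8); // in-place modification of record 0
                long[] r0 = readRecord(f, 0);
                long[] r5 = readRecord(f, 5);
                return new long[] { r0[0], r0[1], r5[0], r5[1], f.length() };
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Modifying record 0 does not disturb record 5, because every record occupies a slot of known size at a known offset.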
As a side note, the expected replacement for the Kowari file system
will use variable length structures, but with inherent indexing to make traversal faster than it currently is. The problem with modifying data will still exist, but this will be managed in a linear annotation queue, which will be merged back in to the main data silently in the background. The result should demonstrate much faster read and write operations, particularly for terabyte scale files.
Methods
setNrBlocks()
and
getNrBlocks()
are used to set and get the length of the block file. Since the file may only be viewed as a collection of blocks, its length is only ever measured in units of blocks.
As in all the other file-manipulating classes,
clear()
is used to remove all data from the file, and set it to zero length.
force()
is used to block the process until all data is written to disk.
unmap()
attempts to remove any existing file mappings,
close()
closes the file, and
delete()
closes the file and removes it from the disk. See
IntFile
for a more detailed explanation.
The remaining methods are all for manipulating the blocks within this file.
Blocks
The
Block
class was created to abstract away access to the records stored within a
BlockFile
. The block is always tied back to its location on disk, so any operation on that block always applies to that specific location. This is important for memory mapped files, as it means that a write to a block is immediately reflected in the file (though the operating system may delay the disk I/O) without any explicit write operation on the block.
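For the memory mapped case, the idea of a block being tied to its location can be sketched with a slice of a java.nio.MappedByteBuffer. This is only an illustration under stated assumptions: the class name, the 32 byte block size and the blockView() helper are hypothetical, and the real Block class carries more state than a bare buffer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedBlockDemo {
    static final int BLOCK_SIZE = 32; // bytes, arbitrary for the example

    // A "block" here is just a window onto one region of the mapping.
    static ByteBuffer blockView(MappedByteBuffer map, int blockId) {
        ByteBuffer dup = map.duplicate();
        dup.limit((blockId + 1) * BLOCK_SIZE);
        dup.position(blockId * BLOCK_SIZE);
        return dup.slice();
    }

    // Puts a value into block 2 through the mapping, then reads the same
    // file position back through ordinary file I/O.
    static long demo() {
        try {
            File tmp = File.createTempFile("mapped", ".blk");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, 4L * BLOCK_SIZE);
                ByteBuffer block = blockView(map, 2);
                block.putLong(0, 0xCAFEL); // no write() call is made on the block
                map.force();               // ask the OS to flush the mapping
                raf.seek(2L * BLOCK_SIZE);
                return raf.readLong();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The value stored through the slice appears at offset 2 * BLOCK_SIZE in the file, which is the sense in which a mapped block is always tied to its position.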
Implementations of
BlockFile
are also expected to manage
Block
s through an object pool. This was particularly important in earlier JVMs, where object allocation and garbage collection were expensive operations. Fortunately, recent versions of Java no longer suffer these effects, and an object pool can even slow a process down.
The other reason to use an object pool for blocks is to prevent the need to re-read any blocks still in memory. This is not an issue for memory mapped files, but if the blocks are accessed with read/write operations then they need to be initialized by reading from disk. The object pool for blocks will re-use a block associated with a particular file address whenever one is available. If one is not available, then the standard pool technique of re-initializing an existing object is used. With Java 1.5, it is this last operation which may be discarded.
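The pool behaviour described above can be sketched as follows. This is not the Kowari ObjectPool class. SimpleBlockPool and PooledBlock are hypothetical names, and the valid flag is an assumed way of recording whether a block's buffer still matches the data at its file position.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class PooledBlock {
    long blockId;
    boolean valid; // true when the buffer holds the data for blockId
    final ByteBuffer buffer;

    PooledBlock(long blockId, int size) {
        this.blockId = blockId;
        this.buffer = ByteBuffer.allocate(size);
    }
}

class SimpleBlockPool {
    private final int blockSize;
    // Released blocks, keyed by the file position they were last tied to.
    private final Map<Long, PooledBlock> released = new LinkedHashMap<>();

    SimpleBlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    PooledBlock get(long blockId) {
        // 1. Prefer a released block already tied to this position:
        //    its contents can be reused without touching the disk.
        PooledBlock hit = released.remove(blockId);
        if (hit != null) {
            return hit;
        }
        // 2. Otherwise re-initialise any released block for the new position.
        if (!released.isEmpty()) {
            Iterator<PooledBlock> it = released.values().iterator();
            PooledBlock recycled = it.next();
            it.remove();
            recycled.blockId = blockId;
            recycled.valid = false; // caller must read from disk
            return recycled;
        }
        // 3. Nothing pooled, so allocate a fresh block.
        return new PooledBlock(blockId, blockSize);
    }

    void release(PooledBlock block) {
        released.put(block.blockId, block);
    }
}
```

Step 2 is the part that recent JVMs make optional: allocating a new object instead of recycling one would be just as reasonable there, while step 1 still saves a disk read.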
As a consequence of the above, the remaining
BlockFile
methods all accept an
ObjectPool
as a parameter. This may appear unnecessary, since a pool would usually be associated with an instance of the
BlockFile
class, but in this case there are several pools which may be used, depending on the needs of the calling code.
Block
Methods
allocateBlock()
is used to get a
Block
for a particular file position, defined by a
blockId parameter, without reading the file for any existing data. This is for new positions in the file that were previously unused. There is no reason to read the data at this point, and the read operation would be time consuming. Writing the resulting block puts its contents into the file at the given block ID.
readBlock
is similar to
allocateBlock
, but the block is already expected to have data in it. The
Block
object is allocated and associated with the appropriate point in the file, and data is read into it.
Since read operations are never explicit in memory mapped files, the
allocateBlock()
and
readBlock()
methods are identical in the
MappedBlockFile
class.
Any blocks obtained from
allocateBlock()
and
readBlock()
should be passed to
releaseBlock()
once they are no longer needed. This allows the block to be put back into the object pool. While not strictly necessary, it should improve performance.
writeBlock()
writes the contents of a
Block
's buffer to that
Block
's location on disk. This performs a
write()
in
IOBlockFile
and is an empty operation (a no-op) in
MappedBlockFile
.
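For the explicit I/O case, the difference between allocating, reading and writing a block can be sketched with positional reads and writes on a java.nio.channels.FileChannel. The signatures here are simplified assumptions: the real methods take an ObjectPool and work with Block objects, whereas this sketch uses a bare ByteBuffer and a hypothetical 8 byte block size.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class ExplicitBlockDemo {
    static final int BLOCK_SIZE = 8; // bytes, arbitrary for the example

    // New position in the file: hand back an empty buffer, no disk access.
    static ByteBuffer allocateBlock(long blockId) {
        return ByteBuffer.allocate(BLOCK_SIZE);
    }

    // Existing position: the buffer must be filled from disk first.
    static ByteBuffer readBlock(FileChannel ch, long blockId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
        long pos = blockId * BLOCK_SIZE;
        while (buf.hasRemaining()) {
            if (ch.read(buf, pos + buf.position()) < 0) {
                throw new EOFException("block " + blockId + " is past end of file");
            }
        }
        buf.clear(); // reset position and limit, contents are kept
        return buf;
    }

    // Explicit write of the whole buffer to the block's position.
    static void writeBlock(FileChannel ch, long blockId, ByteBuffer buf) throws IOException {
        long pos = blockId * BLOCK_SIZE;
        buf.clear();
        while (buf.hasRemaining()) {
            ch.write(buf, pos + buf.position());
        }
        buf.clear();
    }

    // Allocates block 4, writes it, then reads it back from disk.
    static long[] demo() {
        try {
            File tmp = File.createTempFile("explicit", ".blk");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
                 FileChannel ch = raf.getChannel()) {
                ByteBuffer block = allocateBlock(4);
                block.putLong(0, 42L);
                writeBlock(ch, 4, block);
                ByteBuffer again = readBlock(ch, 4);
                return new long[] { again.getLong(0), ch.size() };
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a mapped implementation the bodies of readBlock() and writeBlock() would have nothing to do, which matches the description of those methods above.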
The
copyBlock()
method sets a new block ID on a given
Block
. The same data is now associated with the new block ID. On
IOBlockFile
this is a simple operation, as the buffer remains unchanged. However, on
MappedBlockFile
the contents of the first block have to be copied into the new block's position. This is unusual, since
MappedBlockFile
does less work in most of its methods.
modifyBlock()
and
freeBlock()
should be removed from this class, as they are operations which are designed for
ManagedBlockFile
. This class provides a layer of abstraction over
BlockFile
and will be described later. The only implementation of these methods is in
AbstractBlockFile
where an
UnsupportedOperationException
is thrown.
Block File Types
The
BlockFile
interface also includes a static inner class which defines an enumeration of the types of
BlockFile
. This is the standard enumeration pattern used before Java 1.5.
The defined types are:
- MAPPED
- The file is memory mapped. Reading and writing are implicit, and the read/write methods perform no function.
- EXPLICIT
- The file is accessed with the standard read/write mechanism.
- AUTO
- The file is accessed by memory mapping if the operating system supports a 64-bit address space. Otherwise file access is explicit.
- DEFAULT
- This is set to AUTO, unless the value is overridden at the command line. The override is performed by defining tucana.xa.defaultIOType as mapped, explicit or auto.
The code for determining this setup is in the static initializer of the enumeration class.
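For readers unfamiliar with the pre-1.5 enumeration pattern, the following sketch shows the general shape. Only the four type names and the tucana.xa.defaultIOType property come from the text. The class name IOType, the parse() and resolve() helpers, the case-insensitive matching and the fallback to AUTO for unknown values are all assumptions, and how Kowari actually detects a 64-bit address space is not shown here.

```java
// Typesafe enumeration in the style used before Java 1.5: a final class
// with a private constructor and a fixed set of public constants.
final class IOType {
    static final IOType MAPPED = new IOType("mapped");
    static final IOType EXPLICIT = new IOType("explicit");
    static final IOType AUTO = new IOType("auto");
    static final IOType DEFAULT;

    // Static initializer: pick the default once, when the class is loaded.
    static {
        DEFAULT = parse(System.getProperty("tucana.xa.defaultIOType"));
    }

    private final String name;

    private IOType(String name) {
        this.name = name;
    }

    // Hypothetical helper: maps a property value to a constant,
    // falling back to AUTO when the value is missing or unrecognised.
    static IOType parse(String value) {
        if (value == null) return AUTO;
        if (value.equalsIgnoreCase("mapped")) return MAPPED;
        if (value.equalsIgnoreCase("explicit")) return EXPLICIT;
        return AUTO;
    }

    // Hypothetical helper: turns AUTO into a concrete choice. The caller
    // supplies the 64-bit answer, since detection is platform specific.
    static IOType resolve(IOType type, boolean is64Bit) {
        if (type == AUTO) {
            return is64Bit ? MAPPED : EXPLICIT;
        }
        return type;
    }

    public String toString() {
        return name;
    }
}
```

Because the constructor is private, the constants can be compared safely with ==, which is what makes this pattern a usable substitute for the enum keyword.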
This enumeration class is used by the factory methods in
AbstractBlockFile
to determine which concrete implementation should be instantiated.