Learning Node.js: Building A Static File Server
As I've been learning about Node.js, I've come to the realization that it can only really be learned in sufficiently complex contexts. If things are too simple, the fact that it's "JavaScript" makes everything feel deceptively familiar. As such, I'd like to embark on some non-trivial experiments that force me to dig into Node.js workflows and practices. This morning, I'd like to look at building a static file server in Node.js.
View this code in my Learning Node.js project on GitHub.
A static file server is a really great harness for experimentation because, in Node.js, an HTTP server can be anywhere from a few lines of code to thousands of lines of code depending on the robustness of functionality. As such, it lends itself well to feature iteration and a gradual increase in complexity.
For my static file server, I really wanted to learn about promises and streams - the foundation of many Node.js workflows. Not only did I want to stream files to the user, I also wanted to be able to generate ETags and conditionally serve files right out of memory (rather than going back to the disk). This gave me a great opportunity to deal with multi-destination streaming. As files were pulled off the disk, I could stream them to as many as three destinations:
- The response stream (ie, to the user).
- The ETag generation stream (for generating and caching ETags).
- The content aggregation stream (for caching file content in memory).
When I first started coding, my initial approach was to create a single chain of streams, piping one into the next:
file.pipe( etagStream ).pipe( contentStream ).pipe( response )
When you first get into Node.js, I think the allure of streams kind of steers you in this direction. Pipe all the things! But, the philosophical simplicity of this falls flat when you start to implement your own solution. What happens when one of those middle pipes is optional? What happens when one of the pipes throws an error?
You can get the single-pipe-chain to work; but, after some noodling on the topic, I decided that a much saner approach would be to pipe the file into individual streams that would each be optional and have their own error handling:
file.pipe( response )
[optional] file.pipe( etagStream )
[optional] file.pipe( contentStream )
Using this approach allows "error" event handlers to be bound based on the type of cleanup that has to take place. The main response doesn't have to know about the use of ETags or the caching of content - it just streams the file to the user. The optional streams then bind their own "error" event handlers and perform their own contextual cleanup.
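To make that fan-out concrete, here's a small, self-contained sketch (not from the post's code) in which one file read stream is piped to two independent destinations, each binding its own "error" handler on the shared source. Here, process.stdout stands in for the HTTP response and an MD5 hash stream stands in for the ETag generation; the file path is just an example.

var fileSystem = require( "fs" );
var crypto = require( "crypto" );

// The shared source stream (example path).
var file = fileSystem.createReadStream( "./wwwroot/index.htm" );

// An MD5 hash stream, standing in for the ETag generation stream.
var hasher = crypto.createHash( "md5" );

// Primary destination - its error handler only worries about its own cleanup.
file.on(
    "error",
    function handleErrorForOutput( error ) {
        console.error( "Could not stream file:", error.code );
    }
);
file.pipe( process.stdout );

// Optional destination - it binds its own "error" handler and tears down only
// the hash stream; the primary pipe never has to know it exists.
file.on(
    "error",
    function handleErrorForHash( error ) {
        hasher.end();
    }
);
file
    .pipe( hasher )
    .on(
        "readable",
        function handleDigest() {
            var digest = hasher.read();
            if ( digest ) {
                console.log( "ETag candidate:", digest.toString( "hex" ) );
            }
        }
    )
;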
To get this up and running, I created a simple HTTP server that pipes all requests into the static file server:
// Require node modules.
var http = require( "http" );
var staticFileServer = require( "./lib/static-file-server" );
var chalk = require( "chalk" );
// Create an instance of our static file server.
var fileServer = staticFileServer.createServer({
// I tell the static file server which directory to use when resolving paths.
documentRoot: ( __dirname + "/wwwroot/" ),
// I tell the static file server which default document to use when the user requests
// a directory instead of a file.
defaultDocument: "index.htm",
// I tell the static file server the max-age of the Cache-Control header.
maxAge: 604800, // 7 days.
// I tell the static file server which portions of the URL path to strip out before
// resolving the path against the document root. This allows parts of the URL to serve
// as a cache-busting mechanism without having to alter the underlying file structure.
magicPattern: /build-[\d.-]+/i,
// I tell the static file server the maximum size of the file that can be cached in
// memory (larger files will be piped directly from the file system).
maxCacheSize: ( 1024 * 100 ) // 100KB.
});
// Create an instance of our http server.
var httpServer = http.createServer(
function handleRequest( request, response ) {
// For now, just pass the incoming request off to the static file server; we'll
// assume that all requests to this app are for static files.
fileServer.serveFile( request, response );
}
);
httpServer.listen( 8080 );
console.log( chalk.cyan( "Server running on port 8080" ) );
As you can see, each request that comes into the HTTP server is handed off to our instance of StaticFileServer. The StaticFileServer then uses the following workflow logic:
- Resolve the scriptName against the file system (returns a promise).
- If ETag is present, try to return a 304 Not Modified (if possible).
- If content is cached, try to return file from memory.
- Stream file to user.
- Stream file to ETag generation (if not already cached).
- Stream file to content aggregation (if not already cached).
As I implemented this, the hardest part was trying to figure out what to do about errors. When a piped stream encounters an error event, very little happens automatically. The destination stream is unpiped, but both streams continue to work (unless otherwise coded). As such, we have to listen for "error" events and then clean up the existing streams as best we can. Using multi-destination streaming - as opposed to a single chain of stream pipes - makes this much easier to reason about.
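As an aside on why those "error" bindings matter at all (this snippet is mine, not from the post): a stream is just an EventEmitter, and an "error" event with no listener attached is thrown, which would take down the whole server. The bogus file path below simply forces the error path.

var fileSystem = require( "fs" );

// Without the "error" handler below, this read stream would throw an uncaught
// ENOENT error and crash the process instead of failing gracefully.
fileSystem.createReadStream( "./does-not-exist.txt" )
    .on(
        "error",
        function handleReadError( error ) {
            console.log( "Handled read error:", error.code );
        }
    )
;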
That said, here's my current implementation of a static file server in Node.js. It's not perfect; but, again, this was just a learning experiment:
// Require the core node modules.
var fileSystem = require( "fs" );
var url = require( "url" );
var stream = require( "stream" );
var util = require( "util" );
var crypto = require( "crypto" );
var Q = require( "q" );
// Require our utility classes.
var mimeTypes = require( "./mime-types" );
var ContentCache = require( "./content-cache" ).ContentCache;
var ETagStream = require( "./etag-stream" ).ETagStream;
var BufferReadStream = require( "./buffer-read-stream" ).BufferReadStream;
var BufferWriteStream = require( "./buffer-write-stream" ).BufferWriteStream;
// I am a convenience method that creates a new static file server.
exports.createServer = function( config ) {
return( new StaticFileServer( config ) );
};
// Export the constructor as well.
exports.StaticFileServer = StaticFileServer;
// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //
// Set up Q-proxied methods. These will allow the standard callback-oriented methods
// to be used as if they returned promises.
var fileSystemStat = Q.nbind( fileSystem.stat, fileSystem );
// I provide a static file server that will resolve incoming requests against the given
// document root and stream files to the response.
function StaticFileServer( config ) {
this._config = config;
// I cache data associated with the requests. At a minimum, this will be used to
// cache ETag values; but, it may also be used to cache full file content.
this._contentCache = new ContentCache();
}
StaticFileServer.prototype = {
constructor: StaticFileServer,
// ---
// PUBLIC METHODS.
// ---
// I stream the request for the associated static file into the response.
serveFile: function( request, response ) {
// Set up a reference to "this" within the closure (to help with de-bound methods).
var server = this;
// Calculate the file that we are supposed to be accessing.
var parsedUrl = url.parse( request.url );
var scriptName = this._resolvePath( parsedUrl.path );
// Stat the requested file to make sure that it exists. This will resolve with
// both the scriptName and the stat of the file.
// --
// CAUTION: The resolved scriptName may not be the same value as the original
// scriptName if the resolution had to traverse a directory.
this._resolveScriptName( scriptName ).then(
function handleScriptNameResolve( resolution ) {
// Check to see if we have a cached ETag for this script.
// --
// NOTE: We cache the ETag with the associated mtime (modified date)
// of the file. This way, if the file is modified, the ETag won't be
// returned until the ETag is re-cached.
var etag = server._contentCache.getETag( resolution.scriptName, resolution.stat.mtime );
// If we have the ETag, we can set the header and try to compare the
// ETag against the incoming request.
if ( etag ) {
response.setHeader( "ETag", etag );
// If the incoming ETag matches the one we have cached, we can
// stop processing the request and return Not Modified response.
if ( etag === request.headers[ "if-none-match" ] ) {
response.writeHead( 304, "Not Modified" );
return( response.end() );
}
}
// Set the headers that we know we always need.
// --
// CAUTION: Once we set the content-length, the browser will continue
// to expect data even if the request dies half-way.
response.setHeader( "Content-Type", mimeTypes.getFromFilePath( resolution.scriptName ) );
response.setHeader( "Content-Length", resolution.stat.size );
// If the user provided a max-age for caching, add the cache header.
if ( server._config.maxAge ) {
response.setHeader( "Cache-Control", ( "max-age=" + server._config.maxAge ) );
}
// Default to a 200 OK response until we catch any errors.
// --
// NOTE: This allows us to set the status code without calling the
// .writeHead() method which will commit the headers early.
response.statusCode = 200;
// Check to see if we have cached file content for this script.
// --
// NOTE: We cache the content with the associated mtime (modified date)
// of the file. This way, if the file is modified, the content won't be
// returned until the content is re-cached.
var content = server._contentCache.getContent( resolution.scriptName, resolution.stat.mtime );
// If we have cached content, we can use it to stream the file into the
// response without having to go back to disk.
if ( content ) {
// NOTE: I'm not binding to the "error" event here since there is no
// reason that this stream should raise any error.
var bufferReadStream = new BufferReadStream( content );
return( bufferReadStream.pipe( response ) );
}
// If we've made it this far, we couldn't reply with a 304 Not Modified
// and we couldn't stream the file from memory. As such, we'll have to
// get the file from the disk and stream it into the response.
var contentStream = fileSystem.createReadStream( resolution.scriptName )
.on(
"open",
function handleContentReadStreamOpen() {
contentStream.pipe( response );
}
)
.on(
"error",
function handleContentReadStreamError( error ) {
// NOTE: If an error occurs on the read-stream, it will take
// care of destroying itself. As such, we only have to worry
// about cleaning up the possible down-stream connections
// that have been established.
try {
response.setHeader( "Content-Length", 0 );
response.setHeader( "Cache-Control", "max-age=0" );
response.writeHead( 500, "Server Error" );
} catch ( headerError ) {
// We can't set a header once the headers have already
// been sent - catch failed attempt to overwrite the
// response code.
} finally {
response.end( "500 Server Error" );
}
}
)
;
// If we didn't have a cached ETag for this script, then ALSO pipe the
// file content into an ETag stream so we can accumulate the ETag while
// we stream the file to the user.
if ( ! etag ) {
// The ETagStream is a writable stream that emits an "etag" event
// once the content pipe closes the stream.
// --
// NOTE: I am not binding any error event since there is no reason
// that the ETag stream should emit an error.
var etagStream = new ETagStream()
.on(
"etag",
function handleETag( etag ) {
// When we cache the ETag, cache it with both the script
// name and the date the file was modified. This way,
// when / if the file is modified during the lifetime of
// the app, the ETag will naturally be expired and replaced.
server._contentCache.putETag( resolution.scriptName, etag, resolution.stat.mtime );
}
)
;
// Now that we're dealing with a read-stream that may error (and
// cause the etagStream to be unpiped), we have to catch that error
// event and use it to destroy the etagStream.
contentStream
.on(
"error",
function handleContentStreamError( error ) {
etagStream.destroy();
}
)
.pipe( etagStream )
;
}
// If we made it this far, we couldn't serve the file from memory. As
// such, we may need to cache the file in memory for subsequent use.
// However, we only want to do this if caching is enabled and the given
// file is smaller than the maxCacheSize.
if ( server._config.maxCacheSize && ( resolution.stat.size <= server._config.maxCacheSize ) ) {
// The BufferWriteStream is a writable stream that emits a "buffer"
// event once the content pipe closes the stream.
// --
// NOTE: Not binding any error event since there is no reason that
// the write-stream will emit an error.
var bufferWriteStream = new BufferWriteStream()
.on(
"buffer",
function handleBuffer( content ) {
server._contentCache.putContent( resolution.scriptName, content, resolution.stat.mtime );
}
)
;
// Now that we're dealing with a read-stream that may error (and
// cause the bufferWriteStream to be unpiped), we have to catch that
// error event and use it to destroy the bufferWriteStream.
contentStream
.on(
"error",
function handleContentStreamError( error ) {
bufferWriteStream.destroy();
}
)
.pipe( bufferWriteStream )
;
}
},
// If the file / directory couldn't be found, return a 404.
function handleScriptNameReject( error ) {
response.writeHead( 404, "Not Found" );
response.end( "404 File Not Found" );
}
);
},
// ---
// PRIVATE METHODS.
// ---
// I normalize the path and then resolve it against the document root, returning
// the full scriptName for the requested file.
_resolvePath: function( path ) {
// Unescape the url-encoded characters (must explicitly replace spaces as those
// are not decoded automatically).
path = decodeURIComponent( path.replace( /\+/g, " " ) );
// Normalize the slashes.
path = path.replace( /\\/g, "/" );
// If a magic pattern was provided, remove it before normalizing. This will
// allow things like build-numbers to be pulled out of the paths before they
// are mapped onto a file name. Example:
// --
// ./assets/build-123/header.png --> ./assets/header.png
// --
// Notice that "build-123" is replaced out of the path before the script name
// is resolved against the document-root.
if ( this._config.magicPattern ) {
path = path.replace( this._config.magicPattern, "" );
}
// Strip out double-slashes.
path = path.replace( /[/]{2,}/g, "/" );
// Strip out any leading or trailing slashes.
path = path.replace( /^[/]|[/]$/g, "" );
// Strip out any path traversal entities.
path = path.replace( /\.\.\//g, "/" );
// Resolve this against the configured document root.
return( url.resolve( this._config.documentRoot, path ) );
},
// I resolve the script name against what actually exists on the file system. Since
// this action will attempt to negotiate directories and default documents, the result
// is an object that contains both the script name and the stat object.
_resolveScriptName: function( scriptName ) {
// In the event that we stat a directory, this will be the path to the default
// document in that directory.
// --
// NOTE: Even though we normalized the path, we still need to check for a trailing
// slash in the event that the root directory was requested.
var defaultScriptName = ( scriptName.slice( -1 ) === "/" )
? ( scriptName + this._config.defaultDocument )
: ( scriptName + "/" + this._config.defaultDocument )
;
// But, start off trying to stat the file.
var promise = fileSystemStat( scriptName ).then(
function handleFileResolve( stat ) {
// If the script name is a file, we are done.
if ( stat.isFile() ) {
return({
scriptName: scriptName,
stat: stat
});
}
// The script name was a directory; try to stat the default document
// within that directory.
var directoryPromise = fileSystemStat( defaultScriptName ).then(
function handleDirectoryResolve( stat ) {
return({
scriptName: defaultScriptName,
stat: stat
});
}
);
return( directoryPromise );
}
);
return( promise );
}
};
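The listing above leans on a few helper modules ( ./etag-stream, ./buffer-write-stream, ./buffer-read-stream, ./content-cache ) that aren't shown in this post. Purely as a rough sketch - the real implementations may differ, and the MD5-based hashing is my assumption - the two custom writable streams might look something like this:

// Require the core node modules.
var crypto = require( "crypto" );
var stream = require( "stream" );
var util = require( "util" );

// I am a writable stream that accumulates an MD5 hash of the piped content and
// emits an "etag" event once the stream has finished.
function ETagStream() {
    stream.Writable.call( this );
    this._hasher = crypto.createHash( "md5" );
    this.once( "finish", this._handleFinish.bind( this ) );
}
util.inherits( ETagStream, stream.Writable );

ETagStream.prototype._write = function( chunk, encoding, done ) {
    this._hasher.update( chunk );
    done();
};

ETagStream.prototype._handleFinish = function() {
    this.emit( "etag", this._hasher.digest( "hex" ) );
};

// I am a writable stream that aggregates the piped content in memory and emits
// a "buffer" event (with the full content) once the stream has finished.
function BufferWriteStream() {
    stream.Writable.call( this );
    this._chunks = [];
    this.once( "finish", this._handleFinish.bind( this ) );
}
util.inherits( BufferWriteStream, stream.Writable );

BufferWriteStream.prototype._write = function( chunk, encoding, done ) {
    this._chunks.push( chunk );
    done();
};

BufferWriteStream.prototype._handleFinish = function() {
    this.emit( "buffer", Buffer.concat( this._chunks ) );
};

The serveFile() code also calls .destroy() on these streams during error cleanup, so the real modules presumably implement their own destroy() method (the built-in Writable destroy() only arrived in Node 8). A BufferReadStream would be the mirror image - a readable stream that pushes a cached buffer and then ends - and the ContentCache a simple in-memory map keyed on script name and mtime.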
Even after all of this, I'm still very shaky about error handling. The documentation on errors and error handling is very scattered. It's also not always evident whether a given stream can even emit an error event. For example, can the HTTP response stream ever emit an error? I couldn't find that anywhere in the documentation (and digging through the Node.js source code isn't the easiest task). That said, I think this experiment forced me to think deeply about a whole bunch of Node.js fundamentals and, for that, I'm pretty excited!
Reader Comments
Great post, and thanks for sharing. It was time to upgrade my static content module, so I went in search of good ways to add depth to its capabilities and came across your solution.
I noticed that the _resolvePath function URI-encodes the return value through its use of url.resolve, so it will not support spaces in __dirname or request.url (i.e., stat-ing files with %20 in the path will fail).
I resolved this by wrapping the url.resolve call in a decodeURI and all is good (line 320):
return( decodeURI( url.resolve( this._config.documentRoot, path ) ) );
Just sharing for others who may get 404s unexpectedly!
If I want to change the cache time based on a file's mime type, how can I do that?