Creating a Lazy Sequence of Directory Descendants in C#

My dear friend Craig Andera posted an implementation of a function that descends into a directory in a "lazy" manner, i.e. you get the first descendant back right away and not after all descendants have been calculated. His implementation was in Clojure, a Lisp variant that runs on the Java VM:

(import [java.io File])

(defn dir-descendants [dir]
  (let [children (.listFiles (File. dir))]
    (lazy-cat
     (map (memfn getPath) (filter (memfn isFile) children))
     (mapcat dir-descendants
      (map (memfn getPath) (filter (memfn isDirectory) children))))))

Craig was happy with this function because, even though it was a "mind-bender to write," it's easy to read and because C# "would almost certainly be at least somewhat more verbose."

Taking this as a challenge, I rewrote this program in C#, maintaining the laziness:

using System.IO;
using System.Collections.Generic;

namespace ConsoleApplication1 {
    class Program {
        // Lazily yields every file under "path": files in this directory
        // first, then (depth-first) the files in each subdirectory.
        static IEnumerable<string> GetDirectoryDecendants(string path) {
            foreach (var file in Directory.GetFiles(path)) { yield return file; }
            foreach (var directory in Directory.GetDirectories(path)) {
                foreach (var file in GetDirectoryDecendants(directory)) { yield return file; }
            }
        }
    }
}

The key bit here is the use of "yield return", which lets me return elements of an IEnumerable as I calculate them. This took me 5 minutes to write, my mind stayed straight, and I find it fairly easy to read. I'll let you decide on the relative verbosity of each implementation.
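
For example, here's a quick sketch of how the laziness plays out (assuming a Main method alongside the function above and an arbitrary starting path):

        static void Main() {
            int count = 0;
            // Only as much of the tree is walked as the caller consumes:
            // stopping after five files never touches the rest of the drive.
            foreach (var file in GetDirectoryDecendants(@"C:\")) {
                System.Console.WriteLine(file);
                if (++count == 5) { break; }
            }
        }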



Comment Feed: 18 comments on this post

Michael McMullen:


This was just begging for an F# solution...

open System.IO

let rec GetDirectoryDecendants path =
    seq {
        yield! Directory.GetFiles(path)
        for p in Directory.GetDirectories(path) do
            yield! GetDirectoryDecendants p
    }

Tuesday, Mar 9, 2010, 3:21 PM


Max C:


The C# version will take forever to return the first result, or crash with an out of memory condition, if any one directory contains a lot of files. Don't know about the other one. http://s3.codeplex.com has a C# implementation that works in the general case.

Tuesday, Mar 9, 2010, 5:08 PM


Justin Van Patten:


With .NET 4 you can simply call:

Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories);

This will lazily enumerate all files, recursively. The only hiccup you might run into is that like GetFiles, EnumerateFiles will throw an exception when it comes across a path the caller doesn't have access to, which terminates the enumeration. We're considering adding a "continue on error" option to a future release to address this.

To learn more, check out my MSDN Magazine CLR Inside Out column at http://msdn.microsoft.com/en-us/magazine/ee428166.aspx

Tuesday, Mar 9, 2010, 7:59 PM


Aaron N.:


It's still getting all the directories and files in a directory before returning data, because you're not using the .NET 4 APIs that truly return items as they are found.

Tuesday, Mar 9, 2010, 8:07 PM


Seth:


Epic Win!

Wednesday, Mar 10, 2010, 2:01 AM


Thomas:


what about exception handling?

need to catch UnauthorizedAccessException,
but yield is not allowed in a try block..
=> not working "real world" code..
:(

Wednesday, Mar 10, 2010, 4:33 AM


Christof Jans:


In .NET 4.0 you can make it even lazier by using Directory.EnumerateFiles.

Wednesday, Mar 10, 2010, 5:18 AM


Chris Hollander:


I think this deserves an "OH SNAP!!"

Wednesday, Mar 10, 2010, 5:22 AM


David Hayes:


Nice, I prefer the C# version. Of course, it's only one line in PowerShell:

Get-ChildItem -recurse

or for the old schoolers

dir -recurse

Wednesday, Mar 10, 2010, 6:37 AM


Jeremy Gray:


Here's another in C# for the pile:

return Directory.GetFiles(path)
    .Concat(Directory.GetDirectories(path)
                .SelectMany(directory => GetDirectoryDescendants(directory)));

(Though I'm uncertain how the indentation will look once posted, and I had to trim the surrounding code in order to not receive an error from your blog.)

Wednesday, Mar 10, 2010, 9:43 AM


Judah Himango:


Wouldn't System.IO.Directory.EnumerateDirectories do the trick?

http://msdn.microsoft.com/en-us/library/dd383304(VS.100).aspx

Wednesday, Mar 10, 2010, 11:11 AM


Dave Thomas:


1) GetFiles supports a SearchOption of AllDirectories, which recurses through the entire directory tree. This eliminates the loop over the directories and the recursive call.

2) .NET 4's EnumerateFiles will smooth things out a bit. From the MSDN docs:

"The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient." © 2010 Microsoft Corporation. All rights reserved.

The body is just one line:

return Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories);

3) Alas, in practice, you will have to handle exceptions (e.g. UnauthorizedAccessException) which is a messier problem.
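
One way to cope, sketched below (the GetAccessibleDescendants name is made up, and this isn't production code), is to keep the yields outside the try block and only wrap the listing calls:

static IEnumerable<string> GetAccessibleDescendants(string path) {
    string[] files = new string[0], directories = new string[0];
    try {
        // Only the directory-listing calls sit inside the try;
        // C# forbids "yield return" inside a try that has a catch clause.
        files = Directory.GetFiles(path);
        directories = Directory.GetDirectories(path);
    }
    catch (System.UnauthorizedAccessException) { /* skip folders we can't read */ }

    foreach (var file in files) { yield return file; }
    foreach (var directory in directories) {
        foreach (var file in GetAccessibleDescendants(directory)) { yield return file; }
    }
}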

Thursday, Mar 11, 2010, 11:09 AM


Gudge:


Craig has a lazy-cat in his code. Therefore he is the winner!

Wednesday, Mar 17, 2010, 10:01 PM


Mike Bouck:


I always hated the parens in Lisp and still do!

Monday, Mar 22, 2010, 10:16 AM


Anders:


I guess the LINQ way would also do the trick? Something like:

static IEnumerable<string> GetDirectoryDecendants(string path) {
    return Directory.GetFiles(path)
        .Concat(Directory.GetDirectories(path).SelectMany(GetDirectoryDecendants));
}

Sunday, Mar 28, 2010, 12:31 PM


Anders:


So how would you write a true breadth-first version, i.e. a version returning first all depth 1 files, then all depth 2 files, ..., still with the constraint that no directory at level i may be descended into/listed, before all files at level i have been returned?

The current implementation returns all level-1 files first, but then returns all files belonging to the first directory and its subfolders (at all levels), then all the files of the next level-1 directory, and so on.
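
One queue-based sketch that gives that ordering (the name GetDescendantsBreadthFirst is made up, and it assumes the same using directives as the post):

static IEnumerable<string> GetDescendantsBreadthFirst(string path) {
    var pending = new Queue<string>();
    pending.Enqueue(path);
    while (pending.Count > 0) {
        // FIFO order means every directory at depth i is dequeued (and its
        // files returned) before any directory at depth i + 1 is listed.
        var current = pending.Dequeue();
        foreach (var file in Directory.GetFiles(current)) { yield return file; }
        foreach (var directory in Directory.GetDirectories(current)) {
            pending.Enqueue(directory);
        }
    }
}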

Sunday, Mar 28, 2010, 12:41 PM


Stuart Halloway:


It seems that most of the approaches being discussed here "forget" the most important fact of the problem: that directory structures are trees! Clojure's core function file-seq keeps this distinction clear:

(defn file-seq
  "A tree seq on java.io.Files"
  [dir]
    (tree-seq
     (fn [#^java.io.File f] (. f (isDirectory)))
     (fn [#^java.io.File d] (seq (. d (listFiles))))
     dir))

Tree-seq happens to be depth-first, but it would be easy to switch to breadth-first, because the structure of the data is cleanly separated from the traversal strategy.

Sunday, Apr 4, 2010, 8:23 PM


This helper will save your stack from recursive method calls:



// Flattens a tree into a lazy, depth-first sequence without recursive method
// calls, by keeping an explicit stack of child enumerators.
public static IEnumerable<TSource> SelectRecursive<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TSource>> recursiveSelector) {

  Stack<IEnumerator<TSource>> stack = new Stack<IEnumerator<TSource>>();
  stack.Push(source.GetEnumerator());

  try {
    while (stack.Count > 0) {
      if (stack.Peek().MoveNext()) {
        TSource current = stack.Peek().Current;

        yield return current;

        stack.Push(recursiveSelector(current).GetEnumerator());
      }
      else {
        stack.Pop().Dispose();
      }
    }
  }
  finally {
    while (stack.Count > 0) {
      stack.Pop().Dispose();
    }
  }

}
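
For example, applied to the original problem (this usage sketch assumes the extension method sits in a static class that's in scope, plus the usual System.IO, System.Linq and System.Collections.Generic using directives):

string root = @"C:\";  // any starting directory

// root plus every directory below it, walked with the explicit stack above
// instead of recursive method calls, then flattened to the files in each.
IEnumerable<string> descendants =
    new[] { root }
        .SelectRecursive(dir => Directory.GetDirectories(dir))
        .SelectMany(dir => Directory.GetFiles(dir));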

Thursday, May 20, 2010, 4:28 PM




