Tuesday, Mar 9, 2010, 1:47 PM in The Spout
Creating a Lazy Sequence of Directory Descendants in C#
My dear friend Craig Andera posted an implementation of a function that descends into a directory in a "lazy" manner, i.e. you get the first descendant back right away and not after all descendants have been calculated. His implementation was in Clojure, a Lisp variant that runs on the Java VM:
(import [java.io File])
(defn dir-descendants [dir]
(let [children (.listFiles (File. dir))]
(lazy-cat
(map (memfn getPath) (filter (memfn isFile) children))
(mapcat dir-descendants
(map (memfn getPath) (filter (memfn isDirectory) children))))))
Craig was happy with this function because, even though it was a "mind-bender to write," it's easy to read and because C# "would almost certainly be at least somewhat more verbose."
Taking this as a challenge, I rewrote this program in C#, maintaining the laziness:
using System.IO;
using System.Collections.Generic;
namespace ConsoleApplication1 {
class Program {
static IEnumerable<string> GetDirectoryDecendants(string path) {
foreach (var file in Directory.GetFiles(path)) { yield return file; }
foreach (var directory in Directory.GetDirectories(path)) {
foreach (var file in GetDirectoryDecendants(directory)) { yield return file; }
}
}
}
}
The key hit here is the use of "yield return" which lets me return elements of an IEnumerable as I calculate them. This took me 5 minutes to write, my mind is straight and I find it fairly easy to read. I'll let you decide on the relative verbosity of each implementation.
18 comments
on this post
Michael McMullen:
open System.IO;
let rec GetDirectoryDecendants path =
seq {
yield! Directory.GetFiles(path)
for p in Directory.GetDirectories(path) do
yield! GetDirectoryDecendants p
}
Tuesday, Mar 9, 2010, 3:21 PM
Max C:
Tuesday, Mar 9, 2010, 5:08 PM
Justin Van Patten:
Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories);
This will lazily enumerate all files, recursively. The only hiccup you might run into is that like GetFiles, EnumerateFiles will throw an exception when it comes across a path the caller doesn't have access to, which terminates the enumeration. We're considering adding a "continue on error" option to a future release to address this.
To learn more, check out my MSDN Magazine CLR Inside Out column at http://msdn.microsoft.com/en-us/magazine/ee428166.aspx
Tuesday, Mar 9, 2010, 7:59 PM
Aaron N.:
Tuesday, Mar 9, 2010, 8:07 PM
Seth:
Wednesday, Mar 10, 2010, 2:01 AM
Thomas:
need to catch UnauthorizedAccessException,
but yield is not allowed in a try block..
=> not working "real world" code..
:(
Wednesday, Mar 10, 2010, 4:33 AM
Christof Jans:
Wednesday, Mar 10, 2010, 5:18 AM
Chris Hollander:
Wednesday, Mar 10, 2010, 5:22 AM
David Hayes:
Get-ChildItem -recurse
or for the old schoolers
dir -recurse
Wednesday, Mar 10, 2010, 6:37 AM
Jeremy Gray:
return Directory.GetFiles(path)
.Concat(Directory.GetDirectories(path)
.SelectMany(directory => GetDirectoryDescendants(directory)));
(Though I'm uncertain how the indentation will look once posted, and had to trim the surrounding in order to not receive an error from your blog.)
Wednesday, Mar 10, 2010, 9:43 AM
Judah Himango:
http://msdn.microsoft.com/en-us/library/dd383304(VS.100).aspx
Wednesday, Mar 10, 2010, 11:11 AM
Dave Thomas:
2) .Net 4 EnumerateFiles will smooth things out a bit. From MSDN docs.
"The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient." © 2010 Microsoft Corporation. All rights reserved.
The body is just one line:
return Directory.EnumerateFiles(path,"*", SearchOption.AllDirectories);
3) Alas, in practice, you will have to handle exceptions (e.g. UnauthorizedAccessException) which is a messier problem.
Thursday, Mar 11, 2010, 11:09 AM
Gudge:
Wednesday, Mar 17, 2010, 10:01 PM
Mike Bouck:
Monday, Mar 22, 2010, 10:16 AM
Anders:
GetDirectoryDecendants(string path) {
return Directory.GetFiles(path).Concat(Directory.GetDirectories(path).SelectMany(GetDirectoryDecendants));
}
Sunday, Mar 28, 2010, 12:31 PM
Anders:
The current implementation will return all level 1 files first, but then return all files belonging to the first directory and its subfolders (at all levels) then the all files of the next level-1 directory, etc.
Sunday, Mar 28, 2010, 12:41 PM
Stuart Halloway:
(defn file-seq
"A tree seq on java.io.Files"
[dir]
(tree-seq
(fn [#^java.io.File f] (. f (isDirectory)))
(fn [#^java.io.File d] (seq (. d (listFiles))))
dir))
Tree-seq happens to be depth-first, but it would be easy to switch to breadth- first, because the structure of the data is cleanly separated from the traversal strategy.
Sunday, Apr 4, 2010, 8:23 PM
This helper will save you're stack from recursive method calls:
public static IEnumerable<TSource> SelectRecursive<TSource>(
this IEnumerable<TSource> source,
Func<TSource, IEnumerable<TSource>> recursiveSelector) {
Stack<IEnumerator<TSource>> stack = new Stack<IEnumerator<TSource>>();
stack.Push(source.GetEnumerator());
try {
while (stack.Count > 0) {
if (stack.Peek().MoveNext()) {
TSource current = stack.Peek().Current;
yield return current;
stack.Push(recursiveSelector(current).GetEnumerator());
}
else {
stack.Pop().Dispose();
}
}
}
finally {
while (stack.Count > 0) {
stack.Pop().Dispose();
}
}
}
Thursday, May 20, 2010, 4:28 PM



