Saturday 3 March 2012

NDataFlow - Open Source .NET Dataflow Library

Last year, a colleague of mine showed me an open source .NET extract, transform and load (ETL) helper library that he's working on. It is called NDataFlow, and it lets you annotate methods with attributes to wire them together into a dataflow within your application. It's a nice, lightweight library that suits both simple and complex ETL programs. The example below simulates a very simple ETL scenario in which a set of people (hard-coded in the example) is filtered by location and then written to a file.

class Program : DataflowComponent
{
  static void Main(string[] args)
  {
    new Program().Run();
  }

  //First method in the flow
  [Operation]
  public void DataSource(Output<Person> output)
  {
    //Imagine retrieving people from a real data source
    //e.g. database, xml file, csv file, etc.
    output.Emit(new Person() { Name = "Alice", City = "London" });
    output.Emit(new Person() { Name = "Bob", City = "New York" });
    output.Emit(new Person() { Name = "Foo", City = "London" });
    output.Emit(new Person() { Name = "Bar", City = "Sydney" });
  }

  [Operation]
  public IEnumerable<Person> FilterForLondonPeople
    ([Link("DataSource")] IEnumerable<Person> input)
  {
    return input.Where
      (p => p.City.Equals("London", 
        StringComparison.InvariantCultureIgnoreCase));
  }

  [Operation]
  public void OutputResults
    ([Link("FilterForLondonPeople")] IEnumerable<Person> results)
  {
    using (var sw = new StreamWriter(@"C:\LondonPeople.txt", false))
    {
      foreach (var p in results)
        sw.WriteLine("{0} {1}", p.Name, p.City);
    }
  }
}
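The listing above leaves out its using directives and the Person type. A minimal sketch of both follows; the shape of Person is an assumption inferred from the Name and City properties the example reads, and the listing would additionally need a using directive for the NDataFlow namespace.

```csharp
// Namespaces the listing relies on (NDataFlow itself is not included here).
using System;                      // StringComparison
using System.Collections.Generic;  // IEnumerable<T>
using System.IO;                   // StreamWriter
using System.Linq;                 // Where

// Hypothetical record type flowing through the example;
// only the two properties used in the listing are defined.
public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
}
```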
The example shows how little work is needed to get a simple dataflow set up and running. You inherit the Run method by deriving from the NDataFlow.DataflowComponent class. Then, provided your methods are annotated correctly with the LinkAttribute, you simply call Run to start the dataflow. Here, the first method in the dataflow is DataSource; its output is sent to FilterForLondonPeople, whose output is in turn sent to OutputResults.
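For comparison, here is roughly the same three-step flow wired by hand in plain C#, without NDataFlow. It is only a sketch of the data path described above, not of how the library works internally: the PlainPipeline class and the Person type are hypothetical, and the results go to the console rather than to a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, matching the properties used in the example.
public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class PlainPipeline
{
    // Step 1: the hard-coded source of people.
    public static IEnumerable<Person> DataSource()
    {
        yield return new Person { Name = "Alice", City = "London" };
        yield return new Person { Name = "Bob", City = "New York" };
        yield return new Person { Name = "Foo", City = "London" };
        yield return new Person { Name = "Bar", City = "Sydney" };
    }

    // Step 2: keep only the people whose city is London.
    public static IEnumerable<Person> FilterForLondonPeople(IEnumerable<Person> input)
    {
        return input.Where(p => p.City.Equals("London",
            StringComparison.InvariantCultureIgnoreCase));
    }

    // Step 3: the calls are chained explicitly here, which is the
    // wiring that the Link attributes declare in the NDataFlow version.
    public static void Main()
    {
        foreach (var p in FilterForLondonPeople(DataSource()))
            Console.WriteLine("{0} {1}", p.Name, p.City);
    }
}
```

Run as is, this prints "Alice London" and "Foo London". The attribute-based version expresses the same chain declaratively, so steps can be added or re-linked without touching a central Main method.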