Thursday, April 17, 2014

Using RMarkdown in an Analysis Archive and Retrieval System

Hopefully, you've already heard of RMarkdown. If not, you'll understand it pretty quickly by looking at this simple example:

http://rpubs.com/medined/replacing_part_of_time_series_using_time_based_selection

Essentially, you mix R code with Markdown markup to create a 'living' document. The R code is executed each time the document is rendered (for example, by knitr), so the published output comes straight from the code. The full power of R (and all of its extensions) can be used. There are many examples of this online.

RMarkdown pages can be computer-generated. Imagine if every analytic documented its intermediate steps, from Data Load to Final Visualization, using RMarkdown. I bet user confidence in the final product would increase. It would also be trivial to duplicate the document and tweak it (draft mode) before republishing. Since RMarkdown is text-based, you could provide human-readable diff reports between analyses. Another advantage of this text-based system would be full-text search across all analyses.
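To make the diff idea concrete, here is a rough nodejs sketch of a line-based diff between two versions of an analysis. Everything in it is my own illustration: the `diffLines` name is hypothetical, and the longest-common-subsequence approach is just one simple way to do this, not how any particular tool works. In practice an existing diff tool would do the job.

```javascript
// Minimal line-based diff between two text documents, such as two
// versions of an RMarkdown source. Hypothetical helper for illustration.
// Removed lines are prefixed with "- ", added lines with "+ ",
// unchanged lines with two spaces.
function diffLines(oldText, newText) {
  const a = oldText.split("\n");
  const b = newText.split("\n");

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both documents from the top, emitting one output line per step.
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push("  " + a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push("- " + a[i]);
      i++;
    } else {
      out.push("+ " + b[j]);
      j++;
    }
  }
  while (i < a.length) out.push("- " + a[i++]);
  while (j < b.length) out.push("+ " + b[j++]);
  return out;
}

const before = "x <- 1\nplot(x)";
const after = "x <- 2\nplot(x)";
console.log(diffLines(before, after).join("\n"));
```

The output marks `x <- 1` as removed, `x <- 2` as added, and `plot(x)` as unchanged, which is the kind of report a reviewer could read without running either analysis.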

I could also point out the value in being able to produce an analytic report without needing to know Java, Python, or another programming language. Just knowing the math is complex enough.

Update: http://studio.sketchpad.cc/sp/pad/view/ro.9QNw0rsxwki4J/rev.480 - The archive timeline widget allows visitors to view all versions of the source document.

Update: You can do this same kind of thing with Python code. Check out http://ipython.org/notebook.html.

I'm not saying R is the answer to all problems. But this idea of archivable, diffable analytic solutions was interesting.

Wednesday, April 16, 2014

Example of Replacing The Middle of a Time Series in R

Exploration of replacing part of a Time Series

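# Note: ts() and window() come from base R's stats package, so the
# zoo package is optional for this particular example.
install.packages('zoo')
library('zoo')
timeSeries <- ts(1:96, freq=12, start=2001)
timeSeries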

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# If you already know the indexes of the elements to 
# replace, just do it:
#
timeSeries[13:36] <- NA
timeSeries

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2003  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# However, sometimes you might want to refer to the
# Time Series part by date. Below is one way to do
# that.
#

#
# Reset the Time Series and then look at just two years, 2002 and 2003
#
timeSeries <- ts(1:96, freq=12, start=2001)
window(timeSeries, start=c(2002,1), end=c(2003,12))

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36

#
# Copy this part of the timeSeries for safety.
#
original <- window(timeSeries, start=c(2002,1), end=c(2003,12))

#
# Change 2002 and 2003 to NA because a lawsuit is pending.
# 
window(timeSeries, start=c(2002,1), end=c(2003,12)) <- NA
timeSeries

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2003  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# The lawsuit is over and you won. Retrieve the data.
#

window(timeSeries, start=c(2002,1), end=c(2003,12)) <- original

timeSeries

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

Friday, April 04, 2014

Resolving Bad Request In Spring MVC Controller

I can hardly believe it's 2014 and I've run into an issue that a web search didn't quickly solve.

@RequestMapping(value = "/district/{name}", method = RequestMethod.GET)
public String communityHandler(Model model, @PathVariable String districtName) ... {

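Notice the mismatch between {name} and the PathVariable? Yeah, it took me too long to spot the difference. The fix is to make the two agree: either rename the parameter to name, or bind it explicitly with @PathVariable("name") String districtName.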

Sunday, February 23, 2014

Writing a nodejs script file

This is a short post about a simple topic, but it's one that I hadn't thought about, so sharing seems good. I like computer languages that run everywhere from back-end processing to front-end web site development. For example, Ruby handles both environments, but Java doesn't. Today I realized that nodejs can also handle both. And it's simple. Creating a nodejs script looks like this:

1. Create a file called helloWorld with the following contents:

#!/usr/bin/env node
console.log("Hello World.");

2. chmod +x helloWorld

3. ./helloWorld

Of course, you have the full power of JavaScript and nodejs to play with.
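As a slightly larger sketch of the same idea, the script below takes a name from the command line. The file name greet and the `greeting` function are made up for illustration; `process.argv` is the standard nodejs array of command-line arguments.

```javascript
// Hypothetical file "greet": add "#!/usr/bin/env node" as the very first
// line, then chmod +x it as in the steps above.

// Build the greeting; falls back to "World" when no name is given.
function greeting(name) {
  return "Hello " + (name || "World") + ".";
}

// process.argv holds: [path to node, path to script, ...user arguments]
const name = process.argv[2];
console.log(greeting(name));
```

With the shebang line in place, running ./greet Ada prints "Hello Ada." and running ./greet with no arguments prints "Hello World."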

Have fun!

Monday, December 02, 2013

Watching Accumulo Recover From a Killed Master Process In a Multi-Master Configuration.

Accumulo can easily run in a multiple-master configuration. This post shows how to watch it recover when a master process is killed.

The steps below show how to convert from a single-master cluster to a two-master cluster. Then you'll kill the active master and watch the monitor page to see Accumulo automatically switch to the backup master.
  1. Start a cluster with a master and two nodes using https://github.com/medined/Accumulo_1_5_0_By_Vagrant.
  2. vagrant ssh master
  3. cd accumulo_home/bin/accumulo 
  4. bin/stop-all.sh
  5. echo "affy-slave1" >> conf/masters
  6. bin/start-all.sh
  7. Visit http://affy-master:50095/master to see which node is the current master. Note that you are connecting to the monitor process, not the master process. Don't let the hostnames confuse you.
  8. Enable auto-refresh.
  9. SSH to whichever node is listed as the master.
  10. ps fax | grep app=master | grep -v grep | awk '{print $1}' | xargs kill -9
  11. Visit http://affy-master:50095/master and you should see a 'Master Server Not Running' message. Reload the page if needed.
  12. Within a few seconds, the alternate master process should be active.
Normally you'd copy conf/masters to all nodes. However, for this tiny demonstration it is not needed.

Restarting the killed master process is easy. Follow the steps below:
  1. vagrant ssh master
  2. cd accumulo_home/bin/accumulo 
  3. bin/start-all.sh