org.pentaho.di.trans.steps.reservoirsampling
Class ReservoirSampling

java.lang.Object
  extended by java.lang.Thread
      extended by org.pentaho.di.trans.step.BaseStep
          extended by org.pentaho.di.trans.steps.reservoirsampling.ReservoirSampling
All Implemented Interfaces:
Runnable, org.pentaho.di.core.variables.VariableSpace, StepInterface

public class ReservoirSampling
extends BaseStep
implements StepInterface


Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
Thread.State, Thread.UncaughtExceptionHandler
 
Field Summary
 
Fields inherited from class org.pentaho.di.trans.step.BaseStep
category_order, errorRowSet, first, init, inputRowSets, linesInput, linesOutput, linesRead, linesRejected, linesSkipped, linesUpdated, linesWritten, outputRowSets, paused, remoteInputSteps, remoteOutputSteps, statusDesc, steps, stopped, terminator, terminator_rows, thr, waiting
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
ReservoirSampling(StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans)
          Creates a new ReservoirSampling instance.
 
Method Summary
 boolean init(StepMetaInterface smi, StepDataInterface sdi)
          Initialize the step.
 boolean processRow(StepMetaInterface smi, StepDataInterface sdi)
          Process an incoming row of data.
 void run()
          Runs the step (overrides Thread.run(); this is the body of the step's thread).
 
Methods inherited from class org.pentaho.di.trans.step.BaseStep
addResultFile, addRowListener, addStepListener, buildLog, cleanup, copyVariablesFrom, decrementLinesRead, decrementLinesWritten, dispatch, dispose, environmentSubstitute, environmentSubstitute, findInputRowSet, findInputRowSet, findOutputRowSet, findOutputRowSet, getBooleanValueOfVariable, getClusterSize, getCopy, getDispatcher, getErrorRowMeta, getErrors, getIconFilename, getInputRowMeta, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesSkipped, getLinesUpdated, getLinesWritten, getLogFields, getNextClassNr, getOutputRowSets, getParentVariableSpace, getPartitionID, getPartitionTargets, getPreviewRowMeta, getProcessed, getRepartitioning, getResultFiles, getRow, getRowFrom, getRowListeners, getRuntime, getServerSockets, getSlaveNr, getSocketRepository, getStatus, getStatusDescription, getStepDataInterface, getStepID, getStepInfo, getStepListeners, getStepMeta, getStepMetaInterface, getStepname, getThread, getTrans, getTransMeta, getTypeId, getUniqueStepCountAcrossSlaves, getUniqueStepNrAcrossSlaves, getVariable, getVariable, incrementLinesInput, incrementLinesOutput, incrementLinesRead, incrementLinesRejected, incrementLinesSkipped, incrementLinesUpdated, incrementLinesWritten, initBeforeStart, initializeVariablesFrom, injectVariables, isDistributed, isInitialising, isMapping, isPartitioned, isPaused, isSafeModeEnabled, isStopped, isUsingThreadPriorityManagment, listVariables, logBasic, logDebug, logDetailed, logError, logError, logMinimal, logRowlevel, logSummary, markStart, markStop, outputIsDone, pauseRunning, putError, putRow, putRowTo, removeRowListener, resumeRunning, rowsetInputSize, rowsetOutputSize, runStepThread, safeModeChecking, setCopy, setDistributed, setErrorRowMeta, setErrors, setInputRowMeta, setInputRowSets, setInternalVariables, setLinesInput, setLinesOutput, setLinesRead, setLinesRejected, setLinesSkipped, setLinesUpdated, setLinesWritten, setOutputDone, setOutputRowSets, setParentVariableSpace, setPartitioned, setPartitionID, setPartitionTargets, setPaused, setPaused, setPreviewRowMeta, setRepartitioning, setSafeModeEnabled, setServerSockets, setSocketRepository, setStepDataInterface, setStepListeners, setStepMeta, setStepMetaInterface, setStepname, setStopped, setStopped, setTransMeta, setUsingThreadPriorityManagment, setVariable, shareVariablesWith, stopAll, stopRunning, stopRunning, toString
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, yield
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.pentaho.di.trans.step.StepInterface
addRowListener, addStepListener, cleanup, dispose, getCopy, getErrors, getInputRowSets, getLinesInput, getLinesOutput, getLinesRead, getLinesRejected, getLinesUpdated, getLinesWritten, getOutputRowSets, getPartitionID, getRow, getRowListeners, getStepID, getStepMeta, getStepname, initBeforeStart, isAlive, isMapping, isPartitioned, isStopped, markStart, markStop, pauseRunning, putRow, removeRowListener, resumeRunning, setErrors, setLinesRejected, setOutputDone, setPartitionID, start, stopAll, stopRunning
 

Constructor Detail

ReservoirSampling

public ReservoirSampling(StepMeta stepMeta,
                         StepDataInterface stepDataInterface,
                         int copyNr,
                         TransMeta transMeta,
                         Trans trans)
Creates a new ReservoirSampling instance.

Implements reservoir sampling algorithm R as described by Jeffrey Scott Vitter. (The algorithm itself is implemented in ReservoirSamplingData.java.)

For more information see:

Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.
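A minimal sketch of algorithm R, for illustration only: it is not the code in ReservoirSamplingData.java, and the class name AlgorithmR, the sample size k, and the seeded java.util.Random are assumptions made for this example. The first k items fill the reservoir; after that, item number n is kept with probability k/n and overwrites a uniformly chosen slot, which leaves every item seen so far equally likely to be in the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of reservoir sampling algorithm R: keeps a uniform random
 * sample of at most k items from a stream whose length is not known in advance.
 * Illustrative only; not the ReservoirSamplingData implementation.
 */
final class AlgorithmR<T> {
    private final int k;               // reservoir (sample) size
    private final Random random;       // seeded so that samples are reproducible
    private final List<T> reservoir;
    private long seen = 0;             // number of items offered so far

    AlgorithmR(int k, long seed) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.random = new Random(seed);
        this.reservoir = new ArrayList<>(k);
    }

    /** Offers the next item of the stream to the reservoir. */
    void offer(T item) {
        seen++;
        if (reservoir.size() < k) {
            // The first k items go straight into the reservoir.
            reservoir.add(item);
            return;
        }
        // Draw j uniformly from [0, seen): the item is kept with probability k / seen
        // and, if kept, overwrites a uniformly chosen slot.
        long j = (long) (random.nextDouble() * seen);
        if (j < k) {
            reservoir.set((int) j, item);
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    long seen() {
        return seen;
    }

    public static void main(String[] args) {
        AlgorithmR<Integer> sampler = new AlgorithmR<>(5, 42L);
        for (int i = 1; i <= 1000; i++) {
            sampler.offer(i);
        }
        System.out.println(sampler.sample().size()); // prints 5
    }
}
```

The reservoir never grows beyond k, so memory use is independent of the stream length; only one pass over the input is needed, which is what makes the technique suitable for a streaming step.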

Parameters:
stepMeta - holds the step's metadata
stepDataInterface - holds the step's temporary (runtime) data
copyNr - the copy number of this step (a step can run as several copies)
transMeta - metadata for the transformation
trans - the transformation (Trans) this step runs in
Method Detail

processRow

public boolean processRow(StepMetaInterface smi,
                          StepDataInterface sdi)
                   throws org.pentaho.di.core.exception.KettleException
Process an incoming row of data.

Specified by:
processRow in interface StepInterface
Overrides:
processRow in class BaseStep
Parameters:
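smi - the step's metadata (a StepMetaInterface)
sdi - the step's runtime data (a StepDataInterface)
Returns:
true if more rows may remain to be processed; false once processing is finished or an error occurred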
Throws:
org.pentaho.di.core.exception.KettleException - if an error occurs
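The processRow contract (handle one row per call, return false when finished) can be illustrated with a self-contained sketch. Everything here is hypothetical: RowSource, SamplingStepSketch and the List used as output are stand-ins and NOT the Kettle API (the real step works with StepMetaInterface, StepDataInterface, getRow and putRow). The sketch also assumes that the sample is emitted only after the input is exhausted, since a reservoir sample is not final until the last row has been seen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical row source; a stand-in for the step's input, NOT the Kettle API. */
interface RowSource {
    /** Returns the next row, or null once the input is exhausted. */
    Object[] nextRow();
}

/**
 * Hypothetical step illustrating the processRow() contract: each call handles
 * one input row and returns true while there may be more work, false when done.
 */
final class SamplingStepSketch {
    private final RowSource input;
    private final List<Object[]> output;   // stand-in for the step's output
    private final List<Object[]> reservoir = new ArrayList<>();
    private final int sampleSize;
    private final Random random;
    private long rowsSeen = 0;

    SamplingStepSketch(RowSource input, List<Object[]> output, int sampleSize, long seed) {
        this.input = input;
        this.output = output;
        this.sampleSize = sampleSize;
        this.random = new Random(seed);
    }

    boolean processRow() {
        Object[] row = input.nextRow();
        if (row == null) {
            // No more input: the sample is final only now, so emit it and stop.
            output.addAll(reservoir);
            return false;
        }
        rowsSeen++;
        if (reservoir.size() < sampleSize) {
            reservoir.add(row);
        } else {
            // Keep this row with probability sampleSize / rowsSeen (algorithm R).
            long j = (long) (random.nextDouble() * rowsSeen);
            if (j < sampleSize) {
                reservoir.set((int) j, row);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new Object[] {i});
        }
        Iterator<Object[]> it = rows.iterator();
        RowSource source = () -> it.hasNext() ? it.next() : null;

        List<Object[]> out = new ArrayList<>();
        SamplingStepSketch step = new SamplingStepSketch(source, out, 3, 7L);
        // Driver loop: call processRow() until it reports that the step is done.
        int calls = 0;
        while (step.processRow()) {
            calls++;
        }
        System.out.println(calls + " rows read, " + out.size() + " rows sampled"); // prints "10 rows read, 3 rows sampled"
    }
}
```

The driver loop in main plays the role that a step thread's run() method typically plays in this kind of design: it keeps calling processRow until the step signals that it is done.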

init

public boolean init(StepMetaInterface smi,
                    StepDataInterface sdi)
Initialize the step.

Specified by:
init in interface StepInterface
Overrides:
init in class BaseStep
Parameters:
smi - the step's metadata (a StepMetaInterface)
sdi - the step's runtime data (a StepDataInterface)
Returns:
true if initialization succeeded, false otherwise

run

public void run()
Runs the step. This overrides Thread.run() and is the body of the step's thread.

Specified by:
run in interface Runnable
Specified by:
run in interface StepInterface
Overrides:
run in class Thread