Stuff by Peter Stone

Randomly Select Rows of Data from a List

I was recently asked for a script that could select rows of data from a list at random. The user had to be able to specify how many rows to select, so that it would be possible, for instance, to pick 40 names at random from a list of several hundred. The script was originally used to select a random sample of usernames from a database so that those users could be emailed a survey.

My solution was to write this PowerShell script.

The script expects to find a text file called "Possible.txt" in the same directory as the script. This file contains the lines of data from which the script selects at random. The script takes one argument: the number of lines to select. When run, it creates a second file called "Selection.txt" containing the selected lines in the order they were selected. (Three lines of code at the end of the script can be uncommented to sort the output if desired.)
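To illustrate the file handling just described, here is a minimal sketch rather than the script itself. The function name Select-RandomLines and its parameters are my own placeholders, and the selection step uses the built-in Get-Random cmdlet (PowerShell 2.0 or later) as a stand-in, not the hashtable approach the script actually uses.

```powershell
# Sketch only: the function and parameter names are illustrative,
# not taken from the original script.
function Select-RandomLines {
    param(
        [string]$InputFile  = 'Possible.txt',   # source list, one item per line
        [string]$OutputFile = 'Selection.txt',  # receives the selected lines
        [int]$Count = 1,                        # how many lines to select
        [switch]$Sort                           # sort the output instead of keeping selection order
    )

    # Wrap in @() so a one-line file still yields an array.
    $lines = @(Get-Content -Path $InputFile)

    # Stand-in for the selection step: Get-Random -Count returns
    # distinct items from its input in random order.
    $selection = @($lines | Get-Random -Count $Count)

    if ($Sort) {
        $selection = @($selection | Sort-Object)
    }

    Set-Content -Path $OutputFile -Value $selection
}
```

Assuming both files live in the current directory, something like `Select-RandomLines -Count 40` would write 40 lines to "Selection.txt", and adding `-Sort` plays the role of the three commented-out lines.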

The script loads the lines of data from the input file into a hashtable, using numbers as the keys. It then uses PowerShell's random number generator to pick a number within the range of keys and extracts the value stored under that key. The process is repeated until the required number of lines has been selected.
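The hashtable technique above might look roughly like the following. This is a sketch under assumptions, not the script's actual code: the name Get-RandomSelection is hypothetical, I use System.Random for the random numbers, and I assume no line should be selected twice, so each chosen entry is removed and the last entry is moved into its place to keep the keys contiguous.

```powershell
# Sketch only: names are illustrative; drawing without repeats is an assumption.
function Get-RandomSelection {
    param(
        [string[]]$Lines,   # the candidate lines
        [int]$Count         # how many to select
    )

    if ($Count -lt 0 -or $Count -gt $Lines.Count) {
        throw "Count must be between 0 and $($Lines.Count)."
    }

    # Load the lines into a hashtable keyed 0..N-1.
    $table = @{}
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        $table[$i] = $Lines[$i]
    }

    $rng = New-Object System.Random
    $remaining = $Lines.Count
    $selected = @()

    while ($selected.Count -lt $Count) {
        # Next(max) returns an integer from 0 up to, but not including, max.
        $pick = $rng.Next($remaining)
        $selected += $table[$pick]

        # Move the last live entry into the gap so the keys stay
        # contiguous, then shrink the range by one.
        $last = $remaining - 1
        $table[$pick] = $table[$last]
        $table.Remove($last)
        $remaining--
    }

    return $selected
}
```

A call such as `Get-RandomSelection -Lines (Get-Content Possible.txt) -Count 40` would return the chosen lines in the order they were drawn. Keeping the keys contiguous means every random number maps to a line that is still available, so no draw is wasted on an entry that has already been taken.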

With large lists this process can take some time to run, so I developed a few improvements to track progress and optimise performance (details below). Some example run times: randomly extracting 20% of the lines (634 lines) from a list of 3170 lines took just 45 seconds; with a larger list, extracting 2213 lines from 11068 lines took 13 minutes 16 seconds. Speed is of course relative: the amount of data on each line affects the time taken to write it to disk, and the power of the workstation also plays a part. I ran these tests on a P4 3.2GHz with 1GB of RAM. Others may well do much better...

Improvements
27/07/2007 Inserted a marker to display progress on the console. The display is of the total number of lines processed and a time stamp together with a progress value expressed as a percentage.
28/07/2007 Adjusted processing to use hashtables instead of writing to a temporary file - decreasing the processing time significantly.
30/07/2007 Adjusted processing to use a single hashtable - decreasing the processing time further from what I quote above!
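
A progress marker of the kind added on 27/07/2007 could be built along these lines. The function name Format-ProgressLine and the exact layout of the line are assumptions for illustration; the script's real output format is not shown here.

```powershell
# Sketch only: the name and the layout of the line are illustrative.
function Format-ProgressLine {
    param(
        [int]$Done,    # lines processed so far
        [int]$Total    # lines to process in all
    )

    # Whole-number percentage, rounded down; guard against a zero total.
    $percent = 0
    if ($Total -gt 0) {
        $percent = [int][math]::Floor([double](100 * $Done) / $Total)
    }

    $stamp = Get-Date -Format 'HH:mm:ss'
    return ('{0} of {1} lines  {2}  {3}%' -f $Done, $Total, $stamp, $percent)
}
```

Inside the selection loop this might be called every hundred lines or so, for example `Write-Host (Format-ProgressLine -Done $done -Total $total)`, so that the console shows the count, a time stamp and the percentage without slowing the run noticeably.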

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.