Lazy Physics
code clojure physics
Thursday, February 12, 2015
… in which we explore lazy sequences and common functional idioms in Clojure via the example of looking for (nearly) coincident clusters of times in a series.
A fundamental technical problem in experimental particle physics is how to distinguish the signatures of particles from instrumental noise.
Imagine a tree full of hundreds of sparrows, each nesting on a branch, each chirping away occasionally. Suddenly, for a brief moment, they all start chirping vigorously (maybe a hawk flew past). A clustering of chirps in time is the signal that something has happened! The analogous situation occurs in instruments consisting of many similar detector elements, each generating some amount of random noise that, on its own, is indistinguishable from any evidence left by particles, but which, taken together, signals that, again, something has happened —a muon, an electron, a neutrino has left a sudden spume of electronic evidence in your instrument, waiting to be read out and distinguished from the endless noise.
This process of separating the noise from the signal is known in physics as triggering and is typically done through some combination of spatial or time clustering; in many cases, time is the simplest to handle and the first “line of defense” against being overrun by too much data. (It is often impractical to consume all the data generated by all the elements —data reduction is the name of the game at most stages of these experiments.)
This data is typically generated continously ad infinitum, and must therefore be processed differently than, say, a single file on disk. Such infinite sequences of data are an excellent fit for the functional pattern known as laziness, in which, rather than chewing up all your RAM and/or hard disk space, data is consumed and transformed only as needed / as available. This kind of processing is baked into Clojure at many levels and throughout its library of core functions, dozens of which can be combined (“composed”) to serve an endless variety of data transformations. (This style of data wrangling is also available in Python via generators and functional libraries such as Toolz.)
Prompted by a recent question on the topic from a physicist and former colleague, I got to thinking about the classic problem of triggering, and realized that the time series trigger provides a nice showcase for Clojure’s core library and for processing lazy sequences. The rest of this post will describe a simple trigger, essentially what particle astrophysicists I know call a “simple majority trigger”; or a “simple multiplicity trigger” (depending on whom you talk to).
Now for some Clojure code. (A small amount of familiarity with
Clojure’s simple syntax is recommended for maximum understanding of
what follows.) We will build up our understanding through a series of
successively more complex code snippets. The exposition follows
closely what one might do in the Clojure REPL, building up
successively more complete examples. In each case, we use take
to
limit what would otherwise be infinite sequences of data (so that our
examples can terminate without keeping us waiting forever…).
First we create a sorted, infinite series of ever-increasing times (in, say, nsec):
(def times (iterate #(+ % (rand-int 1000)) 0)) ;; Caution: infinite sequence... (take 30 times) ;;=> (0 955 1559 2063 2735 2858 3542 4067 4366 5246 5430 6168 7127 7932 8268 8929 9426 9918 10436 10850 11680 12367 12569 13343 14155 14420 15062 15171 15663 16355)
times
is an infinite (but “unrealized”) series, constructed by
iterating the anonymous function #(+ % (rand-int 1000))
which adds a
random integer from 0 to 999 to its argument (starting with zero). The
fact that it is infinite does not prevent us from defining it or
(gingerly) interrogating it via take1.
Now, the way we’ll look for excesses is to look for groupings of hits
(say, eight of them) whose first and last hit times are within 1
microsecond (1000 nsec) of each other. To start, there is a handy
function called partition
which groups a series in blocks of fixed
length:
(take 10 (partition 8 times)) ;;=> ((0 955 1559 2063 2735 2858 3542 4067) (4366 5246 5430 6168 7127 7932 8268 8929) (9426 9918 10436 10850 11680 12367 12569 13343) (14155 14420 15062 15171 15663 16355 16700 16947) (17919 17949 18575 18607 18849 19597 20410 20680) (20737 21289 21315 21323 21426 21637 22422 23000) (23477 24351 24426 25106 25861 26568 27511 28332) (29071 29831 29957 30761 31073 31914 32591 33187) (33878 34739 34842 35674 36444 36960 36983 37400) (37587 38012 38969 39131 39317 40135 40587 40759))
We’ll rewrite this using Clojure’s thread-last macro ->>
, which is a
very helpful tool for rewriting nested expressions as a more readable
pipeline of successive function applications:
(->> times (partition 8) (take 10)) ;;=> ((0 955 1559 2063 2735 2858 3542 4067) (4366 5246 5430 6168 7127 7932 8268 8929) ...same as above...)
However, this isn’t quite what we want, because it won’t find clusters
of times close together who don’t happen to begin on our partition
boundaries. To fix this, we use the optional step
argument to
partition
:
(->> times (partition 8 1) (take 10)) ;;=> ((0 955 1559 2063 2735 2858 3542 4067) (955 1559 2063 2735 2858 3542 4067 4366) (1559 2063 2735 2858 3542 4067 4366 5246) (2063 2735 2858 3542 4067 4366 5246 5430) (2735 2858 3542 4067 4366 5246 5430 6168) (2858 3542 4067 4366 5246 5430 6168 7127) (3542 4067 4366 5246 5430 6168 7127 7932) (4067 4366 5246 5430 6168 7127 7932 8268) (4366 5246 5430 6168 7127 7932 8268 8929) (5246 5430 6168 7127 7932 8268 8929 9426))
This is getting closer to what we want—if you look carefully, you’ll
see that each row consists of the previous one shifted by one
element. The next step is to grab (via map
) the first and last times
of each group, using juxt
to apply both first
and last
to each
subsequence….
(->> times (partition 8 1) (map (juxt last first)) (take 10)) ;;=> ([4067 0] [4366 955] [5246 1559] [5430 2063] [6168 2735] [7127 2858] [7932 3542] [8268 4067] [8929 4366] [9426 5246])
… and turn these into time differences:
(->> times (partition 8 1) (map (comp (partial apply -) (juxt last first))) (take 10)) ;;=> (4067 3411 3687 3367 3433 4269 4390 4201 4563 4180)
Note that so far these time differences are all > 1000. comp
, above,
turns a collection of multiple functions into a new function which is
the composition of these functions, applied successively one after the
other (right-to-left). partial
turns a function of multiple arguments
into a function of fewer arguments, by binding one or more of the
arguments in a new function. For example,
((partial + 2) 3) ;;=> 5 ((comp (partial apply -) (juxt last first)) [3 10]) ;;=> 7
Recall that we only want events whose times are close to each other;
say, whose duration is under a maximum limit of 1000 nsec. In general,
to select only the elements of a sequence which satisfy a filter
function, we use filter
:
(->> times (partition 8 1) (map (comp (partial apply -) (juxt last first))) (filter (partial > 1000)) (take 10)) ;;=> (960 942 827 763 597 682 997 836 986 966)
((partial > 1000)
is a function of one argument which returns true if
that argument is strictly less than 1000.)
Great! We now have total “durations”; for subsequences of 8 times, where the total durations are less than 1000 nsec.
But this is not actually that helpful. It would be better if we could get both the total durations and the actual subsequences satisfying the requirement (the analog of this in a real physics experiment would be returning the actual hit data falling inside the trigger window).
To do this, juxt
once again comes to the rescue, by allowing us to
juxt
-apose the original data alongside the total duration to show both
together….
(->> times (partition 8 1) (map (juxt identity (comp (partial apply -) (juxt last first)))) (take 10)) ;;=> ([(0 309 410 562 979 1423 2180 3159) 3159] [(309 410 562 979 1423 2180 3159 3585) 3276] [(410 562 979 1423 2180 3159 3585 4325) 3915] [(562 979 1423 2180 3159 3585 4325 4573) 4011] [(979 1423 2180 3159 3585 4325 4573 5074) 4095] [(1423 2180 3159 3585 4325 4573 5074 5942) 4519] [(2180 3159 3585 4325 4573 5074 5942 6599) 4419] [(3159 3585 4325 4573 5074 5942 6599 7458) 4299] [(3585 4325 4573 5074 5942 6599 7458 8128) 4543] [(4325 4573 5074 5942 6599 7458 8128 8439) 4114])
… and adapt our filter slightly to apply our filter only to the time rather than the original data:
(->> times (partition 8 1) (map (juxt identity (comp (partial apply -) (juxt last first)))) (filter (comp (partial > 1000) second)) (take 3)) ;;=> ([(1577315 1577322 1577514 1577570 1577793 1577817 1577870 1578151) 836] [(3119967 3120203 3120416 3120469 3120471 3120620 3120715 3120937) 970] [(6752453 6752483 6752522 6752918 6752966 6753008 6753026 6753262) 809])
Finally, to turn this into a function for later use, use defn
and
remove take
:
(defn smt-8 [times] (->> times (partition 8 1) (map (juxt identity (comp (partial apply -) (juxt last first)))) (filter (comp (partial > 1000) second))))
smt-8
consumes one, potentially infinite sequence and outputs another,
“smaller” (but also potentially infinite) lazy sequence of
time-clusters-plus-durations, in the form shown above.
Some contemplation will suggest many variants; for example, one in which some number of hits outside the trigger “window” are also included in the output. This is left as an exercise for the advanced reader.
A “real” physics trigger would have to deal with many other details:
each hit, in addition to its time, would likely have an amplitude, a
sensor ID, and other data associated with it. Also, the data may not
be perfectly sorted, some sensors may drop out of the data stream,
etc. But in some sense this prototypical time clustering algorithm is
one of the fundamental building blocks of experimental high energy
physics and astrophysics and was used (in some variant) in every
experiment I worked on over a 25+ year period. The representation
above is certainly one of the most succinct, and shows off the power
and elegance of the language, its core library, and lazy
sequences. (It is also reasonably fast for such a simple algorithm;
smt-8
consumes input times at a rate of about 250 kHz. This is not,
however, fast enough for an instrument like IceCube, whose 5160
sensors each count at a rate of roughly 300 Hz, for a total rate of
1.5 MHz. A future post may look at ways to get better performance.)
Footnotes:
To model a Poisson process — one in which any given event time is independent of the future or past times — one would normally choose an exponential rather than a uniformly flat distribution of time differences, but this is not important for our discussion, so, in the interest of simplicity, we’ll go with what we have here.
Blog Posts (170)
Select from below, view all posts, or choose only posts for:art clojure code emacs lisp misc orgmode physics python ruby sketchup southpole writing
From Elegance to Speed code lisp clojure physics Wednesday, September 25, 2019
Common Lisp How-Tos code lisp Sunday, September 15, 2019
Implementing Scheme in Python code python lisp Friday, September 13, 2019
A Daily Journal in Org Mode writing code emacs Monday, September 2, 2019
Show at Northwestern University Prosthetics-Orthotics Center art Saturday, October 20, 2018
Color Notations art Thursday, September 20, 2018
Painting and Time art Tuesday, May 29, 2018
Learning Muscular Anatomy code clojure art emacs orgmode Thursday, February 22, 2018
Reflections on a Year of Daily Memory Drawings art Monday, September 18, 2017
Repainting art Sunday, May 14, 2017
Daily Memory Drawings art Tuesday, January 3, 2017
Questions to Ask art Thursday, February 25, 2016
Macro-writing Macros code clojure Wednesday, November 25, 2015
Time Limits art Saturday, April 25, 2015
Fun with Instaparse code clojure Tuesday, November 12, 2013
Nucleotide Repetition Lengths code clojure Sunday, November 3, 2013
Updating the Genome Decoder code clojure Saturday, July 13, 2013
Getting Our Hands Dirty (with the Human Genome) code clojure Wednesday, July 10, 2013
Validating the Genome Decoder code clojure Sunday, July 7, 2013
A Two Bit Decoder code clojure Saturday, July 6, 2013
Exploratory Genomics with Clojure code clojure Friday, July 5, 2013
Rosalind Problems in Clojure code clojure Sunday, June 9, 2013
Introduction to Context Managers in Python code python Saturday, April 20, 2013
Processes vs. Threads for Integration Testing code python Friday, April 19, 2013
Resources for Learning Clojure code clojure Monday, May 21, 2012
Continuous Testing in Python, Clojure and Blub code clojure python Saturday, March 31, 2012
Programming Languages code clojure python Thursday, December 22, 2011
Milvans and Container Malls southpole Friday, December 2, 2011
Oxygen southpole Tuesday, November 29, 2011
Ghost southpole Monday, November 28, 2011
Turkey, Stuffing, Eclipse southpole Sunday, November 27, 2011
Wind Storm and Moon Dust southpole Friday, November 25, 2011
Shower Instructions southpole Wednesday, November 23, 2011
Fresh Air and Bananas southpole Sunday, November 20, 2011
Traveller and the Human Chain southpole Thursday, November 17, 2011
Reveille southpole Monday, November 14, 2011
Drifts southpole Sunday, November 13, 2011
Bon Voyage southpole Friday, November 11, 2011
A Nicer Guy? southpole Friday, November 11, 2011
The Quiet Earth southpole Monday, November 7, 2011
Ten southpole Tuesday, November 1, 2011
The Wheel art Wednesday, October 19, 2011
Plein Air art Saturday, August 27, 2011
ISO50 southpole art Thursday, July 28, 2011
SketchUp and MakeHuman sketchup art Monday, June 27, 2011
In Defense of Hobbies misc code art Sunday, May 29, 2011
Closure southpole Tuesday, January 25, 2011
Takeoff southpole Sunday, January 23, 2011
Mummification southpole Sunday, January 23, 2011
Eleventh Hour southpole Saturday, January 22, 2011
Diamond southpole Thursday, January 20, 2011
Baby, It's Cold Outside southpole Wednesday, January 19, 2011
Fruition southpole Tuesday, January 18, 2011
Friendly Radiation southpole Tuesday, January 18, 2011
A Place That Wants You Dead southpole Monday, January 17, 2011
Marathon southpole Sunday, January 16, 2011
Deep Fried Macaroni and Cheese Balls southpole Saturday, January 15, 2011
Retrograde southpole Thursday, January 13, 2011
Three southpole Wednesday, January 12, 2011
Transitions southpole Tuesday, January 11, 2011
The Future southpole Monday, January 10, 2011
Sunday southpole Sunday, January 9, 2011
Radio Waves southpole Saturday, January 8, 2011
Settling In southpole Thursday, January 6, 2011
Revolution Number Nine southpole Thursday, January 6, 2011
Red Eye to McMurdo southpole Wednesday, January 5, 2011
That's the Way southpole Tuesday, January 4, 2011
Faults in Ice and Rock southpole Monday, January 3, 2011
Bardo southpole Monday, January 3, 2011
Chasing the Sun southpole Sunday, January 2, 2011
Downhole southpole Monday, December 13, 2010
Coming Out of Hibernation southpole Monday, September 13, 2010
Managing the Most Remote Data Center in the World code southpole Friday, July 30, 2010
Ruby Plugins for Sketchup art sketchup ruby code Saturday, April 3, 2010
The Cruel Stranger misc Tuesday, November 10, 2009
Photoshop on a Dime art Monday, October 12, 2009
Man on Wire misc Friday, October 9, 2009
Videos southpole Friday, May 1, 2009
Posing Rigs art Sunday, April 5, 2009
Metric art Monday, March 2, 2009
Cuba southpole Sunday, February 22, 2009
Wickets southpole Monday, February 16, 2009
Safe southpole Sunday, February 15, 2009
Broken Glasses southpole Thursday, February 12, 2009
End of the Second Act southpole Friday, February 6, 2009
Pigs and Fish southpole Wednesday, February 4, 2009
Last Arrivals southpole Saturday, January 31, 2009
Lily White southpole Saturday, January 31, 2009
In a Dry and Waterless Place southpole Friday, January 30, 2009
Immortality southpole Thursday, January 29, 2009
Routine southpole Thursday, January 29, 2009
Tourists southpole Wednesday, January 28, 2009
Passing Notes southpole Thursday, January 22, 2009
Translation southpole Tuesday, January 20, 2009
RNZAF southpole Monday, January 19, 2009
The Usual Delays southpole Monday, January 19, 2009
CHC southpole Saturday, January 17, 2009
Wyeth on Another Planet art Saturday, January 17, 2009
Detox southpole Friday, January 16, 2009
Packing southpole Tuesday, January 13, 2009
Nails southpole Friday, January 9, 2009
Gearing Up southpole Tuesday, January 6, 2009
Gouache, and a new system for conquering the world art Sunday, November 30, 2008
Fall 2008 HPAC Studies art Friday, November 21, 2008
YABP (Yet Another Blog Platform) southpole Thursday, November 20, 2008
A Bath southpole Monday, February 18, 2008
Green Marathon southpole Saturday, February 16, 2008
Sprung southpole Friday, February 15, 2008
Outta Here southpole Wednesday, February 13, 2008
Lame Duck DAQer southpole Tuesday, February 12, 2008
Eclipse Town southpole Saturday, February 9, 2008
One More Week southpole Wednesday, February 6, 2008
IceCube Laboratory Video Tour; Midrats Finale southpole Friday, February 1, 2008
SPIFF, Party, Shift Change southpole Monday, January 28, 2008
Good things come in threes, or 18s southpole Saturday, January 26, 2008
Sleepless in the Station southpole Thursday, January 24, 2008
Post Deploy southpole Monday, January 21, 2008
IceCube and The Beatles southpole Saturday, January 19, 2008
Midrats southpole Saturday, January 19, 2008
Video: Flight to South Pole southpole Thursday, January 17, 2008
Almost There southpole Wednesday, January 16, 2008
The Pure Land southpole Wednesday, January 16, 2008
There are no mice in the Hotel California Bunkroom southpole Sunday, January 13, 2008
Short Timer southpole Sunday, January 13, 2008
Sleepy in MacTown southpole Saturday, January 12, 2008
Sir Ed southpole Friday, January 11, 2008
Pynchon, Redux southpole Friday, January 11, 2008
Superposition of Luggage States southpole Friday, January 11, 2008
Shortcut to Toast southpole Friday, January 11, 2008
Flights: Round 1 southpole Thursday, January 10, 2008
Packing for the Pole southpole Monday, January 7, 2008
Goals for Trip southpole Sunday, January 6, 2008
Balaklavas southpole Friday, January 4, 2008
Tree and Man (Test Post) southpole Friday, December 28, 2007
Schedule southpole Sunday, December 16, 2007
How to mail stuff to John at the South Pole southpole Sunday, November 25, 2007
Summer and Winter southpole Tuesday, March 6, 2007
Homeward Bound southpole Thursday, February 22, 2007
Redeployment southpole Monday, February 19, 2007
Short-timer southpole Sunday, February 18, 2007
The Cleanest Air in the World southpole Saturday, February 17, 2007
One more day (?) southpole Friday, February 16, 2007
One more week (?) southpole Thursday, February 15, 2007
Closing Softly southpole Monday, February 12, 2007
More Photos southpole Friday, February 9, 2007
Super Bowl Wednesday southpole Thursday, February 8, 2007
Night Owls southpole Tuesday, February 6, 2007
First Week southpole Friday, February 2, 2007
More Ice Pix southpole Wednesday, January 31, 2007
Settling In southpole Tuesday, January 30, 2007
NPX southpole Monday, January 29, 2007
Pole Bound southpole Sunday, January 28, 2007
Bad Dirt southpole Saturday, January 27, 2007
The Last Laugh southpole Friday, January 26, 2007
First Delay southpole Thursday, January 25, 2007
Nope southpole Thursday, January 25, 2007
Batteries and Sheep southpole Wednesday, January 24, 2007
All for McNaught southpole Tuesday, January 23, 2007
t=0 southpole Monday, January 22, 2007
The Big (Really really big...) Picture southpole Monday, January 22, 2007
Giacometti southpole Monday, January 22, 2007
Descent southpole Monday, January 22, 2007
Video Tour southpole Saturday, January 20, 2007
How to subscribe to blog updates southpole Monday, December 11, 2006
What The Blog is For southpole Sunday, December 10, 2006
Auckland southpole Tuesday, January 11, 2005
Halfway Around the World; Dragging the Soul Behind southpole Monday, January 10, 2005
Launched southpole Sunday, January 9, 2005
Getting Ready (t minus 2 days) southpole Friday, January 7, 2005
Subscribe: RSS feed ... all topics ... or Clojure only / Lisp only