Letting the Machines Search for New Physics

Article: “Anomaly Detection for Resonant New Physics with Machine Learning”

Authors: Jack H. Collins, Kiel Howe, Benjamin Nachman

Reference: https://arxiv.org/abs/1805.02664

One of the main goals of the LHC experiments is to look for signals of physics beyond the Standard Model: new particles that may explain some of the mysteries the Standard Model doesn’t answer. The typical way this works is that theorists come up with a new particle that would solve some mystery and spell out how it interacts with the particles we already know about. Then experimentalists design a strategy to search for evidence of that particle in the mountains of data that the LHC produces. So far none of the searches performed in this way have seen any definitive evidence of new particles, leading experimentalists to rule out a lot of the parameter space of theorists’ favorite models.

A summary of searches the ATLAS collaboration has performed. The left columns show the model being searched for, the experimental signature examined, and how much data has been analyzed so far. The colored bars show the regions that have been ruled out based on the null result of each search. As you can see, we have already covered a lot of territory.

Despite this extensive program of searches, one might wonder if we are still missing something. What if there is a new particle in the data, waiting to be discovered, that theorists just haven’t thought of yet, so no one has looked for it? This poses a very interesting challenge for experimentalists: how do you look for something new when you don’t know what you are looking for? One approach, which Particle Bites has talked about before, is to look at as many final states as possible, compare what you see in data to simulation, and look for any large deviations. This is a good approach, but it may have limited sensitivity to small signals. When a normal search for a specific model is performed, one usually makes a series of selection requirements on the data, chosen to remove background events and keep signal events. Nowadays these selection requirements are getting more complex, often using neural networks, a common type of machine learning model, trained to discriminate signal from background. Without some sort of selection like this, you may miss a small signal buried in the large number of background events.

This new approach lets the neural network itself decide what signal to look for. It uses part of the data to train a neural network to find a signal, and then uses the rest of the data to actually look for that signal. This lets you search for many different kinds of models at the same time!

If that sounds like magic, let’s try to break it down. You have to assume something about the new particle you are looking for, and the technique here assumes it forms a resonant peak. This is a common assumption in searches: if a new particle were being produced in LHC collisions and then decaying, you would get an excess of events in which the invariant mass of its decay products takes a particular value. So if you plotted the number of events in bins of invariant mass, you would expect a new particle to show up as a nice peak on top of a relatively smooth background distribution. This is a very common search strategy, colloquially referred to as a ‘bump hunt’, and it is how the Higgs boson was discovered in 2012.

A histogram showing the invariant mass of photon pairs. The Higgs boson shows up as a bump at 125 GeV.
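
To make the bump hunt concrete, here is a minimal toy version in Python. Everything in it, from the exponentially falling background shape to the resonance at 3000 GeV, is invented for illustration; a real analysis would use a more careful background parameterization and statistical treatment.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Toy data: a smoothly falling background plus a small resonance at 3000 GeV.
background = rng.exponential(scale=500.0, size=100_000) + 1000.0  # masses in GeV
signal = rng.normal(loc=3000.0, scale=50.0, size=300)
masses = np.concatenate([background, signal])

# Histogram the invariant-mass spectrum.
counts, edges = np.histogram(masses, bins=np.linspace(1000, 6000, 101))
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit a smooth shape to the sidebands only, excluding the search window.
window = (centers > 2800) & (centers < 3200)

def smooth_bkg(m, a, b):
    return a * np.exp(-m / b)

popt, _ = curve_fit(smooth_bkg, centers[~window], counts[~window], p0=[1e5, 500.0])

# Compare the observed counts in the window to the background prediction.
observed = counts[window].sum()
expected = smooth_bkg(centers[window], *popt).sum()
print(f"observed {observed}, expected {expected:.0f}, "
      f"naive significance ~{(observed - expected) / np.sqrt(expected):.1f} sigma")
```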

The other secret ingredient we need is the idea of Classification Without Labels (abbreviated CWoLa, pronounced like koala). The way neural networks are usually trained in high energy physics is with fully labeled simulated examples. The network is shown a set of examples and guesses which are signal and which are background. Using the true label of each event, the network is told which examples it got wrong, its parameters are updated accordingly, and it slowly improves. The crucial challenge when trying to train on real data is that we don’t know the true label of any of the data, so it’s hard to tell the network how to improve. Rather than trying to use the true labels of any of the events, the CWoLa technique uses mixtures of events. Let’s say you have two mixed samples of events, sample A and sample B, but you know that sample A has a larger fraction of signal events than sample B. Then, instead of trying to classify signal versus background directly, you can train a classifier to distinguish between events from sample A and events from sample B, and what that network will learn to do is distinguish between signal and background. One can actually show that the optimal classifier for distinguishing the two mixed samples is the same as the optimal classifier of signal versus background. Even more amazing, this technique works quite well in practice, achieving good results even when only a few percent of one of the samples is signal.

An illustration of the CWoLa method. A classifier trained to distinguish between two mixed samples of signal and background events can learn to classify signal versus background.
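
Here is a minimal sketch of CWoLa in Python. The two-feature toy events and the 10% / 0.5% signal fractions are made up for illustration; the point is that the classifier never sees a truth label during training, and truth is used only at the end to verify what it learned.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def toy_events(n_signal, n_background):
    """Toy events with two features; signal and background differ in their means."""
    sig = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_signal, 2))
    bkg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_background, 2))
    features = np.vstack([sig, bkg])
    truth = np.concatenate([np.ones(n_signal), np.zeros(n_background)])
    return features, truth

# Two mixed samples: A is signal-enriched (10% signal), B is nearly pure background.
x_a, truth_a = toy_events(1_000, 9_000)
x_b, truth_b = toy_events(50, 9_950)

# CWoLa step: train on the mixture labels (A vs B), not the unknown truth labels.
x = np.vstack([x_a, x_b])
mixture_label = np.concatenate([np.ones(len(x_a)), np.zeros(len(x_b))])
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
clf.fit(x, mixture_label)

# The A-vs-B classifier has in fact learned signal versus background
# (truth labels are available here only because this is a toy).
scores = clf.predict_proba(x)[:, 1]
print("signal-vs-background AUC:",
      roc_auc_score(np.concatenate([truth_a, truth_b]), scores))
```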

The technique described in the paper combines these two ideas in a clever way. Because we expect a new particle to show up in a narrow region of invariant mass, you can use some of your data to train a classifier to distinguish events in a given slice of invariant mass from the events outside it. If there is no signal with a mass in that region, the classifier should essentially learn nothing, but if there is a signal there, the classifier should learn to separate signal and background. You can then apply that classifier to the rest of your data (which wasn’t used in the training) and look for a peak that would indicate a new particle. Because you don’t know ahead of time what mass a new particle should have, you scan over the whole range for which you have sufficient data, looking for a new particle in each slice.
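
Putting the two pieces together, the scan looks roughly like the sketch below. Again, this is a toy: one invented “substructure” feature, a hidden resonance at 3000 GeV, and a simple cut on the top 10% of classifier scores standing in for the paper’s full procedure.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Toy dataset: each event has a dijet invariant mass and one substructure feature.
n_bkg, n_sig = 50_000, 500
mass = np.concatenate([rng.exponential(500.0, n_bkg) + 1000.0,  # smooth background
                       rng.normal(3000.0, 50.0, n_sig)])        # hidden resonance
feat = np.concatenate([rng.normal(0.0, 1.0, n_bkg),             # background-like
                       rng.normal(2.0, 1.0, n_sig)])            # signal looks different
train = rng.random(len(mass)) < 0.5  # train on one half, search in the other

for center in range(1500, 5001, 500):  # slide the window across the spectrum
    window = (mass > center - 200) & (mass < center + 200)
    # CWoLa step: learn to tell 'in window' from 'sidebands' using substructure.
    # If no resonance sits in this window, there is nothing useful to learn.
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
    clf.fit(feat[train].reshape(-1, 1), window[train])
    # Apply to the held-out half and keep the 10% most window-like events.
    scores = clf.predict_proba(feat[~train].reshape(-1, 1))[:, 1]
    keep = scores > np.quantile(scores, 0.9)
    print(f"{center} GeV window: {np.sum(window[~train] & keep)} events selected "
          f"vs ~{0.1 * np.sum(window[~train]):.0f} from a random 10% cut")
```

Only the window containing the hidden resonance should show a clear excess over the random-cut expectation; everywhere else the classifier has nothing to latch onto.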

The specific case they use to demonstrate the power of this technique is new particles decaying to pairs of jets. On the surface, jets, the large sprays of particles produced when a quark or gluon is made in an LHC collision, all look the same. But the insides of jets, their substructure, can contain very useful information about what kind of particle produced them. If a new particle decays first into other particles, like top quarks, W bosons, or some new BSM particle, which in turn decay into quarks, the resulting jets will have a lot of interesting substructure, which can be used to distinguish them from regular jets. In this paper, the neural network uses information about the substructure of both jets in the event to determine whether the event is signal-like or background-like.
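
The simplest substructure observable is the jet’s own invariant mass, computed from its constituents: a heavy particle decaying inside the jet pushes this well above the mass of a typical quark or gluon jet. A small sketch with invented numbers (GeV units throughout):

```python
import numpy as np

def jet_mass(pt, eta, phi, energy):
    """Invariant mass of a jet from its constituents' (pt, eta, phi, E)."""
    px = np.sum(pt * np.cos(phi))
    py = np.sum(pt * np.sin(phi))
    pz = np.sum(pt * np.sinh(eta))
    e = np.sum(energy)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

# A toy two-prong jet, as you might get from a boosted heavy particle
# decaying to two quarks that end up inside a single jet.
pt = np.array([200.0, 180.0])
eta = np.array([0.1, -0.1])
phi = np.array([0.0, 0.4])
energy = np.array([205.0, 185.0])
print(f"jet mass: {jet_mass(pt, eta, phi, energy):.0f} GeV")
```

The network in the paper takes a handful of substructure variables along these lines, for each of the two jets, as its inputs.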

The authors test out their new technique on a simulated dataset, containing some events where a new particle is produced and a large number of QCD background events. They train a neural network to distinguish events in a window of invariant mass of the jet pair from other events. With no selection applied there is no visible bump in the dijet invariant mass spectrum. With their technique they are able to train a classifier that can reject enough background such that a clear mass peak of the new particle shows up. This shows that you can find a new particle without relying on searching for a particular model, allowing you to be sensitive to particles overlooked by existing searches.

Demonstration of the bump hunt search. The shaded histogram is the amount of signal in the dataset. The different sets of blue points show the data remaining after applying tighter and tighter selections based on the neural network classifier score. The red line is the predicted number of background events based on fitting the sideband regions. One can see that for the tightest selection (bottom set of points), the data form a clear bump over the background estimate, indicating the presence of a new particle.

This paper was one of the first to really demonstrate the power of machine-learning-based searches. There is actually a competition being held to inspire researchers to try out other techniques on a mock dataset, so expect to see more new search strategies utilizing machine learning released soon. Of course, the real excitement will come when a search like this is applied to real data and we can see whether machines can find new physics that we humans have overlooked!

Read More:

  1. Quanta Magazine Article “How Artificial Intelligence Can Supercharge the Search for New Particles”
  2. Blog Post on the CWoLa Method “Training Collider Classifiers on Real Data”
  3. Particle Bites Post “Going Rogue: The Search for Anything (and Everything) with ATLAS”
  4. Blog Post on applying ML to top quark decays “What does Bidirectional LSTM Neural Networks has to do with Top Quarks?”
  5. Extended Version of Original Paper “Extending the Bump Hunt with Machine Learning”

A new anomaly: the electromagnetic duality anomaly

Article: Electromagnetic duality anomaly in curved spacetimes
Authors: I. Agullo, A. del Rio and J. Navarro-Salas
Reference: arXiv:1607.08879

Disclaimer: this blog post requires some basic knowledge of QFT (or being comfortable with taking my word for some of the claims made :))

Anomalies exist everywhere. Probably the most intriguing ones are medical, but in particle physics they can be pretty fascinating too. In physics, anomalies refer to the breaking of a symmetry. There are basically two types of anomalies:

  • The first type, gauge anomalies, are red flags: if they show up in your theory, they indicate that the theory is mathematically inconsistent.
  • The second type of anomaly does not signal any problems with the theory and in fact can have experimentally observable consequences. A prime example is the chiral anomaly. This anomaly nicely explains the decay rate of the neutral pion into two photons.
    Fig. 1: Illustration of pion decay into two photons. [Credit: Wikimedia Commons]

In this paper, a new anomaly is discussed. This anomaly is related to the polarization of light and is called the electromagnetic duality anomaly.

Chiral anomaly 101
So let’s first brush up on the basics of the chiral anomaly. How does this anomaly explain the decay rate of the neutral pion into two photons? For that we need to start with the Lagrangian of QED, which describes the interactions between the electromagnetic field (that is, the photons) and spin-½ fermions (which pions are built from):

\displaystyle \mathcal L = \bar\psi \left( i \gamma^\mu \partial_\mu - e \gamma^\mu A_\mu \right) \psi - m \bar\psi \psi

where the important players in the above equation are the \psis that describe the spin-½ particles and the vector potential A_\mu that describes the electromagnetic field. In the massless limit m \to 0 (a good approximation for the light quarks that make up the pion), this Lagrangian is invariant under the chiral symmetry:

\displaystyle \psi \to e^{i \theta \gamma_5} \psi .

Due to this symmetry the current density j^\mu = \bar{\psi} \gamma^\mu \gamma_5 \psi is conserved: \nabla_\mu j^\mu = 0. This immediately tells us that the charge associated with this current density is time-independent. Since the chiral charge is time-independent, it prevents the \psi fields from decaying into the electromagnetic fields, because the \psi field has a non-zero chiral charge and the photons have none. Hence, if this were the end of the story, a pion would never be able to decay into two photons.

However, the conservation of the charge is only valid classically! As soon as you go from classical field theory to quantum field theory this is no longer true; hence the name (quantum) anomaly. This can be seen most succinctly using Fujikawa’s observation that even though the field \psi and the Lagrangian are invariant under the chiral symmetry, this is not enough for the quantum theory to also be invariant. If we take the path integral approach to quantum field theory, it is not just the Lagrangian that needs to be invariant; the entire path integral does:

\displaystyle \int D[A] \, D[\bar\psi] \, D[\psi] \, e^{i\int d^4x \, \mathcal L} .

From calculating how the chiral symmetry acts on the measure D \left[\psi \right]  \, D \left[\bar \psi \right], one can extract all the relevant physics such as the decay rate.
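
For reference, the upshot of that calculation is the famous anomalous divergence of the chiral current (for a single charged fermion; the overall sign and numerical factor depend on conventions):

\displaystyle \nabla_\mu j^\mu = \frac{e^2}{16\pi^2} \, \epsilon^{\mu\nu\rho\sigma} F_{\mu\nu} F_{\rho\sigma} .

The right-hand side is built entirely out of the electromagnetic field, which is exactly what allows the chiral charge to leak into a state of two photons.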

The electromagnetic duality anomaly
Just like the chiral anomaly, the electromagnetic duality anomaly breaks a symmetry at the quantum level that exists classically. The symmetry that is broken in this case is – as you might have guessed from its name – the electromagnetic duality. This symmetry is a generalization of a symmetry you are already familiar with from source-free electromagnetism. If you write down the source-free Maxwell equations, you can just swap the electric and magnetic fields and the equations look the same (you just have to send \vec{E} \to \vec{B} and \vec{B} \to - \vec{E}). The more general electromagnetic duality referred to here is slightly harder to visualize: it is a rotation in the space of the electromagnetic field tensor and its dual. However, its transformation is easy to write down mathematically:

\displaystyle F_{\mu \nu} \to \cos \theta \, F_{\mu \nu} + \sin \theta \, \, ^\ast F_{\mu \nu} .

In other words, since this is a symmetry, if you plug this transformation into the Lagrangian of electromagnetism, the Lagrangian will not change: it is invariant. (For \theta = \pi/2 the transformation sends F_{\mu \nu} \to \, ^\ast F_{\mu \nu}, which is exactly the \vec{E} \to \vec{B}, \vec{B} \to -\vec{E} swap from before.) Now, following the same steps as for the chiral anomaly, we find that the associated current is conserved and its charge is time-independent due to the symmetry. Here, the charge is simply the difference between the number of photons with left helicity and those with right helicity.

Let us continue following the exact same steps as those for the chiral anomaly. The key is to first write electromagnetism in variables analogous to those of the chiral theory. Then you apply Fujikawa’s method and… *drum roll for the approaching anomaly*… anti-climax: nothing happens, everything seems to be fine. There are no anomalies, nothing!

So why the title of this blog? Well, as soon as you couple the electromagnetic field to a gravitational field, the electromagnetic duality is broken in a deeply quantum way: the numbers of photons with left helicity and with right helicity are no longer separately conserved when your spacetime is curved.
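
Schematically, the duality current acquires a divergence proportional to the gravitational Chern–Pontryagin density (I am suppressing the numerical coefficient, which you can find in the paper):

\displaystyle \nabla_\mu \langle j^\mu \rangle \propto \epsilon^{\mu\nu\alpha\beta} R_{\mu\nu}{}^{\rho\sigma} R_{\alpha\beta\rho\sigma} .

This combination vanishes in highly symmetric spacetimes like Schwarzschild, but not, for instance, in rotating ones, which is why the potential observational consequences below involve rotating objects.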

Physical consequences
Some potentially really cool consequences have to do with the study of light passing by rotating stars, black holes, or even rotating clusters. These astrophysical objects do not only gravitationally bend light; the anomaly tells us that there might also be a difference in polarization between light rays coming from different sides of these objects. This may also have consequences for the cosmic microwave background radiation, which is a ‘picture’ of our universe when it was only 380,000 years old (as compared to the 13.8 billion years it is today!). How big this effect is and whether we will be able to see it in the near future are still open questions.

Further reading 

  • An introduction to anomalies using only quantum mechanics instead of quantum field theory is “Anomalies for pedestrians” by Barry Holstein
  • The beautiful book “Quantum field theory and the Standard Model” by Michael Schwartz has a nice discussion in the later chapters on the chiral anomaly.
  • Lecture notes on anomalies in general by Adel Bilal, aimed at graduate students, can be found here

LIGO and Gravitational Waves: A Hep-ex perspective

The exciting Twitter rumors have been confirmed! On Thursday, LIGO finally announced the first direct observation of gravitational waves, a prediction 100 years in the making. The media storm has been insane, with physicists referring to the discovery as “more significant than the discovery of the Higgs boson… the biggest scientific breakthrough of the century.” Watching Thursday’s press conference from CERN, it was hard not to make comparisons between the discovery of the Higgs and LIGO’s announcement.

The gravitational-wave event GW150914 observed by the LIGO Collaboration

Long-standing searches for well-known phenomena

The Higgs boson was billed as the last piece of the Standard Model puzzle. Its existence was predicted in the 1960s in order to explain the masses of the vector bosons of the Standard Model and to avoid non-unitary amplitudes in W boson scattering. Even if the Higgs didn’t exist, particle physicists expected new physics to come into play at the TeV scale, and the experiments at the LHC were designed to find it.
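
The “TeV scale” expectation can be summarized in one line. Without a Higgs, the amplitude for scattering longitudinally polarized W bosons grows with energy, schematically

\displaystyle \mathcal M (W_L W_L \to W_L W_L) \sim \frac{E^2}{v^2}, \qquad v \simeq 246 \text{ GeV},

so perturbative unitarity breaks down around a TeV, and something new, Higgs or otherwise, had to show up by that scale.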

Similarly, gravitational waves were the last untested fundamental prediction of General Relativity. At first physicists remained skeptical of the existence of gravitational waves, but the search began in earnest with Joseph Weber in the 1950s (Forbes). Indirect evidence of gravitational waves was demonstrated a few decades later: a binary system consisting of a pulsar and a neutron star was observed to lose energy over time, presumably in the form of gravitational waves. Taking inspiration from Weber’s efforts, LIGO developed two detectors of unprecedented precision in order to finally make a direct observation.

Unlike the case of the Higgs, General Relativity makes clear predictions about the properties of gravitational waves. The waves should travel at the speed of light, have two polarizations, and interact weakly with matter. Scientists at LIGO were even searching for a very particular signal, described as a characteristic “chirp”. With the upgrade to the LIGO detectors, physicists were certain they’d be capable of observing gravitational waves; the only outstanding question was how often these observations would happen.

The search for the Higgs involved more uncertainties. The one parameter essential for describing the Higgs, its mass, is not predicted by the Standard Model. While previous collider experiments at LEP and Fermilab were able to set limits on the Higgs mass, its properties were ultimately unknown before the discovery. No one knew whether the Higgs would be a Standard Model Higgs, or part of a more complicated theory like supersymmetry or technicolor.

Monumental scientific endeavors

Answering the most difficult questions posed by the universe isn’t easy, or cheap. In terms of cost, both LIGO and the LHC represent billion-dollar investments. Including the most recent upgrade, LIGO cost a total of $1.1 billion, and when it was originally approved in 1992, “it represented the biggest investment the NSF had ever made,” according to France Córdova, NSF director. The discovery of the Higgs was estimated by Forbes to have cost a total of $13 billion, a hefty price paid by CERN’s member and observer states. The LHC’s electricity bill alone comes to more than $200 million per year.

The large investment is necessitated by the sheer monstrosity of the experiments. LIGO consists of two identical detectors with arms roughly 4 km long, built 3000 km apart. Because of its large size, LIGO is capable of measuring ripples in space 10,000 times smaller than an atomic nucleus, the smallest scale ever measured by scientists (LIGO Fact Page). The size of the LIGO vacuum tubes is surpassed only by that of the LHC. At 27 km in circumference, the LHC is the single largest machine in the world, and the most powerful particle accelerator to date. It took only a handful of people to predict the existence of gravitational waves and the Higgs, but it took thousands of physicists and engineers to find them.
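
To get a feel for those numbers: a gravitational wave of strain h changes an arm of length L by \Delta L = h L. For a strain of order 10^{-21}, like that of the detected event, and LIGO’s 4 km arms,

\displaystyle \Delta L = h L \sim 10^{-21} \times 4 \text{ km} \approx 4 \times 10^{-18} \text{ m},

hundreds of times smaller than a proton.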

Life after Discovery

Even the language surrounding both announcements is strikingly similar. Rumors were circulating for months before the official press conferences, and the expectations in each community were very high. Both discoveries have been touted as discoveries of the century, with many experts claiming that the results would usher in a “new era” of particle physics or observational astronomy.

With a few years of hindsight, it is clear that the “new era” of particle physics has begun. Before Run I of the LHC, particle physicists knew they needed to search for the Higgs. Now that the Higgs has been discovered, there is much more uncertainty surrounding the field. The list of questions to try and answer is enormous. Physicists want to understand the source of the dark matter that makes up roughly 25% of the universe, where neutrinos get their mass, and how to quantize gravity. There are several ad hoc features of the Standard Model that merit additional explanation, and physicists are still searching for evidence of supersymmetry and grand unified theories. While the to-do list is long and well understood, how to solve these problems is not. Measuring the properties of the Higgs does allow particle physicists to set limits on beyond-the-Standard-Model physics, but it’s unclear at which scale new physics will come into play, and there’s no real consensus about which experiments deserve the most support. For some in the field, this uncertainty can result in a great deal of anxiety and skepticism about the future. For others, the long to-do list is an absolutely thrilling call to action.

With regards to the LIGO experiment, the future is much more clear. LIGO has so far published only one event, from 16 days of data taking. There is much more data already in the pipeline, and more interferometers, like VIRGO and (e)LISA, are planned to come online in the near future. Now that gravitational waves have been shown to exist, they can be used to observe the universe in a whole new way. The first event already contains an interesting surprise: LIGO observed two inspiraling black holes of 36 and 29 solar masses merging into a final black hole of 62 solar masses. The data thus confirmed the existence of heavy stellar black holes, with masses more than 25 times that of the sun, and showed that binary black hole systems form in nature (Astrophysical Journal). When VIRGO comes online, it will also be possible to triangulate the sources of these gravitational waves. LIGO’s job now is to watch, and see what other secrets the universe has in store.