Batch anomaly detection in static social networks



NISLab has also developed a novel framework based on a variational Expectation-Maximization (EM) algorithm to assess the likelihood that each observation is anomalous. The computational complexity of the proposed method scales linearly with both the number of observed events and the number of people in the network, whereas most competing methods scale quadratically with the number of people or the number of meetings observed. This makes our method much better suited to very large networks, and it requires no parameter tuning. In our simulations, we detected anomalous meetings in a 2000-person network with 100% accuracy using just 200 observations, while graph-based alternatives were significantly more computationally expensive and Support Vector Machine-based alternatives produced large numbers of falsely detected anomalous meetings.

To run this, you need the following file: demo_varem.zip (just the variational EM). Inside, you'll find two .mat files containing the preprocessed Enron dataset, along with three scripts, "demo_synthetic.m", "demo_enron_subject.m", and "demo_enron_day.m", which illustrate how to call all the other functions. Run any one of them for results on synthetic data, or on the Enron data organized by subject line or by day, respectively. You should also download and install the lightspeed toolbox for MATLAB from Tom Minka's page, which provides the "logsumexp" function and other speed improvements: lightspeed toolbox.
For the synthetic example, you can set the parameters by editing "demo_synthetic.m". Avoid decreasing the number of observations much below n=200 unless the dimensionality p is low; p itself can be increased as far as your memory allows. On a laptop with 2 GB of RAM, keeping the Monte Carlo sample size at m=1000 and setting, say, n=200, you should be able to go as high as p=20,000. (The Enron example organized by day has p=75,511.)
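For concreteness, here is the kind of parameter block you might edit near the top of "demo_synthetic.m" (the exact variable names inside the script may differ; these simply follow the n, p, m notation used above):

n = 200;      % number of observations; avoid going much below 200 unless p is small
p = 20000;    % dimensionality (number of people); limited mainly by available memory
m = 1000;     % Monte Carlo sample size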

If you intend to run the comparison with the OCSVM algorithm, you'll need the file demo_ocsvm.zip (Variational EM vs. OCSVM), and you'll also need to download and install the libsvm toolbox for MATLAB by Chih-Chung Chang and Chih-Jen Lin: libsvm toolbox. Tip: to get libsvm working on some platforms, you may need to edit "make.m" and replace all references to .obj with .o. For comparison with L1O-kNNG, we have included a (probably rather crude) implementation: demo_l1oknng.zip (Variational EM vs. L1O-kNNG). Extract all the demo_*.zip files to the same directory, add the toolbox locations to your MATLAB path, and then run "demo_ocsvm.m" or "demo_l1o_knng.m". Alternatively, you can edit "demo_synthetic.m" to set parameters and mix and match methods for comparison (set the corresponding run_* variables to one in the "Initialization" section of the code), and then run the edited "demo_synthetic.m".
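As a rough sketch, a comparison session might look like the following (the folder names are placeholders for wherever you unzipped the toolboxes and demo archives, and the exact run_* flag names are assumptions based on the convention described above):

addpath('path/to/lightspeed');      % Tom Minka's lightspeed toolbox (logsumexp, etc.)
addpath('path/to/libsvm/matlab');   % libsvm MATLAB interface, needed for the OCSVM demo
addpath('path/to/demos');           % directory holding the extracted demo_*.zip contents

demo_ocsvm                          % variational EM vs. OCSVM
demo_l1o_knng                       % variational EM vs. L1O-kNNG

% Or edit demo_synthetic.m, enable methods in its "Initialization" section,
% e.g. run_ocsvm = 1; run_l1oknng = 1; (flag names assumed), and run that script.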

Spectral Target and Anomaly Detection from Incoherent Projections 


We have developed computationally efficient approaches and associated theoretical performance guarantees for the detection of known spectral targets and spectral anomalies from few incoherent projections. The proposed approaches accommodate targets of different signal strengths contaminated by a colored Gaussian background, and perform target detection without reconstructing the spectral input from the observations. The theoretical performance bounds of the target detector highlight fundamental tradeoffs among the number of measurements collected, amount of background signal present, signal-to-noise ratio, and similarity among potential targets in a known spectral dictionary. The anomaly detector is designed to control the number of false discoveries below a desired level and can be adapted to uncertainties in the user’s knowledge of the spectral dictionary. Unlike approaches based on the principles of compressed sensing, the proposed approach does not depend on a known sparse representation of targets; rather, the theoretical performance bounds exploit the structure of a known dictionary of targets and the Johnson–Lindenstrauss lemma. Simulation experiments illustrate the practicality and effectiveness of the proposed approaches.
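As a toy illustration of the idea (not the exact detector or notation from our papers), the following MATLAB snippet projects a spectrum with a random Gaussian sensing matrix and matches the projected observation against a projected dictionary, without ever reconstructing the spectrum; all variable names and the simple max-correlation rule are assumptions made for this sketch:

p = 500;                                   % spectral dimension
k = 40;                                    % number of incoherent projections (k << p)
D = 5;                                     % number of known targets in the dictionary
rng(0);
dict = randn(p, D);                        % stand-in for a known spectral dictionary
dict = bsxfun(@rdivide, dict, sqrt(sum(dict.^2, 1)));   % unit-norm columns
trueIdx = 3;
x = 2*dict(:, trueIdx) + 0.3*randn(p, 1);  % target buried in Gaussian background/noise

A = randn(k, p) / sqrt(k);                 % random (Johnson-Lindenstrauss-style) sensing matrix
y = A * x;                                 % the few projections actually observed

scores = (A * dict)' * y;                  % correlate y with each projected dictionary entry
[~, detectedIdx] = max(scores);            % pick the best-matching target
fprintf('true target: %d, detected target: %d\n', trueIdx, detectedIdx);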


Fast Translation-Invariant Tree-Pruning Reconstruction (for Poisson or Gaussian noise)


(This work was done in collaboration with Dr. Robert Nowak.)
Download information:
Matlab Toolbox
To install, download the appropriate toolbox for your platform, unzip it to the desired location, and add that location to your MATLAB path. To get started, try running Example_1D.m, Example_2D.m, Example_2D_Gaussian.m, and Example_Tomography.m. This toolbox contains MATLAB code for translation-invariant Poisson or Gaussian signal and image reconstruction. The method requires O(N log N) computations.
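In MATLAB, the setup amounts to something like this (the folder name below is just a placeholder for wherever you unzipped the toolbox):

addpath('path/to/translation_invariant_toolbox');   % hypothetical unzip location
Example_2D     % or Example_1D, Example_2D_Gaussian, Example_Tomography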
Poisson image denoising example:
The techniques presented here allow multiscale photon-limited image reconstruction methods to be implemented with significantly less computational complexity than previously possible. Multiscale Haar estimation is a promising technique for Poisson data, but its computational burden makes it impractical for many applications involving iterative algorithms, such as deblurring and tomographic reconstruction. With the proposed implementation techniques, hereditary translation-invariant Haar wavelet-based estimates can be computed in O(N log N) operations, a complexity comparable to that of standard translation-invariant wavelet denoising (also O(N log N)). Fast translation-invariant Haar denoising for Poisson data is accomplished by deriving the relationship between maximum penalized likelihood tree-pruning decisions and the undecimated wavelet transform coefficients.
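To give a flavor of translation invariance via averaging over shifts, here is a brute-force cycle-spun Haar hard-threshold estimate of a 1-D Poisson signal. This is only a toy sketch with assumed parameter choices; it is not the toolbox's hereditary penalized-likelihood tree-pruning estimator, nor the fast O(N log N) algorithm itself, and it requires poissrnd from the Statistics Toolbox.

N     = 256;                               % signal length (power of 2)
truth = 20 + 30*((1:N) > N/2);             % piecewise-constant intensity
y     = poissrnd(truth);                   % photon-limited observations
J     = log2(N);                           % number of Haar scales
thr   = 3*sqrt(mean(y));                   % crude threshold; Poisson variance ~ mean

est = zeros(1, N);
for s = 0:N-1                              % average over all circular shifts
    c = circshift(y, [0 s]);
    details = cell(1, J);
    for j = 1:J                            % orthonormal Haar analysis
        even = c(1:2:end);  odd = c(2:2:end);
        details{j} = (even - odd)/sqrt(2);
        c          = (even + odd)/sqrt(2);
    end
    for j = J:-1:1                         % hard-threshold details, then synthesize
        d  = details{j} .* (abs(details{j}) > thr);
        up = zeros(1, 2*numel(c));
        up(1:2:end) = (c + d)/sqrt(2);
        up(2:2:end) = (c - d)/sqrt(2);
        c = up;
    end
    est = est + circshift(c, [0 -s]);      % undo the shift and accumulate
end
est = est / N;                             % translation-invariant estimate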






Examples (original, observations, and reconstruction shown for each):
2D denoising example: Example2D_truth, Example2D_data, Example2D_est.
Tomography example: ExampleTomo_truth, ExampleTomo_data, ExampleTomo_est.
Confocal example: confocalTrue, confocalObs, confocalEst.


Coarse-to-Fine Image Reconstruction


(This work was done in collaboration with Prof. Rui Castro and Prof. Robert Nowak.)

Download information: Mac Toolbox - Linux (64-bit) Toolbox - Windows Toolbox. If your platform is not one of those listed above, download the Mac or Linux toolbox and run the "compile.m" script to generate the appropriate MEX files. To install, download the appropriate toolbox for your platform, unzip it to the desired location, and add that location to your MATLAB path. To get started, try running Example_CTFWedgeletApprox.m and Example_CTFPlateletApprox.m. This toolbox contains C MEX files, to be used with MATLAB, for coarse-to-fine wedgelet and platelet image reconstruction under Poisson or Gaussian noise models. The method requires O(N) computation time.
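A typical MATLAB session would look like the following (the folder name is a placeholder for wherever you unzipped the toolbox for your platform):

addpath('path/to/coarse_to_fine_toolbox');   % hypothetical unzip location
% compile                                    % run compile.m first if your platform has no prebuilt MEX files
Example_CTFWedgeletApprox                    % or Example_CTFPlateletApprox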

For wedgelet image reconstruction, we consider a sequential, coarse-to-fine estimation of a piecewise constant function with smooth boundaries. Accurate detection and localization of the boundary (a manifold) is the key aspect of this problem. In general, algorithms capable of achieving optimal performance require exhaustive searches over large dictionaries that grow exponentially with the dimension of the observation domain. The computational burden of this search hinders the use of such techniques in practice and motivates our work. Our sequential, coarse-to-fine approach first examines the data on a coarse grid and then refines the analysis and approximation in regions of interest. The resulting estimators involve an almost linear-time (in two dimensions) sequential search over the dictionary, yet converge at the same near-optimal rate as estimators based on exhaustive searches. Specifically, in two dimensions our algorithm requires O(n^{7/6}) operations for an n-pixel image, far fewer than traditional wedgelet approaches, which require O(n^{11/6}) operations. The coarse-to-fine platelet image reconstruction technique is similar, but can be used to estimate images consisting of C^2 smooth surfaces separated by C^2 smooth boundaries.
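The following toy MATLAB sketch conveys only the coarse-to-fine idea (it is not the wedgelet or platelet estimator, and the block sizes and variance test are arbitrary choices made for illustration): fit block means on a coarse grid, then refine only the blocks whose data suggest a boundary passes through them.

n  = 128;                                     % image is n-by-n
[X, Y] = meshgrid(linspace(0, 1, n));
truth  = double(Y > 0.4 + 0.2*sin(2*pi*X));   % piecewise constant, smooth boundary
img    = truth + 0.1*randn(n);                % Gaussian-noise observation

coarse = 16;  fine = 4;                       % coarse and fine block sizes (arbitrary)
est = zeros(n);
for r = 1:coarse:n
    for c = 1:coarse:n
        blk = img(r:r+coarse-1, c:c+coarse-1);
        if var(blk(:)) < 0.05                 % no boundary evidence: keep the coarse fit
            est(r:r+coarse-1, c:c+coarse-1) = mean(blk(:));
        else                                  % boundary suspected: refine this block only
            for rr = r:fine:r+coarse-1
                for cc = c:fine:c+coarse-1
                    sub = img(rr:rr+fine-1, cc:cc+fine-1);
                    est(rr:rr+fine-1, cc:cc+fine-1) = mean(sub(:));
                end
            end
        end
    end
end
fprintf('MSE of coarse-to-fine estimate: %.4f\n', mean((est(:) - truth(:)).^2));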




Coarse-to-fine wedgelet example (CTFWObs, CTFWPreview, CTFWEst): noisy image, MSE = 0.01003; coarse estimate, MSE = 0.1603; image estimate, MSE = 0.004885.
Coarse-to-fine platelet example (CTFPObs, CTFPPreview, CTFPEst): noisy image, MSE = 0.009025; coarse estimate, MSE = 0.001174; image estimate, MSE = 0.0002264.