Actuarial Outpost > Actuarial Discussion Forum > Software & Technology

#1 - 07-24-2016, 10:50 PM
Lisa Solomon (Member, Non-Actuary)

Webinar: Improve Your Regression with Modern Regression Analysis Techniques

Click Here to Register
Alternate Link: http://info.salford-systems.com/impr...tuarialoutpost

July 27th and Aug 10th, 10AM – 11AM PDT
  • If the time is inconvenient, please register and we will send you a recording.

ABSTRACT:
Join us for this two-part webinar series on improving your regression using modern regression analysis techniques, presented by Senior Scientist Mikhail Golovnya. In these webinars you will learn how to substantially improve prediction accuracy in your regression models with techniques that address common concerns such as missing values, interactions, and nonlinearities in your data.

We will demonstrate the techniques using real-world data sets and introduce the main concepts behind Leo Breiman's Random Forests and Jerome Friedman's GPS (Generalized PathSeeker™), MARS® (Multivariate Adaptive Regression Splines), and Gradient Boosting.

Who should attend:
  • Attend if you want to implement data science techniques even without a data science, statistics, or programming background.
  • Attend if you want to understand why data science techniques are so important in improving regression models.

Methods covered include (a brief open-source sketch follows this list):
  • Linear, Non-linear, Regularized regression techniques (Part 1)
  • GPS, LARS, LASSO, Elastic Net, MARS® (Part 1)
  • Boosting (Stochastic Gradient Boosting, TreeNet®), RandomForests®, ISLE™ and RuleLearner® (Part 2)
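
A minimal open-source sketch of the Part 2 ensemble ideas, using generic scikit-learn estimators as stand-ins; TreeNet® and RandomForests® are Salford's proprietary implementations and are not shown here, and the synthetic data and settings below are invented purely for illustration:

Code:
# Generic open-source stand-ins (NOT Salford's TreeNet(R) or RandomForests(R)):
# gradient boosting and a random forest on synthetic data containing an
# interaction and a nonlinearity, the kinds of structure the abstract mentions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 3))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (GradientBoostingRegressor(random_state=0),
              RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "test R^2:",
          round(r2_score(y_test, model.predict(X_test)), 3))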

REGISTER NOW
Alternate Link: http://info.salford-systems.com/impr...tuarialoutpost

Can't make it? Sign up and receive the recording!

#2 - 07-25-2016, 06:44 AM
whoanonstop (Member, Non-Actuary)

Quote (Originally Posted by Lisa Solomon):
  • Attend if you want to implement data science techniques even without a data science, statistics, or programming background.
That sounds dangerous.

Quote (Originally Posted by Lisa Solomon):
  • Attend if you want to understand why data science techniques are so important in improving regression models.
Most people don't want to trade interpretability for prediction accuracy. Might be tough to pry some away from their basic linear/logistic regressions.

I doubt this software will give me the freedom I have when building models in R/Python, but I'll probably attend the webinar anyway to see if the software adds any features that would be difficult for me to do myself. Let's just say I'm a skeptic.

-Riley

#3 - 07-27-2016, 09:06 PM
DiscreteAndDiscreet (Member, AAA)

This MARS thing is essentially the same concept as what I was kicking around in the MDL thread: a search for piecewise curves based on a criterion that trades goodness of fit against model complexity.

I hate to admit it, but the bit about overfitting the data and then searching for a pruned model is more clever than what I had come up with so far. That gets you to the point where your search graph has a known diameter; even though I hadn't gotten as far as implementing that kind of search, I was grappling with how to limit the diameter of the search graph. I'm not sure whether that approach has different implications for what kinds of solutions the search algorithm will settle into, but it certainly reduces the computational complexity.

When I search for Multivariate Adaptive Regression Splines on Google, I'm mostly seeing stuff from the early '90s and then some more recent work, but I also see that there are a bunch of implementations of it. I'm a little perplexed here because I kind of feel like people have been hiding this thing from me for 25 years.
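
Here's the rough shape of the overfit-then-prune idea as I understand it, as a toy Python sketch. To be clear, this is my own reconstruction, not Friedman's actual algorithm (real implementations such as the R earth package do far more, including knot selection over interactions): a greedy forward pass adds hinge-function knots well past the point of overfitting, then a backward pass prunes them under a GCV-style penalty. The data and knot grid are invented for the example.

Code:
# Toy sketch of the overfit-then-prune idea (NOT Friedman's actual MARS
# algorithm): fit piecewise-linear curves with hinge basis functions,
# deliberately overfit in a greedy forward pass, then prune knots under a
# GCV-style criterion trading goodness of fit against model complexity.
import numpy as np

def hinge_features(x, knots):
    """Design matrix: intercept plus hinge pairs max(0, x-t) and max(0, t-x)."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def gcv_score(x, y, knots):
    """Least-squares fit on the hinge basis, scored by a GCV-like criterion
    that inflates the residual sum of squares as the basis grows."""
    F = hinge_features(x, knots)
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    rss = np.sum((y - F @ beta) ** 2)
    n, m = F.shape
    return rss / (n * (1.0 - m / n) ** 2)

# Synthetic data: a kinked piecewise-linear signal plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.where(x < 4.0, x, 4.0 + 0.2 * (x - 4.0)) + rng.normal(0.0, 0.3, 200)

# Forward pass: greedily add the best candidate knot, going well past where
# a parsimonious model would stop (the deliberate overfit).
knots = []
candidates = list(np.quantile(x, np.linspace(0.05, 0.95, 19)))
for _ in range(8):
    best = min(candidates, key=lambda t: gcv_score(x, y, knots + [t]))
    knots.append(best)
    candidates.remove(best)

# Backward pass: drop any knot whose removal improves the score.
improved = True
while improved and len(knots) > 1:
    improved = False
    for t in list(knots):
        trial = [k for k in knots if k != t]
        if gcv_score(x, y, trial) < gcv_score(x, y, knots):
            knots, improved = trial, True
            break

print("knots kept:", np.round(sorted(knots), 2))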

#4 - 07-28-2016, 08:33 AM
whoanonstop (Member, Non-Actuary)

First of all... damn, I was too busy to attend this webinar. Hopefully I can watch it today and get "caught up".

Quote (Originally Posted by DiscreteAndDiscreet):
When I search for Multivariate Adaptive Regression Splines on Google, I'm mostly seeing stuff from the early '90s and then some more recent work, but I also see that there are a bunch of implementations of it. I'm a little perplexed here because I kind of feel like people have been hiding this thing from me for 25 years.
Surely you've heard of recursive partitioning. MARS is very similar to decision trees, which were introduced a few years earlier, I believe. Recently there has been a huge push in this direction, because the increase in computational power has made these methods more feasible: boosting, random forests, stacking, etc.
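
To make the kinship concrete, here's a quick toy contrast in Python (my own sketch, nothing to do with the webinar software): a regression tree gives a piecewise-constant fit, while least squares on MARS-style hinge features gives a piecewise-linear fit on the same data.

Code:
# Illustrative contrast: a regression tree (recursive partitioning) yields a
# piecewise-CONSTANT fit, while least squares on MARS-style hinge features
# yields a piecewise-LINEAR fit on the same 1-D data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 300))
y = np.sin(x) + rng.normal(0.0, 0.2, 300)

# Recursive partitioning: axis-aligned splits, one constant per leaf.
tree = DecisionTreeRegressor(max_leaf_nodes=6).fit(x.reshape(-1, 1), y)
tree_pred = tree.predict(x.reshape(-1, 1))

# MARS-flavored fit: hinge functions max(0, x-t) and max(0, t-x) at fixed
# knots, with coefficients solved by ordinary least squares.
knots = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
F = np.column_stack([np.ones_like(x)]
                    + [np.maximum(0.0, x - t) for t in knots]
                    + [np.maximum(0.0, t - x) for t in knots])
beta, *_ = np.linalg.lstsq(F, y, rcond=None)
hinge_pred = F @ beta

for name, pred in [("tree (piecewise constant)", tree_pred),
                   ("hinges (piecewise linear)", hinge_pred)]:
    print(name, "RMSE:", round(float(np.sqrt(np.mean((y - pred) ** 2))), 3))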

-Riley

#5 - 07-28-2016, 08:19 PM
DiscreteAndDiscreet (Member, AAA)

Well, no, I hadn't heard of it. My background is more old-school CS (computational linguistics, complexity theory, data structures, and graph algorithms) plus related math (information theory, group theory, analysis, linear algebra, topology, and a smidgen of category theory), along with optimization and logistics. I'm coming into this area from a different end of the academic ecosystem. I'm flipping through some of the links on Wikipedia in the vicinity of these methods and my eyes are glazing over a bit. But I like MARS.

I mainly want to work with methods that produce a model that can be perceived as reasonable without knowing anything about the algorithm that produced it. This is somewhat influenced by the notion of a witness to a decision problem in complexity theory: for most practical problems, there is a piece of information that demonstrates the solution. In human terms, piecewise linearity can be demonstrated visually, and the notion that a particular piecewise linear function was found to be the best description of the data is not much of a leap of faith; to some extent you may even be able to collect information that supports the conclusion as you're searching for your fit.

I'm not much interested in using machine learning algorithms to discover new relationships between variables. I'm very skeptical of how robust that can really be (although I also know that in some contexts it isn't necessarily a substantial problem). What I'm more interested in is using a data focus as a limitation. I'd like to be able to say that I've used a data set to the extent that it has something useful to say, and I might consider other sources as a supplement if that's insufficient to complete a task. For me, working in an actuarial context, this is the entire story. Being able to document the basis of your expectations is a powerful thing, because when your expectations are wrong, you have a clearer picture of exactly why they were wrong. You can identify that the outcome is distinctly different from what could be learned from the sources of information you used (regardless of whether your reasons were good or bad).

#6 - 07-28-2016, 08:25 PM
whoanonstop (Member, Non-Actuary)

Quote (Originally Posted by DiscreteAndDiscreet):
I'm not much interested in using machine learning algorithms to discover new relationships between variables.
Machine learning is not heavily focused on interpretation, but on prediction.

Surprised that you're not at least somewhat interested in those methods, since a CS background is quite helpful for them.

-Riley

#7 - 07-29-2016, 08:49 AM
DiscreteAndDiscreet (Member, AAA)

Quote (Originally Posted by whoanonstop):
Machine learning is not heavily focused on interpretation, but on prediction.

Surprised that you're not at least somewhat interested in those methods, since a CS background is quite helpful for them.

-Riley
I generally have some distaste for bottom-up approaches that use some kind of aggregation of little predictions. I'll try to explain that.

The heart of MDL is Kolmogorov complexity. Kolmogorov complexity isn't computable, but you can get a sense of it by looking at an algorithm and identifying ways to remove redundant structure using function calls, or by finding a new way to express it that cuts closer to the contour of whatever it is you're calculating. Another way to look at it: the process of taking an algorithm and looking for ways to remove redundant structure, or to give it a more natural expression, can be viewed as defining a domain-specific programming language using whatever facilities are available in the parent language.

Now if you look at the equivalent of mutual information for Kolmogorov complexity, what it actually measures is the benefit gained from expressing two different objects in a common domain-specific language rather than using whichever domain-specific languages are most efficient for each of them separately.
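
In symbols, this is the algorithmic mutual information (stated from memory, and holding only up to additive logarithmic terms):

    I(x : y) = K(x) + K(y) - K(x, y)

where K(.) denotes Kolmogorov complexity. I(x : y) is the number of description bits saved by encoding x and y together, i.e. with a shared domain-specific language, rather than encoding each separately.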

When you use an uninterpretable approach, you get a domain-specific language that is inscrutable. You don't know how well its definitions of terms correspond with external definitions of terms. This is essentially related to the Sapir-Whorf hypothesis: that human thought processes are constrained by the manner in which human languages lump together or split underlying families of concepts.

Another issue I have with bottom-up approaches is that they're generally stuck somewhere around AC0 in the complexity hierarchy. It's undeniable that bottom-up approaches can approximate the problem of deciding higher-complexity languages, but I'm left with the sense that they can't participate in interactive proofs, where a prover (who is untrusted) convinces a verifier (who is trusted) of a conclusion. (Well, at least they can't do that until they can pass a Turing test.) The only form of proof they can furnish is prediction accuracy.

The problem you've had in promoting machine learning methods is that you have to use ex-post prediction accuracy to convince people to take responsibility for an ex-ante prediction. That's a tough sell without a structural theory of what past conditions were and what current conditions are.


Tags
data mining, data science, predictive analytics, regression, regression analysis
