

#1




Webinar: Improve Your Regression with Modern Regression Analysis Technique
July 27th and Aug 10th, 10 AM – 11 AM PDT
Register: http://info.salfordsystems.com/impr...tuarialoutpost
ABSTRACT: Join us for this two-part webinar series on improving your regression using modern regression analysis techniques, presented by Senior Scientist Mikhail Golovyna. In these webinars you will learn how to drastically improve prediction accuracy in your regression with a new model that addresses common concerns such as missing values, interactions, and nonlinearities in your data. We will demonstrate the techniques using real-world data sets and introduce the main concepts behind Leo Breiman's Random Forests and Jerome Friedman's GPS (Generalized PathSeeker™), MARS® (Multivariate Adaptive Regression Splines), and Gradient Boosting.
Who should attend:
Methods covered include:
Can't make it? Sign up and receive the recording!
#2




I doubt this software will give me the freedom I have when building models in R/Python, but I'll probably attend the webinar anyway to see if the software adds any features that would be difficult for me to implement myself. Let's just say I'm a skeptic.
Riley
#3




This MARS thing is essentially the same concept as what I was kicking around in the MDL thread: a search for piecewise curves based on a criterion that trades goodness of fit against model complexity.
I hate to admit it, but the bit about overfitting the data and then searching for a pruned model is more clever than what I had come up with so far. It gets you to the point where your search graph has a known diameter; even though I hadn't gotten as far as implementing that kind of search, I was grappling with how to limit the diameter of the search graph. I'm not sure whether that approach has different implications for what kinds of solutions the search algorithm will settle into, but it certainly reduces the computational complexity. When I search Google for Multivariate Adaptive Regression Splines, I mostly see material from the early '90s and some recent material, but I also see that there are a bunch of implementations of it. I'm a little perplexed, because I kind of feel like people have been hiding this thing from me for 25 years.
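A minimal one-predictor sketch of the overfit-then-prune idea described above, assuming NumPy. The forward pass deliberately overfits by greedily adding mirrored hinge pairs max(0, x - t) and max(0, t - x); the backward pass then drops terms one at a time and keeps the model with the best generalized cross-validation (GCV) score, which is the fit-versus-complexity criterion MARS uses. The function names (`hinge`, `fit_terms`, `gcv`, `fit_mars_1d`), the knot penalty of 3, and the 10-term cap are illustrative assumptions; this is not Salford's implementation and it omits the interaction terms of full multivariate MARS.

```python
import numpy as np


def hinge(x, knot, sign):
    """MARS hinge function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - knot))


def fit_terms(x, y, terms):
    """Least-squares fit of y on an intercept plus the given (knot, sign) hinge terms.
    Returns (residual sum of squares, numerical rank of the design matrix)."""
    X = np.column_stack([np.ones_like(x)] + [hinge(x, k, s) for k, s in terms])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), int(rank)


def gcv(x, y, terms, penalty=3.0):
    """GCV = (RSS / N) / (1 - C / N)^2, with effective parameter count
    C = rank + penalty * (number of distinct knots). Assumed penalty of 3 per knot."""
    n = len(y)
    rss, rank = fit_terms(x, y, terms)
    c = rank + penalty * len({k for k, _ in terms})
    if c >= n:
        return float("inf")
    return (rss / n) / (1.0 - c / n) ** 2


def fit_mars_1d(x, y, max_terms=10, penalty=3.0):
    """Hypothetical one-predictor MARS-style fit; returns (selected terms, GCV score)."""
    candidates = np.unique(x)[1:-1]  # interior data points serve as candidate knots

    # Forward pass: deliberately overfit by adding the mirrored hinge pair whose
    # knot most reduces RSS, until max_terms hinge terms are in the model.
    terms = []
    while len(terms) + 2 <= max_terms:
        used = {k for k, _ in terms}
        best_rss, best_pair_terms = None, None
        for t in candidates:
            if t in used:
                continue
            trial = terms + [(float(t), +1), (float(t), -1)]
            r, _ = fit_terms(x, y, trial)
            if best_rss is None or r < best_rss:
                best_rss, best_pair_terms = r, trial
        if best_pair_terms is None:
            break
        terms = best_pair_terms

    # Backward pass: repeatedly drop the single term whose removal raises RSS least,
    # score every model along that path, and keep the one with the lowest GCV.
    best_terms, best_score = terms, gcv(x, y, terms, penalty)
    current = terms
    while current:
        trials = [current[:i] + current[i + 1:] for i in range(len(current))]
        current = min(trials, key=lambda cand: fit_terms(x, y, cand)[0])
        score = gcv(x, y, current, penalty)
        if score < best_score:
            best_terms, best_score = current, score
    return best_terms, best_score


if __name__ == "__main__":
    # Synthetic data with a single kink at x = 5 plus a little noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 101)
    y = np.where(x < 5.0, x, 5.0 + 3.0 * (x - 5.0)) + rng.normal(0.0, 0.1, x.size)
    terms, score = fit_mars_1d(x, y)
    print(sorted(terms), score)
```

Capping the forward pass and then pruning means the backward search visits at most `max_terms` nested models, which is one way to read the "known diameter" point above.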
#4




First of all... damn, I was too busy to attend this webinar. Hopefully I can watch it today and get "caught up".
Riley 
#5




Well, no, I hadn't heard of it. My background is more old-school CS (computational linguistics, complexity theory, data structures, and graph algorithms) plus related math (information theory, group theory, analysis, linear algebra, topology, and a smidgen of category theory) and optimization and logistics. I am coming into this area from a different end of the academic ecosystem. I'm flipping through some of the Wikipedia links in the vicinity of these methods and my eyes are glazing over a bit. But I like MARS.
I mainly want to work with methods that produce a model that can be perceived as reasonable without knowing anything about the algorithm that produced it. This is partly influenced by the notion of a witness for a decision problem in complexity theory: for most practical problems, there is a piece of information that demonstrates the solution. In human terms, piecewise linearity can be demonstrated visually, and the notion that this particular piecewise linear function was found to be the best description of the data is not much of a leap of faith; to some extent, you may also be able to collect information that supports the conclusion while you're searching for your fit.
I'm not really interested in using machine learning algorithms to discover new relationships between variables. I'm very skeptical of how robust that can really be (although I also know that in some contexts that isn't necessarily a substantial problem). What I'm more interested in is using a data focus as a limitation. I'd like to be able to say that I've used a data set to the extent that it has something useful to say, and that I might consider other sources as a supplement if that's insufficient to complete a task. For me, working in an actuarial context, this is the entire story. Being able to document the basis of your expectations is a powerful thing, because when your expectations are wrong, you have a clearer picture of exactly why they were wrong. You can identify that the outcome is distinctly different from what could be learned from the sources of information you used (regardless of whether your reasons were good or bad).
Last edited by DiscreteAndDiscreet; 07-28-2016 at 08:48 PM. Reason: tweaked a sentence
#6




I'm surprised that you're not at least somewhat interested in those methods, given how important a CS background is to them.
Riley
#7




The heart of MDL is Kolmogorov complexity. Kolmogorov complexity isn't computable, but you can get a sense of it by looking at an algorithm and identifying ways to remove redundant structure, either with function calls or by finding a new way to express it that cuts closer to the contour of whatever it is you're calculating. Another way to look at it: the process of taking an algorithm and looking for ways to remove redundant structure or give it a more natural expression amounts to defining a domain specific programming language using whatever facilities are available in the parent language.
Now if you look at the Kolmogorov-complexity equivalent of mutual information, what it actually measures is the benefit gained from expressing two different objects with a common domain specific language rather than with whichever domain specific languages are most efficient for each of them separately. When you use an uninterpretable approach, you get a domain specific language that is inscrutable. You don't know how well its definitions of terms correspond to external definitions of terms. This is essentially related to the Sapir-Whorf hypothesis, that human thought processes are constrained by the way human languages lump together or split underlying families of concepts.
Another issue I have with bottom-up approaches is that they're generally stuck somewhere around AC0 in the complexity hierarchy. It's undeniable that bottom-up approaches can approximate the problem of deciding higher-complexity languages, but I'm left with the sense that they can't participate in interactive proofs where a prover (who is untrusted) convinces a verifier (who is trusted) of a conclusion. (Well, at least they can't do that until they can pass a Turing test.) The only form of proof they can furnish is prediction accuracy.
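Kolmogorov complexity can't be computed, but the length of a compressed encoding gives a crude, computable upper bound on it. A standard compression-based trick (the same idea behind normalized compression distance) estimates the mutual-information quantity mentioned above as C(x) + C(y) - C(xy): the bytes saved by describing both objects with one shared compression dictionary, a rough stand-in for a shared domain specific language. The sketch below uses Python's zlib; the helper names `c`, `mutual_info_estimate`, and `ncd` and the toy inputs are hypothetical, and zlib is only a weak proxy for K (for unrelated inputs the estimate can come out near zero or even negative because of block overhead).

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a computable upper-bound proxy for K(data)."""
    return len(zlib.compress(data, 9))


def mutual_info_estimate(x: bytes, y: bytes) -> int:
    """Crude estimate of algorithmic mutual information I(x:y) ~ K(x) + K(y) - K(xy):
    bytes saved when y is encoded in the same stream as x, so that the dictionary
    built while compressing x can be reused for y."""
    return c(x) + c(y) - c(x + y)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for related inputs, near 1 for unrelated ones."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


# Hypothetical toy inputs: two texts sharing vocabulary, and incompressible noise.
policy_a = b"claim frequency rises with vehicle age and driver inexperience. " * 20
policy_b = b"claim severity rises with vehicle age and repair cost inflation. " * 20
noise = bytes(random.Random(0).getrandbits(8) for _ in range(1200))

if __name__ == "__main__":
    print("I(a:b) ~", mutual_info_estimate(policy_a, policy_b))
    print("I(a:noise) ~", mutual_info_estimate(policy_a, noise))
    print("NCD(a, b) =", round(ncd(policy_a, policy_b), 3))
    print("NCD(a, noise) =", round(ncd(policy_a, noise), 3))
```

The two related texts share a "language" (common phrases the compressor can reuse), so their estimated mutual information is higher and their distance lower than for the text paired with random bytes.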
The problem you've had in promoting machine learning methods is that you have to use ex post prediction accuracy to convince people to take responsibility for an ex ante prediction. That's a tough sell without a structural theory of what past conditions were and what current conditions are.
Last edited by DiscreteAndDiscreet; 07-29-2016 at 08:49 AM. Reason: I hate BB code forums