FlexTree
FlexTree is a method for “classification” based on a “learning”
(alternatively, “training”) sample of data. Classification here means
predicting an unknown outcome Y that belongs to one of two, or perhaps
more, categories on the basis of “features” X. A learning, or training,
sample of observations X paired with known values of Y is given. The
prediction for a new subject, whose features are known but whose outcome
is not, is based on that learning sample. The predictor has the form of
a rooted binary tree and resembles that of CART®, though it differs
greatly in how it selects and combines features. Features can be
qualitative or quantitative. Motivation comes from studies of the
genetics of complex disease, where there is a need to understand
complicated gene-gene, gene-environment, and other possible interactions
in a way that enables ready interpretation and accurate classification.
The challenge is greatest when there are many candidate features and
accurate prediction is possible, at least in principle, but the “signal”
is carried by only a few of them. Each feature that bears upon the
signal carries a score, and scores add. The “reward” for a sufficiently
high score is the outcome, typically an untoward one.
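The idea that scores add and that a sufficiently high total yields the
outcome can be illustrated with a minimal sketch. This is not FlexTree
code and does not use the package's interface; it is written in Python
purely for illustration, and the feature names, weights, and threshold
are hypothetical.

```python
from typing import Mapping


def additive_score(features: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Sum the per-feature scores.

    Features without a weight carry no signal and contribute nothing.
    """
    return sum(weights[name] * value
               for name, value in features.items()
               if name in weights)


def classify(features: Mapping[str, float],
             weights: Mapping[str, float],
             threshold: float) -> int:
    """Predict the outcome (1) when the total score reaches the threshold."""
    return 1 if additive_score(features, weights) >= threshold else 0


# Hypothetical subject: only two of the four candidate features carry signal.
weights = {"gene_a": 1.0, "smoker": 2.0}
subject = {"gene_a": 1, "gene_b": 1, "smoker": 1, "age_over_60": 0}
print(classify(subject, weights, threshold=2.5))
```

In this toy case neither gene_a nor smoker alone reaches the threshold,
but together they do, which is the kind of combined effect the scoring
is meant to capture.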
FlexTree is an R package. With current algorithms and software,
computation could be an issue. On a typical modern desktop computer,
with a learning sample of 2,000 subjects and about 50 features, FlexTree
should run in about 10 hours. The entire procedure involves growing the
tree, suitable validation, and, as with CART, pruning a large tree to a
smaller one intended for subsequent use. With a much larger learning
sample or many more features, computer time might be prohibitive, in
which case users should consider other approaches or pre-filter the data
to reduce the time involved to a manageable level.
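One possible form of pre-filtering is sketched below. It is an
assumption for illustration, not a procedure that FlexTree provides or
recommends: features are ranked by the absolute difference in class
means divided by the overall standard deviation, and only the top few
are kept. The function name and data layout are hypothetical, the
outcome is assumed to be coded 0/1 with both classes present, and the
features are assumed to be numeric. A marginal filter of this kind can
discard features that matter only through interactions, so it should be
used with care.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def prefilter(rows: Sequence[Dict[str, float]],
              labels: Sequence[int],
              keep: int) -> List[str]:
    """Return the names of the `keep` features that best separate the classes.

    Each feature is scored by |mean in class 1 - mean in class 0| divided by
    its overall standard deviation; constant features score zero.
    """
    names = list(rows[0].keys())
    scores = {}
    for name in names:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        spread = pstdev(pos + neg)
        scores[name] = abs(mean(pos) - mean(neg)) / spread if spread > 0 else 0.0
    # sorted() is stable, so tied features keep their original order.
    return sorted(names, key=lambda n: scores[n], reverse=True)[:keep]


# Hypothetical learning sample: x1 separates the classes, x2 and x3 do not.
rows = [
    {"x1": 0, "x2": 5, "x3": 1},
    {"x1": 0, "x2": 3, "x3": 1},
    {"x1": 1, "x2": 4, "x3": 1},
    {"x1": 1, "x2": 4, "x3": 1},
]
labels = [0, 0, 1, 1]
print(prefilter(rows, labels, keep=1))
```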
STANFORD ACADEMIC FLEXTREE LICENSE AGREEMENT
- This is a legal agreement between you, RECIPIENT, and STANFORD
UNIVERSITY. By accepting, receiving, and using FlexTree, including any
accompanying information, materials or manuals, you are
agreeing to be bound by the terms of this Agreement. If you do not
agree to the terms of this Agreement, please contact
(flextreesupport@gmail.com) with your concerns.
- STANFORD grants to RECIPIENT a royalty-free, nonexclusive, and
nontransferable license to use the Program furnished hereunder, upon the
terms and conditions set out below.
- RECIPIENT acknowledges that the Program is a research tool still in
the development stage and that it is being supplied as is, without any
accompanying services, support, or improvements from STANFORD.
- RECIPIENT agrees to use the Program solely for internal
non-commercial purposes and shall not distribute or transfer it to
another location or to any other person without prior written permission
from STANFORD.
- RECIPIENT agrees not to reverse engineer, reverse assemble, reverse
compile, decompile, disassemble, or otherwise attempt to create the
source code for the Program.
- If permission to transfer the Program is given, under Article 4
above, RECIPIENT warrants that RECIPIENT will not remove or export any
part of the software or Program from the United States except in full
compliance with all United States and other applicable laws and
regulations.
- RECIPIENT will indemnify, hold harmless, and defend STANFORD against
any claim of any kind arising out of or related to the exercise of any
rights granted under this Agreement or the breach of this Agreement by
RECIPIENT.
- Title and copyright to the Program and any associated documentation
shall at all times remain with STANFORD, and RECIPIENT agrees to
preserve same.
If you AGREE, please write to flextreesupport@gmail.com to:
- Confirm your agreement
- Provide information about yourself, including name, organization,
title, contact email address, and the purpose of using this software.