P1708 R8 Basic Statistics #475

Open
wg21bot opened this issue Jun 23, 2019 · 19 comments
Labels
  • B3 - addition: Bucket 3 as described by P0592: material that is not mentioned in P0592
  • C++26: Targeted at C++26
  • IS: Ship vehicle: IS
  • LEWG: Library Evolution
  • ready-for-library-evolution-meeting-review: This paper needs to be discussed at a Library Evolution meeting
  • scheduled-for-library-evolution: This paper has been scheduled for one of the groups: LEWG, LEWG Incubator, or a Mailing List review
  • size - large: paper size estimate
Comments

@wg21bot
Collaborator

wg21bot commented Jun 23, 2019

P1708R0 Simple Statistics functions (Richard Dosselmann, Michael Wong)

@wg21bot wg21bot added this to the 2019-07 milestone Jun 23, 2019
@wg21bot wg21bot added the SG19 Machine Learning label Jun 23, 2019
@brycelelbach brycelelbach added the LEWGI Library Evolution Incubator label Jul 13, 2019
@brycelelbach

brycelelbach commented Jul 18, 2019

Cologne 2019-07 LEWGI Minutes

P1708R0 Simple Statistical Functions For the Standard Library: Direction Review

Champion: Phillip Ratzloff

Minute Taker: Vincent Reverdy

Start Overview: 07-18 10:40

Add range versions of these algorithms.

Specifying the intermediate type with a template parameter seems problematic. Instead, add a three-argument version that takes an initial value (and uses the type of that initial value as the intermediate type).

Bikeshed all the names.

Rolling algorithm versions?

median should not require pre-sorting, it can be implemented more efficiently with nth_element.

Having median return a pair of iterators is a usability issue.

Why is it useful to get the range of the median? Why do you want the iterator to the median, instead of the value?
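A value-returning median along the lines raised above could be sketched as follows. This is a minimal illustration, not the interface proposed in P1708: the name `median_value`, the `double` return type, and the choice to assert on an empty range are all assumptions. It copies the input into temporary storage and uses `std::nth_element`, so the caller's range does not need to be sorted and is left unmodified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from P1708): returns the median by value.
template <typename ForwardIt>
double median_value(ForwardIt first, ForwardIt last)
{
    // Temporary storage, so the input is neither required to be sorted nor modified.
    std::vector<double> tmp(first, last);
    assert(!tmp.empty() && "this sketch does not define the median of an empty range");

    const std::size_t mid = tmp.size() / 2;
    const auto mid_it = tmp.begin() + static_cast<std::ptrdiff_t>(mid);

    // Partial sort: *mid_it becomes the element that a full sort would put there,
    // and everything before it is less than or equal to it.
    std::nth_element(tmp.begin(), mid_it, tmp.end());
    const double upper = *mid_it;
    if (tmp.size() % 2 == 1)
        return upper;

    // Even count: the lower middle is the largest element left of mid_it.
    const double lower = *std::max_element(tmp.begin(), mid_it);
    return lower + (upper - lower) / 2.0;
}
```

For `{5, 1, 4, 2, 3}` this returns `3.0`, and for `{4, 1, 3, 2}` it returns `2.5`. The average-case cost is linear, compared with the O(n log n) of sorting first.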

This paper should be using ForwardIterators not InputIterators.

Options for intermediate type APIs.

// 0
template <typename T = double, typename I>
T mean(I f, I l);

// 1
template <typename I, typename T>
T mean(I f, I l, T sum = /* iterator value type */, std::size_t n = 0);

Issues to discuss/poll on:

  • Float centricness
  • Convenience versions that require temporary storage.
  • Builtin operators vs functors
  • Intermediate type API
  • Empty range case

Start Review: 10:55

Start Polling: 11:00

POLL: We should promise more committee time to pursuing simple statistical sequence algorithms in the standard library, knowing that our time is scarce and this will leave less time for other work.

NO OBJECTION TO UNANIMOUS CONSENT.

Attendance: 15

More discussion happened.

std::mean(v)
std::accumulate(v) / std::distance(v)

POLL: We should promise more committee time to pursuing std::mean in the standard library, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 2 | 3 | 4 | 5 | 1 |

Attendance: 17

More discussion happened.

More Polling: 11:47

POLL: We should promise more committee time to pursuing convenient versions of std::mode and std::median that return values not positions, require temporary storage, and do not require their input to be sorted, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 0 | 4 | 7 | 5 | 0 |

Attendance: 17

POLL: We should promise more committee time to pursuing P1708R0, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 3 | 6 | 3 | 4 | 0 |

Attendance: 17

End: 11:57

Referral to SG6 for numerics review.

Conor Hoekstra and Vincent Reverdy will help the author out with the next revision.

CONSENSUS: Bring a revision of P1708R0, with the guidance below, to LEWGI for further direction review.

  • Come back with convenient versions of std::mode and std::median that return values not positions, require temporary storage, and do not require their input to be sorted, knowing that our time is scarce and this will leave less time for other work.

@brycelelbach brycelelbach added the needs-revision Paper needs changes before it can proceed label Jul 18, 2019
@jensmaurer jensmaurer removed this from the 2019-07 milestone Aug 23, 2019
@NAThompson

NAThompson commented Sep 8, 2019

Friends,

Just got a boost version of this up and running:

boostorg/math#248

This implementation (just of the mean, for now) brings a couple of things I think would be very useful: namely, it adds C++17 parallel execution policies as well as the projections from Eric Niebler's ranges library. (I still do not think I've extracted anywhere near the full power of ranges, but the perfect is the enemy of the good, as they say.)

As to the comment that mean should be done via std::accumulate(v) / std::distance(v): I think this is not wrong, but suboptimal. See:

Robert F Ling. Comparison of several algorithms for computing sample means and variances. Journal of the American Statistical Association, 69(348): 859–866, 1974

The algorithm in Boost for the mean is also discussed by Higham in "Accuracy and Stability of Numerical Algorithms". I think it's valuable to have since we cannot expect most people to understand why it's a good idea to do this. In addition, the ideas in this algorithm extend to stable methods of computing variance, skewness, and kurtosis, as well as parallelizable, single pass bivariate statistics. See:

Janine Bennett, Ray Grout, Philippe Pébay, Diana Roe, and David Thompson. Numerically stable, single-pass, parallel statistics algorithms. In 2009 IEEE International Conference on Cluster Computing and Workshops, pages 1–8. IEEE, 2009

Once the expectation is that we deploy Bennett's algorithm, we're well beyond what we can expect an average user to do correctly, so I'd say this would be a nice addition to the standard.

@jensmaurer
Member

As a general note, this GitHub issue tracker is not for technical discussions, but for paper management / progress tracking only. Please post your technical discussions to the appropriate reflector.

@wg21bot
Collaborator Author

wg21bot commented Oct 30, 2019

P1708R1 Simple Statistical Functions (Michael Wong)

@wg21bot wg21bot added this to the 2019-11 milestone Oct 30, 2019
@jensmaurer jensmaurer removed this from the 2019-11 milestone Dec 12, 2019
@wg21bot
Collaborator Author

wg21bot commented Jan 18, 2020

P1708R2 Simple Statistical Functions (Michael Wong, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)

@wg21bot wg21bot added this to the 2020-02 milestone Jan 18, 2020
@jensmaurer jensmaurer removed the needs-revision Paper needs changes before it can proceed label Jan 21, 2020
@jensmaurer jensmaurer added the SG6 Numerics label Jan 31, 2020
@jensmaurer
Member

Prague pre-meeting telecon: This needs review by SG6 as well.

@Cpp-Lisa
Collaborator

We looked at this in SG6 Monday in Prague, but without the principal author. We felt that the inclusion of the median and mode clouded the interface, but we think there's room for a class used with accumulate to collect the statistical moments, templated on the number of moments to collect.

@brycelelbach

Prague 2020-02 LEWGI Minutes

P1708R2 Simple Statistical Functions: Direction Review

Chair: Billy Baker

Champion: Ryan McDougall

Minute Taker: David Olsen

Start Review: 2020-02-11 10:01

Prior art:

  • Older proposal WEB mentioned (add to paper).

Volunteers to help the author revise the proposal/people to contact:

  • Vincent Reverdy
  • Walter Brown
  • Jolanta Opara
  • Conor Hoekstra
  • Eric Niebler (?)

End: 10:08

CONSENSUS: Further revision is needed before LEWGI can review this.

  • Changes requested during previous LEWGI review and SG6 review in Prague need to be incorporated.
  • SG6 should review any changes before LEWGI sees this again.

@brycelelbach brycelelbach added needs-revision Paper needs changes before it can proceed and removed SG19 Machine Learning labels Feb 18, 2020
@jensmaurer jensmaurer removed this from the 2020-02 milestone Feb 18, 2020
@brycelelbach brycelelbach added LEWG Library Evolution EWGI Evolution Incubator B3 - addition Bucket 3 as described by P0592: material that is not mentioned in P0592 TS Ship vehicle: TS numerics-ts-1 and removed LEWGI Library Evolution Incubator labels Aug 25, 2020
@wg21bot
Collaborator Author

wg21bot commented Jan 22, 2021

P1708R3 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)

@wg21bot wg21bot removed the needs-revision Paper needs changes before it can proceed label Jan 22, 2021
@wg21bot wg21bot added this to the 2021-telecon milestone Jan 22, 2021
@brycelelbach brycelelbach added LEWG Library Evolution library-evolution-deferred Ready for review, but should not be scheduled size - medium paper size estimate labels Aug 1, 2021
@jensmaurer jensmaurer modified the milestones: 2021-telecon, 2022-telecon Jan 1, 2022
@wg21bot
Collaborator Author

wg21bot commented Mar 21, 2022

P1708R6 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)

@mattkretz
Collaborator

POLL: Any objection to unanimous consent to forward a new revision of P1708R6 containing the discussed changes to LEWG?

No objections to unanimous consent.

# of Authors: 2
# of Participants: 6

Design questions raised in SG6 which could be of interest to LEWG:

  • Accumulator types and the stats_(weighted_)accumulate functions have to match (both weighted or both not weighted). Would it be better if a single accumulator type could work with both stats functions? If the weighted accumulator type were removed, what would that mean in terms of implementation trade-offs?
  • Is it possible to reduce the number of types if the only difference is the init value (i.e. no algorithmic change)? It reduces consistency of the interface of the types. But the consistency could be restored via derived types that call the base ctor with the correct init constant.
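To illustrate the first question, a single accumulator that serves both the weighted and the unweighted case could be sketched as below. This is purely hypothetical and is not the accumulator interface of P1708R6: the type name, its members, and the choice to treat an unweighted sample as having weight 1 are all assumptions.

```cpp
// Hypothetical sketch: one mean accumulator for both weighted and unweighted input.
struct mean_accumulator {
    double total_weight = 0.0; // sum of the weights seen so far
    double mean = 0.0;         // running weighted mean

    // Unweighted samples use the default weight of 1. Weights must be positive.
    void add(double x, double w = 1.0)
    {
        total_weight += w;
        mean += (w / total_weight) * (x - mean);
    }
};
```

Calling `add(1.0)` then `add(3.0)` gives a mean of `2.0`, while `add(1.0, 1.0)` then `add(3.0, 3.0)` gives `2.5`. The trade-off hinted at in the question shows up here: the unweighted path pays for a multiplication and a `double` weight total that a dedicated unweighted accumulator would not need.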

@mattkretz mattkretz added needs-revision Paper needs changes before it can proceed and removed SG6 Numerics labels Apr 14, 2022
@wxinix-2022

Any sample implementation for P1708 so far?

@jensmaurer jensmaurer removed this from the 2022-telecon milestone Jan 25, 2023
@wg21bot
Collaborator Author

wg21bot commented Feb 20, 2023

P1708R7 Basic Statistics (Richard Dosselmann)

@wg21bot wg21bot removed the needs-revision Paper needs changes before it can proceed label Feb 20, 2023
@wg21bot wg21bot added this to the 2023-telecon milestone Feb 20, 2023
@brycelelbach brycelelbach added ready-for-library-evolution-meeting-review This paper needs to be discussed at a Library Evolution meeting and removed library-evolution-deferred Ready for review, but should not be scheduled TS Ship vehicle: TS numerics-ts-1 labels Mar 25, 2023
@inbal2l inbal2l added the scheduled-for-library-evolution This paper has been scheduled for one of the groups: LEWG, LEWG Incubator, or a Mailing List review label Sep 9, 2023
@wg21bot
Collaborator Author

wg21bot commented Dec 19, 2023

P1708R8 Basic Statistics (Richard Dosselmann)

@wg21bot wg21bot modified the milestones: 2023-telecon, 2024-telecon Dec 19, 2023
@wg21bot wg21bot changed the title P1708 Simple Statistics functions P1708 R8 Basic Statistics Dec 19, 2023
@inbal2l inbal2l added IS Ship vehicle: IS size - large paper size estimate and removed scheduled-for-library-evolution This paper has been scheduled for one of the groups: LEWG, LEWG Incubator, or a Mailing List review size - medium paper size estimate labels Dec 19, 2023
@inbal2l inbal2l added C++26 Targeted at C++26 scheduled-for-library-evolution This paper has been scheduled for one of the groups: LEWG, LEWG Incubator, or a Mailing List review labels Feb 16, 2024
@ben-craig
Collaborator

2024-03-20 Library Evolution Tokyo

P1708R8: Basic Statistics

2024-03-20 Library Evolution Tokyo Minutes

Champion: Richard Dosselmann
Chair: Ben Craig
Minute Taker: Steve Downey

Summary

POLL: Facilities to compute basic statistics (mean, stddev, etc) belong in the standard library

| SF | WF | N | WA | SA |
|---|---|---|---|---|
| 14 | 4 | 5 | 0 | 1 |

Attendance: 20

# of Authors: 1

Author Position: SF

Outcome: Consensus

Comments:

Next Steps

More LEWG review

9 participants