Proving The Lottery Ticket Hypothesis

The "lottery ticket hypothesis" has shown that sparse, trainable subnetworks exist at initialization. Given a fully connected architecture at initialization, our aim is to find such "winning ticket" networks without any training data. We first show the advantages of forming input-output paths, rather than pruning individual connections, in avoiding bottlenecks in gradient propagation. Then we show that Paths with Greater Edge-Weights at initialization have greater loss gradient magnitude, resulting in more effective training.
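
The link between edge weights and gradient magnitude can be illustrated with a toy calculation. The sketch below is not the method of this work; it assumes a single linear path (no activations, no bias) and a fixed upstream gradient dL/dy, and the names `path_output`, `path_gradients`, `weak_path`, and `strong_path` are hypothetical. Under those assumptions, the gradient with respect to one edge weight is proportional to the product of the other weights on the same path, so a path with larger weights receives larger gradients.

```python
import math


def path_output(x, weights):
    """Output of one linear input-output path: x times the product of its edge weights."""
    return x * math.prod(weights)


def path_gradients(x, weights, upstream=1.0):
    """Gradient of the loss with respect to each edge weight on the path.

    Assumption (for illustration only): the path is linear and the
    upstream gradient dL/dy is held fixed, so
    dL/dw_k = upstream * x * product of all other weights on the path.
    """
    grads = []
    for k in range(len(weights)):
        others = math.prod(w for j, w in enumerate(weights) if j != k)
        grads.append(upstream * x * others)
    return grads


def l2_norm(values):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(v * v for v in values))


# Two hypothetical paths of the same depth, differing only in weight magnitude.
weak_path = [0.1, 0.2, 0.1]
strong_path = [0.8, 0.9, 0.7]

weak_grads = path_gradients(1.0, weak_path)
strong_grads = path_gradients(1.0, strong_path)

weak_norm = l2_norm(weak_grads)
strong_norm = l2_norm(strong_grads)

print("weak path gradient norm:  ", weak_norm)
print("strong path gradient norm:", strong_norm)
```

In this toy setting the path with larger weights has the larger gradient norm. Real networks add nonlinearities, many overlapping paths, and a data-dependent upstream gradient, so the comparison here only conveys the intuition.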
