Genpact, in collaboration with Envision Racing and MachineHack, is launching ‘Dare in Reality,’ a hackathon for data scientists and machine learning professionals that begins on November 8. Winners will receive cash prizes and goodies.
Genpact is a global professional services firm that makes business transformation real. With a team of 90,000+ employees, Genpact serves more than 800 clients across 70+ countries. Forbes recently named Genpact in its list of World’s Best Employers 2021.
With this machine learning hackathon, Genpact aims to see how participants can help the Formula E team improve its performance on the racetrack in the all-electric, international single-seater world championship.
The partnership between Genpact and Envision Racing uses data, innovation, real-time insights, and process excellence to achieve three core goals: boosting the team’s racing performance, tackling climate change, and building and energizing a passionate, socially conscious fan base.
Ready, Steady, Go!
The two-week Dare in Reality Hackathon 2021 is for data science professionals, machine learning engineers, artificial intelligence practitioners, and other tech enthusiasts to showcase their skills and impress the judges. The winners are rewarded with cash prizes.
The challenge starts on November 8, 2021
The Challenge is on…
Besides disrupting motor racing and growing a diverse and global following, Formula E also fights climate change. It is creating a technical and sustainable development testbed that helps countries address mobility and environmental issues.
In each race, 12 teams – each with two drivers – compete in electric, battery-powered cars. Every team is focused on securing the best competitive advantage to cross the finish line first.
Formula E cars have evolved through years of research, and to win, a team needs a combination of driver skill and data analytics. However, several factors affect performance during a session, including:
- Weather
- Rain
- Wind
- Track temperature
- Ambient temperature
- Track evolution: The way the track changes during and between sessions. As the cars leave rubber and debris on the tracks and weather affects the conditions of the driving environment – ambient temperature, humidity, rainfall, etc. – the track evolves and the time taken to complete a lap changes (increases or decreases based on positive or negative track evolution).
- Driver’s familiarity with the track
So, in this hackathon, participants should use machine learning to predict lap times for a specific Formula E session – the qualifying session that determines where each car will be positioned at the start of the race.
Submission Guidelines
Sklearn models support the predict() method to generate the predicted values.
You should submit a .csv file with exactly 1957 rows and 1 column (LAP_TIME). A submission with extra rows or columns will return an Invalid Score.
Note: Do not shuffle the sequence of the test series
For participants using Pandas:
submission_df.to_csv('my_submission_file.csv', index=False)
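As a rough sketch of the guidelines above, the snippet below builds a single-column LAP_TIME frame from model predictions and checks its shape before export. The helper name `build_submission`, the `expected_rows` parameter, and the three placeholder predictions are assumptions made for illustration; in practice the predictions would come from a model's predict() call on the test set, in the original test order.

```python
import numpy as np
import pandas as pd


def build_submission(predictions, expected_rows):
    """Return a one-column LAP_TIME frame, keeping the order of the predictions."""
    preds = np.asarray(predictions, dtype=float).ravel()
    if len(preds) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(preds)}")
    if np.isnan(preds).any():
        raise ValueError("predictions contain NaN values")
    return pd.DataFrame({"LAP_TIME": preds})


# Placeholder predictions (hypothetical values, in seconds).
example_preds = [92.4, 91.8, 93.1]
submission_df = build_submission(example_preds, expected_rows=3)

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = submission_df.to_csv(index=False)
```

Checking the row count and column name locally is cheaper than discovering an Invalid Score after upload.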
Evaluation Criteria:
The submission will be evaluated using the RMSLE (root mean squared logarithmic error) metric. Participants can use np.sqrt(mean_squared_log_error(actual, predicted)) to compute it.
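To make the metric concrete, here is a minimal NumPy sketch of RMSLE. It uses log(1 + x), which is also how scikit-learn's mean_squared_log_error is defined, so it should agree with the expression quoted above. The function name `rmsle` and the sample lap times are made up for illustration.

```python
import numpy as np


def rmsle(actual, predicted):
    """Root mean squared logarithmic error for non-negative values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if (actual < 0).any() or (predicted < 0).any():
        raise ValueError("RMSLE is only defined for non-negative values")
    # Squared difference of log(1 + x), averaged, then square-rooted.
    squared_log_diff = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return float(np.sqrt(np.mean(squared_log_diff)))


actual = [90.0, 92.5, 95.0]      # made-up lap times in seconds
predicted = [91.0, 92.0, 96.5]   # made-up predictions
score = rmsle(actual, predicted)
```

Because the error is taken on a log scale, RMSLE penalizes relative rather than absolute differences: missing a 90-second lap by one second costs roughly the same as missing a 180-second lap by two.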
- This hackathon supports private and public leaderboards
- The public leaderboard is evaluated on 30% of Test data
- The private leaderboard, evaluated on 100% of Test data, will be made available at the end of the hackathon
- The Final Score represents the score achieved based on the Best Score on the public leaderboard
Prizes:
- First Prize: $7000
- Second Prize: $5000
- Third Prize: $300
- Fourth Prize: iPad Air 128 GB
- Fifth Prize: Apple AirPods
The challenge ends on November 22, 2021
Dataset Description:
- Train.csv – 10276 rows x 25 columns (includes target column as LAP_TIME)
- Test.csv – 420 rows x 25 columns
- Sample Submission.csv – Please check the “Evaluation” section on the MachineHack page for more details on generating a valid submission.
Attribute Description:
- Number
- Driver number
- Lap number
- Lap time
- Lap improvement
- Crossing finish line in pit
- Session 1 (S1)
- S1 improvement
- Session 2 (S2)
- S2 improvement
- Session 3 (S3)
- S3 improvement
- Kph
- Driver name
- Pit time
- Class
- Qualifying group
- Team
- Manufacturer
- Power
- Location
- Event
Input data:
- Data from the practice and qualifying sessions
- Data tables:
- Timing Data: The standings for each driver in a race, lap-by-lap data for each driver for free practice (FP) 1, free practice (FP) 2 and qualifying session (Q)
- Weather: how weather conditions changed during a session
Output:
- Use the provided FP1 and FP2 data to build a model that predicts the lap times for qualifying sessions
- For the test data, we will provide FP1, FP2 data (timing data and weather). Predict the lap times for qualifying sessions
Skills:
- Multivariate Regression
- Big dataset, underfitting vs overfitting
- Optimizing RMSLE to generalize well on unseen data
Other details:
Situation and Race Conditions
- All participants have the same car and battery/power to consume during the session
- The race runs for 45 mins (fixed) plus one additional lap
- The driver who crosses the finish line first wins the race
- If there are crashes or problems on race day, the safety car enters the track until everything is safe. All drivers must stay behind the safety car, keep below 50 kph, and not overtake
- If the safety car comes on, there will be fewer laps as the race time is fixed
- For practice sessions, each driver can complete one lap at 250 kW, one lap at 235 kW and all remaining laps at 200 kW
- For qualifying sessions, each driver can complete a maximum of 3 laps, of which only one can be at 250 kW; all others must be at 200 kW
- For race sessions, laps are energy limited, as opposed to power limited
- In practice sessions, when drivers complete multiple adjacent laps with similar lap times, they are generally practicing race laps, which are energy-limited
- Standard session durations are as follows:
- FP1: 45 minutes
- FP2: 30 minutes
- Q1 to Q4: 4 minutes per group
- Super pole: 20 minutes maximum
- Race: 45 minutes plus one lap
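The note above about adjacent practice laps with similar times suggests a possible derived feature. The sketch below flags runs of consecutive laps whose times stay within a tolerance of the previous lap. The function name `flag_similar_adjacent_laps`, the 0.5-second tolerance, and the minimum run length of 3 are all assumptions chosen for illustration, not values given by the organizers; the function would be applied to one driver's laps within one session.

```python
import pandas as pd


def flag_similar_adjacent_laps(lap_times, tolerance=0.5, min_run=3):
    """Return a boolean Series marking laps that sit in a run of at least
    `min_run` consecutive laps, each within `tolerance` seconds of the previous one."""
    laps = pd.Series(lap_times, dtype=float).reset_index(drop=True)
    # True when a lap is within tolerance of the lap before it (first lap is False).
    close_to_prev = laps.diff().abs() <= tolerance
    # A new run starts at every lap that is not close to its predecessor.
    run_id = (~close_to_prev).cumsum()
    run_length = laps.groupby(run_id).transform("size")
    return run_length >= min_run


# Made-up lap times in seconds for one driver in one practice session.
example_laps = [95.0, 88.2, 91.0, 91.2, 90.9, 91.1, 99.0]
likely_race_sim = flag_similar_adjacent_laps(example_laps)
```

A flag like this is only a heuristic: it cannot confirm that a lap was energy-limited, but it may help separate race-simulation laps from flat-out laps when modeling qualifying pace.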
Suggested Approach:
- Data Preparation: Build a master data frame combining tables from different sessions, drivers, qualifying, and practice races. Join with weather data to get the final data set
- Explore data to understand the significant variables and generate derived variables if needed
- Build a model to predict the lap times for each qualifying lap using the provided test data for a session
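The data preparation step above can be sketched with pandas. The example joins each lap to the most recent weather reading at the same location and then adds one derived variable. Every column name used here (LAP_START, TIME, TRACK_TEMP, RAIN, GAP_TO_SESSION_BEST) and all values are hypothetical stand-ins; the real schema should be taken from the competition files.

```python
import pandas as pd

# Tiny made-up timing table (hypothetical columns and values).
timing = pd.DataFrame({
    "LOCATION": ["Loc A", "Loc A", "Loc A"],
    "EVENT": ["FP1", "FP1", "FP2"],
    "DRIVER_NUMBER": [4, 37, 4],
    "LAP_START": pd.to_datetime(
        ["2021-01-01 10:02", "2021-01-01 10:06", "2021-01-01 13:03"]
    ),
    "LAP_TIME": [92.4, 93.0, 91.7],
})

# Tiny made-up weather table (hypothetical columns and values).
weather = pd.DataFrame({
    "LOCATION": ["Loc A", "Loc A", "Loc A"],
    "TIME": pd.to_datetime(
        ["2021-01-01 10:00", "2021-01-01 10:05", "2021-01-01 13:00"]
    ),
    "TRACK_TEMP": [28.0, 28.5, 31.0],
    "RAIN": [0, 0, 1],
})

# Attach the latest weather reading taken at or before each lap start.
# merge_asof requires both frames to be sorted on their time keys.
master = pd.merge_asof(
    timing.sort_values("LAP_START"),
    weather.sort_values("TIME"),
    left_on="LAP_START",
    right_on="TIME",
    by="LOCATION",
    direction="backward",
)

# Example derived variable: gap to the fastest lap of the same session.
session_best = master.groupby(["LOCATION", "EVENT"])["LAP_TIME"].transform("min")
master["GAP_TO_SESSION_BEST"] = master["LAP_TIME"] - session_best
```

An as-of join is used because weather readings and lap timestamps rarely line up exactly; an ordinary equality merge would leave most laps without weather values.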
Championship Basic Structure
- Twenty-four drivers (two per team) compete in each Formula E race
- Each driver ranks separately, but their total scores give the overall team ranking
- There are 12 races in a season held over 8 months
- There are 2 free practice sessions before a race
- After the practice sessions, qualifying sessions run for a very limited number of laps to decide the pole position (who starts in the first row and then the following positions). A super pole can follow multiple qualifying sessions to decide the driver standings at the race start
- The race runs for 45 minutes plus one lap