AutoML your way off the Titanic

ML.NET looks like the easiest solution to get started with machine learning in .NET since numl.
The visual Model Builder makes ML completely dummy-proof for non-data scientists by way of “automated ML”, or AutoML for short. I decided to test-drive the command line version on the Kaggle starter Titanic challenge.


I assume you have installed everything to get started with ML.NET on the command line, and that you have registered on Kaggle and joined the Titanic challenge.

One simple command

mlnet auto-train --task binary-classification --dataset "train.csv" --label-column-name Survived --ignore-columns PassengerId
  • --task binary-classification
    • We’re trying to predict whether a passenger survived or not, so this is binary classification
  • --dataset "train.csv"
  • --label-column-name Survived
    • The Survived column holds the label for each row
  • --ignore-columns PassengerId
    • In the true spirit of AutoML I’m assuming we haven’t done any exploratory research of the data, but I don’t want to make it harder than necessary either. When previewing the data on Kaggle, we can safely assume PassengerId values are unique identifiers that give no insight into whether a passenger survived or not

The command runs for 30 minutes unless you specify --max-exploration-time. More time means more room to fine-tune the resulting model, which could lead to better results.

This is what I assume happens automatically, roughly:

  1. Select a binary classification algorithm
  2. Train the model on 80% of the given training data
    1. This means tuning the selected algorithm until, for the given data, (most of) its predictions match the Survived column
    2. Possibly it uses 100% of the training data and performs cross-validation instead
  3. Test the resulting model on the remaining 20% of the training data
    1. The model hasn’t seen this data, so this is a true test of the quality of the tuned algorithm
  4. Repeat steps 1-3 with the other binary classification algorithms available in ML.NET
  5. Compare the results of step 3 and pick the best model
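To make the assumed loop above a bit more concrete, here is a toy sketch in plain C#. It is not what ML.NET does internally: the `Passenger` record, the two "trainers" (`Majority` and `BySex`), the data and the fixed seed are all hypothetical, chosen only to show a seeded 80/20 hold-out split, training each candidate on the 80%, scoring it on the unseen 20%, and keeping the candidate with the best accuracy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, stripped-down passenger row used only for this illustration
public record Passenger(string Sex, bool Survived);

public static class HoldoutSelection
{
    // Hypothetical "trainer": learns from training rows and returns a predictor
    public delegate Func<Passenger, bool> Trainer(IReadOnlyList<Passenger> train);

    public static Dictionary<string, Trainer> DemoTrainers() => new()
    {
        // Always predicts the most common outcome in the training rows
        ["Majority"] = train =>
        {
            bool majority = train.Count(p => p.Survived) * 2 > train.Count;
            return _ => majority;
        },
        // Predicts the most common outcome per sex in the training rows
        ["BySex"] = train =>
        {
            var bySex = train.GroupBy(p => p.Sex)
                .ToDictionary(g => g.Key, g => g.Count(p => p.Survived) * 2 > g.Count());
            return p => bySex.TryGetValue(p.Sex, out var survived) && survived;
        },
    };

    public static (string Name, double Accuracy) PickBest(
        IReadOnlyList<Passenger> rows, IReadOnlyDictionary<string, Trainer> trainers,
        double testFraction = 0.2, int seed = 42)
    {
        // Shuffle deterministically, then hold out the last 20% as unseen test rows
        var rng = new Random(seed);
        var shuffled = rows.OrderBy(_ => rng.Next()).ToList();
        int testCount = Math.Max(1, (int)(shuffled.Count * testFraction));
        var train = shuffled.Take(shuffled.Count - testCount).ToList();
        var test = shuffled.Skip(shuffled.Count - testCount).ToList();

        var best = (Name: "", Accuracy: -1.0);
        foreach (var (name, trainer) in trainers)
        {
            var predict = trainer(train); // train on the 80%
            double accuracy = test.Count(p => predict(p) == p.Survived) / (double)test.Count; // score on the 20%
            if (accuracy > best.Accuracy) best = (name, accuracy); // keep the best candidate so far
        }
        return best;
    }

    public static void Main()
    {
        // Hypothetical data: every "female" row survived, every "male" row did not
        var rows = Enumerable.Range(0, 20)
            .Select(i => new Passenger(i % 2 == 0 ? "female" : "male", i % 2 == 0))
            .ToList();
        var (name, accuracy) = PickBest(rows, DemoTrainers());
        Console.WriteLine($"{name}: {accuracy:P0}");
    }
}
```

The real CLI explores far more trainers and hyperparameters, but the shape of the search (train, score on held-out data, keep the best) is the same idea as the steps listed above.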


|                                              Top 5 models explored                                             |
|     Trainer                              Accuracy      AUC    AUPRC  F1-score  Duration #Iteration             |
|1    LinearSvmBinary                        0,8462   0,8937   0,9075    0,8182       0,5          5             |
|2    SgdCalibratedBinary                    0,8462   0,8671   0,8678    0,8125       0,5         24             |
|3    SgdCalibratedBinary                    0,8462   0,8664   0,8637    0,8125       0,7         72             |
|4    LinearSvmBinary                        0,8462   0,8625   0,8656    0,8125       0,9         73             |
|5    SgdCalibratedBinary                    0,8462   0,8671   0,8649    0,8125       0,7         75             |
Generated trained model for consumption: ...\SampleBinaryClassification\SampleBinaryClassification.Model\    
Generated C# code for model consumption: ...\SampleBinaryClassification\SampleBinaryClassification.ConsoleApp
Check out log file for more information: ...\SampleBinaryClassification\logs\debug_log.txt

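It says LinearSvmBinary gave the best bang for the buck (all five models tie on accuracy, and it has the highest AUC), and it saved the model in the SampleBinaryClassification.Model folder listed above.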

We can run this model on test.csv and submit the results to Kaggle for scoring; we just need to apply a few code changes:

  1. Open the solution SampleBinaryClassification.sln
  2. Update train.csv to test.csv in SampleBinaryClassification.ConsoleApp.Program.DATA_FILEPATH
  3. Copy ModelInput to TestModelInput, remove the Survived property, and renumber the LoadColumn attributes accordingly
    1. You’re with me, right? test.csv doesn’t have a Survived column, and the input class needs to match the data we’re loading
  4. Add the property public float PassengerId { get; set; } to ModelOutput; this is required for the Kaggle submission
  5. Because the trained model expects ModelInput, we need to map TestModelInput back to it. We also want to predict all rows in one go, so update your Program like so:
        // Also needs: using System; using System.IO; using System.Linq; using Microsoft.ML;
        static void Main(string[] args)
        {
            MLContext mlContext = new MLContext();

            // Load test.csv, which has no Survived column
            var data = mlContext.Data.LoadFromTextFile<TestModelInput>(DATA_FILEPATH,
                separatorChar: ',', hasHeader: true, allowQuoting: true);

            // Map every TestModelInput to the ModelInput schema the trained model expects
            var modelInputs = mlContext.Data.LoadFromEnumerable(
                mlContext.Data.CreateEnumerable<TestModelInput>(data, reuseRowObject: false)
                    .Select(p => new ModelInput
                    {
                        PassengerId = p.PassengerId,
                        Pclass = p.Pclass,
                        Name = p.Name,
                        Sex = p.Sex,
                        Age = p.Age,
                        SibSp = p.SibSp,
                        Parch = p.Parch,
                        Ticket = p.Ticket,
                        Fare = p.Fare,
                        Cabin = p.Cabin,
                        Embarked = p.Embarked
                    }));

            // Load the trained model and score all passengers at once
            var predictionPipeline = mlContext.Model.Load(MODEL_FILEPATH, out DataViewSchema predictionPipelineSchema);
            var predictions = predictionPipeline.Transform(modelInputs);
            var survivalPredictions = mlContext.Data.CreateEnumerable<ModelOutput>(predictions, reuseRowObject: false);

            // Write the Kaggle submission: a header plus one "PassengerId,Survived" line per passenger
            File.WriteAllLines("kaggle_submission.csv",
                new string[] { "PassengerId,Survived" }
                    .Concat(survivalPredictions.Select(p => $"{p.PassengerId},{(p.Prediction ? 1 : 0)}")));

            Console.WriteLine("=============== End of process, hit any key to finish ===============");
            Console.ReadKey();
        }

Run the program, find kaggle_submission.csv in the bin folder, and submit it to Kaggle.

This gives me a score of 0.78468, which is not my best score so far (0.80382), but not bad for one command and a few code changes, I think.

Theme music for this blog post


RTFAQ – Azure App Service request timeout limit

I had to do a one-time POST to an Azure App Service to trigger a post-release task, which would run on that request thread.
The task could take a long time, but because this was a one-time thing, setting up proper background processing wasn’t worth it.

Everything worked fine on my machine, until it had to run on Azure. The post-release request got aborted after a couple of minutes, while the server kept processing the request.

Increasing the HttpClient.Timeout property didn’t help.

Once I found the right keywords to search for, it turned out the explanation had been hiding in plain sight in the FAQ all along:

Why does my request time out after 230 seconds?

Azure Load Balancer has a default idle timeout setting of four minutes.

This is a hard limit you can’t exceed. I guess Azure does this to protect its environment, or at least to have a theoretical limit to build on when deciding to scale or trigger DDoS protection.

Theme music for this blog post



Proxy as a service

Azure Functions is a lightweight way to quickly expose an API, the “serverless” way.

Now I wanted to expose another API to clients through a proxy (to avoid exposing its API key, and to have an extension point for the future), so I reached for Azure Functions again. Just create an HTTP trigger to act as the proxy and forward calls to the other API, right?

Apparently this use case is already handled by Azure Functions Proxies.

You literally just declare the proxy and it works, nice.
Add a proxies.json in the root of your Azure Functions app (assuming you are writing code; if you use the portal, just go to the Proxies node and follow the self-explanatory UI):

{
  "$schema": "",
  "proxies": {
    "name_your_proxy": {
      "matchCondition": {
        "methods": [ "POST" ],
        "route": "/api/proxy_call"
      },
      "backendUri": "",
      "requestOverrides": {
        "backend.request.headers.example-secret": "dummy-secret"
      }
    }
  }
}

Publish this and you can call your Azure Functions app on /api/proxy_call, which forwards the call to the configured backendUri while adding the header with the secret you don’t want to expose to your clients. Simple!

Theme music for this blog post


target _blank security issue

I think this is an old issue, but I only just learned about it via the lint error react/jsx-no-target-blank in a React project.

Apparently a page opened in a new window by a link with target="_blank" gets a reference to the originating window through window.opener.
When that page is another (malicious) site, it can use this reference to, for example, redirect the original tab to a page of its choosing.

Adding rel="noopener noreferrer" to your anchor tag mitigates this risk.

Check out this much better, quick explanation:

Theme music for this blog post


EF6 VS EF Core inheritance

I lost some time on this subtle gotcha, so maybe this will save you some.

In EF6 you can configure inheritance with a single type.

An inheritance hierarchy with a single type might seem pointless, but I’ve seen it used to filter data automatically, which I found pretty clever: to ignore records marked as deleted, the discriminator column was used to map only records with deleted set to false.

Say you want to map SomeEntity only to rows that have value “SomeEntityType” in the field Discriminator:

public class SomeContext : DbContext
{
    protected override void OnModelCreating(DbModelBuilder modelBuilder) =>
        modelBuilder.Entity<SomeEntity>()
            .Map(m => m.Requires("Discriminator").HasValue("SomeEntityType"));
}

Now, porting the EF6 implementation as-is to EF Core syntax won’t work:

// EF Core: incorrect
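A direct port presumably looks something like the sketch below. The entity’s `Id` property is a hypothetical placeholder; the fluent calls are EF Core’s `HasDiscriminator` and `HasValue`, configured on the single type `SomeEntity`:

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity; the post doesn't show its properties
public class SomeEntity
{
    public int Id { get; set; }
}

public class SomeContext : DbContext
{
    public DbSet<SomeEntity> SomeEntities { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Looks like the EF6 mapping, but SomeEntity is the only type in the "hierarchy",
        // so no Discriminator filter ends up in the generated SQL and nothing warns you
        modelBuilder.Entity<SomeEntity>()
            .HasDiscriminator<string>("Discriminator")
            .HasValue<SomeEntity>("SomeEntityType");
    }
}
```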

The gotcha is that it doesn’t work, but it doesn’t crash or warn you either. It just returns all SomeEntities, ignoring the discriminator: nothing gets added to the WHERE clause in the generated T-SQL.

I spent some time troubleshooting this, until I carefully read the docs again:

EF will only setup inheritance if two or more inherited types are explicitly included in the model

So in the incorrect mapping, the base type gets set up but nothing else, so no discriminator gets applied (apparently EF Core just ignores the faulty configuration).

You actually need to have a separate base and derived type, and map them accordingly, like so:

// EF Core: correct
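A sketch of that approach, again with hypothetical names: `SomeEntityBase` carries the mapped columns, `SomeEntity` derives from it, and the discriminator value `"SomeEntityBase"` for the base type is an assumption. With two types in the hierarchy, EF Core sets up table-per-hierarchy inheritance, so querying `SomeEntities` should add `Discriminator = 'SomeEntityType'` to the WHERE clause:

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical base type holding the mapped columns
public class SomeEntityBase
{
    public int Id { get; set; }
}

// The type we actually query; it exists to carry its own discriminator value
public class SomeEntity : SomeEntityBase
{
}

public class SomeContext : DbContext
{
    public DbSet<SomeEntity> SomeEntities { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Two types in the hierarchy, so EF Core configures TPH inheritance
        // and filters SomeEntities on the "SomeEntityType" discriminator value
        modelBuilder.Entity<SomeEntityBase>()
            .HasDiscriminator<string>("Discriminator")
            .HasValue<SomeEntityBase>("SomeEntityBase") // assumed value for the base type
            .HasValue<SomeEntity>("SomeEntityType");
    }
}
```

Queries keep going through `context.SomeEntities`; only the class hierarchy and the mapping change.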

A bit more verbose, but in normal inheritance scenarios you would have had the base type anyway.

Theme music for this blog post


LINQ All: it’s not all true

Pop quiz!

Will the program below throw an exception when executed?

using System;
using System.Collections.Generic;
using System.Linq;

class Review
{
    public int Score { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var reviews = new List<Review>();
        if (reviews.All(r => r.Score < 0))
            throw new Exception("Alert manager");
    }
}


Now try running it to see the result, you might be surprised 🙂

Not intuitive, now is it?
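The reason is that `All` returns `true` for an empty sequence: it only checks that no element fails the predicate, and an empty list has no element that could fail. Below is a small sketch; the helper names are hypothetical, and the `Any() && All(...)` guard is just one way to express "at least one score, and all of them negative".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllOnEmpty
{
    // Plain All: vacuously true for an empty sequence
    public static bool AllNegative(IEnumerable<int> scores) =>
        scores.All(s => s < 0);

    // Hypothetical guarded variant: requires at least one element before checking them all.
    // This enumerates the sequence twice, which is fine for an in-memory list.
    public static bool AnyAndAllNegative(IEnumerable<int> scores) =>
        scores.Any() && scores.All(s => s < 0);

    public static void Main()
    {
        var empty = new List<int>();
        Console.WriteLine(AllNegative(empty));                          // True
        Console.WriteLine(AnyAndAllNegative(empty));                    // False
        Console.WriteLine(AnyAndAllNegative(new List<int> { -3, -1 })); // True
        Console.WriteLine(AnyAndAllNegative(new List<int> { -3, 2 }));  // False
    }
}
```

Applied to the quiz, `reviews.Any() && reviews.All(r => r.Score < 0)` would not throw for an empty list of reviews.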

