Predatory Profits Of “High-Conflict” Divorces
As justification for this blog, please consider the following chain of conjectures (the “lawyers” referenced below are the divorce-lawyer types):
- our common law is adversarial: the winner simply takes all
- a family, by definition, is not adversarial, as fundamentally both parents love their children
- however, with no adversarial conflict, there is no lawsuit and no profit for lawyers
- thus our common law, applied to families, must first turn a family into adversaries
- by definition, either unresolved conflicts or a perceived lack of resources create adversaries
- moreover, lastingly intractable conflicts guarantee adversaries for life, i.e. “high-conflict”
- however, with no money, i.e. no possible profit for lawyers, there simply cannot be “high-conflict”
- “high-conflict” cases are thus an ambitious, i.e. ruthless, lawyer’s “gold mine” and job security
- lawyers are in overabundance and competition is fierce, as one only needs to be a malicious actor
- however, with no “high-conflict”, there are no trendsetting, “interesting” cases
- and with no trendsetting, there is no ~$500 / hour billing rate for ruthless, narcissistic “top lawyers”
Accepting the above chain of faultless logic, what can a deeply narcissistic divorce lawyer do?
- in cases lacking conflict, he has only one choice: provoke or flat-out fabricate a conflict by blatantly lying, specifically for the Family Court’s eager consumption
- if he “leaves money on the table” and neglects to exploit the lucrative cases he has already hooked onto, he will go hungry with everyone watching!
Direct contradictions and conflicts
In this blog we focus on directly fabricated conflicts, or flat-out, knowingly stated lies to the Family Courts by our lawyers. We are aided by the strict rules of the Court, as all meaningful communication must already be in (or should be convertible to) textual English “inputs”.
Our first goal is to train our computer to “catch the knowingly and directly lying lawyer” by systematically finding direct, irrefutable textual contradictions in all of a lawyer’s communications.
Current state-of-the-art NLP research (see “Attention Is All You Need”) has shown that the various proposed mechanisms for answering generic semantic-correctness questions are exceedingly promising. We use them to train our elementary arithmetic model to tell us whether a simple mathematical expression is correct or not.
Please note that we have too much accumulated code to load into our notebook. For your reference, we sample from the attached local source files: datasets.py, layers.py, main.py, modules.py, samples.py, tasks.py, trafo.py and utils.py.
Synthesized samples
The Samples class randomly generates samples from an on-demand allocated pool of values. The size of the pool is set by the dim_pool param and using it with large values helps with keeping the probability distributions in check.
Currently, Samples can generate a variety of 10 different groups of samples. In this blog we focus on yes-no (YNS), masked (MSK), reversed (REV) and faulty (FIX) samples.
A simple sample generating loop can be as follows:
import samples as qs

groups = tuple('yns ynx msk msx cls clx qas rev gen fix'.split())
YNS, YNX, MSK, MSX, CLS, CLX, QAS, REV, GEN, FIX = groups

def sampler(ps):
    ss = qs.Samples(ps)
    for _ in range(ps.num_samples):
        ss, idx = ss.next_idx
        enc, res, *_ = ss.create(idx)
        dec = tgt = f'[{res}]'      # the correct result
        bad = f'[{ss.other(res)}]'  # a deliberately wrong result
        yn = ss.yns[0, idx]         # the yes-no label for this sample
        d2 = dec if yn else bad
        yns = dict(enc=enc, dec=d2 + '|_', tgt=d2 + f'|{yn}')
        yield {YNS: yns}
The generated samples are Python dicts with the previously introduced enc (encoder), dec (decoder) and tgt (target) features. Both the dec and tgt features end the sample with |, and the yes-no answer is encoded as 1 or 0 (the _ is the placeholder that the decoder needs to solve).
And now we can generate a few samples:
import utils as qu

ps = dict(
    dim_pool=3,
    max_val=100,
    num_samples=4,
)
ps = qu.Params(**ps)

for d in sampler(ps):
    print(f'{d[YNS]}')
{'enc': 'x=81,y=11;x+y', 'dec': '[10]|_', 'tgt': '[10]|0'}
{'enc': 'y=-99,x=-58;x+y', 'dec': '[-157]|_', 'tgt': '[-157]|1'}
{'enc': 'x=13,y=-79;y-x', 'dec': '[-92]|_', 'tgt': '[-92]|1'}
{'enc': 'y=-33,x=-30;y+x', 'dec': '[-96]|_', 'tgt': '[-96]|0'}
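As a quick sanity check, we can re-evaluate the tiny programs in the enc feature and confirm that the yes-no label in tgt is 1 exactly when the bracketed result is correct. The helper below is a throwaway sketch written only for this blog (it is not part of the attached sources) and assumes the exact enc/tgt formats printed above:

import re

def check_yns(enc, tgt):
    # split 'y=-99,x=-58;x+y' into the variable assignments and the expression
    assigns, expr = enc.split(';')
    env = {}
    for a in assigns.split(','):
        k, v = a.split('=')
        env[k] = int(v)
    # tgt looks like '[-157]|1': a bracketed proposed result, then the yes-no label
    m = re.match(r'\[(-?\d+)\]\|([01])', tgt)
    proposed, label = int(m.group(1)), int(m.group(2))
    # the label should be 1 exactly when the proposed result is the correct one
    return (proposed == eval(expr, {}, env)) == bool(label)

print(check_yns('x=81,y=11;x+y', '[10]|0'))      # True: 81 + 11 = 92, not 10, hence label 0
print(check_yns('y=-99,x=-58;x+y', '[-157]|1'))  # True: -99 + -58 = -157, hence label 1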
While we don’t show any of the other samples in this blog, the MSK features mask the results at random positions with a ?, the REV samples mix up the order of the variables, and the FIX samples randomly introduce an error digit in the results.
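Purely for illustration, and not taken from samples.py (the actual encoding there may well differ), here is one way the three described transformations could be applied to the strings shown above:

import random

def mask_result(res, n=1):
    # MSK-style: replace n random digit positions of the result with '?'
    cs = list(str(res))
    digits = [i for i, c in enumerate(cs) if c.isdigit()]
    for i in random.sample(digits, n):
        cs[i] = '?'
    return ''.join(cs)

def reverse_vars(enc):
    # REV-style: mix up (here: reverse) the order of the variable assignments
    assigns, expr = enc.split(';')
    return ','.join(reversed(assigns.split(','))) + ';' + expr

def corrupt_result(res):
    # FIX-style: overwrite one random digit of the result with a wrong digit
    cs = list(str(res))
    i = random.choice([j for j, c in enumerate(cs) if c.isdigit()])
    cs[i] = random.choice([d for d in '0123456789' if d != cs[i]])
    return ''.join(cs)

print(mask_result(-157))               # e.g. '-1?7'
print(reverse_vars('x=13,y=-79;y-x'))  # 'y=-79,x=13;y-x'
print(corrupt_result(-92))             # e.g. '-93'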
Model definition
The actual model is largely similar to the models already presented in the previous blogs.
Based on which group of samples we are using, we activate some layers in the model while ignoring others.
A significant consideration is that all 10 groups of samples contribute (where meaningful) to the same weights (or variables).
We chose to do this based on the results of the MT-DNN paper. By varying the type and difficulty of the samples, we effectively cross-train the model.
In order to clearly separate the loss and metric calculations between the groups, we create a new instance of our model for each group of samples. However, we reuse the same layers. To accomplish this, we define a layer_for factory function memoized with functools.lru_cache:
import functools

@functools.lru_cache(maxsize=32)
def layer_for(cls, *pa, **kw):
    return cls(*pa, **kw)
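Because functools.lru_cache keys on the call arguments, requesting a layer with the same class and params returns the very same instance, which is exactly how the weights end up shared between the per-group models. A quick illustration using the layer_for defined above and a stand-in class (not one of the layers from layers.py):

class Dummy:
    # stand-in for a Keras layer, just to show the caching behavior
    def __init__(self, name):
        self.name = name

a = layer_for(Dummy, 'embed')
b = layer_for(Dummy, 'embed')
c = layer_for(Dummy, 'decode')
print(a is b)  # True: same arguments, same cached instance
print(a is c)  # False: different arguments, a new instance

Note that this caching relies on the arguments, including our ps params object, being hashable.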
And now, our usual model_for function looks as follows:
def model_for(ps, group):
    x = inputs  # the model's input tensors, defined in the attached sources
    y = layer_for(ql.ToRagged)(x)
    yt = layer_for(ql.Tokens, ps)(y)
    ym = layer_for(ql.Metas, ps)(y)
    xe, xd = yt[:2] + ym[:1], yt[2:] + ym[1:]
    embed = layer_for(ql.Embed, ps)
    ye = layer_for(ql.Encode, ps)(embed(xe))[0]
    decode = layer_for(ql.Decode, ps)
    if group in (qs.YNS, qs.YNX):
        # yes-no samples are decoded and passed through the Debed layer
        y = decode(embed(xd) + [ye])
        y = layer_for(ql.Debed, ps)(y)
    elif group in (qs.MSK, qs.MSX):
        # masked samples use the Deduce layer instead
        y = layer_for(ql.Deduce, ps, embed, decode)(xd + [ye])
    if group in (qs.QAS, qs.FIX):
        # question-answer and faulty samples are routed to the Locate layer
        y = decode(embed(xd) + [ye])
        y = layer_for(ql.Locate, ps, group)(y)
    m = Model(name='trafo', inputs=x, outputs=[y])
    m.compile(optimizer=ps.optimizer, loss=ps.loss, metrics=[ps.metric])
    print(m.summary())
    return m
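To make the sharing concrete: each call to model_for returns a distinct Model with its own loss and metric bookkeeping, yet the cached layer_for calls hand every such model the exact same layer instances. A brief illustrative sketch (not from the attached sources):

# one model per sample group, all backed by the same cached layers
m_yns = model_for(ps, qs.YNS)
m_msk = model_for(ps, qs.MSK)

# the cached factory returns identical layer objects, so the embedding
# (and encoder/decoder) weights are shared across every group's model
print(m_yns is m_msk)                                      # False
print(layer_for(ql.Embed, ps) is layer_for(ql.Embed, ps))  # True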
Expanded parameters
As we have expanded the functionality of our layers and modules from the previous blogs, our params have increased in number.
The subsequent blogs in this section describe the additions behind the model's extended functionality.
import tensorflow as tf
import datasets as qd

ks = tf.keras

params = dict(
    activ_concl='gelu',
    dim_attn=4,
    dim_attn_qk=None,
    dim_attn_v=None,
    dim_batch=5,
    dim_concl=150,
    dim_hidden=6,
    dim_hist=5,
    dim_metas=len(qd.metas),
    dim_stacks=2,
    dim_vocab=len(qd.vocab),
    drop_attn=None,
    drop_concl=None,
    drop_hidden=0.1,
    initer_stddev=0.02,
    loss=ks.losses.SparseCategoricalCrossentropy(from_logits=True),
    metric=ks.metrics.SparseCategoricalCrossentropy(from_logits=True),
    num_epochs=2,
    num_heads=3,
    num_rounds=2,
    num_shards=2,
    optimizer=ks.optimizers.Adam(),
    width_dec=40,
    width_enc=50,
)

params.update(
    loss=qu.Loss(),
    metric=qu.Metric(),
)
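As in the previous snippets, the plain params dict is meant to be wrapped into a qu.Params instance so that the rest of the code can use attribute access; for example:

ps = qu.Params(**params)
print(ps.dim_hidden, ps.num_heads, ps.dim_stacks)  # 6 3 2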
And this is our new main function that loops through all of our groups of samples and either trains the model on them or performs an evaluation/prediction.
In the follow-on blogs we present the various training/eval/predict functions that our main loop can use:
def main(ps, fn, groups=None, count=None):
    qu.Config.runtime.is_training = True
    groups = groups or qs.groups
    for r in range(ps.num_rounds):
        for g in groups:
            print(f'\nRound {r + 1}, group {g}...\n=======================')
            fn(ps, qd.dset_for(ps, g, count=count), model_for(ps, g))
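Purely as an illustrative placeholder until those functions are introduced (this is not the actual code from main.py or trafo.py), a minimal training fn could simply fit each compiled per-group model on its dataset:

def train_fn(ps, dset, model):
    # fit the already compiled per-group model on the adapted dataset
    model.fit(dset, epochs=ps.num_epochs)

# e.g. train for ps.num_rounds rounds on the yes-no and masked groups only
main(ps, train_fn, groups=(qs.YNS, qs.MSK))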
Generating samples
Before we start a training session, we need to generate some samples.
The code that generates the samples is similar to the following.
The large dataset generates 100 shards, each containing 10,000 samples, for every sample group out of the current 10.
The total number of samples for the large dataset can easily be varied; however, with the settings shown below, it amounts to 10 million samples (100 shards × 10,000 samples × 10 groups), which a server with 40 hyper-threads generates in about 3 hours.
ds_small = dict(
    dim_batch=5,
    dim_pool=10,
    max_val=1000,
    num_samples=20,
    num_shards=2,
)

ds_large = dict(
    dim_batch=1000,
    dim_pool=1024 * 1024,
    max_val=100000,
    num_samples=10000,
    num_shards=100,
)

def dump_ds(kind):
    ps = qu.Params(**(ds_small if kind == 'small' else ds_large))
    ss = [s for s in qd.dump(ps, f'/tmp/q/data/{kind}')]
    ds = qd.load(ps, shards=ss).map(qd.adapter)
    for i, _ in enumerate(ds):
        pass  # iterate once through the loaded dataset just to count the batches
    print(f'dumped {i + 1} batches of {ps.dim_batch} samples each')
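The main.py entry point presumably drives this dump; for the small dataset it boils down to a call like the following (the exact command-line handling in main.py is not shown here):

dump_ds('small')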
And here is an actual call to generate our small sample set (2 shards × 20 samples × 10 groups = 400 samples, or 80 batches of 5):
!python main.py
dumping /tmp/q/data/small/cls/shard_0000.tfrecords...
dumping /tmp/q/data/small/msk/shard_0000.tfrecords...
dumping /tmp/q/data/small/yns/shard_0000.tfrecords...
dumping /tmp/q/data/small/qas/shard_0000.tfrecords...
dumping /tmp/q/data/small/clx/shard_0000.tfrecords...
dumping /tmp/q/data/small/msx/shard_0000.tfrecords...
dumping /tmp/q/data/small/ynx/shard_0000.tfrecords...
dumping /tmp/q/data/small/rev/shard_0000.tfrecords...
dumping /tmp/q/data/small/gen/shard_0000.tfrecords...
dumping /tmp/q/data/small/yns/shard_0001.tfrecords...
dumping /tmp/q/data/small/msk/shard_0001.tfrecords...
dumping /tmp/q/data/small/cls/shard_0001.tfrecords...
dumping /tmp/q/data/small/fix/shard_0000.tfrecords...
dumping /tmp/q/data/small/ynx/shard_0001.tfrecords...
dumping /tmp/q/data/small/clx/shard_0001.tfrecords...
dumping /tmp/q/data/small/msx/shard_0001.tfrecords...
dumping /tmp/q/data/small/qas/shard_0001.tfrecords...
dumping /tmp/q/data/small/gen/shard_0001.tfrecords...
dumping /tmp/q/data/small/rev/shard_0001.tfrecords...
dumping /tmp/q/data/small/fix/shard_0001.tfrecords...
dumped 80 batches of 5 samples each
Training session
Now we are ready to run a short training session:
!python trafo.py
Round 1, group yns...
=======================
Epoch 1/2
2/2 [==============================] - 9s 4s/step - loss: 3.2370 - metric: 3.2364
Epoch 2/2
2/2 [==============================] - 0s 84ms/step - loss: 3.2212 - metric: 3.2209
Round 1, group msk...
=======================
Epoch 1/2
2/2 [==============================] - 32s 16s/step - loss: 3.2135 - metric: 3.2134
Epoch 2/2
2/2 [==============================] - 0s 119ms/step - loss: 3.2034 - metric: 3.2032
Round 1, group qas...
=======================
Epoch 1/2
2/2 [==============================] - 7s 4s/step - loss: 3.4434 - metric: 3.4434
Epoch 2/2
2/2 [==============================] - 0s 82ms/step - loss: 2.7450 - metric: 2.7450
Round 2, group yns...
=======================
Epoch 1/2
2/2 [==============================] - 7s 4s/step - loss: 3.2059 - metric: 3.2070
Epoch 2/2
2/2 [==============================] - 0s 79ms/step - loss: 3.1923 - metric: 3.1935
Round 2, group msk...
=======================
Epoch 1/2
2/2 [==============================] - 29s 14s/step - loss: 3.1887 - metric: 3.1887
Epoch 2/2
2/2 [==============================] - 0s 130ms/step - loss: 3.1745 - metric: 3.1744
Round 2, group qas...
=======================
Epoch 1/2
2/2 [==============================] - 10s 5s/step - loss: 1.9412 - metric: 1.9412
Epoch 2/2
2/2 [==============================] - 0s 89ms/step - loss: 1.3604 - metric: 1.3604
And this concludes our blog. Please click on the next blog for more detail.